Scrapy: change the times in the end-of-run statistics to the current system time
2022-04-23 07:47:00 【Brother Bing】
1. Problem background
At the end of each run, Scrapy prints a block of statistics that includes timing data. However, those timestamps are in UTC (time zone 0), not the local system time we are used to, and the crawler's total running time is reported as a plain number of seconds, which is not how we normally read durations. So I dug into the Scrapy source, found the relevant code, and rewrote it. It works well enough, so feel free to take it and use it.
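As a quick illustration (my own example, not from the original post): the `finish_time` in the dump below is a naive datetime that is actually UTC, while the log line that prints it is stamped in local time; converting it by hand shows the 8-hour offset on a UTC+8 machine.

```python
from datetime import datetime, timezone

# 'finish_time' value as it appears in the stats dump below (naive, actually UTC)
utc_finish = datetime(2021, 5, 10, 2, 44, 10, 418573)

# Attach the UTC zone and convert to the local zone of the machine running this
local_finish = utc_finish.replace(tzinfo=timezone.utc).astimezone()
print(local_finish)  # e.g. 2021-05-10 10:44:10.418573+08:00 on a UTC+8 machine
```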
2. Problem analysis
From the log output, the class that records the crawler's running time is `scrapy.extensions.corestats.CoreStats`.
- The relevant log output looks like this:

```text
# Extension configuration
2021-05-10 10:43:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',   # the signal/stats collector that records the crawler's running time
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']

# Statistics
2021-05-10 10:44:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 2,
 'downloader/exception_type_count/twisted.internet.error.TimeoutError': 1,
 'downloader/request_bytes': 1348,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 4,
 'downloader/response_bytes': 10256,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 18.806005,                                  # total running time, in seconds
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 5, 10, 2, 44, 10, 418573),   # crawler finish time (UTC)
 'httpcompression/response_bytes': 51138,
 'httpcompression/response_count': 1,
 'log_count/INFO': 10,
 'response_received_count': 1,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2021, 5, 10, 2, 43, 51, 612568)}    # crawler start time (UTC)
2021-05-10 10:44:10 [scrapy.core.engine] INFO: Spider closed (finished)
```
- The relevant part of the source code (shown in the original post as a screenshot) confirms where the UTC values come from.
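The screenshot is not reproduced here; the following is a sketch of the timing-related part of `CoreStats` as it appears in Scrapy 2.x-era source (abbreviated, and details may differ slightly between versions). The key point is that both timestamps come from `datetime.utcnow()`:

```python
# Sketch of scrapy/extensions/corestats.py (Scrapy 2.x era, abbreviated)
from datetime import datetime


class CoreStats:

    def __init__(self, stats):
        self.stats = stats
        self.start_time = None

    def spider_opened(self, spider):
        # UTC, which is why 'start_time' in the dump is 8 hours behind local time here
        self.start_time = datetime.utcnow()
        self.stats.set_value('start_time', self.start_time, spider=spider)

    def spider_closed(self, spider, reason):
        finish_time = datetime.utcnow()
        elapsed_time = finish_time - self.start_time
        self.stats.set_value('elapsed_time_seconds', elapsed_time.total_seconds(), spider=spider)
        self.stats.set_value('finish_time', finish_time, spider=spider)
        self.stats.set_value('finish_reason', reason, spider=spider)

    # ... item_scraped / item_dropped / response_received counters omitted ...
```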
3. Solution
- Rewrite the `CoreStats` class:

```python
# -*- coding: utf-8 -*-
# Rewritten signal collector
import time

from scrapy.extensions.corestats import CoreStats


class MyCoreStats(CoreStats):

    def spider_opened(self, spider):
        """Called when the crawler starts running."""
        self.start_time = time.time()
        # Format the start time as a local-time string
        start_time_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(self.start_time))
        self.stats.set_value('crawler start time', start_time_str, spider=spider)

    def spider_closed(self, spider, reason):
        """Called when the crawler finishes running."""
        # Crawler finish time
        finish_time = time.time()
        # Format the finish time as a local-time string
        finish_time_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(finish_time))
        # Total running time of the crawler
        elapsed_time = finish_time - self.start_time
        m, s = divmod(elapsed_time, 60)
        h, m = divmod(m, 60)
        self.stats.set_value('crawler finish time', finish_time_str, spider=spider)
        self.stats.set_value('crawler total running time', '%d h:%02d min:%02d s' % (h, m, s), spider=spider)
        self.stats.set_value('crawler finish reason', reason, spider=spider)
```
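Note that `MyCoreStats` overrides `spider_opened` and `spider_closed` completely, so the standard `start_time`, `finish_time` and `elapsed_time_seconds` keys will no longer appear in the dump. If you want to keep them alongside the human-readable values, one possible variant (my own sketch, not from the original post, with an illustrative class name) is to delegate to the parent class first:

```python
# Optional variant: keep Scrapy's built-in UTC stats *and* add local-time values
import time

from scrapy.extensions.corestats import CoreStats


class MyCoreStatsKeepBoth(CoreStats):

    def spider_opened(self, spider):
        super().spider_opened(spider)  # still records the standard UTC 'start_time'
        self.stats.set_value(
            'crawler start time',
            time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()),
            spider=spider,
        )

    def spider_closed(self, spider, reason):
        # Records 'finish_time', 'elapsed_time_seconds' and 'finish_reason' as usual
        super().spider_closed(spider, reason)
        elapsed = self.stats.get_value('elapsed_time_seconds', 0, spider=spider)
        m, s = divmod(int(elapsed), 60)
        h, m = divmod(m, 60)
        self.stats.set_value('crawler finish time',
                             time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()),
                             spider=spider)
        self.stats.set_value('crawler total running time',
                             '%d h:%02d min:%02d s' % (h, m, s),
                             spider=spider)
```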
- Enable it in the project settings:

```python
# settings.py
EXTENSIONS = {
    'scrapy.extensions.corestats.CoreStats': None,               # disable the default stats collector
    'your_project_name.extensions.corestats.MyCoreStats': 500,   # enable the custom signal collector
}
```
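For reference, the dotted path above assumes a layout roughly like the following inside the project package (the names are illustrative; adjust the path to wherever you actually put `MyCoreStats`):

```text
your_project_name/
├── __init__.py
├── settings.py
├── spiders/
└── extensions/
    ├── __init__.py
    └── corestats.py      # defines MyCoreStats
```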
4. Result
```text
2021-05-10 11:11:03 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 5,
 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
 'downloader/exception_type_count/twisted.internet.error.TimeoutError': 2,
 'downloader/request_bytes': 1976,
 'downloader/request_count': 6,
 'downloader/request_method_count/GET': 6,
 'downloader/response_bytes': 10266,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'httpcompression/response_bytes': 51139,
 'httpcompression/response_count': 1,
 'log_count/INFO': 10,
 'response_received_count': 1,
 'scheduler/dequeued': 6,
 'scheduler/dequeued/memory': 6,
 'scheduler/enqueued': 6,
 'scheduler/enqueued/memory': 6,
 'crawler finish reason': 'finished',
 'crawler start time': '2021-05-10 11:10:39',
 'crawler finish time': '2021-05-10 11:11:03',
 'crawler total running time': '0 h:00 min:24 s'}
```
Copyright notice: this article was written by [Brother Bing]. Please include a link to the original when reposting: https://yzsam.com/2022/04/202204230625585327.html