Scrapy: showing the times in the end-of-crawl statistics as local system time
2022-04-23 07:47:00 【Brother Bing】
1. Background
At the end of every run, Scrapy prints a block of statistics that includes timing data. The catch is that these times are in UTC (time zone 0), not the local system time we are used to, and the crawler's total run time is reported in raw seconds, which is not how we normally read durations. So I dug through the Scrapy source code, found the relevant part and overrode it. It works well enough for me, so feel free to use it.
2. Problem analysis
The log output shows which class records the crawler's run time: scrapy.extensions.corestats.CoreStats
- The log output looks like this:
# Enabled extensions
2021-05-10 10:43:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',  # stats extension driven by signals; records the crawler's run time
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']

# Statistics
2021-05-10 10:44:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{
'downloader/exception_count': 3,
'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 2,
'downloader/exception_type_count/twisted.internet.error.TimeoutError': 1,
'downloader/request_bytes': 1348,
'downloader/request_count': 4,
'downloader/request_method_count/GET': 4,
'downloader/response_bytes': 10256,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 18.806005,  # total crawler run time
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 5, 10, 2, 44, 10, 418573),  # crawler end time
'httpcompression/response_bytes': 51138,
'httpcompression/response_count': 1,
'log_count/INFO': 10,
'response_received_count': 1,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'start_time': datetime.datetime(2021, 5, 10, 2, 43, 51, 612568)}  # crawler start time
2021-05-10 10:44:10 [scrapy.core.engine] INFO: Spider closed (finished)
- Screenshot of the CoreStats source code (image not included)
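To make the time-zone gap in the log concrete: the log lines are stamped 10:44:10, while `finish_time` in the stats reads 02:44:10. The sketch below reproduces that difference with the standard library only. The UTC+8 offset is an assumption inferred from those two timestamps, and `utc_naive_to_local` is a hypothetical helper name, not part of Scrapy.

```python
from datetime import datetime, timedelta, timezone

# finish_time as it appears in the stats dump: a naive datetime holding UTC.
finish_time_utc = datetime(2021, 5, 10, 2, 44, 10, 418573)

# Assumption: the machine runs at UTC+8, inferred from the log timestamps.
local_tz = timezone(timedelta(hours=8))


def utc_naive_to_local(dt, tz):
    """Interpret a naive datetime as UTC and convert it to the given time zone."""
    return dt.replace(tzinfo=timezone.utc).astimezone(tz)


finish_time_local = utc_naive_to_local(finish_time_utc, local_tz)
print(finish_time_local.strftime('%Y-%m-%d %H:%M:%S'))  # 2021-05-10 10:44:10
```

The converted value matches the timestamp on the final "Spider closed" log line, which suggests the stats values are simply UTC versions of the same moments.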
3. Solution
- Override the CoreStats class
# -*- coding: utf-8 -*-
# Custom replacement for the default CoreStats extension
import time

from scrapy.extensions.corestats import CoreStats


class MyCoreStats(CoreStats):

    def spider_opened(self, spider):
        """The crawler starts running."""
        self.start_time = time.time()
        # Format the start time as local system time
        start_time_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(self.start_time))
        self.stats.set_value('Crawler start time: ', start_time_str, spider=spider)

    def spider_closed(self, spider, reason):
        """The crawler has finished running."""
        # Crawler end time
        finish_time = time.time()
        # Format the end time as local system time
        finish_time_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(finish_time))
        # Total run time, split into hours, minutes and seconds
        elapsed_time = finish_time - self.start_time
        m, s = divmod(elapsed_time, 60)
        h, m = divmod(m, 60)
        self.stats.set_value('Crawler end time: ', finish_time_str, spider=spider)
        self.stats.set_value('Total crawler run time: ', '%dh:%02dm:%02ds' % (h, m, s), spider=spider)
        self.stats.set_value('Crawler end reason: ', reason, spider=spider)
- Update the project settings
EXTENSIONS = {
    'scrapy.extensions.corestats.CoreStats': None,  # disable the default stats extension
    'project_name.extensions.corestats.MyCoreStats': 500,  # enable the custom one; replace project_name with your project's package name
}
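The run-time formatting in `spider_closed` comes down to two `divmod` calls. Here is a minimal standalone sketch of that arithmetic, runnable without Scrapy. The helper name `format_elapsed` and the `h`/`m`/`s` labels are assumptions chosen for illustration.

```python
def format_elapsed(seconds):
    """Format a duration in seconds as hours, minutes and seconds."""
    # Drop the fractional part first so every field is a whole number.
    minutes, secs = divmod(int(seconds), 60)   # whole minutes, leftover seconds
    hours, minutes = divmod(minutes, 60)       # whole hours, leftover minutes
    return '%dh:%02dm:%02ds' % (hours, minutes, secs)


print(format_elapsed(18.806005))  # 0h:00m:18s
print(format_elapsed(3725))       # 1h:02m:05s
```

The first call uses the `elapsed_time_seconds` value from the default stats dump. Fractions of a second are truncated, not rounded.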
4. Result
2021-05-10 11:11:03 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{
'downloader/exception_count': 5,
'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
'downloader/exception_type_count/twisted.internet.error.TimeoutError': 2,
'downloader/request_bytes': 1976,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 6,
'downloader/response_bytes': 10266,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'httpcompression/response_bytes': 51139,
'httpcompression/response_count': 1,
'log_count/INFO': 10,
'response_received_count': 1,
'scheduler/dequeued': 6,
'scheduler/dequeued/memory': 6,
'scheduler/enqueued': 6,
'scheduler/enqueued/memory': 6,
'Crawler end reason: ': 'finished',
'Crawler start time: ': '2021-05-10 11:10:39',
'Crawler end time: ': '2021-05-10 11:11:03',
'Total crawler run time: ': '0h:00m:24s'}
Copyright notice
This article was written by [Brother Bing]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204230625585327.html