Scrapy: Changing the times in the end-of-crawl statistics to the current system time
2022-04-23 07:47:00 【Brother Bing】
1. Problem background
Every time Scrapy finishes a run, it prints a block of statistics that includes timing data. However, those times are in UTC (time zone 0), not the local system time we are used to, and the crawler's total running time is reported in seconds, which is not how we usually read durations. So I dug through the Scrapy source code, found the relevant part, and rewrote it. It works well for me, so feel free to use it!
2. Problem analysis
The log output shows which class records the crawler's running time: scrapy.extensions.corestats.CoreStats
- The log output looks like this:
# Enabled extensions
2021-05-10 10:43:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',  # Core stats collector; records the crawler's running time
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']

# Statistics
2021-05-10 10:44:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 2,
 'downloader/exception_type_count/twisted.internet.error.TimeoutError': 1,
 'downloader/request_bytes': 1348,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 4,
 'downloader/response_bytes': 10256,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 18.806005,  # Total running time of the crawler
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 5, 10, 2, 44, 10, 418573),  # Crawler end time
 'httpcompression/response_bytes': 51138,
 'httpcompression/response_count': 1,
 'log_count/INFO': 10,
 'response_received_count': 1,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2021, 5, 10, 2, 43, 51, 612568)}  # Crawler start time
2021-05-10 10:44:10 [scrapy.core.engine] INFO: Spider closed (finished)
- [Screenshot: CoreStats source code]
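To make the UTC offset in the log concrete, here is a minimal sketch that converts the `finish_time` value shown above into local time. The helper name `utc_to_zone` and the fixed UTC+8 offset are assumptions for illustration only (the log line timestamps suggest the machine was 8 hours ahead of UTC); neither is part of Scrapy.

```python
from datetime import datetime, timedelta, timezone


def utc_to_zone(dt_utc, tz):
    """Treat dt_utc as a UTC timestamp and convert it to the zone tz.

    Hypothetical helper, not part of Scrapy.
    """
    return dt_utc.replace(tzinfo=timezone.utc).astimezone(tz)


# finish_time exactly as it appears in the stats dump (UTC, no tzinfo)
finish_time = datetime(2021, 5, 10, 2, 44, 10, 418573)

# Assumption: the local zone is UTC+8
utc_plus_8 = timezone(timedelta(hours=8))

local_finish = utc_to_zone(finish_time, utc_plus_8)
print(local_finish.strftime('%Y-%m-%d %H:%M:%S'))  # 2021-05-10 10:44:10
```

The converted value matches the timestamp at the start of the "Dumping Scrapy stats" log line, which is written in local time.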

3. Solution
- Override the CoreStats class:

# -*- coding: utf-8 -*-
# Custom replacement for Scrapy's CoreStats extension
import time

from scrapy.extensions.corestats import CoreStats


class MyCoreStats(CoreStats):

    def spider_opened(self, spider):
        """Called when the crawler starts running."""
        self.start_time = time.time()
        # Format the start time as local system time
        start_time_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(self.start_time))
        self.stats.set_value('Crawler start time: ', start_time_str, spider=spider)

    def spider_closed(self, spider, reason):
        """Called when the crawler finishes running."""
        # Crawler end time
        finish_time = time.time()
        # Format the end time as local system time
        finish_time_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(finish_time))
        # Split the total running time into hours, minutes and seconds
        elapsed_time = finish_time - self.start_time
        m, s = divmod(elapsed_time, 60)
        h, m = divmod(m, 60)
        self.stats.set_value('Crawler end time: ', finish_time_str, spider=spider)
        self.stats.set_value('Total crawler running time: ', '%dh:%02dm:%02ds' % (h, m, s), spider=spider)
        self.stats.set_value('Crawler end reason: ', reason, spider=spider)

- Modify the settings file (settings.py):

EXTENSIONS = {
    'scrapy.extensions.corestats.CoreStats': None,  # Disable the default CoreStats extension
    'project_name.extensions.corestats.MyCoreStats': 500,  # Enable the custom extension; replace project_name with your project's package name
}
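The hours/minutes/seconds split used in `spider_closed` can be tried on its own, without running a crawl. The sketch below isolates that arithmetic; the function name `format_elapsed` and the `h`/`m`/`s` suffixes are illustrative choices, not something Scrapy provides.

```python
def format_elapsed(seconds):
    """Format a duration in seconds as hours, minutes and seconds.

    Hypothetical helper that mirrors the divmod logic in spider_closed.
    """
    m, s = divmod(int(seconds), 60)  # whole minutes and leftover seconds
    h, m = divmod(m, 60)             # whole hours and leftover minutes
    return '%dh:%02dm:%02ds' % (h, m, s)


print(format_elapsed(18.806005))  # 0h:00m:18s
print(format_elapsed(3725))       # 1h:02m:05s
```

Truncating to whole seconds first keeps the output stable; fractional seconds are dropped rather than rounded.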
4. Result
2021-05-10 11:11:03 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{
'downloader/exception_count': 5,
'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
'downloader/exception_type_count/twisted.internet.error.TimeoutError': 2,
'downloader/request_bytes': 1976,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 6,
'downloader/response_bytes': 10266,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'httpcompression/response_bytes': 51139,
'httpcompression/response_count': 1,
'log_count/INFO': 10,
'response_received_count': 1,
'scheduler/dequeued': 6,
'scheduler/dequeued/memory': 6,
'scheduler/enqueued': 6,
'scheduler/enqueued/memory': 6,
'Crawler end reason: ': 'finished',
'Crawler start time: ': '2021-05-10 11:10:39',
'Crawler end time: ': '2021-05-10 11:11:03',
'Total crawler running time: ': '0h:00m:24s'}
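As a quick sanity check on the output above, the start and end timestamps can be parsed back and subtracted. This is only an illustrative sketch using the two values from the dump; the variable names are arbitrary.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'

# Values copied from the statistics dump above
start = datetime.strptime('2021-05-10 11:10:39', FMT)
finish = datetime.strptime('2021-05-10 11:11:03', FMT)

elapsed_seconds = int((finish - start).total_seconds())
print(elapsed_seconds)  # 24
```

The 24-second difference agrees with the total running time reported in the last line of the dump.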
Copyright notice
This article was written by [Brother Bing]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230625585327.html