web crawler error
2022-08-10 02:53:00 【bamboogz99】
When opening a URL with urllib's request module, the program raised an HTTPError, but no error code was shown.
I rewrote the code so that it prints the specific error:
# exception handling
from urllib import request, error

try:
    response = request.urlopen('https://movie.douban.com/top250')
except error.HTTPError as e:
    # use HTTPError to see the specific reason, status code and response headers
    print(e.reason, e.code, e.headers, sep='\n')
I was visiting Douban here, and the request came back with error 418. After looking it up, this turned out to be Douban's anti-crawling measure: the server rejects requests that don't look like they come from a browser.
Handling method: instead of requesting the page with the bare default request, add the headers option (a browser-style User-Agent) to the request, as follows:
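The original snippet is not reproduced in the post, so here is a minimal sketch of this approach; it assumes that a browser-style User-Agent header is enough to get past the 418 response (the exact User-Agent string is just an example):

from urllib import request, error

# A browser-style User-Agent; the exact string is only illustrative.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0 Safari/537.36'
}

try:
    req = request.Request('https://movie.douban.com/top250', headers=headers)
    response = request.urlopen(req)
    print(response.status)                      # expect 200 instead of 418
    html = response.read().decode('utf-8')
except error.HTTPError as e:
    print(e.reason, e.code, e.headers, sep='\n')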
The second question is how to read information from multiple pages. By observing the URLs, we can see that Douban's page links contain the page-number information, so a for loop can be used to build the link for each page, as sketched below.
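A rough sketch of that loop, assuming the Top 250 list pages are addressed through a start query parameter in steps of 25 (25 entries per page); the parsing step is left as a placeholder:

from urllib import request, error

headers = {'User-Agent': 'Mozilla/5.0'}          # browser-style header, as above

for page in range(10):                           # 10 pages x 25 entries = 250
    url = 'https://movie.douban.com/top250?start={}'.format(page * 25)
    try:
        req = request.Request(url, headers=headers)
        html = request.urlopen(req).read().decode('utf-8')
        print(url, len(html))                    # parse html here as needed
    except error.HTTPError as e:
        print(url, e.code, e.reason)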