当前位置:网站首页>web crawler error
web crawler error
2022-08-10 02:53:00 【bamboogz99】
When using the request method in urllib, the system returns HTTPerror, but no error code is given
Re-wrote a piece of code to display specific errors:
# exception handlingfrom urllib import request,errortry:response = urllib.request.urlopen('https://movie.douban.com/top250')except error.HTTPError as e:print(e.reason,e.code,e.headers,sep='\n') # Use httperror to judge I visited Douban here, and the result returned error 418. I checked that it was anti-crawling.
Processing method: Instead of requesting the entire webpage at one time, add the header option and only read the header, as follows:

The second question is, how to read the information of multiple pages. At this time, through observation, we know that the page link of douban contains page number information, and the for loop can be used to match the page number:

边栏推荐
猜你喜欢
随机推荐
UXDB现在支持函数索引吗?
多线程之享元模式和final原理
Fusion Compute网络虚拟化
16. 最接近的三数之和
数据库治理利器:动态读写分离
Visual low-code system practice based on design draft identification
卷积神经网络识别验证码
月薪35K,靠八股文就能做到的事,你居然不知道
odoo公用变量或数组的使用
数据治理(五):元数据管理
FILE结构体在stdio.h头文件源码里的详细代码
【每日一题】1413. 逐步求和得到正数的最小值
FusionCompute产品介绍
微透镜阵列后光传播的研究
flask增删改查
Unity碰撞和触发
《GB39732-2020》PDF download
多线程之自定义线程池
你有对象类,我有结构体,Go lang1.18入门精炼教程,由白丁入鸿儒,go lang结构体(struct)的使用EP06
grafana9配置邮箱告警









