web crawler error
2022-08-10 02:53:00 【bamboogz99】
When opening a URL with urllib's request module, the program raised an HTTPError, but no error code was shown.
I rewrote the code so that it prints the specific error:
# exception handling
from urllib import request, error

try:
    response = request.urlopen('https://movie.douban.com/top250')
except error.HTTPError as e:
    # use HTTPError to see the specific reason, status code and response headers
    print(e.reason, e.code, e.headers, sep='\n')
I was visiting Douban here, and the request came back with error 418. After looking it up, this turned out to be Douban's anti-crawling measure: the server rejects requests that don't look like they come from a browser.
Handling method: instead of requesting the page with the bare default request, add the headers option (a browser-style User-Agent) to the request, as follows:
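The original snippet is not reproduced in the post, so here is a minimal sketch of this approach; it assumes that a browser-style User-Agent header is enough to get past the 418 response (the exact User-Agent string is just an example):

from urllib import request, error

# A browser-style User-Agent; the exact string is only illustrative.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0 Safari/537.36'
}

try:
    req = request.Request('https://movie.douban.com/top250', headers=headers)
    response = request.urlopen(req)
    print(response.status)                      # expect 200 instead of 418
    html = response.read().decode('utf-8')
except error.HTTPError as e:
    print(e.reason, e.code, e.headers, sep='\n')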
The second question is how to read information from multiple pages. By observing the URLs, we can see that Douban's page links contain the page-number information, so a for loop can be used to build the link for each page, as sketched below.
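A rough sketch of that loop, assuming the Top 250 list pages are addressed through a start query parameter in steps of 25 (25 entries per page); the parsing step is left as a placeholder:

from urllib import request, error

headers = {'User-Agent': 'Mozilla/5.0'}          # browser-style header, as above

for page in range(10):                           # 10 pages x 25 entries = 250
    url = 'https://movie.douban.com/top250?start={}'.format(page * 25)
    try:
        req = request.Request(url, headers=headers)
        html = request.urlopen(req).read().decode('utf-8')
        print(url, len(html))                    # parse html here as needed
    except error.HTTPError as e:
        print(url, e.code, e.reason)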