From Basics to Depth: A Beginner's Web Crawler Learning Journey (1)
2022-04-22 06:13:00 【Pick your feet with vegetables】
Contents
1. Chrome developer tools
2. Getting static web page data with the requests library
3. Getting dynamic web page information with the requests library
4. Getting web data with urllib
1. Chrome developer tools
How to open: press F12, or right-click the page and choose Inspect (shortcut: Ctrl+Shift+I).
The four most commonly used panels of the developer tools: Elements, Console, Sources, Network.
Elements: for viewing the page's HTML elements.
Network: mainly for viewing request and response headers and other information about the page's network connections.
2. Getting static web page data with the requests library
The requests library

Generating a request:
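The original example for this step did not survive, so here is a minimal sketch of what generating a GET request with requests might look like. The function name `fetch` and the URL `https://example.com/` are placeholders chosen for illustration. Preparing the request (rather than sending it) lets the sketch run without network access.

```python
import requests


def fetch(url):
    """Send a GET request and return the Response object."""
    return requests.get(url)


# Preparing the same kind of request shows what would be sent,
# without actually contacting the server.
prepared = requests.Request("GET", "https://example.com/").prepare()
print(prepared.method, prepared.url)
```

Calling `fetch("https://example.com/")` would send the request for real and hand back the Response object discussed next.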

Sending a network request with get() returns a Response object.
Five attributes of the Response object:
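The list of attributes is missing from the source. The five below (`status_code`, `text`, `encoding`, `apparent_encoding`, `content`) are the ones introductory requests tutorials usually cover, so treating them as the author's five is an assumption. The helper name `summarize_response` is hypothetical.

```python
import requests


def summarize_response(response: requests.Response) -> dict:
    """Collect five commonly used Response attributes into a dictionary."""
    return {
        "status_code": response.status_code,              # HTTP status, e.g. 200
        "encoding": response.encoding,                    # encoding guessed from the headers
        "apparent_encoding": response.apparent_encoding,  # encoding guessed from the body
        "text_preview": response.text[:80],               # body decoded to str (first 80 chars)
        "content_length": len(response.content),          # body as raw bytes
    }
```

A typical call would be `summarize_response(requests.get(url))`, with `url` pointing at the page you want to inspect.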

When the get() method fetches an online resource, the request ends in one of the following two states:
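The two states themselves were not preserved. A common reading is "success" (status code 200) versus "failure" (any other code, such as 404), and the sketch below assumes that reading. The helper `check_status` is hypothetical, and the Response objects are built by hand so the example runs offline.

```python
import requests


def check_status(response: requests.Response) -> str:
    """Classify a response as success (200) or failure (anything else)."""
    if response.status_code == 200:
        return "success"
    try:
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    except requests.HTTPError as exc:
        return f"failed: {exc}"
    return f"unexpected status {response.status_code}"


# Hand-built responses stand in for real ones.
found = requests.Response()
found.status_code = 200
missing = requests.Response()
missing.status_code = 404

print(check_status(found))
print(check_status(missing))
```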

- Request header settings
If a site returns an error such as 400 when you visit it, the site may be using an anti-crawler strategy. This can usually be solved by setting the User-Agent in headers (headers is a dictionary of keys and their corresponding values):
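A minimal sketch of setting the User-Agent, assuming a browser-like value is enough for the target site. The User-Agent string and the URL are illustrative placeholders. The request is only prepared, not sent, so the header can be inspected offline.

```python
import requests

# headers is a plain dictionary: header name -> header value.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36"
    )
}

prepared = requests.Request("GET", "https://example.com/", headers=headers).prepare()
print(prepared.headers["User-Agent"])
```

To send the request for real, pass the same dictionary as `requests.get(url, headers=headers)`.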

- Timeout setting
Purpose: avoid hanging indefinitely while waiting for the server to respond. (Choose a reasonable value to keep waiting time short.)
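The effect of a timeout can be sketched as follows. A local socket that accepts connections but never answers stands in for an unresponsive server, which is purely a device to make the example runnable offline. The helper name `fetch_or_none` and the 0.5 second value are illustrative.

```python
import socket

import requests


def fetch_or_none(url, timeout):
    """Return the page text, or None if the request times out or fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    except requests.Timeout:
        print(f"no response within {timeout} s")
        return None
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
        return None


# A socket that listens but never replies simulates a hung server.
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

result = fetch_or_none(f"http://127.0.0.1:{port}/", timeout=0.5)
silent.close()
```

Without the `timeout` argument, the same call would keep waiting for the silent server instead of giving up.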
- Building a complete HTTP request
A complete HTTP request covers: the URL, request headers, a timeout, a status code check, and the correct encoding.
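Putting those five pieces together might look like the sketch below. The function name `get_html`, the User-Agent value, and the choice to return None on failure are assumptions made for illustration, not the author's original code.

```python
import requests

# Illustrative browser-like User-Agent; substitute any realistic value.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def get_html(url, timeout=10):
    """Fetch a page and return its text, or None if anything goes wrong."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()                     # status code check
        response.encoding = response.apparent_encoding  # decode with the detected encoding
        return response.text
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
        return None
```

Because every requests error derives from `requests.RequestException`, a malformed URL, a timeout, and a 4xx or 5xx status all end up in the same `except` branch.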
3. Getting dynamic web page information with the requests library
- Static web page: the displayed content matches the HTML source code
- Dynamic web page: the displayed content does not match the HTML source code

How to determine the page type:
- Right-click and choose View page source
- Press F12 and inspect the source in the developer tools
If the two views show the same content, the page is static; if they differ, the page is dynamic.
Determine the page type before fetching its data. For a dynamic page, you need to find the correct address that the data is actually loaded from.
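The same check can be approximated in code: if text you can see in the browser is missing from the raw HTML, the page is probably filling it in dynamically. The helper `looks_static` and both sample HTML strings are made up for illustration.

```python
def looks_static(page_source: str, visible_text: str) -> bool:
    """Return True if text seen in the browser is already in the raw HTML source."""
    return visible_text in page_source


# Raw HTML as the server would send it (hypothetical samples).
static_source = "<html><body><p>Price: 42</p></body></html>"
dynamic_source = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

print(looks_static(static_source, "Price: 42"))
print(looks_static(dynamic_source, "Price: 42"))
```

In practice `page_source` would be the `text` of a response fetched with requests, and `visible_text` a snippet copied from the rendered page.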
4. Getting web data with urllib
- The urllib library
urllib is Python's built-in HTTP request library.

The urllib library differs between Python 2 and Python 3:
In Python 2: import urllib2
In Python 3: import urllib.request
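If a script has to run under both versions, one common idiom is to try the Python 3 import first and fall back to the Python 2 module. The alias `urlrequest` is a name chosen here for illustration.

```python
try:
    import urllib.request as urlrequest  # Python 3
except ImportError:
    import urllib2 as urlrequest  # Python 2 fallback

# Both modules expose urlopen() under the same name.
print(urlrequest.urlopen.__name__)
```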
- Crawling a web page with the urllib library
- urlopen() method parameters
Use the urlopen() method to crawl a web page.
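A minimal sketch of fetching a page with urlopen(). So that it runs without network access, a `data:` URL stands in for a real page; with an `http://` or `https://` address the call is the same. The helper name `fetch` and the sample HTML are illustrative.

```python
import urllib.parse
import urllib.request


def fetch(url, timeout=10):
    """Fetch a URL with urlopen() and return the body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# A data: URL carries its content inline, so no server is needed.
demo_url = "data:text/html;charset=utf-8," + urllib.parse.quote("<title>demo</title>")
html = fetch(demo_url)
print(html)
```

Note that `read()` returns bytes, so the body has to be decoded before it can be treated as text.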
- Using the urlopen() parameters

- The data parameter:
Must be a bytes object
Must be in standard form-encoded format; use urllib.parse.urlencode() to convert a dictionary, then encode the result to bytes
Defaults to None
Required when sending a POST request
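The rules above can be sketched as follows. The form fields and the URL `https://example.com/login` are placeholders. The request is only constructed, not sent, which is enough to see that supplying `data` switches the method to POST.

```python
import urllib.parse
import urllib.request

form = {"name": "xiaobai", "page": 1}

# urlencode() produces a str; encode() turns it into the bytes urlopen() expects.
data = urllib.parse.urlencode(form).encode("utf-8")

request = urllib.request.Request("https://example.com/login", data=data)
print(data)
print(request.get_method())
```

Passing `request` (or the URL together with `data=data`) to `urllib.request.urlopen()` would then send the POST.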
- The timeout parameter:
Sets the timeout
The unit is seconds
When using proxies, it can be used to check whether a proxy is still responding
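A minimal sketch of using the timeout as a liveness check. A local socket that never answers stands in for a dead server or proxy, which is only a device to keep the example offline. The helper name `is_reachable` and the 0.5 second value are illustrative.

```python
import socket
import urllib.request


def is_reachable(url, timeout):
    """Return True if the URL answers within `timeout` seconds, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return False


# A socket that listens but never replies simulates an unresponsive endpoint.
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

reachable = is_reachable(f"http://127.0.0.1:{port}/", timeout=0.5)
silent.close()
print(reachable)
```

The same function could be pointed at a URL routed through a proxy to decide whether that proxy is worth keeping.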
Copyright notice
This article was written by [Pick your feet with vegetables]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220538522637.html