Why a crawler gets an empty result when parsing with XPath, and how to fix it
2022-04-23 08:48:00 【Dried fish_】
When writing crawlers, XPath is the most commonly used way to parse web pages. Sometimes, though, an XPath expression that looks correct still returns an empty result.
The usual cause is that the page's HTML omits a layer of tags, which can also act as an anti-scraping measure by the front end. The browser adds the omitted tags automatically when it builds the page, so the structure it displays is the corrected one.
When we inspect the page in the browser, the structure we see has already been modified by the browser, whereas the crawler receives the original source code.
An XPath written against the browser-modified structure therefore does not match anything in the source code, and the return value is naturally empty.
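Example
Structure shown in the browser (after the browser has modified it)
XPath: '/html/body/div[5]/div[3]/div[2]/table/tbody/tr[1]/td[2]/a/@href'
Source code (what the crawler receives)
The source is missing the tbody tag, so the XPath has to be:
XPath: '/html/body/div[5]/div[3]/div[2]/table/tr[1]/td[2]/a/@href'
That is, delete tbody from the expression.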
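The effect can be reproduced with a small sketch. The HTML snippet below is a made-up stand-in for a page source without tbody, and the paths are shortened versions of the ones above. It uses the standard-library xml.etree.ElementTree, which supports only a subset of XPath, so the paths are relative to the html element and the href is read with .get(); with a full XPath library such as lxml the same idea should apply to the absolute expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source as the crawler receives it: the table has no <tbody>.
RAW_SOURCE = """
<html><body><div>
<table>
  <tr><td>1</td><td><a href="/item/1">first</a></td></tr>
  <tr><td>2</td><td><a href="/item/2">second</a></td></tr>
</table>
</div></body></html>
""".strip()

root = ET.fromstring(RAW_SOURCE)  # root is the <html> element

# Path as copied from the browser inspector (contains the tbody step).
browser_path = "body/div/table/tbody/tr[1]/td[2]/a"
# Same path with the tbody step deleted, matching the raw source.
source_path = "body/div/table/tr[1]/td[2]/a"

browser_hits = [a.get("href") for a in root.findall(browser_path)]
source_hits = [a.get("href") for a in root.findall(source_path)]

print(browser_hits)  # []
print(source_hits)   # ['/item/1']
```

The first query is empty only because of the extra tbody step; nothing else differs between the two paths.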
Summary: when XPath cannot find the expected element, look at the structure of the page source (not the structure shown in the browser inspector) and write the XPath against the source.
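If the source has been checked and really lacks tbody, the deletion step can be done mechanically. The helper below is a hypothetical convenience function (strip_tbody is not part of any library) that removes every tbody step from an XPath string copied from the browser. It is only a sketch: it assumes tbody is the sole difference between the two structures, and it should not be applied to pages whose source does contain tbody.

```python
import re


def strip_tbody(xpath: str) -> str:
    """Remove every tbody step (with an optional [n] index) from an XPath string."""
    return re.sub(r"/tbody(?:\[\d+\])?(?=/|$)", "", xpath)


# XPath as copied from the browser inspector in the example above.
browser_xpath = "/html/body/div[5]/div[3]/div[2]/table/tbody/tr[1]/td[2]/a/@href"
source_xpath = strip_tbody(browser_xpath)

print(source_xpath)
# /html/body/div[5]/div[3]/div[2]/table/tr[1]/td[2]/a/@href
```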
Copyright notice
This article was written by [Dried fish_]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230846268070.html