XPath returns an empty result in a crawler: why the element cannot be found, and how to fix it
2022-04-23 08:48:00 【Dried fish_】
When writing a crawler, XPath is the most commonly used way to parse web pages. Sometimes, though, an XPath expression that looks correct still returns an empty result.
The cause is usually a mismatch that works like an anti-crawling measure on the front end: the page is written with one layer of tags omitted, and the browser automatically adds the omitted tags to restore the correct structure.
When we inspect the page in the browser, the code structure we see has already been modified by the browser, while the crawler receives the original source code.
An XPath expression written against the modified structure therefore finds nothing in the source code, and the return value is empty.
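The mismatch can be reproduced in a few lines. The following is a minimal sketch, not the author's code: the HTML snippet and the `/item/...` links are made up for illustration, and it uses the standard-library `xml.etree.ElementTree` (which supports only a subset of XPath and needs well-formed markup) so that it runs without extra packages. A real crawler would more likely use a full XPath engine such as lxml, but the effect is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source as the crawler receives it: <table> has no <tbody>.
SOURCE = """<html><body><div><table>
<tr><td>1</td><td><a href="/item/1">first</a></td></tr>
<tr><td>2</td><td><a href="/item/2">second</a></td></tr>
</table></div></body></html>"""

root = ET.fromstring(SOURCE)  # root is the <html> element

# Path as copied from the browser's inspector, which shows an inserted <tbody>.
browser_path = "./body/div/table/tbody/tr[1]/td[2]/a"
# Path that matches the raw source, where <tr> is a direct child of <table>.
source_path = "./body/div/table/tr[1]/td[2]/a"

browser_hits = [a.get("href") for a in root.findall(browser_path)]
source_hits = [a.get("href") for a in root.findall(source_path)]

print(browser_hits)  # []
print(source_hits)   # ['/item/1']
```

The only difference between the two paths is the `tbody` step, which exists in the browser's view of the page but not in the source the crawler parses.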
Example
Code as modified by the browser
XPath expression: '/html/body/div[5]/div[3]/div[2]/table/tbody/tr[1]/td[2]/a/@href'
Source code
The source code has no tbody tag, so the XPath expression should be:
'/html/body/div[5]/div[3]/div[2]/table/tr[1]/td[2]/a/@href'
That is, delete tbody from the path.
Summary: when XPath cannot find the corresponding element, look at the structure of the page source and write the expression against the source, not against what the browser displays.
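One way to apply this advice to paths copied from the browser is to retry without the `tbody` step when the first lookup comes back empty. This is a sketch under stated assumptions: `strip_tbody` and `find_with_fallback` are hypothetical helper names, the sample markup is invented, and the standard-library `xml.etree.ElementTree` stands in for whatever parser the crawler actually uses.

```python
import re
import xml.etree.ElementTree as ET


def strip_tbody(path):
    """Remove every 'tbody' step (with an optional [n] index) from a path."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", path)


def find_with_fallback(root, path):
    """Try the path as given; if nothing matches, retry without tbody."""
    hits = root.findall(path)
    if not hits:
        hits = root.findall(strip_tbody(path))
    return hits


# Hypothetical source without <tbody>, queried with a browser-style path.
page = ET.fromstring(
    "<html><body><table><tr><td><a href='/x'>x</a></td></tr></table></body></html>"
)
links = find_with_fallback(page, "./body/table/tbody/tr/td/a")
print([a.get("href") for a in links])  # ['/x']
```

The fallback only hides the mismatch; checking the raw source, as the summary says, is still the reliable way to confirm the real structure.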
Copyright notice
This article was written by [Dried fish_]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230846268070.html