Crawler returns an empty result when parsing with XPath: why the element cannot be found, and how to fix it
2022-04-23 08:48:00 【Dried fish_】
When writing crawlers, XPath is the most commonly used way to parse web pages. Sometimes, though, an XPath expression that looks correct still returns an empty result.
The reason is usually what amounts to an anti-crawling measure on the front end: the page's HTML omits a layer of tags, and the browser automatically adds the omitted tags to produce the correct structure.
When we inspect the page in the browser, the code structure we see has already been modified by the browser. What the crawler receives is the original source code.
So an XPath written against the browser-modified structure will not find the corresponding element in the source code, and the return value is naturally empty.
For example:
Code as modified by the browser
XPath expression: '/html/body/div[5]/div[3]/div[2]/table/tbody/tr[1]/td[2]/a/@href'
Source code
The source is missing the tbody tag, so the XPath should be:
'/html/body/div[5]/div[3]/div[2]/table/tr[1]/td[2]/a/@href'
That is, delete tbody from the path.
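The mismatch can be reproduced in a minimal sketch. The page below is a made-up, well-formed snippet (not the page from this article), and the standard library's xml.etree.ElementTree with its limited XPath subset stands in for whatever parser the crawler actually uses. Paths are written relative to the root element, and the href attribute is read with .get() because ElementTree does not support a trailing /@href step.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source as the crawler receives it: the table has no <tbody>.
SOURCE = """
<html><body><div><table>
<tr><td>1</td><td><a href="/item/1">first</a></td></tr>
<tr><td>2</td><td><a href="/item/2">second</a></td></tr>
</table></div></body></html>
"""

root = ET.fromstring(SOURCE)  # root is the <html> element

# Path copied from the browser's inspector, which shows an inserted <tbody>.
browser_path = "./body/div/table/tbody/tr[1]/td[2]/a"
# The same path written against the raw source, without the <tbody> step.
source_path = "./body/div/table/tr[1]/td[2]/a"

browser_hrefs = [a.get("href") for a in root.findall(browser_path)]
source_hrefs = [a.get("href") for a in root.findall(source_path)]

print(browser_hrefs)  # []
print(source_hrefs)   # ['/item/1']
```

The first query is empty only because the tbody step matches nothing in the source; the rest of the path is identical.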
Summary: when XPath cannot find the corresponding element, look at the structure of the page's source code and write the XPath against that, not against what the browser's inspector shows.
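One way to apply this when an XPath has been copied from the browser is to strip the tbody steps automatically. The helper names drop_tbody and xpath_for_source below are hypothetical, and the check for "<tbody" in the raw HTML is a simplifying assumption: some pages really do contain tbody in their source, in which case the step should stay.

```python
import re


def drop_tbody(xpath: str) -> str:
    """Remove every tbody step (with an optional [n] index) from an XPath string."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


def xpath_for_source(xpath: str, raw_html: str) -> str:
    """Keep the XPath as-is if the raw source has <tbody>; otherwise drop tbody steps."""
    if "<tbody" in raw_html.lower():
        return xpath
    return drop_tbody(xpath)


browser_xpath = "/html/body/div[5]/div[3]/div[2]/table/tbody/tr[1]/td[2]/a/@href"

# Raw source without <tbody>: the step is removed.
fixed = xpath_for_source(browser_xpath, "<table><tr><td></td></tr></table>")
print(fixed)
# /html/body/div[5]/div[3]/div[2]/table/tr[1]/td[2]/a/@href
```

This only handles the tbody case described above; other differences between the inspector view and the source still need to be checked by reading the source itself.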
Copyright notice
This article was written by [Dried fish_]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204230846268070.html