The crawler returns an empty result when parsing with XPath: why the crawler cannot get the corresponding element, and the solution
2022-04-23 08:48:00 【Dried fish_】
When writing a crawler, XPath is the most commonly used way to parse web pages. Sometimes, though, the XPath expression you wrote is correct and the return value is still empty.

The usual cause is on the front end, and it can work as an anti-crawling measure: the page's HTML omits a layer of tags, and the browser automatically adds the omitted tags to produce the correct structure.
When we inspect the page in the browser, the structure we see has already been modified by the browser, whereas the crawler receives the original source code.
So an XPath written against the browser-modified structure will not find the corresponding element in the source code, and the return value is naturally empty.
For example:
Code as modified by the browser
XPath expression: '/html/body/div[5]/div[3]/div[2]/table/tbody/tr[1]/td[2]/a/@href'

Source code
The tbody tag is missing here, so delete tbody from the path:
XPath expression: '/html/body/div[5]/div[3]/div[2]/table/tr[1]/td[2]/a/@href'
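The difference between the two paths can be reproduced in a small sketch. It assumes the crawler parses with lxml (the article does not name a library), and the HTML snippet and its links are made up for illustration; the table deliberately has no tbody, as in the raw source.

```python
from lxml import etree

# Hypothetical page source as the server sends it: the table has no <tbody>.
SOURCE_HTML = """
<html>
  <body>
    <div>
      <table>
        <tr><td>1</td><td><a href="/detail/1">first</a></td></tr>
        <tr><td>2</td><td><a href="/detail/2">second</a></td></tr>
      </table>
    </div>
  </body>
</html>
"""

tree = etree.HTML(SOURCE_HTML)

# Path as copied from the browser's inspector, which shows an inserted <tbody>.
browser_xpath = "/html/body/div/table/tbody/tr[1]/td[2]/a/@href"
# Same path with the tbody step removed, matching the raw source.
source_xpath = "/html/body/div/table/tr[1]/td[2]/a/@href"

browser_result = tree.xpath(browser_xpath)  # no tbody in the source, so nothing matches
source_result = tree.xpath(source_xpath)    # matches the link in row 1, column 2

print(browser_result)
print(source_result)
```

The first query comes back empty because lxml parses the source as received and does not add the tbody element that the browser shows.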

Summary: when XPath cannot get the corresponding element, look at the structure of the page source and write the XPath against the source code.
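If paths are routinely copied from the browser, the tbody step can be stripped before querying. The helper below is a hypothetical convenience, not something from the original article, and it assumes the page source really lacks tbody; check the source first, because a page that does contain tbody would stop matching.

```python
import re


def strip_tbody(xpath: str) -> str:
    """Remove every tbody step (with an optional [n] index) from an XPath string."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# Path copied from the browser, taken from the example above.
copied = "/html/body/div[5]/div[3]/div[2]/table/tbody/tr[1]/td[2]/a/@href"
cleaned = strip_tbody(copied)
print(cleaned)
```

The cleaned string is what gets passed to the parser's XPath call in place of the copied one.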
Copyright notice
This article was written by [Dried fish_]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230846268070.html