Pyppeteer crawler
2022-04-23 18:00:00 【Round programmer】
import asyncio
from collections import namedtuple

import pyppeteer

# UA is expected to be a User-Agent string; user_agents is presumably the
# author's own local module (not shown here)
from user_agents import UA

Response = namedtuple("rs", "title url html cookies headers history status")


async def get_html(url, timeout=30):
    browser = await pyppeteer.launch(headless=True, args=['--no-sandbox'])
    try:
        page = await browser.newPage()
        await page.setUserAgent(UA)
        res = await page.goto(url, options={'timeout': int(timeout * 1000)})
        # Wait for the target element to appear. waitForSelector raises a
        # timeout error instead of spinning forever like a bare while loop.
        await page.waitForSelector('.share-box', options={'timeout': int(timeout * 1000)})
        # Scroll down by one viewport height
        await page.evaluate('window.scrollBy(0, window.innerHeight)')
        data = await page.content()
        title = await page.title()
        resp_cookies = await page.cookies()
        return Response(
            title=title,
            url=url,
            html=data,
            cookies=resp_cookies,
            headers=res.headers,
            history=None,  # redirect history is not collected
            status=res.status,
        )
    finally:
        # Always close the browser so no Chromium process is left behind
        await browser.close()


async def main(url_list):
    # Fetch all pages concurrently; each call launches its own browser
    return await asyncio.gather(*(get_html(url) for url in url_list))


if __name__ == '__main__':
    url_list = [
        "http://gxt.hunan.gov.cn//gxt/xxgk_71033/czxx/201005/t20100528_2069234.html",
        "http://gxt.hunan.gov.cn//gxt/xxgk_71033/czxx/201005/t20100528_2069221.html",
        "http://gxt.hunan.gov.cn//gxt/xxgk_71033/czxx/200811/t20081111_2069210.html"
    ]
    for res in asyncio.run(main(url_list)):
        print(res.title)
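Waiting for an element such as `.share-box` amounts to polling with a deadline. The following is a minimal sketch of that idea in plain asyncio, independent of pyppeteer. The names `wait_until` and `fake_query` are hypothetical and used only for illustration. `fake_query` stands in for a call like `page.querySelector`.

```python
import asyncio


async def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll an async predicate until it returns a truthy value or time runs out.

    Hypothetical helper for illustration; not part of pyppeteer.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await predicate()
        if result:
            return result
        if loop.time() >= deadline:
            raise asyncio.TimeoutError("condition not met within %.2fs" % timeout)
        # Sleep between attempts so other tasks can run and the CPU is not pinned
        await asyncio.sleep(interval)


async def demo():
    attempts = {"count": 0}

    async def fake_query():
        # Stand-in for an element lookup: succeeds on the third attempt
        attempts["count"] += 1
        return "element" if attempts["count"] >= 3 else None

    found = await wait_until(fake_query, timeout=1.0, interval=0.01)
    print(found, "after", attempts["count"], "attempts")


if __name__ == "__main__":
    asyncio.run(demo())
```

The sleep between attempts and the explicit deadline are the two things a bare `while ...: pass` loop lacks: without them a page that never renders the element keeps the coroutine spinning indefinitely.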
Copyright notice
This article was written by [Round programmer]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230545315893.html