Pyppeteer crawler
2022-04-23 18:00:00 【Round programmer】
```python
import asyncio
from collections import namedtuple

import pyppeteer

try:
    # The author's own helper module that exports a User-Agent string (assumed)
    from user_agents import UA
except ImportError:
    # Fallback so the script still runs without that module (example UA, an assumption)
    UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36")

# Everything collected from one page
Response = namedtuple("Response", "title url html cookies headers history status")


async def get_html(url, timeout=30):
    browser = await pyppeteer.launch(headless=True, args=['--no-sandbox'])
    try:
        page = await browser.newPage()
        await page.setUserAgent(UA)
        res = await page.goto(url, options={'timeout': int(timeout * 1000)})
        # Wait for the '.share-box' element to appear. The original busy
        # `while not await page.querySelector(...)` loop had no timeout and
        # could spin forever; waitForSelector raises a TimeoutError instead.
        await page.waitForSelector('.share-box', options={'timeout': int(timeout * 1000)})
        # Scroll to the bottom of the page
        await page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        data = await page.content()
        title = await page.title()
        resp_cookies = await page.cookies()
        return Response(
            title=title,
            url=url,
            html=data,
            cookies=resp_cookies,
            headers=res.headers,
            history=None,
            status=res.status,
        )
    finally:
        # Always shut Chromium down, even if navigation or waiting fails
        await browser.close()


async def main(urls):
    # One get_html() coroutine per URL, all run concurrently
    tasks = (get_html(url) for url in urls)
    return await asyncio.gather(*tasks)


if __name__ == '__main__':
    url_list = [
        "http://gxt.hunan.gov.cn//gxt/xxgk_71033/czxx/201005/t20100528_2069234.html",
        "http://gxt.hunan.gov.cn//gxt/xxgk_71033/czxx/201005/t20100528_2069221.html",
        "http://gxt.hunan.gov.cn//gxt/xxgk_71033/czxx/200811/t20081111_2069210.html"
    ]
    results = asyncio.run(main(url_list))
    for res in results:
        print(res.title)
```
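The script above launches one Chromium instance per URL and starts all of them at once through `asyncio.gather`, which gets heavy for a long URL list. One common way to cap that is an `asyncio.Semaphore`; the sketch below shows the idea. `gather_limited`, `fake_get_html` and `stats` are hypothetical names, and `fake_get_html` only stands in for `get_html` so the example runs without a browser or network access.

```python
import asyncio


async def gather_limited(coro_fn, items, limit=2):
    """Await coro_fn(item) for every item, with at most `limit` running at once.

    Results are returned in the same order as `items`, like asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        # At most `limit` coroutines can hold the semaphore simultaneously;
        # the rest wait here until a slot frees up.
        async with semaphore:
            return await coro_fn(item)

    return await asyncio.gather(*(run_one(item) for item in items))


# Hypothetical stand-in for get_html() so the sketch runs without Chromium.
# It records how many calls are in flight at the same time.
stats = {"current": 0, "peak": 0}


async def fake_get_html(url):
    stats["current"] += 1
    stats["peak"] = max(stats["peak"], stats["current"])
    await asyncio.sleep(0.01)  # simulate page load time
    stats["current"] -= 1
    return url.upper()


if __name__ == "__main__":
    results = asyncio.run(gather_limited(fake_get_html, ["a", "b", "c", "d", "e"], limit=2))
    print(results)        # ['A', 'B', 'C', 'D', 'E']
    print(stats["peak"])  # 2
```

With the real crawler the call would presumably be `asyncio.run(gather_limited(get_html, url_list, limit=2))`; the limit of 2 is an arbitrary example value, to be tuned to the machine's memory and the target site's tolerance.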
Copyright notice
This article was written by [Round programmer]. When reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/04/202204230545315893.html