Rookie's Notes on Concurrent Programming (IX): Speeding Up a Concurrent Crawler with Asynchronous IO
2022-04-23 10:34:00 [Ape knowledge]
Series index: Rookie's Notes on Concurrent Programming | Python concurrent programming explained (continuously updated)
Contents
- 1. Mind map
- 2. What is a coroutine?
- 3. Introduction to Python's asynchronous IO library: asyncio
- 4. The power of asynchronous programming
- 5. Core principles of asynchronous programming
- 6. Asynchronous programming code example
- 7. Applying the semaphore mechanism
1. Mind map
2. What is a coroutine?
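Coroutines: my personal understanding is this. When a program reaches an IO operation, the thread does not sit idle waiting. It hands control to other coroutines and resumes the original one after the IO completes. Coroutines are driven by an event loop, the "super loop" described in section 5. In the classic style, you obtain that loop with loop = asyncio.get_event_loop(). Since Python 3.7, asyncio.run() creates the loop, runs it, and closes it for you, and it is the recommended entry point.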
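Here is a minimal sketch of that idea. It assumes asyncio.sleep stands in for a real IO wait, and the names fake_io and main are hypothetical. While one coroutine is suspended in its wait, the event loop runs the other one. As a result, two 0.5-second waits finish in about 0.5 seconds instead of 1 second.

```python
import asyncio
import time


async def fake_io(name, seconds):
    # asyncio.sleep stands in for a network or disk wait; while this coroutine
    # is suspended here, the event loop is free to run other coroutines
    await asyncio.sleep(seconds)
    return name


async def main():
    # both coroutines start waiting at (almost) the same moment
    return await asyncio.gather(fake_io("a", 0.5), fake_io("b", 0.5))


start = time.perf_counter()
results = asyncio.run(main())  # creates the event loop, runs main(), closes the loop
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.5s rather than 1s
```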
3. Introduction to Python's asynchronous IO library: asyncio
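There are a few important keywords:
- async declares a function as an asynchronous function (a coroutine function).
- await marks the point where the coroutine meets blocking IO. The coroutine suspends there, the event loop runs other coroutines, and the coroutine resumes with the result once the awaited operation finishes.

The list of tasks can be built with loop.create_task() or with asyncio.ensure_future(). Finally, loop.run_until_complete() keeps the event loop running until every task in the list has finished.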
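The following sketch shows the classic explicit-loop style described above: tasks are created with asyncio.ensure_future, and the loop is driven with run_until_complete. The coroutine fetch and its asyncio.sleep-based "IO" are hypothetical stand-ins for real crawling. The loop comes from asyncio.new_event_loop() so the example avoids the deprecation warning that recent Python versions raise for get_event_loop() when no loop is running.

```python
import asyncio


async def fetch(n):
    # pretend to wait on IO; the loop switches to other tasks meanwhile
    await asyncio.sleep(0.1)
    return n * 2


loop = asyncio.new_event_loop()
try:
    # ensure_future wraps each coroutine in a Task scheduled on `loop`
    tasks = [asyncio.ensure_future(fetch(n), loop=loop) for n in range(5)]
    # keep the loop running until every task has finished
    loop.run_until_complete(asyncio.wait(tasks))
    results = [t.result() for t in tasks]
finally:
    loop.close()

print(results)  # [0, 2, 4, 6, 8]
```

In new code, asyncio.run(main()) with asyncio.gather inside main() does the same job with less bookkeeping.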
4. The power of asynchronous programming
- Nginx as a web server: it beats the synchronous, blocking Apache server. It supports more concurrent connections with fewer resources and higher efficiency, and can handle up to 50,000 concurrent connections. It uses epoll and kqueue as its event model.
- Why Redis is so fast: it handles network requests on a single thread combined with an I/O multiplexing model and non-blocking IO. This design has several benefits:
  - it avoids unnecessary context switches and race conditions;
  - it avoids the CPU cost of switching between processes or threads;
  - it needs no locks, so there is no lock acquisition or release and no performance loss from possible deadlocks.
- Advantages of Node.js: it is event-driven and asynchronous, and it was designed for web services. JavaScript's anonymous functions and closures suit event-driven, asynchronous programming well. Node.js's non-blocking IO delivers high performance and strong load capacity with relatively low resource usage. This makes it a good fit for middle-tier services that depend on other IO resources.
- An advantage of Go: goroutines and channels give Go a lightweight syntax for spawning coroutines and communicating between them. This makes highly concurrent server software easy to write, often without having to think about locks and the problems they bring. In CPython, the GIL keeps a single process from running Python code on several cores at once. A single Go program, by contrast, can make effective use of multiple CPU cores and performs well in parallel.
5. Core principles of asynchronous programming
Core principle 1: the super loop
Concurrency is achieved inside a single thread with a super loop (essentially a while True loop). Each pass of the loop polls and processes all pending tasks.
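The following toy sketch illustrates the super loop. It is not how asyncio is implemented internally. Plain generators stand in for coroutines, and the names worker and super_loop are hypothetical. Each yield hands control back to the loop, and the loop keeps cycling through the tasks until none are left.

```python
from collections import deque


def worker(name, steps, log):
    # a generator-based "task": each yield hands control back to the loop,
    # the way a coroutine gives up the CPU while it waits for IO
    for i in range(steps):
        log.append(f"{name}{i}")
        yield


def super_loop(tasks):
    ready = deque(tasks)
    while ready:  # the "while True" super loop, stopping once no task is left
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # not finished: queue it for the next round
        except StopIteration:
            pass                # finished: drop it


log = []
super_loop([worker("A", 3, log), worker("B", 2, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```

The two tasks interleave on a single thread, which is the essence of concurrency without threads.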
Core principle 2: IO multiplexing
- It is a synchronous IO model in which one thread can monitor multiple file handles (file descriptors).
- As soon as a file handle becomes ready, the application is notified so it can perform the corresponding read or write.
- While no file handle is ready, the application blocks and gives up the CPU.

"Multiplexing" means that many network connections (the multiple channels) are served by the same thread.
There are three implementations: select, poll, and epoll.
| | select | poll | epoll |
|---|---|---|---|
| Data structure | bitmap | array | red-black tree |
| Maximum connections | 1024 (default FD_SETSIZE) | no upper limit | no upper limit |
| fd copying (user space to kernel) | copied on every select call | copied on every poll call | copied once, when registered via epoll_ctl; not copied on each epoll_wait call |
| Efficiency | polling, O(N) | polling, O(N) | callback, O(1) |
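Python exposes IO multiplexing through the standard selectors module. This sketch uses selectors.DefaultSelector, which picks the most efficient mechanism the platform offers, typically epoll on Linux and kqueue on macOS/BSD. Two socket.socketpair() pairs are an assumption standing in for two client connections. One thread watches both sockets, and only the one that actually has data is reported as ready.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # typically epoll on Linux, kqueue on macOS/BSD

# two connected socket pairs stand in for two client connections
a_local, a_remote = socket.socketpair()
b_local, b_remote = socket.socketpair()
for sock, name in ((a_local, "conn-a"), (b_local, "conn-b")):
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ, data=name)

# only connection b receives data; a single thread watches both sockets
b_remote.sendall(b"hello")

received = {}
for key, mask in sel.select(timeout=1):  # blocks until at least one fd is ready
    received[key.data] = key.fileobj.recv(1024)

print(received)  # {'conn-b': b'hello'}

sel.close()
for s in (a_local, a_remote, b_local, b_remote):
    s.close()
```

The asyncio event loop is built on this same selectors machinery on Unix-like systems.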
6. Asynchronous programming code example
The requests library is blocking and does not work with coroutines, so we use aiohttp instead.
Inside a coroutine function, await suspends the current coroutine. The coroutine then waits until another coroutine (or other awaitable) finishes and returns its result. Note that await may only appear inside a function defined with async def. Anywhere else, it raises a SyntaxError.
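import asyncio
import time
import aiohttp
import blog_spider  # the author's module from earlier parts of this series; provides the urls list
async def async_craw(url):
    print("craw url: ", url)
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            result = await resp.text()  # suspend here until the response body arrives
            print(f"craw url: {url}, {len(result)}")
# asyncio.get_event_loop() with no running loop is deprecated in recent Python versions,
# so create the loop explicitly
loop = asyncio.new_event_loop()
tasks = [
    loop.create_task(async_craw(url))
    for url in blog_spider.urls]
start = time.time()
loop.run_until_complete(asyncio.wait(tasks))  # run until every task has finished
end = time.time()
loop.close()
print("use time seconds: ", end - start)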
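The crawler above needs network access and the author's blog_spider module. The following offline sketch is a stand-in that shows where the speed-up comes from. The URLs are hypothetical, and fake_craw replaces the real download with asyncio.sleep(0.1). Awaiting the downloads one by one takes about 10 × 0.1 s, while gathering them lets all the waits overlap.

```python
import asyncio
import time

# hypothetical URLs; no network access happens, asyncio.sleep fakes the download
URLS = [f"https://example.com/page/{i}" for i in range(10)]


async def fake_craw(url):
    await asyncio.sleep(0.1)  # stands in for `await resp.text()`
    return len(url)


async def sequential():
    # await one download at a time: about 10 * 0.1s in total
    return [await fake_craw(url) for url in URLS]


async def concurrent():
    # start every download at once: about 0.1s in total
    return await asyncio.gather(*(fake_craw(url) for url in URLS))


timings = {}
for runner in (sequential, concurrent):
    start = time.perf_counter()
    sizes = asyncio.run(runner())
    timings[runner.__name__] = time.perf_counter() - start
    print(runner.__name__, f"{timings[runner.__name__]:.2f}s")
```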
7. Applying the semaphore mechanism
Here we use a semaphore to limit how many coroutines crawl at the same time. The three lines marked with # are what differs from the previous code, so compare the two versions to see how it works.
import asyncio
import time
import aiohttp
import blog_spider
semaphore = asyncio.Semaphore(10)  # allow at most 10 crawls at once (lazy loop binding needs Python 3.10+)
async def async_craw(url):
    async with semaphore:  # wait here while 10 crawls are already running
        print("craw url: ", url)
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                result = await resp.text()
                await asyncio.sleep(5)  # simulate slow work so the limit is visible
                print(f"craw url: {url}, {len(result)}")
loop = asyncio.new_event_loop()
tasks = [
    loop.create_task(async_craw(url))
    for url in blog_spider.urls]
start = time.time()
loop.run_until_complete(asyncio.wait(tasks))
end = time.time()
loop.close()
print("use time seconds: ", end - start)
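To check that the semaphore really caps concurrency, this offline sketch counts how many jobs are inside the `async with semaphore` block at the same time. LIMIT = 3 is a hypothetical value (the crawler above uses 10), and asyncio.sleep stands in for the download. The semaphore is created inside main(), so it belongs to the loop that asyncio.run starts.

```python
import asyncio

LIMIT = 3  # hypothetical limit; the crawler above uses 10


async def job(semaphore, stats):
    async with semaphore:  # waits here while LIMIT jobs are already inside
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)  # pretend to download a page
        stats["active"] -= 1


async def main():
    semaphore = asyncio.Semaphore(LIMIT)  # created inside the running loop
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(job(semaphore, stats) for _ in range(10)))
    return stats


stats = asyncio.run(main())
print(stats)  # {'active': 0, 'peak': 3}
```

Even with 10 jobs scheduled, the peak number of simultaneous jobs never exceeds the semaphore's initial value.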
In the next article, we will analyze the differences between Gevent and asyncio, and explain how to use Gevent to convert a server into an asynchronous one.
Previous: Rookie's Notes on Concurrent Programming (VIII): Project development with multiple processes using multiprocessing
Next: Rookie's Notes on Concurrent Programming (X): Comparing the asynchronous programming libraries asyncio and Gevent, and converting an asynchronous server with Gevent
Copyright notice
This article was written by [Ape knowledge]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230619310018.html