Multithreaded crawling of supplier data from the Marco Polo network (makepolo.com)
2022-04-23 18:00:00 【Round programmer】
This article is for learning and exchange only. Do not use it for any other purpose; otherwise you bear the consequences yourself.
Environment: Linux + PyCharm + Anaconda
import csv
import threading
from queue import Queue

import requests
from lxml import etree

# Local helper module that provides a User-Agent string
from usere_agent import UA

# Request headers sent with every page fetch
HEADER = {
    'User-Agent': UA,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip, deflate',
}
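

def get_request(url):
    """Fetch a page and return its HTML text, or None if the request fails."""
    try:
        response = requests.get(
            url=url,
            headers=HEADER,
            verify=True,
            timeout=50
        )
        return response.text
    except requests.RequestException as e:
        # Report the failure instead of silently swallowing it
        print('request failed:', url, e)
        return None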


class Img(threading.Thread):
    def __init__(self, list_img):
        threading.Thread.__init__(self)
        self.list_img = list_img

    def run(self):
        while True:
            # Take the next (category title, list-page URL) pair from the queue
            ti, key = self.list_img.get()
            self.Get_img(ti, key)
            # Mark the item as processed so that Queue.join() can return
            self.list_img.task_done()

    def Get_img(self, ti, key):
        try:
            n_d = get_request(key)
            n_data = etree.HTML(n_d)
            # Links to the product detail pages on this list page
            good_url = n_data.xpath(
                r'.//div[@class="s_product_item"]//div[@class="s_product_pic_box"]/a[@target="_blank"]/@href')
            if good_url:
                for j in good_url:
                    good_detali = get_request(j)
                    goo_deta_data = etree.HTML(good_detali)
                    title_deta = goo_deta_data.xpath(r'.//div[@class="con_msg f1"]/div[@class="con_title"]/text()')
                    price = goo_deta_data.xpath(
                        r'.//div[@class="con_msg f1"]/div[@class="con_price"]/span[@class="price"]/text()')
                    company_name = goo_deta_data.xpath(
                        r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/text()')
                    company_href = goo_deta_data.xpath(
                        r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/@href')
                    if company_href:
                        # Follow the link to the company page for contact details
                        company_deta = get_request(company_href[0])
                        company_deta_data = etree.HTML(company_deta)
                        contacts = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[1]/text()')
                        phone = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[2]/span[2]/text()')
                        address = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[3]/text()')
                        # Append one row to the CSV file named after the category
                        with open('/media/liu/_dde_data/project/spider/ supplier /mkbl_data/' + ti + '.csv', 'a+',
                                  newline='', encoding='utf-8') as f:
                            f_csv = csv.writer(f)
                            f_csv.writerow([ti, title_deta[0], price[0], company_name[0], company_href[0],
                                            contacts[0], phone[0], address[0]])
                        print(ti, title_deta[0], price[0], company_name[0], company_href[0], contacts[0], phone[0],
                              address[0])
        except Exception as e:
            # A failed request or a missing field skips the rest of this list page
            print('skipped:', key, e)


if __name__ == '__main__':
    list_img = Queue()
    url = 'http://china.makepolo.com/list/d14/'
    d = get_request(url)
    data = etree.HTML(d)
    href = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/@href')
    title = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/text()')
    for ti, h in zip(title, href):
        # Queue pages 1-100 of each category together with its title,
        # so each worker knows which category a URL belongs to
        for i in range(1, 101):
            n_h = h + '{}/'.format(str(i))
            list_img.put((ti, n_h))
    # Start 9 daemon worker threads; they end when the main thread exits
    for item in range(9):
        t = Img(list_img)
        t.daemon = True
        t.start()
    # Block until every queued page has been processed
    list_img.join()
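
The queue-and-worker pattern that the script relies on can be tried offline. The following is a minimal sketch, not the author's code: `build_tasks`, `worker`, `run`, and `fake_handle` are hypothetical names, the `example.invalid` URLs are placeholders, and an in-memory buffer stands in for the CSV file. It pairs each URL with its category title, calls `task_done()` in a `finally` clause, and serializes CSV writes with a lock, which is one way to keep rows from interleaving when several threads write to the same output.

```python
import csv
import io
import threading
from queue import Queue


def build_tasks(categories, pages=100):
    """Pair every page URL with its category title so workers need no globals."""
    tasks = Queue()
    for title, base_url in categories:
        for page in range(1, pages + 1):
            tasks.put((title, '{}{}/'.format(base_url, page)))
    return tasks


def worker(tasks, handle, out, lock):
    """Consume (title, url) tasks forever; daemon threads end with the program."""
    while True:
        title, url = tasks.get()
        try:
            row = handle(title, url)
            with lock:  # one writer at a time keeps CSV rows intact
                csv.writer(out).writerow(row)
        except Exception as e:
            print('failed:', url, e)
        finally:
            tasks.task_done()  # always mark the task done so join() can return


def run(categories, handle, out, pages=100, n_threads=9):
    tasks = build_tasks(categories, pages)
    lock = threading.Lock()
    for _ in range(n_threads):
        threading.Thread(target=worker, args=(tasks, handle, out, lock),
                         daemon=True).start()
    tasks.join()  # blocks until every task has been marked done


# Offline demo: fake_handle is a hypothetical stand-in for fetch-and-parse.
def fake_handle(title, url):
    return [title, url]


buf = io.StringIO()
run([('catA', 'http://example.invalid/a/'), ('catB', 'http://example.invalid/b/')],
    fake_handle, buf, pages=3, n_threads=4)
rows = sorted(buf.getvalue().splitlines())
```

In a real run, `fake_handle` would be replaced by a function that fetches and parses a page, and `out` by an open CSV file.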
Copyright notice
This article was written by [Round programmer]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230545316006.html