Multi-threaded crawling of supplier data from the Marco Polo network (makepolo.com)
2022-04-23 18:00:00 【Round programmer】
This article is for learning and discussion only. Do not use it for any other purpose; if you do, you are responsible for the consequences.
Environment: Linux + PyCharm + Anaconda
import csv
import threading
from queue import Queue

import requests
from lxml import etree

# The author's local module (not shown here); UA is assumed to be a User-Agent string
from usere_agent import UA

HEADER = {
    'User-Agent': UA,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip, deflate',
}


def get_request(url):
    """Fetch url and return the response body as text, or None if the request fails."""
    try:
        response = requests.get(
            url=url,
            headers=HEADER,
            verify=True,
            timeout=50
        )
        return response.text
    except Exception as e:
        # Report the failure instead of silently swallowing it
        print('request failed:', url, e)
        return None


class Img(threading.Thread):
    # Shared by all workers so that rows appended to the same CSV do not interleave
    write_lock = threading.Lock()

    def __init__(self, list_img):
        threading.Thread.__init__(self)
        self.list_img = list_img

    def run(self):
        while True:
            # Each queue item is a (category title, list page URL) pair
            ti, key = self.list_img.get()
            try:
                self.Get_img(ti, key)
            finally:
                # Mark the item as processed so that list_img.join() can return
                self.list_img.task_done()

    def Get_img(self, ti, key):
        try:
            n_d = get_request(key)
            n_data = etree.HTML(n_d)
            # Links to the product detail pages on this list page
            good_url = n_data.xpath(
                r'.//div[@class="s_product_item"]//div[@class="s_product_pic_box"]/a[@target="_blank"]/@href')
            for j in good_url:
                good_detali = get_request(j)
                goo_deta_data = etree.HTML(good_detali)
                title_deta = goo_deta_data.xpath(r'.//div[@class="con_msg f1"]/div[@class="con_title"]/text()')
                price = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]/div[@class="con_price"]/span[@class="price"]/text()')
                company_name = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/text()')
                company_href = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/@href')
                if company_href:
                    # Follow the company link to get the contact details
                    company_deta = get_request(company_href[0])
                    company_deta_data = etree.HTML(company_deta)
                    contacts = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[1]/text()')
                    phone = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[2]/span[2]/text()')
                    address = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[3]/text()')
                    row = [ti, title_deta[0], price[0], company_name[0], company_href[0],
                           contacts[0], phone[0], address[0]]
                    # One CSV file per category, named after the category title
                    with Img.write_lock:
                        with open('/media/liu/_dde_data/project/spider/ supplier /mkbl_data/' + ti + '.csv',
                                  'a+', newline='', encoding='utf-8') as f:
                            f_csv = csv.writer(f)
                            f_csv.writerow(row)
                    print(*row)
        except Exception as e:
            # A failed request or a missing field skips the rest of this list page
            print('skipped:', key, e)


if __name__ == '__main__':
    list_img = Queue()
    url = 'http://china.makepolo.com/list/d14/'
    d = get_request(url)
    data = etree.HTML(d)
    href = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/@href')
    title = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/text()')
    for ti, h in zip(title, href):
        for i in range(1, 101):
            n_h = h + '{}/'.format(i)
            # Queue the title together with the URL so each worker writes to the right CSV
            list_img.put((ti, n_h))
    for item in range(9):
        t = Img(list_img)
        t.daemon = True  # daemon threads exit when the main thread finishes
        t.start()
    list_img.join()  # block until every queued page has been processed
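
The script above relies on a common pattern: fill a Queue with work items, start a fixed number of daemon threads that pull from it, and call join() to wait until every item has been marked done. The following is a minimal sketch of that pattern on its own, with no network access, so it can be run and tested in isolation. The names run_workers and build_page_urls and the example.com URL are hypothetical and are not part of the original script; the handler stands in for whatever a real worker would do with each page.

```python
import threading
from queue import Queue


def build_page_urls(title, base_url, pages=100):
    """Return (title, page URL) pairs for pages 1..pages of one category."""
    return [(title, '{}{}/'.format(base_url, i)) for i in range(1, pages + 1)]


def run_workers(items, handler, num_workers=9):
    """Run handler(item) for every item on num_workers daemon threads.

    Returns the handler results in completion order. Items whose handler
    raises are reported and skipped.
    """
    tasks = Queue()
    results = []
    lock = threading.Lock()  # protects the shared results list

    def worker():
        while True:
            item = tasks.get()
            try:
                value = handler(item)
                with lock:
                    results.append(value)
            except Exception as exc:
                print('failed:', item, exc)
            finally:
                # Always mark the item done, otherwise tasks.join() never returns
                tasks.task_done()

    for item in items:
        tasks.put(item)
    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    tasks.join()  # wait until task_done() has been called for every item
    return results


if __name__ == '__main__':
    # Hypothetical category and URL, used only to exercise the pattern
    pages = build_page_urls('demo', 'http://example.com/list/', pages=3)
    print(sorted(run_workers(pages, lambda pair: pair[1], num_workers=2)))
```

Putting the category title into the queue item, rather than reading it from a global variable, means each worker always knows which category a page belongs to, no matter when the thread gets to it.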
Copyright notice
This article was written by [Round programmer]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230545316006.html