Multi-threaded crawling of supplier data from the Marco Polo (Makepolo) website
2022-04-23 18:00:00 【Round programmer】
This article is for learning and exchange only. Do not use it for any other purpose; otherwise you bear the consequences yourself.
Environment: Linux + PyCharm + Anaconda
import csv
import threading
from queue import Queue

import requests
from lxml import etree

# usere_agent is the author's own local module; UA is presumably a User-Agent string
from usere_agent import UA

# Request headers sent with every page download
HEADER = {
    'User-Agent': UA,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip, deflate',
}
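def get_request(url):
    """Download a page and return its HTML text, or None if the request fails."""
    try:
        response = requests.get(
            url=url,
            headers=HEADER,
            verify=True,
            timeout=50
        )
        return response.text
    except requests.RequestException as e:
        # Report the failure instead of silently swallowing it; callers must handle None
        print('request failed:', url, e)
        return None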
# Serializes CSV writes, since several threads may append to the same category file
WRITE_LOCK = threading.Lock()


class Img(threading.Thread):
    """Worker thread that takes (category title, list-page URL) tasks from the queue."""

    def __init__(self, list_img):
        threading.Thread.__init__(self)
        self.list_img = list_img

    def run(self):
        while True:
            ti, key = self.list_img.get()  # block until a task is available
            self.Get_img(ti, key)
            # Mark the task as finished so that list_img.join() in the main thread can return
            self.list_img.task_done()

    def Get_img(self, ti, key):
        try:
            n_d = get_request(key)
            n_data = etree.HTML(n_d)
            # Links to the product detail pages on this list page
            good_url = n_data.xpath(
                r'.//div[@class="s_product_item"]//div[@class="s_product_pic_box"]/a[@target="_blank"]/@href')
            for j in good_url:
                good_detali = get_request(j)
                goo_deta_data = etree.HTML(good_detali)
                title_deta = goo_deta_data.xpath(r'.//div[@class="con_msg f1"]/div[@class="con_title"]/text()')
                price = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]/div[@class="con_price"]/span[@class="price"]/text()')
                company_name = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/text()')
                company_href = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/@href')
                if company_href:
                    # Follow the link to the company page to read the contact details
                    company_deta = get_request(company_href[0])
                    company_deta_data = etree.HTML(company_deta)
                    contacts = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[1]/text()')
                    phone = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[2]/span[2]/text()')
                    address = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[3]/text()')
                    row = [ti, title_deta[0], price[0], company_name[0], company_href[0],
                           contacts[0], phone[0], address[0]]
                    # One CSV file per category, named after the category title
                    with WRITE_LOCK:
                        with open('/media/liu/_dde_data/project/spider/ supplier /mkbl_data/' + ti + '.csv',
                                  'a+', newline='', encoding='utf-8') as f:
                            f_csv = csv.writer(f)
                            f_csv.writerow(row)
                    print(*row)
        except Exception as e:
            # A failed download or a missing field (IndexError) ends up here;
            # the rest of this list page is skipped
            print('skipped:', key, e)


if __name__ == '__main__':
    list_img = Queue()
    url = 'http://china.makepolo.com/list/d14/'
    d = get_request(url)
    data = etree.HTML(d)
    # Category links and their titles from the index page
    href = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/@href')
    title = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/text()')
    for ti, h in zip(title, href):
        for i in range(1, 101):
            n_h = h + '{}/'.format(i)
            # Each task carries its category title so the worker writes to the matching CSV file
            list_img.put((ti, n_h))
    for item in range(9):
        t = Img(list_img)
        t.daemon = True  # daemon threads do not keep the program alive once the main thread ends
        t.start()
    # Wait until every queued page has been processed, then let the program exit
    list_img.join()
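The queue-and-worker pattern used by the script can be tried without touching the network. The following is only a minimal sketch under stated assumptions: `fake_scrape`, `NUM_WORKERS`, the `example.com` URLs and the in-memory buffer are hypothetical stand-ins for the real download, the 9 threads, the Makepolo category links and the CSV files on disk.

```python
import csv
import io
import threading
from queue import Queue

NUM_WORKERS = 4                      # hypothetical value; the script above starts 9 threads
write_lock = threading.Lock()        # only one thread may write a CSV row at a time


def fake_scrape(title, url):
    """Hypothetical stand-in for the page download and XPath parsing (no network access)."""
    return [title, url, "item from " + url]


def worker(tasks, writer):
    while True:
        title, url = tasks.get()     # blocks until a task is available
        try:
            row = fake_scrape(title, url)
            with write_lock:
                writer.writerow(row)
        finally:
            tasks.task_done()        # called exactly once per get(), even on errors


def crawl(categories, pages_per_category):
    tasks = Queue()
    # Each task is a (category title, page URL) pair, so no global state is needed
    for title, base_url in categories:
        for page in range(1, pages_per_category + 1):
            tasks.put((title, "{}{}/".format(base_url, page)))

    buffer = io.StringIO()           # in-memory stand-in for the CSV file
    writer = csv.writer(buffer)
    for _ in range(NUM_WORKERS):
        threading.Thread(target=worker, args=(tasks, writer), daemon=True).start()

    tasks.join()                     # returns once every task has been marked done
    return buffer.getvalue().splitlines()


rows = crawl([("pumps", "http://example.com/pumps/"),
              ("valves", "http://example.com/valves/")], 3)
print(len(rows))
```

Passing the category title inside each task keeps the title and the URL together, whichever thread picks the task up. The order of the rows is not deterministic, because it depends on how the threads are scheduled.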
Copyright notice
This article was written by [Round programmer]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230545316006.html