Multi-threaded crawling of supplier data from the Marco Polo network (makepolo.com)
2022-04-23 18:00:00 【Round programmer】
This article is intended for learning and exchange only. Do not use it for any other purpose; otherwise you bear the consequences yourself.
Environment: Linux + PyCharm + Anaconda
import csv
import threading
from queue import Queue

import requests
from lxml import etree

# usere_agent is the author's local module; it is expected to provide a User-Agent value named UA.
from usere_agent import UA

# Request headers sent with every page fetch.
HEADER = {
    'User-Agent': UA,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip, deflate',
}
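def get_request(url):
    """Fetch a page and return its text, or None if the request fails."""
    try:
        response = requests.get(
            url=url,
            headers=HEADER,
            verify=True,
            timeout=50
        )
        return response.text
    except requests.RequestException as e:
        # Report the failure instead of silently discarding it; callers must handle None.
        print('request failed:', url, e)
        return None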
# Output directory from the original post; change it to a directory that exists on your machine.
OUTPUT_DIR = '/media/liu/_dde_data/project/spider/ supplier /mkbl_data/'
# Several threads may append to the same CSV file, so writes are serialized with a lock.
WRITE_LOCK = threading.Lock()


class Img(threading.Thread):
    def __init__(self, list_img):
        threading.Thread.__init__(self)
        self.list_img = list_img

    def run(self):
        while True:
            # Take the next (category title, list page URL) pair from the queue.
            ti, key = self.list_img.get()
            try:
                self.Get_img(ti, key)
            except Exception as e:
                print('page failed:', key, e)
            finally:
                # Mark the item as processed so that list_img.join() can return.
                self.list_img.task_done()

    def Get_img(self, ti, key):
        n_d = get_request(key)
        if not n_d:
            return
        n_data = etree.HTML(n_d)
        # Links to the product detail pages on this list page.
        good_url = n_data.xpath(
            r'.//div[@class="s_product_item"]//div[@class="s_product_pic_box"]/a[@target="_blank"]/@href')
        for j in good_url:
            try:
                good_detali = get_request(j)
                goo_deta_data = etree.HTML(good_detali)
                title_deta = goo_deta_data.xpath(r'.//div[@class="con_msg f1"]/div[@class="con_title"]/text()')
                price = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]/div[@class="con_price"]/span[@class="price"]/text()')
                company_name = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/text()')
                company_href = goo_deta_data.xpath(
                    r'.//div[@class="con_msg f1"]//div[@class="con_item"]/ul/li[3]/a[@target="_blank"]/@href')
                if company_href:
                    # The company page holds the contact person, phone number and address.
                    company_deta = get_request(company_href[0])
                    company_deta_data = etree.HTML(company_deta)
                    contacts = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[1]/text()')
                    phone = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[2]/span[2]/text()')
                    address = company_deta_data.xpath(r'.//div[@class="item_info"]/ul/li[3]/text()')
                    row = [ti, title_deta[0], price[0], company_name[0], company_href[0],
                           contacts[0], phone[0], address[0]]
                    # One CSV file per category title.
                    with WRITE_LOCK, open(OUTPUT_DIR + ti + '.csv', 'a+', newline='', encoding='utf-8') as f:
                        f_csv = csv.writer(f)
                        f_csv.writerow(row)
                    print(*row)
            except Exception as e:
                # A failed request or a missing field skips this product only, not the whole page.
                print('product skipped:', j, e)


if __name__ == '__main__':
    list_img = Queue()
    url = 'http://china.makepolo.com/list/d14/'
    d = get_request(url)
    if d is None:
        raise SystemExit('could not fetch the category page')
    data = etree.HTML(d)
    href = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/@href')
    title = data.xpath(r'.//div[@class="category clearfix"]//dl//dd//a/text()')
    for ti, h in zip(title, href):
        for i in range(1, 101):
            n_h = h + '{}/'.format(i)
            # Queue the title together with the URL so each worker writes to the right file.
            list_img.put((ti, n_h))
    for item in range(9):
        t = Img(list_img)
        t.daemon = True  # daemon threads end when the main thread exits after join()
        t.start()
    # Block until every queued page has been processed.
    list_img.join()
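The script above relies on a producer/consumer pattern: the main thread fills a Queue with list page URLs, and a fixed number of worker threads drain it. The following is a minimal offline sketch of that pattern, not the code from this post. It assumes a hypothetical `fake_fetch` in place of `get_request`, uses placeholder `example.invalid` URLs, and swaps the daemon threads for `None` sentinels so that the workers exit cleanly. The names `worker` and `crawl` are also made up for illustration.

```python
import threading
from queue import Queue


def fake_fetch(url):
    """Hypothetical stand-in for get_request(); no network access is made."""
    return 'page for ' + url


def worker(tasks, results, lock):
    while True:
        item = tasks.get()
        try:
            if item is None:
                # Sentinel: there is no more work for this thread.
                return
            title, url = item
            body = fake_fetch(url)
            # The lock guards the shared list, as it would guard a shared CSV file.
            with lock:
                results.append((title, url, body))
        finally:
            # Runs for real items and sentinels alike, so tasks.join() can return.
            tasks.task_done()


def crawl(categories, pages=3, n_threads=4):
    tasks = Queue()
    results = []
    lock = threading.Lock()
    # Producer: one task per (category, page number), carrying the title along.
    for title, base in categories:
        for i in range(1, pages + 1):
            tasks.put((title, '{}{}/'.format(base, i)))
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    # One sentinel per worker, queued after all real tasks.
    for _ in threads:
        tasks.put(None)
    tasks.join()
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    rows = crawl([('pumps', 'http://example.invalid/pumps/'),
                  ('valves', 'http://example.invalid/valves/')])
    print(len(rows))
```

Carrying the category title inside each queue item is what keeps a worker from depending on a loop variable in the main thread, which by the time the workers run would only hold the last category.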
Copyright notice
This article was written by [Round programmer]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230545316006.html