Scraping product data from the Xiaomi Youpin app
2022-04-23 05:46:00 【圆滚滚的程序员】
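This article is for learning and discussion only. Do not use it for any other purpose; if you do, you bear the consequences.
Environment: Linux + PyCharm + Anaconda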
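import csv
import requests
from lxml import etree
import re
import random
import json
from usere_agent import UA  # local module (not shown here) that provides a User-Agent string
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# The requests below use verify=False, so silence the resulting TLS warnings.
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)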
# Every request in this script is a POST to the same endpoint.
url = 'https://youpin.mi.com/app/shopv3/pipe'

# Headers for the category list request.
headers1 = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '130',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'youpin.mi.com',
    'Origin': 'https://youpin.mi.com',
    'Referer': 'https://youpin.mi.com/',
    'User-Agent': UA,
}

# Headers for the per-category product request.
headers2 = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '145',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': '<your cookie>',  # replace with your own cookie string
    'Origin': 'https://youpin.mi.com',
    'Referer': 'https://youpin.mi.com/',
    'User-Agent': UA,
}

# Headers for a product detail request (not used by the code below).
# The Referer is a template: call .format(gid) with a real product ID before sending.
headers3 = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '364',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'youpin.mi.com',
    'Origin': 'https://youpin.mi.com',
    'Referer': 'https://youpin.mi.com/detail?gid={}',
    'User-Agent': UA,
}
# Step 1: fetch the category tree.
data1 = {
    'mkbl_data': '{"result": {"model": "Homepage", "action": "GetGroup2ClassInfo", "parameters": {}}}',
}
req = requests.post(url=url, headers=headers1, data=data1, verify=False).json()
groups = req['result']['result']['mkbl_data']['groups']

# Collect the name and ID of every first-level category.
c_name = []
c_id = []
for i in groups:
    for j in i:
        class1_name = j['class']['name']
        ucid1 = j['class']['ucid']
        c_name.append(class1_name)
        c_id.append(ucid1)
        # Second-level categories are read here but not used later.
        for k in j['sub_class']:
            class2_name = k['name']
            ucid2 = k['ucid']
# Step 2: for each first-level category, fetch its products and append them to a CSV file.
for i, j in zip(c_name, c_id):
    s = requests.session()
    s.headers.update(headers2)
    data2 = {
        'mkbl_data': '{"uClassList": {"model": "Homepage", "action": "BuildHome", "parameters": {"id": "'
                     + str(j) + '"}}}'
    }
    respon = s.post(url=url, data=data2, verify=False).json()
    print(respon)
    itemdata = respon['result']['uClassList']['mkbl_data']
    for block in itemdata:
        if 'content' in block:
            content_name = block['content']['name']
            ucid = block['content']['ucid']
            for k in block['mkbl_data']:
                try:
                    gid = k['gid']  # product ID
                    name = k['name']  # product name
                    summary = k['summary']  # product summary
                    pic_url = k['pic_url']  # product image
                    price_min = int(k['price_min']) / 100  # price
                    itemurl = k['url']  # product link
                    print(i, name, summary, pic_url, price_min, itemurl)
                    # One CSV file per category; newline='' is what the csv module expects.
                    with open('/media/liu/_dde_data/project/spider/供应商/xmyp/' + i + '.csv',
                              'a', newline='', encoding='utf-8') as f:
                        f_csv = csv.writer(f)
                        f_csv.writerow((i, name, summary, pic_url, price_min, itemurl))
                except Exception:
                    # Skip any item that fails (for example, one with a missing field).
                    continue
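
The two data-handling steps in the script above, building the `mkbl_data` form field and turning a product record into a CSV row, can be sketched as small helpers that run without any network access. This is only a sketch: the helper names `build_payload` and `item_to_row` are hypothetical, the sample product is made up to match the fields the script reads, and the division by 100 simply mirrors the script's own price conversion.

```python
import csv
import io
import json


def build_payload(key, action, parameters=None):
    """Build the form data for one request (hypothetical helper).

    `key` is the name under which the script later looks up the response,
    e.g. "result" or "uClassList".
    """
    body = {key: {"model": "Homepage", "action": action,
                  "parameters": parameters or {}}}
    # json.dumps handles quoting, so no manual string concatenation is needed.
    return {"mkbl_data": json.dumps(body)}


def item_to_row(category, item):
    """Flatten one product dict into the row layout the script writes."""
    return (
        category,
        item["name"],
        item["summary"],
        item["pic_url"],
        int(item["price_min"]) / 100,  # same conversion as the script
        item["url"],
    )


# Made-up sample record with the same keys the script reads.
sample_item = {
    "gid": 1,
    "name": "Kettle",
    "summary": "1.5 L",
    "pic_url": "https://example.com/k.jpg",
    "price_min": "9900",
    "url": "https://example.com/detail?gid=1",
}

payload = build_payload("uClassList", "BuildHome", {"id": "42"})
row = item_to_row("Kitchen", sample_item)

buffer = io.StringIO()  # stands in for the per-category CSV file
csv.writer(buffer).writerow(row)

print(payload["mkbl_data"])
print(buffer.getvalue().strip())
```

Separating the parsing from the request loop also makes it possible to check the row layout against a saved response before running the full scrape.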
Copyright notice
This article was written by [圆滚滚的程序员]. Please include the original link when reposting. Thank you.
https://blog.csdn.net/qq_39483957/article/details/106319353