Scraping product data from the Xiaomi Youpin app
2022-04-23 05:46:00 【圆滚滚的程序员】
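This article is for learning and discussion only. Do not use it for any other purpose; otherwise you bear the consequences yourself.
Environment: Linux + PyCharm + Anaconda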
import csv
import requests
from lxml import etree
import re
import random
import json
from usere_agent import UA  # local module (not shown), presumably supplying the User-Agent string
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
url = 'https://youpin.mi.com/app/shopv3/pipe'
headers1 = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '130',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'youpin.mi.com',
    'Origin': 'https://youpin.mi.com',
    'Referer': 'https://youpin.mi.com/',
    'User-Agent': UA,
}
headers2 = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '145',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': '<your cookie>',  # paste your own cookie string here
    'Origin': 'https://youpin.mi.com',
    'Referer': 'https://youpin.mi.com/',
    'User-Agent': UA,
}
def detail_headers(gid):
    # Headers for a product detail request (not used below); gid is the product ID.
    return {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Content-Length': '364',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Host': 'youpin.mi.com',
        'Origin': 'https://youpin.mi.com',
        'Referer': 'https://youpin.mi.com/detail?gid={}'.format(gid),
        'User-Agent': UA,
    }
data1 = {
    'mkbl_data': '{"result": {"model": "Homepage", "action": "GetGroup2ClassInfo", "parameters": {}}}',
}
# Fetch the category tree and collect the top-level category names and IDs.
req = requests.post(url=url, headers=headers1, data=data1, verify=False).json()
groups = req['result']['result']['mkbl_data']['groups']
c_name = []
c_id = []
for i in groups:
    for j in i:
        class1_name = j['class']['name']
        ucid1 = j['class']['ucid']
        c_name.append(class1_name)
        c_id.append(ucid1)
        # Second-level categories are read here but not used further.
        for k in j['sub_class']:
            class2_name = k['name']
            ucid2 = k['ucid']
# Request the product list of each top-level category and append it to a CSV file.
for class_name, class_id in zip(c_name, c_id):
    s = requests.session()
    s.headers.update(headers2)
    data2 = {
        'mkbl_data': '{"uClassList": {"model": "Homepage", "action": "BuildHome", "parameters": {"id": "'
                     + str(class_id) + '"}}}'
    }
    respon = s.post(url=url, data=data2, verify=False).json()
    print(respon)
    itemdata = respon['result']['uClassList']['mkbl_data']
    for block in itemdata:
        if 'content' in block:
            content_name = block['content']['name']
            ucid = block['content']['ucid']
            for item in block['mkbl_data']:
                try:
                    gid = item['gid']  # product ID
                    name = item['name']  # product name
                    summary = item['summary']  # product summary
                    pic_url = item['pic_url']  # product image
                    price_min = int(item['price_min']) / 100  # price
                    itemurl = item['url']  # product link
                    print(class_name, name, summary, pic_url, price_min, itemurl)
                    # One CSV file per category; rows are appended on each run.
                    with open('/media/liu/_dde_data/project/spider/供应商/xmyp/' + class_name + '.csv',
                              'a+', newline='', encoding='utf-8') as f:
                        f_csv = csv.writer(f)
                        f_csv.writerow((class_name, name, summary, pic_url, price_min, itemurl))
                except Exception:
                    # Skip entries that are missing fields or cannot be written.
                    continue
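The script builds its `mkbl_data` payloads by string concatenation and mixes parsing with the network calls. As a minimal sketch (not the author's code), the same two steps can be pulled into small helpers that can be checked offline. The names `build_payload` and `parse_goods` are hypothetical, and the sample dictionary only mirrors the field names used in the script; it is not real API output. The sketch also assumes that the top-level key of the payload is the key under which the reply appears in `response['result']`, which is what the two requests in the script suggest.

```python
import json


def build_payload(key, action, parameters=None):
    """Return the form data for one POST to the pipe endpoint.

    json.dumps handles the quoting, so IDs need no manual concatenation.
    `key` is assumed to be the name the reply is filed under in
    response['result'] (an inference from the script, not documented).
    """
    request = {key: {"model": "Homepage", "action": action,
                     "parameters": parameters or {}}}
    return {"mkbl_data": json.dumps(request)}


def parse_goods(block):
    """Yield one tuple per product found in a block of a BuildHome reply.

    Field names follow the script; price_min is divided by 100 exactly
    as the script does.
    """
    for item in block.get("mkbl_data", []):
        try:
            yield (item["gid"], item["name"], item["summary"],
                   item["pic_url"], int(item["price_min"]) / 100,
                   item["url"])
        except (KeyError, TypeError, ValueError):
            continue  # skip entries that are not complete product records


if __name__ == "__main__":
    # Made-up sample data, shaped like the fields the script reads.
    sample = {"mkbl_data": [
        {"gid": 1, "name": "demo", "summary": "made-up item",
         "pic_url": "https://example.com/a.png", "price_min": "1999",
         "url": "https://example.com/item/1"},
        {"name": "incomplete entry"},
    ]}
    print(build_payload("uClassList", "BuildHome", {"id": "123"}))
    print(list(parse_goods(sample)))
```

With helpers like these, the request in the loop could become `s.post(url=url, data=build_payload("uClassList", "BuildHome", {"id": str(class_id)}), verify=False)`, and the rows from `parse_goods` could be written with a single `csv.writer` per category instead of reopening the file for every product.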
Copyright notice
This article was written by [圆滚滚的程序员]. Please include the original link when reposting. Thank you.
https://blog.csdn.net/qq_39483957/article/details/106319353