2021-09-03 crawler template (only static pages are supported)
2022-04-23 03:40:00 【Cooking code King】
# -*- coding: utf-8 -*-
# @Time : 2021/9/3 21:32
# @Author : Yj Xue
# @FileName: entity_car.py
# @Software: PyCharm 2020.2.2 x64
# @Blog :https://blog.csdn.net/qq_37150711/category_9396602.html
from requests_html import HTMLSession
import random
import os
import csv
url = 'https://car.autohome.com.cn/price/series-4392.html'
#https://car.autohome.com.cn/config/series/3862.html
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10"
]
# Write one row to the CSV file
def save2csv(writer, bank):
    # Intended column names; writing the header row is left disabled, as in the original
    header = ['bank', 'num', 'url']
    # writer.writerow(header)
    writer.writerow([bank])
headers = {
    # Send a randomly chosen User-Agent so that requests look less uniform
    "User-Agent": random.choice(USER_AGENTS)}
session = HTMLSession()
# Fetch the target URL
response = session.get(url, headers=headers)
response.encoding = 'gb2312'  # Decode as gb2312; set before response.html is first used
response.html.render()  # Execute the page's JavaScript in a headless browser before parsing
print(response)
print(response.text)
print("URL=", url)
banks = response.html.find('.interval01-list-cars div p a')  # CSS selector for the target links on the page
print(banks)
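current_dir = os.path.abspath('.')
data_dir = os.path.join(current_dir, "data")
os.makedirs(data_dir, exist_ok=True)  # open() fails if the folder is missing
file_name = os.path.join(data_dir, "entity.csv")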
# bank = banks[0]
# bank.default_decoding = 'utf-8'
# print(type(bank.text))
# bank = bank.text.encode('utf-8')
# print(bank)
# exit(0)
with open(file_name, 'wt', newline='', encoding='gb2312') as csvfile1:
    writer = csv.writer(csvfile1)  # Build the writer once, before the loop
    for bank in banks:
        bk = bank.text  # Text of the matched link
        print(bk)
        save2csv(writer, bk)
    csvfile1.flush()
print('over!')
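The CSV step above can also be tried without touching the network. The following is a minimal sketch, not the author's code: the helper name `write_entities`, the single `name` column, and the `utf-8-sig` encoding are all assumptions made for illustration. It writes the header row once (the script above leaves that line disabled) and then one row per scraped name.

```python
import csv
import os
import tempfile


def write_entities(path, names, header=("name",)):
    """Hypothetical helper: write one header row, then one row per name."""
    # Create the parent folder first, because open() does not create it.
    os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
    # utf-8-sig is an assumption: it avoids encode errors for characters
    # that gb2312 cannot represent, and the BOM helps spreadsheet tools
    # detect the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)  # built once, outside the loop
        writer.writerow(header)
        for name in names:
            writer.writerow([name])
    return len(names)


if __name__ == "__main__":
    # Sample data only; in the script above the names come from bank.text.
    out = os.path.join(tempfile.mkdtemp(), "data", "entity.csv")
    count = write_entities(out, ["Model A", "Model B"])
    print(count, "rows written to", out)
```

If the output must stay in gb2312, as in the script above, passing `errors="replace"` to `open()` is one way to keep a single unencodable character from aborting the whole run.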
Copyright notice
This article was written by [Cooking code King]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220601196936.html