2021-09-03 crawler template (only static pages are supported)
2022-04-23 03:40:00 【Cooking code King】
# -*- coding: utf-8 -*-
# @Time : 2021/9/3 21:32
# @Author : Yj Xue
# @FileName: entity_car.py
# @Software: PyCharm 2020.2.2 x64
# @Blog :https://blog.csdn.net/qq_37150711/category_9396602.html
from requests_html import HTMLSession
from requests_html import HTML
import requests
import time
import json
import random
import sys
import os
import csv
from fake_useragent import UserAgent
session = HTMLSession()
url = 'https://car.autohome.com.cn/price/series-4392.html'
#https://car.autohome.com.cn/config/series/3862.html
USER_AGENTS = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
"Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
"Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10"
]
# Write one row to the CSV file
def save2csv(writer, bank):
    header = ['bank', 'num', 'url']  # column names (not written at the moment)
    # writer.writerow(header)
    csvrow1 = []
    csvrow1.append(bank)
    writer.writerow(csvrow1)
headers = {
    # A random User-Agent makes requests look less uniform, a basic measure against blocking
    "User-Agent": random.choice(USER_AGENTS),
}
# Fetch the page (the session was created above)
response = session.get(url, headers=headers)
# Run the page's JavaScript in headless Chromium (downloaded on first use)
response.html.render()
print(response)
response.encoding = 'gb2312'
print(response.text)
print("URL=", url)
banks = response.html.find('.interval01-list-cars div p a')  # CSS selector for the target links on the page
print(banks)
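current_dir = os.path.abspath('.')
data_dir = os.path.join(current_dir, "data")
os.makedirs(data_dir, exist_ok=True)  # make sure the output directory exists
file_name = os.path.join(data_dir, "entity.csv")  # portable path, no hard-coded backslash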
# bank = banks[0]
# bank.default_decoding = 'utf-8'
# print(type(bank.text))
# bank = bank.text.encode('utf-8')
# print(bank)
# exit(0)
with open(file_name, 'wt', newline='', encoding='gb2312') as csvfile1:
    writer = csv.writer(csvfile1)  # create the writer once, before the loop
    for bank in banks:
        bk = bank.text  # text of the matched link
        print(bk)
        save2csv(writer, bk)
        csvfile1.flush()
print('over!')
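The extract-and-save steps of the script can be tried offline with a minimal sketch like the one below. It is not the author's method: it swaps `requests_html` for the standard library's `html.parser`, collects the text of every `<a>` element instead of applying the CSS selector, and writes `utf-8-sig` rather than `gb2312`. The names `SAMPLE_USER_AGENTS`, `build_headers`, `LinkTextParser`, `extract_link_texts` and `write_names` are all hypothetical and introduced only for illustration.

```python
import csv
import os
import random
from html.parser import HTMLParser

# Hypothetical short list; the script above defines a longer USER_AGENTS.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
]


def build_headers(user_agents):
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(user_agents)}


class LinkTextParser(HTMLParser):
    """Collect the text of every <a> element (a stand-in for the CSS selector)."""

    def __init__(self):
        super().__init__()
        self._in_link = False  # True while the parser is inside an <a> element
        self._buffer = []      # text fragments of the current link
        self.names = []        # collected link texts

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            text = "".join(self._buffer).strip()
            if text:
                self.names.append(text)


def extract_link_texts(html_text):
    """Return the stripped text of each link in html_text, in document order."""
    parser = LinkTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.names


def write_names(path, names):
    """Write one name per CSV row, creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Assumption: utf-8-sig is used so that characters outside gb2312 do not raise errors.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        for name in names:
            writer.writerow([name])
```

To use it, pass the downloaded page text to `extract_link_texts` and the result to `write_names(os.path.join("data", "entity.csv"), names)`. Keeping the parsing and the file writing in separate functions makes each step testable without network access.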
Copyright notice
This article was written by [Cooking code King]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220601196936.html