2021-09-03 crawler template (only static pages are supported)
2022-04-23 03:40:00 【Cooking code King】
# -*- coding: utf-8 -*-
# @Time : 2021/9/3 21:32
# @Author : Yj Xue
# @FileName: entity_car.py
# @Software: PyCharm 2020.2.2 x64
# @Blog :https://blog.csdn.net/qq_37150711/category_9396602.html
from requests_html import HTMLSession
from requests_html import HTML
import requests
import time
import json
import random
import sys
import os
import csv
from fake_useragent import UserAgent
session = HTMLSession()
url = 'https://car.autohome.com.cn/price/series-4392.html'
#https://car.autohome.com.cn/config/series/3862.html
USER_AGENTS = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
"Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
"Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10"
]
# Write one row to the CSV file
def save2csv(writer, bank):
    header = ['bank', 'num', 'url']  # Column names; only the first column is written for now
    # writer.writerow(header)
    csvrow1 = []
    csvrow1.append(bank)
    writer.writerow(csvrow1)
headers = {
    "User-Agent": random.choice(USER_AGENTS)}  # A random User-Agent is a key measure against anti-crawler checks!
session = HTMLSession()
# Fetch the URL
response = session.get(url, headers=headers)
response.html.render()  # Render the page's JavaScript in a headless browser (simulates a real visitor, helps against anti-crawler checks)
print(response)
response.encoding = 'gb2312'
print(response.text)
print("URL=", url)
banks = response.html.find('.interval01-list-cars div p a')  # CSS selector for the target links on the page
print(banks)
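current_dir = os.path.abspath('.')
data_dir = os.path.join(current_dir, "data")
os.makedirs(data_dir, exist_ok=True)  # Make sure the output directory exists
file_name = os.path.join(data_dir, "entity.csv")  # Portable path instead of a hard-coded backslash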
# bank = banks[0]
# bank.default_decoding = 'utf-8'
# print(type(bank.text))
# bank = bank.text.encode('utf-8')
# print(bank)
# exit(0)
with open(file_name, 'wt', newline='', encoding='gb2312') as csvfile1:
    writer = csv.writer(csvfile1)  # Build the writer once, before the loop
    for bank in banks:
        bk = bank.text  # Text of the matched link
        # Reverse lookup: car series
        print(bk)
        save2csv(writer, bk)
        csvfile1.flush()
print('over!')
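
The CSV-writing step of the script can be tried on its own, without any network access. What follows is a minimal sketch rather than the author's code: `write_names` is a hypothetical helper, the `name` column header is an assumption, and the sample strings are made up. It writes to any text stream, builds the `csv.writer` once, emits the header row once, and skips empty link texts.

```python
import csv
import io


def write_names(stream, names, header=("name",)):
    """Write one header row, then one row per non-empty name.

    `write_names` and the `name` column are illustrative assumptions,
    not part of the original script.
    """
    writer = csv.writer(stream)  # build the writer once, outside the loop
    writer.writerow(header)
    count = 0
    for name in names:
        name = name.strip()
        if not name:  # skip empty link texts
            continue
        writer.writerow([name])
        count += 1
    return count  # number of data rows written


# Usage with an in-memory buffer instead of a real file.
buffer = io.StringIO(newline="")
written = write_names(buffer, ["Model A 2021", "  ", "Model B 2021"])
print(written)
print(buffer.getvalue())
```

To use it with the script above, pass the file object opened with `newline=''` as `stream` and the `.text` of each matched element as `names`.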
Copyright notice
This article was written by [Cooking code King]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220601196936.html