Innovation Training (12): Web Crawlers
2022-04-22 05:57:00 【散月】
Getting started with web crawlers
- requests + bs4
- selenium + geckodriver
- the Scrapy framework
Installing geckodriver on a Mac
brew install geckodriver
Then edit the ~/.bash_profile configuration file:
export PATH so that it includes the directory containing geckodriver
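To confirm that the PATH change took effect, one option is to check from Python whether geckodriver can be found. The sketch below uses only the standard library; the helper name find_driver is hypothetical and does not come from the original post.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        # Most likely the export line is missing or the terminal was not reopened
        print("geckodriver is not on PATH; check ~/.bash_profile")
    else:
        print("geckodriver found at", path)
```

Open a new terminal (or run source ~/.bash_profile) before checking, since an already running shell keeps its old PATH.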
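# Approach 1: requests + bs4 (fetch the raw HTML and parse it)
import requests
from bs4 import BeautifulSoup

url = "https://b2c.csair.com/B2C40/newTrips/static/main/page/booking/index.html?t=S&c1=BJS&c2=SHA&d1=2021-04-20&at=1&ct=0&it=0&b1=PEK-PKX&b2=SHA-PVG"
# Send a browser-like User-Agent so the request looks like it comes from a normal browser
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()  # stop early on an HTTP error status
r.encoding = 'UTF-8'
soup = BeautifulSoup(r.text, "html.parser")
# requests does not execute JavaScript, so only the static HTML is visible here
title = soup.find("title")
print(title.text if title else "no <title> found")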
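The long URL above carries the whole search in its query string. The sketch below rebuilds it with urllib.parse so the route and date can be changed without editing the string by hand. Reading c1 and c2 as the departure and arrival city codes and d1 as the departure date is an assumption based on the values in the URL, and build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://b2c.csair.com/B2C40/newTrips/static/main/page/booking/index.html"


def build_search_url(origin, destination, date, extra=None):
    """Build the booking-page URL from its query parameters.

    origin/destination map to c1/c2 and date maps to d1 (assumed meanings).
    extra holds the remaining parameters, copied unchanged from the original URL.
    """
    params = {"t": "S", "c1": origin, "c2": destination, "d1": date}
    params.update(extra or {})
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_search_url(
        "BJS", "SHA", "2021-04-20",
        {"at": "1", "ct": "0", "it": "0", "b1": "PEK-PKX", "b2": "SHA-PVG"},
    )
    print(url)
    # Parse the query string back to check a single parameter
    print(parse_qs(urlsplit(url).query)["d1"])
```

The result can be passed to requests.get or driver.get in place of the hard-coded string.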
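# Approach 2: selenium + geckodriver (drive a real Firefox so JavaScript runs)
from selenium import webdriver
from selenium.webdriver.common.by import By

# Recent Selenium 4 releases removed executable_path and the find_element_by_*
# helpers; geckodriver is found through PATH instead
driver = webdriver.Firefox()
url = "https://b2c.csair.com/B2C40/newTrips/static/main/page/booking/index.html?t=S&c1=BJS&c2=SHA&d1=2021-04-20&at=1&ct=0&it=0&b1=PEK-PKX&b2=SHA-PVG"
driver.get(url)
print(driver.page_source)
# Locate the container first, then the element with the same class inside it
comment = driver.find_element(By.CSS_SELECTOR, 'div.zls-flplace')
content = comment.find_element(By.CLASS_NAME, 'zls-flplace')
print(content.text)
driver.quit()  # close the browser when done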
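If the flight list is filled in by JavaScript after the page loads (likely for this page, but an assumption), the element may not exist yet when it is looked up right after driver.get. A common remedy is to poll until it appears. The sketch below shows the idea with the standard library only; wait_until is a hypothetical helper, and Selenium's own WebDriverWait provides the same pattern for real drivers.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    Exceptions raised by condition() are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not in the DOM yet
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

With a running driver, the condition could be a lambda that performs the div.zls-flplace lookup; the helper keeps retrying while Selenium raises NoSuchElementException and returns the element once it is found.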
Copyright notice
This article was written by [散月]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/qq_43480289/article/details/115362070