Web Scraper Example: Scraping the Chinese University Rankings
2022-04-21 13:52:00 【ddy-ddy】
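Steps:
1. Fetch the page with the requests library
2. Extract the key data from the HTML with BeautifulSoup and store it in a list
3. Print the data in an aligned, formatted layout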
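Step 2 can be tried offline before touching the network. The sketch below parses a small inline table instead of the real page. `SAMPLE_HTML`, `parse_rows`, and the placeholder school names are made up for illustration, and the real page's markup is only assumed to look similar.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup; the real ranking page is assumed to be similar.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td>1</td><td>大学甲</td><td>北京</td><td>94.6</td></tr>
    <tr><td>2</td><td>大学乙</td><td>上海</td><td>76.5</td></tr>
  </tbody>
</table>
"""

def parse_rows(html):
    """Return one [rank, name, location, score] list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    rows = []
    if tbody is None:  # no table body found, nothing to extract
        return rows
    for tr in tbody.children:
        # .children also yields whitespace text nodes, so keep only real tags
        if isinstance(tr, Tag):
            tds = tr("td")  # calling a tag is shorthand for tr.find_all("td")
            rows.append([td.string for td in tds[:4]])
    return rows

rows = parse_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The `isinstance` check matters because the newlines between `<tr>` elements show up as text nodes among the children of `<tbody>`.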
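import requests
import bs4
from bs4 import BeautifulSoup

def getData(url):  # fetch the page and return its HTML text
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        print("Request failed")
        return ""  # empty string, so the parsing step finds nothing instead of crashing

def settleData(html, datalist):  # extract the key data from the HTML into a list
    soup = BeautifulSoup(html, 'html.parser')
    tbody = soup.find('tbody')
    if tbody is None:  # request failed or the page layout changed
        return
    for tr in tbody.children:  # iterate over the direct children of <tbody>
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')
            if len(tds) >= 4:  # skip rows that lack the four expected cells
                datalist.append([tds[0].string, tds[1].string, tds[2].string, tds[3].string])

def printData(datalist, num):  # print the first num rows
    tplt = "{0:{4}^10}\t{1:{4}^10}\t{2:{4}^10}\t{3:^10}"
    # Header columns: rank, school name, location, total score.
    # chr(12288) is the full-width space; padding with it keeps Chinese text aligned.
    print(tplt.format("排名", "学校名称", "学校地址", "总分", chr(12288)))
    for u in datalist[:num]:  # slicing avoids an IndexError when fewer than num rows exist
        print(tplt.format(u[0], u[1], u[2], u[3], chr(12288)))

def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2019.html'
    html = getData(url)
    settleData(html, uinfo)
    printData(uinfo, 30)

def search(string):  # look up a university by exact name and print its rank
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2019.html'
    html = getData(url)
    settleData(html, uinfo)
    for u in uinfo:
        if u[1] == string:
            print(u[0], u[1])

main()
search("江西师范大学")  # Jiangxi Normal University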
Result: [screenshot of the printed ranking table]
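The alignment trick in the format template can be seen in isolation. In a typical monospace font a CJK character is about twice as wide as an ASCII space, so padding Chinese text with ordinary spaces makes columns drift. Padding with the full-width space `chr(12288)` (U+3000) avoids this, because the fill character is as wide as the text it surrounds. The helper names below are hypothetical and only serve the comparison.

```python
FULLWIDTH_SPACE = chr(12288)  # U+3000, the ideographic (full-width) space

def center_ascii(text, width=10):
    """Center text in `width` characters, padding with ordinary spaces."""
    return "{0:^{1}}".format(text, width)

def center_fullwidth(text, width=10):
    """Center text in `width` characters, padding with full-width spaces."""
    return "{0:{1}^{2}}".format(text, FULLWIDTH_SPACE, width)

# Both results are 10 characters long, but only the second keeps a constant
# on-screen width when the text itself is made of full-width characters.
for name in ["排名", "学校名称"]:
    print("|" + center_ascii(name) + "|")
    print("|" + center_fullwidth(name) + "|")
```

In the script's template, `{4}` plays the role of `FULLWIDTH_SPACE`: the fifth argument to `format` is substituted as the fill character for the first three columns. The score column uses the default fill because it holds ASCII digits.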

Copyright notice
This article was written by [ddy-ddy]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/weixin_45314989/article/details/104396936