Crawl the actors' names and add their links
2022-04-22 04:42:00 【tonyaqiqi】
import json
import requests
from bs4 import BeautifulSoup
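# Send a GET request and return the response body as text (None on failure)
def getHTMLText(url, kv):
    try:
        r = requests.get(url, headers=kv, timeout=10)
        r.raise_for_status()
        # Let requests guess the encoding from the response body
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        print(e)
        return None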
# Parse the actors' names and links and save them to a JSON file
def parserData(text):
    soup = BeautifulSoup(text, 'lxml')
    # Look for the cast entries inside <li class="pages"> elements
    review_list = soup.find_all('li', {'class': 'pages'})
    stars = []
    for li in review_list:
        for dt in li.find_all('dt'):
            a = dt.find('a')
            # Skip <dt> entries that have no usable link
            if a is None or not a.get('href'):
                continue
            print(a.text)
            stars.append({
                'name': a.text,
                'link': 'https://baike.baidu.com' + a.get('href'),
            })
    # Number of actors collected
    print(len(stars))
    # Dump the list directly; no need to round-trip through str()
    with open('zhifou.json', 'w', encoding='UTF-8') as f:
        json.dump(stars, f, ensure_ascii=False)
if __name__ == '__main__':
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
    }
    url = 'https://baike.baidu.com/item/%E7%9F%A5%E5%90%A6%E7%9F%A5%E5%90%A6%E5%BA%94%E6%98%AF%E7%BB%BF%E8%82%A5%E7%BA%A2%E7%98%A6/20485668?fr=aladdin'
    text = getHTMLText(url, headers)
    # getHTMLText returns None when the request fails
    if text:
        parserData(text)
        print("All information crawled!")
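When saving the scraped list, passing it straight to the json module is likely safer than converting it with str() and swapping quote characters. The sketch below uses a made-up record (the name "O'Neil" and the link are hypothetical, not taken from the page) to show that the quote-swapping approach fails as soon as a value contains an apostrophe, while direct serialization round-trips cleanly.

```python
import json

# Hypothetical sample record; the apostrophe stands in for any name
# or link that happens to contain a quote character.
stars = [{"name": "O'Neil", "link": "https://baike.baidu.com/item/example"}]


def roundtrip_via_str(records):
    """Convert with str() and swap quotes, then parse the result as JSON."""
    return json.loads(str(records).replace("'", '"'))


def serialize(records):
    """Serialize directly; the json module handles quoting and escaping."""
    return json.dumps(records, ensure_ascii=False)


try:
    roundtrip_via_str(stars)
    roundtrip_ok = True
except json.JSONDecodeError:
    # The swapped quotes produce invalid JSON for this record
    roundtrip_ok = False

restored = json.loads(serialize(stars))
print(roundtrip_ok, restored == stars)  # prints: False True
```

The same reasoning applies to json.dump when writing to a file: the list of dictionaries can be handed over as-is.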
Copyright notice
This article was written by [tonyaqiqi]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204220437167164.html