Explore scraping with BeautifulSoup!

Last update: Oct 05, 2022

beautifulsoup-scrape

Part One: Start from Shakespeare

My professor is a poet (yes, and he also teaches me data and databases), so he loves to give us assignments related to literature.

The starter project with BeautifulSoup scrapes the first act of William Shakespeare's The Tempest.

My notebook is shakespeare-scrape.ipynb.

The code includes:

cook a soup doc: download the HTML text of a webpage and parse it
search for certain elements like div/p/ul, or certain attributes like class
locate a nearby element with .parent or .find_next_sibling()
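
As a rough illustration of those three steps, here is a minimal sketch. It is not the code from the notebook: the HTML string is a hypothetical stand-in for the downloaded page (the real markup of the source page will differ), and the class names `act` and `speaker` are assumptions made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page. In the notebook the HTML
# would be fetched from the web first (for example with requests).
html = """
<div class="act">
  <h3>ACT I</h3>
  <p class="speaker">Master</p>
  <ul>
    <li>Boatswain!</li>
  </ul>
  <p class="speaker">Boatswain</p>
  <ul>
    <li>Here, master: what cheer?</li>
  </ul>
</div>
"""

# "Cook the soup": parse the raw HTML into a searchable tree.
soup = BeautifulSoup(html, "html.parser")

# Search by tag name and by class attribute.
speakers = soup.find_all("p", class_="speaker")
speaker_names = [p.get_text(strip=True) for p in speakers]

# Step sideways from a speaker to the <ul> that follows it.
first_speech = speakers[0].find_next_sibling("ul").get_text(strip=True)

# Step upward from an element to the tag that contains it.
container_class = speakers[0].parent["class"]

print(speaker_names)
print(first_speech)
print(container_class)
```

Note that `class` is a multi-valued attribute, so BeautifulSoup returns it as a list rather than a string.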

Part Two: Develop with Supreme Court Decisions

In this part, I scrape the 2020 Supreme Court decisions.

The notebook is guardian-and-supreme-court.ipynb.

The code includes:

use a for loop to print each element in a list
find the link hidden in an attribute
save the output in a list of lists, even a three-level nested list
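
The loop-and-collect pattern above might look something like the sketch below. The table markup, the case names, and the file paths are placeholders invented for the example, not the real Supreme Court page.

```python
from bs4 import BeautifulSoup

# Placeholder markup: the real decisions page has its own structure.
html = """
<table>
  <tr><td>date-1</td><td><a href="/opinions/a.pdf">Case A v. Case B</a></td></tr>
  <tr><td>date-2</td><td><a href="/opinions/c.pdf">Case C v. Case D</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

decisions = []  # will become a list of lists, one inner list per row
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    link = row.find("a")
    # The URL is not in the visible text; it lives in the href attribute.
    decisions.append(
        [cells[0].get_text(strip=True), link.get_text(strip=True), link["href"]]
    )

# Print each element of the list, one per line.
for decision in decisions:
    print(decision)
```

Indexing a tag like a dictionary (`link["href"]`) raises a `KeyError` when the attribute is missing; `link.get("href")` returns `None` instead, which can be safer on messy pages.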

Part Three: More practice with The Guardian

The webpage I scrape is the Best Non-Fiction Books of All Time listed by The Guardian.

The notebook is the same one as in Part Two!

You will find a surprise if you get the soup doc of that website. Yes! An advertisement hidden in the HTML!

The code is similar to the last project, but there is more:

list comprehension
lists of lists of lists (deeper nesting)
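
A small sketch of both ideas, under the assumption that each book is an h2 heading with a link, followed by a paragraph. The markup, titles, and the `ad` class are all placeholders for the example, not The Guardian's real HTML.

```python
from bs4 import BeautifulSoup

# Placeholder markup, including a stray advertisement block.
html = """
<div class="ad">Advertisement</div>
<h2><a href="/books/placeholder-1">Placeholder Title One</a></h2>
<p>by Author One (1901)</p>
<h2><a href="/books/placeholder-2">Placeholder Title Two</a></h2>
<p>by Author Two (1902)</p>
"""

soup = BeautifulSoup(html, "html.parser")

# A list comprehension with a filter: keep every <div>/<p> except the ad.
texts = [
    tag.get_text(strip=True)
    for tag in soup.find_all(["div", "p"])
    if "ad" not in tag.get("class", [])
]

# A three-level nested list: books -> one book -> that book's details.
books = [
    [
        h2.get_text(strip=True),
        [h2.a["href"], h2.find_next_sibling("p").get_text(strip=True)],
    ]
    for h2 in soup.find_all("h2")
    if h2.a is not None  # skip headings that carry no link
]

print(texts)
print(books)
```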

Bonus: More Real Shakespeare

In this case, I try to pull out the first 100 lines of Twelfth Night, available here.

The notebook is the same one as in Part Two!

My professor really does love Shakespeare.

I had trouble with this project for a long time because it required each line to contain:

a code for act.scene.line, along with whether it is a stage direction
the speaker, or for a stage direction, the last person who spoke before it
a line or stage direction
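
One possible way to build those three fields is to walk the tags in document order while remembering the current act, scene, line number, and speaker. This is only a sketch, not the approach in the notebook. The HTML is an abbreviated, hypothetical stand-in (so the numbers are not the play's real line numbers), the class names `speaker`, `line`, and `stage` are invented, and marking a stage direction with the last spoken line's number plus `-SD` is my own assumption about the required code format.

```python
from bs4 import BeautifulSoup

# Abbreviated, hypothetical markup; the real page is structured differently.
html = """
<h3>ACT I</h3>
<h4>SCENE I</h4>
<p class="speaker">DUKE ORSINO</p>
<p class="line">If music be the food of love, play on;</p>
<p class="line">Give me excess of it, that, surfeiting,</p>
<p class="stage">Enter VALENTINE</p>
<p class="speaker">VALENTINE</p>
<p class="line">So please my lord, I might not be admitted;</p>
"""

soup = BeautifulSoup(html, "html.parser")

act, scene, line_no = 0, 0, 0
speaker = None  # last person who spoke
records = []    # each record: [code, speaker, text]

# find_all returns tags in document order, so state carries forward.
for tag in soup.find_all(["h3", "h4", "p"]):
    text = tag.get_text(strip=True)
    classes = tag.get("class", [])
    if tag.name == "h3":          # new act: restart scene numbering
        act += 1
        scene = 0
    elif tag.name == "h4":        # new scene: restart line numbering
        scene += 1
        line_no = 0
    elif "speaker" in classes:
        speaker = text
    elif "line" in classes:
        line_no += 1
        records.append([f"{act}.{scene}.{line_no}", speaker, text])
    elif "stage" in classes:
        # Assumed convention: reuse the last line number and flag it as SD.
        records.append([f"{act}.{scene}.{line_no}-SD", speaker, text])

first_100 = records[:100]
for record in first_100:
    print(record)
```

Keeping the running state in plain variables avoids searching backwards from every line to find its speaker, which is where this kind of task tends to get complicated.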

I figured it out in a very complex way and I believe there is a better way to do it!

Owner

Chuqin
