A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Last update: Mar 28, 2022

Overview

New to Streaming Scraper

An in-progress web scraping project built with Python, R, and SQL.

The scraped data consist of movie and TV show information. The goal of the project is to show the new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Working out how to present the data with R Markdown.

Testing at: https://charlesdungy.github.io/new-to-streaming-scraper/

Future stage: Complete the documentation and code comments.

Description

Data are retrieved from two different data sources: What's on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.
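As a rough illustration of the Python cleaning step for RT data, here is a minimal sketch. It is not the project's actual code: the field names (`title`, `critic_score`, `audience_score`), the raw formats (scores as strings like `"93%"`), and the helper names `parse_score` and `clean_rt_record` are all assumptions made for the example.

```python
from typing import Optional


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Convert a percentage string such as '93%' to an int.

    Returns None when the value is missing or not numeric
    (for example a placeholder like '--').
    """
    if raw is None:
        return None
    text = raw.strip().rstrip("%").strip()
    if not text.isdigit():
        return None
    return int(text)


def clean_rt_record(raw: dict) -> dict:
    """Normalize one scraped record (hypothetical field names)."""
    return {
        # Collapse runs of whitespace left over from the HTML.
        "title": " ".join(raw.get("title", "").split()),
        "critic_score": parse_score(raw.get("critic_score")),
        "audience_score": parse_score(raw.get("audience_score")),
    }


if __name__ == "__main__":
    # Placeholder record, not real scraped data.
    sample = {"title": "  Some   Title ", "critic_score": "93%", "audience_score": "--"}
    print(clean_rt_record(sample))
```

Mapping missing scores to `None` rather than `0` keeps "not yet rated" distinguishable from a genuinely low rating once the rows reach the database.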

All data are piped into a MySQL database, then retrieved for presentation in R.

In the high-level view of the pipeline, Data Source 1 is WON data and Data Source 2 is RT data.
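The store-then-retrieve flow described above might look roughly like the sketch below. To keep it runnable without a server, it uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table names (`won_titles`, `rt_ratings`), the columns, the join on title, and the sample rows are all hypothetical and not taken from the project.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one table per data source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE won_titles (
            title TEXT PRIMARY KEY,
            arrival_date TEXT
        );
        CREATE TABLE rt_ratings (
            title TEXT PRIMARY KEY,
            critic_score INTEGER,
            audience_score INTEGER
        );
        """
    )
    return conn


def load(conn: sqlite3.Connection, won_rows: list, rt_rows: list) -> None:
    """Insert cleaned rows from both sources using parameterized queries."""
    conn.executemany("INSERT INTO won_titles VALUES (?, ?)", won_rows)
    conn.executemany("INSERT INTO rt_ratings VALUES (?, ?, ?)", rt_rows)
    conn.commit()


def titles_with_ratings(conn: sqlite3.Connection) -> list:
    """Return every new title, with ratings where a match exists."""
    query = """
        SELECT w.title, w.arrival_date, r.critic_score, r.audience_score
        FROM won_titles AS w
        LEFT JOIN rt_ratings AS r ON r.title = w.title
        ORDER BY w.arrival_date, w.title
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Placeholder rows, not real scraped data.
    load(
        conn,
        won_rows=[("Example Show A", "2022-03-01"), ("Example Movie B", "2022-03-15")],
        rt_rows=[("Example Show A", 90, 85)],
    )
    for row in titles_with_ratings(conn):
        print(row)
```

The `LEFT JOIN` keeps titles that have no rating yet, so a newly arrived title is still listed even when the second source has nothing for it.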

Main Packages/Tools

Python

R

SQL

MySQL

Current Directory Tree

License

MIT

Owner

Charles Dungy
