Web-scraping - A bot using Python with BeautifulSoup that scrapes the IRS website by form number and returns the results as JSON

Last update: Jan 04, 2022

Related tags

Web Crawling Web-scraping

Overview

Extract Data from the IRS website

A bot using Python with BeautifulSoup that scrapes the IRS website (prior form publication) by form number and returns the results as JSON. It provides the option to download PDFs over a range of years.
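The row-extraction step could look roughly like the sketch below. The HTML fragment, the `results` class name, the example.invalid URLs and the function name `parse_rows` are all assumptions made for illustration. The markup of the live IRS page may differ, and this is not necessarily how Script.py does it.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for one page of search results.
SAMPLE_HTML = """
<table class="results">
  <tr><th>Product Number</th><th>Title</th><th>Revision Date</th></tr>
  <tr>
    <td><a href="https://example.invalid/fw2--2020.pdf">Form W-2</a></td>
    <td>Wage and Tax Statement (Info Copy Only)</td>
    <td>2020</td>
  </tr>
  <tr>
    <td><a href="https://example.invalid/fw2--2019.pdf">Form W-2</a></td>
    <td>Wage and Tax Statement (Info Copy Only)</td>
    <td>2019</td>
  </tr>
</table>
"""


def parse_rows(html):
    """Return one dict per data row: form number, title, year, PDF URL."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table.results tr"):
        cells = tr.find_all("td")
        if len(cells) != 3:  # the header row has <th> cells only, so skip it
            continue
        link = cells[0].find("a")
        rows.append({
            "form_number": cells[0].get_text(strip=True),
            "form_title": cells[1].get_text(strip=True),
            "year": int(cells[2].get_text(strip=True)),
            "url": link["href"] if link else None,
        })
    return rows


rows = parse_rows(SAMPLE_HTML)
```

Keeping the parser a pure function of an HTML string makes it possible to check it against a saved page without touching the network.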

How to run the script?

This script runs on Python 3.8. Install the libraries listed in requirements.txt into a new environment, then run 'Script.py'.

What should I expect?

The script will ask you for the form number(s), then scrape the IRS website.
--> Please enter the complete tax form number separated by a comma followed by a space (not case sensitive): (ie. Form W-2, Form 1095-C, Form W-3, etc)
--> Form W-2, Form 1095-C
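One way to turn that comma-separated answer into a clean list is sketched here. The function name `parse_form_numbers` is hypothetical, and the duplicate handling is an assumption rather than documented behaviour of the script.

```python
def parse_form_numbers(raw):
    """Split 'Form W-2, Form 1095-C' into a list of trimmed form numbers."""
    forms = []
    seen = set()
    for part in raw.split(","):
        name = " ".join(part.split())  # trim and collapse inner whitespace
        key = name.casefold()          # input is not case sensitive
        if name and key not in seen:   # drop empty pieces and duplicates
            seen.add(key)
            forms.append(name)
    return forms


forms = parse_form_numbers("Form W-2, Form 1095-C")
```

Comparing on the case-folded key means "form w-2" and "Form W-2" count as the same form, which matches the "not case sensitive" wording of the prompt.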

Then the bot will ask if the user would like to download the forms.
--> Would you like to download all related pdfs? (Y/N)

If the user answers yes, the bot will follow up by asking for a year range.
--> Please provide the year range by using a dash in between the years (starting year must be smaller than ending year): (ie. 2018-2020)
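Validating the year range could be done along these lines. This is a minimal sketch with a hypothetical `parse_year_range` helper; the strict "smaller than" check simply mirrors the wording of the prompt, and how the script itself treats equal years is not stated.

```python
import re


def parse_year_range(raw):
    """Parse '2018-2020' into (2018, 2020); raise ValueError on bad input."""
    match = re.fullmatch(r"\s*(\d{4})\s*-\s*(\d{4})\s*", raw)
    if match is None:
        raise ValueError(f"expected YYYY-YYYY, got {raw!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start >= end:  # the prompt asks for a strictly smaller starting year
        raise ValueError("starting year must be smaller than ending year")
    return start, end


start, end = parse_year_range("2018-2020")
```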

Once the range is entered, the bot will automatically create a folder and download the relevant PDFs into it.
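The folder-and-download step might be sketched as follows. The row layout (`form_number`, `year`, `url`), the file naming scheme and the `download_pdfs` helper are assumptions made for illustration, not details taken from Script.py.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a URL and return the raw bytes (default fetcher)."""
    with urlopen(url) as response:
        return response.read()


def download_pdfs(rows, start, end, dest_dir, fetch=fetch_bytes):
    """Save PDFs for rows whose year is in [start, end]; return saved paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    saved = []
    for row in rows:
        if not (start <= row["year"] <= end):
            continue  # outside the requested range
        # Hypothetical naming scheme: "<form number> - <year>.pdf"
        target = dest / f"{row['form_number']} - {row['year']}.pdf"
        target.write_bytes(fetch(row["url"]))
        saved.append(target)
    return saved
```

Passing the fetcher in as a parameter keeps the year filtering and file naming checkable without real network access. A call such as `download_pdfs(rows, 2018, 2020, "Form W-2")` would use the default `urlopen`-based fetcher.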

Finally, the results will be returned as a JSON string. If there are no results, the user will get 'No results' instead.

Sample output:
[
{'form_number': 'Form W-2', 'form_title': 'Wage and Tax Statement (Info Copy Only)', 'min_year': '1954', 'max_year': '2022'},
{'form_number': 'Form 1095-C', 'form_title': 'Employer-Provided Health Insurance Offer and Coverage', 'min_year': '2014', 'max_year': '2022'},
{'form_number': 'Form W-3', 'form_title': 'Transmittal of Wage and Tax Statements (Info Copy Only)', 'min_year': '1990', 'max_year': '2022'}
]
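Output of this shape could be produced by collapsing per-year rows into one record per form, as in the sketch below. The input row layout and the `summarize` and `to_json` helpers are assumptions. Note that `json.dumps` emits double-quoted strings, whereas the sample above uses single quotes, which is how Python prints a list of dicts.

```python
import json


def summarize(rows):
    """Collapse per-year rows into one record per form with min/max year."""
    summary = {}
    for row in rows:
        entry = summary.setdefault(row["form_number"], {
            "form_number": row["form_number"],
            "form_title": row["form_title"],  # title of the first row seen
            "min_year": row["year"],
            "max_year": row["year"],
        })
        entry["min_year"] = min(entry["min_year"], row["year"])
        entry["max_year"] = max(entry["max_year"], row["year"])
    # Years are emitted as strings to match the sample output.
    return [
        {**e, "min_year": str(e["min_year"]), "max_year": str(e["max_year"])}
        for e in summary.values()
    ]


def to_json(rows):
    """Return the summary as a JSON string, or 'No results' if it is empty."""
    results = summarize(rows)
    return json.dumps(results, indent=2) if results else "No results"
```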

Note: To keep users engaged, the bot will display which task it is performing and what URL it is currently searching.

A Python web scraper that collects the latest posts from the official Coinbase blog.

Python framework to scrape Pastebin pastes and analyze them

The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.

A web scraper for nomadlist.com, made to avoid website restrictions.

Web3 Pancakeswap Sniper bot written in python3

CreamySoup - a helper script for automated SourceMod plugin updates management.

Haphazard scripts for scraping bitcoin/bitcoin data from GitHub

A Python tool that uses Google dorks to scrape WhatsApp group links from Google results and returns the working links.

The first public repository that provides free BUBT website scraping API script on Github.

A simple django-rest-framework api using web scraping

A web Scraper for CSrankings.com that scrapes University and Faculty list for a particular country

TarkovScrappy - A nifty little bot that lets you know if a queried item might be required for a quest at some point in the land of Tarkov!

A Python crawler that automatically fetches the lyrics of all songs by a given artist on QQ Music

This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease.

A Pixiv web crawler module

Screen scraping and web crawling framework

Simple tool to scrape and download cross country ski timings and results from live.skidor.com

Python script to check whether there are any differences in an application's responses when the request comes from a search engine's crawler.

Scraping Thailand COVID-19 data from the DDC's tableau dashboard

Lovely Scrapper