A simple website crawler that asks the user for a website address, crawls it, and extracts specific data from the pages it finds.

Last update: Jan 10, 2022


Overview

Website-Crawler-Python

The crawler first asks the user for a website address and collects the links found at that address. It then asks for a crawling depth, which must lie within the number of links found, and searches the crawled pages for specific data.
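The description does not say exactly how the depth is applied. As a minimal sketch, assuming the depth means the number of link hops to follow from the starting address, a breadth-first crawl could look like the following. The names `LinkParser`, `crawl`, and the `fetch` callable are hypothetical, and the standard library's `html.parser` is used here instead of BeautifulSoup so the sketch runs without extra installs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, depth, fetch):
    """Follow links at most `depth` hops away from `start_url`.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page's HTML as a string. Returns a dict mapping each visited URL to
    its HTML.
    """
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if url in pages:
            continue  # already visited
        html = fetch(url)
        pages[url] = html
        if level >= depth:
            continue  # do not expand links past the requested depth
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the page they came from.
            queue.append((urljoin(url, href), level + 1))
    return pages
```

Passing `fetch` in as a parameter keeps the traversal separate from the network code. In a real run it could wrap something like `urllib3.PoolManager().request("GET", url).data.decode()`, while a dictionary lookup is enough for trying the function offline.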

Website Crawler takes 3 inputs:

A website address
Integer value for the crawling depth
A user-specified regular expression for finding the data the user is interested in
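The three inputs above could be validated before any crawling starts. The sketch below is one possible way to do that, not the project's actual code; the function name `parse_inputs` and the rule that the depth must not be negative are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def parse_inputs(address, depth_text, pattern_text):
    """Validate the three inputs; return (address, depth, compiled_pattern)."""
    parts = urlparse(address)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) address: {address!r}")

    depth = int(depth_text)  # raises ValueError for non-integer text
    if depth < 0:
        raise ValueError("crawling depth must not be negative")

    try:
        pattern = re.compile(pattern_text)
    except re.error as exc:
        # Report a bad expression the same way as the other input errors.
        raise ValueError(f"invalid regular expression: {exc}") from exc

    return address, depth, pattern
```

In an interactive script the three arguments would typically come from three `input()` prompts, with the `ValueError` caught to ask the user again.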

General tasks:

Find all Norwegian mobile numbers and save them to a text file.
Find all sub-links inside the given website and save them to a text file.
Save the website's raw HTML code to a text file.
Find all email addresses and save them to a text file.
Find all comments used in the website and save them to a text file.
Find the five most used words and print them to the terminal.
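The tasks above mostly come down to running patterns over the page's HTML. The following is a rough sketch under stated assumptions: the project's real expressions are not shown here, so every pattern and function name below is hypothetical. In particular, a Norwegian mobile number is assumed to be eight consecutive digits starting with 4 or 9, optionally preceded by +47 or 0047; numbers written with spaces between digit groups are not matched.

```python
import re
from collections import Counter

# Assumed patterns, chosen for illustration only.
MOBILE_RE = re.compile(r"(?<!\d)(?:(?:\+|00)47[ ]?)?([49]\d{7})(?!\d)")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
HREF_RE = re.compile(r"""href=["'](.*?)["']""", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including non-ASCII


def find_mobiles(html):
    """Return the 8-digit part of each number that looks like a mobile number."""
    return MOBILE_RE.findall(html)


def find_emails(html):
    return EMAIL_RE.findall(html)


def find_comments(html):
    """Return the text of each HTML comment, without the <!-- --> markers."""
    return [text.strip() for text in COMMENT_RE.findall(html)]


def find_links(html):
    """Return the raw value of each href attribute."""
    return HREF_RE.findall(html)


def top_words(html, n=5):
    """Return the n most common words in the visible text as (word, count)."""
    text = COMMENT_RE.sub(" ", html)   # drop comments first
    text = TAG_RE.sub(" ", text)       # then drop the remaining tags
    words = [word.lower() for word in WORD_RE.findall(text)]
    return Counter(words).most_common(n)


def save_lines(path, lines):
    """Write one item per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines))
```

Each finder returns a plain list, so a call such as `save_lines("emails.txt", find_emails(html))` covers the "save to a text file" part of a task, and `print(top_words(html))` covers the last one. Regular expressions are a blunt tool for HTML; a parser such as BeautifulSoup is usually more robust for links and comments.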

This is a Python-based project that depends on the following libraries:

RegEx
Urllib3
BeautifulSoup 4
Counter (from the collections module)


Owner

Faisal Ahmed
