Scraping web pages to get data

Last update: Nov 01, 2021

Related tags

Web Crawling scrapingweb

Overview

Scraping Data

Get public data and save it in a database

This project uses

Python

How to run a project

1 - Clone the repository

2 - Install beautifulsoup4

pip3 install beautifulsoup4

IMPORTANT: This project sends data from Transfermarkt to the database of Soccer API.

How the project works

1 - The script index.py gets data from Transfermarkt and generates a CSV file for each season of a player in the directory ./csvFiles

2 - The script updateDB.py reads all CSV files in ./csvFiles and sends all the data to Soccer API; once everything is sent, it deletes all CSV files in ./csvFiles
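The two steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names `write_season_csvs` and `upload_and_clean`, the file naming scheme, the column names, and the sample rows are all assumptions. The HTML parsing with beautifulsoup4 is left out (rows are assumed to be already extracted), and the call to Soccer API is replaced by a `send` callable so the sketch runs without network access.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_season_csvs(rows, out_dir):
    """Step 1 (sketch): write one CSV file per (player, season) pair."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the already-scraped rows by player and season.
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["player"], row["season"])].append(row)

    paths = []
    for (player, season), items in sorted(grouped.items()):
        # Hypothetical naming scheme; the real index.py may name files differently.
        path = out_dir / f"{player}_{season}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(items[0].keys()))
            writer.writeheader()
            writer.writerows(items)
        paths.append(path)
    return paths


def upload_and_clean(csv_dir, send):
    """Step 2 (sketch): pass every CSV row to `send`, then delete the files."""
    files = sorted(Path(csv_dir).glob("*.csv"))
    sent = 0
    for path in files:
        with path.open(newline="", encoding="utf-8") as fh:
            for record in csv.DictReader(fh):
                send(record)  # stand-in for the request to Soccer API
                sent += 1
    # Delete only after every row was sent, so a failed upload keeps the data on disk.
    for path in files:
        path.unlink()
    return sent


# Made-up sample rows standing in for scraped data.
sample_rows = [
    {"player": "player_a", "season": "2019", "goals": "3"},
    {"player": "player_a", "season": "2020", "goals": "5"},
    {"player": "player_b", "season": "2020", "goals": "1"},
]

with tempfile.TemporaryDirectory() as tmp:
    created = write_season_csvs(sample_rows, tmp)
    received = []
    sent_count = upload_and_clean(tmp, received.append)
    remaining = list(Path(tmp).glob("*.csv"))
```

Deleting the files only after all rows have been sent is a design choice of this sketch: if `send` raises, the CSV files stay in place and the upload can be retried.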

Owner

Soccer Project

GitHub Repository

Python Web Scrapper Project

Web Scrapper Project developed in Python, mainly with Selenium, BeautifulSoup, and Pandas; it is a web scraper that pulls a table with the main e

2 Jan 04, 2022

A training task for web scraping using python multithreading and a real-time-updated list of available proxy servers.

Parallel web scraping The project is a training task for web scraping using python multithreading and a real-time-updated list of available proxy serv

1 Feb 10, 2022

HappyScrapper - Google news web scraper with Python

HappyScrapper ~ Google news web scraper INSTALLATION ♦ Clone the repository ♦ O

0 Nov 07, 2022

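An m3u8 video stream download script

Introduction to a Python m3u8 stream video download script. m3u8 stream video is increasingly common, and there are already many good downloaders; I am sharing a small script I wrote earlier for everyone to use. The purpose of this program is to give video download enthusiasts a download example that can be called directly, so there is no need to reinvent the wheel. Usage: run the program directly in Python or call it externally import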

0 Oct 10, 2021

a high-performance, lightweight and human friendly serving engine for scrapy

30 Mar 01, 2022

This tool crawls a list of websites and downloads all PDF and office documents

This tool crawls a list of websites and downloads all PDF and office documents. Then it analyses the PDF documents and tries to detect accessibility issues.

7 Sep 30, 2022

A simple code to fetch comments below an Instagram post and save them to a csv file

fetch_comments A simple code to fetch comments below an Instagram post and save them to a csv file usage First you have to enter your username and pas

2 Jul 14, 2022

A web service for scanning media hosted by a Matrix media repository

Matrix Content Scanner A web service for scanning media hosted by a Matrix media repository Installation TODO Development In a virtual environment wit

5 Dec 01, 2022

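Henan University of Technology Wanmei Xiaoyuan (完美校园) automatic off-campus check-in

HAUT-checkin Henan University of Technology automatic off-campus check-in. Because GitHub Actions has noticeable delays, using Tencent Cloud Functions directly is recommended. Features: multi-user check-in; simple to use, requiring only an account, a password, and the uid used for WeChat push notifications; automatically fetches the previous check-in information and reuses it for the new check-in; pushes the check-in status to each member individually via WeChat; automatically retries when a check-in fails because the Wanmei Xiaoyuan server is busy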

36 Oct 27, 2022

A high-level distributed crawling framework.

Cola: high-level distributed crawling framework Overview Cola is a high-level distributed crawling framework, used to crawl pages and extract structur

1.5k Jan 04, 2023

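Bilibili crawler: centered on the individual

Open Bilibili Crawer Bilibili is a very information-rich social platform, and we build a social network based on it. In this network, the nodes include users (uploaders, "up主") as well as created works such as videos and columns; the relationships include: between users, following relationships (following/follower), reply relationships (comment sections), and repost relationships (reposting videos or feed posts); from users to cre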

3 Oct 21, 2021

A command-line program to download media, like and unlike posts, and more from creators on OnlyFans.

onlyfans-scraper A command-line program to download media, like and unlike posts, and more from creators on OnlyFans. Installation You can install thi

185 Jul 23, 2022

🕷 Phone Crawler with multi-thread functionality

Phone Crawler: Phone Crawler with multi-thread functionality Disclaimer: I'm not responsible for any illegal/misuse actions, this program was made for

3 Feb 10, 2022

Download images from forum threads

Forum Image Scraper Downloads images from forum threads Only works with forums which don't require a login to view and have an incremental paginatio

9 Nov 16, 2022

Rottentomatoes, Goodreads and IMDB sites crawler. Semantic Web final project.

Crawler Rottentomatoes, Goodreads and IMDB sites crawler. Crawler written with beautifulsoup, selenium and lxml to gather books and films information an

1 Dec 30, 2021

Web scraping tool written in python3, using regex, to get CVEs, Source and URLs.

searchcve Web scraping tool written in python3, using regex, to get CVEs, Source and URLs. Generates a CSV file in the current directory. Uses the NI

32 Oct 10, 2022

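Make downloading from GitHub with git 1000 times faster for users in China!

Preface: GitHub has many good projects, but connecting to GitHub from within China is very slow, and every time a plugin or some other tool is needed to work around it. So I made a small tool: enter the original GitHub address and it is automatically replaced with a proxy address, so everyone can download faster. Installation: pip install cit Main features and usage. Main features: change converts the target address to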

35 Aug 29, 2022

This is a script that scrapes the longitude and latitude on food.grab.com

grab This is a script that scrapes the longitude and latitude for any restaurant in Manila on food.grab.com, location can be adjusted. Search Result p

0 Nov 22, 2021

An arxiv spider

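An Arxiv Spider As a CSer, Outstanding Boy knows well that the efficient way for the kernel to manage hardware devices connected to the computer is interrupts rather than polling. Whenever a friend sends over a "hot" paper freshly posted on arXiv, Outstanding Boy sighs: "Do you hang around on arXiv every day? You are so fast~". So Outstanding Boy searched GitHub, borrowed from its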

11 Sep 09, 2022

Web scraper for Zillow

Zillow-Scraper Instructions All terminal commands are highlighted. Make sure you first have python 3 installed. You can check this by running "python

1 Nov 23, 2021