A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the overhead of driving a browser and makes Scrapera fast and robust to DOM changes. Scrapera currently supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

  • The main aim of this package is to gather common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
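
To illustrate why asynchronous requests against API endpoints are faster than driving a browser page by page, here is a minimal sketch of the general pattern. It is not Scrapera's own code: `fetch_json` is a hypothetical stand-in that simulates network latency with `asyncio.sleep`, and the `api.example.com` URLs are placeholders.

```python
import asyncio
import time


async def fetch_json(endpoint: str) -> dict:
    """Hypothetical stand-in for an HTTP GET against a public API endpoint."""
    await asyncio.sleep(0.1)  # simulated network latency, no real request is made
    return {"endpoint": endpoint, "status": "ok"}


async def fetch_all(endpoints: list[str]) -> list[dict]:
    # gather() schedules all coroutines concurrently and keeps the input order
    return await asyncio.gather(*(fetch_json(e) for e in endpoints))


def run_demo() -> tuple[list[dict], float]:
    # Placeholder endpoints, used only to label the simulated responses
    endpoints = [f"https://api.example.com/items/{i}" for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(fetch_all(endpoints))
    return results, time.perf_counter() - start
```

Because the five simulated requests wait concurrently, the total time stays close to that of a single request instead of growing with the number of endpoints.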

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
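
To confirm that the installation succeeded, the installed version can be looked up with the standard library. This is a small sketch, assuming the distribution is registered under the name `scrapera` (the name used in the pip command); `installed_version` is a helper introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string when Scrapera is installed, otherwise None
    print(installed_version("scrapera"))
```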

    Usage

    To use any sub-module, import it, instantiate the scraper, and execute it:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, please refer to the test folders of the respective modules.
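
When several URLs need to be scraped, the call above can be wrapped in a small loop that records failures instead of stopping at the first one. The sketch below is an assumption-laden illustration: `scrape_many` is a hypothetical helper, and it assumes that a scraper instance can be reused across calls and that `scrape` raises an exception on failure, neither of which is confirmed here.

```python
from typing import Dict, Iterable, List, Tuple


def scrape_many(scraper, urls: Iterable[str], quality: str) -> Tuple[List[str], Dict[str, str]]:
    """Call scraper.scrape(url, quality) for each URL and collect the outcomes.

    `scraper` is any object exposing a scrape(url, quality) method.
    Returns the URLs that succeeded and a mapping of failed URLs to error text.
    """
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # assumption: failures surface as exceptions
            failed[url] = repr(exc)
        else:
            succeeded.append(url)
    return succeeded, failed
```

With Scrapera installed, this could be used as `scrape_many(VimeoScraper(), ['https://vimeo.com/191955190'], '540p')`. The failed URLs and their error messages are then a convenient starting point for raising an issue.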

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Logo

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
