A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the domains most commonly used in machine learning and data science. It scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Scrapera currently supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to group common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
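The asynchronous, browser-free approach described above can be illustrated with a minimal sketch. This is not Scrapera's actual implementation: `fetch_json`, `fetch_all`, `run_demo`, the `max_concurrency` parameter, and the example URLs are all hypothetical, and the network call is replaced by a stub so the snippet runs offline. It only shows the general idea of requesting several endpoints concurrently instead of driving a browser.

```python
import asyncio


async def fetch_json(url):
    """Hypothetical stand-in for an HTTP GET against a public JSON endpoint.

    A real scraper would issue the request with an async HTTP client here;
    this stub only simulates network latency and returns fixed data.
    """
    await asyncio.sleep(0.01)  # pretend to wait on the network
    return {"url": url, "items": [1, 2, 3]}


async def fetch_all(urls, fetch=fetch_json, max_concurrency=5):
    """Fetch every URL concurrently, capping the number of in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # At most max_concurrency coroutines pass this point at once
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded(u) for u in urls))


def run_demo():
    """Run the sketch against three made-up endpoint URLs."""
    urls = [f"https://example.com/api/items?page={n}" for n in range(1, 4)]
    return asyncio.run(fetch_all(urls))
```

Because the requests overlap while each one waits on the network, total time is roughly that of the slowest request rather than the sum of all of them, and no browser process has to be started.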

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Getting Started

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

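    To use any sub-module, import its scraper class, instantiate it, and call scrape:

    # Import the scraper class from the sub-module you need
    from scrapera.video.vimeo import VimeoScraper
    # Instantiate the scraper
    scraper = VimeoScraper()
    # Run it on a video URL with the requested quality
    scraper.scrape('https://vimeo.com/191955190', '540p')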

    For more examples, please refer to the test folders in the respective modules.
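When several URLs need to be scraped in one run, the import, instantiate, and execute pattern above can be wrapped in a small helper. The sketch below is only an illustration: `scrape_many` and `FakeScraper` are hypothetical names that are not part of Scrapera, and it assumes that a failed scrape surfaces as a Python exception, which this README does not confirm. The only thing taken from the usage example is the `scrape(url, quality)` call shape.

```python
def scrape_many(scraper, urls, quality="540p"):
    """Call scraper.scrape(url, quality) for each URL and record failures.

    Returns a list of URLs that completed and a dict mapping each failed
    URL to its error message.
    """
    succeeded, failed = [], {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
            succeeded.append(url)
        except Exception as exc:  # assumption: failures raise exceptions
            failed[url] = str(exc)
    return succeeded, failed


class FakeScraper:
    """Hypothetical stand-in so the sketch runs without network access."""

    def scrape(self, url, quality):
        # Simulate a failure for any URL containing "bad"
        if "bad" in url:
            raise ValueError(f"cannot scrape {url}")
```

With the package installed, a `VimeoScraper()` instance could be passed in place of `FakeScraper()`. The failed URLs are a natural starting point for the issue reports requested in the Contributing section.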

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails at any point. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors


    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
