A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the overhead of driving a browser and makes it fast and robust to DOM changes. Scrapera currently supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to collect common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
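To illustrate what scraping API endpoints asynchronously, without a browser, can look like, here is a minimal standard-library sketch. It is not Scrapera's actual implementation: `fetch_json`, `fetch_all` and the `max_concurrency` parameter are hypothetical names chosen for this example, and `asyncio.to_thread` assumes Python 3.9 or newer.

```python
import asyncio
import json
import urllib.request


def fetch_json(url, timeout=10.0):
    """Blocking helper (hypothetical): GET a URL and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


async def fetch_all(urls, fetch=fetch_json, max_concurrency=5):
    """Fetch every URL concurrently, at most max_concurrency at a time.

    Results are returned in the same order as urls. A failed request is
    returned as its exception object instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            # Run the blocking call in a worker thread so the event loop stays free.
            return await asyncio.to_thread(fetch, url)

    return await asyncio.gather(
        *(fetch_one(url) for url in urls), return_exceptions=True
    )
```

Calling `asyncio.run(fetch_all(list_of_endpoint_urls))` would return one parsed response (or exception) per URL. The `fetch` argument is injectable so the same function can be exercised with a stub instead of real network calls.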

    DISCLAIMER: The owner and contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Getting Started

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
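After either install command, one way to confirm which version ended up in the active environment is to query the package metadata. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and it assumes the distribution is registered under the name `scrapera`, as the pip command above suggests.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("scrapera")
    print(version if version else "scrapera is not installed in this environment")
```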

    Usage

    To use any sub-module, import its scraper class, instantiate it and call it:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, refer to the test folders inside the respective modules.
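The same import, instantiate and execute pattern can be wrapped in a loop when several URLs need to be scraped. The sketch below is an illustration rather than part of Scrapera: `scrape_many` is a hypothetical helper, it assumes only the `scrape(url, quality)` call form shown above, and it assumes that a failed scrape raises an exception.

```python
def scrape_many(scraper, urls, quality="540p"):
    """Call scraper.scrape(url, quality) for each URL and collect failures.

    Returns a list of (url, exception) pairs for the URLs that failed.
    """
    failures = []
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # keep going so one bad URL does not stop the batch
            failures.append((url, exc))
    return failures


# Example usage (requires scrapera to be installed):
# from scrapera.video.vimeo import VimeoScraper
# failed = scrape_many(VimeoScraper(), ['https://vimeo.com/191955190'])
# for url, error in failed:
#     print(f'{url} failed: {error}')
```

Any failures collected this way are also useful material for an issue report.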

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
