A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
An Anime Theme Telegram group management bot. With lot of features.

Emilia Project Emilia-Prjkt is a modular bot running on python3 with anime theme and have a lot features. Easiest Way To Deploy On Heroku This Bot is

ZenitsuID #M•R•T™ 3 Feb 03, 2022
The Social-Engineer Toolkit (SET) is specifically designed to perform advanced attacks against the human element.

The Social-Engineer Toolkit (SET) The Social-Engineer Toolkit (SET) is specifically designed to perform advanced attacks against the human element. SE

Professor 6 Nov 28, 2022
A discord bot thet lets you play Space invaders.

space_Invaders A discord bot thet lets you play Space invaders. It is my first discord bot... so please give any suggestions to improve it :] Commands

2 Dec 30, 2021
Tools untuk cek nomor rekening, terhadap penipuan yang sudah terjadi!

No Rekening Checker Selalu waspada terhadap penipuan! Sebelum anda transfer sejumlah uang alangkah baiknya untuk cek terlebih dahulu, apakah norek itu

Hanif Ahmad Syauqi 8 Dec 25, 2022
Torrent-Igruha SDK Python

Простой пример использования библиотеки: Устанавливаем библиотеку python -m

LORD_CODE 2 Jun 25, 2022
NFT which pays royalties to its creator each time it is sold.

Chialisp NFT with Perpetual Creator Royalties This is a chialisp NFT in which the creator/minter defines a puzzle hash which will capture a fixed perc

Geoff Walmsley 20 Jun 28, 2022
This is a tutorial on how to make a Discord Bot using the discord.py library

HowToMakeADiscordBot This Github repository is here to help you code a Discord Bot using the discord.py library! 1 - Setup: Download the code inside t

Baz 1 Oct 31, 2021
This is Source Code of PdiskUploaderBot

PdiskUploaderBot This is the source code of PdiskUploaderBot. And the developer of this bot is AJTimePyro, His Telegram Channel & Group. You can use t

Abhijeet 8 Oct 20, 2022
Discord bot that shows valorant your daily store by using the Ingame API

Valorant store checker - Discord Bot Discord bot that shows valorant your daily store by using the Ingame API. written using Python and the Pycord lib

STACIA 226 Jan 02, 2023
Automatically Forward files from groups to channel & FSub

Backup & ForceSub Automatically Forward files from groups to channel & Do force sub on members Variables API_ID : Get from my.telegram.org API_HASH :

Arunkumar Shibu 7 Nov 06, 2022
GET-ACQ is a python tool used to gather all companies acquired by a given company domain name.

get-acq 🏢 GET-ACQ is a python tool used to gather all companies acquired by a given company domain name. It is done by calling SecurityTrails API. Us

Milan 7 Dec 19, 2022
alpaca-trade-api-python is a python library for the Alpaca Commission Free Trading API.

alpaca-trade-api-python is a python library for the Alpaca Commission Free Trading API. It allows rapid trading algo development easily, with support for both REST and streaming data interfaces

Alpaca 1.5k Jan 09, 2023
A powerful, cool and well-made userbot for your Telegram profile with promising extension capabilities.

Telecharm userbot A powerful, fast and simple Telegram userbot written in Python 3 and based on Pyrogram 1.X. Currently in active WIP state, so feel f

Daniil Kovalenko 16 Dec 01, 2022
Python wrapper for Wikipedia

Wikipedia API Wikipedia-API is easy to use Python wrapper for Wikipedias' API. It supports extracting texts, sections, links, categories, translations

Martin Majlis 369 Dec 30, 2022
Telegram Bot to save Posts or Files that can be Accessed via Special Links

OKAERI-FILE Bot Telegram untuk menyimpan Posting atau File yang dapat Diakses melalui Link Khusus. Jika Anda memerlukan tambahan module lagi dalam rep

Wahyusaputra 5 Aug 04, 2022
Revolt.py - An async library to interact with the https://revolt.chat api.

Revolt.py An async library to interact with the https://revolt.chat api. This library will be focused on making bots and i will not implement anything

Zomatree 0 Oct 08, 2022
M3U Playlist for free TV channels

Free TV This is an M3U playlist for free TV channels around the World. Either free locally (over the air): Or free on the Internet: Plex TV Pluto TV P

Free TV 964 Jan 08, 2023
Python script for download course from platzi.com

Platzi Downloader Tool Esta es una pequeña herramienta que hace mucho y que te ahorra una gran cantidad de trabajo a la hora de descargar cursos de Pl

Devil64-Dev 21 Sep 22, 2022
A Simple Telegram Bot To Download And Upload Files

AquaDLBot ➠ I Can Download And Upload files To Telegram DEMO Copyright (C) 2020-2026 by [ema

Asia Argento 8 Feb 15, 2022
Search all history of Chrome in terminal

Chrotry Search all history of Chrome in terminal. Demo Usages Move the Chrome history file to current directory by running move_history.sh Rename hist

Xiaoxu HU 2 Jun 13, 2022