A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
A python library created to make life easier for Telegram API Developers.

opentele A python library created to make life easier for Telegram API Developers. Read the documentation Features Convert Telegram Desktop tdata sess

103 Jan 02, 2023
Script que realiza a identificação de todos os logins e senhas dos wifis conectados em uma máquina e envia os dados para um e-mail especificado.

getWIFIConnection Script que realiza a identificação de todos os logins e senhas dos wifis conectados em uma máquina e envia os dados para um e-mail e

Vinícius Azevedo 3 Nov 27, 2022
Powerful Ethereum Smart-Contract Toolkit

Heimdall Heimdall is an advanced and modular smart-contract toolkit which aims to make dealing with smart contracts on EVM based chains easier. Instal

Jonathan Becker 69 Dec 26, 2022
Exports saved posts and comments on Reddit to a csv file.

reddit-saved-to-csv Exports saved posts and comments on Reddit to a csv file. Columns: ID, Name, Subreddit, Type, URL, NoSFW ID: Starts from 1 and inc

70 Jan 02, 2023
A taskbar clock for secondary taskbars on Windows 11

ElevenClock A taskbar clock for secondary taskbars on Windows 11. When microsoft's engineers were creating Windows 11, they forgot to add a clock on t

Martí Climent 1.7k Jan 07, 2023
Intelligent Trading Bot: Automatically generating signals and trading based on machine learning and feature engineering

Intelligent Trading Bot: Automatically generating signals and trading based on machine learning and feature engineering

Alexandr Savinov 326 Jan 03, 2023
Telegram Bot Repo Capable of fetching the following Info via Anilist API inspired from AniFluid and Nepgear

Telegram Bot Repo Capable of fetching the following Info via Anilist API inspired from AniFluid and Nepgear Anime Airing Manga Character Scheduled Top

Rikka-Chan 2 Apr 01, 2022
Auto file forward bot with python

Auto-File-Forward-Bot Auto file forward bot. Without Admin Permission in FROM_CHANNEL Only Give Permission In your Telegram Personal Channel Please fo

Milas 1 Oct 15, 2021
Hack WhatsApp Account Easily(Android)

X-Whatsapp Hack WhatsApp Account Easily(Android) HOW TO RUN 👇 (Termux) pkg update && pkg upgrade pkg install python pkg install git git clone https:/

KiLL3R_xRO 72 Dec 21, 2022
Yet another discord-BOT

Note I have not added comments to the initial code as it is for my educational purpose. Use This is the code for a discord-BOT API py-cord-2.0.0a4178+

IRONMELTS 1 Dec 18, 2021
A modern, easy to use, feature-rich, and async ready API wrapper for Discord written in Python.

disfork A modern, easy to use, feature-rich, and async ready API wrapper for Discord written in Python. Key Features Modern Pythonic API using async a

2 Feb 09, 2022
TuShare is a utility for crawling historical data of China stocks

TuShare Tushare Pro版已发布,请访问新的官网了解和查询数据接口! https://tushare.pro TuShare是实现对股票/期货等金融数据从数据采集、清洗加工 到 数据存储过程的工具,满足金融量化分析师和学习数据分析的人在数据获取方面的需求,它的特点是数据覆盖范围广,接口

挖地兔 11.9k Dec 30, 2022
A bot created with Python that interacts with GroupMe

GroupMe_Bot This is a bot I'm working on a small groupme group I'm in. This is something I'll work on in my spare time. Nothing but just a fun little

0 May 19, 2022
SongLink Discord Bot - Discord bot to share music links easily

SongLink_Discord_Bot Discord bot to share music links easily. Take a link from y

Edgar Lefevre 4 Feb 18, 2022
A very tiny python api for the stock exchange tradegate.de

pytradegate A very tiny python api for the stock exchange tradegate.de The api provides the recent ask/bid data and all other data as found on the det

dunderstr aka seimen 7 Aug 24, 2022
Extrait les informations contenues dans le code QR de la preuve de vaccination générée par le gouvernement du Québec

DecodeurPreuveVaccinationQC Extrait les informations contenues dans le code QR de la preuve de vaccination générée par le gouvernement du Québec Utili

Guillaume Morissette 8 Jul 26, 2022
This tool is created by Shahzain and is one of the best self bots out there!

Shahzain SelfBot This tool is created by Shahzain and is one of the best self bots out there! Features Token Destroyer! Server Nuker(50-100 Bans Per S

Shahzain 6 Apr 02, 2022
This is Pdisk Upload Bot made using Python with Pyrogram Framework. Its capable of uploading direct download link with thumbnail or without thumbnail & with Title Support.

Pdisk-Upload-Bot Introduction This Is PDisk Upload Bot Used To Upload Direct Link To Pdisk With Thumb Support Deploy Heroku Deploy Local Deploy pip in

HEIMAN PICTURES 32 Oct 21, 2022
Manage Proxmox KVM Virtual Machines via Slack bot.

proxmox-slack-bot Create KVM Virtual Machines on Proxmox, the easy way. Not much works works here yet... Setup dev environment Setup fully editable st

Plenus Pyramis 3 Mar 20, 2022
Autofill HZDR Zeitman entries

Zeitman_autofill Filling out Zeitman is boring. This script might make some of the pain go away. Requirements The selenium package and Chrome webdrive

Tim Callow 8 Mar 14, 2022