Facebook Group Scraping Using Beautiful Soup & Selenium

Overview

Notes

  • The scraper should only be used for educational purposes
  • Kindly refrain from scraping sensitive or private information
  • It is highly recommended to scrape public (and not private) groups
  • Ask for consent from the group adminstrator and/or group members before running any code
  • I am not responsible for any misuse of the code in any shape or form

Facebook Group Scraping Using Beautiful Soup & Selenium

Extract Facebook group posts that are related to a specific topic and write them to a .json file. This project was created in order to gather data needed to build a chatbot for a university's website.

Input

  • User's Credentials
  • Facebook Group URL
  • Number of Scrolls
    • Number of posts you want to collect
  • Directory of the Chromedriver
  • Optional: Specific topic to be searched

What the Scraper Does

  • Logs into Facebook using the User's Credentials
  • Enters the group specified by the User
  • Searches for the topic
  • Extracts all posts & their comments

Scraper Output

.json file that includes:

  • Each post
  • The comments replying to it

Format of file:

{ 
   "tag": "Topic 1",
   "patterns":  [ "Post text" ],
   "responses": [ "Comment 1", 
        "Comment 2",
        "Comment 3"  
    ]
}

Setup Requirements

  1. Make sure chrome is installed
  2. Install Chromedriver and place it in the same directory as the file
  3. Enter inputs required by the code
  4. Run the code

Updates

  • Scrape comments found in "view more comments"
  • Add a file for inputs only
  • Add comments to the code
  • Add an option to scrape the general group discussions and not specific topics
Owner
Fatima Ghadieh
Third-year ECE student at AUB looking for undergraduate Machine Learning internships
Fatima Ghadieh
for those who dont want to pay $10/month for high school game footage with ads

nfhs-scraper Disclaimer: I am in no way responsible for what you choose to do with this script and guide. I do not endorse avoiding paywalls or any il

Conrad Crawford 5 Apr 12, 2022
Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

Pythonic Crawling / Scraping Framework Built on Eventlet Features High Speed WebCrawler built on Eventlet. Supports relational databases engines like

Juan Manuel Garcia 173 Dec 05, 2022
对于有验证码的站点爆破,用于安全合法测试

使用方法 python3 main.py + 配置好的文件 python3 main.py Verify.json python3 main.py NoVerify.json 以上分别对应有验证码的demo和无验证码的demo Tips: 你可以以域名作为配置文件名字加载:python3 main

47 Nov 09, 2022
Scrape and display grades onto the console

WebScrapeGrades About The Project This Project is a personal project where I learned how to webscrape using python requests. Being able to get request

Cyrus Baybay 1 Oct 23, 2021
Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation This repository provides two web crawlers to label domain nam

1 Nov 05, 2021
Async Python 3.6+ web scraping micro-framework based on asyncio

Ruia 🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio. ⚡ Write less, run faster. Overview Ruia is an async web scraping micro-frame

howie.hu 1.6k Jan 01, 2023
A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

Alex Papadopoulos 1 Nov 13, 2021
A package that provides you Latest Cyber/Hacker News from website using Web-Scraping.

cybernews A package that provides you Latest Cyber/Hacker News from website using Web-Scraping. Latest Cyber/Hacker News Using Webscraping Developed b

Hitesh Rana 4 Jun 02, 2022
基于Github Action的定时HITsz疫情上报脚本,开箱即用

HITsz Daily Report 基于 GitHub Actions 的「HITsz 疫情系统」访问入口 定时自动上报脚本,开箱即用。 感谢 @JellyBeanXiewh 提供原始脚本和 idea。 感谢 @bugstop 对脚本进行重构并新增 Easy Connect 校内代理访问。

Ter 56 Nov 27, 2022
Crawl BookCorpus

These are scripts to reproduce BookCorpus by yourself.

Sosuke Kobayashi 590 Jan 03, 2023
淘宝茅台抢购最新优化版本,淘宝茅台秒杀,优化了茅台抢购线程队列

淘宝茅台抢购最新优化版本,淘宝茅台秒杀,优化了茅台抢购线程队列

MaoTai 118 Dec 16, 2022
A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working Links.

WaGpScraper A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working

Muhammed Rizad 27 Dec 18, 2022
Binance Smart Chain Contract Scraper + Contract Evaluator

Pulls Binance Smart Chain feed of newly-verified contracts every 30 seconds, then checks their contract code for links to socials.Returns only those with socials information included, and then submit

14 Dec 09, 2022
A universal package of scraper scripts for humans

Scrapera is a completely Chromedriver free package that provides access to a variety of scraper scripts for most commonly used machine learning and data science domains.

299 Dec 15, 2022
A modern CSS selector implementation for BeautifulSoup

Soup Sieve Overview Soup Sieve is a CSS selector library designed to be used with Beautiful Soup 4. It aims to provide selecting, matching, and filter

Isaac Muse 151 Dec 23, 2022
Simple python tool for the purpose of swapping latinic letters with cirilic ones and vice versa in txt, docx and pdf files in Serbian language

Alpha Swap English This is a simple python tool for the purpose of swapping latinic letters with cirylic ones and vice versa, in txt, docx and pdf fil

Aleksandar Damnjanovic 3 May 31, 2022
Scraping Top Repositories for Topics on GitHub,

0.-Webscrapping-using-python Scraping Top Repositories for Topics on GitHub, Web scraping is the process of extracting and parsing data from websites

Dev Aravind D Satprem 2 Mar 18, 2022
A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!

🕳️ CygnusX1 Code by Trong-Dat Ngo. Overviews 🕳️ CygnusX1 is a multithreaded tool 🛠️ , used to search and download images from popular search engine

DatNgo 32 Dec 31, 2022
simple http & https proxy scraper and checker

simple http & https proxy scraper and checker

Neospace 11 Nov 15, 2021
👁️ Tool for Data Extraction and Web Requests.

httpmapper 👁️ Project • Technologies • Installation • How it works • License Project 🚧 For educational purposes. This is a project that I developed,

15 Dec 05, 2021