Facebook Group Scraping Using Beautiful Soup & Selenium

Last update: Aug 12, 2022

Overview

Notes

The scraper should only be used for educational purposes
Kindly refrain from scraping sensitive or private information
It is highly recommended to scrape public (and not private) groups
Ask for consent from the group administrator and/or group members before running any code
I am not responsible for any misuse of the code in any shape or form

Facebook Group Scraping Using Beautiful Soup & Selenium

Extracts Facebook group posts related to a specific topic, together with their comments, and writes them to a .json file. The project was created to gather the data needed to build a chatbot for a university's website.

Input

User's Credentials
Facebook Group URL
Number of Scrolls (controls how many posts are collected)
Directory containing ChromeDriver
Optional: Specific topic to be searched
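
The inputs listed above could be gathered in one place before the scraper starts. The sketch below is one possible way to do that, assuming a hypothetical `ScraperInputs` dataclass; the class, its field names, and the checks in `validate()` are illustrative and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ScraperInputs:
    """Hypothetical container for the inputs listed above."""

    email: str
    password: str
    group_url: str
    n_scrolls: int
    chromedriver_dir: str
    topic: Optional[str] = None  # optional: None means no topic search

    def validate(self) -> None:
        """Raise ValueError if an input is obviously unusable."""
        host = urlparse(self.group_url).netloc.lower()
        if host != "facebook.com" and not host.endswith(".facebook.com"):
            raise ValueError("group_url should point to facebook.com")
        if self.n_scrolls < 1:
            raise ValueError("n_scrolls must be at least 1")
```

Keeping credentials in an object like this, rather than hard-coding them in the script, also makes it easier to load them from a separate inputs file later.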

What the Scraper Does

Logs into Facebook using the User's Credentials
Enters the group specified by the User
Searches for the topic
Extracts all posts & their comments
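
The Number of Scrolls input suggests that posts are loaded by scrolling the group page repeatedly. A minimal sketch of such a loop follows, assuming a hypothetical helper `scroll_and_get_html` and a `driver` object that behaves like a Selenium WebDriver (it only relies on `execute_script()` and `page_source`). The pause length is an assumption, and the project's actual code may work differently.

```python
import time


def scroll_and_get_html(driver, n_scrolls: int, pause: float = 2.0) -> str:
    """Scroll to the bottom of the page n_scrolls times and return the HTML.

    `driver` is expected to behave like a Selenium WebDriver, i.e. to
    provide `execute_script()` and the `page_source` attribute.
    """
    for _ in range(n_scrolls):
        # Jump to the current bottom of the page so more posts get loaded.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render the next batch of posts.
        time.sleep(pause)
    return driver.page_source
```

The returned HTML can then be handed to Beautiful Soup, for example with `BeautifulSoup(html, "html.parser")`, to pull out post and comment text. The selectors needed for that step depend on Facebook's current markup and are not shown here.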

Scraper Output

.json file that includes:

Each post
The comments replying to it

Format of file:

{
    "tag": "Topic 1",
    "patterns": [ "Post text" ],
    "responses": [
        "Comment 1",
        "Comment 2",
        "Comment 3"
    ]
}
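
A record in this format can be produced with Python's standard `json` module. The sketch below uses two hypothetical helpers, `build_record` and `write_records`, and assumes that the output file holds a list of such objects, one per post. The source only shows a single object, so the list wrapper is an assumption.

```python
import json
from typing import Dict, List


def build_record(tag: str, post_text: str, comments: List[str]) -> Dict[str, object]:
    """Build one entry in the tag / patterns / responses format shown above."""
    return {"tag": tag, "patterns": [post_text], "responses": list(comments)}


def write_records(records: List[Dict[str, object]], path: str) -> None:
    """Write all records to a .json file as a list (assumed layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-English post text readable in the file.
        json.dump(records, fh, ensure_ascii=False, indent=4)
```

For example, `write_records([build_record("Topic 1", "Post text", ["Comment 1"])], "posts.json")` would write one post and its single comment, where `posts.json` is a placeholder file name.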

Setup Requirements

Make sure Chrome is installed
Install ChromeDriver and place it in the same directory as the script
Enter inputs required by the code
Run the code
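
Before running the code, it may help to confirm that ChromeDriver really is where the script expects it. The sketch below is one way to check, using a hypothetical helper `find_chromedriver`; the executable names `chromedriver` and `chromedriver.exe` are the usual ones, but treat them as an assumption for your own setup.

```python
import sys
from pathlib import Path
from typing import Optional


def find_chromedriver(script_dir: Path) -> Optional[Path]:
    """Return the ChromeDriver path inside script_dir, or None if it is missing."""
    # On Windows the executable normally carries an .exe suffix.
    name = "chromedriver.exe" if sys.platform.startswith("win") else "chromedriver"
    candidate = script_dir / name
    return candidate if candidate.is_file() else None
```

Calling `find_chromedriver(Path(__file__).resolve().parent)` at the top of the script and stopping with a clear message when it returns `None` avoids a less obvious Selenium error later on.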

Updates

Scrape comments found in "view more comments"
Add a file for inputs only
Add comments to the code
Add an option to scrape the general group discussions and not specific topics

Owner

Fatima Ghadieh

