Scraping Connections' Info on LinkedIn

Last update: Feb 11, 2022

Overview

Scrape It!

Disclaimer:

THIS CODE WAS WRITTEN IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE INTERVIEW PROCESS AT MCI.IR, AND INTERVIEWEES WERE SUPPOSED TO PUSH THEIR CODE TO GITHUB. CONTACT ME TO REMOVE THIS REPOSITORY IF IT VIOLATES YOUR TOS.
IF ANY CONNECTION DOES NOT WANT THEIR CONTACT INFO TO APPEAR HERE, CONTACT ME AND I WILL REMOVE IT ASAP.

Functionalities:

This script automatically:

opens your LinkedIn profile
accesses your connections page
crawls the page to collect your connections' profile links
scrapes each person's information and dumps it into a SQLite database
logs each step at the appropriate level to Linkedin.log as it runs
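The storage and logging steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the table name `connections`, its columns, the default file name `connections.db`, and the helper names `make_logger`, `init_db`, and `save_connection` are all assumptions made for the example. Only the log file name Linkedin.log comes from this README.

```python
import logging
import sqlite3


def make_logger(log_path="Linkedin.log"):
    """Return a logger that writes INFO and above to log_path."""
    logger = logging.getLogger("linkedin_scraper")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def init_db(db_path="connections.db"):
    """Open the SQLite database and create the (assumed) table if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS connections ("
        "profile_url TEXT PRIMARY KEY, name TEXT, headline TEXT)"
    )
    conn.commit()
    return conn


def save_connection(conn, record, logger=None):
    """Insert or update one scraped profile, keyed by its profile URL."""
    conn.execute(
        "INSERT OR REPLACE INTO connections (profile_url, name, headline) "
        "VALUES (?, ?, ?)",
        (record["profile_url"], record.get("name"), record.get("headline")),
    )
    conn.commit()
    if logger is not None:
        logger.info("Saved %s", record["profile_url"])
```

Keying the table on the profile URL means that re-running the scraper updates existing rows instead of duplicating them.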

DataFlowDiagram

Design patterns used include (but are not limited to):

Creator
Low Coupling
High Cohesion
Indirection
Modularization
Information Expert

Log/DB files:

Further development notes:

Look into other databases that support multithreaded writes, which would let us dump all information rows at once.
Change the IP address on each request (you can find the code for this in my "Social Media Computing course" repository).
Sometimes you need to scroll down manually while the connections page is loading. One extra line of code can do the scrolling for you.
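Two of these notes can be sketched in a few lines. The first helper scrolls the page until its height stops growing, using the `execute_script` method that Selenium WebDriver objects provide. The second writes all rows in a single SQLite transaction with `executemany`. That is a simpler stand-in for the multithreaded database mentioned above, not a replacement for it. The function names, the `connections` table, and its columns are assumptions made for illustration.

```python
import sqlite3
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops changing.

    `driver` is expected to be a Selenium WebDriver (or any object with a
    compatible execute_script method). Returns the number of scroll rounds.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded connections time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds


def dump_all(conn, rows):
    """Write every (profile_url, name) row in one transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS connections ("
            "profile_url TEXT PRIMARY KEY, name TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO connections (profile_url, name) "
            "VALUES (?, ?)",
            rows,
        )
```

Calling `scroll_to_bottom(driver)` right after the connections page loads would replace the manual scrolling. The `pause` value is a guess and may need tuning for slow connections.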


Owner

MohammadReza Ardestani
