This is python to scrape overview and reviews of companies from Glassdoor.

Overview

Data Scraping for Glassdoor

This is python to scrape overview and reviews of companies from Glassdoor. Please use it carefully and follow the Terms of Service that explicitly prohibits web scraping.

Built With

  • Python
  • ChromeDriver

(back to top)

Getting Started

Download the SeleniumGlassdor.py file. Change the path of the chromedriver on your machine. Use your own file that contain the lists of the companies glassdoor url. The company url csv file is also attached here. The way to generate the file is also based on selenium, searching the 'glassdoor' + company name in google search engine, and extract the url from the first results. Per requests, I can also upload the file accordingly.

Prerequisites

Install the selenium before using it.

  • selenium
    pip install selenium

For the other sections

If you want to scape data from the other sections, such as jobs, salaries. You can use the following methods to first extract the url and then use the similar method to downlode the sections.

  • reviewsUrl = browser.find_element_by_xpath("//a[@data-label='Reviews']").get_attribute('href')
  • jobsUrl = browser.find_element_by_xpath("//a[@data-label='Jobs']").get_attribute('href')
  • salariesUrl = browser.find_element_by_xpath("//a[@data-label='Salaries']").get_attribute('href')
  • interviewsUrl = browser.find_element_by_xpath("//a[@data-label='Inter­views']").get_attribute('href')
  • benefitsUrl = browser.find_element_by_xpath("//a[@data-label='Benefits']").get_attribute('href')
  • photosUrl = browser.find_element_by_xpath("//a[@data-label='Photos']").get_attribute('href')

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Houping - [email protected]

(back to top)

(back to top)

Owner
Houping
Data Scientist, Business Analyst
Houping
Extract embedded metadata from HTML markup

extruct extruct is a library for extracting embedded metadata from HTML markup. Currently, extruct supports: W3C's HTML Microdata embedded JSON-LD Mic

Scrapinghub 725 Jan 03, 2023
Newsscraper - A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

NewsScraper A simple Python 3 module to get crypto or news articles and their content from various RSS feeds. 🔧 Installation Clone the repo locally.

Rokas 3 Jan 02, 2022
Shopee Scraper - A web scraper in python that extract sales, price, avaliable stock, location and more of a given seller in Brazil

Shopee Scraper A web scraper in python that extract sales, price, avaliable stock, location and more of a given seller in Brazil. The project was crea

Paulo DaRosa 5 Nov 29, 2022
PaperRobot: a paper crawler that can quickly download numerous papers, facilitating paper studying and management

PaperRobot PaperRobot 是一个论文抓取工具,可以快速批量下载大量论文,方便后期进行持续的论文管理与学习。 PaperRobot通过多个接口抓取论文,目前抓取成功率维持在90%以上。通过配置Config文件,可以抓取任意计算机领域相关会议的论文。 Installation Down

moxiaoxi 47 Nov 23, 2022
LSpider 一个为被动扫描器定制的前端爬虫

LSpider LSpider - 一个为被动扫描器定制的前端爬虫 什么是LSpider? 一款为被动扫描器而生的前端爬虫~ 由Chrome Headless、LSpider主控、Mysql数据库、RabbitMQ、被动扫描器5部分组合而成。

Knownsec, Inc. 321 Dec 12, 2022
Lovely Scrapper

Lovely Scrapper

Tushar Gadhe 2 Jan 01, 2022
A repository with scraping code and soccer dataset from understat.com.

UNDERSTAT - SHOTS DATASET As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goa

douglasbc 48 Jan 03, 2023
A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!

🕳️ CygnusX1 Code by Trong-Dat Ngo. Overviews 🕳️ CygnusX1 is a multithreaded tool 🛠️ , used to search and download images from popular search engine

DatNgo 32 Dec 31, 2022
Web scrapping tool written in python3, using regex, to get CVEs, Source and URLs.

searchcve Web scrapping tool written in python3, using regex, to get CVEs, Source and URLs. Generates a CSV file in the current directory. Uses the NI

32 Oct 10, 2022
A package that provides you Latest Cyber/Hacker News from website using Web-Scraping.

cybernews A package that provides you Latest Cyber/Hacker News from website using Web-Scraping. Latest Cyber/Hacker News Using Webscraping Developed b

Hitesh Rana 4 Jun 02, 2022
Python script to check if there is any differences in responses of an application when the request comes from a search engine's crawler.

crawlersuseragents This Python script can be used to check if there is any differences in responses of an application when the request comes from a se

Podalirius 13 Dec 27, 2022
A Scrapper with python

Scrapper-en-python Scrapper des données signifie récuperer des données pour les traiter ou les analyser. En python, il y'a 2 grands moyens de scrapper

Lun4rIum 1 Dec 05, 2021
Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing

Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing. It can be ma

10 Jul 06, 2022
:arrow_double_down: Dumb downloader that scrapes the web

You-Get NOTICE: Read this if you are looking for the conventional "Issues" tab. You-Get is a tiny command-line utility to download media contents (vid

Mort Yao 46.4k Jan 03, 2023
A universal package of scraper scripts for humans

Scrapera is a completely Chromedriver free package that provides access to a variety of scraper scripts for most commonly used machine learning and data science domains.

299 Dec 15, 2022
淘宝茅台抢购最新优化版本,淘宝茅台秒杀,优化了茅台抢购线程队列

淘宝茅台抢购最新优化版本,淘宝茅台秒杀,优化了茅台抢购线程队列

MaoTai 118 Dec 16, 2022
A Web Scraper built with beautiful soup, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file

Udemy Scraper A Web Scraper built with beautiful soup, that fetches udemy course information. Installation Virtual Environment Firstly, it is recommen

Aditya Gupta 15 May 17, 2022
Automatically scrapes all menu items from the Taco Bell website

Automatically scrapes all menu items from the Taco Bell website. Returns as PANDAS dataframe.

Sasha 2 Jan 15, 2022
Automated Linkedin bot that will improve your visibility and increase your network.

LinkedinSpider LinkedinSpider is a small project using browser automating to increase your visibility and network of connections on Linkedin. DISCLAIM

Frederik 2 Nov 26, 2021
Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo.

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo. (Todas as infomações)

Guilherme Silva Uchoa 3 Oct 04, 2022