This is a Python script to scrape the overview and reviews of companies from Glassdoor.

Last update: Jun 23, 2022

Data Scraping for Glassdoor

This is a Python script to scrape the overview and reviews of companies from Glassdoor. Please use it carefully and follow Glassdoor's Terms of Service, which explicitly prohibit web scraping.

Built With

Python
ChromeDriver

Getting Started

Download the SeleniumGlassdor.py file and change the ChromeDriver path to the location of chromedriver on your machine. Then supply your own file containing the list of companies' Glassdoor URLs; the company URL CSV file is also attached here. That file was also generated with Selenium: search Google for 'glassdoor' plus the company name, and extract the URL from the first result. On request, I can also upload the code used to generate it.
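The URL-list workflow described above can be sketched as follows. This is a minimal sketch and not the code in SeleniumGlassdor.py. The helper names, the `url` column name, and the exact form of the Google query URL are all assumptions. The sketch only reads the CSV, builds the search URL and filters result links. Loading the Google results page would still go through the Selenium browser.

```python
import csv
from typing import Iterable, List, Optional
from urllib.parse import quote_plus, urlparse


def parse_company_urls(lines: Iterable[str], column: str = "url") -> List[str]:
    """Return the non-empty values of `column` from CSV text lines (header row first)."""
    urls = []
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


def load_company_urls(csv_path: str, column: str = "url") -> List[str]:
    """Read company URLs from a CSV file; the 'url' column name is a hypothetical default."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return parse_company_urls(f, column)


def google_search_url(company: str) -> str:
    """Build a Google search URL for the query 'glassdoor <company>'."""
    return "https://www.google.com/search?q=" + quote_plus(f"glassdoor {company}")


def first_glassdoor_link(hrefs: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first link whose host name has a 'glassdoor' label, else None."""
    for href in hrefs:
        if not href:  # result anchors without an href come back as None
            continue
        host = (urlparse(href).hostname or "").lower()
        # Matches www.glassdoor.com as well as regional sites such as www.glassdoor.co.uk.
        if "glassdoor" in host.split("."):
            return href
    return None
```

In a scraping loop, the `href` values of the result links on the Google page could be collected and passed to `first_glassdoor_link`. One way to collect them is `[a.get_attribute("href") for a in browser.find_elements(By.XPATH, "//a")]`.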

Prerequisites

Install Selenium before using the script:
```
pip install selenium
```
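Selenium 4.3 removed the `find_element_by_*` helpers, so it can help to check which Selenium version `pip` installed. It also helps to pass the ChromeDriver path the Selenium 4 way. This is a minimal sketch assuming Selenium 4. `installed_selenium_major` and `make_browser` are hypothetical helper names, not functions from SeleniumGlassdor.py.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_selenium_major() -> Optional[int]:
    """Return the installed Selenium major version, or None if Selenium is missing."""
    try:
        return int(version("selenium").split(".")[0])
    except PackageNotFoundError:
        return None


def make_browser(driver_path: str, headless: bool = False):
    """Start Chrome via the ChromeDriver binary at driver_path (Selenium 4 API)."""
    # Imported inside the function so the version check above works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)
```

For example, call `browser = make_browser("/path/to/chromedriver")` with your own driver path. Recent Selenium 4 releases can also find a matching driver through Selenium Manager. In that case, `webdriver.Chrome()` with no `service` argument may be enough.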

For the other sections

If you want to scrape data from other sections, such as Jobs or Salaries, you can use the following calls to extract each section's URL first, and then download that section with a similar method.

```
from selenium.webdriver.common.by import By
# These XPaths assume each section tab is an <a> element with a data-label attribute.
# Selenium 4.3+ removed find_element_by_xpath; find_element(By.XPATH, ...) replaces it.
reviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Reviews']").get_attribute('href')
jobsUrl = browser.find_element(By.XPATH, "//a[@data-label='Jobs']").get_attribute('href')
salariesUrl = browser.find_element(By.XPATH, "//a[@data-label='Salaries']").get_attribute('href')
interviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Interviews']").get_attribute('href')
benefitsUrl = browser.find_element(By.XPATH, "//a[@data-label='Benefits']").get_attribute('href')
photosUrl = browser.find_element(By.XPATH, "//a[@data-label='Photos']").get_attribute('href')
```
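The per-section lookups above can be generalized into a loop. This is a minimal sketch, assuming `browser` is the Selenium driver and is already on a company's overview page. The helper names, the output file naming, and the 3-second default delay are hypothetical choices. Saving the raw `page_source` HTML stands in for whatever section-specific parsing the original script does.

```python
import time
from pathlib import Path
from typing import Dict, Iterable, List

# Same string value as selenium.webdriver.common.by.By.XPATH, so no Selenium import is needed.
XPATH = "xpath"
SECTION_LABELS = ("Reviews", "Jobs", "Salaries", "Interviews", "Benefits", "Photos")


def section_xpath(label: str) -> str:
    """XPath for a section tab link, mirroring the data-label selectors above."""
    return f"//a[@data-label='{label}']"


def collect_section_urls(browser, labels: Iterable[str] = SECTION_LABELS) -> Dict[str, str]:
    """Map each section label to its URL, skipping tabs missing from the current page."""
    urls = {}
    for label in labels:
        # find_elements returns an empty list instead of raising when nothing matches.
        links = browser.find_elements(XPATH, section_xpath(label))
        if links:
            href = links[0].get_attribute("href")
            if href:
                urls[label] = href
    return urls


def save_sections(browser, urls: Dict[str, str], out_dir: str, prefix: str,
                  delay: float = 3.0) -> List[Path]:
    """Visit each section URL and write its HTML to <out_dir>/<prefix>_<label>.html."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for label, url in urls.items():
        browser.get(url)
        path = out / f"{prefix}_{label.lower()}.html"
        path.write_text(browser.page_source, encoding="utf-8")
        saved.append(path)
        time.sleep(delay)  # pause between page loads to keep the request rate low
    return saved
```

The sketch uses `find_elements` (plural), so a tab that is absent on some company pages is skipped. The singular `find_element` would raise `NoSuchElementException` instead.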

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Houping - [email protected]

Owner

Houping

Extract embedded metadata from HTML markup

Newsscraper - A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

Shopee Scraper - A web scraper in Python that extracts the sales, price, available stock, location, and more of a given seller in Brazil

PaperRobot: a paper crawler that can quickly download numerous papers, facilitating paper studying and management

LSpider: a front-end crawler customized for passive scanners

Lovely Scrapper

A repository with scraping code and soccer dataset from understat.com.

A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!

Web scraping tool written in Python 3, using regex, to get CVEs, sources, and URLs.

A package that provides the latest cyber/hacker news from a website using web scraping.

Python script to check if there is any differences in responses of an application when the request comes from a search engine's crawler.

A scraper with Python

Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing

Dumb downloader that scrapes the web

A universal package of scraper scripts for humans

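The latest optimized version of a Taobao Moutai rush-buying tool for Taobao Moutai flash sales, with an optimized thread queue for Moutai purchases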

A web scraper built with Beautiful Soup that fetches Udemy course information and converts it to a JSON, CSV, or XML file

Automatically scrapes all menu items from the Taco Bell website

Automated Linkedin bot that will improve your visibility and increase your network.

Crawler for the Fundamentus.com site using the Scrapy framework, covering both the detailed tab and the summary tab.