Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
GitHub action for AppSweep Mobile Application Security Testing

GitHub action for AppSweep can be used to continuously integrate app scanning using AppSweep into your Android app build process

Guardsquare 14 Oct 06, 2022
A utility for mocking out the Python Requests library.

Responses A utility library for mocking out the requests Python library. Note Responses requires Python 2.7 or newer, and requests = 2.0 Installing p

Sentry 3.8k Jan 03, 2023
a wrapper around pytest for executing tests to look for test flakiness and runtime regression

bubblewrap a wrapper around pytest for assessing flakiness and runtime regressions a cs implementations practice project How to Run: First, install de

Anna Nagy 1 Aug 05, 2021
Automação de Processos (obtenção de informações com o Selenium), atualização de Planilha e Envio de E-mail.

Automação de Processo: Código para acompanhar o valor de algumas ações na B3. O código entra no Google Drive, puxa os valores das ações (pré estabelec

Hemili Beatriz 1 Jan 08, 2022
AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

6 Jan 28, 2022
Python Webscraping using Selenium

Web Scraping with Python and Selenium The code shows how to do web scraping using Python and Selenium. We use as data the https://sbot.org.br/localize

Luís Miguel Massih Pereira 1 Dec 01, 2021
Headless chrome/chromium automation library (unofficial port of puppeteer)

Pyppeteer Pyppeteer has moved to pyppeteer/pyppeteer Unofficial Python port of puppeteer JavaScript (headless) chrome/chromium browser automation libr

miyakogi 3.5k Dec 30, 2022
A pytest plugin that enables you to test your code that relies on a running Elasticsearch search engine

pytest-elasticsearch What is this? This is a pytest plugin that enables you to test your code that relies on a running Elasticsearch search engine. It

Clearcode 65 Nov 10, 2022
The successor to nose, based on unittest2

Welcome to nose2 nose2 is the successor to nose. It's unittest with plugins. nose2 is a new project and does not support all of the features of nose.

736 Dec 16, 2022
Fi - A simple Python 3.9+ command-line application for managing Fidelity portfolios

fi fi is a simple Python 3.9+ command-line application for managing Fidelity por

Darik Harter 2 Feb 26, 2022
Testing Calculations in Python, using OOP (Object-Oriented Programming)

Testing Calculations in Python, using OOP (Object-Oriented Programming) Create environment with venv python3 -m venv venv Activate environment . venv

William Koller 1 Nov 11, 2021
Auto Click by pyautogui and excel operations.

Auto Click by pyautogui and excel operations.

Janney 2 Dec 21, 2021
Photostudio是一款能进行自动化检测网页存活并实时给网页拍照的工具,通过调用Fofa/Zoomeye/360qua/shodan等 Api快速准确查询资产并进行网页截图,从而实施进一步的信息筛查。

Photostudio-红队快速爬取网页快照工具 一、简介: 正如其名:这是一款能进行自动化检测,实时给网页拍照的工具 信息收集要求所收集到的信息要真实可靠。 当然,这个原则是信息收集工作的最基本的要求。为达到这样的要求,信息收集者就必须对收集到的信息反复核实,不断检验,力求把误差减少到最低限度。我

s7ck Team 41 Dec 11, 2022
Hypothesis is a powerful, flexible, and easy to use library for property-based testing.

Hypothesis Hypothesis is a family of testing libraries which let you write tests parametrized by a source of examples. A Hypothesis implementation the

Hypothesis 6.4k Jan 05, 2023
Youtube Tool using selenium Python

YT-AutoLikeComment-AutoReportComment-AutoComment Youtube Tool using selenium Python Auto Comment Auto Like Comment Auto Report Comment Usage: 1. Insta

Rahul Joshua Damanik 1 Dec 13, 2021
Flexible test automation for Python

Nox - Flexible test automation for Python nox is a command-line tool that automates testing in multiple Python environments, similar to tox. Unlike to

Stargirl Flowers 941 Jan 03, 2023
Tools for test driven data-wrangling and data validation.

datatest: Test driven data-wrangling and data validation Datatest helps to speed up and formalize data-wrangling and data validation tasks. It impleme

269 Dec 16, 2022
show python coverage information directly in emacs

show python coverage information directly in emacs

wouter bolsterlee 30 Oct 26, 2022
Silky smooth profiling for Django

Silk Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before prese

Jazzband 3.7k Jan 04, 2023
Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages.

Mimesis - Fake Data Generator Description Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes

Isaak Uchakaev 3.8k Dec 29, 2022