Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
Python version of the Playwright testing and automation library.

🎭 Playwright for Python Docs | API Playwright is a Python library to automate Chromium, Firefox and WebKit browsers with a single API. Playwright del

Microsoft 7.8k Jan 02, 2023
Codeforces Test Parser for C/C++ & Python on Windows

Codeforces Test Parser for C/C++ & Python on Windows Installation Run pip instal

Minh Vu 2 Jan 05, 2022
This repository contains a testing script for nmigen-boards that tries to build blinky for all the platforms provided by nmigen-boards.

Introduction This repository contains a testing script for nmigen-boards that tries to build blinky for all the platforms provided by nmigen-boards.

S.J.R. van Schaik 4 Jul 23, 2022
Faker is a Python package that generates fake data for you.

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in yo

Daniele Faraglia 15.2k Jan 01, 2023
To automate the generation and validation tests of COSE/CBOR Codes and it's base45/2D Code representations

To automate the generation and validation tests of COSE/CBOR Codes and it's base45/2D Code representations, a lot of data has to be collected to ensure the variance of the tests. This respository was

160 Jul 25, 2022
Command line driven CI frontend and development task automation tool.

tox automation project Command line driven CI frontend and development task automation tool At its core tox provides a convenient way to run arbitrary

tox development team 3.1k Jan 04, 2023
tidevice can be used to communicate with iPhone device

tidevice can be used to communicate with iPhone device

Alibaba 1.8k Jan 08, 2023
Baseball Discord bot that can post up-to-date scores, lineups, and home runs.

Sunny Day Discord Bot Baseball Discord bot that can post up-to-date scores, lineups, and home runs. Uses webscraping techniques to scrape baseball dat

Benjamin Hammack 1 Jun 20, 2022
A simple python script that uses selenium(chrome web driver),pyautogui,time and schedule modules to enter google meets automatically

A simple python script that uses selenium(chrome web driver),pyautogui,time and schedule modules to enter google meets automatically

3 Feb 07, 2022
Pyramid debug toolbar

pyramid_debugtoolbar pyramid_debugtoolbar provides a debug toolbar useful while you're developing your Pyramid application. Note that pyramid_debugtoo

Pylons Project 95 Sep 17, 2022
A configurable set of panels that display various debug information about the current request/response.

Django Debug Toolbar The Django Debug Toolbar is a configurable set of panels that display various debug information about the current request/respons

Jazzband 7.3k Jan 02, 2023
Network automation lab using nornir, scrapli, and containerlab with Arista EOS

nornir-scrapli-eos-lab Network automation lab using nornir, scrapli, and containerlab with Arista EOS. Objectives Deploy base configs to 4xArista devi

Vireak Ouk 13 Jul 07, 2022
Tattoo - System for automating the Gentoo arch testing process

Naming origin Well, naming things is very hard. Thankfully we have an excellent

Arthur Zamarin 4 Nov 07, 2022
ApiPy was created for api testing with Python pytest framework which has also requests, assertpy and pytest-html-reporter libraries.

ApiPy was created for api testing with Python pytest framework which has also requests, assertpy and pytest-html-reporter libraries. With this f

Mustafa 1 Jul 11, 2022
py.test fixture for benchmarking code

Overview docs tests package A pytest fixture for benchmarking code. It will group the tests into rounds that are calibrated to the chosen timer. See c

Ionel Cristian Mărieș 1k Jan 03, 2023
Mockoon is the easiest and quickest way to run mock APIs locally. No remote deployment, no account required, open source.

Mockoon Mockoon is the easiest and quickest way to run mock APIs locally. No remote deployment, no account required, open source. It has been built wi

mockoon 4.4k Dec 30, 2022
Multi-asset backtesting framework. An intuitive API lets analysts try out their strategies right away

Multi-asset backtesting framework. An intuitive API lets analysts try out their strategies right away. Fast execution of profit-take/loss-cut orders is built-in. Seamless with Pandas.

Epymetheus 39 Jan 06, 2023
Scalable user load testing tool written in Python

Locust Locust is an easy to use, scriptable and scalable performance testing tool. You define the behaviour of your users in regular Python code, inst

Locust.io 20.4k Jan 04, 2023
Instagram unfollowing bot. If this script is executed that specific accounts following will be reduced

Instagram-Unfollower-Bot Instagram unfollowing bot. If this script is executed that specific accounts following will be reduced.

Biswarup Bhattacharjee 1 Dec 24, 2021
A library for generating fake data and populating database tables.

Knockoff Factory A library for generating mock data and creating database fixtures that can be used for unit testing. Table of content Installation Ch

Nike Inc. 30 Sep 23, 2022