Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
Test django schema and data migrations, including migrations' order and best practices.

django-test-migrations Features Allows to test django schema and data migrations Allows to test both forward and rollback migrations Allows to test th

wemake.services 382 Dec 27, 2022
Auto-hms-action - Automation of NU Health Management System

🦾 Automation of NU Health Management System 🤖 長崎大学 健康管理システムの自動化 🏯 Usage / 使い方

k5-mot 3 Mar 04, 2022
HTTP load generator, ApacheBench (ab) replacement, formerly known as rakyll/boom

hey is a tiny program that sends some load to a web application. hey was originally called boom and was influenced from Tarek Ziade's tool at tarekzia

Jaana Dogan 14.9k Jan 07, 2023
HTTP client mocking tool for Python - inspired by Fakeweb for Ruby

HTTPretty 1.0.5 HTTP Client mocking tool for Python created by Gabriel Falcão . It provides a full fake TCP socket module. Inspired by FakeWeb Github

Gabriel Falcão 2k Jan 06, 2023
Silky smooth profiling for Django

Silk Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before prese

Jazzband 3.7k Jan 04, 2023
Fail tests that take too long to run

GitHub | PyPI | Issues pytest-fail-slow is a pytest plugin for making tests fail that take too long to run. It adds a --fail-slow DURATION command-lin

John T. Wodder II 4 Nov 27, 2022
Python version of the Playwright testing and automation library.

🎭 Playwright for Python Docs | API Playwright is a Python library to automate Chromium, Firefox and WebKit browsers with a single API. Playwright del

Microsoft 7.8k Jan 02, 2023
The (Python-based) mining software required for the Game Boy mining project.

ntgbtminer - Game Boy edition This is a version of ntgbtminer that works with the Game Boy bitcoin miner. ntgbtminer ntgbtminer is a no thrills getblo

Ghidra Ninja 31 Nov 04, 2022
Active Directory Penetration Testing methods with simulations

AD penetration Testing Project By Ruben Enkaoua - GL4Di4T0R Based on the TCM PEH course (Heath Adams) Index 1 - Setting Up the Lab Intallation of a Wi

GL4DI4T0R 3 Aug 12, 2021
Parameterized testing with any Python test framework

Parameterized testing with any Python test framework Parameterized testing in Python sucks. parameterized fixes that. For everything. Parameterized te

David Wolever 714 Dec 21, 2022
Pymox - open source mock object framework for Python

Pymox is an open source mock object framework for Python. First Steps Installation Tutorial Documentation http://pymox.readthedocs.io/en/latest/index.

Ivan Rocha 7 Feb 02, 2022
Testing - Instrumenting Sanic framework with Opentelemetry

sanic-otel-splunk Testing - Instrumenting Sanic framework with Opentelemetry Test with python 3.8.10, sanic 20.12.2 Step to instrument pip install -r

Donler 1 Nov 26, 2021
UUM Merit Form Filler is a web automation which helps automate entering a matric number to the UUM system in order for participants to obtain a merit

About UUM Merit Form Filler UUM Merit Form Filler is a web automation which helps automate entering a matric number to the UUM system in order for par

Ilham Rachmat 3 May 31, 2022
create custom test databases that are populated with fake data

About Generate fake but valid data filled databases for test purposes using most popular patterns(AFAIK). Current support is sqlite, mysql, postgresql

Emir Ozer 2.2k Jan 04, 2023
Pytest modified env

Pytest plugin to fail a test if it leaves modified os.environ afterwards.

wemake.services 7 Sep 11, 2022
An AWS Pentesting tool that lets you use one-liner commands to backdoor an AWS account's resources with a rogue AWS account - or share the resources with the entire internet 😈

An AWS Pentesting tool that lets you use one-liner commands to backdoor an AWS account's resources with a rogue AWS account - or share the resources with the entire internet 😈

Brandon Galbraith 276 Mar 03, 2021
A single module to link Python ecosystem to the Web

A single module to link Python ecosystem to the Web. Have a quick look at the Gallery first to get convinced ! FAQ For any questions, please use Stack

66 Dec 21, 2022
HTTP traffic mocking and testing made easy in Python

pook Versatile, expressive and hackable utility library for HTTP traffic mocking and expectations made easy in Python. Heavily inspired by gock. To ge

Tom 305 Dec 23, 2022
bulk upload files to libgen.lc (Selenium script)

LibgenBulkUpload bulk upload files to http://libgen.lc/librarian.php (Selenium script) Usage ./upload.py to_upload uploaded rejects So title and autho

8 Jul 07, 2022
AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

6 Jan 28, 2022