A minimal caching proxy to GitHub's REST & GraphQL APIs

Overview

github-proxy

CircleCI Maintainability Test Coverage

A caching forward proxy to GitHub's REST and GraphQL APIs. GitHub-Proxy is a thin, highly extensible, highly configurable python framework based on Werkzeug. It comes with out-of-the-box support for Flask and Redis, but can be extended to integrate with other application frameworks, databases, and monitoring tools.

Features:

  • Caching of GitHub responses based on conditional requests.
  • Improved and granular monitoring of client usage and rate-limit consumption.
  • Provides a central and extensible pool of GitHub credentials (either GitHub Apps or user PATs) enabling rotation of rate-limited tokens. Negates the need of managing a GitHub App or bot user account per client.
  • Coarse-grained and highly configurable authorization of clients based on API resource scopes.
  • 100% compatible with GitHub's REST and GraphQL interfaces as well as the Enterprise API.

Install

Core framework:

$ pip install github-proxy

With support for Redis as a cache backend:

$ pip install github-proxy[redis]

With Flask support:

$ pip install github-proxy[flask]

Docker image: (Coming Soon)

$ docker pull babylonhealth/github-proxy

Usage

Flask integration (see example):

from github_proxy import blueprint

app = Flask(__name__)

app.register_blueprint(blueprint)
app.register_blueprint(
    blueprint, name="github_enterprise_proxy", url_prefix="/api/v3"
)  # enterprise server

if __name__ == "__main__":
    app.run()

Core framework (only needed for non-Flask applications):

# Explicitly construct a proxy instance
from github_proxy import Proxy, Config, CacheBackend, TelemetryCollector

config = Config()
proxy = Proxy(
    github_api_url=config.github_api_url,
    github_token_config=config,
    cache=CacheBackend.factory(config),
    rate_limited={},
    clients=config.clients,
    tel_collector=TelemetryCollector.from_type(config.tel_collector_type),
)

# Or inject an instance loaded from the environment
from github_proxy import inject_proxy
from werkzeug import Request, Response

@inject_proxy
def request_handler(proxy: Proxy, request: Request) -> Response:
    return proxy.request(request.path, request, "foo-client")

Registered clients (see client registry file) can then integrate with the proxy using their proxy client token. A proxy client token may be used as a regular GitHub PAT in SAML SSO Authentication:

$ curl -H "Authorization: token ${CLIENT_TOKEN}" http://localhost:5000/zen
Keep it logically awesome.

Architecture

The need for such a solution stemmed from Babylon's reliance on GitOps as an operational and change release framework. This led to a high (and at times abusive) usage of the GitHub API through a limited number of GitHub bot users. Frequent rate-limiting and lack of observability in terms of which client/workflow/team is abusing the API resulted in suboptimal developer experience.

High usage clients of the GitHub API are usually CI/CD pipelines and automated tests. These workflows are traditionally implemented as a collection of job processes executing independently to each other. This setup does not allow hot resources (and their Etags) to be shared across different workflows, or even jobs of the same workflow.

The GitHub-Proxy provides a centralised store of Etags that can be shared and re-used amongst its client base, letting workflows take full advantage of conditional requests which do not count against the rate limit.

graph LR
  subgraph clients
  A[CI job foo]
  C[CD job bar]
  D[Synthetic test baz]
  end
  A -- Auth via `Authorization: token REDACTED` HTTP header --> B[GitHub Proxy]
  C --> B
  D --> B
  B -- Auth via GitHub App or user PAT --> E[GitHub]
  B -- Read/Write responses of GitHub --> F[(Cache)]
  B -- Reporting of usage --> G[(Telemetry Collector)]

Sequence diagrams:

Configuring the proxy

By default, the proxy loads its configuration using the github_proxy.Config class from the following 2 sources:

  1. Environment variables
  2. Client registry file

Environment variables

Variable Description Default
GITHUB_API_URL Base url of the GitHub API server. https://api.github.com
CACHE_TTL The TTL (in seconds) of the cache that stores GitHub responses. 3600
CACHE_BACKEND_URL URI of the cache backend that stores GitHub responses. The scheme of the URI infers the cache backend type. inmemory://
GITHUB_CREDS_CACHE_MAXSIZE The max size of the inmemory cache used for storing rate limited GitHub credentials. 256
GITHUB_CREDS_CACHE_TTL_PADDING The TTL padding (in minutes) of the inmemory cache used for storing rate limited GitHub credentials. This padding accounts for potential clock drift between the proxy and the GitHub servers. 10
TELEMETRY_COLLECTOR_TYPE The type of telemetry collector to be used. noop
CLIENT_REGISTRY_FILE_PATH (Required) Path to the client registry file. See here for more. n/a
GITHUB_PAT_* Variable pattern to specify GitHub user PATs that the proxy can use when integrating with the GitHub API. Example variable name: GITHUB_PAT_FOO. n/a
GITHUB_APP_*_ID Variable pattern to specify GitHub App IDs that the proxy can use when integrating with the GitHub API. Example variable name: GITHUB_APP_BAR_ID. n/a
GITHUB_APP_*_INSTALLATION_ID Variable pattern to specify the GitHub App installation IDs that correspond to each of the GitHub App IDs. Example variable name: GITHUB_APP_BAR_INSTALLATION_ID. n/a
GITHUB_APP_*_PEM Variable pattern to specify the GitHub App private keys that correspond to each of the GitHub App IDs. Example variable name: GITHUB_APP_BAR_PEM. n/a

Client registry file

This file specifies the set of clients that are authorized to integrate with the proxy:

---
version: 1
clients:
  - name: test
    token: H+hYxlecgRq7yfmhq2COlJk7tpSwDmdsp8thdPsnbnQ=
  - name: read_only
    token: oed4+Uo4s4mgwstjSAY/N+HSOsGwfbX91QxqSOjsVlU=
    scopes:
    - method: GET
      path: .*
...

The tokens included in this file are the authorization tokens that clients need to pass to the proxy (instead of GitHub tokens). The name of each client should be unique and is to be used for telemetry purposes. The scopes define the resources and methods of the REST API that each of the clients is authorized to access (default to full access).

Tokens within this file must be treated as secrets. Since secrets cannot be commited to VCS, the registry file can also be provided as a Jinja2 template, enabling the injection of secrets at runtime through env variables:

version: 1
clients:
  - name: test
    token: {{ env.TOKEN_TEST }}

Extending the proxy

Adding a new type of cache backend:

from github_proxy import CacheBackend, CacheBackendConfig

class PostgresCacheBackend(CacheBackend, scheme="postgres"):
    # Implement the __init__, _get, _set, and _make_key methods
    # of the CacheBackend interface
    pass

Adding a new type of telemetry collector:

from github_proxy import TelemetryCollector

class JaegerTelemetryCollector(TelemetryCollector, type_="jaeger"):
    # Implement the collect_gh_response_metrics and collect_proxy_request_metrics 
    # methods of the TelemetryCollector interface
    pass

Once imported, the above extensions can be selected using the respective CACHE_BACKEND_URL and TELEMETRY_COLLECTOR_TYPE env variables.

Relevant references

  1. Google's magic GitHub proxy: Proxy that enables IAM for GitHub API tokens.
  2. Sourcegraph's GitHub proxy: Provides enhanced observability.
You might also like...
 Pancakeswap Sniper BOT - TORNADO CASH Proxy (MAC WINDOWS ANDROID LINUX) A fully decentralized protocol for private transactions
Pancakeswap Sniper BOT - TORNADO CASH Proxy (MAC WINDOWS ANDROID LINUX) A fully decentralized protocol for private transactions

TORNADO CASH Proxy Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX) ⭐️ A fully decentralized protocol for private transactions ⭐️ AUTO DOWNL

TORNADO CASH Proxy Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX)
TORNADO CASH Proxy Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX)

TORNADO CASH Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX) ⭐️ A ful

Simple Webhook Spammer with Optional Proxy Support
Simple Webhook Spammer with Optional Proxy Support

😎 �Simple Webhook Spammer with Optional Proxy Support:- [+] git clone https://g

a public repository helping ML/DL engineers and DS to beautify the notebook with minimal coding.

ml-helper-functions a public repository helping ML/DL engineers and DS to beautify the notebook with minimal coding.

Minimal telegram voice chat music bot, in pyrogram.

VCBOT Fully working VC (user)Bot, based on py-tgcalls and py-tgcalls-wrapper with minimal features. Deploying To heroku: Local machine/VPS: git clone

Send Informative, Concise Slack Notifications With Minimal Effort

slack-templates Send Informative, Concise Slack Notifications With Minimal Effort slack-templates Slack Integration Available Templates Usage Report t

Minimal API for the COVID Booking System of the Offices at the UniPD Math Dep

Simple and easy to use python BOT for the COVID registration booking system of the math department @ unipd (torre archimede). This API creates an interface with the official website, with more useful functionalities.

A free, minimal, lightweight, cross-platform, easily expandable  Twitch IRC/API bot.
A free, minimal, lightweight, cross-platform, easily expandable Twitch IRC/API bot.

parky's twitch bot A free, minimal, lightweight, cross-platform, easily expandable Twitch IRC/API bot. Features 🔌 Connect to Twitch IRC chat! 🔌 Conn

Lamblayer: a minimal deployment tool for AWS Lambda layers

lamblayer lamblayer is a minimal deployment tool for AWS Lambda layers. lamblayer does, Create a Layers of built pip-installable python packages. Crea

Comments
  • [CLOUD-5072] Porting the codebase to a new repo

    [CLOUD-5072] Porting the codebase to a new repo

    • Spike application
    • Spike
    • Add some todos
    • Remove injectors
    • Deployable service
    • Last modified headers (#3)
    • Introducing HTTP token auth (#4)
    • Enterprise server support (#5)
    • Log incoming cache headers (#6)
    • Using a lazy pool of GitHub credentials (#7)
    • Proxy should not fail if cache is down (#8)
    • Support TLS redis (#10)
    • Telemetry (#13)
    • Change custom prom metric names (#14)
    • Change metric labels (#15)
    • Integration tests + refactoring (#17)
    • Media type of a resource should be part of the cache key (#18)
    • Add healthcheck (#19)
    • Better separation between cache misses and uncacheable (#20)
    • Noop renaming of credentials to tokens (#21)
    • Query params to be used when indexing cached responses (#23)
    • Re-using TCP connections (#26)
    • Filter the whole set of hop-by-hop headers (#28)
    • Filter host and content headers (#29)
    • Authorize clients based on scopes (#30)
    • Improve docs (#31)
    • Remove promnight dependency from the github_proxy package
    • abstractmethods
    • Remove hard dependencies on redis and flask
    • Using a lazy pool of GitHub credentials (#7)
    • Telemetry (#13)
    • Integration tests + refactoring (#17)
    • Media type of a resource should be part of the cache key (#18)
    • Add healthcheck (#19)
    • Better separation between cache misses and uncacheable (#20)
    • Noop renaming of credentials to tokens (#21)
    • Query params to be used when indexing cached responses (#23)
    • Re-using TCP connections (#26)
    • Authorize clients based on scopes (#30)
    • Porting the codebase over to a new repo
    opened by dedoussis 0
Releases(v0.4.5)
  • v0.4.5(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Add cryptography dependency by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/7

    Full Changelog: https://github.com/babylonhealth/github-proxy/compare/v0.4.4...v0.4.5

    Source code(tar.gz)
    Source code(zip)
  • v0.4.4(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Retry poetry by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/6

    Full Changelog: https://github.com/babylonhealth/github-proxy/compare/v0.4.3...v0.4.4

    Source code(tar.gz)
    Source code(zip)
  • v0.4.3(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Retry poetry by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/5

    Full Changelog: https://github.com/babylonhealth/github-proxy/compare/v0.4.2...v0.4.3

    Source code(tar.gz)
    Source code(zip)
  • v0.4.2(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Retry poetry by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/4

    Full Changelog: https://github.com/babylonhealth/github-proxy/compare/v0.4.0...v0.4.2

    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Fix poetry dynamic versioning by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/3

    Full Changelog: https://github.com/babylonhealth/github-proxy/compare/v0.3.0...v0.4.1

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Fix poetry dynamic versioning by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/3

    Full Changelog: https://github.com/babylonhealth/github-proxy/compare/v0.3.0...v0.4.0

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Fix circleci tag filtering by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/2

    Full Changelog: https://github.com/babylonhealth/github-proxy/compare/v0.2.0...v0.3.0

    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(May 9, 2022)

    What's Changed

    • [CLOUD-5072] Porting the codebase to a new repo by @dedoussis in https://github.com/babylonhealth/github-proxy/pull/1

    New Contributors

    • @dedoussis made their first contribution in https://github.com/babylonhealth/github-proxy/pull/1

    Full Changelog: https://github.com/babylonhealth/github-proxy/commits/v0.2.0

    Source code(tar.gz)
    Source code(zip)
Owner
Babylon Health
Putting an accessible and affordable health service in the hands of every person on earth.
Babylon Health
PerrOS - The operating system for your discord server.

PerrOS PerrOS is a Opensource Discord Bot to do it all! Installation Use the package manager pip to install the python3 requirements. pip3 install -r

Webshort 2 Jun 20, 2022
Announces when a web3 wallet receives a token

excitare_cito v2.0 by Bogdan Vaida ([email protected]) Announces wh

1 Nov 30, 2021
A simple Discord Bot created for basic functionality and fun chat commands for use in a private server.

LoveAndChaos-Bot v0.1.0 LoveAndChaos-Bot is a Discord Bot specifically designed for a private server; this bot is merely a test and a method to expose

Morgan Rose 1 Dec 12, 2021
Extend the commitizen tools to create conventional commits and README that link to Jira and GitHub.

cz-github-jira-conventional cz-github-jira-conventional is a plugin for the commitizen tools, a toolset that helps you to create conventional commit m

12 Dec 13, 2022
Schedule Twitter updates with easy

coo: schedule Twitter updates with easy Coo is an easy to use Python library for scheduling Twitter updates. To use it, you need to first apply for a

wilfredinni 46 Nov 03, 2022
Python tool to Check running WebClient services on multiple targets based on @leechristensen

WebClient Service Scanner Python tool to Check running WebClient services on multiple targets based on @tifkin_ idea. This tool uses impacket project.

Pixis 153 Dec 28, 2022
A Bot that Forwards Tweets to Telegram using Airtable as a database.

Twitter Telegram Forward A Bot that Forwards Tweets to Telegram using Airtable as a Database. Features: Handles multiple twitter and telegram channels

George Bakev 3 Dec 21, 2022
Python library to download market data via Bloomberg, Eikon, Quandl, Yahoo etc.

findatapy findatapy creates an easy to use Python API to download market data from many sources including Quandl, Bloomberg, Yahoo, Google etc. using

Cuemacro 1.3k Jan 04, 2023
🐍 Mnemonic code for generating deterministic keys, BIP39

python-mnemonic 🐍 Mnemonic code for generating deterministic keys, BIP39 Installation To install this library and its dependencies use: pip install m

9 Dec 22, 2022
Gdrive-python: A wrapping module in python of gdrive

gdrive-python gdrive-python is a wrapping module in python of gdrive made by @pr

Vittorio Pippi 3 Feb 19, 2022
A listener for RF >= 4.0 that prints a Stack Trace to console to faster find the code section where the failure appears.

robotframework-stacktrace A listener for RF = 4.0 that prints a Stack Trace to console to faster find the code section where the failure appears. Ins

marketsquare 16 Nov 24, 2022
Wrapper around the Mega API

python-mega Overview Wrapper around the Mega API. Based on the work of Julien Marchand. Installation Install using pip, including any optional package

Juan Riaza 104 Nov 26, 2022
An API wrapper around the pythonanywhere's API.

pyaww An API wrapper around the pythonanywhere's API. The name stands for pythonanywherewrapper. 100% API coverage Most of the codebase is documented

7 Dec 11, 2022
Fully automated Chegg Discord bot for "homework help"

cheggbog Fully automated Chegg Discord bot for "homework help". Working Sept 15, 2021 Overview Recently, Chegg has made it extremely difficult to auto

Bryce Hackel 8 Dec 23, 2022
Simple integration between FastAPI and cloud authentication services (AWS Cognito, Auth0, Firebase Authentication).

FastAPI Cloud Auth fastapi-cloudauth standardizes and simplifies the integration between FastAPI and cloud authentication services (AWS Cognito, Auth0

tokusumi 255 Jan 07, 2023
A Python Library to interface with Tumblr v2 REST API & OAuth

Tumblpy Tumblpy is a Python library to help interface with Tumblr v2 REST API & OAuth Features Retrieve user information and blog information Common T

Mike Helmick 125 Jun 20, 2022
Analyzed the data of VISA applicants to build a predictive model to facilitate the process of VISA approvals.

Analyzed the data of Visa applicants, built a predictive model to facilitate the process of visa approvals, and based on important factors that significantly influence the Visa status recommended a s

Jesus 1 Jan 08, 2022
A modular dynamical-systems model of Ethereum's validator economics.

CADLabs Ethereum Economic Model A modular dynamical-systems model of Ethereum's validator economics, based on the open-source Python library radCAD, a

CADLabs 104 Jan 03, 2023
ToqueIO Nuke tools - A collection of tools designed to assist in enhancing your workflows within nuke

ToqueIO Nuke tools - A collection of tools designed to assist in enhancing your workflows within nuke

4 Feb 19, 2022