This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

👋 Note: Even though this project uses an image classification its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

  • We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.

  • Then we locally test the API. You can find the instructions within the api directory.

  • To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:

    • Looks for any changes in the specified directory. If there are any changes:
    • Builds and pushes the latest Docker image to Google Container Register (GCR).
    • Deploys the Docker container on the k8s cluster running on GKE.

Configurations needed beforehand

  • Create a k8s cluster on GKE. Here's a relevant resource.

  • Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.

  • Crete a secret named GCP_CREDENTIALS on your GitHub repository and copy paste the contents of the service account key file into the secret.

  • Configure bucket storage related permissions for the service account:

    $ export PROJECT_ID=<PROJECT_ID>
    $ export ACCOUNT=<ACCOUNT>
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.admin
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectAdmin
    
    gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectCreator
  • If you're on the main branch already then upon a new push, the worflow defined in .github/workflows/deployment.yaml should automatically run. Here's how the final outputs should look like so (run link):

Notes

  • Since we use CPU-based pods within the k8s cluster, we use ONNX optimizations since they are known to provide performance speed-ups for CPU-based environments. If you are using GPU-based pods then look into TensorRT.
  • We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From workflow outputs, you should see something like so:

NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx        80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx     <none>          443/TCP        160m

Note the EXTERNAL-IP corresponding to fastapi-server (iff you have named your service like so). Then cURL it:

curl -X POST -F [email protected] -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.

TODO (s)

  • Set up logging for the k8s pods.
  • Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.

Comments
  • Feat/locust grpc

    Feat/locust grpc

    @deep-diver currently, the load test runs into:

    Screenshot 2022-04-02 at 10 54 26 AM

    I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

    Also, I should mention that the gRPC client currently does not take care of image resizing which makes it a bit less comparable to the REST client which handles preprocessing as well postprocessing.

    opened by sayakpaul 18
  • Setup TF Serving based deployment

    Setup TF Serving based deployment

    In this new feature, the following works are expected

    • Update the notebook Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

    • Update the notebook Update the newly created notebook to check the %%timeit on the TF Serving server locally.

    • Build/Commit docker image based on TF Serving base image using this method.

    • Deploy the built docker image on GKE cluster

    • Check the deployed model's performance with a various scenarios (maybe the same ones applied to ONNX+FastAPI scenarios)

    new feature 
    opened by deep-diver 11
  • Perform load testing with Locust

    Perform load testing with Locust

    Resources:

    • https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7
    • https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html
    • https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
    opened by sayakpaul 10
  • 4 dockerize

    4 dockerize

    fix

    • move api/utils/requirements.txt to /api
    • add missing dependency python-multipart to the requirements.txt

    add

    • Dockerfile

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4

    opened by deep-diver 4
  • Deployment on GKE with GitHub Actions

    Deployment on GKE with GitHub Actions

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

    opened by sayakpaul 2
  • chore: refactored the colab notebook.

    chore: refactored the colab notebook.

    Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

    opened by sayakpaul 2
Owner
Sayak Paul
ML Engineer at @carted | One PR at a time
Sayak Paul
Asynchronous event dispatching/handling library for FastAPI and Starlette

fastapi-events An event dispatching/handling library for FastAPI, and Starlette. Features: straightforward API to emit events anywhere in your code ev

Melvin 238 Jan 07, 2023
🍃 A comprehensive monitoring and alerting solution for the status of your Chia farmer and harvesters.

chia-monitor A monitoring tool to collect all important metrics from your Chia farming node and connected harvesters. It can send you push notificatio

Philipp Normann 153 Oct 21, 2022
FastAPI with async for generating QR codes and bolt11 for Lightning Addresses

sendsats An API for getting QR codes and Bolt11 Invoices from Lightning Addresses. Share anywhere; as a link for tips on a twitter profile, or via mes

Bitkarrot 12 Jan 07, 2023
SuperSaaSFastAPI - Python SaaS Boilerplate for building Software-as-Service (SAAS) apps with FastAPI, Vue.js & Tailwind

Python SaaS Boilerplate for building Software-as-Service (SAAS) apps with FastAP

Rudy Bekker 31 Jan 10, 2023
A request rate limiter for fastapi

fastapi-limiter Introduction FastAPI-Limiter is a rate limiting tool for fastapi routes. Requirements redis Install Just install from pypi pip insta

long2ice 200 Jan 08, 2023
A fast and durable Pub/Sub channel over Websockets. FastAPI + WebSockets + PubSub == ⚡ 💪 ❤️

⚡ 🗞️ FastAPI Websocket Pub/Sub A fast and durable Pub/Sub channel over Websockets. The easiest way to create a live publish / subscribe multi-cast ov

8 Dec 06, 2022
API for Submarino store

submarino-api API for the submarino e-commerce documentation read the documentation in: https://submarino-api.herokuapp.com/docs or in https://submari

Miguel 1 Oct 14, 2021
Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as a REST API Endpoint.

Jupter Notebook REST API Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as

Invictify 54 Nov 04, 2022
FastAPI simple cache

FastAPI Cache Implements simple lightweight cache system as dependencies in FastAPI. Installation pip install fastapi-cache Usage example from fastapi

Ivan Sushkov 188 Dec 29, 2022
FastAPI client generator

FastAPI-based API Client Generator Generate a mypy- and IDE-friendly API client from an OpenAPI spec. Sync and async interfaces are both available Com

David Montague 283 Jan 04, 2023
This code generator creates FastAPI app from an openapi file.

fastapi-code-generator This code generator creates FastAPI app from an openapi file. This project is an experimental phase. fastapi-code-generator use

Koudai Aono 632 Jan 05, 2023
A rate limiter for Starlette and FastAPI

SlowApi A rate limiting library for Starlette and FastAPI adapted from flask-limiter. Note: this is alpha quality code still, the API may change, and

Laurent Savaete 562 Jan 01, 2023
🐞 A debug toolbar for FastAPI based on the original django-debug-toolbar. 🐞

Debug Toolbar 🐞 A debug toolbar for FastAPI based on the original django-debug-toolbar. 🐞 Swagger UI & GraphQL are supported. Documentation: https:/

Dani 74 Dec 30, 2022
Sample FastAPI project that uses async SQLAlchemy, SQLModel, Postgres, Alembic, and Docker.

FastAPI + SQLModel + Alembic Sample FastAPI project that uses async SQLAlchemy, SQLModel, Postgres, Alembic, and Docker. Want to learn how to build th

228 Jan 02, 2023
FastAPI interesting concepts.

fastapi_related_stuffs FastAPI interesting concepts. FastAPI version :- 0.70 Python3 version :- 3.9.x Steps Test Django Like settings export FASTAPI_S

Mohd Mujtaba 3 Feb 06, 2022
A Python framework to build Slack apps in a flash with the latest platform features.

Bolt for Python A Python framework to build Slack apps in a flash with the latest platform features. Read the getting started guide and look at our co

SlackAPI 684 Jan 09, 2023
Fetching Cryptocurrency Prices from Coingecko and Displaying them on Grafana

cryptocurrency-prices-grafana Fetching Cryptocurrency Prices from Coingecko and Displaying them on Grafana About This stack consists of: Prometheus (t

Ruan Bekker 7 Aug 01, 2022
Voucher FastAPI

Voucher-API Requirement Docker Installed on system Libraries Pandas Psycopg2 FastAPI PyArrow Pydantic Uvicorn How to run Download the repo on your sys

Hassan Munir 1 Jan 26, 2022
✨️🐍 SPARQL endpoint built with RDFLib to serve machine learning models, or any other logic implemented in Python

✨ SPARQL endpoint for RDFLib rdflib-endpoint is a SPARQL endpoint based on a RDFLib Graph to easily serve machine learning models, or any other logic

Vincent Emonet 27 Dec 19, 2022
FastAPI Auth Starter Project

This is a template for FastAPI that comes with authentication preconfigured.

Oluwaseyifunmi Oyefeso 6 Nov 13, 2022