This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Last update: Dec 23, 2022

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

👋 Note: Even though this project uses an image classification its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.
Then we locally test the API. You can find the instructions within the api directory.
To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:
- Looks for any changes in the specified directory. If there are any changes:
- Builds and pushes the latest Docker image to Google Container Register (GCR).
- Deploys the Docker container on the k8s cluster running on GKE.

Configurations needed beforehand

Create a k8s cluster on GKE. Here's a relevant resource.
Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.
Crete a secret named GCP_CREDENTIALS on your GitHub repository and copy paste the contents of the service account key file into the secret.

Configure bucket storage related permissions for the service account:

$ export PROJECT_ID=<PROJECT_ID>
$ export ACCOUNT=<ACCOUNT>

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.admin

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.objectAdmin

gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.objectCreator

If you're on the main branch already then upon a new push, the worflow defined in .github/workflows/deployment.yaml should automatically run. Here's how the final outputs should look like so (run link):

Notes

Since we use CPU-based pods within the k8s cluster, we use ONNX optimizations since they are known to provide performance speed-ups for CPU-based environments. If you are using GPU-based pods then look into TensorRT.
We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From workflow outputs, you should see something like so:

NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx        80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx     <none>          443/TCP        160m

Note the EXTERNAL-IP corresponding to fastapi-server (iff you have named your service like so). Then cURL it:

curl -X POST -F [email protected] -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.

TODO (s)

Set up logging for the k8s pods.
Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.

Comments

Feat/locust grpc

@deep-diver currently, the load test runs into:

I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

Also, I should mention that the gRPC client currently does not take care of image resizing which makes it a bit less comparable to the REST client which handles preprocessing as well postprocessing.

opened by sayakpaul 18
Setup TF Serving based deployment
In this new feature, the following works are expected

~~Update the notebook~~ Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

~~Update the notebook~~ Update the newly created notebook to check the %%timeit on the TF Serving server locally.

Build/Commit docker image based on TF Serving base image using this method.

Deploy the built docker image on GKE cluster

Check the deployed model's performance with a various scenarios (maybe the same ones applied to ONNX+FastAPI scenarios)

new feature
opened by deep-diver 11
Perform load testing with Locust
Resources:

https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7

https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html

https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
opened by sayakpaul 10
4 dockerize
fix

move api/utils/requirements.txt to /api

add missing dependency python-multipart to the requirements.txt

add

Dockerfile

Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4
opened by deep-diver 4
Deployment on GKE with GitHub Actions

Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

opened by sayakpaul 2
chore: refactored the colab notebook.

Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

opened by sayakpaul 2

Releases(v1.0.0)

v1.0.0(Feb 21, 2022)

Source code(tar.gz)
Source code(zip)
resnet50_w_preprocessing.onnx(97.42 MB)
resnet50_w_preprocessing_tf.tar.gz(101.89 MB)

Owner

Sayak Paul

ML Engineer at @carted | One PR at a time

GitHub Repository

Starlette middleware for Prerender

Prerender Python Starlette Starlette middleware for Prerender Documentation: https://BeeMyDesk.github.io/prerender-python-starlette/ Source Code: http

14 May 02, 2021

flask extension for integration with the awesome pydantic package

249 Jan 06, 2023

Dead simple CSRF security middleware for Starlette ⭐ and Fast API ⚡

csrf-starlette-fastapi Dead simple CSRF security middleware for Starlette ⭐ and Fast API ⚡ Will work with either a input type="hidden" field or ajax

9 Nov 20, 2022

🐍Pywork is a Yeoman generator to scaffold a Bare-bone Python Application

Pywork python app yeoman generator Yeoman | Npm Pywork | Home PyWork is a Yeoman generator for a basic python-worker project that makes use of Pipenv,

10 Dec 16, 2022

Asynchronous event dispatching/handling library for FastAPI and Starlette

fastapi-events An event dispatching/handling library for FastAPI, and Starlette. Features: straightforward API to emit events anywhere in your code ev

238 Jan 07, 2023

Prometheus exporter for metrics from the MyAudi API

Prometheus Audi Exporter This Prometheus exporter exports metrics that it fetches from the MyAudi API. Usage Checkout submodules Install dependencies

7 Dec 19, 2022

Generate Class & Decorators for your FastAPI project ✨🚀

Classes and Decorators to use FastAPI with class based routing. In particular this allows you to construct an instance of a class and have methods of that instance be route handlers for FastAPI & Pyt

34 Oct 27, 2022

Single Page App with Flask and Vue.js

Developing a Single Page App with FastAPI and Vue.js Want to learn how to build this? Check out the post. Want to use this project? Build the images a

91 Jan 05, 2023

A utility that allows you to use DI in fastapi without Depends()

fastapi-better-di What is this ? fastapi-better-di is a utility that allows you to use DI in fastapi without Depends() Installation pip install fastap

9 May 24, 2022

Keycloak integration for Python FastAPI

FastAPI Keycloak Integration Documentation Introduction Welcome to fastapi-keycloak. This projects goal is to ease the integration of Keycloak (OpenID

113 Dec 31, 2022

Hyperlinks for pydantic models

Hyperlinks for pydantic models In a typical web application relationships between resources are modeled by primary and foreign keys in a database (int

10 Apr 18, 2022

Flood Detection with Google Earth Engine

ee-fastapi: Flood Detection System A ee-fastapi is a simple FastAPI web application for performing flood detection using Google Earth Engine in the ba

69 Jan 06, 2023

A Flask extension that enables or disables features based on configuration.

Flask FeatureFlags This is a Flask extension that adds feature flagging to your applications. This lets you turn parts of your site on or off based on

131 Sep 26, 2022

Adds integration of the Chameleon template language to FastAPI.

fastapi-chameleon Adds integration of the Chameleon template language to FastAPI. If you are interested in Jinja instead, see the sister project: gith

124 Nov 26, 2022

A web application using [FastAPI + streamlit + Docker] Neural Style Transfer (NST) refers to a class of software algorithms that manipulate digital images

Neural Style Transfer Web App - [FastAPI + streamlit + Docker] NST - application based on the Perceptual Losses for Real-Time Style Transfer and Super

3 Dec 05, 2022

Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as a REST API Endpoint.

Jupter Notebook REST API Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as

54 Nov 04, 2022