This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities

Overview

MLOps with Vertex AI

This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities. The example use Keras to implement the ML model, TFX to implement the training pipeline, and Model Builder SDK to interact with Vertex AI.

MLOps lifecycle

Getting started

  1. Setup your MLOps environment on Google Cloud.

  2. Start your AI Notebook instance.

  3. Open the JupyterLab then open a new Terminal

  4. Clone the repository to your AI Notebook instance:

    git clone https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai.git
    cd mlops-with-vertex-ai
    
  5. Install the required Python packages:

    pip install tfx==1.2.0 --user
    pip install -r requirements.txt
    

    NOTE: You can ignore the pip dependencies issues. These will be fixed when upgrading to subsequent TFX version.


  6. Upgrade the gcloud components:

    sudo apt-get install google-cloud-sdk
    gcloud components update
    

Dataset Management

The Chicago Taxi Trips dataset is one of public datasets hosted with BigQuery, which includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. The task is to predict whether a given trip will result in a tip > 20%.

The 01-dataset-management notebook covers:

  1. Performing exploratory data analysis on the data in BigQuery.
  2. Creating Vertex AI Dataset resource using the Python SDK.
  3. Generating the schema for the raw data using TensorFlow Data Validation.

ML Development

We experiment with creating a Custom Model using 02-experimentation notebook, which covers:

  1. Preparing the data using Dataflow.
  2. Implementing a Keras classification model.
  3. Training the Keras model with Vertex AI using a pre-built container.
  4. Upload the exported model from Cloud Storage to Vertex AI.
  5. Extract and visualize experiment parameters from Vertex AI Metadata.
  6. Use Vertex AI for hyperparameter tuning.

We use Vertex TensorBoard and Vertex ML Metadata to track, visualize, and compare ML experiments.

In addition, the training steps are formalized by implementing a TFX pipeline. The 03-training-formalization notebook covers implementing and testing the pipeline components interactively.

Training Operationalization

The 04-pipeline-deployment notebook covers executing the CI/CD steps for the training pipeline deployment using Cloud Build. The CI/CD routine is defined in the pipeline-deployment.yaml file, and consists of the following steps:

  1. Clone the repository to the build environment.
  2. Run unit tests.
  3. Run a local e2e test of the TFX pipeline.
  4. Build the ML container image for pipeline steps.
  5. Compile the pipeline.
  6. Upload the pipeline to Cloud Storage.

Continuous Training

After testing, compiling, and uploading the pipeline definition to Cloud Storage, the pipeline is executed with respect to a trigger. We use Cloud Functions and Cloud Pub/Sub as a triggering mechanism. The Cloud Function listens to the Pub/Sub topic, and runs the training pipeline given a message sent to the Pub/Sub topic. The Cloud Function is implemented in src/pipeline_triggering.

The 05-continuous-training notebook covers:

  1. Creating a Cloud Pub/Sub topic.
  2. Deploying a Cloud Function.
  3. Triggering the pipeline.

The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

  1. Receive hyper-parameters using hyperparam_gen custom python component.
  2. Extract data from BigQuery using BigQueryExampleGen component.
  3. Validate the raw data using StatisticsGen and ExampleValidator component.
  4. Process the data using on Dataflow Transform component.
  5. Train a custom model with Vertex AI using Trainer component.
  6. Evaluate and validate the custom model using ModelEvaluator component.
  7. Save the blessed to model registry location in Cloud Storage using Pusher component.
  8. Upload the model to Vertex AI using vertex_model_pusher custom python component.

Model Deployment

The 06-model-deployment notebook covers executing the CI/CD steps for the model deployment using Cloud Build. The CI/CD routine is defined in build/model-deployment.yaml file, and consists of the following steps:

  1. Test model interface.
  2. Create an endpoint in Vertex AI.
  3. Deploy the model to the endpoint.
  4. Test the Vertex AI endpoint.

Prediction Serving

We serve the deployed model for prediction. The 07-prediction-serving notebook covers:

  1. Use the Vertex AI endpoint for online prediction.
  2. Use the Vertex AI uploaded model for batch prediction.
  3. Run the batch prediction using Vertex Pipelines.

Model Monitoring

After a model is deployed in for prediction serving, continuous monitoring is set up to ensure that the model continue to perform as expected. The 08-model-monitoring notebook covers configuring Vertex AI Model Monitoring for skew and drift detection:

  1. Set skew and drift threshold.
  2. Create a monitoring job for all the models under and endpoint.
  3. List the monitoring jobs.
  4. List artifacts produced by monitoring job.
  5. Pause and delete the monitoring job.

Metadata Tracking

You can view the parameters and metrics logged by your experiments, as well as the artifacts and metadata stored by your Vertex Pipelines in Cloud Console.

Disclaimer

This is not an official Google product but sample code provided for an educational purpose.


Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Owner
Google Cloud Platform
Google Cloud Platform
[ICCV 2021] Learning A Single Network for Scale-Arbitrary Super-Resolution

ArbSR Pytorch implementation of "Learning A Single Network for Scale-Arbitrary Super-Resolution", ICCV 2021 [Project] [arXiv] Highlights A plug-in mod

Longguang Wang 229 Dec 30, 2022
pix2pix in tensorflow.js

pix2pix in tensorflow.js This repo is moved to https://github.com/yining1023/pix2pix_tensorflowjs_lite See a live demo here: https://yining1023.github

Yining Shi 47 Oct 04, 2022
Faster Convex Lipschitz Regression

Faster Convex Lipschitz Regression This reepository provides a python implementation of our Faster Convex Lipschitz Regression algorithm with GPU and

Ali Siahkamari 0 Nov 19, 2021
Optical Character Recognition + Instance Segmentation for russian and english languages

Распознавание рукописного текста в школьных тетрадях Соревнование, проводимое в рамках олимпиады НТО, разработанное Сбером. Платформа ODS. Результаты

Gerasimov Maxim 21 Dec 19, 2022
General Virtual Sketching Framework for Vector Line Art (SIGGRAPH 2021)

General Virtual Sketching Framework for Vector Line Art - SIGGRAPH 2021 Paper | Project Page Outline Dependencies Testing with Trained Weights Trainin

Haoran MO 118 Dec 27, 2022
Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 03, 2023
COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models

COVID-ViT COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models This code is to response to te MIA-COV19 compe

17 Dec 30, 2022
Double pendulum simulator using a symplectic Euler's method and Hamiltonian mechanics

Symplectic Double Pendulum Simulator Double pendulum simulator using a symplectic Euler's method. The program calculates the momentum and position of

Scott Marino 1 Jan 12, 2022
交互式标注软件,暂定名 iann

iann 交互式标注软件,暂定名iann。 安装 按照官网介绍安装paddle。 安装其他依赖 pip install -r requirements.txt 运行 git clone https://github.com/PaddleCV-SIG/iann/ cd iann python iann

294 Dec 30, 2022
A collection of resources and papers on Diffusion Models, a darkhorse in the field of Generative Models

This repository contains a collection of resources and papers on Diffusion Models and Score-based Models. If there are any missing valuable resources

5.1k Jan 08, 2023
Code to accompany the paper "Finding Bipartite Components in Hypergraphs", which is published in NeurIPS'21.

Finding Bipartite Components in Hypergraphs This repository contains code to accompany the paper "Finding Bipartite Components in Hypergraphs", publis

Peter Macgregor 5 May 06, 2022
Custom Implementation of Non-Deep Networks

ParNet Custom Implementation of Non-deep Networks arXiv:2110.07641 Ankit Goyal, Alexey Bochkovskiy, Jia Deng, Vladlen Koltun Official Repository https

Pritama Kumar Nayak 20 May 27, 2022
VLGrammar: Grounded Grammar Induction of Vision and Language

VLGrammar: Grounded Grammar Induction of Vision and Language

Yining Hong 27 Dec 23, 2022
PyTorch implementation of probabilistic deep forecast applied to air quality.

Probabilistic Deep Forecast PyTorch implementation of a paper, titled: Probabilistic Deep Learning to Quantify Uncertainty in Air Quality Forecasting

Abdulmajid Murad 13 Nov 16, 2022
Elevation Mapping on GPU.

Elevation Mapping cupy Overview This is a ros package of elevation mapping on GPU. Code are written in python and uses cupy for GPU calculation. * pla

Robotic Systems Lab - Legged Robotics at ETH Zürich 183 Dec 19, 2022
🧠 A PyTorch implementation of 'Deep CORAL: Correlation Alignment for Deep Domain Adaptation.', ECCV 2016

Deep CORAL A PyTorch implementation of 'Deep CORAL: Correlation Alignment for Deep Domain Adaptation. B Sun, K Saenko, ECCV 2016' Deep CORAL can learn

Andy Hsu 200 Dec 25, 2022
A method that utilized Generative Adversarial Network (GAN) to interpret the black-box deep image classifier models by PyTorch.

A method that utilized Generative Adversarial Network (GAN) to interpret the black-box deep image classifier models by PyTorch.

Yunxia Zhao 3 Dec 29, 2022
This repository contains the exercises and its solution contained in the book "An Introduction to Statistical Learning" in python.

An-Introduction-to-Statistical-Learning This repository contains the exercises and its solution contained in the book An Introduction to Statistical L

2.1k Jan 02, 2023
Keyword-BERT: Keyword-Attentive Deep Semantic Matching

project discription An implementation of the Keyword-BERT model mentioned in my paper Keyword-Attentive Deep Semantic Matching (Plz cite this github r

1 Nov 14, 2021
Block-wisely Supervised Neural Architecture Search with Knowledge Distillation (CVPR 2020)

DNA This repository provides the code of our paper: Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Illustration of DNA

Changlin Li 215 Dec 19, 2022