This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities

Overview

MLOps with Vertex AI

This example implements the end-to-end MLOps process using Vertex AI platform and Smart Analytics technology capabilities. The example uses Keras to implement the ML model, TFX to implement the training pipeline, and the Model Builder SDK to interact with Vertex AI.

MLOps lifecycle

Getting started

  1. Set up your MLOps environment on Google Cloud.

  2. Start your AI Notebook instance.

  3. Open JupyterLab, then open a new terminal.

  4. Clone the repository to your AI Notebook instance:

    git clone https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai.git
    cd mlops-with-vertex-ai
    
  5. Install the required Python packages:

    pip install tfx==1.2.0 --user
    pip install -r requirements.txt
    

    NOTE: You can ignore the pip dependency issues. These will be fixed when upgrading to a subsequent TFX version.


  6. Upgrade the gcloud components:

    sudo apt-get install google-cloud-sdk
    gcloud components update
    

Dataset Management

The Chicago Taxi Trips dataset is one of the public datasets hosted in BigQuery. It includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. The task is to predict whether a given trip will result in a tip > 20%.
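
As a minimal sketch of how the prediction target could be derived, the function below labels a trip as positive when the tip exceeds 20% of the fare. Measuring the tip relative to the fare (rather than the trip total), the handling of non-positive fares, and the name tip_label are assumptions made for illustration; the notebooks define the actual label.

```python
def tip_label(tips: float, fare: float, threshold: float = 0.2) -> int:
    """Return 1 if the tip is strictly greater than `threshold` of the fare, else 0."""
    if fare <= 0:
        # Assumption: a trip without a positive fare has no meaningful tip ratio,
        # so it is treated as a negative example.
        return 0
    return 1 if tips / fare > threshold else 0


if __name__ == "__main__":
    # A 3.00 tip on a 10.00 fare is 30% of the fare, so the label is positive.
    print(tip_label(tips=3.0, fare=10.0))
```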

The 01-dataset-management notebook covers:

  1. Performing exploratory data analysis on the data in BigQuery.
  2. Creating a Vertex AI Dataset resource using the Python SDK.
  3. Generating the schema for the raw data using TensorFlow Data Validation.

ML Development

We experiment with creating a Custom Model using the 02-experimentation notebook, which covers:

  1. Preparing the data using Dataflow.
  2. Implementing a Keras classification model.
  3. Training the Keras model with Vertex AI using a pre-built container.
  4. Uploading the exported model from Cloud Storage to Vertex AI.
  5. Extracting and visualizing experiment parameters from Vertex AI Metadata.
  6. Using Vertex AI for hyperparameter tuning.
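
To make the "Keras classification model" step more concrete, here is a rough sketch of a small binary classifier. The layer sizes, the hyperparameter names, and the assumption that the inputs arrive as a single dense feature vector are all hypothetical; the model in the 02-experimentation notebook may be structured quite differently.

```python
from typing import Any, Dict


def default_hyperparams() -> Dict[str, Any]:
    """Hypothetical default hyperparameters, chosen only for illustration."""
    return {"hidden_units": [64, 32], "learning_rate": 0.001}


def build_classifier(num_features: int, hparams: Dict[str, Any]):
    """Build and compile a feed-forward binary classifier with Keras."""
    # Imported lazily so that the hyperparameter helper works without TensorFlow.
    import tensorflow as tf

    inputs = tf.keras.Input(shape=(num_features,), name="features")
    x = inputs
    for units in hparams["hidden_units"]:
        x = tf.keras.layers.Dense(units, activation="relu")(x)
    # A single sigmoid unit outputs the probability of the positive class.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=hparams["learning_rate"]),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Keeping the hyperparameters in a plain dictionary makes it straightforward to log them as experiment parameters or to override them during hyperparameter tuning.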

We use Vertex TensorBoard and Vertex ML Metadata to track, visualize, and compare ML experiments.

In addition, the training steps are formalized by implementing a TFX pipeline. The 03-training-formalization notebook covers implementing and testing the pipeline components interactively.

Training Operationalization

The 04-pipeline-deployment notebook covers executing the CI/CD steps for the training pipeline deployment using Cloud Build. The CI/CD routine is defined in the pipeline-deployment.yaml file, and consists of the following steps:

  1. Clone the repository to the build environment.
  2. Run unit tests.
  3. Run a local e2e test of the TFX pipeline.
  4. Build the ML container image for pipeline steps.
  5. Compile the pipeline.
  6. Upload the pipeline to Cloud Storage.

Continuous Training

After testing, compiling, and uploading the pipeline definition to Cloud Storage, the pipeline is executed in response to a trigger. We use Cloud Functions and Cloud Pub/Sub as the triggering mechanism. The Cloud Function listens to the Pub/Sub topic and runs the training pipeline when a message is published to that topic. The Cloud Function is implemented in src/pipeline_triggering.
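
The triggering logic can be sketched roughly as follows. This is not the code in src/pipeline_triggering: the names parse_pubsub_message and make_handler are hypothetical, the message is assumed to carry a JSON object of pipeline parameters, and the actual pipeline submission is abstracted behind a submit callable. What the sketch does rely on is that a Pub/Sub-triggered background function receives an event whose "data" field is base64-encoded.

```python
import base64
import json
from typing import Any, Callable, Dict


def parse_pubsub_message(event: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the base64-encoded JSON payload of a Pub/Sub event."""
    data = event.get("data")
    if not data:
        # An empty message means "run with the default parameters".
        return {}
    return json.loads(base64.b64decode(data).decode("utf-8"))


def make_handler(submit: Callable[[Dict[str, Any]], Any]):
    """Return a background-function style handler that calls `submit`.

    `submit` is a placeholder for whatever submits the compiled pipeline
    to Vertex Pipelines with the given parameter values.
    """

    def trigger_pipeline(event: Dict[str, Any], context: Any) -> Any:
        parameters = parse_pubsub_message(event)
        return submit(parameters)

    return trigger_pipeline


if __name__ == "__main__":
    # Simulate a published message locally, printing instead of submitting.
    payload = base64.b64encode(json.dumps({"num_epochs": 5}).encode("utf-8"))
    handler = make_handler(print)
    handler({"data": payload.decode("utf-8")}, None)
```

Separating message parsing from submission keeps the handler testable without access to Google Cloud.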

The 05-continuous-training notebook covers:

  1. Creating a Cloud Pub/Sub topic.
  2. Deploying a Cloud Function.
  3. Triggering the pipeline.

The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

  1. Receive hyperparameters using the hyperparam_gen custom Python component.
  2. Extract data from BigQuery using the BigQueryExampleGen component.
  3. Validate the raw data using the StatisticsGen and ExampleValidator components.
  4. Process the data on Dataflow using the Transform component.
  5. Train a custom model with Vertex AI using the Trainer component.
  6. Evaluate and validate the custom model using the ModelEvaluator component.
  7. Save the blessed model to the model registry location in Cloud Storage using the Pusher component.
  8. Upload the model to Vertex AI using the vertex_model_pusher custom Python component.
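
Steps 6 and 7 above form a gate: only a model that passes validation (a "blessed" model) is pushed. The sketch below illustrates that gating idea in plain Python. It is not the TFX API; the metric name, the 0.8 threshold, and the function names are hypothetical, and in the real pipeline the evaluation component decides blessing from its own configuration.

```python
from typing import Callable, Dict, Optional


def is_blessed(
    candidate_metrics: Dict[str, float],
    accuracy_threshold: float = 0.8,
    baseline_metrics: Optional[Dict[str, float]] = None,
) -> bool:
    """Bless the candidate if it meets the threshold and does not regress."""
    accuracy = candidate_metrics.get("accuracy", 0.0)
    if accuracy < accuracy_threshold:
        return False
    if baseline_metrics is not None:
        # Do not replace a previously blessed model with a worse one.
        return accuracy >= baseline_metrics.get("accuracy", 0.0)
    return True


def push_if_blessed(
    candidate_metrics: Dict[str, float],
    push: Callable[[], None],
    baseline_metrics: Optional[Dict[str, float]] = None,
) -> bool:
    """Call `push` only for a blessed model; return whether it was pushed."""
    if is_blessed(candidate_metrics, baseline_metrics=baseline_metrics):
        push()
        return True
    return False


if __name__ == "__main__":
    pushed = push_if_blessed({"accuracy": 0.85}, push=lambda: print("pushing model"))
    print("pushed:", pushed)
```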

Model Deployment

The 06-model-deployment notebook covers executing the CI/CD steps for the model deployment using Cloud Build. The CI/CD routine is defined in the build/model-deployment.yaml file, and consists of the following steps:

  1. Test model interface.
  2. Create an endpoint in Vertex AI.
  3. Deploy the model to the endpoint.
  4. Test the Vertex AI endpoint.

Prediction Serving

We serve the deployed model for prediction. The 07-prediction-serving notebook covers:

  1. Using the Vertex AI endpoint for online prediction.
  2. Using the Vertex AI uploaded model for batch prediction.
  3. Running the batch prediction using Vertex Pipelines.
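
For online prediction, a request might look roughly like the sketch below, which uses the Vertex AI SDK for Python. The feature names, the instance format, and the project, region, and endpoint identifiers are placeholders assumed for illustration; the instance format your deployed model expects depends on its serving signature.

```python
from typing import Any, Dict, List


def build_instances(
    trips: List[Dict[str, Any]], feature_names: List[str]
) -> List[Dict[str, Any]]:
    """Keep only the model's input features, dropping e.g. the label column."""
    return [{name: trip[name] for name in feature_names} for trip in trips]


def predict_online(
    project: str, region: str, endpoint_id: str, instances: List[Dict[str, Any]]
) -> List[Any]:
    """Send the instances to a deployed Vertex AI endpoint."""
    # Imported lazily so build_instances can be used without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    endpoint = aiplatform.Endpoint(endpoint_name=endpoint_id)
    return endpoint.predict(instances=instances).predictions


if __name__ == "__main__":
    # Hypothetical trip record; the real feature set is defined in the notebooks.
    trips = [{"trip_miles": 2.5, "payment_type": "Cash", "tip_bin": 0}]
    print(build_instances(trips, feature_names=["trip_miles", "payment_type"]))
```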

Model Monitoring

After a model is deployed for prediction serving, continuous monitoring is set up to ensure that the model continues to perform as expected. The 08-model-monitoring notebook covers configuring Vertex AI Model Monitoring for skew and drift detection:

  1. Set skew and drift thresholds.
  2. Create a monitoring job for all the models under an endpoint.
  3. List the monitoring jobs.
  4. List artifacts produced by the monitoring job.
  5. Pause and delete the monitoring job.
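
To give an intuition for what a skew or drift threshold means, the sketch below compares the distribution of a categorical feature in a baseline sample against a serving sample using the L-infinity distance, and raises an alert when the distance exceeds the threshold. To our understanding, Vertex AI Model Monitoring uses this distance for categorical features and Jensen-Shannon divergence for numerical ones, but this code is a standalone illustration with hypothetical names and a hypothetical threshold, not the service's implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def distribution(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def l_infinity_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Largest absolute difference in probability over all categories."""
    categories = set(p) | set(q)
    return max((abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories), default=0.0)


def exceeds_threshold(baseline_values, serving_values, threshold: float = 0.3) -> bool:
    """Return True if the serving data has moved further than `threshold`."""
    distance = l_infinity_distance(
        distribution(baseline_values), distribution(serving_values)
    )
    return distance > threshold


if __name__ == "__main__":
    baseline = ["cash", "cash", "card", "card"]
    serving = ["cash", "card", "card", "card"]
    print(l_infinity_distance(distribution(baseline), distribution(serving)))
    print(exceeds_threshold(baseline, serving, threshold=0.2))
```

A lower threshold makes the monitoring job more sensitive, at the cost of more alerts on harmless fluctuations.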

Metadata Tracking

You can view the parameters and metrics logged by your experiments, as well as the artifacts and metadata stored by your Vertex Pipelines in Cloud Console.

Disclaimer

This is not an official Google product but sample code provided for educational purposes.


Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Owner
Google Cloud Platform