Deploy AutoML as a service using Flask

Overview

AutoML Service

Deploy automated machine learning (AutoML) as a service using Flask, for both pipeline training and pipeline serving.

The framework implements a fully automated time series classification pipeline, automating both feature engineering and model selection and optimization using Python libraries, TPOT and tsfresh.

Check out the blog post for more info.

Resources:

  • TPOT– Automated feature preprocessing and model optimization tool
  • tsfresh– Automated time series feature engineering and selection
  • Flask– A web development microframework for Python

Architecture

The application exposes both model training and model predictions with a RESTful API. For model training, input data and labels are sent via POST request, a pipeline is trained, and model predictions are accessible via a prediction route.

Pipelines are stored to a unique key, and thus, live predictions can be made on the same data using different feature construction and modeling pipelines.

An automated pipeline for time-series classification.

The model training logic is exposed as a REST endpoint. Raw, labeled training data is uploaded via a POST request and an optimal model is developed.

Raw training data is uploaded via a POST request and a model prediction is returned.

Using the app

View the Jupyter Notebook for an example.

Deploying

# deploy locally
python automl_service.py
# deploy on cloud foundry
cf push

Usage

Train a pipeline:

train_url = 'http://0.0.0.0:8080/train_pipeline'
train_files = {'raw_data': open('data/data_train.json', 'rb'),
               'labels'  : open('data/label_train.json', 'rb'),
               'params'  : open('parameters/train_parameters_model2.yml', 'rb')}

# post request to train pipeline
r_train = requests.post(train_url, files=train_files)
result_df = json.loads(r_train.json())

returns:

{'featureEngParams': {'default_fc_parameters': "['median', 'minimum', 'standard_deviation', 
                                                 'sum_values', 'variance', 'maximum', 
                                                 'length', 'mean']",
                      'impute_function': 'impute',
                      ...},
 'mean_cv_accuracy': 0.865,
 'mean_cv_roc_auc': 0.932,
 'modelId': 1,
 'modelType': "Pipeline(steps=[('stackingestimator', StackingEstimator(estimator=LinearSVC(...))),
                               ('logisticregression', LogisticRegressionClassifier(solver='liblinear',...))])"
 'trainShape': [1647, 8],
 'trainTime': 1.953}

Serve pipeline predictions:

serve_url = 'http://0.0.0.0:8080/serve_prediction'
test_files = {'raw_data': open('data/data_test.json', 'rb'),
              'params' : open('parameters/test_parameters_model2.yml', 'rb')}

# post request to serve predictions from trained pipeline
r_test  = requests.post(serve_url, files=test_files)
result = pd.read_json(r_test.json()).set_index('id')
example_id prediction
1 0.853
2 0.991
3 0.060
4 0.995
5 0.003
... ...

View all trained models:

r = requests.get('http://0.0.0.0:8080/models')
pipelines = json.loads(r.json())
{'1':
    {'mean_cv_accuracy': 0.873,
     'modelType': "RandomForestClassifier(...),
     ...},
 '2':
    {'mean_cv_accuracy': 0.895,
     'modelType': "GradientBoostingClassifier(...),
     ...},
 '3':
    {'mean_cv_accuracy': 0.859,
     'modelType': "LogisticRegressionClassifier(...),
     ...},
...}

Running the tests

Supply a user argument for the host.

# use local app
py.test --host http://0.0.0.0:8080
# use cloud-deployed app
py.test --host http://ROUTE-HERE

Scaling the architecture

For production, I would suggest splitting training and serving into seperate applications, and incorporating a fascade API. Also it would be best to use a shared cache such as Redis or Pivotal Cloud Cache to allow other applications and multiple instances of the pipeline to access the trained model. Here is a potential architecture.

A scalable model training and model serving architecture.

Author

Chris Rawles

Owner
Chris Rawles
...
Chris Rawles
Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale.

Model Search Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers sp

AriesTriputranto 1 Dec 13, 2021
Ml based project which uses regression technique to predict the price.

Price-Predictor Ml based project which uses regression technique to predict the price. I have used various regression models and finds the model with

Garvit Verma 1 Jul 09, 2022
PySpark + Scikit-learn = Sparkit-learn

Sparkit-learn PySpark + Scikit-learn = Sparkit-learn GitHub: https://github.com/lensacom/sparkit-learn About Sparkit-learn aims to provide scikit-lear

Lensa 1.1k Jan 04, 2023
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

[Due to the time taken @ uni, work + hell breaking loose in my life, since things have calmed down a bit, will continue commiting!!!] [By the way, I'm

Daniel Han-Chen 1.4k Jan 01, 2023
This machine learning model was developed for House Prices

This machine learning model was developed for House Prices - Advanced Regression Techniques competition in Kaggle by using several machine learning models such as Random Forest, XGBoost and LightGBM.

serhat_derya 1 Mar 02, 2022
Pandas Machine Learning and Quant Finance Library Collection

Pandas Machine Learning and Quant Finance Library Collection

148 Dec 07, 2022
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Machine Learning Notebooks, 3rd edition This project aims at teaching you the fundamentals of Machine Learning in python. It contains the example code

Aurélien Geron 1.6k Jan 05, 2023
Distributed Deep learning with Keras & Spark

Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc

Max Pumperla 1.6k Dec 29, 2022
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

imbalanced-learn imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-cla

6.2k Jan 01, 2023
LibTraffic is a unified, flexible and comprehensive traffic prediction library based on PyTorch

LibTraffic is a unified, flexible and comprehensive traffic prediction library, which provides researchers with a credibly experimental tool and a convenient development framework. Our library is imp

432 Jan 05, 2023
A simple guide to MLOps through ZenML and its various integrations.

ZenBytes Join our Slack Community and become part of the ZenML family Give the main ZenML repo a GitHub star to show your love ZenBytes is a series of

ZenML 127 Dec 27, 2022
Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

Highly interpretable, sklearn-compatible classifier based on decision rules This is a scikit-learn compatible wrapper for the Bayesian Rule List class

Tamas Madl 482 Nov 19, 2022
Simple, fast, and parallelized symbolic regression in Python/Julia via regularized evolution and simulated annealing

Parallelized symbolic regression built on Julia, and interfaced by Python. Uses regularized evolution, simulated annealing, and gradient-free optimization.

Miles Cranmer 924 Jan 03, 2023
30 Days Of Machine Learning Using Pytorch

Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch

Mayur 119 Nov 24, 2022
TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models

538 Jan 01, 2023
cleanlab is the data-centric ML ops package for machine learning with noisy labels.

cleanlab is the data-centric ML ops package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and lear

Cleanlab 51 Nov 28, 2022
使用数学和计算机知识投机倒把

偷鸡不成项目集锦 坦率地讲,涉及金融市场的好策略如果公开,必然导致使用的人多,最后策略变差。所以这个仓库只收集我目前失败了的案例。 加密货币组合套利 中国体育彩票预测 我赚不上钱的项目,也许可以帮助更有能力的人去赚钱。

Roy 28 Dec 29, 2022
A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Davis E. King 11.6k Jan 02, 2023
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

SUN Group @ UMN 28 Aug 03, 2022