Repository for DCA0305, an undergraduate course about Machine Learning Workflows and Pipelines

Federal University of Rio Grande do Norte

Technology Center

Department of Computer Engineering and Automation

Machine Learning Based Systems Design

References

  • 📚 Noah Gift, Alfredo Deza. Practical MLOps: Operationalizing Machine Learning Models. [Link]
  • 📚 Chip Huyen. Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. [Link]
  • 📚 Hannes Hapke, Catherine Nelson. Building Machine Learning Pipelines. [Link]
  • 📚 Mariano Anaya. Clean Code in Python. [Link]
  • 📚 Aurélien Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. [Link]
  • 🤜 Dataquest Academic Program [Link]
  • 😃 CS329S - ML Systems Design [Link]
  • 🎯 Machine Learning Operations [Link]

Lessons

Week 01: Course Outline Open in PDF

  • Git and Version Control Open in Dataquest
    • You'll learn how to: a) organize your code using version control, b) resolve conflicts in version control, c) use Git and GitHub to collaborate with others.
    • 👊 U1T1: guided project + getting a git repository.

Week 02: CLI fundamentals

  • Elements of the Command Line Open in Dataquest
    • You'll learn how to: a) use the command line for Data Science, b) modify the behavior of commands with options, c) use glob patterns and wildcards, d) define important command-line concepts, e) navigate the filesystem, f) manage users and permissions.
  • Text Processing in the Command Line Open in Dataquest
    • You'll learn how to: a) read and explore documentation, b) perform basic text processing, c) redirect and pipe output, d) inspect files, e) define different kinds of output, f) employ streams and file descriptors.
  • 🔠 U1T2: working with command line.

Week 03: Clean Code Principles for Data Science and Machine Learning Open in PDF

  • Outline Open in Loom
  • Coding Best Practices Open in Loom
  • Writing Clean Code Open in Loom
  • Refactoring Code Open in Loom
  • Efficient Code Open in Loom
  • Documentation Open in Loom
  • Python Code Quality Authority (PCQA) - pycodestyle Open in Loom
  • PCQA - pylint Open in Loom
  • PCQA - autopep8 Open in Loom
  • PCQA - nbQA Open in Loom
  • ▶️ Hands on
    • 💾 Datasets [Link]
    • Writing Clean Code Jupyter
    • Exercise 01 Jupyter
    • Exercise 02 Jupyter
    • Exercise 03 Jupyter
    • Using pycodestyle Jupyter
    • Using pylint: script Python, refactored script Python
    • Functions: Advanced - Best practices for writing functions Open in Dataquest
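
As a small illustration of the clean-code topics listed for this week (descriptive names, docstrings, and the style rules that tools such as pycodestyle and pylint check), the sketch below shows a terse function next to a refactored version. The function names and the sample input are hypothetical and are not taken from the course notebooks.

```python
"""Before/after refactoring sketch (all names here are hypothetical)."""


def f(l):  # before: cryptic names and no docstring
    s = 0
    for x in l:
        s += x
    return s / len(l)


def mean_rating(ratings):
    """Return the arithmetic mean of a non-empty list of ratings."""
    if not ratings:
        # Fail with a clear message instead of a ZeroDivisionError.
        raise ValueError("ratings must not be empty")
    return sum(ratings) / len(ratings)
```

Both functions return the same value for a non-empty list; the refactored one states its intent in its name and docstring and rejects empty input explicitly.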

Week 04: Production-Ready Code Open in PDF

  • Outline Open in Loom
  • Catching Errors Open in Loom
  • Testing and Data Science Open in Loom
  • A brief introduction to pytest Open in Loom
  • Logging Open in Loom
  • Case study: testing and logging Open in Loom
  • Model Drift Open in Loom
  • Hands on
    • Production ready code Jupyter
    • Data Visualization Fundamentals Open in Dataquest
      • You will learn: a) how to use data visualization to explore data and b) how and when to use the most common plots.
    • Storytelling Data Visualization and Information Design Open in Dataquest
      • You will learn how to: a) create graphs using information design principles, b) create narrative data visualizations using Matplotlib, c) create visual patterns using Gestalt principles, d) control attention using pre-attentive attributes and e) use Matplotlib's built-in styles.
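
The error-catching, logging, and pytest topics from this week can be sketched together in one short file. This is a minimal example under stated assumptions: `safe_divide` and the test names are hypothetical and do not come from the course's case study.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
logger = logging.getLogger(__name__)


def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Log the failure rather than letting the exception propagate.
        logger.error("Division by zero: numerator=%s", numerator)
        return None
    logger.info("Division succeeded")
    return result


# pytest collects functions whose names start with "test_".
def test_safe_divide_returns_quotient():
    assert safe_divide(6, 3) == 2


def test_safe_divide_handles_zero():
    assert safe_divide(1, 0) is None
```

If this were saved in a file such as test_safe_divide.py (a hypothetical name), running pytest in that directory should collect and run both test functions.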
Owner
Ivanovitch Silva
I'm an experimenter by design, and very interested in technologies related to Data Science & Machine Learning, Vehicles and Complex Networks.