Lightweight Machine Learning Experiment Logging 📖

Last update: Dec 08, 2022

A Lightweight Logger for ML Experiments 📖

Simple logging of statistics, model checkpoints, plots and other objects for your Machine Learning Experiments (MLE). Furthermore, the MLELogger comes with smooth multi-seed result aggregation and combination of multi-configuration runs. For a quickstart, check out the notebook blog 🚀

The API 🎮

from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch')

time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 0.1234, 'test_loss': 0.1235}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()

You can also log model checkpoints, matplotlib figures and other .pkl compatible objects.

# Save a model (torch, tensorflow, sklearn, jax, numpy)
import torchvision.models as models
model = models.resnet18()
log.save_model(model)

# Save a matplotlib figure as .png
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
log.save_plot(fig)

# You can also save (somewhat) arbitrary objects as .pkl
some_dict = {"hi" : "there"}
log.save_extra(some_dict)

Or do everything in a single line...

log.update(time_tic, stats_tic, model, fig, some_dict, save=True)

File Structure & Re-Loading 📚

The MLELogger will create a nested directory, which looks as follows:

experiment_dir
├── extra: Stores saved .pkl object files
├── figures: Stores saved .png figures
├── logs: Stores .hdf5 log files (meta, stats, time)
├── models: Stores different model checkpoints
│   ├── final: Stores most recent checkpoint
│   ├── every_k: Stores every k-th checkpoint provided in update
│   └── top_k: Stores portfolio of top-k checkpoints based on performance
├── tboards: Stores tensorboards for model checkpointing
└── .json: Copy of configuration file (if provided)

For visualization and post-processing load the results via

from mle_logging import load_log
log_out = load_log("experiment_dir/")

# The results can be accessed via meta, stats and time keys
# >>> log_out.meta.keys()
# odict_keys(['experiment_dir', 'extra_storage_paths', 'fig_storage_paths', 'log_paths', 'model_ckpt', 'model_type'])
# >>> log_out.stats.keys()
# odict_keys(['test_loss', 'train_loss'])
# >>> log_out.time.keys()
# odict_keys(['time', 'num_epochs', 'num_updates', 'time_elapsed'])

If an experiment was aborted, you can reload and continue the previous run via the reload=True option:

log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch',
                reload=True)

Installation ⏳

A PyPI installation is available via:

pip install mle-logging

Alternatively, you can clone this repository and afterwards 'manually' install it:

git clone https://github.com/RobertTLange/mle-logging.git
cd mle-logging
pip install -e .

Advanced Options 🚴

Merging Multiple Logs 👫

Merging Multiple Random Seeds 🌱 + 🌱

from mle_logging import load_log, merge_seed_logs
merge_seed_logs("multi_seed.hdf", "experiment_dir/")
log_out = load_log("experiment_dir/")
# >>> log_out.eval_ids
# ['seed_1', 'seed_2']

Merging Multiple Configurations 🔖 + 🔖

from mle_logging import merge_config_logs, load_meta_log
merge_config_logs(experiment_dir="experiment_dir/",
                  all_run_ids=["config_1", "config_2"])
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
# >>> meta_log.eval_ids
# ['config_2', 'config_1']
# >>> meta_log.config_1.stats.test_loss.keys()
# odict_keys(['mean', 'std', 'p50', 'p10', 'p25', 'p75', 'p90'])
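
To make the aggregated keys above more concrete, here is a minimal, dependency-free sketch of how a metric could be summarised across seeds, one time step at a time. It is not the library's implementation: the function name `aggregate_over_seeds` is hypothetical, only `mean`, `std` and `p50` are computed, and using the population standard deviation is an assumption.

```python
import statistics


def aggregate_over_seeds(seed_curves):
    """Summarise one metric across seeds, time step by time step.

    seed_curves maps a seed id to a list of metric values (one per logged step).
    Hypothetical helper for illustration; not part of mle-logging.
    """
    curves = list(seed_curves.values())
    if len({len(curve) for curve in curves}) != 1:
        raise ValueError("all seeds must log the same number of steps")

    summary = {"mean": [], "std": [], "p50": []}
    for values_at_step in zip(*curves):
        summary["mean"].append(statistics.fmean(values_at_step))
        # Population standard deviation (an assumption, see the text above).
        summary["std"].append(statistics.pstdev(values_at_step))
        summary["p50"].append(statistics.median(values_at_step))
    return summary


test_loss = {
    "seed_1": [1.0, 0.5, 0.25],
    "seed_2": [3.0, 1.5, 0.75],
}
summary = aggregate_over_seeds(test_loss)
print(summary["mean"])
```

The remaining percentiles (`p10`, `p25`, `p75`, `p90`) would follow the same per-step pattern.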

Plotting of Logs 🧑‍🎨

meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
meta_log.plot("train_loss", "num_updates")

Storing Checkpoint Portfolios 📂

Logging every k-th checkpoint update ❗ ⏩ ... ⏩ ❗

# Save every second checkpoint provided in log.update (stored in models/every_k)
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir='every_k_dir/',
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_every_k_ckpt=2)
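
The selection rule behind `save_every_k_ckpt` can be pictured with a small counter. This is only a sketch under an assumption: that the k-th, 2k-th, ... checkpoint handed to `log.update` is the one that gets stored. The class `EveryKCounter` is hypothetical and does not exist in mle-logging.

```python
class EveryKCounter:
    """Decide which checkpoints to keep when storing every k-th one.

    Hypothetical helper for illustration; not part of mle-logging.
    """

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be a positive integer")
        self.k = k
        self.seen = 0  # number of checkpoints provided so far

    def should_store(self):
        """Register one more checkpoint and report whether to keep it."""
        self.seen += 1
        return self.seen % self.k == 0


counter = EveryKCounter(k=2)
decisions = [counter.should_store() for _ in range(6)]
print(decisions)
```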

Logging top-k checkpoints based on metric 🔱

# Save top-3 checkpoints provided in log.update (stored in models/top_k)
# Based on minimizing the test_loss metric
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="top_k_dir/",
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_top_k_ckpt=3,
                top_k_metric_name="test_loss",
                top_k_minimize_metric=True)
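
Conceptually, a top-k portfolio keeps a ranked list of the best scores seen so far and evicts the worst entry once the list grows beyond k. The sketch below illustrates that idea only; `TopKPortfolio` is a hypothetical class and not how mle-logging implements it. The `minimize` flag plays the role of `top_k_minimize_metric`.

```python
class TopKPortfolio:
    """Track the k best (score, checkpoint_id) pairs seen so far.

    Hypothetical helper for illustration; not part of mle-logging.
    """

    def __init__(self, k, minimize=True):
        self.k = k
        self.minimize = minimize
        self.entries = []  # (score, checkpoint_id), best entry first

    def update(self, score, checkpoint_id):
        """Add a checkpoint; return True if it stays in the portfolio."""
        self.entries.append((score, checkpoint_id))
        # Stable sort: on ties the older checkpoint keeps its rank.
        self.entries.sort(key=lambda entry: entry[0], reverse=not self.minimize)
        dropped = self.entries[self.k:]
        self.entries = self.entries[:self.k]
        return (score, checkpoint_id) not in dropped


portfolio = TopKPortfolio(k=3, minimize=True)
losses = [0.5, 0.3, 0.9, 0.1, 0.7]
kept_flags = [
    portfolio.update(loss, f"update_{i}")
    for i, loss in enumerate(losses, start=1)
]
print(portfolio.entries)
```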

Development & Milestones for Next Release

You can run the test suite via python -m pytest -vv tests/. If you find a bug or are missing your favourite feature, feel free to contact me @RobertTLange or create an issue 🤗 . Here are some features I want to implement for the next release:

Add a progress bar if total number of updates is specified
Add Weights and Biases Backend Support
Extend Tensorboard logging (for JAX/TF models)

Comments

Make `pickle5` requirement Python version dependent

The pickle5 dependency forces python < 3.8. If I understand it correctly, pickle5 is only there to backport pickle features that were added with Python 3.8, right? I modified the dependency to only apply for Python < 3.8. With this I was able to install mle-logging in my Python 3.9 environment.

I also modified the only place where pickle5 was used. Didn't test anything, I was hoping this PR would trigger some tests to make sure I didn't break anything (didn't want to install all those test dependencies locally :P).
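
For readers following along, a version-dependent import along the lines described above might look like the sketch below. This is an assumption about the general pattern, not a copy of the PR's diff. On the packaging side, the usual way to express the same condition is an environment marker such as `pickle5; python_version < "3.8"`.

```python
import sys

if sys.version_info < (3, 8):
    # Older interpreters need the backport to get pickle protocol 5.
    import pickle5 as pickle
else:
    # Python 3.8+ ships protocol 5 in the standard library.
    import pickle

payload = {"hi": "there"}
# Round-trip a small object with protocol 5 to confirm it is available.
restored = pickle.loads(pickle.dumps(payload, protocol=5))
print(restored)
```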

opened by denisalevi 2

Missing sample json config files break colab demo

Hello!

Just read your blogpost and ~50% of the way through the colab demo, and I have to say that so far it looks like this project has the potential to be profoundly clarifying in how it simplifies & abstracts various pieces of key experiment logic that otherwise suffers from unnecessary complexity. As a PhD student who has had to refactor my whole experimental configuration workflow more times than I would like to admit to even myself, I'm super excited to try out your logger!

I'd also like to commend you for how to-the-point your choice of explanatory examples was for the blogpost. Too many frameworks fill their docs with a bunch of overly-simplistic toy problems and fail to bridge the gap between these and a real experimental situation (e.g. the elegant layout of your multi-seed, multi-config experiment).

That said, my experience working through your demo was interrupted once I reached the section "Log Different Random Seeds for Same Configuration". It seems this code cell references a file called "config_1.json", which doesn't exist. While I'm sure I could figure out a simple json file with 1-2 example items, this kind of guesswork distracts immensely from the otherwise very elegant flow from simple to complex that you've set up. I also assume your target audience stretches further than experienced coders, so providing a simple demo config file to reduce the time from reading to coding seems worthwhile.

tldr; the colab needs 1-2 demo config json files
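
Until such files are provided, a placeholder `config_1.json` can be written with a few lines of standard-library Python. The contents below are an assumption: they simply reuse the `{"train_config": {"lrate": 0.01}}` dictionary that appears in the `config_dict` examples on this page, and the temporary directory is only there to keep the sketch self-contained.

```python
import json
import tempfile
from pathlib import Path

# Placeholder configuration; the real demo may expect different keys.
demo_config = {"train_config": {"lrate": 0.01}}

# Write the file into a throwaway directory (use your working dir in a notebook).
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "config_1.json"
with open(config_path, "w") as f:
    json.dump(demo_config, f, indent=2)

# Read it back to confirm the file is valid JSON.
with open(config_path) as f:
    loaded = json.load(f)
print(loaded)
```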

opened by JacobARose 1

Add `wandb` support

I want to add a weights&biases backend which performs automatic grouping across seeds/search experiments. The credentials can be passed as options at initialization of MLELogger and a WandbLogger object has to be added.

When calling log.update this will then automatically forward all info with correct grouping by project/search/config/seed to W&B.

Think about how to integrate gradients/weights from flax/jax models in a natural way (tree flattening?).
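
As a rough illustration of the tree-flattening idea, the sketch below turns a nested parameter dictionary into flat, slash-separated names that a metrics backend could accept. It uses plain Python dicts and lists to stay dependency-free; `flatten_tree` is a hypothetical helper, and in a real JAX setting the library's own pytree utilities would probably be the more natural tool.

```python
def flatten_tree(tree, prefix="", sep="/"):
    """Flatten a nested dict into {"outer/inner": leaf} form.

    Hypothetical helper for illustration; not part of mle-logging.
    """
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-trees, carrying the path so far.
            flat.update(flatten_tree(value, name, sep))
        else:
            flat[name] = value
    return flat


# Toy stand-in for model parameters (values are made up).
params = {
    "dense_0": {"kernel": [[0.1, 0.2]], "bias": [0.0]},
    "dense_1": {"kernel": [[0.3]], "bias": [0.1]},
}
flat_params = flatten_tree(params)
print(sorted(flat_params))
```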

opened by RobertTLange 0

Merge `experiment_dir` for different seeds into single one

I would like to have utilities for merging two experiments which are identical except for the seed_id they used (probably only for the multiple-configs case). Steps should include something like this:

1. Check that experiments are actually identical.
2. Identify different seeds.
3. Create new results directory.
4. Copy over extra/, figures/ for different seeds.
5. Open both logs (for all configs) and combine them.
6. Clean-up old directories for different experiments.
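
The directory-handling part of these steps (creating the new results directory and copying over extra/ and figures/) could be sketched with the standard library as below. This is not an existing mle-logging utility: `merge_artifact_dirs`, the directory names and the file names are all hypothetical, and combining the .hdf5 logs is deliberately left out.

```python
import shutil
import tempfile
from pathlib import Path


def merge_artifact_dirs(source_dirs, merged_dir, subdirs=("extra", "figures")):
    """Copy the files in each subdir of every source dir into merged_dir.

    Hypothetical helper for illustration; not part of mle-logging.
    """
    merged = Path(merged_dir)
    copied = []
    for source in map(Path, source_dirs):
        for sub in subdirs:
            if not (source / sub).is_dir():
                continue
            target = merged / sub
            target.mkdir(parents=True, exist_ok=True)
            for path in sorted((source / sub).iterdir()):
                if not path.is_file():
                    continue
                if (target / path.name).exists():
                    # Refuse to overwrite: equal names suggest a seed clash.
                    raise FileExistsError(f"{path.name} already exists in {target}")
                shutil.copy2(path, target / path.name)
                copied.append(f"{sub}/{path.name}")
    return copied


# Tiny demonstration with two throwaway experiment directories.
root = Path(tempfile.mkdtemp())
for seed in ("seed_1", "seed_2"):
    extra_dir = root / f"exp_{seed}" / "extra"
    extra_dir.mkdir(parents=True)
    (extra_dir / f"{seed}_extra.pkl").write_bytes(b"placeholder")

copied = merge_artifact_dirs(
    [root / "exp_seed_1", root / "exp_seed_2"], root / "merged"
)
print(copied)
```

Raising on a name clash rather than overwriting is a design choice for the sketch: it surfaces the case where two runs were not actually using different seeds.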
opened by RobertTLange 0

[Bug] "OSError: Can't write data" if `what_to_track` has certain Types

Code to recreate:

from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                config_dict={"train_config": {"lrate": 0.01}},
                use_tboard=False,
                model_type='torch',
                print_every_k_updates=1,
                verbose=True)

# Save some time series statistics
time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 1, 'test_loss': 1}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()

Output from the console:

Traceback (most recent call last):
  File "mle-log-test.py", line 19, in <module>
    log.save()
  File "/home/luc/.local/lib/python3.8/site-packages/mle_logging/mle_logger.py", line 417, in save
    write_to_hdf5(
  File "/home/luc/.local/lib/python3.8/site-packages/mle_logging/utils.py", line 74, in write_to_hdf5
    h5f.create_dataset(
  File "/home/luc/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 149, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/luc/.local/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 143, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
OSError: Can't write data (no appropriate function for conversion path)

The above code is essentially the Getting Started code with the what_to_track Float values swapped out for Ints. If only 1 of the Floats is swapped for an Int, it still works (I guess it casts the Int to a Float?). I also found the same issue if the what_to_track values are Floats from a DeviceArray.
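
Given that observation, one possible stop-gap is to cast every tracked value to a built-in float before calling log.update. The sketch below shows only that conversion step; `as_python_floats` is a hypothetical helper, and whether the cast avoids the error in every case (for instance with DeviceArray values) is an assumption based on the behaviour reported above, not something confirmed by the maintainer.

```python
def as_python_floats(stats):
    """Return a copy of the stats dict with every value cast to float.

    Hypothetical helper for illustration; not part of mle-logging.
    """
    return {name: float(value) for name, value in stats.items()}


# The integer values from the report become plain Python floats.
stats_tic = as_python_floats({"train_loss": 1, "test_loss": 1})
print(stats_tic)
```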

Please let me know if you have any suggestions or questions!

opened by DiamonDiva 0

Releases(v0.0.4)

v0.0.4(Dec 7, 2021)
[x] Add plot details (title, labels) to meta_log.plot()

[x] Get rid of time string in sub directories

[x] Make log merging more robust

[x] Small fixes for mle-monitor release

[x] Fix overwrite and make verbose warning

v0.0.3(Sep 11, 2021)
🎉 Mini-release getting rid of small bugs and adding functionality (🐛 & 📈 ) :

Add function to store initial model checkpoint for post-processing via log.save_init_model(model).

Fix byte decoding for strings stored as arrays in .hdf5 log file. Previously this only worked for multi seed/config settings.

MLELogger got a new optional argument: config_dict, which allows you to provide a (nested) configuration of your experiment. It will be stored as a .yaml file if you don't provide a path to an alternative configuration file. The file can either be a .json or a .yaml:

log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                config_dict={"train_config": {"lrate": 0.01}},
                model_type='torch',
                verbose=True)

The config_dict (or the data loaded from config_fname) will be stored in the metadata of the loaded log and can be easily retrieved:

log = load_log("experiment_dir/")
log.meta.config_dict

v0.0.2(Aug 23, 2021)

v0.0.1(Aug 18, 2021)

First release of mle-logging utilities.

Owner

Robert Lange

Deep Something @ TU Berlin 🕵️

