Lightweight Machine Learning Experiment Logging 📖

Overview

A Lightweight Logger for ML Experiments 📖


Simple logging of statistics, model checkpoints, plots and other objects for your Machine Learning Experiments (MLE). Furthermore, the MLELogger comes with smooth multi-seed result aggregation and combination of multi-configuration runs. For a quickstart, check out the notebook blog 🚀

The API 🎮

from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch')

time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 0.1234, 'test_loss': 0.1235}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()

You can also log model checkpoints, matplotlib figures, and other .pkl-compatible objects.

# Save a model (torch, tensorflow, sklearn, jax, numpy)
import torchvision.models as models
model = models.resnet18()
log.save_model(model)

# Save a matplotlib figure as .png
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
log.save_plot(fig)

# You can also save (somewhat) arbitrary objects as .pkl
some_dict = {"hi" : "there"}
log.save_extra(some_dict)

Or do everything in a single line...

log.update(time_tic, stats_tic, model, fig, some_dict, save=True)

File Structure & Re-Loading 📚

The MLELogger will create a nested directory, which looks as follows:

experiment_dir
├── extra: Stores saved .pkl object files
├── figures: Stores saved .png figures
├── logs: Stores .hdf5 log files (meta, stats, time)
├── models: Stores different model checkpoints
│   ├── final: Stores most recent checkpoint
│   ├── every_k: Stores every k-th checkpoint provided in update
│   └── top_k: Stores portfolio of top-k checkpoints based on performance
├── tboards: Stores tensorboards for model checkpointing
└── .json: Copy of configuration file (if provided)

For visualization and post-processing, load the results via:

from mle_logging import load_log
log_out = load_log("experiment_dir/")

# The results can be accessed via meta, stats and time keys
# >>> log_out.meta.keys()
# odict_keys(['experiment_dir', 'extra_storage_paths', 'fig_storage_paths', 'log_paths', 'model_ckpt', 'model_type'])
# >>> log_out.stats.keys()
# odict_keys(['test_loss', 'train_loss'])
# >>> log_out.time.keys()
# odict_keys(['time', 'num_epochs', 'num_updates', 'time_elapsed'])
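
For downstream analysis you can pull the individual series out of these containers and, for example, plot them yourself (a minimal sketch assuming dictionary-style access to the stats and time entries):

import matplotlib.pyplot as plt

# Grab a logged statistic and its corresponding time axis
train_loss = log_out.stats["train_loss"]
num_updates = log_out.time["num_updates"]

# Plot and store the learning curve manually
fig, ax = plt.subplots()
ax.plot(num_updates, train_loss)
ax.set_xlabel("num_updates")
ax.set_ylabel("train_loss")
fig.savefig("train_loss_curve.png")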

If an experiment was aborted, you can reload and continue the previous run via the reload=True option:

log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch',
                reload=True)
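
After reloading, you can simply continue logging as before (a small usage sketch continuing the example above):

# Append new results to the reloaded log and write them to disk
time_tic = {'num_updates': 20, 'num_epochs': 2}
stats_tic = {'train_loss': 0.1134, 'test_loss': 0.1135}
log.update(time_tic, stats_tic, save=True)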

Installation ⏳

A PyPI installation is available via:

pip install mle-logging

Alternatively, you can clone this repository and install it manually:

git clone https://github.com/RobertTLange/mle-logging.git
cd mle-logging
pip install -e .

Advanced Options 🚴

Merging Multiple Logs 👫

Merging Multiple Random Seeds 🌱 + 🌱

from mle_logging import merge_seed_logs
merge_seed_logs("multi_seed.hdf", "experiment_dir/")
log_out = load_log("experiment_dir/")
# >>> log.eval_ids
# ['seed_1', 'seed_2']

Merging Multiple Configurations 🔖 + 🔖

from mle_logging import merge_config_logs, load_meta_log
merge_config_logs(experiment_dir="experiment_dir/",
                  all_run_ids=["config_1", "config_2"])
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
# >>> log.eval_ids
# ['config_2', 'config_1']
# >>> meta_log.config_1.stats.test_loss.keys()
# odict_keys(['mean', 'std', 'p50', 'p10', 'p25', 'p75', 'p90'])

Plotting of Logs 🧑‍🎨

meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
meta_log.plot("train_loss", "num_updates")

Storing Checkpoint Portfolios 📂

Logging every k-th checkpoint update ❗ ⏩ ... ⏩ ❗

# Save every second checkpoint provided in log.update (stored in models/every_k)
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir='every_k_dir/',
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_every_k_ckpt=2)

Logging top-k checkpoints based on metric 🔱

# Save top-3 checkpoints provided in log.update (stored in models/top_k)
# Based on minimizing the test_loss metric
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="top_k_dir/",
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_top_k_ckpt=3,
                top_k_metric_name="test_loss",
                top_k_minimize_metric=True)

Development & Milestones for Next Release

You can run the test suite via python -m pytest -vv tests/. If you find a bug or are missing your favourite feature, feel free to contact me @RobertTLange or create an issue 🤗. Here are some features I want to implement for the next release:

  • Add a progress bar if total number of updates is specified
  • Add Weights and Biases Backend Support
  • Extend Tensorboard logging (for JAX/TF models)

Comments

  • Make `pickle5` requirement Python version dependent

    The pickle5 dependency forces python < 3.8. If I understand it correctly, pickle5 is only there to backport pickle features that were added with Python 3.8, right? I modified the dependency to only apply for Python < 3.8. With this I was able to install mle-logging in my Python 3.9 environment.

    I also modified the only place where pickle5 was used. Didn't test anything, I was hoping this PR would trigger some tests to make sure I didn't break anything (didn't want to install all those test dependencies locally :P).
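
    For reference, a minimal sketch of the kind of change described here (the exact file locations are assumptions):

    import sys

    # In requirements.txt / setup.py the dependency would carry an environment
    # marker, e.g.:  pickle5; python_version < "3.8"
    if sys.version_info < (3, 8):
        import pickle5 as pickle  # backport of the protocol-5 pickle features
    else:
        import pickle

    # pickle.dumps / pickle.loads then behave the same on any Python version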

    opened by denisalevi 2
  • Missing sample json config files break colab demo

    Hello!

    Just read your blogpost and ~50% of the way through the colab demo, and I have to say that so far it looks like this project has the potential to be profoundly clarifying in how it simplifies & abstracts various pieces of key experiment logic that otherwise suffers from unnecessary complexity. As a PhD student who has had to refactor my whole experimental configuration workflow more times than I would like to admit to even myself, I'm super excited to try out your logger!

    I'd also like to commend you for how to-the-point your choice of explanatory examples was for the blogpost. Too many frameworks fill their docs with a bunch of overly-simplistic toy problems and fail to bridge the gap between these and a real experimental situation (e.g. the elegant layout of your multi-seed, multi-config experiment).

    That said, my experience working through your demo was interrupted once I reached the section "Log Different Random Seeds for Same Configuration". It seems this code cell references a file called "config_1.json", which doesn't exist. While I'm sure I could figure out a simple json file with 1-2 example items, this kind of guesswork distracts immensely from the otherwise very elegant flow from simple to complex that you've set up. I also assume your target audience stretches further than experienced coders, so providing a simple demo config file to reduce the time from reading->coding seems worthwhile.

    tldr; the colab needs 1-2 demo config json files

    opened by JacobARose 1
  • Add `wandb` support

    I want to add a Weights & Biases backend which performs automatic grouping across seeds/search experiments. The credentials can be passed as options at initialization of MLELogger and a WandbLogger object has to be added.

    When calling log.update this will then automatically forward all info with correct grouping by project/search/config/seed to W&B.

    Think about how to integrate gradients/weights from flax/jax models in a natural way (tree flattening?).
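
    A rough sketch of what such a backend could look like (the WandbLogger class and its grouping arguments below are illustrative assumptions, not an existing part of mle-logging):

    import wandb

    class WandbLogger:
        """Hypothetical W&B backend forwarding log.update data with run grouping."""

        def __init__(self, project, group=None, job_type=None, run_name=None,
                     config=None):
            # Group runs e.g. by search/config (group) and by seed (job_type)
            self.run = wandb.init(project=project, group=group, job_type=job_type,
                                  name=run_name, config=config)

        def update(self, time_tic, stats_tic):
            # Forward both the time and the statistics dictionaries to W&B
            wandb.log({**time_tic, **stats_tic})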

    opened by RobertTLange 0
  • Merge `experiment_dir` for different seeds into single one

    I would like to have utilities for merging two experiments which are identical except for the seed_id they used (probably only for the multiple-configs case). Steps should include something like this:

      1. Check that experiments are actually identical.
      2. Identify different seeds.
      3. Create new results directory.
      4. Copy over extra/, figures/ for different seeds.
      5. Open both logs (for all configs) and combine them.
      6. Clean up old directories for different experiments.
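
    A rough sketch of these steps (the merge_seed_experiments helper and the layout assumptions below are purely illustrative, not part of the package; steps 1, 2 and 6 are omitted):

    import shutil
    from pathlib import Path
    from mle_logging import merge_seed_logs

    def merge_seed_experiments(dir_a: str, dir_b: str, target_dir: str) -> None:
        """Hypothetical helper combining two single-seed experiment directories."""
        target = Path(target_dir)
        target.mkdir(parents=True, exist_ok=True)
        # Steps 3 & 4: create the new results dir & copy extra/, figures/, logs/
        # (assumes per-seed file names do not collide across the two experiments)
        for source in (Path(dir_a), Path(dir_b)):
            for sub in ("extra", "figures", "logs"):
                if (source / sub).exists():
                    shutil.copytree(source / sub, target / sub, dirs_exist_ok=True)
        # Step 5: combine the copied per-seed .hdf5 logs into one aggregated file
        # (assumes merge_seed_logs picks up the per-seed logs in the new directory)
        merge_seed_logs(str(target / "merged_seeds.hdf5"), str(target))
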
    opened by RobertTLange 0
  • [Bug] "OSError: Can't write data" if `what_to_track` has certain Types

    Code to recreate:

    from mle_logging import MLELogger
    
    # Instantiate logging to experiment_dir
    log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                    what_to_track=['train_loss', 'test_loss'],
                    experiment_dir="experiment_dir/",
                    config_dict={"train_config": {"lrate": 0.01}},
                    use_tboard=False,
                    model_type='torch',
                    print_every_k_updates=1,
                    verbose=True)
    
    # Save some time series statistics
    time_tic = {'num_updates': 10, 'num_epochs': 1}
    stats_tic = {'train_loss': 1, 'test_loss': 1}
    
    # Update the log with collected data & save it to .hdf5
    log.update(time_tic, stats_tic)
    log.save()
    

    Output from the console:

    Traceback (most recent call last):
      File "mle-log-test.py", line 19, in <module>
        log.save()
      File "/home/luc/.local/lib/python3.8/site-packages/mle_logging/mle_logger.py", line 417, in save
        write_to_hdf5(
      File "/home/luc/.local/lib/python3.8/site-packages/mle_logging/utils.py", line 74, in write_to_hdf5
        h5f.create_dataset(
      File "/home/luc/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 149, in create_dataset
        dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
      File "/home/luc/.local/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 143, in make_new_dset
        dset_id.write(h5s.ALL, h5s.ALL, data)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
      File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
    OSError: Can't write data (no appropriate function for conversion path)
    

    The above code is essentially the Getting Started code with the what_to_track Float values swapped out for Ints. If only 1 of the Floats is swapped for an Int, it still works (I guess it casts the Int to a Float?). I also found the same issue if the what_to_track values are Floats from a DeviceArray.
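
    Until this is resolved, a minimal workaround sketch (an assumption, not an official fix) is to cast the tracked values to plain Python floats before the update:

    # Cast ints / DeviceArray scalars to Python floats so h5py can write them
    stats_tic = {k: float(v) for k, v in stats_tic.items()}
    log.update(time_tic, stats_tic)
    log.save()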

    Please let me know if you have any suggestions or questions!

    opened by DiamonDiva 0
Releases (v0.0.4)
  • v0.0.4 (Dec 7, 2021)

    • [x] Add plot details (title, labels) to meta_log.plot()
    • [x] Get rid of time string in sub directories
    • [x] Make log merging more robust
    • [x] Small fixes for mle-monitor release
    • [x] Fix overwrite and make verbose warning
  • v0.0.3 (Sep 11, 2021)

    🎉 Mini-release getting rid of small bugs and adding functionality (🐛 & 📈):

    1. Add function to store initial model checkpoint for post-processing via log.save_init_model(model).

    2. Fix byte decoding for strings stored as arrays in .hdf5 log file. Previously this only worked for multi seed/config settings.

    3. MLELogger got a new optional argument: config_dict, which allows you to provide a (nested) configuration of your experiment. It will be stored as a .yaml file if you don't provide a path to an alternative configuration file. The file can either be a .json or a .yaml:

    log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                    what_to_track=['train_loss', 'test_loss'],
                    experiment_dir="experiment_dir/",
                    config_dict={"train_config": {"lrate": 0.01}},
                    model_type='torch',
                    verbose=True)
    
    4. The config_dict / loaded config_fname data will be stored in the metadata of the loaded log and can be easily retrieved:
    log = load_log("experiment_dir/")
    log.meta.config_dict
    
  • v0.0.1 (Aug 18, 2021)
