onelearn: Online learning in Python

Overview

onelearn stands for ONE-shot LEARNing. It is a small Python package for online learning. It provides:

  • online (or one-shot) learning algorithms: each sample is processed once, in a single pass over the data
  • multi-class classification and regression algorithms
  • for now, only ensemble methods, namely Random Forests
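
As a rough illustration of what "a single pass" means, the sketch below feeds the data to a model in consecutive batches and never revisits a batch. `MajorityClassStub` and `fit_in_one_pass` are hypothetical names made up for this example and are not part of onelearn; the stub only counts labels so that the loop can run without any dependency.

```python
from collections import Counter


class MajorityClassStub:
    """Hypothetical stand-in for an online estimator (not part of onelearn)."""

    def __init__(self):
        self.counts = Counter()
        self.n_seen = 0

    def partial_fit(self, X, y):
        # Update the state from this batch only; earlier batches are never revisited.
        self.counts.update(y)
        self.n_seen += len(y)
        return self

    def predict(self, X):
        # Predict the most frequent label seen so far for every row.
        label = self.counts.most_common(1)[0][0]
        return [label for _ in X]


def fit_in_one_pass(model, X, y, batch_size=2):
    """Feed (X, y) to model.partial_fit in consecutive batches, exactly once."""
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        model.partial_fit(X[start:stop], y[start:stop])
    return model


X = [[0.1], [0.2], [0.9], [0.8], [0.7]]
y = [0, 0, 1, 1, 1]
model = fit_in_one_pass(MajorityClassStub(), X, y)
print(model.n_seen, model.predict([[0.5]]))
```

The same loop shape should apply to any estimator that exposes `partial_fit`, which is the method the `AMFClassifier` snippet further down this page calls.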

Installation

The easiest way to install onelearn is with pip:

pip install onelearn

You can also install the latest development version directly from GitHub:

pip install git+https://github.com/onelearn/onelearn.git

References

@article{mourtada2019amf,
  title={AMF: Aggregated Mondrian Forests for Online Learning},
  author={Mourtada, Jaouad and Ga{\"\i}ffas, St{\'e}phane and Scornet, Erwan},
  journal={arXiv preprint arXiv:1906.10529},
  year={2019}
}
Comments
  • Unable to pickle AMFClassifier.

    I would like to save the AMFClassifier, but am unable to pickle it. I have also tried to use dill or joblib, but they also don't seem to work.

    Is there another way to export the AMFClassifier, so that I can save it and load it in another kernel?

    Below is a snippet that reproduces the error. Note that pickling fails only after partial_fit has been called. Before the AMFClassifier has been fit, pickling works without problems, but exporting an empty model is pretty useless.

    Any help or tips are much appreciated.

    from onelearn import AMFClassifier
    import dill as pickle
    from sklearn import datasets

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    amf = AMFClassifier(n_classes=3)

    # Pickling the unfitted model works.
    dump = pickle.dumps(amf)
    amf = pickle.loads(dump)

    amf.partial_fit(X, y)

    # Pickling after partial_fit raises the error.
    dump = pickle.dumps(amf)
    amf = pickle.loads(dump)
    
    opened by w-feijen 1
  • Move the experiments of the paper into an experiments folder

    • Update the documentation
    • Explain that we must clone the repo

    Also move the short experiments to an examples folder and build a Sphinx gallery from it

    enhancement 
    opened by stephanegaiffas 1
  • Add some extra tests

    • Test that batch versus online training leads to the exact same forest
    • Test the behavior of reserve_samples, with several calls to partial_fit to check that memory is correctly allocated and
    tests 
    opened by stephanegaiffas 1
  • What if predict_proba receives a single sample

    get_amf_decision_online
        amf.partial_fit(X_train[iteration - 1], y_train[iteration - 1])
      File "/Users/stephanegaiffas/Code/onelearn/onelearn/forest.py", line 259, in partial_fit
        n_samples, n_features = X.shape
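
The traceback ends at `n_samples, n_features = X.shape`, which cannot unpack the one-element shape of a single row such as `X_train[iteration - 1]`. A possible caller-side workaround, sketched below, is to reshape one sample into a batch of size one before calling `partial_fit`. The helper name `as_batch` is made up for this example and is not part of onelearn, and the sketch assumes NumPy is available.

```python
import numpy as np


def as_batch(x, y):
    """Return (X, y) with X of shape (n_samples, n_features) and y of shape (n_samples,)."""
    X = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y))
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must contain the same number of samples")
    return X, y


X_train = np.array([[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]])
y_train = np.array([0, 0])

# A single row is 1-D, so a two-name unpacking of its shape fails.
single_row_shape = X_train[0].shape

X_one, y_one = as_batch(X_train[0], y_train[0])
print(single_row_shape, X_one.shape, y_one.shape)

# Hypothetical call site, assuming partial_fit accepts 2-D X and 1-D y:
# amf.partial_fit(X_one, y_one)
```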

    opened by stephanegaiffas 1
  • Improve coverage

    A problem is that @jit functions don't work with coverage. A workaround is to disable JIT compilation with the NUMBA_DISABLE_JIT environment variable, but this breaks the code that uses @jitclass and the .class_type.instance_type attributes.
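
A minimal sketch of that workaround, assuming the test suite is launched as a child process: the helper below copies an environment and sets `NUMBA_DISABLE_JIT`. The function name `env_without_jit` is made up for this example, and the commented `pytest --cov` command assumes pytest and pytest-cov are installed. As noted above, code relying on `@jitclass` may still break in this mode.

```python
import os


def env_without_jit(base_env=None):
    """Return a copy of the environment with Numba's JIT compilation disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Numba reads this variable when it is imported, so it is set up front
    # for the child process rather than changed mid-run.
    env["NUMBA_DISABLE_JIT"] = "1"
    return env


base = {"PATH": "/usr/bin"}
env = env_without_jit(base)
print(sorted(env))

# Hypothetical usage (assumes pytest and pytest-cov are installed):
# import subprocess
# subprocess.run(["pytest", "--cov=onelearn"], env=env_without_jit())
```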

    enhancement bug fix 
    opened by stephanegaiffas 1
Releases(v0.3)
  • v0.3(Sep 29, 2021)

    This release adds the following improvements

    • AMFClassifier and AMFRegressor can be serialized to files with the save and load methods (which use pickle internally)
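
The release note does not show the signatures of save and load, so the sketch below only illustrates the general pattern of wrapping pickle behind such methods. `PicklableStub` is a hypothetical stand-in, not onelearn's class, and the `save(path)` / `load(path)` signatures are assumptions; check the onelearn documentation for the real ones.

```python
import os
import pickle
import tempfile


class PicklableStub:
    """Hypothetical stand-in for a fitted estimator (not onelearn's class)."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.n_seen = 0

    def partial_fit(self, X, y):
        self.n_seen += len(y)
        return self

    def save(self, path):
        # Write only plain Python state, which pickle can always handle.
        state = {"n_classes": self.n_classes, "n_seen": self.n_seen}
        with open(path, "wb") as f:
            pickle.dump(state, f)

    @classmethod
    def load(cls, path):
        # Rebuild the object from the plain state written by save().
        with open(path, "rb") as f:
            state = pickle.load(f)
        model = cls(state["n_classes"])
        model.n_seen = state["n_seen"]
        return model


model = PicklableStub(n_classes=3).partial_fit([[0.0], [1.0]], [0, 2])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    model.save(path)
    restored = PicklableStub.load(path)
print(restored.n_classes, restored.n_seen)
```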
  • v0.2.0(Apr 6, 2020)

    This release adds the following improvements

    • SampleCollection pre-allocates more samples instead of the bare minimum for faster computation
    • The playground can be launched from the library
    • Documentation on Read the Docs
    • Faster computations and a lot of code cleaning
    • Unit tests for Python 3.6-3.8