onelearn: Online learning in Python

Overview

Build Status Documentation Status PyPI - Python Version PyPI - Wheel GitHub stars GitHub issues GitHub license Coverage Status

onelearn: Online learning in Python

Documentation | Reproduce experiments |

onelearn stands for ONE-shot LEARNning. It is a small python package for online learning with Python. It provides :

  • online (or one-shot) learning algorithms: each sample is processed once, only a single pass is performed on the data
  • including multi-class classification and regression algorithms
  • For now, only ensemble methods, namely Random Forests

Installation

The easiest way to install onelearn is using pip

pip install onelearn

But you can also use the latest development from github directly with

pip install git+https://github.com/onelearn/onelearn.git

References

@article{mourtada2019amf,
  title={AMF: Aggregated Mondrian Forests for Online Learning},
  author={Mourtada, Jaouad and Ga{\"\i}ffas, St{\'e}phane and Scornet, Erwan},
  journal={arXiv preprint arXiv:1906.10529},
  year={2019}
}
Comments
  • Unable to pickle AMFClassifier.

    Unable to pickle AMFClassifier.

    I would like to save the AMFClassifier, but am unable to pickle it. I have also tried to use dill or joblib, but they also don't seem to work.

    Is there maybe another way to somehow export the AMFClassifier in any way, such that I can save it and load it in another kernel?

    Below I added a snippet of code which reproduces the error. Note that only after the partial_fit method an error occurs when pickling. When the AMFClassifier has not been fit yet, pickling happens without problems, however, exporting an empty model is pretty useless.

    Any help or tips is much appreciated.

    from onelearn import AMFClassifier
    import dill as pickle
    from sklearn import datasets
    
    
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    
    amf = AMFClassifier(n_classes=3)
    
    dump = pickle.dumps(amf)
    amf = pickle.loads(dump)
    
    amf.partial_fit(X,y)
    
    dump = pickle.dumps(amf)
    amf = pickle.loads(dump)
    
    opened by w-feijen 1
  • Move experiments of the paper in a experiments folder

    Move experiments of the paper in a experiments folder

    • Update the documentation
    • Explain that we must clone the repo

    Move also the short experiments to a examples folder and build a sphinx gallery with it

    enhancement 
    opened by stephanegaiffas 1
  • Add some extra tests

    Add some extra tests

    • Test that batch versus online training leads to the exact same forest
    • Test the behavior of reserve_samples, with several calls to partial_fit to check that memory is correctly allocated and
    tests 
    opened by stephanegaiffas 1
  • What if predict_proba receives a single sample

    What if predict_proba receives a single sample

    get_amf_decision_online amf.partial_fit(X_train[iteration - 1], y_train[iteration - 1]) File "/Users/stephanegaiffas/Code/onelearn/onelearn/forest.py", line 259, in partial_fit n_samples, n_features = X.shape

    opened by stephanegaiffas 1
  • Improve coverage

    Improve coverage

    A problem is that @jit functions don't work with coverage... a workaround is to disable using the NUMBA_DISABLE_JIT environment variable, but breaks the code that use @jitclass and .class_type.instance_type attributes

    enhancement bug fix 
    opened by stephanegaiffas 1
Releases(v0.3)
  • v0.3(Sep 29, 2021)

    This release adds the following improvements

    • AMFClassifier and AMFRegressor can be serialized to files (using internally pickle) using the save and load methods
    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Apr 6, 2020)

    This release adds the following improvements

    • SampleCollection pre-allocates more samples instead of the bare minimum for faster computation
    • The playground can be launched from the library
    • A documentation on readthedocs
    • Faster computations and a lot of code cleaning
    • Unittests for python 3.6-3.8
    Source code(tar.gz)
    Source code(zip)
AP1 Transcription Factor Binding Site Prediction

A machine learning project that predicted binding sites of AP1 transcription factor, using ChIP-Seq data and local DNA shape information.

1 Jan 21, 2022
a distributed deep learning platform

Apache SINGA Distributed deep learning system http://singa.apache.org Quick Start Installation Examples Issues JIRA tickets Code Analysis: Mailing Lis

The Apache Software Foundation 2.7k Jan 05, 2023
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processi

Salesforce 2.8k Jan 05, 2023
Implementation of deep learning models for time series in PyTorch.

List of Implementations: Currently, the reimplementation of the DeepAR paper(DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

Yunkai Zhang 275 Dec 28, 2022
A Tools that help Data Scientists and ML engineers train and deploy ML models.

Domino Research This repo contains projects under active development by the Domino R&D team. We build tools that help Data Scientists and ML engineers

Domino Data Lab 73 Oct 17, 2022
Timeseries analysis for neuroscience data

=================================================== Nitime: timeseries analysis for neuroscience data ===============================================

NIPY developers 212 Dec 09, 2022
Climin is a Python package for optimization, heavily biased to machine learning scenarios

climin climin is a Python package for optimization, heavily biased to machine learning scenarios distributed under the BSD 3-clause license. It works

Biomimetic Robotics and Machine Learning at Technische Universität München 177 Sep 02, 2022
Laporan Proyek Machine Learning - Azhar Rizki Zulma

Laporan Proyek Machine Learning - Azhar Rizki Zulma Project Overview Domain proyek yang dipilih dalam proyek machine learning ini adalah mengenai hibu

Azhar Rizki Zulma 6 Mar 12, 2022
This is the material used in my free Persian course: Machine Learning with Python

This is the material used in my free Persian course: Machine Learning with Python

Yara Mohamadi 4 Aug 07, 2022
AtsPy: Automated Time Series Models in Python (by @firmai)

Automated Time Series Models in Python (AtsPy) SSRN Report Easily develop state of the art time series models to forecast univariate data series. Simp

Derek Snow 465 Jan 02, 2023
A simple machine learning python sign language detection project.

SST Coursework 2022 About the app A python application that utilises the tensorflow object detection algorithm to achieve automatic detection of ameri

Xavier Koh 2 Jun 30, 2022
Pydantic based mock data generation

This library offers powerful mock data generation capabilities for pydantic based models. It can also be used with other libraries that use pydantic as a foundation, for example SQLModel, Beanie and

Na'aman Hirschfeld 396 Dec 28, 2022
Tools for Optuna, MLflow and the integration of both.

HPOflow - Sphinx DOC Tools for Optuna, MLflow and the integration of both. Detailed documentation with examples can be found here: Sphinx DOC Table of

Telekom Open Source Software 17 Nov 20, 2022
Anytime Learning At Macroscale

On Anytime Learning At Macroscale Learning from sequential data dumps (key) Requirements Python 3.7 Pytorch 1.9.0 Hydra 1.1.0 (pip install hydra-core

Meta Research 8 Mar 29, 2022
Time series forecasting with PyTorch

Our article on Towards Data Science introduces the package and provides background information. Pytorch Forecasting aims to ease state-of-the-art time

Jan Beitner 2.5k Jan 02, 2023
Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any student(s) having the second lowest grade.

Hackerank-Nested-List Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any s

Sangeeth Mathew John 2 Dec 14, 2021
A data preprocessing and feature engineering script for a machine learning pipeline is prepared.

FEATURE ENGINEERING Business Problem: A data preprocessing and feature engineering script for a machine learning pipeline needs to be prepared. It is

Pinar Oner 7 Dec 18, 2021
Bayesian optimization in JAX

Bayesian optimization in JAX

Predictive Intelligence Lab 26 May 11, 2022
A python library for easy manipulation and forecasting of time series.

Time Series Made Easy in Python darts is a python library for easy manipulation and forecasting of time series. It contains a variety of models, from

Unit8 5.2k Jan 04, 2023