InfiniteBoost: building infinite ensembles with gradient descent

Last update: Jan 03, 2023

Overview

InfiniteBoost

Code for the paper
InfiniteBoost: building infinite ensembles with gradient descent (arXiv:1706.01109).
A. Rogozhnikov, T. Likhomanenko

Description

InfiniteBoost is an approach to building ensembles that combines the best sides of random forest and gradient boosting.

Trees in the ensemble account for the mistakes made by previous trees (as in gradient boosting), but because of a modified scheme of accounting for contributions, the ensemble converges to a limit, thus avoiding overfitting (just as random forest does).
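The idea above can be illustrated with a minimal sketch. It is not the package's implementation: it assumes squared loss, a single feature, depth-1 trees (stumps) fitted on bootstrap samples, a fixed capacity, and tree weights alpha_m = m. All function names here (fit_stump, stump_predict, fit_ensemble, ensemble_predict) are hypothetical; the paper and the research notebooks define the actual procedure, including the automated capacity search.

```python
import numpy as np


def fit_stump(x, target):
    """Fit a depth-1 regression tree on one feature.

    Returns (threshold, left_value, right_value).
    """
    order = np.argsort(x)
    xs, ts = x[order], target[order]
    n = len(ts)
    left_sum = np.cumsum(ts)[:-1]      # sum of targets left of each split
    right_sum = ts.sum() - left_sum
    left_n = np.arange(1, n)           # number of samples left of each split
    # Maximising this quantity is equivalent to minimising squared error.
    gain = left_sum ** 2 / left_n + right_sum ** 2 / (n - left_n)
    # A split is only possible between two distinct feature values.
    gain = np.where(xs[1:] > xs[:-1], gain, -np.inf)
    k = int(np.argmax(gain))
    threshold = 0.5 * (xs[k] + xs[k + 1])
    return threshold, left_sum[k] / left_n[k], right_sum[k] / (n - left_n[k])


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_ensemble(x, y, n_trees=100, capacity=2.0, seed=0):
    """Each stump fits the current residual (as in gradient boosting), but
    the ensemble output is a weighted average of stumps scaled by `capacity`,
    so a single stump's influence shrinks as the ensemble grows."""
    rng = np.random.RandomState(seed)
    n = len(y)
    prediction = np.zeros(n)
    stumps, weights = [], []
    weight_sum = 0.0
    for m in range(1, n_trees + 1):
        residual = y - prediction              # negative gradient of squared loss
        idx = rng.randint(0, n, n)             # bootstrap sample, as in random forest
        stump = fit_stump(x[idx], residual[idx])
        alpha = float(m)                       # assumed weighting: alpha_m = m
        new_sum = weight_sum + alpha
        # Running form of: capacity * sum(alpha_k * tree_k) / sum(alpha_k)
        prediction = (weight_sum / new_sum) * prediction \
            + capacity * (alpha / new_sum) * stump_predict(stump, x)
        weight_sum = new_sum
        stumps.append(stump)
        weights.append(alpha)
    return stumps, np.array(weights), prediction


def ensemble_predict(stumps, weights, x, capacity):
    votes = np.array([stump_predict(s, x) for s in stumps])
    return capacity * weights.dot(votes) / weights.sum()
```

In this sketch the output is always a weighted average of trees multiplied by the capacity, so adding more trees cannot grow the prediction without bound; this is the intuition behind convergence to a limit, shown here only for the simplified setting described in the lead-in.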

Left: InfiniteBoost with automated search of capacity vs. gradient boosting with different learning rates (shrinkages). Right: random forest vs. InfiniteBoost with small capacities.

More comparison plots are available in the research notebooks and in the research/plots directory.

Reproducing research

The research is performed in Jupyter notebooks (if you're not familiar with them, read why Jupyter notebooks are awesome).

You can use the docker image arogozhnikov/pmle:0.01 from Docker Hub. The Dockerfile is stored in this repository (Ubuntu 16 + basic sklearn stuff).

To run the environment (sudo is needed on Linux):

sudo docker run -it --rm -v /YourMountedDirectory:/notebooks -p 8890:8890 arogozhnikov/pmle:0.01

(and open localhost:8890 in your browser).

InfiniteBoost package

A self-written, minimalistic implementation of trees, as used for the experiments against boosting.

A specific implementation, based on the trees from the scikit-learn package, was used for the comparison with random forest.

The code is written in Python 2 (it is expected to work with Python 3, but this has not been tested). Some critical functions are written in Fortran, so you need gfortran and OpenMP installed before installing the package (or simply use the docker image).

pip install numpy
pip install .
# testing (optional)
cd tests && nosetests .

You can use the implementation of trees from the package for your own experiments; in this case, please cite the InfiniteBoost paper.

Owner

Alex Rogozhnikov
