Scikit-learn compatible wrapper of the Random Bits Forest program by Wang et al. (2016)

Last update: Jul 24, 2021

Overview

sklearn-compatible Random Bits Forest

A scikit-learn compatible wrapper of the Random Bits Forest program written by Wang et al. (2016), which is available as a binary on SourceForge. All credit belongs to the authors; this repository contains only a quick-and-dirty wrapper and testing code.

The authors present "...a classification and regression algorithm called Random Bits Forest (RBF). RBF integrates neural network (for depth), boosting (for wideness) and random forest (for accuracy). It first generates and selects ~10,000 small three-layer threshold random neural networks as basis by gradient boosting scheme. These binary basis are then feed into a modified random forest algorithm to obtain predictions. In conclusion, RBF is a novel framework that performs strongly especially on data with large size."
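To make the idea of a "random bit" concrete, here is a minimal sketch of a single three-layer threshold network that maps a feature vector to 0 or 1. It is an illustration only, not the authors' procedure: the names `make_random_bit` and `to_bits`, the Gaussian weights, and the layer sizes are all assumptions, and the gradient-boosting selection step and the modified random forest are left out entirely.

```python
import random

def make_random_bit(n_features, n_hidden=3, n_inputs=2, rng=random):
    """Build one random three-layer threshold network: inputs -> hidden units -> one bit.

    Hypothetical helper for illustration; sizes and weight distribution are assumptions.
    """
    hidden = []
    for _ in range(n_hidden):
        # Each hidden unit looks at a few randomly chosen input features.
        idx = [rng.randrange(n_features) for _ in range(n_inputs)]
        weights = [rng.gauss(0.0, 1.0) for _ in idx]
        bias = rng.gauss(0.0, 1.0)
        hidden.append((idx, weights, bias))
    out_weights = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    out_bias = rng.gauss(0.0, 1.0)

    def bit(x):
        # Hidden layer: threshold a random linear combination of the chosen features.
        h = [1 if sum(w * x[i] for i, w in zip(idx, weights)) + b > 0 else 0
             for idx, weights, b in hidden]
        # Output layer: threshold a random linear combination of the hidden units.
        return 1 if sum(w * v for w, v in zip(out_weights, h)) + out_bias > 0 else 0

    return bit

def to_bits(X, bit_functions):
    """Turn each row of X into a vector of binary features."""
    return [[f(x) for f in bit_functions] for x in X]

# Usage: eight random bits for a toy dataset with four features.
rng = random.Random(0)
bits = [make_random_bit(4, rng=rng) for _ in range(8)]
X_toy = [[0.1, 2.0, -1.5, 0.3], [1.2, -0.4, 0.0, 2.2]]
B = to_bits(X_toy, bits)
```

In the pipeline the authors describe, binary features of this kind would be the input to the forest rather than the raw columns of `X`.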

Note: the executable supplied by the authors was compiled for Linux and requires a CPU that supports SSE instructions.
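A quick way to check both requirements before calling the wrapper is sketched below. It assumes an x86 Linux system where `/proc/cpuinfo` has a `flags` line. The helper names `cpu_flags` and `can_run_binary` are hypothetical and not part of this repository. The note does not say which SSE level the binary needs, so the check only looks for any flag beginning with `sse`.

```python
import platform

def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags

def can_run_binary(system, cpuinfo_text):
    """Return True if the OS is Linux and at least one SSE-family flag is present."""
    has_sse = any(flag.startswith("sse") for flag in cpu_flags(cpuinfo_text))
    return system == "Linux" and has_sse

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        # No /proc/cpuinfo (e.g. not Linux): treat as "no flags found".
        text = ""
    print("RBF binary likely usable:", can_run_binary(platform.system(), text))
```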

Usage

Usage example of the Random Bits Forest:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

from uci_loader import getdataset
from randombitsforest import RandomBitsForest

X, y = getdataset('diabetes')
half = len(y) // 2  # integer division: the first half trains, the second half tests

# Random Bits Forest
classifier = RandomBitsForest()
classifier.fit(X[:half], y[:half])
p = classifier.predict(X[half:])
print("Random Bits Forest Accuracy:", np.mean(p == y[half:]))

# Random forest baseline for comparison
classifier = RandomForestClassifier(n_estimators=20)
classifier.fit(X[:half], y[:half])
print("Random Forest Accuracy:", np.mean(classifier.predict(X[half:]) == y[half:]))

Usage example for the UCI comparison:

from uci_comparison import compare_estimators
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from randombitsforest import RandomBitsForest

estimators = {
    'RandomForest': RandomForestClassifier(n_estimators=200),
    'ExtraTrees': ExtraTreesClassifier(n_estimators=200),
    'RandomBitsForest': RandomBitsForest(number_of_trees=200),
}

# optionally, pass a list of UCI dataset identifiers as the datasets parameter, e.g. datasets=['iris', 'diabetes']
# optionally, pass a dict of scoring functions as the metrics parameter, e.g. metrics={'F1-score': f1_score}
# (with f1_score imported from sklearn.metrics)
compare_estimators(estimators)

"""
                          ExtraTrees F1score RandomBitsForest F1score RandomForest F1score
========================================================================================
  breastcancer (n=683)      0.960 (SE=0.003)      0.954 (SE=0.003)     *0.963 (SE=0.003)
       breastw (n=699)     *0.956 (SE=0.003)      0.951 (SE=0.003)      0.953 (SE=0.005)
      creditg (n=1000)     *0.372 (SE=0.005)      0.121 (SE=0.003)      0.371 (SE=0.005)
      haberman (n=306)      0.317 (SE=0.015)     *0.346 (SE=0.020)      0.305 (SE=0.016)
         heart (n=270)      0.852 (SE=0.004)     *0.854 (SE=0.004)      0.852 (SE=0.006)
    ionosphere (n=351)      0.740 (SE=0.037)     *0.741 (SE=0.037)      0.736 (SE=0.037)
          labor (n=57)      0.246 (SE=0.016)      0.128 (SE=0.014)     *0.361 (SE=0.018)
liverdisorders (n=345)      0.707 (SE=0.013)     *0.723 (SE=0.013)      0.713 (SE=0.012)
     tictactoe (n=958)      0.030 (SE=0.007)     *0.336 (SE=0.040)      0.030 (SE=0.007)
          vote (n=435)     *0.658 (SE=0.012)      0.228 (SE=0.017)     *0.658 (SE=0.012)
"""
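Each cell above reads as a mean score with a standard error, and an asterisk marks the best estimator per dataset. The sketch below shows one way such a cell could be produced from a list of per-fold scores. It assumes that SE means the standard error of the mean (sample standard deviation divided by the square root of the number of scores); the source does not confirm how `compare_estimators` computes it. The helper names `mean_and_se` and `format_row` are hypothetical.

```python
import math
import statistics

def mean_and_se(scores):
    """Mean and standard error of the mean for a list of scores (needs >= 2 scores)."""
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, se

def format_row(results):
    """Format one table row from {estimator_name: [scores]}, marking the best mean with '*'."""
    summary = {name: mean_and_se(scores) for name, scores in results.items()}
    best = max(mean for mean, _ in summary.values())
    cells = []
    for name in sorted(summary):  # alphabetical column order, as in the table above
        mean, se = summary[name]
        mark = "*" if mean == best else " "
        cells.append(f"{mark}{mean:.3f} (SE={se:.3f})")
    return "  ".join(cells)

# Usage with made-up scores (not taken from the table above).
row = format_row({"A": [0.5, 0.7], "B": [0.9, 0.9]})
print(row)
```

This sketch marks ties only when the means are exactly equal, whereas the table appears to star estimators that tie at the displayed precision (see the vote row).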

Owner

Tamas Madl
