Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)

Overview

sklearn-compatible Random Bits Forest

Scikit-learn compatible wrapper of the Random Bits Forest program written by Wang et al., 2016, available as a binary on Sourceforge. All credits belong to the authors. This is just some quick and dirty wrapper and testing code.

The authors present "...a classification and regression algorithm called Random Bits Forest (RBF). RBF integrates neural network (for depth), boosting (for wideness) and random forest (for accuracy). It first generates and selects ~10,000 small three-layer threshold random neural networks as basis by gradient boosting scheme. These binary basis are then feed into a modified random forest algorithm to obtain predictions. In conclusion, RBF is a novel framework that performs strongly especially on data with large size."

Note: the executable supplied by the authors has been compiled for Linux, and for CPUs supporting SSE instructions.

Fig1 from Wang et al., 2016

Usage

Usage example of the Random Bits Forest:

from uci_loader import *
from randombitsforest import RandomBitsForest
X, y = getdataset('diabetes')

from sklearn.ensemble.forest import RandomForestClassifier

classifier = RandomBitsForest()
classifier.fit(X[:len(y)/2], y[:len(y)/2])
p = classifier.predict(X[len(y)/2:])
print "Random Bits Forest Accuracy:", np.mean(p == y[len(y)/2:])

classifier = RandomForestClassifier(n_estimators=20)
classifier.fit(X[:len(y)/2], y[:len(y)/2])
print "Random Forest Accuracy:", np.mean(classifier.predict(X[len(y)/2:]) == y[len(y)/2:])

Usage example for the UCI comparison:

from uci_comparison import compare_estimators
from sklearn.ensemble.forest import RandomForestClassifier, ExtraTreesClassifier
from randombitsforest import RandomBitsForest

estimators = {
              'RandomForest': RandomForestClassifier(n_estimators=200),
              'ExtraTrees': ExtraTreesClassifier(n_estimators=200),
              'RandomBitsForest': RandomBitsForest(number_of_trees=200)
            }

# optionally, pass a list of UCI dataset identifiers as the datasets parameter, e.g. datasets=['iris', 'diabetes']
# optionally, pass a dict of scoring functions as the metric parameter, e.g. metrics={'F1-score': f1_score}
compare_estimators(estimators)

"""
                          ExtraTrees F1score RandomBitsForest F1score RandomForest F1score
========================================================================================
  breastcancer (n=683)      0.960 (SE=0.003)      0.954 (SE=0.003)     *0.963 (SE=0.003)
       breastw (n=699)     *0.956 (SE=0.003)      0.951 (SE=0.003)      0.953 (SE=0.005)
      creditg (n=1000)     *0.372 (SE=0.005)      0.121 (SE=0.003)      0.371 (SE=0.005)
      haberman (n=306)      0.317 (SE=0.015)     *0.346 (SE=0.020)      0.305 (SE=0.016)
         heart (n=270)      0.852 (SE=0.004)     *0.854 (SE=0.004)      0.852 (SE=0.006)
    ionosphere (n=351)      0.740 (SE=0.037)     *0.741 (SE=0.037)      0.736 (SE=0.037)
          labor (n=57)      0.246 (SE=0.016)      0.128 (SE=0.014)     *0.361 (SE=0.018)
liverdisorders (n=345)      0.707 (SE=0.013)     *0.723 (SE=0.013)      0.713 (SE=0.012)
     tictactoe (n=958)      0.030 (SE=0.007)     *0.336 (SE=0.040)      0.030 (SE=0.007)
          vote (n=435)     *0.658 (SE=0.012)      0.228 (SE=0.017)     *0.658 (SE=0.012)
"""
Owner
Tamas Madl
Tamas Madl
Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc). Structured a custom ensemble model and a neural network. Found a outperformed

Chris Yuan 1 Feb 06, 2022
Learning --> Numpy January 2022 - winter'22

Numerical-Python Numpy NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along

Shahzaneer Ahmed 0 Mar 12, 2022
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Facebook 15.4k Jan 07, 2023
Course files for "Ocean/Atmosphere Time Series Analysis"

time-series This package contains all necessary files for the course Ocean/Atmosphere Time Series Analysis, an introduction to data and time series an

Jonathan Lilly 107 Nov 29, 2022
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models. Solve a variety of tasks with pre-trained models or finetune them in

Backprop 227 Dec 10, 2022
A Python library for choreographing your machine learning research.

A Python library for choreographing your machine learning research.

AI2 270 Jan 06, 2023
Code base of KU AIRS: SPARK Autonomous Vehicle Team

KU AIRS: SPARK Autonomous Vehicle Project Check this link for the blog post describing this project and the video of SPARK in simulation and on parkou

Mehmet Enes Erciyes 1 Nov 23, 2021
A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching.

A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching. The solver will solve equations of the type: A can be

Sanjeet N. Dasharath 3 Feb 15, 2022
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
A Microsoft Azure Web App project named Covid 19 Predictor using Machine learning Model

A Microsoft Azure Web App project named Covid 19 Predictor using Machine learning Model (Random Forest Classifier Model ) that helps the user to identify whether someone is showing positive Covid sym

Priyansh Sharma 2 Oct 06, 2022
Applied Machine Learning for Graduate Program in Computer Science (PPGCC)

Applied Machine Learning for Graduate Program in Computer Science (PPGCC) - Federal University of Santa Catarina

Jônatas Negri Grandini 1 Dec 22, 2021
#30DaysOfStreamlit is a 30-day social challenge for you to build and deploy Streamlit apps.

30 Days Of Streamlit 🎈 This is the official repo of #30DaysOfStreamlit — a 30-day social challenge for you to learn, build and deploy Streamlit apps.

Streamlit 53 Jan 02, 2023
WAGMA-SGD is a decentralized asynchronous SGD for distributed deep learning training based on model averaging.

WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally-triggerable, namely, a collective can b

Shigang Li 6 Jun 18, 2022
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss

Xtra Computing Group 1.4k Dec 22, 2022
Python bindings for MPI

MPI for Python Overview Welcome to MPI for Python. This package provides Python bindings for the Message Passing Interface (MPI) standard. It is imple

MPI for Python 604 Dec 29, 2022
Predict profitability of trades based on indicator buy / sell signals

Predict profitability of trades based on indicator buy / sell signals Trade profitability analysis for trades based on various indicators signals: MAC

Tomasz Porzycki 1 Dec 15, 2021
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Dec 22, 2022
Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale.

Model Search Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers sp

AriesTriputranto 1 Dec 13, 2021
AutoOED: Automated Optimal Experiment Design Platform

AutoOED is an optimal experiment design platform powered with automated machine learning to accelerate the discovery of optimal solutions. Our platform solves multi-objective optimization problems an

Yunsheng Tian 107 Jan 03, 2023
database for artificial intelligence/machine learning data

AIDB v0.0.1 database for artificial intelligence/machine learning data Overview aidb is a database designed for large dataset for machine learning pro

Aarush Gupta 1 Oct 24, 2021