Home repository for the Regularized Greedy Forest (RGF) library. It includes the original implementation from the paper and a multithreaded one written in C++, along with various language-specific wrappers.

Overview


Regularized Greedy Forest

Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better results than gradient boosted decision trees (GBDT) on a number of datasets and it has been used to win a few Kaggle competitions. Unlike the traditional boosted decision tree approach, RGF works directly with the underlying forest structure. RGF integrates two ideas: one is to include tree-structured regularization into the learning formulation; and the other is to employ the fully-corrective regularized greedy algorithm.

This repository contains the following implementations of the RGF algorithm:

  • RGF: original implementation from the paper;
  • FastRGF: multi-core implementation with some simplifications;
  • rgf_python: wrapper of both RGF and FastRGF implementations for Python;
  • R package: wrapper of rgf_python for R.

You can find more information about RGF in the posts collected in Awesome RGF.

Comments
  • Support wheels

    Since rgf_python has no special requirements (compiler, environment, etc.), I think it would be a good idea to publish wheels on PyPI (alongside the sources in .tar.gz, of course). I believe providing successfully compiled binaries will prevent many strange errors like the recent ones.

    We need wheels for two platform groups: one for macOS and Linux, and one for Windows.

    The final result should be similar to this one: [image]

    Each wheel for each platform should also come in 32-bit and 64-bit versions.

    We could get the binaries from Travis and AppVeyor as artifacts (I can do this). The one problem I see now is that Travis has no 32-bit machines, but I believe we'll overcome this problem 😃.

    @fukatani When you have time, please look into how to name wheels appropriately for their target platforms and how to publish them on PyPI. Or I can do it later.
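On the naming question, wheel filenames follow the convention in PEP 427: `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below only assembles such names. The version string and the chosen tags are illustrative assumptions, not the values the project actually published.

```python
import re


def wheel_filename(distribution, version, python_tag, abi_tag, platform_tag,
                   build_tag=None):
    """Assemble a wheel filename following the PEP 427 naming convention."""
    def escape(component):
        # PEP 427: runs of characters other than alphanumerics, "_" and "."
        # are replaced with a single underscore.
        return re.sub(r"[^\w\d.]+", "_", component, flags=re.UNICODE)

    parts = [escape(distribution), escape(version)]
    if build_tag:
        parts.append(build_tag)
    parts += [python_tag, abi_tag, platform_tag]
    return "-".join(parts) + ".whl"


if __name__ == "__main__":
    # Hypothetical version and tags, chosen only to show 32-bit and 64-bit
    # variants for Windows and Linux.
    for platform_tag in ("win32", "win_amd64",
                         "manylinux1_i686", "manylinux1_x86_64"):
        print(wheel_filename("rgf_python", "0.0.0", "py2.py3", "none",
                             platform_tag))
```

A `none` ABI tag with a generic Python tag fits a package that ships a prebuilt executable rather than a compiled extension module, which is an assumption about how the binaries would be bundled.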

    enhancement 
    opened by StrikerRUS 35
  • error:Exception: Model learning result is not found in /tmp/rgf. This is rgf_python error.

    How to deal with this error:

    Ran 0 examples: 0 success, 0 failure, 0 error
    None
    Ran 0 examples: 0 success, 0 failure, 0 error
    None
    Ran 0 examples: 0 success, 0 failure, 0 error
    None
    Traceback (most recent call last):
      File "/Users/k.den/Desktop/For_Submission/1_source_code/test.py", line 25, in <module>
        pred = rgf_model.predict_proba(X_eval)[:, 1]
      File "/usr/local/lib/python3.6/site-packages/rgf/sklearn.py", line 652, in predict_proba
        class_proba = clf.predict_proba(X)
      File "/usr/local/lib/python3.6/site-packages/rgf/sklearn.py", line 798, in predict_proba
        'This is rgf_python error.'.format(_TEMP_PATH))
    Exception: Model learning result is not found in /tmp/rgf. This is rgf_python error.

    Process finished with exit code 1

    opened by tianke0711 34
  • ModuleNotFoundError: No module named 'rgf.sklearn'; 'rgf' is not a package

    For bugs and unexpected issues, please provide the following information, so that we could reproduce them on our system.

    Environment Info

    Operating System: MacOS Sierra 10.12 | Ubuntu 16.04.3 LTS

    Python version: 3.6.1

    rgf_python version: HEAD (pulled from github)

    Whether test.py is passed or not: FAILED (errors=24)

    Error Message

    ModuleNotFoundError: No module named 'rgf.sklearn'; 'rgf' is not a package

    Reproducible Example

    from rgf.sklearn import RGFClassifier
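The wording "'rgf' is not a package" is what Python reports when the name `rgf` resolves to a plain module (for example, a local file named `rgf.py`) instead of a package directory. Whether that is the cause in this report is an assumption. The sketch below uses only the standard library to show where a top-level name resolves and whether it is a package; `describe_import` is a hypothetical helper name.

```python
import importlib.util


def describe_import(name):
    """Report where a top-level module name resolves and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "is_package": False, "origin": None}
    return {
        "found": True,
        # Packages expose submodule search locations; plain modules do not.
        "is_package": spec.submodule_search_locations is not None,
        "origin": spec.origin,
    }


if __name__ == "__main__":
    # If "origin" points at a stray rgf.py in the working directory,
    # that file shadows the installed package.
    print(describe_import("rgf"))
```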

    opened by vsedelnik 30
  • suggestion to integrate the R wrapper in the repository

    This issue is related to a previous one. A month ago I wrapped rgf_python using the reticulate package in R. It can be installed on Linux, but installation is somewhat cumbersome on macOS and Windows (on Windows it currently works only from the command prompt). I opened this issue as suggested by @fukatani.

    opened by mlampros 20
  • Model learning result is not found in C:\Users\hp\temp\rgf. This is rgf_python error.

    Hello,

    I have read the previous thread about the same error, but it does not seem to solve my problem: the previous case had strings in the dataset, while mine contains only numbers. Could you please let me know what the problem could be?

    Much appreciated!

    skf = StratifiedKFold(n_splits = kfold, random_state=1)
    for i, (train_index, test_index) in enumerate(skf.split(X, y)):
        X_train, X_eval = X[train_index], X[test_index]
        y_train, y_eval = y[train_index], y[test_index]
       
        rgf_model = RGFClassifier(max_leaf=400,
                        algorithm="RGF_Sib",
                        test_interval=100,
                        verbose=True).fit( X_train, y_train)
        pred = rgf_model.predict_proba(X_eval)[:,1]
        print( "Gini = ", eval_gini(y_eval, pred) )
    

    and

    ---------------------------------------------------------------------------
    Exception                                 Traceback (most recent call last)
    <ipython-input-17-b27ba3506d06> in <module>()
         12                     test_interval=100,
         13                     verbose=True).fit( X_train, y_train)
    ---> 14     pred = rgf_model.predict_proba(X_eval)[:,1]
         15     print( "Gini = ", eval_gini(y_eval, pred) )
    
    C:\Anaconda3\lib\site-packages\rgf\sklearn.py in predict_proba(self, X)
        644                              % (self._n_features, n_features))
        645         if self._n_classes == 2:
    --> 646             y = self._estimators[0].predict_proba(X)
        647             y = _sigmoid(y)
        648             y = np.c_[y, 1 - y]
    
    C:\Anaconda3\lib\site-packages\rgf\sklearn.py in predict_proba(self, X)
        796         if not model_files:
        797             raise Exception('Model learning result is not found in {0}. '
    --> 798                             'This is rgf_python error.'.format(_TEMP_PATH))
        799         latest_model_loc = sorted(model_files, reverse=True)[0]
        800 
    
    Exception: Model learning result is not found in C:\Users\hp\temp\rgf. This is rgf_python error.
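The traceback shows the wrapper collecting model files from a temp directory and picking the last one in sorted order. A minimal diagnostic sketch along the same lines, using only the standard library, can tell apart a missing directory, an unwritable directory, and a directory that simply contains no model files. The function name and the `prefix` naming scheme are assumptions for illustration; they are not rgf_python's actual internals.

```python
import glob
import os


def find_latest_model(temp_dir, prefix):
    """Return the last model file (in sorted order) for `prefix` in `temp_dir`.

    Raises a specific error for each way the lookup can fail, so the cause
    is visible instead of a single generic message.
    """
    if not os.path.isdir(temp_dir):
        raise FileNotFoundError("Temp directory does not exist: " + temp_dir)
    if not os.access(temp_dir, os.W_OK):
        raise PermissionError("Temp directory is not writable: " + temp_dir)
    # glob.escape guards against special characters in the directory name.
    pattern = os.path.join(glob.escape(temp_dir), glob.escape(prefix) + "*")
    model_files = glob.glob(pattern)
    if not model_files:
        raise FileNotFoundError(
            "No files matching %r; training probably wrote nothing" % pattern)
    # Same selection rule as in the traceback: reverse sort, take the first.
    return sorted(model_files, reverse=True)[0]
```

If the directory exists and is writable but stays empty after `fit`, the training step itself most likely failed, and its console output (with `verbose=True`) is the next place to look.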
    
    
    opened by mike-m123 20
  • migrate from Appveyor to GitHub Actions

    Fixed #122. AppVeyor offers only 1 parallel job on the free tier, while GitHub Actions offers 20.

    Should be considered a continuation of #328. Same changes as for *nix OSes: use the latest R version and stop producing 32-bit artifacts.

    opened by StrikerRUS 16
  • New release

    I suppose it's time to release a new version with support for warm start.

    @fukatani Please release a new Python version, and then @mlampros please upload a new R version to CRAN.

    opened by StrikerRUS 16
  • updated wheels building

    @fukatani Please attach the Linux i686 executable file to the GitHub release. I've just tested replacing the files in the wheels and it works locally, so it should work on Travis too! :-)

    Refer to https://github.com/fukatani/rgf_python/issues/81#issuecomment-348662123.

    opened by StrikerRUS 15
  • More Travis tests

    Hi @fukatani! Can you add more platforms (Windows, macOS) to Travis? I don't know how, but it's possible 😄: [screenshot from the xgboost repo]. Maybe this can help: https://github.com/dmlc/xgboost/blob/master/.travis.yml

    If there is a limit on the number of tests, maybe it's better to split the Python version tests between platforms: Windows + 2.7, Linux + 3.4, macOS + 3.5 (I think you get the idea).

    opened by StrikerRUS 15
  • Cannot import name 'RGFClassifier'

    I am getting the above error. I have built rgf1.2 and tested it using rgf1.2's own Perl test script, which works. I have installed rgf_python and run the Python setup as specified. I have changed the two folder locations to the rgf1.2..\rgf executable and a temp folder that exists.

    In Python, when I try to import, I get the error Cannot import name 'RGFClassifier'. I tried running the exact code from the test.py script provided with rgf_python, and the same error occurs.

    Strangely, I have /usr/local/lib/python3.5/dist-packages/rgf_sklearn-0.0.0-py3.5.egg/rgf in my path when I run

    import sys
    sys.path
    

    in Python. Also, in /usr/local/lib/python3.5/dist-packages I only have rgf_sklearn-0.0.0-py3.5.egg and no rgf-sklearn, which I would have expected because the following appeared towards the end of the setup.py output:

    Extracting rgf_sklearn-0.0.0-py3.5.egg to /usr/local/lib/python3.5/dist-packages
    Adding rgf-sklearn 0.0.0 to easy-install.pth file
    
    opened by JoshuaC3 15
  • [rgf_python] add warm-start

    Fixed #184.

    This PR adds support for warm-start in RGF estimators, along with a save_model() method, which is needed to obtain a binary model file that can then be passed via the init_model argument.

    Also, this PR adds tests that check exception messages (as I promised in https://github.com/RGF-team/rgf/pull/258#issuecomment-439685042).

    opened by StrikerRUS 14
  • Running RGF from R cmd

    For bugs and unexpected issues, please provide the following information, so that we could reproduce them on our system.

    Environment Info

    Operating System: Windows 10

    RGF/FastRGF/rgf_python version: 3.5.0-9

    Python version (for rgf_python errors): 3.5.0-9

    Error Message

    image

    image

    Reproducible Example

    An error occurs when running RGF from the R console, as shown in the pic. The RGF installation itself appears to work, as also shown in the pic. RGF was installed via devtools.

    help wanted 
    opened by similang 2
  • Python can't find executables

    Hi there

    I'm trying to install rgf/fastrgf and use the Python wrapper to launch the executables.

    I've installed it using pip install rgf_python

    However, when I import the rgf module I get a user warning:

    UserWarning: Cannot find FastRGF executable files. FastRGF estimators will be unavailable for usage.
      warnings.warn("Cannot find FastRGF executable files. FastRGF estimators will be unavailable for usage.")
    

    To fix this I've compiled the rgf and fastrgf binaries* and added them to my $PATH variable (I confirmed in bash that they are on the PATH), but I still get the same warning. I've looked a bit into the get_paths and is_fastrgf_executable functions in rgf/utils, but I'm not sure why the lookup fails.

    *binaries: I was not sure which binaries are needed, so I've added the following: rgf, forest_predict, forest_train, discretized_trainer, discretized_gendata, auc

    System
    Python: conda 3.6.1
    OS: Ubuntu 16.04
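A quick check from inside Python can confirm whether the interpreter sees the same PATH as the shell, since an IDE or a different launcher may start Python with another environment. The sketch below uses `shutil.which` from the standard library and the executable names listed in the report. It only checks PATH visibility; rgf_python's own lookup may consult other locations, so a hit here does not guarantee the warning disappears.

```python
import shutil


def locate_executables(names, path=None):
    """Map each executable name to its resolved location, or None if not found.

    `path` defaults to the PATH of the current Python process.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    # Names taken from the report above.
    for name, location in locate_executables(
            ["rgf", "forest_train", "forest_predict"]).items():
        print(name, "->", location or "not found on PATH")
```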

    opened by casperkaae 29
  • dump RGF and FastRGF to the JSON file

    Initial support for dumping the RGF model was implemented in #161. At present it's possible to print the model to the console, but it would be a good idea to also support dumping the model to a file (e.g. JSON).

    @StrikerRUS:

    I really like the new features introduced in this PR. But please think about a "real dump" of the model. I suppose it would be more useful than just printing to the console.

    @fukatani:

    For example, a dump in JSON format like LightGBM's. It's convenient and we may support it in the future, but we should do it in another PR.
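As a rough illustration of what a file dump could look like, the sketch below writes a nested description of a forest to JSON and reads it back. The dictionary layout (`trees`, `nodes`, `feature`, `threshold`, `weight`) is entirely hypothetical; it is not RGF's internal format and not LightGBM's dump schema.

```python
import json


def dump_forest_to_json(forest, path):
    """Write a forest description (nested dicts and lists) to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(forest, handle, indent=2, sort_keys=True)


def load_forest_from_json(path):
    """Read a forest description back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical layout: one tree with a root split and two leaves.
example_forest = {
    "trees": [
        {
            "nodes": [
                {"id": 0, "feature": 2, "threshold": 0.5, "left": 1, "right": 2},
                {"id": 1, "weight": -0.25},
                {"id": 2, "weight": 0.75},
            ]
        }
    ]
}
```

A plain-text format like this is easy to diff and to load from other languages, which is the usual argument for it over a console printout.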

    enhancement 
    opened by StrikerRUS 6
  • Support f_ratio?

    I found an undocumented parameter, f_ratio, in RGF. It corresponds to LightGBM's feature_fraction and XGBoost's colsample_bytree.

    I tried this parameter on the Boston regression example. With a small max_leaf (300), f_ratio=0.9 improved the score from 11.8 to 11.0, but with a large max_leaf (5000), f_ratio=0.95 degraded the score from 10.19810 to 10.34.

    So, is there no value in using f_ratio < 1.0?
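For readers unfamiliar with this kind of parameter, feature subsampling means that only a random fraction of the feature columns is considered at a time. The sketch below shows the general idea with the standard library. How RGF itself applies f_ratio (rounding, and whether sampling happens per tree or per split) is not stated here, so the rounding rule and the function name are assumptions.

```python
import math
import random


def sample_features(n_features, ratio, rng=None):
    """Pick a random subset of feature indices covering `ratio` of the columns."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    rng = rng or random.Random()
    # Assumption: round up, and always keep at least one feature.
    k = max(1, math.ceil(ratio * n_features))
    return sorted(rng.sample(range(n_features), k))


if __name__ == "__main__":
    # With 13 features and ratio 0.9, 12 of the 13 columns are kept.
    print(sample_features(13, 0.9, random.Random(0)))
```

With ratios as close to 1.0 as those tried above, only one or two columns are dropped on a small feature set, so small score differences in either direction are not surprising.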

    opened by fukatani 10
  • [FastRGF] FastRGF doesn't work for small sample and need to fix integration test for FastRGF

    #Now, sklearn integration tests for FastRGFClassifier and FastRGFClassifier.

    FastRGF doesn't work well for small samples, which is the reason the tests fail. I suspect the problem is inside the FastRGF executable: when I inspected FastRGF with a debugger, the discretization boundaries were invalid.

    At least we should raise an understandable error from rgf_python if discretization fails.

    bug 
    opened by fukatani 18
Releases(3.12.0)
Owner
RGF-team
OMLT: Optimization and Machine Learning Toolkit

OMLT is a Python package for representing machine learning models (neural networks and gradient-boosted trees) within the Pyomo optimization environment.

C⚙G - Imperial College London 179 Jan 02, 2023
Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021)

Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021) This is the implementation of PSD (ICCV 2021),

12 Dec 12, 2022
Unrolled Generative Adversarial Networks

Unrolled Generative Adversarial Networks Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein arxiv:1611.02163 This repo contains an example notebo

Ben Poole 292 Dec 06, 2022
Object tracking and object detection is applied to track golf putts in real time and display stats/games.

Putting_Game Object tracking and object detection is applied to track golf putts in real time and display stats/games. Works best with the Perfect Prac

Max 1 Dec 29, 2021
Author's PyTorch implementation of TD3 for OpenAI gym tasks

Addressing Function Approximation Error in Actor-Critic Methods PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If y

Scott Fujimoto 1.3k Dec 25, 2022
Rename Images with Auto Generated Neural Image Captions

Recaption Images with Generated Neural Image Caption Example Usage: Commandline: Recaption all images from folder /home/feng/Downloads/images to folde

feng wang 3 May 01, 2022
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

Jacobi(Jiabo He) 147 Dec 05, 2022
The official implementation of CircleNet: Anchor-free Detection with Circle Representation, MICCAI 2020

CircleNet: Anchor-free Detection with Circle Representation The official implementation of CircleNet, MICCAI 2020 [PyTorch] [project page] [MICCAI pap

The Biomedical Data Representation and Learning Lab 45 Nov 18, 2022
[CVPR'22] Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast

wseg Overview The Pytorch implementation of Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast. [arXiv] Though image-level weakly

Ye Du 96 Dec 30, 2022
Code for the paper "Zero-shot Natural Language Video Localization" (ICCV2021, Oral).

Zero-shot Natural Language Video Localization (ZSNLVL) by Pseudo-Supervised Video Localization (PSVL) This repository is for Zero-shot Natural Languag

Computer Vision Lab. @ GIST 37 Dec 27, 2022
Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR 2019) To make better use of given limited labels, we propo

126 Sep 13, 2022
Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

Wasi Ahmad 138 Dec 30, 2022
Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (LMRL Workshop, NeurIPS 2021)

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology Self-Supervised Vision Transformers Learn Visual Concepts in Histopatholog

Richard Chen 95 Dec 24, 2022
Fermi Problems: A New Reasoning Challenge for AI

Fermi Problems: A New Reasoning Challenge for AI Fermi Problems are questions whose answer is a number that can only be reasonably estimated as a prec

AI2 15 May 28, 2022
Apply a perspective transformation to a raster image inside Inkscape (no need to use an external software such as GIMP or Krita).

Raster Perspective Apply a perspective transformation to bitmap image using the selected path as envelope, without the need to use an external softwar

s.ouchene 19 Dec 22, 2022
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes, ICCV 2017

AdaptationSeg This is the Python reference implementation of AdaptionSeg proposed in "Curriculum Domain Adaptation for Semantic Segmentation of Urban

Yang Zhang 128 Oct 19, 2022
Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Deep-Rep-MFIR Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising Publication: Deep Reparametrization of M

Goutam Bhat 39 Jan 04, 2023
Implementation for paper MLP-Mixer: An all-MLP Architecture for Vision

MLP Mixer Implementation for paper MLP-Mixer: An all-MLP Architecture for Vision. Give us a star if you like this repo. Author: Github: bangoc123 Emai

Ngoc Nguyen Ba 86 Dec 10, 2022
Reinforcement Learning Theory Book (rus)

Reinforcement Learning Theory Book (rus)

qbrick 206 Nov 27, 2022
SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

Wentao Zhu 24 May 20, 2022