Code for the paper "There is no Double-Descent in Random Forests"

This repository contains the code to run the experiments for our paper "There is no Double-Descent in Random Forests". The paper highlights experiments on five datasets (adult, bank, eeg, magic, nomao), but this implementation also supports more datasets out of the box. Most of the code is commented and should be self-explanatory given the caveats below. To run the experiments, clone this repository:

git clone git@github.com:sbuschjaeger/rf-double-descent.git

(Optional) Build the conda environment and activate it:

conda env create -f environment.yml --force
conda activate rfdd

Run the experiments on the adult dataset with M = 256 trees, a 5-fold cross-validation, a range of max_nodes values, and 96 threads:

./run.py -x 5 -M 256 --n_jobs 96 --max_nodes 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 -d adult
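The flags in the command above can be read as in the following minimal sketch. It is not the parser from run.py: the help texts and all default values are assumptions made for illustration, and only the flag names are taken from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the run.py command line")
    # Defaults below are assumptions, not the values used by run.py.
    parser.add_argument("-x", type=int, default=5, help="number of cross-validation folds")
    parser.add_argument("-M", type=int, default=256, help="number of trees in the forest")
    parser.add_argument("--n_jobs", type=int, default=1, help="size of the processing pool")
    parser.add_argument("--max_nodes", type=int, nargs="+", default=[2, 4, 8],
                        help="one experiment is run per value in this list")
    parser.add_argument("-d", type=str, default="adult", help="name of the dataset")
    parser.add_argument("--tmpdir", type=str, default=None,
                        help="folder for downloaded datasets")
    return parser


# Parse a shortened version of the command line shown above.
args = build_parser().parse_args(
    "-x 5 -M 256 --n_jobs 96 --max_nodes 2 4 8 -d adult".split()
)
print(args.x, args.M, args.n_jobs, args.max_nodes, args.d)
```

Because --max_nodes takes a list, one invocation covers the whole range of tree sizes for a dataset.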

Important 1: This will run all experiments with 96 threads. The experiments are executed in a multiprocessing.Pool, which means that the entire dataset is copied for each cross-validation run. Hence, this may take a substantial amount of memory (up to 200GB) and some time.
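Where the memory goes can be illustrated with a minimal sketch, assuming each cross-validation fold is submitted to the pool as its own task. The helpers make_folds, run_fold and run_all are hypothetical and not taken from run.py, and the fitting step is replaced by a placeholder.

```python
from multiprocessing import Pool


def make_folds(n_samples, n_folds):
    """Split sample indices into n_folds contiguous (train, test) index pairs."""
    indices = list(range(n_samples))
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


def run_fold(task):
    """Placeholder for one cross-validation run; a real run would fit a forest."""
    data, train_idx, test_idx = task
    return len(train_idx), len(test_idx)


def run_all(data, n_folds, n_procs):
    # Each task tuple carries the dataset, so it is pickled and sent to a
    # worker once per fold. This is the source of the memory overhead.
    tasks = [(data, train, test) for train, test in make_folds(len(data), n_folds)]
    with Pool(processes=n_procs) as pool:
        return pool.map(run_fold, tasks)


if __name__ == "__main__":
    print(run_all(list(range(10)), n_folds=5, n_procs=2))
```

Under this assumption, memory use grows with the number of folds processed at the same time, so lowering --n_jobs is one way to trade run time for memory.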

Important 2: The command-line argument n_jobs only determines the number of threads in the processing pool, not the total number of threads used by this script. We currently supply n_jobs = n_jobs_per_forest = None to scikit-learn's RandomForestClassifier when fitting the (initial) RF. Hence, scikit-learn uses a heuristic to choose the number of jobs used for fitting the RF. If required, you can set n_jobs_per_forest in the script manually (line 132).

Important 3: Datasets which are not found in the temp folder (as returned by tempfile.gettempdir(), which likely points to /tmp on Linux systems) are downloaded automatically. If you have already downloaded the datasets, or you simply do not want to use the temp folder, you can choose a different folder via --tmpdir ${your_new_tmp_dir}.
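The lookup described above could work roughly as in the following sketch. The helpers resolve_tmpdir and dataset_path are hypothetical names introduced for illustration, and the file layout inside the folder is an assumption. Only tempfile.gettempdir() and the --tmpdir override come from this README.

```python
import os
import tempfile
from typing import Optional


def resolve_tmpdir(tmpdir: Optional[str] = None) -> str:
    """Return the folder that is searched for datasets (hypothetical helper)."""
    # Fall back to the system temp folder when no --tmpdir override is given.
    path = tmpdir if tmpdir is not None else tempfile.gettempdir()
    os.makedirs(path, exist_ok=True)
    return path


def dataset_path(name: str, tmpdir: Optional[str] = None) -> str:
    """Assumed location of a dataset file inside the chosen folder."""
    return os.path.join(resolve_tmpdir(tmpdir), name)


# A download would only be triggered if this path does not exist yet.
print(os.path.exists(dataset_path("adult")))
```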

Plot the results on the adult dataset and store them in the current folder:

./plot.py -d adult -o .

Alternatively, plot.py is also divided into execution cells, which you can run in an interactive interpreter (e.g. VSCode or a Jupyter Notebook).
