This is an example of a reproducible modelling project

Last update: Oct 26, 2021

What are we doing?

This example was created for the 2021 fall lecture series of Stanford's Center for Open and REproducible Science (CORES).

A video of the talk can be found at: https://youtu.be/JAQot6b1Cng

The goal of this example analysis is to explore how varying different hyper-parameters during the training of a simple classification model affects its performance on scikit-learn's handwritten digit dataset.

Specifically, we will study the effect of varying the learning rate, regularisation strength, number of gradient descent steps, and random shuffling of the data on the 3-fold cross-validation performance of scikit-learn's linear support vector machine classifier.
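As a rough illustration of what a single evaluation could look like, the sketch below scores a linear SVM with 3-fold cross-validation. It assumes the classifier is `SGDClassifier` with hinge loss (a linear SVM trained by stochastic gradient descent), which may differ from what the project scripts actually use. The function name `cv_accuracy`, the feature scaling step, and all default values are assumptions made for this example.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_accuracy(learning_rate=0.01, alpha=1e-4, max_iter=50, seed=0):
    """Return the 3-fold cross-validation accuracies for one configuration.

    All default values are illustrative, not the project's defaults.
    """
    X, y = load_digits(return_X_y=True)
    clf = make_pipeline(
        StandardScaler(),  # assumption: scaling is added here for stable SGD
        SGDClassifier(
            loss="hinge",              # hinge loss gives a linear SVM
            learning_rate="constant",  # keep the step size fixed at eta0
            eta0=learning_rate,        # learning rate
            alpha=alpha,               # regularisation strength
            max_iter=max_iter,         # number of passes over the data
            tol=None,                  # run exactly max_iter passes
            random_state=seed,
        ),
    )
    # Shuffling the data before splitting is controlled by the seed.
    cv = KFold(n_splits=3, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv)


if __name__ == "__main__":
    scores = cv_accuracy(max_iter=20)
    print("fold accuracies:", scores, "mean:", scores.mean())
```

Changing `seed` alters both the fold assignment and the order in which SGD visits the samples, which is one way to probe the effect of random shuffling.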

Importantly, each hyper-parameter is varied separately while all other hyper-parameters are set to default values (for details, see scripts/evaluate_hyper_params_effect.py).
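The one-at-a-time design described above can be sketched as follows. This is only a minimal illustration: the names `DEFAULTS`, `CANDIDATES`, and `one_at_a_time_grid`, as well as all values, are hypothetical and are not taken from `src/hyper/grid.py`.

```python
# Hypothetical default values and candidate values, for illustration only.
DEFAULTS = {"learning_rate": 0.01, "alpha": 1e-4, "max_iter": 1000, "seed": 0}
CANDIDATES = {
    "learning_rate": [0.001, 0.01, 0.1],
    "alpha": [1e-5, 1e-4, 1e-3],
    "max_iter": [10, 100, 1000],
    "seed": [0, 1, 2],
}


def one_at_a_time_grid(defaults, candidates):
    """Yield (varied_name, config) pairs.

    Each config equals the defaults except for the single
    hyper-parameter named by varied_name.
    """
    for name, values in candidates.items():
        for value in values:
            config = dict(defaults)  # copy so the defaults stay untouched
            config[name] = value
            yield name, config


configs = list(one_at_a_time_grid(DEFAULTS, CANDIDATES))
print(len(configs))  # 4 hyper-parameters x 3 values each = 12
```

With these example values, a full grid over all combinations would need 3^4 = 81 configurations, while varying one hyper-parameter at a time needs only 12. The trade-off is that interactions between hyper-parameters are not explored.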

Project organization

├── LICENSE            <- MIT License
├── Makefile           <- Makefile with targets to 'load', 'evaluate', and 'plot' ('make all' runs all three analysis steps)
├── poetry.lock        <- Details of used package versions
├── pyproject.toml     <- Lists all dependencies
├── README.md          <- This README file.
├── docs/              
|    └──               <- Slides of the practical tutorial
├── data/
|    └──               <- A copy of the handwritten digit dataset provided by scikit-learn
|
├── results/
|    ├── estimates/
|    │    └──          <- Generated estimates of classifier performance
|    └── figures/
|         └──          <- Generated figures
|
├── scripts/
|    ├── load_data.py                       <- Downloads the dataset to specified 'data-path'
|    ├── evaluate_hyper_params_effect.py    <- Runs cross-validated hyper-parameter evaluation
|    ├── plot_hyper_params_effect.py        <- Summarizes results of evaluation in a figure
|    └── run_analysis.sh                    <- Runs all analysis steps
|
└── src/
    ├── hyper/
    │    ├──  __init__.py                   <- Makes 'hyper' a Python module
    │    ├── grid.py                        <- Functionality to sample hyper-parameter grid
    │    ├── evaluation.py                  <- Functionality to evaluate classifier performance, given hyper-parameters
    │    └── plotting.py                    <- Functionality to visualize results
    └── setup.py                            <- Makes 'hyper' pip-installable (pip install -e .)

Data description

We use the handwritten digits dataset provided by scikit-learn. For details on this dataset, see scikit-learn's documentation:

https://scikit-learn.org/stable/datasets/toy_dataset.html#digits-dataset
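For a quick look at the dataset, the following snippet loads it directly through scikit-learn's `load_digits` and prints its basic dimensions. It reads the copy bundled with scikit-learn and does not use the project's `scripts/load_data.py`.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is an 8x8 grayscale image, flattened to 64 features.
print(digits.data.shape)             # (1797, 64)
print(digits.images.shape)           # (1797, 8, 8)
print(digits.target_names.tolist())  # class labels: the digits 0 to 9
```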

Installation

This project is written for Python 3.9.5 (we recommend pyenv for Python version management).

All software dependencies of this project are managed with Python Poetry. The dependencies are listed in pyproject.toml, and the exact package versions used are recorded in poetry.lock.

To clone this repository to your local machine, run:

git clone https://github.com/athms/reproducible-modelling

To install all dependencies with poetry, run:

cd reproducible-modelling/
poetry install

To reproduce our analyses, you additionally need to install our custom Python module (src/hyper) in your poetry environment:

cd src/
poetry run pip install -e .

Reproducing our analysis

Our analysis can be reproduced either by running scripts/run_analysis.sh:

cd scripts
poetry run bash run_analysis.sh

...or by using make:

poetry run make <ANALYSIS TARGET>

We provide the following targets for make:

Analysis target    Description
all                Runs the entire analysis pipeline
load               Downloads scikit-learn's handwritten digit dataset
evaluate           Runs our cross-validated hyper-parameter evaluation
plot               Creates our results figure

This README file is strongly inspired by the Cookiecutter Data Science structure.

Owner

Armin Thomas
