Code implementation of Data Efficient Stagewise Knowledge Distillation paper.

Overview

Data Efficient Stagewise Knowledge Distillation

Stagewise Training Procedure

Table of Contents

This repository presents the code implementation for Stagewise Knowledge Distillation, a technique for improving knowledge transfer between a teacher model and student model.

Requirements

  • Install the dependencies using conda with the requirements.yml file
    conda env create -f environment.yml
    
  • Setup the stagewise-knowledge-distillation package itself
    pip install -e .
    
  • Apart from the above mentioned dependencies, it is recommended to have an Nvidia GPU (CUDA compatible) with at least 8 GB of video memory (most of the experiments will work with 6 GB also). However, the code works with CPU only machines as well.

Image Classification

Introduction

In this work, ResNet architectures are used. Particularly, we used ResNet10, 14, 18, 20 and 26 as student networks and ResNet34 as the teacher network. The datasets used are CIFAR10, Imagenette and Imagewoof. Note that Imagenette and Imagewoof are subsets of ImageNet.

Preparation

  • Before any experiments, you need to download the data and saved weights of teacher model to appropriate locations.

  • The following script

    • downloads the datasets
    • saves 10%, 20%, 30% and 40% splits of each dataset separately
    • downloads teacher model weights for all 3 datasets
    # assuming you are in the root folder of the repository
    cd image_classification/scripts
    bash setup.sh
    

Experiments

For detailed information on the various experiments, refer to the paper. In all the image classification experiments, the following common training arguments are listed with the possible values they can take:

  • dataset (-d) : imagenette, imagewoof, cifar10
  • model (-m) : resnet10, resnet14, resnet18, resnet20, resnet26, resnet34
  • number of epochs (-e) : Integer is required
  • percentage of dataset (-p) : 10, 20, 30, 40 (don't use this argument at all for full dataset experiments)
  • random seed (-s) : Give any random seed (for reproducibility purposes)
  • gpu (-g) : Don't use unless training on CPU (in which case, use -g 'cpu' as the argument). In case of multi-GPU systems, run CUDA_VISIBLE_DEVICES=id in the terminal before the experiment, where id is the ID of your GPU according to nvidia-smi output.
  • Comet ML API key (-a) (optional) : If you want to use Comet ML for tracking your experiments, then either put your API key as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.
  • Comet ML workspace (-w) (optional) : If you want to use Comet ML for tracking your experiments, then either put your workspace name as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.

In the following subsections, example commands for training are given for one experiment each.

No Teacher

Full Imagenette dataset, ResNet10

python3 no_teacher.py -d imagenette -m resnet10 -e 100 -s 0

Traditional KD (FitNets)

20% Imagewoof dataset, ResNet18

python3 traditional_kd.py -d imagewoof -m resnet18 -p 20 -e 100 -s 0

FSP KD

30% CIFAR10 dataset, ResNet14

python3 fsp_kd.py -d cifar10 -m resnet14 -p 30 -e 100 -s 0

Attention Transfer KD

10% Imagewoof dataset, ResNet26

python3 attention_transfer_kd.py -d imagewoof -m resnet26 -p 10 -e 100 -s 0

Hinton KD

Full CIFAR10 dataset, ResNet14

python3 hinton_kd.py -d cifar10 -m resnet14 -e 100 -s 0

Simultaneous KD (Proposed Baseline)

40% Imagenette dataset, ResNet20

python3 simultaneous_kd.py -d imagenette -m resnet20 -p 40 -e 100 -s 0

Stagewise KD (Proposed Method)

Full CIFAR10 dataset, ResNet10

python3 stagewise_kd.py -d cifar10 -m resnet10 -e 100 -s 0

Semantic Segmentation

Introduction

In this work, ResNet backbones are used to construct symmetric U-Nets for semantic segmentation. Particularly, we used ResNet10, 14, 18, 20 and 26 as the backbones for student networks and ResNet34 as the backbone for the teacher network. The dataset used is the Cambridge-driving Labeled Video Database (CamVid).

Preparation

  • The following script
    • downloads the data (and shifts it to appropriate folder)
    • saves 10%, 20%, 30% and 40% splits of each dataset separately
    • downloads the pretrained teacher weights in appropriate folder
    # assuming you are in the root folder of the repository
    cd semantic_segmentation/scripts
    bash setup.sh
    

Experiments

For detailed information on the various experiments, refer to the paper. In all the semantic segmentation experiments, the following common training arguments are listed with the possible values they can take:

  • dataset (-d) : camvid
  • model (-m) : resnet10, resnet14, resnet18, resnet20, resnet26, resnet34
  • number of epochs (-e) : Integer is required
  • percentage of dataset (-p) : 10, 20, 30, 40 (don't use this argument at all for full dataset experiments)
  • random seed (-s) : Give any random seed (for reproducibility purposes)
  • gpu (-g) : Don't use unless training on CPU (in which case, use -g 'cpu' as the argument). In case of multi-GPU systems, run CUDA_VISIBLE_DEVICES=id in the terminal before the experiment, where id is the ID of your GPU according to nvidia-smi output.
  • Comet ML API key (-a) (optional) : If you want to use Comet ML for tracking your experiments, then either put your API key as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.
  • Comet ML workspace (-w) (optional) : If you want to use Comet ML for tracking your experiments, then either put your workspace name as the argument or make it the default argument in the arguments.py file. Otherwise, no need of using this argument.

Note: Currently, there are no plans for adding Attention Transfer KD and FSP KD experiments for semantic segmentation.

In the following subsections, example commands for training are given for one experiment each.

No Teacher

Full CamVid dataset, ResNet10

python3 pretrain.py -d camvid -m resnet10 -e 100 -s 0

Traditional KD (FitNets)

20% CamVid dataset, ResNet18

python3 traditional_kd.py -d camvid -m resnet18 -p 20 -e 100 -s 0

Simultaneous KD (Proposed Baseline)

40% CamVid dataset, ResNet20

python3 simultaneous_kd.py -d camvid -m resnet20 -p 40 -e 100 -s 0

Stagewise KD (Proposed Method)

10 % CamVid dataset, ResNet10

python3 stagewise_kd.py -d camvid -m resnet10 -p 10 -e 100 -s 0

Citation

If you use this code or method in your work, please cite using

@misc{kulkarni2020data,
      title={Data Efficient Stagewise Knowledge Distillation}, 
      author={Akshay Kulkarni and Navid Panchi and Sharath Chandra Raparthy and Shital Chiddarwar},
      year={2020},
      eprint={1911.06786},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Built by Akshay Kulkarni, Navid Panchi and Sharath Chandra Raparthy.

Owner
IvLabs
Robotics and AI community of VNIT
IvLabs
Event sourced bank - A wide-and-shallow example using the Python event sourcing library

Event Sourced Bank A "wide but shallow" example of using the Python event sourci

3 Mar 09, 2022
PyTorch code for the ICCV'21 paper: "Always Be Dreaming: A New Approach for Class-Incremental Learning"

Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning PyTorch code for the ICCV 2021 paper: Always Be Dreaming: A New Approach f

49 Dec 21, 2022
Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a keras CNN model and openCV.

Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a keras CNN model and openCV.

Octavio Arriaga 5.3k Dec 30, 2022
The code uses SegFormer for Semantic Segmentation on Drone Dataset.

SegFormer_Segmentation The code uses SegFormer for Semantic Segmentation on Drone Dataset. The details for the SegFormer can be obtained from the foll

Dr. Sander Ali Khowaja 1 May 08, 2022
Self Driving RC Car Code

Derp Learning Derp Learning is a Python package that collects data, trains models, and then controls an RC car for track racing. Hardware You will nee

Not Karol 39 Dec 07, 2022
Implementation of accepted AAAI 2021 paper: Deep Unsupervised Image Hashing by Maximizing Bit Entropy

Deep Unsupervised Image Hashing by Maximizing Bit Entropy This is the PyTorch implementation of accepted AAAI 2021 paper: Deep Unsupervised Image Hash

62 Dec 30, 2022
Dataset and codebase for NeurIPS 2021 paper: Exploring Forensic Dental Identification with Deep Learning

Repository under construction. Example dataset, checkpoints, and training/testing scripts will be avaible soon! 💡 Collated best practices from most p

4 Jun 26, 2022
Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems

Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems This repository is the official implementation of Rever

6 Aug 25, 2022
MINOS: Multimodal Indoor Simulator

MINOS Simulator MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environ

194 Dec 27, 2022
An implementation of RetinaNet in PyTorch.

RetinaNet An implementation of RetinaNet in PyTorch. Installation Training COCO 2017 Pascal VOC Custom Dataset Evaluation Todo Credits Installation In

Conner Vercellino 297 Jan 04, 2023
Use VITS and Opencpop to develop singing voice synthesis; Maybe it will VISinger.

Init Use VITS and Opencpop to develop singing voice synthesis; Maybe it will VISinger. 本项目基于 https://github.com/jaywalnut310/vits https://github.com/S

AmorTX 107 Dec 23, 2022
A library for graph deep learning research

Documentation | Paper [JMLR] | Tutorials | Benchmarks | Examples DIG: Dive into Graphs is a turnkey library for graph deep learning research. Why DIG?

DIVE Lab, Texas A&M University 1.3k Jan 01, 2023
A PyTorch Implementation of SphereFace.

SphereFace A PyTorch Implementation of SphereFace. The code can be trained on CASIA-Webface and the best accuracy on LFW is 99.22%. SphereFace: Deep H

carwin 685 Dec 09, 2022
Jittor implementation of PCT:Point Cloud Transformer

PCT: Point Cloud Transformer This is a Jittor implementation of PCT: Point Cloud Transformer.

MenghaoGuo 547 Jan 03, 2023
Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator

DRL-robot-navigation Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gra

87 Jan 07, 2023
This reposityory contains the PyTorch implementation of our paper "Generative Dynamic Patch Attack".

Generative Dynamic Patch Attack This reposityory contains the PyTorch implementation of our paper "Generative Dynamic Patch Attack". Requirements PyTo

Xiang Li 8 Nov 17, 2022
Self-Supervised Pillar Motion Learning for Autonomous Driving (CVPR 2021)

Self-Supervised Pillar Motion Learning for Autonomous Driving Chenxu Luo, Xiaodong Yang, Alan Yuille Self-Supervised Pillar Motion Learning for Autono

QCraft 101 Dec 05, 2022
NLMpy - A Python package to create neutral landscape models

NLMpy is a Python package for the creation of neutral landscape models that are widely used by landscape ecologists to model ecological patterns

Manaaki Whenua – Landcare Research 1 Oct 08, 2022
Towards uncontrained hand-object reconstruction from RGB videos

Towards uncontrained hand-object reconstruction from RGB videos Yana Hasson, Gül Varol, Ivan Laptev and Cordelia Schmid Project page Paper Table of Co

Yana 69 Dec 27, 2022
A PyTorch re-implementation of Neural Radiance Fields

nerf-pytorch A PyTorch re-implementation Project | Video | Paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis Ben Mildenhall

Krishna Murthy 709 Jan 09, 2023