[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Last update: Nov 10, 2022

MosaicKD

Code for the NeurIPS 2021 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data"

1. Motivation

Natural images share common local patterns. In MosaicKD, these local patterns are first disassembled from out-of-domain (OOD) data and then reassembled to synthesize in-domain data, which makes OOD knowledge distillation (OOD-KD) feasible.

2. Method

MosaicKD establishes a four-player minimax game between a generator G, a patch discriminator D, a teacher model T and a student model S. The generator, as in prior GANs, takes a random noise vector as input and learns to mosaic synthetic in-domain samples with locally authentic and globally legitimate distributions, under the supervision back-propagated from the other three players.
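To make the idea of "locally authentic" supervision concrete, here is a minimal sketch of patch-level scoring in plain Python. It assumes that a patch discriminator judges small windows of an image rather than the whole image; the names `extract_patches`, `patch_level_score` and `mean_intensity` are hypothetical and do not come from this repository, and the toy scoring function only stands in for a learned network.

```python
from typing import Callable, List

Image = List[List[float]]  # a single-channel image as rows of pixel values


def extract_patches(image: Image, patch_size: int, stride: int) -> List[Image]:
    """Slide a patch_size x patch_size window over the image and collect patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def patch_level_score(image: Image,
                      score_patch: Callable[[Image], float],
                      patch_size: int = 2,
                      stride: int = 2) -> float:
    """Average a per-patch score, so only local statistics are judged."""
    patches = extract_patches(image, patch_size, stride)
    return sum(score_patch(p) for p in patches) / len(patches)


def mean_intensity(patch: Image) -> float:
    """Toy stand-in for a learned discriminator (assumption): mean pixel value."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# A 4x4 toy image with pixel values 0..15.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = extract_patches(image, patch_size=2, stride=2)
score = patch_level_score(image, mean_intensity)
```

Because each patch is scored independently, an image can receive a high score when its local patterns look realistic even if its global layout differs from the OOD data. In the description above, global legitimacy is left to the supervision from the other players.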

3. Reproducing our results

3.1 Prepare teachers

Please download our pre-trained models from Dropbox (266 MB) and extract them as "checkpoints/pretrained/*.pth". You can also train your own models as follows:

python train_scratch.py --lr 0.1 --batch-size 256 --model wrn40_2 --dataset cifar100

3.2 OOD-KD: CIFAR-100 (ID) + CIFAR-10 (OOD)

Vanilla KD (Blind KD)

python kd_vanilla.py --lr 0.1 --batch-size 128 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --gpu 0
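For readers unfamiliar with the baseline, the following is a minimal sketch of the standard temperature-scaled distillation loss that vanilla KD commonly minimizes on unlabeled inputs. It is not taken from kd_vanilla.py; the function names and the temperature value of 4.0 are assumptions chosen for illustration.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(student_logits: List[float],
            teacher_logits: List[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    return kl * temperature ** 2


# No labels are needed: the teacher's output on an unlabeled (here OOD) input
# is the only training signal for the student.
teacher_out = [2.0, 0.5, -1.0]
loss_same = kd_loss(teacher_out, teacher_out)
loss_diff = kd_loss([-1.0, 0.5, 2.0], teacher_out)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge.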

Data-Free KD (DFQAD)

python kd_datafree.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --local 1 --align 1 --adv 1 --balance 10 --gpu 0

MosaicKD (This work)

python kd_mosaic.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --local 1 --align 1 --adv 1 --balance 10 --gpu 0

3.3 OOD-KD: CIFAR-100 (ID) + ImageNet/Places365 OOD Subset (OOD)

Prepare 32x32 datasets
Please prepare the 32x32 ImageNet following the instructions from https://patrykchrabaszcz.github.io/Imagenet32/ and extract them as "data/ImageNet_32x32/train" and "data/ImageNet_32x32/val". You can prepare Places365 in the same way.

MosaicKD on OOD subset
As ImageNet and Places365 contain a large number of in-domain samples, we construct an OOD subset for training. Please run the scripts with "--ood_subset" to enable subset selection.

python kd_mosaic.py --lr 0.1 --batch-size 256 --teacher wrn40_2 --student wrn16_1 --dataset cifar100 --unlabeled cifar10 --epoch 200 --local 1 --align 1 --adv 1 --balance 10 --ood_subset --gpu 0
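This README does not say how the OOD subset is chosen. As an illustration only, the sketch below assumes one plausible criterion: keep the unlabeled samples on which the teacher is least confident, since low confidence suggests a sample does not belong to any in-domain class. The function name `select_ood_subset` and the toy probabilities are hypothetical, and the repository's actual selection rule may differ.

```python
from typing import List


def select_ood_subset(teacher_probs: List[List[float]], keep: int) -> List[int]:
    """Return indices of the `keep` samples with the lowest teacher confidence.

    Confidence is taken to be the maximum class probability (an assumption).
    """
    confidence = [max(probs) for probs in teacher_probs]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return sorted(ranked[:keep])


# Toy teacher outputs for four unlabeled images over two classes.
teacher_probs = [
    [0.90, 0.10],  # confident: likely in-domain
    [0.50, 0.50],  # uncertain: likely OOD
    [0.60, 0.40],  # fairly uncertain
    [0.99, 0.01],  # very confident: likely in-domain
]
subset = select_ood_subset(teacher_probs, keep=2)
```

Here the two least confident samples, at indices 1 and 2, are kept as the OOD subset.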

4. Visualization of synthetic data

5. Citation

If you found this work useful for your research, please cite our paper:

@article{fang2021mosaicking,
  title={Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data},
  author={Gongfan Fang and Yifan Bao and Jie Song and Xinchao Wang and Donglin Xie and Chengchao Shen and Mingli Song},
  journal={arXiv preprint arXiv:2110.15094},
  year={2021}
}

Owner

ZJU-VIPA