MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks

Overview

This is the official PyTorch implementation of our paper: "MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks" by Zhiqiang Shen and Marios Savvides from Carnegie Mellon University.

In this paper, we introduce a simple yet effective approach that can boost the vanilla ResNet-50 to 80%+ Top-1 accuracy on ImageNet without any tricks. Generally, our method is based on the recently proposed MEAL, i.e., ensemble knowledge distillation via discriminators. We further simplify it through 1) adopting the similarity loss and discriminator only on the final outputs and 2) using the average of softmax probabilities from all teacher ensembles as the stronger supervision for distillation. One crucial perspective of our method is that the one-hot/hard label should not be used in the distillation process. We show that such a simple framework can achieve state-of-the-art results without involving any commonly-used tricks, such as 1) architecture modification; 2) outside training data beyond ImageNet; 3) autoaug/randaug; 4) cosine learning rate; 5) mixup/cutmix training; 6) label smoothing; etc.
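
The supervision described above can be illustrated with a minimal sketch in plain Python. It assumes KL divergence as the similarity loss and uses made-up logits for two teachers and one student; the function names are hypothetical and are not taken from this repository's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_label(teacher_logits):
    """Average the softmax probabilities of several teachers for one image."""
    probs = [softmax(logits) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

def kl_divergence(target, student_probs, eps=1e-12):
    """KL(target || student); assumed here as the similarity loss."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(target, student_probs)
    )

# Toy logits for one image with three classes (illustrative values only).
teacher_logits = [[2.0, 1.0, 0.1], [1.5, 1.2, 0.3]]
student_logits = [1.0, 1.0, 1.0]

# The soft label is the only target: no one-hot/hard label enters the loss.
soft_label = ensemble_soft_label(teacher_logits)
loss = kl_divergence(soft_label, softmax(student_logits))
print(soft_label, loss)
```

Note that the ground-truth class index never appears in this sketch, which mirrors the point that hard labels are not used during distillation.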

Citation

If you find our code helpful for your research, please cite:

@article{shen2020mealv2,
  title={MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks},
  author={Shen, Zhiqiang and Savvides, Marios},
  journal={arXiv preprint arXiv:2009.08453},
  year={2020}
}

News

[Dec. 5, 2021] New: Add FKD training support. We highly recommend using FKD for training MEAL V2 models, which is 2~4x faster with similar accuracy.

  • Download our soft label for MEAL V2.

  • Run FKD_train.py with the desired model architecture, the path to the ImageNet dataset and the path to the soft label, for example:

    # 224 x 224 ResNet-50
    python FKD_train.py --save MEAL_V2_resnet50_224 \
    --batch-size 512 -j 48 \
    --model resnet50 --epochs 180 \
    --teacher-model gluon_senet154,gluon_resnet152_v1s \
    --imagenet [imagenet-folder with train and val folders] \
    --num_crops 8 --soft_label_type marginal_smoothing_k5 \
    --softlabel_path [path of soft label] \
    --schedule 100 180 --use-discriminator-loss

Add --cos if you would like to train with cosine learning rate.
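
For reference, a common form of cosine learning rate decay is sketched below. This is an assumption about what a cosine schedule typically looks like, not a transcription of what --cos does in FKD_train.py; cosine_lr and its parameters are hypothetical names, and the base learning rate is an illustrative value.

```python
import math

def cosine_lr(base_lr, epoch, total_epochs):
    """Cosine annealing from base_lr at epoch 0 down to 0 at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Print the learning rate at a few points of a 180-epoch run.
for epoch in (0, 45, 90, 135, 180):
    print(epoch, cosine_lr(0.01, epoch, 180))
```

The rate falls smoothly over the whole run instead of dropping at fixed epochs.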

New: Basically, adding back tricks (cosine lr, etc.) into MEAL V2 can consistently improve the accuracy:

New: Add CutMix training support, use --w-cutmix to enable it.

[Mar. 19, 2021] Long version of MEAL V2 is available on: arXiv or paper.

[Dec. 16, 2020] MEAL V2 is now available in PyTorch Hub.

[Nov. 3, 2020] Short version of MEAL V2 has been accepted in NeurIPS 2020 Beyond BackPropagation: Novel Ideas for Training Neural Architectures workshop. Long version is coming soon.

Preparation

1. Requirements:

This repo is tested with:

  • Python 3.6

  • CUDA 10.2

  • PyTorch 1.6.0

  • torchvision 0.7.0

  • timm 0.2.1 (pip install timm)

But it should be runnable with other PyTorch versions.

2. Data:

Results & Models

We provide pre-trained models from different training settings. The table reports the training/validation resolution, number of parameters, and Top-1/Top-5 accuracy on the ImageNet validation set:

Models                             | Resolution | #Parameters | Top-1/Top-5 | Trained models
-----------------------------------+------------+-------------+-------------+------------------
MEAL-V1 w/ ResNet50                | 224        | 25.6M       | 78.21/94.01 | GitHub
MEAL-V2 w/ ResNet18                | 224        | 11.7M       | 73.19/90.82 | Download (46.8M)
MEAL-V2 w/ ResNet50                | 224        | 25.6M       | 80.67/95.09 | Download (102.6M)
MEAL-V2 w/ ResNet50                | 380        | 25.6M       | 81.72/95.81 | Download (102.6M)
MEAL-V2 + CutMix w/ ResNet50       | 224        | 25.6M       | 80.98/95.35 | Download (102.6M)
MEAL-V2 w/ MobileNet V3-Small 0.75 | 224        | 2.04M       | 67.60/87.23 | Download (8.3M)
MEAL-V2 w/ MobileNet V3-Small 1.0  | 224        | 2.54M       | 69.65/88.71 | Download (10.3M)
MEAL-V2 w/ MobileNet V3-Large 1.0  | 224        | 5.48M       | 76.92/93.32 | Download (22.1M)
MEAL-V2 w/ EfficientNet-B0         | 224        | 5.29M       | 78.29/93.95 | Download (21.5M)

Training & Testing

1. Training:

  • To train a model, run script/train.sh with the desired model architecture and the path to the ImageNet dataset, for example:

    # 224 x 224 ResNet-50
    python train.py --save MEAL_V2_resnet50_224 --batch-size 512 -j 48 --model resnet50 --epochs 180 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders] 
    # 224 x 224 ResNet-50 w/ CutMix
    python train.py --save MEAL_V2_resnet50_224 --batch-size 512 -j 48 --model resnet50 --epochs 180 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders] --w-cutmix
    # 380 x 380 ResNet-50
    python train.py --save MEAL_V2_resnet50_380 --batch-size 512 -j 48 --model resnet50 --image-size 380 --teacher-model tf_efficientnet_b4_ns,tf_efficientnet_b4 --imagenet [imagenet-folder with train and val folders]
    # 224 x 224 MobileNet V3-Small 0.75
    python train.py --save MEAL_V2_mobilenetv3_small_075 --batch-size 512 -j 48 --model tf_mobilenetv3_small_075 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders] 
    # 224 x 224 MobileNet V3-Small 1.0
    python train.py --save MEAL_V2_mobilenetv3_small_100 --batch-size 512 -j 48 --model tf_mobilenetv3_small_100 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders] 
    # 224 x 224 MobileNet V3-Large 1.0
    python train.py --save MEAL_V2_mobilenetv3_large_100 --batch-size 512 -j 48 --model tf_mobilenetv3_large_100 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders] 
    # 224 x 224 EfficientNet-B0
    python train.py --save MEAL_V2_efficientnet_b0 --batch-size 512 -j 48 --model tf_efficientnet_b0 --teacher-model gluon_senet154,gluon_resnet152_v1s --imagenet [imagenet-folder with train and val folders] 

Please reduce the --batch-size if you get an "out of memory" error. We also noticed that more training epochs can slightly improve performance.

  • To resume training a model, run script/resume_train.sh with the desired model architecture, the starting epoch number and the path to the ImageNet dataset:

    sh script/resume_train.sh 

2. Testing:

  • To test a model, run inference.py with the desired model architecture, model path, resolution and the path to the ImageNet dataset:

    CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py -a resnet50 --res 224 --resume MODEL_PATH -e [imagenet-folder with train and val folders]

Change --res to another image resolution [224/380] and -a to another model architecture [tf_mobilenetv3_small_100; tf_mobilenetv3_large_100; tf_efficientnet_b0] to test other trained models.
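
Top-1 and Top-5 accuracy, as reported in the results table, can be computed as in the following minimal sketch. The topk_accuracy helper and the toy scores are illustrative assumptions, not the code in inference.py.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        correct += label in top
    return 100.0 * correct / len(labels)

# Three toy samples over three classes (illustrative values only).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```

With k=5 over the 1000 ImageNet classes, the same function yields the Top-5 number.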

Contact

Zhiqiang Shen, CMU (zhiqians at andrew.cmu.edu)

Any comments or suggestions are welcome!

Comments
  • What's the training result on ImageNet when training from scratch?

    Hi @MingSun-Tse, I noticed that you said you may train your distillation from scratch (random initialization) on ImageNet. I am wondering what your training result was, because I want to use your method to train on my own dataset, and all I have is a large model trained on this dataset. Should I first train a ResNet-50 on this dataset and then use your code to fine-tune it, or can I directly use your code to distill from the existing model?

    opened by anxu829 9
  • ResNet-50 pretrained model has Top-1 accuracy of 79.02%?

    Hi, I'm extremely interested in your work. But I'm confused that your pretrained ResNet-50 model already has a Top-1 accuracy of 79.02%, which is a big gap from your paper's baseline of 76.5%. (The test code also uses test.py in your project.) Have you tried the pretrained model? Or did I go wrong? Thank you.

    (ResNet-50 pretrained weights downloaded from the timm link: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/resnet50_ram-a26f946b.pth)

    good discussions 
    opened by yangydeng 9
  • Some questions about experiment setting and discriminator

    Hi @szq0214,

    I'm highly interested in your work! Here are two questions; I hope you can share your thoughts on them.

    1. In the experiment setting, why is weight_decay set to 0? In general, weight_decay is an important factor in final performance, usually making about a 1% difference in validation accuracy on ILSVRC2012 ImageNet.

    2. About the discriminator: it contains three convolution operations, and its inputs are the logits of the student and the combined logits of the teachers. But the target for the discriminator does not look right. In the code it is as follows:

    target = torch.FloatTensor([[1, 0] for _ in range(batch_size//2)] + [[0, 1] for _ in range(batch_size//2)])

    I think the target should be [1, 0] throughout the whole batch, so this seems odd. Are there any considerations behind it? If so, is the effect of the discriminator loss to push the student's logits away from the teachers', something like regularization?

    opened by freeman-1995 6
  • torch.nn.DataParallel error

    Training error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! How can I solve this problem? I use to(device), but it does not work. @szq0214

    opened by gentlebreeze1 5
  • Discriminator LR Decay

    Thanks for your work and the code release!

    I have a small question about the lr decay schedule for the discriminator: the initial lr value for the discriminator is set to 1e-4, but it looks like it gets clobbered with the student lr value in _set_learning_rate:

    https://github.com/szq0214/MEAL-V2/blob/3558f37175f2a9e0514eb013a2021d344ef612b1/train.py#L94-L96

    Is this intentional? The discriminator is a simple model so I don't think this would make a big difference either way.

    Thanks

    opened by normster 5
  • torch.nn.DataParallel error

    I want to train MEAL-V2 on a machine with 4 GPUs. The training script is as follows: python train.py --gpus 0 1 2 3 --save MEAL_V2_resnet50_224 ...

    But I get an error:

    ... 
    RuntimeError: Caught RuntimeError in replica 0 on device 0.
    ...
    RuntimeError: Caught RuntimeError in replica 1 on device 1.
    Original Traceback (most recent call last):
        File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
            output = module(*input, **kwargs)
        File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
            result = self.forward(*input, **kwargs)
        File "/mnt/codes/MEAL2-drink/models/discriminator.py", line 17, in forward
            out = F.relu(self.conv1(x))
        File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
             result = self.forward(*input, **kwargs)
        File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 399, in forward
             return self._conv_forward(input, self.weight, self.bias)
        File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
             self.padding, self.dilation, self.groups)
    RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
    
    opened by anonymoussss 2
  • Why are top1 and top5 both 0.0?

    I prepared my own data according to the ImageNet format (the train/ and val/ folders contain one image folder per class) and trained the model. But after 60 epochs, top1 and top5 are both still 0.0. What could be the problem? Looking forward to your reply. Thanks!

    INFO 2021-01-28 22:14:55,943: Epoch: [59][141/181] Time 1.25 (6.42) Data 0.00 (0.14) G_Loss 3.085 {3.283, 3.279} D_Loss 0.347 {0.347, 0.347} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.00, 0.00} LR 0.01000
    INFO 2021-01-28 22:15:20,853: Epoch: [59][161/181] Time 1.24 (5.78) Data 0.00 (0.12) G_Loss 3.101 {3.267, 3.255} D_Loss 0.347 {0.347, 0.347} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.00, 0.00} LR 0.01000
    INFO 2021-01-28 22:15:45,187: Epoch: [59][181/181] Time 0.65 (5.28) Data 0.00 (0.11) G_Loss 3.335 {3.266, 3.253} D_Loss 0.347 {0.347, 0.347} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.00, 0.00} LR 0.01000
    INFO 2021-01-28 22:15:45,965: Epoch: [59] -- TRAINING SUMMARY Time 955.00 Data 19.59 G_Loss 3.266 D_Loss 0.347 Top-1 0.00 Top-5 0.00

    opened by zylxadz 2
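  • Using mobilenet_v2 pretrained weights, Top-1 accuracy starts from 0. Is this normal?

    Hello, thank you for your excellent work. I use ImageNet-pretrained resnet101 and resnet152 as the teacher models and the ImageNet-pretrained mobilenet_v2 as the student model, but the Top-1 accuracy at the start of training is 0. Is this normal? When I switch the student to shufflenet_x1_0, the Top-1 accuracy is 73.2%. Thank you for your reply!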

    opened by yukaizhou 2
  • What is the performance of the teacher model

    According to the results in Table II of your paper, training from scratch using ResNet obtains 76.51% accuracy. When the input size is 224 x 224, the student ResNet-50 obtains 80.67% accuracy with senet154 and resnet152 v1 as teacher models through MEAL-V2. So I am wondering what the performance of the pre-trained teacher models is, since they have larger and more effective architectures?

    opened by PyJulie 2
  • Paper Inconsistency with Code

    The "Experimental Settings" section of your arXiv paper says you use an initial LR of 0.01.

    However, in your source code the ResNet50 model uses an initial LR of 0.1.

    I believe the paper is mistaken, as running your source code seems to be fine. In fact, the whole experimental setup is incorrect in comparison to this LR_REGIME.

    opened by nollied 2
  • Could not find the generator loss.

    Hi,

    Thanks for your great work.

    When I read the code, I found that there is only the discriminator loss and no generator loss. In other words, there is no adversarial training in MEAL V2, which is different from my intuition. I would like to know what the advantage of using only the discriminator is.

    opened by PeterouZh 2
Owner
Zhiqiang Shen