Code for MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

Overview

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

This is the code for the paper:

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
Presented at ICML 2018

Please note that this is not an officially supported Google product.

If you find this code useful in your research then please cite

@inproceedings{jiang2018mentornet,
  title={MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels},
  author={Jiang, Lu and Zhou, Zhengyuan and Leung, Thomas and Li, Li-Jia and Fei-Fei, Li},
  booktitle={ICML},
  year={2018}
}

Introduction

We are interested in training a deep network using curriculum learning (Bengio et al., 2009), i.e. learning examples with focus. Each curriculum is implemented as a network (called MentorNet).

  • During training, MentorNet supervises the training of the base network (called StudentNet).
  • At the test time, StudentNet makes prediction alone without MentorNet.

Training Overview

Setups

All code was developed and tested on Nvidia V100/P100 (16GB) the following environment.

  • Ubuntu 18.04
  • Python 2.7.15
  • TensorFlow 1.8.0
  • numpy 1.13.3
  • imageio 2.3.0

Download Cloud SDK to get data and models. Next we need to download the dataset and pre-trained MentorNet models. Put them into the same directory as the code directory.

gsutil -m cp -r gs://mentornet_project/data .
gsutil -m cp -r gs://mentornet_project/mentornet_models .

Alternatively, you may download the zip files: data and models.

Running MentorNet on CIFAR

export PYTHONPATH="$PYTHONPATH:$PWD/code/"

python code/cifar_train_mentornet.py \
  --dataset_name=cifar10   \
  --trained_mentornet_dir=mentornet_models/models/mentornet_pd1_g_1/mentornet_pd \
  --loss_p_precentile=0.75  \
  --nofixed_epoch_after_burn_in  \
  --burn_in_epoch=0  \
  --example_dropout_rates="0.5,17,0.05,83" \
  --data_dir=data/cifar10/0.2 \
  --train_log_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \
  --studentnet=resnet101 \
  --max_number_of_steps=39000

A full list of commands can be found in this file. The training script has a number of command-line flags that you can use to configure the model architecture, hyperparameters, and input / output settings:

  • --trained_mentornet_dir: Directory where to find the trained MentorNet model, created by mentornet_learning/train.py.
  • --loss_p_percentile: p-percentile used to compute the loss moving average. Default is 0.7.
  • --burn_in_epoch: Number of first epochs to perform burn-in. In the burn-in period, every sample has a fixed 1.0 weight. Default is 0.
  • --fixed_epoch_after_burn_in: Whether to use the fixed epoch as the MentorNet input feature after the burn-in period. Set True for MentorNet DD. Default is False.
  • --loss_moving_average_decay: Decay factor used in moving average. Default is 0.5.
  • --example_dropout_rates: Comma-separated list indicating the example drop-out rate for the total of 100 epochs. The format is [dropout rate, epoch_num]+, the piecewise drop-out rate from boundaries and values. The sum of epoch_num is 100. Drop-out means the probability of setting sample weights to zeros proposed (Liang et al., 2016). Default is 0.5, 17, 0.05, 78, 1.0, 5.

To evaluate a model, run the evaluation job in parallel with the training job (on a different GPU).

python cifar/cifar_eval.py \
 --dataset_name=cifar10 \
 --data_dir=cifar/data/cifar10/val/ \
 --checkpoint_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \
 --eval_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1//eval_val \
 --studentnet=resnet101 \
 --device_id=1

A complete list of commands of running experiments can be found at commands/train_studentnet_resnet.sh and commands/train_studentnet_inception.sh.

MentorNet Framework

MentorNet is a general framework for curriculum learning, where various curriculums can be learned by the same MentorNet structure of different parameters.

It is flexible as we can switch curriculums by attaching different MentorNets without modifying the pipeline.

We train a few MentorNets listed below. We can think of a MentorNet as a hyper-parameter and will be tuned for different problems.

Curriculum Visualization Intuition Model Name
No curriculum image Assign uniform weight to every sample uniform. baseline_mentornet
Self-paced
(Kuma et al. 2010)
image Favor samples of smaller loss. self_paced_mentornet
SPCL linear
(Jiang et al. 2015)
image Discount the weight by loss linearly. spcl_linear_mentornet
Hard example mining
(Felzenszwalb et al., 2008)
image Favor samples of greater loss. hard_example_mining_mentornet
Focal loss
(Lin et al., 2017)
image Increase the weight by loss by the exponential CDF. focal_loss_mentornet
Predefined Mixture image Mixture of SPL and SPCL changing by epoch. mentornet_pd
MentorNet Data-driven image Learned on a small subset of the CIFAR data. mentornet_dd

Note there are many more curriculums can be trained by MentorNet, for example, prediction variance (Chang et al., 2017), implicit regularizer (Fan et al. 2017), self-paced with diversity (Jiang et al. 2014), sample re-weighting (Dehghani et al., 2018, Ren et al., 2018), etc.

Performance

The numbers are slightly different from the ones reported in the paper due to the re-implementation on the third party library.

CIFAR-10 ResNet

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.796 0.822 0.797 0.910 0.914
0.4 0.568 0.802 0.634 0.776 0.887
0.8 0.238 0.297 0.25 0.283 0.463

CIFAR-100 ResNet

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.624 0.652 0.613 0.733 0.726
0.4 0.448 0.509 0.467 0.567 0.675
0.8 0.084 0.089 0.079 0.193 0.301

CIFAR-10 Inception

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.775 0.784 0.747 0.798 0.800
0.4 0.72 0.733 0.695 0.731 0.763
0.8 0.29 0.272 0.309 0.312 0.461

CIFAR-100 Inception

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.42 0.408 0.391 0.451 0.466
0.4 0.346 0.32 0.313 0.386 0.411
0.8 0.108 0.091 0.107 0.125 0.203

Algorithm

We propose an algorithm to optimize the StudentNet model parameter w jointly with a

given MentorNet. Unlike the alternating minimization, it minimizes w (StudentNet parameter) and v (sample weight) stochastically over mini-batches.

The curriculum can change during training, and MentorNet is updated a few times in the algorithm.

Algorithm

To learn new curriculums (Step 6), see this page.

We found specific MentorNet architectures do not matter that much.

References

  • Bengio, Yoshua, et al. "Curriculum learning". In ICML, 2009.
  • Kumar M. Pawan, Packer Benjamin, and Koller Daphne "Self-paced learning for latent variable models". In NIPS, 2010.
  • Jiang, Lu et al. "Self-paced Learning with Diversity", In NIPS 2014
  • Jiang, Lu, et al. "Self-Paced Curriculum Learning." In AAAI. 2015.
  • Liang, Junwei et al. Learning to Detect Concepts from Webly-Labeled Video Data, In IJCAI 2016.
  • Lin, Tsung-Yi, et al. "Focal loss for dense object detection." In ICCV. 2017.
  • Fan, Yanbo, et al. "Self-Paced Learning: an Implicit Regularization Perspective." In AAAI 2017.
  • Felzenszwalb, Pedro, et al. "A discriminatively trained, multiscale, deformable part model." In CVPR 2008.
  • Dehghani, Mostafa, et al. "Fidelity-Weighted Learning." In ICLR 2018.
  • Ren, Mengye, et al. "Learning to reweight examples for robust deep learning." In ICML 2018.
  • Fan, Yang, et al. "Learning to Teach." In ICLR 2018.
  • Chang, Haw-Shiuan, et al. "Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples." In NIPS 2017.
Owner
Google
Google ❤️ Open Source
Google
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks Code for “Efficient Sharpness-aware Minimization for Improved Training

Angusdu 32 Oct 18, 2022
MoCoGAN: Decomposing Motion and Content for Video Generation

MoCoGAN: Decomposing Motion and Content for Video Generation This repository contains an implementation and further details of MoCoGAN: Decomposing Mo

Sergey Tulyakov 514 Dec 18, 2022
Tutorials, assignments, and competitions for MIT Deep Learning related courses.

MIT Deep Learning This repository is a collection of tutorials for MIT Deep Learning courses. More added as courses progress. Tutorial: Deep Learning

Lex Fridman 9.5k Jan 07, 2023
Keywords : Streamlit, BertTokenizer, BertForMaskedLM, Pytorch

Next Word Prediction Keywords : Streamlit, BertTokenizer, BertForMaskedLM, Pytorch 🎬 Project Demo ✔ Application is hosted on Streamlit. You can see t

Vivek7 3 Aug 26, 2022
A sketch extractor for anime/illustration.

Anime2Sketch Anime2Sketch: A sketch extractor for illustration, anime art, manga By Xiaoyu Xiang Updates 2021.5.2: Upload more example results of anim

Xiaoyu Xiang 1.6k Jan 01, 2023
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX

Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX Foolbox is a Python li

Bethge Lab 2.4k Dec 25, 2022
A two-stage U-Net for high-fidelity denoising of historical recordings

A two-stage U-Net for high-fidelity denoising of historical recordings Official repository of the paper (not submitted yet): E. Moliner and V. Välimäk

Eloi Moliner Juanpere 57 Jan 05, 2023
This repo contains the code for the paper "Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging" that has been accepted to NeurIPS 2021.

Dugh-NeurIPS-2021 This repo contains the code for the paper "Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroi

Ali Hashemi 5 Jul 12, 2022
🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

PyTorch implementation of OpenAI's Finetuned Transformer Language Model This is a PyTorch implementation of the TensorFlow code provided with OpenAI's

Hugging Face 1.4k Jan 05, 2023
StyleGAN2-ADA-training-jupyter - Training custom datasets in styleGAN2-ADA by NVIDIA using Jupyter

styleGAN2-ADA-training-jupyter Training custom datasets in styleGAN2-ADA on Jupyter Official StyleGAN2-ADA by NIVIDIA Paper Training Generative Advers

Mang Su Hyun 2 Feb 24, 2022
This code finds bounding box of a single human mouth.

This code finds bounding box of a single human mouth. In comparison to other face segmentation methods, it is relatively insusceptible to open mouth conditions, e.g., yawning, surgical robots, etc. T

iThermAI 4 Nov 27, 2022
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

VisualGPT Our Paper VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning Main Architecture of Our VisualGPT Downloa

Vision CAIR Research Group, KAUST 140 Dec 28, 2022
DeLag: Detecting Latency Degradation Patterns in Service-based Systems

DeLag: Detecting Latency Degradation Patterns in Service-based Systems Replication package of the work "DeLag: Detecting Latency Degradation Patterns

SEALABQualityGroup @ University of L'Aquila 2 Mar 24, 2022
Standalone pre-training recipe with JAX+Flax

Sabertooth Sabertooth is standalone pre-training recipe based on JAX+Flax, with data pipelines implemented in Rust. It runs on CPU, GPU, and/or TPU, b

Nikita Kitaev 26 Nov 28, 2022
A task Provided by A respective Artenal Ai and Ml based Company to complete it

A task Provided by A respective Alternal Ai and Ml based Company to complete it .

Parth Madan 1 Jan 25, 2022
Self-Regulated Learning for Egocentric Video Activity Anticipation

Self-Regulated Learning for Egocentric Video Activity Anticipation Introduction This is a Pytorch implementation of the model described in our paper:

qzhb 13 Sep 23, 2022
Rethinking the U-Net architecture for multimodal biomedical image segmentation

MultiResUNet Rethinking the U-Net architecture for multimodal biomedical image segmentation This repository contains the original implementation of "M

Nabil Ibtehaz 308 Jan 05, 2023
Latex code for making neural networks diagrams

PlotNeuralNet Latex code for drawing neural networks for reports and presentation. Have a look into examples to see how they are made. Additionally, l

Haris Iqbal 18.6k Jan 01, 2023
Some methods for comparing network representations in deep learning and neuroscience.

Generalized Shape Metrics on Neural Representations In neuroscience and in deep learning, quantifying the (dis)similarity of neural representations ac

Alex Williams 45 Dec 27, 2022
A Japanese Medical Information Extraction Toolkit

JaMIE: a Japanese Medical Information Extraction toolkit Joint Japanese Medical Problem, Modality and Relation Recognition The Train/Test phrases requ

7 Dec 12, 2022