Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML)

Last update: Jan 04, 2023

Automated Learning Rate Scheduler for Large-Batch Training

The official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML).

Overview

AutoWU is an automated learning rate (LR) scheduler that consists of two phases: warmup and decay. In the warmup phase, the LR is increased exponentially until the loss starts to increase. In the decay phase, the LR is decreased following a pre-specified decay schedule (either cosine or constant-then-cosine in our experiments).
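The shape of such a two-phase schedule can be sketched in a few lines. This is only an illustration, not the AutoWU implementation: the function name `two_phase_lr`, the initial LR, the per-step growth factor, and the assumption that the switch step is known in advance are all hypothetical (AutoWU determines the switch point automatically).

```python
import math

def two_phase_lr(step, switch_step, total_steps, lr_init=1e-5, growth=1.01):
    """Return the LR at `step` for an exponential warmup followed by cosine decay.

    All parameter names and default values are illustrative assumptions.
    """
    if step < switch_step:
        # Warmup: the LR is multiplied by a constant factor at every step.
        return lr_init * growth ** step
    # Decay: cosine from the LR reached at the switch point down to zero.
    lr_peak = lr_init * growth ** switch_step
    decay_steps = max(total_steps - switch_step, 1)
    progress = min((step - switch_step) / decay_steps, 1.0)
    return 0.5 * lr_peak * (1.0 + math.cos(math.pi * progress))
```

For example, `two_phase_lr(100, 100, 1000)` gives the peak LR reached at the end of warmup, and `two_phase_lr(1000, 100, 1000)` is zero up to floating-point error.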

The transition from the warmup phase to the decay phase happens automatically. The loss curve is predicted via Gaussian process regression, and the scheduler switches phases once the minimum of the predicted curve has, with high probability, already been attained in the past.
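To make the idea of this test concrete, here is a rough NumPy sketch. It is not the code used in this repository: the RBF kernel, its hyperparameters, the noise level, and the definition of "in the past" (the minimum lies before the most recent `recent_fraction` of observations) are assumptions chosen for illustration. It fits a Gaussian process to the observed losses, draws posterior samples of the loss curve, and reports the fraction of samples whose minimum falls in the past.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def prob_min_in_past(losses, noise=0.05, recent_fraction=0.2,
                     n_samples=500, seed=0):
    """Estimate P(minimum of the predicted loss curve lies in the past).

    Hypothetical helper: every default value here is an assumption.
    """
    y = np.asarray(losses, dtype=float)
    n = len(y)
    x = np.linspace(0.0, 1.0, n)           # rescale steps to [0, 1]
    y = (y - y.mean()) / (y.std() + 1e-12)  # standardize the losses

    # GP posterior over the noise-free loss curve at the observed steps.
    K = rbf_kernel(x, x)
    A = K + noise ** 2 * np.eye(n)
    post_mean = K @ np.linalg.solve(A, y)
    post_cov = K - K @ np.linalg.solve(A, K)
    post_cov = 0.5 * (post_cov + post_cov.T) + 1e-8 * np.eye(n)

    # Sample curves and check where each one attains its minimum.
    rng = np.random.default_rng(seed)
    curves = rng.multivariate_normal(post_mean, post_cov, size=n_samples,
                                     check_valid="ignore")
    cutoff = int((1.0 - recent_fraction) * n)
    return float(np.mean(curves.argmin(axis=1) < cutoff))
```

A scheduler built on this sketch would stay in warmup while the returned probability is low and switch to decay once it exceeds a chosen confidence threshold.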

How to use

Setup

pip install -r requirements.txt

Quick use

You can use AutoWU like any other PyTorch scheduler, except that its step method takes the loss as an argument (like ReduceLROnPlateau in PyTorch). The following code snippet demonstrates typical usage of AutoWU.

from autowu import AutoWU

...

scheduler = AutoWU(optimizer,
                   len(train_loader),  # the number of steps in one epoch 
                   total_epochs,  # total number of epochs
                   immediate_cooldown=True,
                   cooldown_type='cosine',
                   device=device)

...

for _ in range(total_epochs):
    for inputs, targets in train_loader:
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step(loss)

The default decay-phase schedule is 'cosine'. To use the constant-then-cosine schedule instead, set immediate_cooldown=False and set cooldown_fraction to the desired value:

scheduler = AutoWU(optimizer,
                   len(train_loader),  # the number of steps in one epoch 
                   total_epochs,  # total number of epochs
                   immediate_cooldown=False,
                   cooldown_type='cosine',
                   cooldown_fraction=0.2,  # fraction of cosine decay at the end
                   device=device)
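As a rough picture of what a constant-then-cosine decay looks like, the sketch below holds the LR constant and applies cosine decay only over the final fraction of steps. It is an illustration under assumptions, not AutoWU's internal logic: the function name is hypothetical, and whether cooldown_fraction is measured against the whole run or only the decay phase is not stated here, so the sketch simply takes `total_steps` as the span being considered.

```python
import math

def constant_then_cosine(step, total_steps, lr_peak, cooldown_fraction=0.2):
    """Hold lr_peak, then cosine-decay to zero over the last fraction of steps.

    Hypothetical helper; the interpretation of cooldown_fraction is an assumption.
    """
    cooldown_steps = round(total_steps * cooldown_fraction)
    cooldown_start = total_steps - cooldown_steps
    if cooldown_steps == 0 or step < cooldown_start:
        # Constant part of the schedule.
        return lr_peak
    # Cosine part: progress runs from 0 to 1 over the cooldown window.
    progress = min((step - cooldown_start) / cooldown_steps, 1.0)
    return 0.5 * lr_peak * (1.0 + math.cos(math.pi * progress))
```

With `total_steps=1000` and `cooldown_fraction=0.2`, the LR stays at `lr_peak` for the first 800 steps and follows a cosine curve to zero over the last 200.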

Reproduction of results

We provide an example training script, train.py, which is based on PyTorch Image Models. The script supports training ResNet-50 and EfficientNet-B0 for ImageNet classification under settings almost identical to those in the paper. Below we report the top-1 validation accuracy of ResNet-50 and EfficientNet-B0 trained with batch sizes 4K (4096) and 16K (16384), along with the scores reported in our paper.

ResNet-50	This repo.	Reported (paper)
4K	75.54%	75.70%
16K	74.87%	75.22%

EfficientNet-B0	This repo.	Reported (paper)
4K	75.74%	75.81%
16K	75.66%	75.44%

You can use the torch.distributed.launch utility to run the script. For instance, to train ResNet-50 with batch size 4096, run the following command with the variables set according to your environment:

python -m torch.distributed.launch \
--nproc_per_node=4 \
--nnodes=4 \
--node_rank=$NODE_RANK \
--master_addr=$MASTER_ADDR \
--master_port=$MASTER_PORT \
train.py \
--data-root $DATA_ROOT \
--amp \
--batch-size 256

To train EfficientNet-B0 instead, add the --model efficientnet_b0 argument.
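The launch command above reaches the batch size of 4096 if --batch-size is interpreted as the per-process batch size, which is an assumption here but matches the numbers: 4 processes per node, 4 nodes, and 256 samples per process. The hypothetical helper below just spells out that arithmetic.

```python
def global_batch_size(nproc_per_node, nnodes, per_process_batch):
    """Total number of samples per optimizer step across all processes.

    Assumes --batch-size is applied per process (not confirmed by the README).
    """
    return nproc_per_node * nnodes * per_process_batch

# Values taken from the example command: 4 x 4 x 256.
print(global_batch_size(nproc_per_node=4, nnodes=4, per_process_batch=256))  # 4096
```

If your cluster has a different number of nodes or GPUs, adjust the three values so that their product matches the batch size you want to reproduce.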

Citation

@inproceedings{
    kim2021automated,
    title={Automated Learning Rate Scheduler for Large-batch Training},
    author={Chiheon Kim and Saehoon Kim and Jongmin Kim and Donghoon Lee and Sungwoong Kim},
    booktitle={8th ICML Workshop on Automated Machine Learning (AutoML)},
    year={2021},
    url={https://openreview.net/forum?id=ljIl7KCNYZH}
}

Owner

Kakao Brain
