auto-tuning momentum SGD optimizer

Overview

YellowFin Build Status

YellowFin is an auto-tuning optimizer based on momentum SGD which requires no manual specification of learning rate and momentum. It measures the objective landscape on-the-fly and tunes momentum as well as learning rate using local quadratic approximation.

The implementation here can be a drop-in replacement for any optimizer in PyTorch. It supports step and zero_grad functions like any PyTorch optimizer after from yellowfin import YFOptimizer. We also provide interface to manually set the learning rate schedule at every iteration for finer control (see Detailed Guideline Section).

For more technical details, please refer to our paper YellowFin and the Art of Momentum Tuning.

For more usage details, please refer to the inline documentation of tuner_utils/yellowfin.py. Example usage can be found here for ResNext on CIFAR10 and Tied LSTM on PTB.

YellowFin is under active development. Many members of the community have kindly submitted issues and pull requests. We are incorporating fixes and smoothing things out. As a result the repository code is in flux. Please make sure you use the latest version and submit any issues you might have!

Updates

[2017.07.03] Fixed a gradient clipping bug. Please pull our latest master branch to make gradient clipping great again in YellowFin.

[2017.07.28] Switched to logrithmic smoothing to accelerate adaptation to curvature range trends.

[2017.08.01] Added optional feature to enforce non-increasing value of lr * gradient norm for stablity in some rare cases.

[2017.08.05] Added feature to correct estimation bias from sparse gradient.

[2017.08.16] Replace numpy root solver with closed form solution using Vieta's substitution for cubic eqaution. It solves the stability issue of the numpy root solver.

[2017.10.29] Major fixe for stability. We added eps to protect fractions in our code, as well as an adaptive clipping feature to properly deal with exploding gradient (manual clipping is still supported as described in the detailed instruction below).

Setup instructions for experiments

Please clone the master branch and follow the instructions to run YellowFin on ResNext for CIFAR10 and tied LSTM on Penn Treebank for language modeling. The models are adapted from ResNext repo and PyTorch example tied LSTM repo respectively. Thanks to the researchers for developing the models. For more experiments on more convolutional and recurrent neural networks, please refer to our Tensorflow implementation of YellowFin.

Note YellowFin is tested with PyTorch v0.2.0 for compatibility. It is tested under Python 2.7.

Run CIFAR10 ResNext experiments

The experiments on 110 layer ResNet with CIFAR10 and 164 layer ResNet with CIFAR100 can be launched using

cd pytorch-cifar
python main.py --logdir=path_to_logs --opt_method=YF

Run Penn Treebank tied LSTM experiments

The experiments on multiple-layer LSTM on Penn Treebank can be launched using

cd word_language_model
python main.py --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --opt_method=YF --logdir=path_to_logs --cuda

For more experiments, please refer to our YellowFin Tensorflow Repo.

Detailed guidelines

  • Basic use: optimizer = YFOptimizer(parameter_list) uses the uniform setting (i.e. without tuning) for all the PyTorch and Tensorflow experiments in our paper.

  • Interface for manual finer control: If you want to more finely control the learning rate (say using a manually set constant learning rate), or you want to use the typical lr-dropping technique after a ceritain number of epochs, please use set_lr_factor() in the YFOptimizer class. E.g. if you want to use a manually set constant learning rate, you can run set_lr_factor(desired_lr / self._lr) before self.step() at each iteration. Or e.g., if you want to always multiply a factor 2.0 to the learning rate originally tuned by YellowFin, you may use optimizer.set_lr_factor(2.0) right after optimizer = YFOptimizer(parameter_list) and before training with YellowFin. More details can be found here. (The argument lr and mu during YFOptimizer initialization are dummy, only for backward compatibility)

  • Gradient clipping: The default setting uses adaptive gradient clipping to prevent gradient explosion, thresholding norm of gradient to the square root of our estimated maximal curvature. There are three cases regarding gradient clipping. We recommend first turning off gradient clipping, and only turning it on when necessary.

    • If you want to manually set threshold to clip the gradient, please first use adapt_clip=False to turn off the auto-clipping feature. Then, you can consider either using the clip_thresh=thresh_on_the_gradient_norm argument when initializing the YFOptimizer to clip acoording to your set threshold inside YFOptimizer, or clipping the gradient outside of YFOptimizer before step() is called.

    • If you want to totally turn off gradient clipping in YFOptimizer, please use clip_thresh=None, adapt_clip=False when initializing the YFOptimizer.

  • Normalization: When using log probability style losses, please make sure the loss is properly normalized. In some RNN/LSTM cases, the cross_entropy need to be averaged by the number of samples in a minibatch. Sometimes, it also needs to be averaged over the number of classes and the sequence length of each sample in some PyTorch loss functions. E.g. in nn.MultiLabelSoftMarginLoss, size_average=True needs to be set.

  • Non-increasing move: In some rare cases, we have observe increasing value of lr * || grad ||, i.e. the move, may result in unstableness. We implemented an engineering trick to enforce non-increasing value of lr * || grad ||. The default setting turns the feature off, you can turn it on with force_non_inc_step_after_iter=the starting iter you want to enforce the non-increasing value if it is really necessary. We recommend force_non_inc_step_after_iter to be at least a few hundreds because some models may need to gradually raise the magnitude of gradient in the beginning (e.g. a model, not properly initialized, may have near zero-gradient and need iterations to get reasonable gradient level).

Citation

If you use YellowFin in your paper, please cite the paper:

@article{zhang2017yellowfin,
  title={YellowFin and the Art of Momentum Tuning},
  author={Zhang, Jian and Mitliagkas, Ioannis and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:1706.03471},
  year={2017}
}

Acknowledgement

We thank Olexa Bilaniuk, Andrew Drozdov, Paroma Varma, Bryan He, as well as github user @elPistolero @esvhd for the help in contributing to and testing the codebase.

Implementation for other platforms

For Tensorflow users, we implemented YellowFin Tensorflow Repo.

We thank the contributors for YellowFin in different deep learning frameworks.

Owner
Jian Zhang
PhD student in machine learning at Stanford University
Jian Zhang
Experiment about Deep Person Re-identification with EfficientNet-v2

We evaluated the baseline with Resnet50 and Efficienet-v2 without using pretrained models. Also Resnet50-IBN-A and Efficientnet-v2 using pretrained on ImageNet. We used two datasets: Market-1501 and

lan.nguyen2k 77 Jan 03, 2023
Implementation for HFGI: High-Fidelity GAN Inversion for Image Attribute Editing

HFGI: High-Fidelity GAN Inversion for Image Attribute Editing High-Fidelity GAN Inversion for Image Attribute Editing Update: We released the inferenc

Tengfei Wang 371 Dec 30, 2022
Functional deep learning

Pipeline abstractions for deep learning. Full documentation here: https://lf1-io.github.io/padl/ PADL: is a pipeline builder for PyTorch. may be used

LF1 101 Nov 09, 2022
This folder contains the python code of UR5E's advanced forward kinematics model.

This folder contains the python code of UR5E's advanced forward kinematics model. By entering the angle of the joint of UR5e, the detailed coordinates of up to 48 points around the robot arm can be c

Qiang Wang 4 Sep 17, 2022
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution

DRSAN A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution Karam Park, Jae Woong Soh, and Nam Ik Cho Environments U

4 May 10, 2022
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Discriminative Sounding Objects Localization Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovis

51 Dec 11, 2022
Some experiments with tennis player aging curves using Hilbert space GPs in PyMC. Only experimental for now.

NOTE: This is still being developed! Setup notes This document uses Jeff Sackmann's tennis data. You can obtain it as follows: git clone https://githu

Martin Ingram 1 Jan 20, 2022
This is the repository for our paper Ditch the Gold Standard: Re-evaluating Conversational Question Answering

Ditch the Gold Standard: Re-evaluating Conversational Question Answering This is the repository for our paper Ditch the Gold Standard: Re-evaluating C

Princeton Natural Language Processing 38 Dec 16, 2022
ML model to classify between cats and dogs

Cats-and-dogs-classifier This is my first ML model which can classify between cats and dogs. Here the accuracy is around 75%, however , the accuracy c

Sharath V 4 Aug 20, 2021
Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Text Summarization WCN — Weighted Contextual N-gram method for evaluation of Text Summarization In this project, I fine tune T5 model on Extreme Summa

Aditya Shah 1 Jan 03, 2022
a reimplementation of UnFlow in PyTorch that matches the official TensorFlow version

pytorch-unflow This is a personal reimplementation of UnFlow [1] using PyTorch. Should you be making use of this work, please cite the paper according

Simon Niklaus 134 Nov 20, 2022
Per-Pixel Classification is Not All You Need for Semantic Segmentation

MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation Bowen Cheng, Alexander G. Schwing, Alexander Kirillov [arXiv] [Proj

Facebook Research 1k Jan 08, 2023
A Temporal Extension Library for PyTorch Geometric

Documentation | External Resources | Datasets PyTorch Geometric Temporal is a temporal (dynamic) extension library for PyTorch Geometric. The library

Benedek Rozemberczki 1.9k Jan 07, 2023
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CVPR 2021)

Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CAC) Xin Lai*, Zhuotao Tian*, Li Jiang, Shu Liu, Hengshuang Zhao, Li

DV Lab 137 Dec 14, 2022
Code to reproduce results from the paper "AmbientGAN: Generative models from lossy measurements"

AmbientGAN: Generative models from lossy measurements This repository provides code to reproduce results from the paper AmbientGAN: Generative models

Ashish Bora 87 Oct 19, 2022
Python implementation of "Elliptic Fourier Features of a Closed Contour"

PyEFD An Python/NumPy implementation of a method for approximating a contour with a Fourier series, as described in [1]. Installation pip install pyef

Henrik Blidh 71 Dec 09, 2022
This repository holds the code for the paper "Deep Conditional Gaussian Mixture Model forConstrained Clustering".

Deep Conditional Gaussian Mixture Model for Constrained Clustering. This repository holds the code for the paper Deep Conditional Gaussian Mixture Mod

17 Oct 30, 2022
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments [Project website] [Paper] This project is a PyTorch

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 49 Nov 28, 2022
SHIFT15M: multiobjective large-scale fashion dataset with distributional shifts

[arXiv] The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service IQON, wh

ZOZO, Inc. 138 Nov 24, 2022
Implementation of FitVid video prediction model in JAX/Flax.

FitVid Video Prediction Model Implementation of FitVid video prediction model in JAX/Flax. If you find this code useful, please cite it in your paper:

Google Research 62 Nov 25, 2022