A complete, self-contained example for training ImageNet at state-of-the-art speed with FFCV

Overview

ffcv ImageNet Training

A minimal, single-file PyTorch ImageNet training script designed for hackability. Run train_imagenet.py to get...

  • ...high accuracies on ImageNet
  • ...with as many lines of code as the PyTorch ImageNet example
  • ...in 1/10th the time.

Results

Train models more efficiently, either with 8 GPUs in parallel or by training 8 ResNet-18's at once.

See benchmark setup here: https://docs.ffcv.io/benchmarks.html.

Citation

If you use this setup in your research, cite:

@misc{leclerc2022ffcv,
    author = {Guillaume Leclerc and Andrew Ilyas and Logan Engstrom and Sung Min Park and Hadi Salman and Aleksander Madry},
    title = {ffcv},
    year = {2022},
    howpublished = {\url{https://github.com/libffcv/ffcv/}},
    note = {commit xxxxxxx}
}

Configurations

The configuration files corresponding to the above results are:

Link to Config top_1 top_5 # Epochs Time (mins) Architecture Setup
Link 0.784 0.941 88 77.2 ResNet-50 8 x A100
Link 0.780 0.937 56 49.4 ResNet-50 8 x A100
Link 0.772 0.932 40 35.6 ResNet-50 8 x A100
Link 0.766 0.927 32 28.7 ResNet-50 8 x A100
Link 0.756 0.921 24 21.7 ResNet-50 8 x A100
Link 0.738 0.908 16 14.9 ResNet-50 8 x A100
Link 0.724 0.903 88 187.3 ResNet-18 1 x A100
Link 0.713 0.899 56 119.4 ResNet-18 1 x A100
Link 0.706 0.894 40 85.5 ResNet-18 1 x A100
Link 0.700 0.889 32 68.9 ResNet-18 1 x A100
Link 0.688 0.881 24 51.6 ResNet-18 1 x A100
Link 0.669 0.868 16 35.0 ResNet-18 1 x A100

Training Models

First pip install the requirements file in this directory:

pip install -r requirements.txt

Then, generate an ImageNet dataset; make the dataset used for the results above with the following command (IMAGENET_DIR should point to a PyTorch style ImageNet dataset:

# Required environmental variables for the script:
export IMAGENET_DIR=/path/to/pytorch/format/imagenet/directory/
export WRITE_DIR=/your/path/here/

# Starting in the root of the Git repo:
cd examples;

# Serialize images with:
# - 500px side length maximum
# - 50% JPEG encoded, 90% raw pixel values
# - quality=90 JPEGs
./write_dataset.sh 500 0.50 90

Then, choose a configuration from the configuration table. With the config file path in hand, train as follows:

# 8 GPU training (use only 1 for ResNet-18 training)
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Set the visible GPUs according to the `world_size` configuration parameter
# Modify `data.in_memory` and `data.num_workers` based on your machine
python train_imagenet.py --config-file rn50_configs/<your config file>.yaml \
    --data.train_dataset=/path/to/train/dataset.ffcv \
    --data.val_dataset=/path/to/val/dataset.ffcv \
    --data.num_workers=12 --data.in_memory=1 \
    --logging.folder=/your/path/here

Adjust the configuration by either changing the passed YAML file or by specifying arguments via fastargs (i.e. how the dataset paths were passed above).

Training Details

System setup. We trained on p4.24xlarge ec2 instances (8 A100s).

Dataset setup. Generally larger side length will aid in accuracy but decrease throughput:

  • ResNet-50 training: 50% JPEG 500px side length
  • ResNet-18 training: 10% JPEG 400px side length

Algorithmic details. We use a standard ImageNet training pipeline (à la the PyTorch ImageNet example) with only the following differences/highlights:

  • SGD optimizer with momentum and weight decay on all non-batchnorm parameters
  • Test-time augmentation over left/right flips
  • Progressive resizing from 160px to 192px: 160px training until 75% of the way through training (by epochs), then 192px until the end of training.
  • Validation set sizing according to "Fixing the train-test resolution discrepancy": 224px at test time.
  • Label smoothing
  • Cyclic learning rate schedule

Refer to the code and configuration files for a more exact specification. To obtain configurations we first gridded for hyperparameters at a 30 epoch schedule. Fixing these parameters, we then varied only the number of epochs (stretching the learning rate schedule across the number of epochs as motivated by Budgeted Training) and plotted the results above.

FAQ

Why is the first epoch slow?

The first epoch can be slow for the first epoch if the dataset hasn't been cached in memory yet.

What if I can't fit my dataset in memory?

See this guide here.

Other questions

Please open up a GitHub discussion for non-bug related questions; if you find a bug please report it on GitHub issues.

Owner
FFCV
FFCV
Vehicle speed detection with python

Vehicle-speed-detection In the project simulate the tracker.py first then simulate the SpeedDetector.py. Finally, a new window pops up and the output

3 Dec 15, 2022
Context-Sensitive Misspelling Correction of Clinical Text via Conditional Independence, CHIL 2022

cim-misspelling Pytorch implementation of Context-Sensitive Spelling Correction of Clinical Text via Conditional Independence, CHIL 2022. This model (

Juyong Kim 11 Dec 19, 2022
Final project for machine learning (CSC 590). Detection of hepatitis C and progression through blood samples.

Hepatitis C Blood Based Detection Final project for machine learning (CSC 590). Dataset from Kaggle. Using data from previous hepatitis C blood panels

Jennefer Maldonado 1 Dec 28, 2021
An Implicit Function Theorem (IFT) optimizer for bi-level optimizations

iftopt An Implicit Function Theorem (IFT) optimizer for bi-level optimizations. Requirements Python 3.7+ PyTorch 1.x Installation $ pip install git+ht

The Money Shredder Lab 2 Dec 02, 2021
[CVPR 2022 Oral] Rethinking Minimal Sufficient Representation in Contrastive Learning

Rethinking Minimal Sufficient Representation in Contrastive Learning PyTorch implementation of Rethinking Minimal Sufficient Representation in Contras

36 Nov 23, 2022
Config files for my GitHub profile.

Canalyst Candas Data Science Library Name Canalyst Candas Description Built by a former PM / analyst to give anyone with a little bit of Python knowle

Canalyst Candas 13 Jun 24, 2022
The codes I made while I practiced various TensorFlow examples

TensorFlow_Exercises The codes I made while I practiced various TensorFlow examples About the codes I didn't create these codes by myself, but re-crea

Terry Taewoong Um 614 Dec 08, 2022
Train neural network for semantic segmentation (deep lab V3) with pytorch in less then 50 lines of code

Train neural network for semantic segmentation (deep lab V3) with pytorch in 50 lines of code Train net semantic segmentation net using Trans10K datas

17 Dec 19, 2022
CVPR 2021 Challenge on Super-Resolution Space

Learning the Super-Resolution Space Challenge NTIRE 2021 at CVPR Learning the Super-Resolution Space challenge is held as a part of the 6th edition of

andreas 104 Oct 26, 2022
The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting

About The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting The demo program was only tested under Conda in a standard

Anh-Dzung Doan 5 Nov 28, 2022
Hydra Lightning Template for Structured Configs

Hydra Lightning Template for Structured Configs Template for creating projects with pytorch-lightning and hydra. How to use this template? Create your

Model-driven Machine Learning 4 Jul 19, 2022
Code for MarioNette: Self-Supervised Sprite Learning, in NeurIPS 2021

MarioNette | Webpage | Paper | Video MarioNette: Self-Supervised Sprite Learning Dmitriy Smirnov, Michaël Gharbi, Matthew Fisher, Vitor Guizilini, Ale

Dima Smirnov 28 Nov 18, 2022
Council-GAN - Implementation for our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020)

Council-GAN Implementation of our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020) Paper Ori Nizan , Ayellet Tal, Breaking the Cycle

ori nizan 260 Nov 16, 2022
A containerized REST API around OpenAI's CLIP model.

OpenAI's CLIP — REST API This is a container wrapping OpenAI's CLIP model in a RESTful interface. Running the container locally First, build the conta

Santiago Valdarrama 48 Nov 06, 2022
For auto aligning, cropping, and scaling HR and LR images for training image based neural networks

ImgAlign For auto aligning, cropping, and scaling HR and LR images for training image based neural networks Usage Make sure OpenCV is installed, 'pip

15 Dec 04, 2022
An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020

UnpairedSR An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020 turn RCAN(modified) -- xmodel(xilinx

JiaKui Hu 10 Oct 28, 2022
Let Python optimize the best stop loss and take profits for your TradingView strategy.

TradingView Machine Learning TradeView is a free and open source Trading View bot written in Python. It is designed to support all major exchanges. It

Robert Roman 473 Jan 09, 2023
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.

chitra What is chitra? chitra (चित्र) is a multi-functional library for full-stack Deep Learning. It simplifies Model Building, API development, and M

Aniket Maurya 210 Dec 21, 2022
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase

Ranger-Deep-Learning-Optimizer Ranger - a synergistic optimizer combining RAdam (Rectified Adam) and LookAhead, and now GC (gradient centralization) i

Less Wright 1.1k Dec 21, 2022
计算机视觉中用到的注意力模块和其他即插即用模块PyTorch Implementation Collection of Attention Module and Plug&Play Module

PyTorch实现多种计算机视觉中网络设计中用到的Attention机制,还收集了一些即插即用模块。由于能力有限精力有限,可能很多模块并没有包括进来,有任何的建议或者改进,可以提交issue或者进行PR。

PJDong 599 Dec 23, 2022