Code used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Related tags

Deep LearningbigBatch
Overview

Train longer, generalize better - Big batch training

This is a code repository used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training of neural networks" By Elad Hoffer, Itay Hubara and Daniel Soudry.

It is based off convNet.pytorch with some helpful options such as:

  • Training on several datasets
  • Complete logging of trained experiment
  • Graph visualization of the training/validation loss and accuracy
  • Definition of preprocessing and optimization regime for each model

Dependencies

Data

  • Configure your dataset path at data.py.
  • To get the ILSVRC data, you should register on their site for access: http://www.image-net.org/

Experiment examples

python main_normal.py --dataset cifar10 --model resnet --save cifar10_resnet44_bs2048_lr_fix --epochs 100 --b 2048 --lr_bb_fix;
python main_normal.py --dataset cifar10 --model resnet --save cifar10_resnet44_bs2048_regime_adaptation --epochs 100 --b 2048 --lr_bb_fix --regime_bb_fix;
python main_gbn.py --dataset cifar10 --model resnet --save cifar10_resnet44_bs2048_ghost_bn256 --epochs 100 --b 2048 --lr_bb_fix --mini-batch-size 256;
python main_normal.py --dataset cifar100 --model resnet --save cifar100_wresnet16_4_bs1024_regime_adaptation --epochs 100 --b 1024 --lr_bb_fix --regime_bb_fix;
python main_gbn.py --model mnist_f1 --dataset mnist --save mnist_baseline_bs4096_gbn --epochs 50 --b 4096 --lr_bb_fix --no-regime_bb_fix --mini-batch-size 128;
  • See run_experiments.sh for more examples

Model configuration

Network model is defined by writing a .py file in models folder, and selecting it using the model flag. Model function must be registered in models/__init__.py The model function must return a trainable network. It can also specify additional training options such optimization regime (either a dictionary or a function), and input transform modifications.

e.g for a model definition:

class Model(nn.Module):

    def __init__(self, num_classes=1000):
        super(Model, self).__init__()
        self.model = nn.Sequential(...)

        self.regime = {
            0: {'optimizer': 'SGD', 'lr': 1e-2,
                'weight_decay': 5e-4, 'momentum': 0.9},
            15: {'lr': 1e-3, 'weight_decay': 0}
        }

        self.input_transform = {
            'train': transforms.Compose([...]),
            'eval': transforms.Compose([...])
        }
    def forward(self, inputs):
        return self.model(inputs)

 def model(**kwargs):
        return Model()
Owner
Elad Hoffer
Elad Hoffer
A free, multiplatform SDK for real-time facial motion capture using blendshapes, and rigid head pose in 3D space from any RGB camera, photo, or video.

mocap4face by Facemoji mocap4face by Facemoji is a free, multiplatform SDK for real-time facial motion capture based on Facial Action Coding System or

Facemoji 591 Dec 27, 2022
Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective

Unofficial pytorch implementation of the paper "Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective"

16 Nov 21, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 7 May 29, 2022
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis This is a PyTorch implementation of the model described in our pape

qzhb 6 Jul 08, 2021
Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)

taganomaly Anomaly detection labeling tool, specifically for multiple time series (one time series per category). Taganomaly is a tool for creating la

Microsoft 272 Dec 17, 2022
ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

ViViT is a collection of numerical tricks to efficiently access curvature from the generalized Gauss-Newton (GGN) matrix based on its low-rank structure. Provided functionality includes computing

Felix Dangel 12 Dec 08, 2022
Official PyTorch implementation of Spatial Dependency Networks.

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling Đorđe Miladinović   Aleksandar Stanić   Stefan Bauer   Jürgen Schmid

Djordje Miladinovic 34 Jan 19, 2022
Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Xiangyin Kong 7 Nov 08, 2022
PyTorch implementation of "Image-to-Image Translation Using Conditional Adversarial Networks".

pix2pix-pytorch PyTorch implementation of Image-to-Image Translation Using Conditional Adversarial Networks. Based on pix2pix by Phillip Isola et al.

mrzhu 383 Dec 17, 2022
Cowsay - A rewrite of cowsay in python

Python Cowsay A rewrite of cowsay in python. Allows for parsing of existing .cow

James Ansley 3 Jun 27, 2022
HistoSeg : Quick attention with multi-loss function for multi-structure segmentation in digital histology images

HistoSeg : Quick attention with multi-loss function for multi-structure segmentation in digital histology images Histological Image Segmentation This

Saad Wazir 11 Dec 16, 2022
Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

Mathis Petrovich 248 Dec 23, 2022
Final term project for Bayesian Machine Learning Lecture (XAI-623)

Mixquality_AL Final Term Project For Bayesian Machine Learning Lecture (XAI-623) Youtube Link The presentation is given in YoutubeLink Problem Formula

JeongEun Park 3 Jan 18, 2022
Migration of Edge-based Distributed Federated Learning

FedFly: Towards Migration in Edge-based Distributed Federated Learning About the research Due to mobility, a device participating in Federated Learnin

qub-blesson 11 Nov 13, 2022
Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction

Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction Requirements The code has been tested running under Python 3.7.4, with the foll

zshicode 84 Jan 01, 2023
python library for invisible image watermark (blind image watermark)

invisible-watermark invisible-watermark is a python library and command line tool for creating invisible watermark over image.(aka. blink image waterm

Shield Mountain 572 Jan 07, 2023
LQM - Improving Object Detection by Estimating Bounding Box Quality Accurately

Improving Object Detection by Estimating Bounding Box Quality Accurately Abstract Object detection aims to locate and classify object instances in ima

IM Lab., POSTECH 0 Sep 28, 2022
The project was to detect traffic signs, based on the Megengine framework.

trafficsign 赛题 旷视AI智慧交通开源赛道,初赛1/177,复赛1/12。 本赛题为复杂场景的交通标志检测,对五种交通标志进行识别。 框架 megengine 算法方案 网络框架 atss + resnext101_32x8d 训练阶段 图片尺寸 最终提交版本输入图片尺寸为(1500,2

20 Dec 02, 2022
Unicorn can be used for performance analyses of highly configurable systems with causal reasoning

Unicorn can be used for performance analyses of highly configurable systems with causal reasoning. Users or developers can query Unicorn for a performance task.

AISys Lab 27 Jan 05, 2023
CCP dataset from Clothing Co-Parsing by Joint Image Segmentation and Labeling

Clothing Co-Parsing (CCP) Dataset Clothing Co-Parsing (CCP) dataset is a new clothing database including elaborately annotated clothing items. 2, 098

Wei Yang 434 Dec 24, 2022