Training Very Deep Neural Networks Without Skip-Connections

Overview

DiracNets

v2 update (January 2018):

The code was updated for DiracNets-v2 in which we removed NCReLU by adding per-channel a and b multipliers without weight decay. This allowed us to significantly simplify the network, which is now folds into a simple chain of convolution-ReLU layers, like VGG. On ImageNet DiracNet-18 and DiracNet-34 closely match corresponding ResNet with the same number of parameters.

See v1 branch for DiracNet-v1.


PyTorch code and models for DiracNets: Training Very Deep Neural Networks Without Skip-Connections

https://arxiv.org/abs/1706.00388

Networks with skip-connections like ResNet show excellent performance in image recognition benchmarks, but do not benefit from increased depth, we are thus still interested in learning actually deep representations, and the benefits they could bring. We propose a simple weight parameterization, which improves training of deep plain (without skip-connections) networks, and allows training plain networks with hundreds of layers. Accuracy of our proposed DiracNets is close to Wide ResNet (although DiracNets need more parameters to achieve it), and we are able to match ResNet-1000 accuracy with plain DiracNet with only 28 layers. Also, the proposed Dirac weight parameterization can be folded into one filter for inference, leading to easily interpretable VGG-like network.

DiracNets on ImageNet:

TL;DR

In a nutshell, Dirac parameterization is a sum of filters and scaled Dirac delta function:

conv2d(x, alpha * delta + W)

Here is simplified PyTorch-like pseudocode for the function we use to train plain DiracNets (with weight normalization):

def dirac_conv2d(input, W, alpha, beta)
    return F.conv2d(input, alpha * dirac(W) + beta * normalize(W))

where alpha and beta are per-channel scaling multipliers, and normalize does l_2 normalization over each feature plane.

Code

Code structure:

├── README.md # this file
├── diracconv.py # modular DiracConv definitions
├── test.py # unit tests
├── diracnet-export.ipynb # ImageNet pretrained models
├── diracnet.py # functional model definitions
└── train.py # CIFAR and ImageNet training code

Requirements

First install PyTorch, then install torchnet:

pip install git+https://github.com/pytorch/[email protected]

Install other Python packages:

pip install -r requirements.txt

To train DiracNet-34-2 on CIFAR do:

python train.py --save ./logs/diracnets_$RANDOM$RANDOM --depth 34 --width 2

To train DiracNet-18 on ImageNet do:

python train.py --dataroot ~/ILSVRC2012/ --dataset ImageNet --depth 18 --save ./logs/diracnet_$RANDOM$RANDOM \
                --batchSize 256 --epoch_step [30,60,90] --epochs 100 --weightDecay 0.0001 --lr_decay_ratio 0.1

nn.Module code

We provide DiracConv1d, DiracConv2d, DiracConv3d, which work like nn.Conv1d, nn.Conv2d, nn.Conv3d, but have Dirac-parametrization inside (our training code doesn't use these modules though).

Pretrained models

We fold batch normalization and Dirac parameterization into F.conv2d weight and bias tensors for simplicity. Resulting models are as simple as VGG or AlexNet, having only nonlinearity+conv2d as a basic block.

See diracnets.ipynb for functional and modular model definitions.

There is also folded DiracNet definition in diracnet.py, which uses code from PyTorch model_zoo and downloads pretrained model from Amazon S3:

from diracnet import diracnet18
model = diracnet18(pretrained=True)

Printout of the model above:

DiracNet(
  (features): Sequential(
    (conv): Conv2d (3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
    (max_pool0): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), ceil_mode=False)
    (group0.block0.relu): ReLU()
    (group0.block0.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block1.relu): ReLU()
    (group0.block1.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block2.relu): ReLU()
    (group0.block2.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block3.relu): ReLU()
    (group0.block3.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group1.block0.relu): ReLU()
    (group1.block0.conv): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block1.relu): ReLU()
    (group1.block1.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block2.relu): ReLU()
    (group1.block2.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block3.relu): ReLU()
    (group1.block3.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group2.block0.relu): ReLU()
    (group2.block0.conv): Conv2d (128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block1.relu): ReLU()
    (group2.block1.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block2.relu): ReLU()
    (group2.block2.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block3.relu): ReLU()
    (group2.block3.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group3.block0.relu): ReLU()
    (group3.block0.conv): Conv2d (256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block1.relu): ReLU()
    (group3.block1.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block2.relu): ReLU()
    (group3.block2.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block3.relu): ReLU()
    (group3.block3.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (last_relu): ReLU()
    (avg_pool): AvgPool2d(kernel_size=7, stride=7, padding=0, ceil_mode=False, count_include_pad=True)
  )
  (fc): Linear(in_features=512, out_features=1000)
)

The models were trained with OpenCV, so you need to use it too to reproduce stated accuracy.

Pretrained weights for DiracNet-18 and DiracNet-34:
https://s3.amazonaws.com/modelzoo-networks/diracnet18v2folded-a2174e15.pth
https://s3.amazonaws.com/modelzoo-networks/diracnet34v2folded-dfb15d34.pth

Pretrained weights for the original (not folded) model, functional definition only:
https://s3.amazonaws.com/modelzoo-networks/diracnet18-v2_checkpoint.pth
https://s3.amazonaws.com/modelzoo-networks/diracnet34-v2_checkpoint.pth

We plan to add more pretrained models later.

Bibtex

@inproceedings{Zagoruyko2017diracnets,
    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {DiracNets: Training Very Deep Neural Networks Without Skip-Connections},
    url = {https://arxiv.org/abs/1706.00388},
    year = {2017}}
GAN-STEM-Conv2MultiSlice - Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

GAN-STEM-Conv2MultiSlice GAN method to help covert lower resolution STEM images generated by convolution methods to higher resolution STEM images gene

UW-Madison Computational Materials Group 2 Feb 10, 2021
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

MADE (Multi-Adapter Dataset Experts) This repository contains the implementation of MADE (Multi-adapter dataset experts), which is described in the pa

Princeton Natural Language Processing 68 Jul 18, 2022
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Mindee 1.5k Jan 01, 2023
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

Arthur Bražinskas 39 Jan 01, 2023
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022
PyTorch implementation of paper: AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer, ICCV 2021.

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer [Paper] [PyTorch Implementation] [Paddle Implementation] Overview This reposit

148 Dec 30, 2022
This is the paddle code for SeBoW(Self-Born wiring for neural trees), a kind of neural tree born form a large search space

SeBoW: Self-Born Wiring for neural trees(PaddlePaddle version) This is the paddle code for SeBoW(Self-Born wiring for neural trees), a kind of neural

HollyLee 13 Dec 08, 2022
Unofficial PyTorch implementation of "RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving" (ECCV 2020)

RTM3D-PyTorch The PyTorch Implementation of the paper: RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving (ECCV 2020

Nguyen Mau Dzung 271 Nov 29, 2022
Place holder for HOPE: a human-centric and task-oriented MT evaluation framework using professional post-editing

HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation Place holder for dat

Lifeng Han 1 Apr 25, 2022
This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

Hansheng Jiang 6 Nov 18, 2022
DETReg: Unsupervised Pretraining with Region Priors for Object Detection

DETReg: Unsupervised Pretraining with Region Priors for Object Detection Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik

Amir Bar 283 Dec 27, 2022
TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication)

Parameterization of Hypercomplex Multiplications (PHM) This repository contains the TensorFlow implementation of PHM (Parameterization of Hypercomplex

Aston Zhang 9 Oct 26, 2022
Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification

Less is More: Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification Suncheng Xiang Shanghai Jiao Tong University Over

SunchengXiang 68 Dec 13, 2022
[ICCV'21] Official implementation for the paper Social NCE: Contrastive Learning of Socially-aware Motion Representations

CrowdNav with Social-NCE This is an official implementation for the paper Social NCE: Contrastive Learning of Socially-aware Motion Representations by

VITA lab at EPFL 125 Dec 23, 2022
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

NL-Augmenter 🦎 → 🐍 The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformat

684 Jan 09, 2023
Edge Restoration Quality Assessment

ERQA - Edge Restoration Quality Assessment ERQA - a full-reference quality metric designed to analyze how good image and video restoration methods (SR

MSU Video Group 27 Dec 17, 2022
Code for ICLR2018 paper: Improving GAN Training via Binarized Representation Entropy (BRE) Regularization - Y. Cao · W Ding · Y.C. Lui · R. Huang

code for "Improving GAN Training via Binarized Representation Entropy (BRE) Regularization" (ICLR2018 paper) paper: https://arxiv.org/abs/1805.03644 G

21 Oct 12, 2020
Weakly-supervised object detection.

Wetectron Wetectron is a software system that implements state-of-the-art weakly-supervised object detection algorithms. Project CVPR'20, ECCV'20 | Pa

NVIDIA Research Projects 342 Jan 05, 2023
Pytorch implementation of Cut-Thumbnail in the paper Cut-Thumbnail:A Novel Data Augmentation for Convolutional Neural Network.

Cut-Thumbnail (Accepted at ACM MULTIMEDIA 2021) Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng, Tao Zhou, Ming Liu This is the officia

3 Apr 12, 2022
Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

RARE Technologies 13.8k Jan 03, 2023