Deep Residual Networks with 1K Layers

By Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.

Microsoft Research Asia (MSRA).

Introduction
Notes
Usage

Introduction

This repository contains a re-implementation of the code for the paper "Identity Mappings in Deep Residual Networks" (http://arxiv.org/abs/1603.05027), referred to below as [a]. The approach in the paper makes it straightforward to train high-quality 1k-layer neural networks.
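
As a rough illustration of the "identity mapping" idea named in the paper title, the sketch below implements one residual unit in NumPy in which normalization and ReLU are applied before each weight layer ("pre-activation") and the shortcut is a plain addition. It is a simplified sketch and not the code in this repository: dense layers stand in for convolutions, a per-feature standardization stands in for batch normalization, and the names `standardize`, `pre_act_unit`, `w1`, and `w2` are hypothetical.

```python
import numpy as np


def standardize(x, eps=1e-5):
    """Per-feature standardization over the batch axis (stand-in for batch norm)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def pre_act_unit(x, w1, w2):
    """One residual unit y = x + F(x), with norm -> ReLU placed before each weight layer."""
    h = np.maximum(standardize(x), 0.0) @ w1  # norm -> ReLU -> weight
    h = np.maximum(standardize(h), 0.0) @ w2  # norm -> ReLU -> weight
    return x + h  # identity shortcut; nothing is applied after the addition


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                # batch of 8 inputs with 16 features
w1 = rng.normal(scale=0.1, size=(16, 16))
w2 = np.zeros((16, 16))                     # zero residual branch
y = pre_act_unit(x, w1, w2)
print(np.allclose(y, x))                    # the unit reduces to the identity
```

With the residual branch zeroed out, the unit passes its input through unchanged, which is the property the shortcut is meant to preserve across many stacked units.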

Acknowledgement: This code was re-implemented by Xiang Ming from Xi'an Jiaotong University for ease of release.

See Also: Re-implementations of ResNet-200 [a] on ImageNet from Facebook AI Research (FAIR): https://github.com/facebook/fb.resnet.torch/tree/master/pretrained

Notes

This code is based on the implementation of Torch ResNets (https://github.com/facebook/fb.resnet.torch).
The experiments in the paper were conducted in Caffe, whereas this code is re-implemented in Torch. We observed similar results within reasonable statistical variations.
To fit the 1k-layer models into memory without modifying much code, we simply reduced the mini-batch size to 64, whereas the results in the paper were obtained with a mini-batch size of 128. Somewhat unexpectedly, the results with a mini-batch size of 64 are slightly better:

mini-batch	CIFAR-10 test error (%): median (mean+/-std)
128 (as in [a])	4.92 (4.89+/-0.14)
64 (as in this code)	4.62 (4.69+/-0.20)

Figure: curves obtained by running this code with a mini-batch size of 64 (training loss: y-axis on the left; test error: y-axis on the right).
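
The "median (mean+/-std)" entries in the table above summarize repeated training runs for each setting. A minimal sketch of how such a summary can be computed is given below. The error values are made-up placeholders, not results from this code; `summarize` is a hypothetical helper; and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize(errors):
    """Format test errors (%) from repeated runs as 'median (mean+/-std)'."""
    median = statistics.median(errors)
    mean = statistics.mean(errors)
    std = statistics.stdev(errors)  # sample standard deviation (assumption)
    return f"{median:.2f} ({mean:.2f}+/-{std:.2f})"


# Placeholder values for illustration only; these are not measured results.
example_errors = [5.0, 4.0, 6.0, 5.0, 5.0]
print(summarize(example_errors))
```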

Usage

Install Torch ResNets (https://github.com/facebook/fb.resnet.torch) following the instructions there.
Add the file resnet-pre-act.lua from this repository to ./models.
To train ResNet-1001 in the form described in [a]:

th main.lua -netType resnet-pre-act -depth 1001 -batchSize 64 -nGPU 2 -nThreads 4 -dataset cifar10 -nEpochs 200 -shareGradInput false

Note: "shareGradInput=true" is not valid for this model yet.
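
For scripted runs, the training command above can be assembled programmatically. The sketch below is a hypothetical Python helper (`build_command` is not part of this repository) that rebuilds the same command from the flags shown in this section. The depth check is an assumption: it presumes the bottleneck CIFAR models have a depth of the form 9n+2, which holds for 1001.

```python
import shlex


def build_command(depth=1001, batch_size=64, n_gpu=2, n_threads=4, n_epochs=200):
    """Assemble the `th main.lua` training command from the flags shown above."""
    # Assumption: bottleneck CIFAR models have depth 9n+2 (for example 1001).
    if (depth - 2) % 9 != 0:
        raise ValueError("depth should be of the form 9n+2")
    return [
        "th", "main.lua",
        "-netType", "resnet-pre-act",
        "-depth", str(depth),
        "-batchSize", str(batch_size),
        "-nGPU", str(n_gpu),
        "-nThreads", str(n_threads),
        "-dataset", "cifar10",
        "-nEpochs", str(n_epochs),
        "-shareGradInput", "false",  # "true" is not valid for this model yet
    ]


cmd = build_command()
print(shlex.join(cmd))
# To launch from the fb.resnet.torch checkout: subprocess.run(cmd, check=True)
```

Keeping the command as a list of arguments avoids shell quoting issues if it is later passed to `subprocess.run`.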
