Pre-trained NFNets with 99% of the accuracy of the official paper

Overview

NFNet Pytorch Implementation

This repo contains pretrained NFNet models F0-F6 with high ImageNet accuracy from the paper High-Performance Large-Scale Image Recognition Without Normalization. The small models are as accurate as an EfficientNet-B7, but train 8.7 times faster. The large models set a new SOTA top-1 accuracy on ImageNet.

NFNet F0 F1 F2 F3 F4 F5 F6+SAM
Top-1 accuracy Brock et al. 83.6 84.7 85.1 85.7 85.9 86.0 86.5
Top-1 accuracy this implementation 82.82 84.63 84.90 85.46 85.66 85.62 TBD

All credits go to the authors of the original paper. This repo is heavily inspired by their nice JAX implementation in the official repository. Visit their repo for citing.

Get started

git clone https://github.com/benjs/nfnets_pytorch.git
pip3 install -r requirements.txt

Download pretrained weights from the official repository and place them in the pretrained folder.

from pretrained import pretrained_nfnet
model_F0 = pretrained_nfnet('pretrained/F0_haiku.npz')
model_F1 = pretrained_nfnet('pretrained/F1_haiku.npz')
# ...

The model variant is automatically derived from the parameter count in the pretrained weights file.

Validate yourself

python3 eval.py --pretrained pretrained/F0_haiku.npz --dataset path/to/imagenet/valset/

You can download the ImageNet validation set from the ILSVRC2012 challenge site after asking for access with, for instance, your .edu mail address.

Scaled weight standardization convolutions in your own model

Simply replace all your nn.Conv2d with WSConv2D and all your nn.ReLU with VPReLU or VPGELU (variance preserving ReLU/GELU).

import torch.nn as nn
from model import WSConv2D, VPReLU, VPGELU

# Simply replace your nn.Conv2d layers
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
 
        self.activation = VPReLU(inplace=True) # or VPGELU
        self.conv0 = WSConv2D(in_channels=128, out_channels=256, kernel_size=1, ...)
        # ...

    def forward(self, x):
      out = self.activation(self.conv0(x))
      # ...

SGD with adaptive gradient clipping in your own model

Simply replace your SGD optimizer with SGD_AGC.

from optim import SGD_AGC

optimizer = SGD_AGC(
        named_params=model.named_parameters(), # Pass named parameters
        lr=1e-3,
        momentum=0.9,
        clipping=0.1, # New clipping parameter
        weight_decay=2e-5, 
        nesterov=True)

It is important to exclude certain layers from clipping or momentum. The authors recommends to exclude the last fully convolutional from clipping and the bias/gain parameters from weight decay:

import re

for group in optimizer.param_groups:
    name = group['name'] 
    
    # Exclude from weight decay
    if len(re.findall('stem.*(bias|gain)|conv.*(bias|gain)|skip_gain', name)) > 0:
        group['weight_decay'] = 0

    # Exclude from clipping
    if name.startswith('linear'):
        group['clipping'] = None

Train your own NFNet

Adjust your desired parameters in default_config.yaml and start training.

python3 train.py --dataset /path/to/imagenet/

There is still some parts missing for complete training from scratch:

  • Multi-GPU training
  • Data augmentations
  • FP16 activations and gradients

Contribute

The implementation is still in an early stage in terms of usability / testing. If you have an idea to improve this repo open an issue, start a discussion or submit a pull request.

Development status

  • Pre-trained NFNet Models
    • F0-F5
    • F6+SAM
    • Scaled weight standardization
    • Squeeze and excite
    • Stochastic depth
    • FP16 activations
  • SGD with unit adaptive gradient clipping (SGD-AGC)
    • Exclude certain layers from weight-decay, clipping
    • FP16 gradients
  • PyPI package
  • PyTorch hub submission
  • Label smoothing loss from Szegedy et al.
  • Training on ImageNet
  • Pre-trained weights
  • Tensorboard support
  • general usability improvements
  • Multi-GPU support
  • Data augmentation
  • Signal propagation plots (from first paper)
Comments
  • ModuleNotFoundError: No module named 'haiku'

    ModuleNotFoundError: No module named 'haiku'

    when i try "python3 eval.py --pretrained pretrained/F0_haiku.npz --dataset ***" i got this error, have you ever met this error? how to fix this?

    opened by Rianusr 2
  • Trained without data augmentation?

    Trained without data augmentation?

    Thanks for the great work on the pytorch implementation of NFNet! The accuracies achieved by this implementation are pretty impressive also and I am wondering if these training results were simply derived from the training script, that is, without data augmentation.

    opened by nandi-zhang 2
  • from_pretrained_haiku

    from_pretrained_haiku

    https://github.com/benjs/nfnets_pytorch/blob/7b4d1cc701c7de4ee273ded01ce21cbdb1e60c48/nfnets/pretrained.py#L90

    model = from_pretrained_haiku(args.pretrained)

    where is 'from_pretrained_haiku' method?

    opened by vkmavani 0
  • About WSconv2d

    About WSconv2d

    I see the authoe's code, I find his WSconv2d pad_mod is 'same'. Pytorch's conv2d dono't have pad_mode, and I think your padding should greater 0, but I find your padding always be 0. I want to know why?

    I see you train.py your learning rate is constant, why? Thank you!

    opened by fancyshun 3
  • AveragePool

    AveragePool

    Hi, noticed that the AveragePool ('pool' layer) is not used in forward function. Instead, forward uses torch.mean. Removing the layer doesn't change pooling behavior. I tried using this model as a feature extractor and was a bit confused for a moment.

    opened by bogdankjastrzebski 1
Releases(v0.0.1)
Owner
Benjamin Schmidt
Engineering Student
Benjamin Schmidt
Image Completion with Deep Learning in TensorFlow

Image Completion with Deep Learning in TensorFlow See my blog post for more details and usage instructions. This repository implements Raymond Yeh and

Brandon Amos 1.3k Dec 23, 2022
Code to reproduce results from the paper "AmbientGAN: Generative models from lossy measurements"

AmbientGAN: Generative models from lossy measurements This repository provides code to reproduce results from the paper AmbientGAN: Generative models

Ashish Bora 87 Oct 19, 2022
Flask101 - FullStack Web Development with Python & JS - From TAQWA

Task: Create a CLI Calculator Step 0: Creating Virtual Environment $ python -m

Hossain Foysal 1 May 31, 2022
(AAAI2022) Style Mixing and Patchwise Prototypical Matching for One-Shot Unsupervised Domain Adaptive Semantic Segmentation

SM-PPM This is a Pytorch implementation of our paper "Style Mixing and Patchwise Prototypical Matching for One-Shot Unsupervised Domain Adaptive Seman

W-zx-Y 10 Dec 07, 2022
Mesh Graphormer is a new transformer-based method for human pose and mesh reconsruction from an input image

MeshGraphormer ✨ ✨ This is our research code of Mesh Graphormer. Mesh Graphormer is a new transformer-based method for human pose and mesh reconsructi

Microsoft 251 Jan 08, 2023
PyTorch implementation of the Deep SLDA method from our CVPRW-2020 paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis"

Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis This is a PyTorch implementation of the Deep Streaming Linear Discriminant

Tyler Hayes 41 Dec 25, 2022
True Few-Shot Learning with Language Models

This codebase supports using language models (LMs) for true few-shot learning: learning to perform a task using a limited number of examples from a single task distribution.

Ethan Perez 124 Jan 04, 2023
Experiments for Neural Flows paper

Neural Flows: Efficient Alternative to Neural ODEs [arxiv] TL;DR: We directly model the neural ODE solutions with neural flows, which is much faster a

54 Dec 07, 2022
Example of semantic segmentation in Keras

keras-semantic-segmentation-example Example of semantic segmentation in Keras Single class example: Generated data: random ellipse with random color o

53 Mar 23, 2022
Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or columns of a 2d feature map, as a standalone package for Pytorch

Triangle Multiplicative Module - Pytorch Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or c

Phil Wang 22 Oct 28, 2022
[AAAI22] Reliable Propagation-Correction Modulation for Video Object Segmentation

Reliable Propagation-Correction Modulation for Video Object Segmentation (AAAI22) Preview version paper of this work is available at: https://arxiv.or

Xiaohao Xu 70 Dec 04, 2022
An Artificial Intelligence trying to drive a car by itself on a user created map

An Artificial Intelligence trying to drive a car by itself on a user created map

Akhil Sahukaru 17 Jan 13, 2022
Manipulation OpenAI Gym environments to simulate robots at the STARS lab

Manipulator Learning This repository contains a set of manipulation environments that are compatible with OpenAI Gym and simulated in pybullet. In par

STARS Laboratory 5 Dec 08, 2022
TumorInsight is a Brain Tumor Detection and Classification model built using RESNET50 architecture.

A Brain Tumor Detection and Classification Model built using RESNET50 architecture. The model is also deployed as a web application using Flask framework.

Pranav Khurana 0 Aug 17, 2021
Code & Data for Enhancing Photorealism Enhancement

Code & Data for Enhancing Photorealism Enhancement

Intel ISL (Intel Intelligent Systems Lab) 1.1k Jan 08, 2023
Part-Aware Data Augmentation for 3D Object Detection in Point Cloud

Part-Aware Data Augmentation for 3D Object Detection in Point Cloud This repository contains a reference implementation of our Part-Aware Data Augment

Jaeseok Choi 62 Jan 03, 2023
Implements Stacked-RNN in numpy and torch with manual forward and backward functions

Recurrent Neural Networks Implements simple recurrent network and a stacked recurrent network in numpy and torch respectively. Both flavours implement

Vishal R 1 Nov 16, 2021
CellRank's reproducibility repository.

CellRank's reproducibility repository We believe that reproducibility is key and have made it as simple as possible to reproduce our results. Please e

Theis Lab 8 Oct 08, 2022
Dialect classification

Dialect-Classification This repository presents the data that was used in a talk at ICKL-5 (5th International Conference on Kurdish Linguistics) at th

Kurdish-BLARK 0 Nov 12, 2021
Official implementation for the paper "Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection"

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection PyTorch code release of the paper "Attentive Prototypes for Sour

Deepti Hegde 23 Oct 17, 2022