PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transformers.

Last update: Oct 09, 2022

Overview

Dynamic Token Normalization Improves Vision Transformers

This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transformers. Code and models will be available soon.

Dynamic Token Normalization

We design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages of both LayerNorm and InstanceNorm. DTN can be seamlessly plugged into various transformer models, consistently improving their performance.
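
This README does not spell out how DTN is formulated, so the following is only a rough sketch of the general idea of combining LayerNorm-style statistics (computed per token, across channels) with InstanceNorm-style statistics (computed per channel, across tokens). The class name `MixedTokenNorm`, the single scalar mixing parameter, and the blending rule are assumptions made for illustration; this is not the authors' DTN.

```python
import torch
import torch.nn as nn


class MixedTokenNorm(nn.Module):
    """Illustrative blend of LayerNorm-style and InstanceNorm-style statistics.

    Hypothetical module for explanation only; not the DTN from the paper.
    """

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Assumed scalar mixing logit; sigmoid(0) = 0.5 gives an even blend at init.
        self.mix_logit = nn.Parameter(torch.zeros(1))
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, tokens, channels).
        # LayerNorm-style statistics: one mean/variance per token, over channels.
        mean_ln = x.mean(dim=2, keepdim=True)
        var_ln = x.var(dim=2, unbiased=False, keepdim=True)
        # InstanceNorm-style statistics: one mean/variance per channel, over tokens.
        mean_in = x.mean(dim=1, keepdim=True)
        var_in = x.var(dim=1, unbiased=False, keepdim=True)

        lam = torch.sigmoid(self.mix_logit)
        mean = lam * mean_ln + (1.0 - lam) * mean_in
        var = lam * var_ln + (1.0 - lam) * var_in

        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight + self.bias


if __name__ == "__main__":
    tokens = torch.randn(2, 197, 384)  # e.g. a batch of ViT token sequences
    print(MixedTokenNorm(384)(tokens).shape)
```

When the mixing weight approaches 1, this sketch reduces to ordinary LayerNorm over the channel dimension; when it approaches 0, it normalizes each channel across the tokens of one image.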

Comparison of top-1 and top-5 accuracies (%) on the ImageNet validation set for ViT models trained with LN and with DTN.

Model	Top-1	Top-5
ViT-T*-LN	72.3	91.4
ViT-T*-DTN	73.2	91.7
ViT-S*-LN	80.6	95.2
ViT-S*-DTN	81.7	95.8
ViT-B*-LN	81.7	95.8
ViT-B*-DTN	82.5	96.1
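
For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how top-k accuracy is usually computed: a prediction counts as correct when the true label is among the k highest-scoring classes. The function name `topk_accuracy` and the toy scores are illustrative and are not taken from the repository's evaluation code.

```python
def topk_accuracy(scores, labels, k=1):
    """Return the percentage of samples whose true label is in the top-k scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += label in topk
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    toy_labels = [1, 1, 0]
    print(topk_accuracy(toy_scores, toy_labels, k=1))
    print(topk_accuracy(toy_scores, toy_labels, k=2))
```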

Getting Started

Install PyTorch

Clone the repo:

git clone https://github.com/dtn-anonymous/DTN.git

Requirements

Install CUDA==10.1 with cudnn7 following the official installation instructions
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.2:

pip install timm==0.3.2
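
After installation, a quick check like the one below can confirm that the pinned versions are the ones actually present in the environment. It is a sketch that assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages are registered under the distribution names `torch`, `torchvision`, and `timm`; the helper name `check_versions` is hypothetical.

```python
from importlib import metadata

# Versions pinned in the requirements above.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2", "timm": "0.3.2"}


def check_versions(expected=EXPECTED):
    """Map each package to (installed_version_or_None, expected_version, matches)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu101" when comparing.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, ok)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```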

Data Preparation

Download the ImageNet dataset. It should contain the train and val directories, along with the txt files that map each image to its label.
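
The README does not specify the layout of the label txt file. As a sketch only, the snippet below assumes one entry per line in the form `relative/image/path label`, separated by whitespace, which is a common convention; adjust it if the actual files differ. The function name `read_image_list` and the example paths are hypothetical.

```python
from pathlib import Path


def read_image_list(list_file, image_root):
    """Parse lines of the assumed form '<relative path> <integer label>'."""
    samples = []
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last whitespace so paths containing spaces still work.
            rel_path, label = line.rsplit(maxsplit=1)
            samples.append((Path(image_root) / rel_path, int(label)))
    return samples


if __name__ == "__main__":
    # Hypothetical locations; replace with the real dataset paths.
    list_path = Path("imagenet/val.txt")
    if list_path.exists():
        entries = read_image_list(list_path, "imagenet/val")
        print(len(entries), "validation images listed")
```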

Training a model from scratch

An example training script is provided in DTN/scripts/train.sh. To train ViT-S* with DTN, run:

cd DTN/scripts   
sh train.sh layer vit_norm_s_star configs/ViT/vit.yaml

The number of GPUs and the configuration file can be changed in train.sh.

Owner

Wenqi Shao
