[CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

Last update: Nov 26, 2022

Code for the paper: [CVPR 2022] The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy.

Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang.

Overview

Vision transformers (ViTs) have gained increasing popularity because they are commonly believed to have higher modeling capacity and representation flexibility than traditional convolutional networks. However, it is questionable whether this potential has been fully unleashed in practice, as learned ViTs often suffer from over-smoothing, likely yielding redundant models.

Recent works have made preliminary attempts to identify and alleviate such redundancy, e.g., by regularizing embedding similarity or re-injecting convolution-like structures. However, a “head-to-toe assessment” of the extent of redundancy in ViTs, and of how much could be gained by thoroughly mitigating it, has been absent from the field.

This paper, for the first time, systematically studies the ubiquitous existence of redundancy at all three levels: patch embedding, attention map, and weight space. In view of these findings, we advocate a principle of diversity for training ViTs, presenting corresponding regularizers that encourage representation diversity and coverage at each of those levels, enabling the model to capture more discriminative information.

Extensive experiments on ImageNet with a number of ViT backbones validate the effectiveness of our proposals, largely eliminating the observed ViT redundancy and significantly boosting model generalization. For example, our diversified DeiT obtains accuracy gains of 0.70% to 1.76% on ImageNet with highly reduced similarity.
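
To make the notion of patch-embedding redundancy above more concrete, here is a minimal sketch of one way similarity among patch embeddings could be measured: the mean absolute cosine similarity over all pairs of patches. This is an illustration under assumptions, not the regularizer or metric implemented in this repository; the function names and the toy inputs are hypothetical, and plain Python lists stand in for the tensors a real model would produce.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average absolute cosine similarity over all distinct pairs of rows."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(abs(cosine_similarity(u, v)) for u, v in pairs) / len(pairs)


# Hypothetical toy "patch embeddings" of dimension 2.
redundant = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # all point the same way
diverse = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # spread across directions

print("redundant:", mean_pairwise_similarity(redundant))
print("diverse:  ", mean_pairwise_similarity(diverse))
```

A score close to 1 means the patches carry nearly the same direction of information; a diversity regularizer in this spirit would add a penalty that grows with such a score, so that training pushes it down.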

Prerequisites

Install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
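
After installation, the environment can be checked against the versions listed above. The following is a small sketch using only the Python standard library; it assumes the packages are registered under the distribution names torch, torchvision, and timm, and the helper names (parse_version, check_prerequisites) are hypothetical, not part of this repository.

```python
import re
from importlib import metadata

# Requirements as stated above: minimum versions for torch and torchvision,
# an exact pin for timm. The distribution names are assumptions.
REQUIREMENTS = {
    "torch": ("min", "1.7.0"),
    "torchvision": ("min", "0.8.1"),
    "timm": ("exact", "0.3.2"),
}


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1), ignoring build tags and suffixes."""
    core = text.split("+")[0]
    numbers = [int(re.match(r"\d*", piece).group() or 0) for piece in core.split(".")]
    numbers = numbers[:3]
    return tuple(numbers + [0] * (3 - len(numbers)))


def check_prerequisites(lookup=metadata.version):
    """Return {name: (installed_version_or_None, requirement_met)}."""
    report = {}
    for name, (kind, wanted) in REQUIREMENTS.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        have, need = parse_version(found), parse_version(wanted)
        report[name] = (found, have >= need if kind == "min" else have == need)
    return report


for name, (found, ok) in check_prerequisites().items():
    status = "ok" if ok else "check version"
    print(f"{name}: {found or 'not installed'} ({status})")
```

The version comparison is deliberately simple (major, minor, patch only), which should be enough for the three packages named here.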

Training on ImageNet

Pass the path to your ImageNet directory in place of [data/imagenet]:

./script/run_deit_small_diverse.sh [data/imagenet] (DeiT-Small, 12 layers)
./script/run_deit_small_24layer_diverse.sh [data/imagenet] (DeiT-Small, 24 layers)
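
Before launching a long training run, it can help to confirm that the dataset directory looks as expected. The sketch below assumes the layout used by the upstream DeiT code, with train/ and val/ subfolders under the ImageNet root; this README does not state the layout, so treat the folder names as an assumption. The helper missing_splits is hypothetical, and the temporary directory only stands in for a real dataset path.

```python
import tempfile
from pathlib import Path


def missing_splits(root, splits=("train", "val")):
    """Return the split folders that are absent under the dataset root."""
    root = Path(root)
    return [name for name in splits if not (root / name).is_dir()]


# Demonstration on a throwaway directory that only contains train/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train").mkdir()
    print("missing:", missing_splits(tmp))
```

An empty list means both folders were found; with a real dataset, call missing_splits with the same path that is passed to the training script.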

Citation

TBD

Acknowledgement

https://github.com/facebookresearch/deit

Owner

VITA
