Aggregating Nested Transformers Official JAX Implementation

Last update: Dec 20, 2022

Overview

NesT is a simple method that aggregates nested local transformers on image blocks. This design gives vision transformers better accuracy, data efficiency, and convergence on the ImageNet benchmark. NesT also scales down to small datasets, where it matches convnet accuracy.
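
To make the notion of "image blocks" concrete, here is a minimal sketch (not the repository's implementation) of splitting a 2D grid into non-overlapping blocks, the units on which the local transformers operate before their outputs are aggregated. The function name `partition_into_blocks` and the toy 4x4 grid are assumptions for illustration only; real code would work on JAX arrays rather than nested lists.

```python
def partition_into_blocks(grid, block_size):
    """Split a 2D grid (list of rows) into non-overlapping square blocks.

    Blocks are returned in row-major order; each block is a list of rows.
    """
    height, width = len(grid), len(grid[0])
    if height % block_size or width % block_size:
        raise ValueError("grid dimensions must be divisible by block_size")
    blocks = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks.append(
                [row[left:left + block_size] for row in grid[top:top + block_size]]
            )
    return blocks


# Toy 4x4 "image" whose entries are their flattened indices (hypothetical data).
grid = [[4 * r + c for c in range(4)] for r in range(4)]
blocks = partition_into_blocks(grid, block_size=2)
print(len(blocks))  # number of blocks
print(blocks[0])    # top-left block
```

With a block size of 2, the 4x4 grid yields four 2x2 blocks, each of which can be processed independently.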

This is not an officially supported Google product.

Pretrained Models and Results

Model	Accuracy (%)	Checkpoint path
NesT-B	83.8	gs://gresearch/nest-checkpoints/nest-b_imagenet
NesT-S	83.3	gs://gresearch/nest-checkpoints/nest-s_imagenet
NesT-T	81.5	gs://gresearch/nest-checkpoints/nest-t_imagenet

Note: Accuracy is evaluated on the ImageNet2012 validation set.
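
For reference, an accuracy figure of this kind can be computed as the percentage of validation examples whose highest-scoring class equals the label. The sketch below assumes the table reports top-1 accuracy; the function name `top1_accuracy` and the toy scores are hypothetical and are not taken from the repository's evaluation code.

```python
def top1_accuracy(scores, labels):
    """Return the percentage of examples whose arg-max score matches the label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Three toy examples with two classes each; the last prediction is wrong.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]
accuracy = top1_accuracy(scores, labels)
print(f"{accuracy:.1f}")
```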

TensorBoard.dev

See the ImageNet training logs at TensorBoard.dev.

Colab

A Colab notebook is available for testing: https://colab.sandbox.google.com/github/google-research/nested-transformer/blob/main/colab.ipynb

Instructions for Image Classification

Environment setup

virtualenv -p python3 --system-site-packages nestenv
source nestenv/bin/activate

pip install -r requirements.txt
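
After installation, a quick sanity check is to confirm that the main dependencies can be found by the active interpreter. This sketch assumes `jax` and `tensorflow_datasets` are among the packages listed in requirements.txt (both are referenced in this README, but the file's exact contents are not shown here); `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed dependencies; adjust the list to match requirements.txt.
missing = missing_packages(["jax", "tensorflow_datasets"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages were found.")
```

The check only locates each package without importing it, so it runs quickly and does not initialize any accelerator backend.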

Evaluate on ImageNet

On first use, download ImageNet from the command line by following the tensorflow_datasets instructions. Optionally, download all pre-trained checkpoints:

bash ./checkpoints/download_checkpoints.sh

Run the evaluation script to evaluate NesT-B.

python main.py --config configs/imagenet_nest.py --config.eval_only=True \
  --config.init_checkpoint="./checkpoints/nest-b_imagenet/ckpt.39" \
  --workdir="./checkpoints/nest-b_imagenet_eval"

Train on ImageNet

The default configuration trains NesT-B on TPUv2 8x8 with a per-device batch size of 16.

python main.py --config configs/imagenet_nest.py --jax_backend_target=<TPU_IP_ADDRESS> --jax_xla_backend="tpu_driver" --workdir="./checkpoints/nest-b_imagenet"

Note: See jax/cloud_tpu_colab for info about TPU_IP_ADDRESS.
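
Under data parallelism, the total (global) batch size is the per-device batch size multiplied by the number of devices. The sketch below assumes that the 8x8 TPU topology exposes 64 devices, an interpretation that would be consistent with the 1024 total batch size this README gives for the models in the paper. The helper `global_batch_size` is hypothetical and not part of the codebase.

```python
def global_batch_size(per_device_batch_size, num_devices):
    """Total examples per training step under data-parallel training."""
    if per_device_batch_size <= 0 or num_devices <= 0:
        raise ValueError("both arguments must be positive")
    return per_device_batch_size * num_devices


# Assumption: an 8x8 topology corresponds to 8 * 8 = 64 devices.
num_devices = 8 * 8
total = global_batch_size(per_device_batch_size=16, num_devices=num_devices)
print(total)  # 1024
```

On a live JAX runtime, the device count could be read from `jax.device_count()` instead of being hard-coded.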

Train NesT-T on 8 GPUs.

python main.py --config configs/imagenet_nest_tiny.py --workdir="./checkpoints/nest-t_imagenet_8gpu"

The codebase does not support multi-node GPU training (more than 8 GPUs). The models reported in our paper were trained on TPUs with a total batch size of 1024.

Train on CIFAR

# Training on 2 GPUs is recommended. NesT-T can be trained on 1 GPU.
CUDA_VISIBLE_DEVICES=0,1 python main.py --config configs/cifar_nest.py --workdir="./checkpoints/nest_cifar"

Cite

@inproceedings{zhang2021aggregating,
  title={Aggregating Nested Transformers},
  author={Zizhao Zhang and Han Zhang and Long Zhao and Ting Chen and Tomas Pfister},
  booktitle={arXiv preprint arXiv:2105.12723},
  year={2021}
}

Owner

Google Research
