Overview

Swin-Transformer

Swin Transformer is a hierarchical vision Transformer whose representation is computed with shifted windows. For more details, please refer to "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

This repo is a MegEngine implementation of Swin Transformer. It also serves as a showcase of training on GPUs with less memory by leveraging the MegEngine DTR (Dynamic Tensor Rematerialization) technique.
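In MegEngine 1.6, DTR is exposed through the megengine.dtr module. A minimal sketch of turning it on (the threshold value is illustrative, and this is presumably what the --dtr/--dtr-thd flags below toggle inside train_random.py):

import megengine as mge

# Ask DTR to start evicting cached activations once GPU memory usage
# passes the threshold; evicted tensors are recomputed on demand.
mge.dtr.eviction_threshold = "8GB"  # illustrative value
mge.dtr.enable()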

There is also an official PyTorch implementation.

Usage

Install

  • Clone this repo:
git clone https://github.com/MegEngine/swin-transformer.git
cd swin-transformer
  • Install megengine==1.6.0
pip3 install megengine==1.6.0 -f https://megengine.org.cn/whl/mge.html

Training

To train a Swin Transformer using random data, run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps>

To train a Swin Transformer using AMP (Automatic Mixed Precision), run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --mode mp
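Under the hood, AMP in MegEngine 1.6 combines megengine.amp.autocast with a GradScaler. A minimal, self-contained sketch with a toy linear model standing in for Swin (illustrative, not the repo's actual training loop):

import numpy as np
import megengine as mge
import megengine.functional as F
import megengine.module as M
from megengine import amp
from megengine.autodiff import GradManager
from megengine.optimizer import SGD

model = M.Linear(8, 2)                          # toy stand-in for the Swin model
gm = GradManager().attach(model.parameters())
opt = SGD(model.parameters(), lr=0.1)
scaler = amp.GradScaler()                       # scales the loss to avoid fp16 underflow

x = mge.tensor(np.random.randn(4, 8).astype("float32"))
y = mge.tensor(np.random.randint(0, 2, size=(4,)).astype("int32"))

with gm:
    with amp.autocast():                        # run the forward pass in mixed precision
        loss = F.loss.cross_entropy(model(x), y)
    scaler.backward(gm, loss)                   # scaled backward, gradients unscaled after
opt.step().clear_grad()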

To train a Swin Transformer using DTR in dynamic graph mode, run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --dtr [--dtr-thd <eviction-threshold-of-dtr>]

To train a Swin Transformer using DTR in static graph mode, run:

python3 train_random.py -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --trace --symbolic --dtr --dtr-thd <eviction-threshold-of-dtr>
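Static graph mode corresponds to MegEngine's megengine.jit.trace. A hedged sketch of a traced train step, reusing the model, gm, opt, and F names from the AMP sketch above (--trace/--symbolic presumably select this path in train_random.py):

from megengine.jit import trace

@trace(symbolic=True)          # compile the whole step into a static graph
def train_step(x, y):
    with gm:
        loss = F.loss.cross_entropy(model(x), y)
        gm.backward(loss)
    opt.step().clear_grad()
    return loss

Combined with the mge.dtr settings shown earlier, this corresponds to DTR in static graph mode.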

For example, to train a Swin Transformer with a single GPU using DTR in static graph mode with threshold=8GB and AMP, run:

python3 train_random.py -n 1 -b 340 -s 10 --trace --symbolic --dtr --dtr-thd 8 --mode mp

For more usage, run:

python3 train_random.py -h

Benchmark

  • Testing Devices
    • 2080Ti @ cuda-10.1-cudnn-v7.6.3-TensorRT-5.1.5.0 @ Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
    • Reserve all CUDA memory by setting MGB_CUDA_RESERVE_MEMORY=1, to alleviate the memory fragmentation problem (an example command follows the table)
Settings                         Maximum Batch Size   Speed (s/step)   Throughput (images/s)
None                             68                   0.490            139
AMP                              100                  0.494            202
DTR in static graph mode         300                  2.592            116
DTR in static graph mode + AMP   340                  1.944            175
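For instance, a single-GPU benchmark run with memory reservation enabled would look like (illustrative):

MGB_CUDA_RESERVE_MEMORY=1 python3 train_random.py -n 1 -b 68 -s 10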

Acknowledgement

We are inspired by the Swin-Transformer repository; many thanks to Microsoft!
