Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

Last update: Dec 12, 2022

Overview

Swin Transformer V2: Scaling Up Capacity and Resolution

Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu et al. (Microsoft Research Asia).

This repository includes a pure PyTorch implementation of the Swin Transformer V2.

The official Swin Transformer V1 implementation is available here. Currently (10.01.2022), an official implementation of the Swin Transformer V2 is not publicly available.

Installation

You can simply install the Swin Transformer V2 implementation as a Python package by using pip.

pip install git+https://github.com/ChristophReich1996/Swin-Transformer-V2

Alternatively, you can clone the repository and use the implementation in swin_transformer_v2 directly in your project.

Usage

This implementation provides the configurations reported in the paper (SwinV2-T, SwinV2-S, etc.). You can build a model by calling the corresponding function. Note that the Swin Transformer V2 implementation (SwinTransformerV2 class) returns the feature maps of every stage of the network (List[torch.Tensor]). To use this implementation for image classification, wrap the model and take the final feature map.

from swin_transformer_v2 import SwinTransformerV2

from swin_transformer_v2 import swin_transformer_v2_t, swin_transformer_v2_s, swin_transformer_v2_b, \
    swin_transformer_v2_l, swin_transformer_v2_h, swin_transformer_v2_g

# SwinV2-T
swin_transformer: SwinTransformerV2 = swin_transformer_v2_t(in_channels=3,
                                                            window_size=8,
                                                            input_resolution=(256, 256),
                                                            sequential_self_attention=False,
                                                            use_checkpoint=False)
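The following pure-Python sketch makes "the feature maps of every stage" concrete. It computes the (channels, height, width) of each stage output for SwinV2-T at 256x256, and the number of attention windows per stage for window_size=8. It assumes the standard Swin layout from the paper: patch size 4, an embedding width of 96 for SwinV2-T, and every later stage halving the resolution and doubling the channels. The helper names are hypothetical and not part of this repository, so check the shapes actually returned by SwinTransformerV2 against the model itself.

```python
from typing import List, Tuple


def stage_output_shapes(input_resolution: Tuple[int, int], embedding_channels: int,
                        number_of_stages: int = 4,
                        patch_size: int = 4) -> List[Tuple[int, int, int]]:
    """(channels, height, width) of each stage output, assuming the standard Swin layout."""
    # Patch embedding reduces the resolution by the patch size
    height, width = input_resolution[0] // patch_size, input_resolution[1] // patch_size
    channels = embedding_channels
    shapes = []
    for stage in range(number_of_stages):
        if stage > 0:
            # Patch merging before every later stage: halve resolution, double channels
            height, width, channels = height // 2, width // 2, channels * 2
        shapes.append((channels, height, width))
    return shapes


def windows_per_stage(shapes: List[Tuple[int, int, int]], window_size: int) -> List[int]:
    """Number of non-overlapping attention windows per stage (assumes exact divisibility)."""
    counts = []
    for _, height, width in shapes:
        # The window is clipped to the feature map if the map is smaller than the window
        window = min(window_size, height, width)
        counts.append((height // window) * (width // window))
    return counts


shapes = stage_output_shapes((256, 256), embedding_channels=96)
print(shapes)                        # [(96, 64, 64), (192, 32, 32), (384, 16, 16), (768, 8, 8)]
print(windows_per_stage(shapes, 8))  # [64, 16, 4, 1]
```

Under these assumptions, the final feature map is 768x8x8. A typical classification wrapper would global-average-pool it to a 768-dimensional vector and apply a linear layer.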

If you want to change the resolution and/or the window size for fine-tuning or inference, please use the update_resolution method.

# Change resolution and window size of the model
swin_transformer.update_resolution(new_window_size=16, new_input_resolution=(512, 512))
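The SwinV2 paper motivates changing the window size between pre-training and fine-tuning with log-spaced relative coordinates, sign(Δx)·log(1 + |Δx|), which are fed into a continuous position bias network. The sketch below reproduces the paper's comparison for going from an 8x8 to a 16x16 window. With raw coordinates, the relative offsets must extrapolate about 1.14x beyond the pre-training range. With log-spaced coordinates, they extrapolate only about 0.33x. The sketch only illustrates the coordinate transform. It is assumed, not confirmed, that update_resolution in this repository uses exactly this transform (natural log, no extra normalization), and the function names are hypothetical.

```python
import math


def log_spaced(delta: float) -> float:
    """Log-spaced relative coordinate from the SwinV2 paper: sign(d) * log(1 + |d|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def extrapolation_ratio(old_window: int, new_window: int, log_space: bool) -> float:
    """How far beyond the pre-training coordinate range the new window reaches."""
    # Relative offsets inside an M x M window lie in [-(M - 1), M - 1]
    old_max, new_max = old_window - 1, new_window - 1
    if log_space:
        old_max, new_max = log_spaced(old_max), log_spaced(new_max)
    return new_max / old_max - 1.0


print(round(extrapolation_ratio(8, 16, log_space=False), 2))  # 1.14
print(round(extrapolation_ratio(8, 16, log_space=True), 2))   # 0.33
```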

If you want to use a custom configuration, you can use the SwinTransformerV2 class. Its constructor takes the following parameters.

Parameter	Description	Type
in_channels	Number of input channels	int
depth	Depth of the stage (number of layers)	int
downscale	If true, the input is downsampled (see Fig. 3 of the V1 paper)	bool
input_resolution	Input resolution	Tuple[int, int]
number_of_heads	Number of attention heads	int
window_size	Window size	int
shift_size	Shift size of the shifted windows	int
ff_feature_ratio	Ratio of the hidden dimension in the FFN to the input channels	int
dropout	Dropout rate in the input mapping	float
dropout_attention	Dropout rate of the attention map	float
dropout_path	Drop path (stochastic depth) rate in the main path	float
use_checkpoint	If true, gradient checkpointing is used	bool
sequential_self_attention	If true, sequential self-attention is performed	bool
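When building a custom configuration, it can help to see how the stage-level parameters above typically expand into per-block settings. The sketch below makes three assumptions based on the Swin V1 convention: blocks alternate between regular windows (shift 0) and shifted windows (shift_size); the window is clipped and shifting disabled when the feature map is no larger than one window; and the FFN hidden width is in_channels * ff_feature_ratio. block_configs is a hypothetical helper and not a function of this repository.

```python
from typing import Dict, List, Tuple


def block_configs(depth: int, in_channels: int, input_resolution: Tuple[int, int],
                  window_size: int, shift_size: int,
                  ff_feature_ratio: int) -> List[Dict[str, int]]:
    """Hypothetical helper: expand stage-level parameters into per-block settings."""
    window, shift = window_size, shift_size
    if min(input_resolution) <= window_size:
        # The feature map fits into a single window: clip the window, disable shifting
        window, shift = min(input_resolution), 0
    configs = []
    for index in range(depth):
        configs.append({
            "window_size": window,
            # Even blocks use regular windows, odd blocks use shifted windows
            "shift_size": 0 if index % 2 == 0 else shift,
            # Hidden width of the feed-forward network (FFN)
            "ff_hidden_channels": in_channels * ff_feature_ratio,
        })
    return configs


configs = block_configs(depth=6, in_channels=384, input_resolution=(16, 16),
                        window_size=8, shift_size=4, ff_feature_ratio=4)
print([config["shift_size"] for config in configs])  # [0, 4, 0, 4, 0, 4]
print(configs[0]["ff_hidden_channels"])               # 1536
```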

This file includes a full example of how to use this implementation.

Disclaimer

This is a very experimental implementation based on the Swin Transformer V2 paper and the official implementation of the Swin Transformer V1. Since an official implementation of the Swin Transformer V2 has not yet been published, it is not possible to say to what extent this implementation differs from the original one. If you have any issues with this implementation, please raise an issue.

Reference

@article{Liu2021,
    title={{Swin Transformer V2: Scaling Up Capacity and Resolution}},
    author={Liu, Ze and Hu, Han and Lin, Yutong and Yao, Zhuliang and Xie, Zhenda and Wei, Yixuan and Ning, Jia and Cao, 
            Yue and Zhang, Zheng and Dong, Li and others},
    journal={arXiv preprint arXiv:2111.09883},
    year={2021}
}

Owner

Christoph Reich
