This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

Last update: Dec 13, 2022

ResT

By Qing-Long Zhang and Yu-Bin Yang

[State Key Laboratory for Novel Software Technology at Nanjing University]

This repo is the official implementation of "ResT: An Efficient Transformer for Visual Recognition". It currently includes code and models for the following tasks:

Image Classification: Included in this repo. See get_started.md for a quick start.

Object Detection and Instance Segmentation: Based on detectron2, coming soon.

ResT was first described in an arXiv preprint (arXiv:2105.13677) and is designed to serve as a general-purpose backbone for computer vision. It can handle input images of arbitrary size. In addition, ResT compresses the memory footprint of standard multi-head self-attention (MSA) and models the interaction between attention heads while preserving their diversity.
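To give a rough sense of what compressing the memory of MSA can mean, the sketch below counts attention-map entries for a single head. It assumes the compression works by spatially downsampling the key/value tokens by a per-side factor before attention; that mechanism, the function name `attention_map_entries`, and the 56x56 grid with factor 8 are illustrative assumptions, not values taken from this repo.

```python
def attention_map_entries(height, width, reduction=1):
    """Return (num_queries, num_keys, entries) for one attention head.

    `reduction` is the per-side downsampling factor applied to the
    key/value tokens (an assumption for illustration). With
    reduction=1 this is standard MSA, where keys match queries.
    """
    if height % reduction or width % reduction:
        raise ValueError("height and width must be divisible by reduction")
    num_queries = height * width
    num_keys = (height // reduction) * (width // reduction)
    return num_queries, num_keys, num_queries * num_keys


# Hypothetical 56x56 token grid (e.g. a 224x224 image downsampled 4x).
standard = attention_map_entries(56, 56)
compressed = attention_map_entries(56, 56, reduction=8)

print(standard)                      # (3136, 3136, 9834496)
print(compressed)                    # (3136, 49, 153664)
print(standard[2] // compressed[2])  # 64
```

Under these assumptions the attention map shrinks by the square of the reduction factor, which is where most of the memory saving would come from at high-resolution stages.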

Main Results on ImageNet with Pretrained Models

ImageNet-1K Pretrained Models

name	resolution	acc@1	acc@5	#params	FLOPs	FPS	1K model
ResT-Lite	224x224	77.2	93.7	10.5M	1.4G	1246	baidu
ResT-Small	224x224	79.6	94.9	13.7M	1.9G	1043	baidu
ResT-Base	224x224	81.6	95.7	30.3M	4.3G	673	baidu
ResT-Large	224x224	83.6	96.3	51.6M	7.9G	429	baidu

Note: the access code for the baidu download links is "rest".

Citing ResT

@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v2},
  year={2021}
}

Owner

zhql
