DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Last update: Dec 21, 2022

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: For detailed instructions and information, see classification/README.md.

Object Detection: For detailed instructions and information, see detection/README.md.

The paper has been released on [arXiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. Rather than splitting the input image into predefined, fixed patches, it learns in a data-driven way to split the image into patches with adaptive positions and scales. In this way, our method better preserves the semantics within each patch.
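The idea of a deformable patch can be illustrated with a minimal pure-Python sketch. It assumes that each patch is described by a shift of its center and a width/height, that a k × k grid of points is sampled inside the resulting rectangle by bilinear interpolation, and that the flattened samples are then linearly projected to a patch embedding. The function names (`bilinear_sample`, `depatch_sample`, `project`) and the zero-padding convention are hypothetical and for illustration only; in DPT the offsets and sizes are predicted by learned layers, and this repository builds on PVT and a CUDA operator from Deformable-DETR rather than on code like this.

```python
import math


def bilinear_sample(img, x, y):
    """Bilinearly sample a 2-D grid (list of rows) at continuous (x, y).

    Pixel img[r][c] sits at integer coordinate (x=c, y=r); neighbours that
    fall outside the grid contribute zero (an assumed padding convention).
    """
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    corners = (
        (x0, y0, (1 - fx) * (1 - fy)),
        (x0 + 1, y0, fx * (1 - fy)),
        (x0, y0 + 1, (1 - fx) * fy),
        (x0 + 1, y0 + 1, fx * fy),
    )
    value = 0.0
    for cx, cy, weight in corners:
        if 0 <= cx < w and 0 <= cy < h:
            value += weight * img[cy][cx]
    return value


def depatch_sample(img, center, offset, size, k):
    """Sample a k x k grid of points inside a deformable patch (hypothetical helper).

    center: (cx, cy) of the original fixed patch.
    offset: (dx, dy) shift of the patch centre (learned in DPT; given here).
    size:   (sw, sh) patch width and height (learned in DPT; given here).
    Returns the k*k sampled values, flattened row by row.
    """
    cx, cy = center[0] + offset[0], center[1] + offset[1]
    sw, sh = size
    samples = []
    for i in range(k):
        y = cy - sh / 2 + (i + 0.5) * sh / k
        for j in range(k):
            x = cx - sw / 2 + (j + 0.5) * sw / k
            samples.append(bilinear_sample(img, x, y))
    return samples


def project(samples, weight):
    """Linear projection of the flattened samples to an embedding vector."""
    return [sum(w * s for w, s in zip(row, samples)) for row in weight]


if __name__ == "__main__":
    # 4x4 single-channel "image" with value 4*row + col.
    img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    # Fixed 2x2 patch at grid position (row 0, col 1): centre (2.5, 0.5).
    print(depatch_sample(img, (2.5, 0.5), (0.0, 0.0), (2.0, 2.0), k=2))
    # -> [2.0, 3.0, 6.0, 7.0]  (identical to the fixed patch split)
    # Same patch shifted half a pixel left: samples fall between pixels.
    print(depatch_sample(img, (2.5, 0.5), (-0.5, 0.0), (2.0, 2.0), k=2))
    # -> [1.5, 2.5, 5.5, 6.5]
```

With a zero offset, a size equal to the patch size, and k equal to the patch size, the sampled points land exactly on the pixel centers of the fixed patch. Under these assumptions, this is why a deformable patch can stand in for a fixed patch split: the fixed split is the special case in which nothing is learned.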

This repository provides code and models for a Deformable Patch-based Transformer (DPT). As this field is developing rapidly, we hope to see DePatch applied to other recent architectures to promote further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method	#Params (M)	FLOPs (G)	Acc@1 (%)
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer to these repositories for further information.

Owner

CASIA-IVA-Lab