UniFormer - official implementation of UniFormer

Overview

UniFormer

This repo is the official implementation of "Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning". It currently includes code and models for the following tasks:

Updates

01/13/2022

[Initial commits]:

  1. Pretrained models on ImageNet-1K, Kinetics-400, Kinetics-600, Something-Something V1&V2

  2. The supported code and models for image classification and video classification are provided.

Introduction

UniFormer (Unified transFormer) is introduce in arxiv, which effectively unifies 3D convolution and spatiotemporal self-attention in a concise transformer format. We adopt local MHRA in shallow layers to largely reduce computation burden and global MHRA in deep layers to learn global token relation.

UniFormer achieves strong performance on video classification. With only ImageNet-1K pretraining, our UniFormer achieves 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600, while requiring 10x fewer GFLOPs than other comparable methods (e.g., 16.7x fewer GFLOPs than ViViT with JFT-300M pre-training). For Something-Something V1 and V2, our UniFormer achieves 60.9% and 71.2% top-1 accuracy respectively, which are new state-of-the-art performances.

teaser

Main results on ImageNet-1K

Please see image_classification for more details.

More models with large resolution and token labeling will be released soon.

Model Pretrain Resolution Top-1 #Param. FLOPs
UniFormer-S ImageNet-1K 224x224 82.9 22M 3.6G
UniFormer-S† ImageNet-1K 224x224 83.4 24M 4.2G
UniFormer-B ImageNet-1K 224x224 83.9 50M 8.3G

Main results on Kinetics-400

Please see video_classification for more details.

Model Pretrain #Frame Sampling Method FLOPs K400 Top-1 K600 Top-1
UniFormer-S ImageNet-1K 16x1x4 16x4 167G 80.8 82.8
UniFormer-S ImageNet-1K 16x1x4 16x8 167G 80.8 82.7
UniFormer-S ImageNet-1K 32x1x4 32x4 438G 82.0 -
UniFormer-B ImageNet-1K 16x1x4 16x4 387G 82.0 84.0
UniFormer-B ImageNet-1K 16x1x4 16x8 387G 81.7 83.4
UniFormer-B ImageNet-1K 32x1x4 32x4 1036G 82.9 84.5*

* Since Kinetics-600 is too large to train (>1 month in single node with 8 A100 GPUs), we provide model trained in multi node (around 2 weeks with 32 V100 GPUs), but the result is lower due to the lack of tuning hyperparameters.

Main results on Something-Something

Please see video_classification for more details.

Model Pretrain #Frame FLOPs SSV1 Top-1 SSV2 Top-1
UniFormer-S K400 16x3x1 125G 57.2 67.7
UniFormer-S K600 16x3x1 125G 57.6 69.4
UniFormer-S K400 32x3x1 329G 58.8 69.0
UniFormer-S K600 32x3x1 329G 59.9 70.4
UniFormer-B K400 16x3x1 290G 59.1 70.4
UniFormer-B K600 16x3x1 290G 58.8 70.2
UniFormer-B K400 32x3x1 777G 60.9 71.1
UniFormer-B K600 32x3x1 777G 61.0 71.2

Main results on downstream tasks

We have conducted extensive experiments on downstream tasks and achieved comparable results with SOTA models.

Code and models will be released in two weeks.

Cite Uniformer

If you find this repository useful, please use the following BibTeX entry for citation.

@misc{li2022uniformer,
      title={Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning}, 
      author={Kunchang Li and Yali Wang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
      year={2022},
      eprint={2201.04676},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Contributors and Contact Information

UniFormer is maintained by Kunchang Li.

For help or issues using UniFormer, please submit a GitHub issue.

For other communications related to UniFormer, please contact Kunchang Li ([email protected]).

Owner
SenseTime X-Lab
Powered by X-Lab, SenseTime Research
SenseTime X-Lab
Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

RIIT Our open-source code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-AgentReinforcement Learning. We implement and standard

405 Jan 06, 2023
A tensorflow implementation of an HMM layer

tensorflow_hmm Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms. See Keras example for an example of how to use

Zach Dwiel 283 Oct 19, 2022
Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

Density-aware Chamfer Distance This repository contains the official PyTorch implementation of our paper: Density-aware Chamfer Distance as a Comprehe

Tong WU 93 Dec 15, 2022
Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL

Idiap Research Institute 40 Aug 14, 2022
SHIFT15M: multiobjective large-scale fashion dataset with distributional shifts

[arXiv] The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service IQON, wh

ZOZO, Inc. 138 Nov 24, 2022
Kroomsa: A search engine for the curious

Kroomsa A search engine for the curious. It is a search algorithm designed to en

Wingify 7 Jun 20, 2022
Yolov3 pytorch implementation

YOLOV3 Pytorch实现 在bubbliiing大佬代码的基础上进行了修改,添加了部分注释。 预训练模型 预训练模型来源于bubbliiing。 链接:https://pan.baidu.com/s/1ncREw6Na9ycZptdxiVMApw 提取码:appk 训练自己的数据集 按照VO

4 Aug 27, 2022
A motion detection system with RaspberryPi, OpenCV, Python

Human Detection System using Raspberry Pi Functionality Activates a relay on detecting motion. You may need following components to get the expected R

Omal Perera 55 Dec 04, 2022
PyTorch implementation of the paper:A Convolutional Approach to Melody Line Identification in Symbolic Scores.

Symbolic Melody Identification This repository is an unofficial PyTorch implementation of the paper:A Convolutional Approach to Melody Line Identifica

Sophia Y. Chou 3 Feb 21, 2022
CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax ⚠️ Latest: Current repo is a complete version. But we delet

FishYuLi 341 Dec 23, 2022
Official code implementation for "Personalized Federated Learning using Hypernetworks"

Personalized Federated Learning using Hypernetworks This is an official implementation of Personalized Federated Learning using Hypernetworks paper. [

Aviv Shamsian 121 Dec 25, 2022
Doge-Prediction - Coding Club prediction ig

Doge-Prediction Coding Club prediction ig Basically: Create an application that

1 Jan 10, 2022
Yolov5-opencv-cpp-python - Example of using ultralytics YOLO V5 with OpenCV 4.5.4, C++ and Python

yolov5-opencv-cpp-python Example of performing inference with ultralytics YOLO V

183 Jan 09, 2023
A developer interface for creating Chat AIs for the Chai app.

ChaiPy A developer interface for creating Chat AIs for the Chai app. Usage Local development A quick start guide is available here, with a minimal exa

Chai 28 Dec 28, 2022
MAGMA - a GPT-style multimodal model that can understand any combination of images and language

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning Authors repo (alphabetical) Constantin (CoEich), Mayukh (Mayukh

Aleph Alpha GmbH 331 Jan 03, 2023
Official PyTorch implementation of our AAAI22 paper: TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework via Self-Supervised Multi-Task Learning. Code will be available soon.

Official-PyTorch-Implementation-of-TransMEF Official PyTorch implementation of our AAAI22 paper: TransMEF: A Transformer-Based Multi-Exposure Image Fu

117 Dec 27, 2022
Özlem Taşkın 0 Feb 23, 2022
SatelliteSfM - A library for solving the satellite structure from motion problem

Satellite Structure from Motion Maintained by Kai Zhang. Overview This is a libr

Kai Zhang 190 Dec 08, 2022
This project aims at providing a concise, easy-to-use, modifiable reference implementation for semantic segmentation models using PyTorch.

Semantic Segmentation on PyTorch (include FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)

2.4k Jan 08, 2023
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

PySlowFast PySlowFast is an open source video understanding codebase from FAIR that provides state-of-the-art video classification models with efficie

Meta Research 5.3k Jan 03, 2023