Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

Last update: Dec 28, 2022

Overview

STAM - Pytorch

Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in video classification. This corroborates the finding of TimeSformer. Attention is all we need.

Install

$ pip install stam-pytorch

Usage

import torch
from stam_pytorch import STAM

model = STAM(
    dim = 512,
    image_size = 256,     # size of image
    patch_size = 32,      # patch size
    num_frames = 5,       # number of image frames, selected out of video
    space_depth = 12,     # depth of vision transformer
    space_heads = 8,      # heads of vision transformer
    space_mlp_dim = 2048, # feedforward hidden dimension of vision transformer
    time_depth = 6,       # depth of time transformer (in paper, it was shallower, 6)
    time_heads = 8,       # heads of time transformer
    time_mlp_dim = 2048,  # feedforward hidden dimension of time transformer
    num_classes = 100,    # number of output classes
    space_dim_head = 64,  # space transformer head dimension
    time_dim_head = 64,   # time transformer head dimension
    dropout = 0.,         # dropout
    emb_dropout = 0.      # embedding dropout
)

frames = torch.randn(2, 5, 3, 256, 256) # (batch x frames x channels x height x width)
pred = model(frames) # (2, 100)

Citations

@misc{sharir2021image,
    title   = {An Image is Worth 16x16 Words, What is a Video Worth?}, 
    author  = {Gilad Sharir and Asaf Noy and Lihi Zelnik-Manor},
    year    = {2021},
    eprint  = {2103.13915},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

You might also like...

FuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space OptimizationFuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space Optimization

FuseDream This repo contains code for our paper (paper link): FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimizat

191 Dec 31, 2022

Working demo of the Multi-class and Anomaly classification model using the CLIP feature space

👁️ Hindsight AI: Crime Classification With Clip About For Educational Purposes Only This is a recursive neural net trained to classify specific crime

2 Jun 5, 2022

SOTA model in CIFAR10

A PyTorch Implementation of CIFAR Tricks 调研了CIFAR10数据集上各种trick，数据增强，正则化方法，并进行了实现。目前项目告一段落，如果有更好的想法，或者希望一起维护这个项目可以提issue或者在我的主页找到我的联系方式。 0. Requirement

58 Dec 21, 2022

A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

159 Dec 22, 2022

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

54 Nov 21, 2022

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

54 Nov 21, 2022

Simple-Image-Classification - Simple Image Classification Code (PyTorch)

Simple-Image-Classification Simple Image Classification Code (PyTorch) Yechan Kim This repository contains: Python3 / Pytorch code for multi-class ima

8 Oct 29, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

1.4k Jan 1, 2023

Comments

train loss curve is realy weird

Hi Thanks for the great work. I am trying to train your transformer with my video data but the train loss curve looks really weird and its increasing. I tried with normal video data of shape [3,256,256] in range 0-255 and in range 0-1. Could you please tell what I am doing wrong. My data contains a person doing something and I want to recognize the action.

opened by saniazahan 1
regression

Beautiful work as usual, thanks for this implementation.

I'm curious if you tried using this for a regression task? I have tried using TimeSFormer without success yet, I know the signal is there because I can learn it with a small 3dcnn trained from scratch so I suspect my understanding of how and where to modify the transformer is the culprit. The output is a 1D vector with len == num_frames. Any suggestions very appreciated!

opened by raijinspecial 2

Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

Related tags

Overview

STAM - Pytorch

Install

Usage

Citations

You might also like...

FuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space OptimizationFuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space Optimization

Working demo of the Multi-class and Anomaly classification model using the CLIP feature space

SOTA model in CIFAR10

A toolkit for document-level event extraction, containing some SOTA model implementations

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Simple-Image-Classification - Simple Image Classification Code (PyTorch)

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

Comments

train loss curve is realy weird

regression

Releases(0.0.4)

0.0.4(Mar 31, 2021)

0.0.2(Mar 28, 2021)

0.0.1(Mar 28, 2021)

Owner

Phil Wang

[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Pytorch based library to rank predicted bounding boxes using text/image user's prompts.

A Unified Generative Framework for Various NER Subtasks.

Phylogeny Partners

Le dataset des images du projet d'IA de 2021

Julia and Matlab codes to simulated all problems in El-Hachem, McCue and Simpson (2021)

It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

particle tracking model, works with the ROMS output file(qck.nc, his.nc)

Official code for 'Robust Siamese Object Tracking for Unmanned Aerial Manipulator' and offical introduction to UAMT100 benchmark

joint detection and semantic segmentation, based on ultralytics/yolov5,

Source code for Transformer-based Multi-task Learning for Disaster Tweet Categorisation (UCD's participation in TREC-IS 2020A, 2020B and 2021A).

PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data.

Grounding Representation Similarity with Statistical Testing

An official repository for Paper "Uformer: A General U-Shaped Transformer for Image Restoration".

PyTorch implementation of our ICCV 2021 paper Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer.

This repository contains an implementation of the Permutohedral Attention Module in Pytorch

Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications

某学校选课系统GIF验证码数据集 + Baseline模型 + 上下游相关工具

Some useful blender add-ons for SMPL skeleton's poses and global translation.

CvT2DistilGPT2 is an encoder-to-decoder model that was developed for chest X-ray report generation.