Official PyTorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Last update: Dec 07, 2022

Overview

Relational Self-Attention: What's Missing in Attention for Video Understanding

This repository is the official implementation of "Relational Self-Attention: What's Missing in Attention for Video Understanding" by Manjin Kim*, Heeseung Kwon*, Chunyu Wang, Suha Kwak, and Minsu Cho (*equal contribution).

Requirements

Python: 3.7.9
PyTorch: 1.6.0
TorchVision: 0.2.1
CUDA: 10.1
Conda environment: environment.yml

To install requirements:

    conda env create -f environment.yml
    conda activate rsa

Dataset Preparation

Download the Something-Something v1 and v2 (SSv1 and SSv2) datasets and extract the RGB frames. Download URLs: SSv1, SSv2
Create txt files that define the training and validation splits. Each line of a txt file is formatted as [video_path] [#frames] [class_label]. See the txt files in the ./data directory for examples.
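
The split-file format described above can be read with a few lines of Python. This is a minimal sketch, not the repository's own loader: the names VideoRecord, parse_split_line, and load_split are hypothetical, and it assumes the three fields are whitespace-separated and that the class label is an integer.

```python
from typing import Iterable, List, NamedTuple


class VideoRecord(NamedTuple):
    """One line of a split file (hypothetical helper, not from the repo)."""
    path: str        # directory holding the extracted RGB frames
    num_frames: int  # number of frames extracted for this video
    label: int       # class label, assumed here to be an integer index


def parse_split_line(line: str) -> VideoRecord:
    # Split from the right so a video path containing spaces stays intact.
    path, num_frames, label = line.strip().rsplit(maxsplit=2)
    return VideoRecord(path, int(num_frames), int(label))


def load_split(lines: Iterable[str]) -> List[VideoRecord]:
    # Blank lines are skipped; every other line must have three fields.
    return [parse_split_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    sample = ["frames/100218 42 134", "frames/48032 36 7"]
    for record in load_split(sample):
        print(record.path, record.num_frames, record.label)
```

To read a real split file, pass an open file handle to load_split, since iterating a text file yields its lines.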

Training

To train RSANet-R50 on the SSv1 or SSv2 dataset as in the paper, run the corresponding script; the example lines show the arguments it takes:

    # For SSv1
    ./scripts/train_Something_v1.sh
    # example: ./scripts/train_Something_v1.sh RSA_R50_SSV1_16frames 16
    
    # For SSv2
    ./scripts/train_Something_v2.sh
    # example: ./scripts/train_Something_v2.sh RSA_R50_SSV2_16frames 16

Evaluation

To evaluate RSANet-R50 on the SSv1 or SSv2 dataset as in the paper, run the corresponding script; the example lines show the arguments it takes:

    # For SSv1
    ./scripts/test_Something_v1.sh
    # example: ./scripts/test_Something_v1.sh RSA_R50_SSV1_16frames resnet_rgb_model_best.pth.tar 16
    
    # For SSv2
    ./scripts/test_Something_v2.sh
    # example: ./scripts/test_Something_v2.sh RSA_R50_SSV2_16frames resnet_rgb_model_best.pth.tar 16

Results

Our model achieves the following performance on Something-Something-V1 and Something-Something-V2:

model	dataset	frames	top-1 / top-5	logs	checkpoints
RSANet-R50	SSV1	16	54.0 % / 81.1 %	[log]	[checkpoint]
RSANet-R50	SSV2	16	66.0 % / 89.9 %	[log]	[checkpoint]
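
For readers unfamiliar with the top-1 / top-5 columns, the metric can be sketched in plain Python. This is an illustrative definition under the usual convention (a sample counts as correct when its true label is among the k highest-scoring classes), not the repository's evaluation code; the function name topk_accuracy and the toy scores are assumptions.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  k: int) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    # Toy scores for 3 samples over 3 classes (made up for illustration).
    scores = [[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(f"top-1: {topk_accuracy(scores, labels, 1):.1f} %")
    print(f"top-2: {topk_accuracy(scores, labels, 2):.1f} %")
```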
