BMVC 2021: This is the GitHub repository for "Few-Shot Temporal Action Localization with Query Adaptive Transformer", accepted at the British Machine Vision Conference (BMVC) 2021 (virtual).

Last update: Dec 09, 2022

Overview

FS-QAT: Few Shot Temporal Action Localization using Query Adaptive Transformer

Accepted as a poster at BMVC 2021

This is the official PyTorch implementation of FS-QAT. Our paper is available on arXiv (2110.10552).

Updates

(October 2021) We released the FS-QAT training and inference code for the ActivityNet dataset.
(October 2021) FS-QAT was accepted at BMVC 2021.

Abstract

Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural (actions are typically captured in untrimmed videos) but also ignores background video segments containing vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting that uses untrimmed training videos. Further, a novel FS-TAL model is proposed which maximizes the knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer in the model. Extensive experiments on two action localization benchmarks demonstrate that our method can outperform all the state-of-the-art alternatives significantly in both single-domain and cross-domain scenarios.

Summary

First few-shot TAL setting to use untrimmed videos for both support and query.
A unified model accommodates both untrimmed and trimmed videos without any design change.
Instead of meta-learning the entire network, only the Transformer is meta-learned, which makes adaptation faster.
Intra-class variance is handled through this adaptation.
Promising performance in cross-domain (cross-dataset) settings.
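
To make the untrimmed support/query setting concrete, here is a minimal sketch of how a single few-shot episode could be assembled. It is not the code in this repository: the `UntrimmedVideo` class, the `snippet_mask` and `sample_episode` helpers, and the one-class-per-episode layout are all assumptions made for illustration. The point it shows is that every video keeps its background, so each snippet gets a foreground or background label instead of being cropped to the action.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class UntrimmedVideo:
    """Hypothetical container for one untrimmed video (illustration only)."""
    video_id: str
    label: str                           # action class annotated in this video
    duration: float                      # length in seconds
    segments: List[Tuple[float, float]]  # (start, end) of each action instance


def snippet_mask(video: UntrimmedVideo, num_snippets: int) -> List[int]:
    """Label each snippet on a uniform temporal grid: 1 = action, 0 = background."""
    step = video.duration / num_snippets
    mask = []
    for i in range(num_snippets):
        centre = (i + 0.5) * step  # snippet centre in seconds
        inside = any(start <= centre < end for start, end in video.segments)
        mask.append(1 if inside else 0)
    return mask


def sample_episode(videos: List[UntrimmedVideo], k_shot: int, rng: random.Random):
    """Pick one class, then k_shot support videos and one query video.

    Assumption: one novel class per episode; all videos stay untrimmed.
    """
    by_class: Dict[str, List[UntrimmedVideo]] = {}
    for video in videos:
        by_class.setdefault(video.label, []).append(video)
    # A class is usable only if it can supply k_shot supports plus one query.
    eligible = sorted(c for c, vs in by_class.items() if len(vs) > k_shot)
    if not eligible:
        raise ValueError("no class has enough videos for this k_shot")
    novel_class = rng.choice(eligible)
    picked = rng.sample(by_class[novel_class], k_shot + 1)
    return novel_class, picked[:k_shot], picked[k_shot]


if __name__ == "__main__":
    toy_videos = [
        UntrimmedVideo("v0", "long_jump", 10.0, [(2.0, 5.0)]),
        UntrimmedVideo("v1", "long_jump", 12.0, [(1.0, 3.0), (8.0, 11.0)]),
        UntrimmedVideo("v2", "long_jump", 9.0, [(4.0, 7.5)]),
    ]
    cls, support, query = sample_episode(toy_videos, k_shot=2, rng=random.Random(0))
    print("class:", cls)
    print("support:", [(v.video_id, snippet_mask(v, 10)) for v in support])
    print("query:", query.video_id, snippet_mask(query, 10))
```

In a trimmed-video setting the support mask would be all ones; here the zeros mark the background context that the abstract argues should be kept.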

Qualitative Performance

Training and Evaluation

Apologies for the messy code.

Refactoring will be done soon (delayed due to CVPR workload).

To Train

python gtad_train_fs.py

To Test

sh test_fs.sh

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@misc{nag2021fewshot,
      title={Few-Shot Temporal Action Localization with Query Adaptive Transformer}, 
      author={Sauradip Nag and Xiatian Zhu and Tao Xiang},
      year={2021},
      eprint={2110.10552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


Owner

Sauradip Nag
