End-to-end Temporal Action Detection with Transformer. [Under review]

Overview

TadTR: End-to-end Temporal Action Detection with Transformer

By Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Song Bai, Xiang Bai.

This repo holds the code for TadTR, described in the technical report "End-to-end Temporal Action Detection with Transformer".

Introduction

TadTR is an end-to-end Temporal Action Detection TRansformer. It has the following advantages over previous methods:

  • Simple. It adopts a set-prediction pipeline and achieves TAD with a single network. It does not require a separate proposal generation stage.
  • Flexible. It removes hand-crafted designs such as anchor settings and NMS.
  • Sparse. It produces very sparse detections (e.g. 10 on ActivityNet) and therefore requires less computation.
  • Strong. As a self-contained temporal action detector, TadTR achieves state-of-the-art performance on HACS and THUMOS14. It is also much stronger than concurrent Transformer-based methods.
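To make the set-prediction idea concrete, here is a minimal sketch of one-to-one matching between predicted and ground-truth segments. It is an illustration, not TadTR's code: the function names and the example segments are hypothetical, the cost uses temporal IoU only (DETR-style detectors typically also include a classification term), and brute-force search stands in for the Hungarian algorithm.

```python
from itertools import permutations


def temporal_iou(a, b):
    """IoU of two 1D segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_predictions(preds, targets):
    """One-to-one assignment of predictions to ground-truth segments.

    Brute force over permutations, which is fine for a handful of segments.
    Assumes len(preds) >= len(targets).
    Returns a list of (pred_index, target_index) pairs.
    """
    best_cost, best_pairs = float("inf"), []
    # Each permutation picks a distinct prediction for every target.
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(1.0 - temporal_iou(preds[p], targets[t])
                   for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(perm)]
    return best_pairs


# Hypothetical segments, in seconds.
preds = [(0.0, 4.0), (10.0, 15.0), (3.5, 9.0)]
targets = [(9.5, 14.0), (0.5, 4.5)]
print(match_predictions(preds, targets))  # [(1, 0), (0, 1)]
```

Because each ground-truth segment is claimed by exactly one prediction, duplicates are discouraged during training rather than filtered afterwards, which is why a pipeline of this kind can do without NMS. The unmatched prediction (index 2 here) would be treated as background.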

We're still improving TadTR. Stay tuned for future versions.

Updates

[2021.9.15] Update the performance on THUMOS14.

[2021.9.1] Add demo code.

TODOs

  • add model code
  • add inference code
  • add training code
  • support training/inference with video input

Main Results

  • HACS Segments
| Method | Feature | mAP@0.5 | mAP@0.75 | mAP@0.95 | Avg. mAP | Model |
|--------|---------|---------|----------|----------|----------|-------|
| TadTR | I3D RGB | 45.16 | 30.70 | 11.78 | 30.83 | [OneDrive] |
  • THUMOS14
| Method | Feature | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | Avg. mAP | Model |
|--------|---------|---------|---------|---------|---------|---------|----------|-------|
| TadTR | I3D 2stream | 72.92 | 66.86 | 58.59 | 46.31 | 32.32 | 55.40 | [OneDrive] |
| TadTR | TSN 2stream | 64.24 | 58.34 | 50.01 | 40.79 | 29.07 | 48.49 | [OneDrive] |
  • ActivityNet-1.3
| Method | Feature | mAP@0.5 | mAP@0.75 | mAP@0.95 | Avg. mAP | Model |
|--------|---------|---------|----------|----------|----------|-------|
| TadTR+BMN | TSN 2stream | 50.51 | 35.35 | 8.18 | 34.55 | [OneDrive] |
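
A note on reading the Avg. mAP column: for THUMOS14, the reported average matches the arithmetic mean of the five per-threshold values in each row, as the short check below shows (the helper name `mean_ap` is ours, for illustration). For HACS Segments and ActivityNet-1.3, the average does not equal the mean of the three listed values. That is consistent with the common ActivityNet-style protocol of averaging over IoU thresholds from 0.5 to 0.95 in steps of 0.05, but this README does not state the protocol, so treat that as an assumption.

```python
def mean_ap(per_threshold_map):
    """Arithmetic mean of mAP values, rounded to two decimals like the tables."""
    return round(sum(per_threshold_map) / len(per_threshold_map), 2)


# THUMOS14 rows copied from the results: per-threshold mAP, then reported average.
thumos_rows = {
    "I3D 2stream": ([72.92, 66.86, 58.59, 46.31, 32.32], 55.40),
    "TSN 2stream": ([64.24, 58.34, 50.01, 40.79, 29.07], 48.49),
}

for feature, (maps, reported) in thumos_rows.items():
    # Prints the recomputed mean next to the reported value for comparison.
    print(feature, mean_ap(maps), reported)
```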

Install

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

  • PyTorch>=1.5.1, torchvision>=0.6.1 (install by following the official PyTorch instructions)

  • Other requirements

    pip install -r requirements.txt
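
If you want to confirm the Python-side requirements before compiling, a small check along these lines may help. It is a sketch, not part of the repo: `version_tuple`, `installed_version` and `check_environment` are hypothetical helpers, the minimum versions are copied from the list above, and CUDA and GCC are not checked.

```python
import importlib
import re
import sys

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": "1.5.1", "torchvision": "0.6.1"}


def version_tuple(version):
    """Turn '1.5.1+cu92' into (1, 5, 1); local suffixes after '+' are ignored."""
    numbers = re.findall(r"\d+", str(version).split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def installed_version(package):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment():
    """Map each requirement to True/False depending on whether it is satisfied."""
    report = {"python": sys.version_info[:3] >= (3, 7)}
    for package, minimum in MINIMUMS.items():
        found = installed_version(package)
        report[package] = (found is not None
                           and version_tuple(found) >= version_tuple(minimum))
    return report


print(check_environment())
```

A `False` entry for `torch` or `torchvision` means the package is either missing or older than the listed minimum.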

Compiling CUDA extensions

cd model/ops;

# If you have multiple CUDA Toolkit installations, prefix the command below with
# CUDA_HOME=<your_cuda_toolkit_path> to select the correct version.
python setup.py build_ext --inplace

Run a quick test

python demo.py

Data Preparation

To be updated.

Training

Run the following command, replacing DATASET with the name of the dataset:

bash scripts/train.sh DATASET

Testing

bash scripts/test.sh DATASET WEIGHTS

Acknowledgement

The code is based on DETR and Deformable DETR. We also borrow the implementation of RoIAlign1D from G-TAD. Thanks to the authors for their great work.

Citing

@article{liu2021end,
  title={End-to-end Temporal Action Detection with Transformer},
  author={Liu, Xiaolong and Wang, Qimeng and Hu, Yao and Tang, Xu and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2106.10271},
  year={2021}
}

Contact

For questions and suggestions, please contact Xiaolong Liu at "liuxl at hust dot edu dot cn".
