The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Overview

Temporal Query Networks for Fine-grained Video Understanding

📋 This repository contains the implementation of CVPR2021 paper Temporal_Query_Networks for Fine-grained Video Understanding

Abstract

Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this into a query-response mechanism, where each query addresses a particular question, and has its own response label set.

We make the following four contributions: (i) We propose a new model — a Temporal Query Network — which enables the query-response functionality, and a structural undertanding of fine-grained actions. It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query. (ii) We propose a new way — stochastic feature bank update — to train a network on videos of various lengths with the dense sampling required to respond to fine-grained queries. (iii) we compare the TQN to other architectures and text supervision methods, and analyze their pros and cons. Finally, (iv) we evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state-of-the-art using only RGB features.

Getting Started

  1. Clone this repository
git clone https://github.com/Chuhanxx/Temporal_Query_Networks.git
  1. Create conda virtual env and install the requirements
    (This implementation requires CUDA and python > 3.7)
cd Temporal_Query_Networks
source build_venv.sh

Prepare Data and Weight Initialization

Please refer to data.md for data preparation.

Training

you can start training the model with the following steps, taking the Diving48 dataset as an example,:

  1. First stage training: Set the paths in the Diving48_first_stage.yaml config file first, and then run:
cd scripts
python train_1st_stage.py --name $EXP_NAME --dataset diving48 --dataset_config ../configs/Diving48_first_stage.yaml --gpus 0,1 --batch_size 16  
  1. Construct stochastically updated feature banks:
python construct_SUFB.py --dataset diving48 --dataset_config ../configs/Diving48_first_stage.yaml \
--gpus 0  --resume_file  $PATH_TO_BEST_FILE_FROM_1ST_STAGE --out_dir $DIR_FOR_SAVING_FEATURES 
  1. Second stage training: Set the paths in the Diving48_second_stage.yaml config file first, and then run:
python train_2nd_stage.py --name $EXP_NAME  --dataset diving48  \
--dataset_config ../configs/Diving48_second_stage.yaml   \
--batch_size 16 --gpus 0,1

Test

python test.py --name $EXP_NAME  --dataset diving48 --batch_size 1 \
--dataset_config ../configs/Diving48_second_stage.yaml 

Citation

If you use this code etc., please cite the following paper:

@inproceedings{zhangtqn,
  title={Temporal Query Networks for Fine-grained Video Understanding},
  author={Chuhan Zhang and Ankush Gputa and Andrew Zisserman},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

If you have any question, please contact [email protected] .

Official PyTorch implementation for paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"

UPT: Unary–Pairwise Transformers This repository contains the official PyTorch implementation for the paper Frederic Z. Zhang, Dylan Campbell and Step

Frederic Zhang 109 Dec 20, 2022
[CVPR 2021] Unsupervised Degradation Representation Learning for Blind Super-Resolution

DASR Pytorch implementation of "Unsupervised Degradation Representation Learning for Blind Super-Resolution", CVPR 2021 [arXiv] Overview Requirements

Longguang Wang 318 Dec 24, 2022
A Keras implementation of YOLOv3 (Tensorflow backend)

keras-yolo3 Introduction A Keras implementation of YOLOv3 (Tensorflow backend) inspired by allanzelener/YAD2K. Quick Start Download YOLOv3 weights fro

7.1k Jan 03, 2023
The trained model and denoising example for paper : Cardiopulmonary Auscultation Enhancement with a Two-Stage Noise Cancellation Approach

The trained model and denoising example for paper : Cardiopulmonary Auscultation Enhancement with a Two-Stage Noise Cancellation Approach

ycj_project 1 Jan 18, 2022
The 2nd place solution of 2021 google landmark retrieval on kaggle.

Google_Landmark_Retrieval_2021_2nd_Place_Solution The 2nd place solution of 2021 google landmark retrieval on kaggle. Environment We use cuda 11.1/pyt

229 Dec 13, 2022
A set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI.

Overview This is a set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI. Make TFRecords To run t

8 Nov 01, 2022
A PyTorch Implementation of SphereFace.

SphereFace A PyTorch Implementation of SphereFace. The code can be trained on CASIA-Webface and the best accuracy on LFW is 99.22%. SphereFace: Deep H

carwin 685 Dec 09, 2022
PyTorch implementation for paper StARformer: Transformer with State-Action-Reward Representations.

StARformer This repository contains the PyTorch implementation for our paper titled StARformer: Transformer with State-Action-Reward Representations.

Jinghuan Shang 14 Dec 09, 2022
Implementation of FitVid video prediction model in JAX/Flax.

FitVid Video Prediction Model Implementation of FitVid video prediction model in JAX/Flax. If you find this code useful, please cite it in your paper:

Google Research 62 Nov 25, 2022
Implementation of the SUMO (Slim U-Net trained on MODA) model

SUMO - Slim U-Net trained on MODA Implementation of the SUMO (Slim U-Net trained on MODA) model as described in: TODO: add reference to paper once ava

6 Nov 19, 2022
This is the second place solution for : UmojaHack Africa 2022: African Snake Antivenom Binding Challenge

UmojaHack-Africa-2022-African-Snake-Antivenom-Binding-Challenge This is the second place solution for : UmojaHack Africa 2022: African Snake Antivenom

Mami Mokhtar 10 Dec 03, 2022
Vehicle Detection Using Deep Learning and YOLO Algorithm

VehicleDetection Vehicle Detection Using Deep Learning and YOLO Algorithm Dataset take or find vehicle images for create a special dataset for fine-tu

Maryam Boneh 96 Jan 05, 2023
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

yobi byte 29 Oct 09, 2022
The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Temporal Query Networks for Fine-grained Video Understanding 📋 This repository contains the implementation of CVPR2021 paper Temporal_Query_Networks

55 Dec 21, 2022
MohammadReza Sharifi 27 Dec 13, 2022
Kaggle Ultrasound Nerve Segmentation competition [Keras]

Ultrasound nerve segmentation using Keras (1.0.7) Kaggle Ultrasound Nerve Segmentation competition [Keras] #Install (Ubuntu {14,16}, GPU) cuDNN requir

179 Dec 28, 2022
DAT4 - General Assembly's Data Science course in Washington, DC

DAT4 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (12/15/14 - 3/16/15). Instructors: Sinan Ozdemir

Kevin Markham 779 Dec 25, 2022
A 2D Visual Localization Framework based on Essential Matrices [ICRA2020]

A 2D Visual Localization Framework based on Essential Matrices This repository provides implementation of our paper accepted at ICRA: To Learn or Not

Qunjie Zhou 27 Nov 07, 2022
The code for paper "Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation" which is accepted by AAAI 2022

Contrastive Spatio Temporal Pretext Learning for Self-supervised Video Representation (AAAI 2022) The code for paper "Contrastive Spatio-Temporal Pret

8 Jun 30, 2022
TLXZoo - Pre-trained models based on TensorLayerX

Pre-trained models based on TensorLayerX. TensorLayerX is a multi-backend AI fra

TensorLayer Community 13 Dec 07, 2022