TVNet: Temporal Voting Network for Action Localization

Related tags

Deep LearningTVNet
Overview

TVNet: Temporal Voting Network for Action Localization

This repo holds the codes of paper: "TVNet: Temporal Voting Network for Action Localization".

Paper Introduction

Temporal action localization is a vital task in video understranding. In this paper, we propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. This incorporates a novel Voting Evidence Module to locate temporal boundaries, more accurately, where temporal contextual evidence is accumulated to predict frame-level probabilities of start and end action boundaries.

Dependencies

  • Python == 2.7
  • Tensorflow == 1.9.0
  • CUDA==10.1.105
  • GCC >= 5.4

Note that the PEM code from BMN is implemented in Pytorch==1.1.0 or 1.3.0

Data Preparation

Datasets

Our experiments is based on ActivityNet 1.3 and THUMOS14 datasets.

Feature for THUMOS14

You can download the feature on THUMOS14 at here GooogleDrive.

Place it into a folder named thumos_features inside ./data.

You also need to download the feature for PEM (from BMN) at GooogleDrive. Please put it into a folder named Thumos_feature_hdf5 inside ./TVNet-THUMOS14/data/thumos_features.

If everything goes well, you can get the folder architecture of ./TVNet-THUMOS14/data like this:

data                       
└── thumos_features                    
		├── Thumos_feature_dim_400              
		├── Thumos_feature_hdf5               
		├── features_train.npy 
		└── features_test.npy

Feature for ActivityNet 1.3

You can download the feature on ActivityNet 1.3 at here GoogleCloud. Please put csv_mean_100 directory into ./TVNet-ANET/data/activitynet_feature_cuhk/.

If everything goes well, you can get the folder architecture of ./TVNet-ANET/data like this:

data                        
└── activitynet_feature_cuhk                    
		    └── csv_mean_100

Run all steps

Run all steps on THUMOS14

cd TVNet-THUMOS14

Run the following script with all steps on THUMOS14:

bash do_all.sh

Note: If you use BlueCrystal 4, you can directly run the following script without any dependencies setup.

bash do_all_BC4.sh

Run all steps on ActivityNet 1.3

cd TVNet-ANET
bash do_all.sh  or  bash do_all_BC4.sh

Run steps separately

Take TVNet-THUMOS14 as an example:

cd TVNet-THUMOS14

1. Temporal evaluation module

python TEM_train.py
python TEM_test.py

2. Creat training data for voting evidence module

python VEM_create_windows.py --window_length L --window_stride S

L is the window length and S is the sliding stride. We generate training windows for length 10 with stride 5, and length 5 with stride 2.

3. Voting evidence module

python VEM_train.py --voting_type TYPE --window_length L --window_stride S
python VEM_test.py --voting_type TYPE --window_length L --window_stride S

TYPE should be start or end. We train and test models with window length 10 (stride 5) and window length 5 (stride 2) for start and end separately.

4. Proposal evaluation module from BMN

python PEM_train.py

5. Proposal generation

python proposal_generation.py

6. Post processing and detection

python post_postprocess.py

Results

THUMOS14

tIoU [email protected]
0.3 0.5724681814413137
0.4 0.5060844218403346
0.5 0.430414918823808
0.6 0.3297164845828022
0.7 0.202971546242546

ActivityNet 1.3

tIoU [email protected]
Average 0.3460396513933088
0.5 0.5135151163296395
0.75 0.34955648726767025
0.95 0.10121803584836778

Reference

This implementation borrows from:

BSN: BSN-Boundary-Sensitive-Network

TEM_train/test.py -- for the TEM module we used in our paper
load_dataset.py -- borrow the part which load data for TEM

BMN: BMN-Boundary-Matching-Network

PEM_train.py -- for the PEM module we used in our paper

G-TAD: Sub-Graph Localization for Temporal Action Detection

post_postprocess.py -- for the multicore process to generate detection

Our main contribution is in:

VEM_create_windows.py -- generate training annotations for Voting Evidence Module (VEM)

VEM_train.py -- train Voting Evidence Module (VEM)

VEM_test.py -- test Voting Evidence Module (VEM)
Owner
hywang
hywang
Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs

Context-Aware-Healthcare Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs Download

LuChang 9 Dec 26, 2022
PyTorch implementation of 1712.06087 "Zero-Shot" Super-Resolution using Deep Internal Learning

Unofficial PyTorch implementation of "Zero-Shot" Super-Resolution using Deep Internal Learning Unofficial Implementation of 1712.06087 "Zero-Shot" Sup

Jacob Gildenblat 196 Nov 27, 2022
NeRD: Neural Reflectance Decomposition from Image Collections

NeRD: Neural Reflectance Decomposition from Image Collections Project Page | Video | Paper | Dataset Implementation for NeRD. A novel method which dec

Computergraphics (University of Tübingen) 195 Dec 29, 2022
Tensorflow-seq2seq-tutorials - Dynamic seq2seq in TensorFlow, step by step

seq2seq with TensorFlow Collection of unfinished tutorials. May be good for educational purposes. 1 - simple sequence-to-sequence model with dynamic u

Matvey Ezhov 1k Dec 17, 2022
A treasure chest for visual recognition powered by PaddlePaddle

简体中文 | English PaddleClas 简介 飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别任务的工具集,助力使用者训练出更好的视觉模型和应用落地。 近期更新 2021.11.1 发布PP-ShiTu技术报告,新增饮料识别demo 2021.10.23 发

4.6k Dec 31, 2022
CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

CoReNet CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objec

Google Research 80 Dec 25, 2022
Group R-CNN for Point-based Weakly Semi-supervised Object Detection (CVPR2022)

Group R-CNN for Point-based Weakly Semi-supervised Object Detection (CVPR2022) By Shilong Zhang*, Zhuoran Yu*, Liyang Liu*, Xinjiang Wang, Aojun Zhou,

Shilong Zhang 129 Dec 24, 2022
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Thank you for you

Weirui Ye 671 Jan 03, 2023
Boostcamp CV Serving For Python

Boostcamp-CV-Serving Prerequisites MySQL GCP Cloud Storage GCP key file Sentry Streamlit Cloud Secrets: .streamlit/secrets.toml #DO NOT SHARE THIS I

Jungwon Seo 19 Feb 22, 2022
Face Transformer for Recognition

Face-Transformer This is the code of Face Transformer for Recognition (https://arxiv.org/abs/2103.14803v2). Recently there has been great interests of

Zhong Yaoyao 153 Nov 30, 2022
Object detection (YOLO) with pytorch, OpenCV and python

Real Time Object/Face Detection Using YOLO-v3 This project implements a real time object and face detection using YOLO algorithm. You only look once,

1 Aug 04, 2022
codebase for "A Theory of the Inductive Bias and Generalization of Kernel Regression and Wide Neural Networks"

Eigenlearning This repo contains code for replicating the experiments of the paper A Theory of the Inductive Bias and Generalization of Kernel Regress

Jamie Simon 45 Dec 02, 2022
Prososdy Morph: A python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.

ProMo (Prosody Morph) Questions? Comments? Feedback? Chat with us on gitter! A library for manipulating pitch and duration in an algorithmic way, for

Tim 71 Jan 02, 2023
Code for the paper: Adversarial Machine Learning: Bayesian Perspectives

Code for the paper: Adversarial Machine Learning: Bayesian Perspectives This repository contains code for reproducing the experiments in the ** Advers

Roi Naveiro 2 Nov 11, 2022
Seq2seq - Sequence to Sequence Learning with Keras

Seq2seq Sequence to Sequence Learning with Keras Hi! You have just found Seq2Seq. Seq2Seq is a sequence to sequence learning add-on for the python dee

Fariz Rahman 3.1k Dec 18, 2022
This repository contains the code for the paper ``Identifiable VAEs via Sparse Decoding''.

Sparse VAE This repository contains the code for the paper ``Identifiable VAEs via Sparse Decoding''. Data Sources The datasets used in this paper wer

Gemma Moran 17 Dec 12, 2022
Public Code for NIPS submission SimiGrad: Fine-Grained Adaptive Batching for Large ScaleTraining using Gradient Similarity Measurement

Public code for NIPS submission "SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement" This repo co

Heyang Qin 0 Oct 13, 2021
official Pytorch implementation of ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu

77 Dec 27, 2022
gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks. It is built on top of the OpenAI G

Robin Henry 99 Dec 12, 2022
Code for MarioNette: Self-Supervised Sprite Learning, in NeurIPS 2021

MarioNette | Webpage | Paper | Video MarioNette: Self-Supervised Sprite Learning Dmitriy Smirnov, Michaël Gharbi, Matthew Fisher, Vitor Guizilini, Ale

Dima Smirnov 28 Nov 18, 2022