Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Last update: Jan 03, 2023

Overview

Learning-Action-Completeness-from-Points

Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
Pilhyeon Lee (Yonsei Univ.), Hyeran Byun (Yonsei Univ.)

Paper: https://arxiv.org/abs/2108.05029

Abstract: We tackle the problem of localizing temporal intervals of actions with only a single frame label for each action instance for training. Owing to label sparsity, existing work fails to learn action completeness, resulting in fragmentary action predictions. In this paper, we propose a novel framework, where dense pseudo-labels are generated to provide completeness guidance for the model. Concretely, we first select pseudo background points to supplement point-level action labels. Then, by taking the points as seeds, we search for the optimal sequence that is likely to contain complete action instances while agreeing with the seeds. To learn completeness from the obtained sequence, we introduce two novel losses that contrast action instances with background ones in terms of action score and feature similarity, respectively. Experimental results demonstrate that our completeness guidance indeed helps the model to locate complete action instances, leading to large performance gains especially under high IoU thresholds. Moreover, we demonstrate the superiority of our method over existing state-of-the-art methods on four benchmarks: THUMOS'14, GTEA, BEOID, and ActivityNet. Notably, our method even performs comparably to recent fully-supervised methods, at a 6 times cheaper annotation cost.
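
The abstract refers to performance under high IoU thresholds. As a minimal sketch of why fragmentary predictions suffer there, the snippet below computes the temporal IoU between a predicted interval and a ground-truth interval. The function name temporal_iou and the example intervals are illustrative assumptions; this is not the evaluation code used in this repo.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) intervals, e.g. in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth action spanning 0 s to 10 s.
ground_truth = (0.0, 10.0)

# A fragmentary prediction that covers only half of the action.
fragment_iou = temporal_iou((2.0, 7.0), ground_truth)

# A prediction that covers most of the action.
complete_iou = temporal_iou((0.5, 10.0), ground_truth)

print("fragmentary prediction IoU:", fragment_iou)
print("near-complete prediction IoU:", complete_iou)
```

With a threshold such as 0.7, the fragmentary prediction would not count as a correct detection while the near-complete one would, which is the regime where completeness matters most.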

Prerequisites

Recommended Environment

Python 3.6
Pytorch 1.6
Tensorflow 1.15 (for Tensorboard)
CUDA 10.2
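
To compare a local setup against the list above, something like the following sketch may help. It is only an illustration: the RECOMMENDED table and the helper names major_minor and installed_versions are assumptions and not part of this repo, and only major.minor versions are compared. If PyTorch is not installed, its entries are reported as None.

```python
import sys

# Recommended versions from the list above (major.minor only).
RECOMMENDED = {"python": "3.6", "torch": "1.6", "cuda": "10.2"}


def major_minor(version):
    """Reduce a version string such as '1.6.0+cu102' to '1.6'."""
    return ".".join(str(version).split("+")[0].split(".")[:2])


def installed_versions():
    """Collect installed versions; PyTorch entries stay None if it is missing."""
    found = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return found
    found["torch"] = major_minor(torch.__version__)
    # torch.version.cuda is None for CPU-only builds.
    if torch.version.cuda is not None:
        found["cuda"] = major_minor(torch.version.cuda)
    return found


for name, found in installed_versions().items():
    status = "ok" if found == RECOMMENDED[name] else "differs"
    print("{}: found {}, recommended {} ({})".format(
        name, found, RECOMMENDED[name], status))
```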

Dependencies

You can set up the environment by running the command below.

$ pip3 install -r requirements.txt

Data Preparation

Prepare the THUMOS'14 dataset.
- We excluded three test videos (270, 1292, 1496), as previous work did.
Extract features with two-stream I3D networks.
- We recommend extracting features using this repo.
- For convenience, we provide the features we used. You can find them here.
Place the features inside the dataset folder.
- Please ensure the data structure is as below.

└── dataset
    └── THUMOS14
        ├── gt.json
        ├── split_train.txt
        ├── split_test.txt
        ├── fps_dict.json
        ├── point_gaussian
        │   └── point_labels.csv
        └── features
            ├── train
            │   ├── rgb
            │   │   ├── video_validation_0000051.npy
            │   │   ├── video_validation_0000052.npy
            │   │   └── ...
            │   └── flow
            │       ├── video_validation_0000051.npy
            │       ├── video_validation_0000052.npy
            │       └── ...
            └── test
                ├── rgb
                │   ├── video_test_0000004.npy
                │   ├── video_test_0000006.npy
                │   └── ...
                └── flow
                    ├── video_test_0000004.npy
                    ├── video_test_0000006.npy
                    └── ...
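
Before training, it may be useful to confirm that the layout above is in place. The sketch below is a hypothetical helper, not a script shipped with this repo: it assumes the dataset root is dataset/THUMOS14 relative to the working directory and only checks that the files and folders named in the tree exist, not that their contents are valid.

```python
from pathlib import Path

# Entries taken from the tree above, relative to the THUMOS14 folder.
EXPECTED_PATHS = [
    "gt.json",
    "split_train.txt",
    "split_test.txt",
    "fps_dict.json",
    "point_gaussian/point_labels.csv",
    "features/train/rgb",
    "features/train/flow",
    "features/test/rgb",
    "features/test/flow",
]


def find_missing(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_PATHS if not (root / entry).exists()]


# Assumed location of the dataset root; adjust it to your setup.
missing = find_missing("dataset/THUMOS14")
if missing:
    print("Missing entries:")
    for entry in missing:
        print("  " + entry)
else:
    print("Dataset layout looks complete.")
```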

Usage

Running

You can easily train and evaluate the model by running the script below.

If you want to try other training options, please refer to options.py.

$ bash run.sh

Evaluation

The pre-trained model can be found here. You can evaluate the model by running the command below.

$ bash run_eval.sh

References

We note that this repo was built upon our previous models.

Background Suppression Network for Weakly-supervised Temporal Action Localization (AAAI 2020) [paper] [code]
Weakly-supervised Temporal Action Localization by Uncertainty Modeling (AAAI 2021) [paper] [code]

We referenced the repos below for the code.

In addition, we referenced a part of code in the following repo for the greedy algorithm implementation.

NeuralNetwork-Viterbi

Citation

If you find this code useful, please cite our paper.

@inproceedings{lee2021completeness,
  title={Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization},
  author={Pilhyeon Lee and Hyeran Byun},
  booktitle={IEEE/CVF International Conference on Computer Vision},
  year={2021},
}

Contact

If you have any questions or comments, please contact the first author of the paper, Pilhyeon Lee ([email protected]).

