Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)

Last update: Sep 18, 2022

Overview

PyTorch code for VideoLT

[Website][Paper]

Updates

[10/29/2021] Features uploaded to Google Drive. For access, please e-mail us: zhangxing18 at fudan.edu.cn
[09/28/2021] Features uploaded to Aliyun Drive (deprecated). For access, please e-mail us: zhangxing18 at fudan.edu.cn
[08/23/2021] Checkpoint links uploaded. We are dealing with a campus network bandwidth limitation; the dataset will be released this week.
[08/15/2021] Code released. Dataset download links and checkpoint links will be updated within a week.
[07/29/2021] Dataset released, visit https://videolt.github.io/ for downloading.
[07/23/2021] VideoLT is accepted to ICCV 2021.

Overview

VideoLT is a large-scale long-tailed video recognition dataset, intended as a step toward real-world video recognition. This repo provides the VideoLT dataset and long-tailed baselines.

Data Preparation

Please visit https://videolt.github.io/ to obtain download links. We provide raw videos and extracted features.

To use the extracted features, modify dataset/dutils.py and set the correct path to the features.

Model Zoo

The baseline scripts and checkpoints are provided in MODELZOO.md.

FrameStack

FrameStack is a simple yet effective approach for long-tailed video recognition. It re-samples training data at the frame level and adopts a dynamic sampling strategy based on knowledge learned by the network. The rationale behind FrameStack is to dynamically sample more frames from videos in tail classes and fewer frames from videos in head classes.
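As a rough illustration of the sampling idea described above, the sketch below maps a running per-class score to a frame budget, so that classes the network currently handles poorly (typically tail classes) receive more frames. The function names, the linear mapping, and the 15 to 60 frame range are assumptions made for illustration; they are not taken from this repository, and the actual FrameStack implementation may differ.

```python
import random


def frames_to_sample(class_score, min_frames=15, max_frames=60):
    """Map a running per-class score in [0, 1] to a frame budget.

    Hypothetical linear rule: a low score gives close to max_frames,
    a high score gives close to min_frames.
    """
    score = min(max(class_score, 0.0), 1.0)  # clamp to [0, 1]
    return round(max_frames - score * (max_frames - min_frames))


def sample_frame_indices(num_available, class_score, rng=random):
    """Pick a sorted random subset of frame indices for one video."""
    budget = min(frames_to_sample(class_score), num_available)
    return sorted(rng.sample(range(num_available), budget))


if __name__ == "__main__":
    # A class with a low running score gets more frames than one with a high score.
    print(len(sample_frame_indices(150, class_score=0.1)))
    print(len(sample_frame_indices(150, class_score=0.9)))
```

The budget is capped by the number of frames actually available, so short videos are never over-sampled in this sketch.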

Usage

Requirement

pip install -r requirements.txt

Prepare Data Path

Modify FEATURE_NAME, PATH_TO_FEATURE and FEATURE_DIM in dataset/dutils.py.
Set ROOT in dataset/dutils.py to the labels folder. The directory structure is:

    labels
    |-- count-labels-train.lst
    |-- test.lst
    |-- test_videofolder.txt
    |-- train.lst
    |-- train_videofolder.txt
    |-- val_videofolder.txt
    `-- validate.lst
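Before training, it can help to confirm that the folder ROOT points to actually contains the files listed above. The following is a minimal sketch of such a check; the helper name `missing_label_files` and the placeholder path "labels" are hypothetical and not part of this repository.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_LABEL_FILES = [
    "count-labels-train.lst",
    "test.lst",
    "test_videofolder.txt",
    "train.lst",
    "train_videofolder.txt",
    "val_videofolder.txt",
    "validate.lst",
]


def missing_label_files(root):
    """Return the expected label files that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_LABEL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "labels" is a placeholder; use the value you assigned to ROOT.
    missing = missing_label_files("labels")
    print("missing files:", missing if missing else "none")
```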

Train

We provide scripts for training. Please refer to MODELZOO.md.

Example training script:

FEATURE_NAME='ResNet101'

export CUDA_VISIBLE_DEVICES='2'
python base_main.py  \
     --augment "mixup" \
     --feature_name $FEATURE_NAME \
     --lr 0.0001 \
     --gd 20 --lr_steps 30 60 --epochs 100 \
     --batch-size 128 -j 16 \
     --eval-freq 5 \
     --print-freq 20 \
     --root_log=$FEATURE_NAME-log \
     --root_model=$FEATURE_NAME'-checkpoints' \
     --store_name=$FEATURE_NAME'_bs128_lr0.0001_lateavg_mixup' \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss

Note: Setting args.resample, args.augment and args.loss_func lets you combine multiple long-tailed strategies.

Options:

    args.resample: ['None', 'CBS','SRS']
    args.augment : ['None', 'mixup', 'FrameStack']
    args.loss_func: ['BCELoss', 'LDAM', 'EQL', 'CBLoss', 'FocalLoss']
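To make the combinations above concrete, here is a minimal argparse sketch that accepts only the listed values. The flag names `--augment` and `--loss_func` follow the training script above; the `--resample` flag name and all default values are assumptions, and the repository's real argument parser may be defined differently.

```python
import argparse

# Allowed values copied from the options list above.
RESAMPLE_CHOICES = ["None", "CBS", "SRS"]
AUGMENT_CHOICES = ["None", "mixup", "FrameStack"]
LOSS_CHOICES = ["BCELoss", "LDAM", "EQL", "CBLoss", "FocalLoss"]


def build_parser():
    """Hypothetical parser covering only the three long-tailed options."""
    parser = argparse.ArgumentParser(description="Long-tailed option sketch")
    # Defaults below are assumptions chosen for this sketch.
    parser.add_argument("--resample", default="None", choices=RESAMPLE_CHOICES)
    parser.add_argument("--augment", default="None", choices=AUGMENT_CHOICES)
    parser.add_argument("--loss_func", default="BCELoss", choices=LOSS_CHOICES)
    return parser


if __name__ == "__main__":
    # Combine a re-sampling strategy, an augmentation, and a loss in one run.
    args = build_parser().parse_args(
        ["--resample", "CBS", "--augment", "mixup", "--loss_func", "LDAM"]
    )
    print(args.resample, args.augment, args.loss_func)
```

Restricting each flag with `choices` means a misspelled strategy name fails at startup rather than partway through training.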

Test

We provide scripts for testing in the scripts folder. Set CKPT to the path of a saved checkpoint.

Example testing script:

FEATURE_NAME='ResNet101'
CKPT='VideoLT_checkpoints/ResNet-101/ResNet101_bs128_lr0.0001_lateavg_mixup/ckpt.best.pth.tar'

export CUDA_VISIBLE_DEVICES='1'
python base_test.py \
     --resume $CKPT \
     --feature_name $FEATURE_NAME \
     --batch-size 128 -j 16 \
     --print-freq 20 \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss

Citing

If you find VideoLT helpful for your research, please consider citing:

@misc{zhang2021videolt,
title={VideoLT: Large-scale Long-tailed Video Recognition}, 
author={Xing Zhang and Zuxuan Wu and Zejia Weng and Huazhu Fu and Jingjing Chen and Yu-Gang Jiang and Larry Davis},
year={2021},
eprint={2105.02668},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

Owner

Skye

