[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

Last update: Oct 16, 2022

Overview

DSM

The source code for the paper Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion.

Project Website

The dataset lists, some visualizations, and the pretrained weights are still being prepared.

1. Introduction (scene-dominated to motion-dominated)

Video datasets are usually scene-dominated. We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays more attention to motion information.

The generated triplet is as below:
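A minimal sketch of one way such a triplet could be built from a single clip, assuming a NumPy array of shape (T, H, W, C). The two operations here are illustrative stand-ins, not the ones implemented in src/augment: the positive shuffles spatial patches with one permutation shared by all frames (scene layout disturbed, per-patch motion kept), and the negative repeats a single frame T times (scene kept, motion removed). The function names and the patch size are hypothetical.

```python
import numpy as np


def make_negative(clip, rng):
    """Motion-broken, scene-kept: repeat one randomly chosen frame T times."""
    t = int(rng.integers(clip.shape[0]))
    return np.repeat(clip[t:t + 1], clip.shape[0], axis=0)


def make_positive(clip, rng, patch=4):
    """Scene-disturbed, motion-kept: shuffle spatial patches with ONE permutation
    shared by all frames (illustrative stand-in, not the repo's operation)."""
    T, H, W, C = clip.shape
    gh, gw = H // patch, W // patch
    # Split each frame into a gh x gw grid of patch x patch tiles.
    tiles = clip[:, :gh * patch, :gw * patch].reshape(T, gh, patch, gw, patch, C)
    tiles = tiles.transpose(0, 1, 3, 2, 4, 5).reshape(T, gh * gw, patch, patch, C)
    # The same tile permutation for every frame keeps per-tile temporal changes.
    tiles = tiles[:, rng.permutation(gh * gw)]
    # Reassemble the shuffled tiles into full frames.
    out = tiles.reshape(T, gh, gw, patch, patch, C).transpose(0, 1, 3, 2, 4, 5)
    return out.reshape(T, gh * patch, gw * patch, C)


def make_triplet(clip, seed=0, patch=4):
    """Return (anchor, positive, negative) for a (T, H, W, C) clip."""
    rng = np.random.default_rng(seed)
    return clip, make_positive(clip, rng, patch), make_negative(clip, rng)
```

When H and W are multiples of the patch size, the positive is a fixed pixel permutation of the anchor, so its total frame-to-frame difference equals the anchor's, while the negative has none.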

What does DSM learn?

With DSM pretraining, the model learns to focus on motion regions (not necessarily the actor) without using a single label.

2. Installation

Dataset

Please refer to dataset.md for details.

Requirements

Python 3
PyTorch 1.1+
PIL
Intel (on-the-fly decoding)

3. Structure

datasets
- list
  - hmdb51: the train/val lists of HMDB51
  - ucf101: the train/val lists of UCF101
  - kinetics-400: the train/val lists of kinetics-400
  - diving48: the train/val lists of diving48
experiments
- logs: detailed experiment records
- gradientes: gradient checks
- visualization
src
- data: load data
- loss: the losses evaluated in this paper
- model: network architectures
- scripts: training/evaluation scripts
- augment: detailed implementation of the spatio-temporal augmentation
- utils
- feature_extract.py: feature extractor for a pretrained model
- main.py: the main entry point for finetuning
- trainer.py
- option.py
- pt.py: self-supervised pretrain
- ft.py: supervised finetune

DSM(Triplet)/DSM/Random

Self-supervised Pretrain

Kinetics

bash scripts/kinetics/pt.sh

UCF101

bash scripts/ucf101/pt.sh

Supervised Finetune (Clip-level)

HMDB51

bash scripts/hmdb51/ft.sh

UCF101

bash scripts/ucf101/ft.sh

Kinetics

bash scripts/kinetics/ft.sh

Video-level Evaluation

Following the common practice of TSN and Non-local, the final video-level result is obtained by averaging predictions over 10 temporally sampled windows with corner crops, which yields better results than clip-level evaluation. Refer to test.py for details.
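A minimal sketch of this video-level protocol, assuming a (T, H, W, C) clip layout, evenly spaced window starts, and four-corner-plus-center crops. The crop geometry and the choice to average softmax probabilities (rather than raw logits) are assumptions, and test.py remains the reference. All helper names are hypothetical.

```python
import numpy as np


def window_starts(num_frames, clip_len, num_windows=10):
    """Evenly spaced start indices for num_windows clips of clip_len frames."""
    last = max(num_frames - clip_len, 0)
    return np.linspace(0, last, num_windows).round().astype(int)


def corner_crops(clip, size):
    """Four corner crops plus a center crop of a (T, H, W, C) clip."""
    _, H, W, _ = clip.shape
    offsets = [(0, 0), (0, W - size), (H - size, 0), (H - size, W - size),
               ((H - size) // 2, (W - size) // 2)]
    return [clip[:, y:y + size, x:x + size] for y, x in offsets]


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def video_prediction(view_logits):
    """view_logits: (num_windows * num_crops, num_classes) for ONE video.
    Averages per-view class probabilities and returns the top class."""
    return int(softmax(view_logits).mean(axis=0).argmax())
```

With 10 windows and 5 crops, each video contributes 50 views, and only the averaged prediction is compared against the label.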

Pretrain and Evaluate in One Step

bash scripts/hmdb51/pt_and_ft_hmdb51.sh

Note: More training options and ablation studies can be found in scripts.

Video Retrieval and Other Visualizations

(1). Feature Extractor

As STCR can easily be extended to other video representation tasks, we provide scripts for feature extraction.

python feature_extractor.py

The features are saved as a single NumPy file of shape [video_nums, features_dim] for further visualization.
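A minimal sketch of producing and reading back such a file with NumPy. The extract_fn callable stands in for the pretrained model and is hypothetical, as is the function name; feature_extractor.py may differ in how it batches videos or pools features.

```python
import os
import tempfile

import numpy as np


def save_features(videos, extract_fn, path):
    """Stack one feature vector per video into a [video_nums, features_dim]
    float32 array and save it as a single .npy file.

    extract_fn is a hypothetical callable mapping one video to a feature vector."""
    feats = np.stack([np.asarray(extract_fn(v), dtype=np.float32).ravel()
                      for v in videos])
    np.save(path, feats)
    return feats.shape


if __name__ == "__main__":
    # Toy usage: 4 fake "videos" and an extractor that averages over time.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "features.npy")
        videos = [np.random.rand(16, 512) for _ in range(4)]
        print(save_features(videos, lambda v: v.mean(axis=0), path))  # (4, 512)
        print(np.load(path).shape)  # (4, 512)
```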

(2). Retrieval Evaluation

Modify lines 60-62 in reterival.py.

python reterival.py
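A minimal sketch of the top-k retrieval metric reported in the Results tables (@1 to @50), assuming the protocol common in self-supervised video retrieval: each test-split feature queries the train-split features by cosine similarity, and a query counts as correct at k if any of its k nearest neighbours shares its class. The function name is hypothetical, and reterival.py may differ in details such as clip-level versus video-level features.

```python
import numpy as np


def topk_retrieval_accuracy(query_feats, query_labels, gallery_feats,
                            gallery_labels, ks=(1, 5, 10, 20, 50)):
    """Return {k: percentage of queries with a same-class item in their top k}."""
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                                   # (num_query, num_gallery)
    order = np.argsort(-sim, axis=1)                # most similar first
    ranked_labels = np.asarray(gallery_labels)[order]
    hits = ranked_labels == np.asarray(query_labels)[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean() * 100) for k in ks}
```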

Results

Action Recognition

UCF101 Pretrained (I3D)

Method	UCF101	HMDB51
Random Initialization	47.9	29.6
MoCo Baseline	62.3	36.5
DSM(Triplet)	70.7	48.5
DSM	74.8	52.5
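The DSM(Triplet) row suggests a variant trained with a triplet objective on (anchor, positive, negative) clip embeddings. For reference, a minimal sketch of the standard triplet margin loss, assuming L2-normalized embeddings and Euclidean distance; the margin value is a hypothetical default, and this is not claimed to be the loss implemented in src/loss.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Batch mean of max(0, d(a, p) - d(a, n) + margin) on normalized embeddings."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_ap = np.linalg.norm(a - p, axis=-1)  # distance to the motion-kept positive
    d_an = np.linalg.norm(a - n, axis=-1)  # distance to the motion-broken negative
    return float(np.maximum(0.0, d_ap - d_an + margin).mean())
```

The loss reaches zero once every positive is closer to its anchor than the corresponding negative by at least the margin.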

Kinetics Pretrained

Video Retrieval (UCF101-C3D)

Method	@1	@5	@10	@20	@50
DSM	16.8	33.4	43.4	54.6	70.7

Video Retrieval (HMDB51-C3D)

Method	@1	@5	@10	@20	@50
DSM	8.2	25.9	38.1	52.0	75.0

More Visualization

Acknowledgement

This work is partly based on STN, UEL and MoCo.

License

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{wang2020enhancing,
  author    = {Wang, Jinpeng and Gao, Yuting and Li, Ke and Hu, Jianguo and Jiang, Xinyang and Guo, Xiaowei and Ji, Rongrong and Sun, Xing},
  title     = {Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion},
  booktitle = {AAAI},
  year      = {2021},
}

Owner

Jinpeng Wang
