Video Contrastive Learning with Global Context (VCLR)

This is the official PyTorch implementation of our VCLR paper.

Install dependencies

  • environments
    conda create --name vclr python=3.7
    conda activate vclr
    conda install numpy scipy scikit-learn matplotlib scikit-image
    pip install torch==1.7.1 torchvision==0.8.2
    pip install opencv-python tqdm termcolor gcc7 ffmpeg tensorflow==1.15.2
    pip install mmcv-full==1.2.7

Prepare datasets

Please refer to PREPARE_DATA to prepare the datasets.

Prepare pretrained MoCo weights

Following SeCo, we initialize the model with the pretrained MoCo v2 weights.

cd ~
git clone https://github.com/amazon-research/video-contrastive-learning.git
cd video-contrastive-learning
mkdir pretrain && cd pretrain
wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_200ep/moco_v2_200ep_pretrain.pth.tar
cd ..

Self-supervised pretraining

bash shell/main_train.sh

Checkpoints are saved to the ./results directory.

Downstream tasks

Linear evaluation

To evaluate the effectiveness of the self-supervised pretraining, we run a linear evaluation (probing) on the Kinetics400 dataset. We first extract features with the pretrained weights and then train an SVM classifier on them to see how well the learned features perform.

bash shell/eval_svm.sh
  • Results

    Arch     | Pretrained dataset | Epoch | Pretrained model | Acc. on K400 (%)
    ResNet50 | Kinetics400        | 400   | Download link    | 64.1
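
The linear evaluation described above (freeze the backbone, extract features, fit an SVM) can be illustrated with a minimal, dependency-free sketch. It is not the code behind shell/eval_svm.sh: the 2-D toy features stand in for backbone features, the task is binary rather than 400-way, and every function name here is hypothetical. In practice a library implementation such as scikit-learn's LinearSVC would likely be used instead.

```python
import random


def make_toy_features(n, seed):
    """Toy stand-in for extracted video features: two well-separated 2-D clusters."""
    rng = random.Random(seed)
    feats, labels = [], []
    for _ in range(n):
        y = rng.choice([-1, 1])
        feats.append([2.0 * y + rng.gauss(0.0, 0.5), 2.0 * y + rng.gauss(0.0, 0.5)])
        labels.append(y)
    return feats, labels


def train_linear_svm(features, labels, epochs=20, lr=0.1, reg=0.01, seed=0):
    """Fit a binary linear SVM (labels in {-1, +1}) by SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The L2 penalty shrinks the weights at every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: push the boundary away from this sample.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


def accuracy(w, b, features, labels):
    correct = sum(predict(w, b, x) == y for x, y in zip(features, labels))
    return correct / len(labels)


# The features are fixed; only the linear classifier is trained.
train_x, train_y = make_toy_features(200, seed=0)
test_x, test_y = make_toy_features(100, seed=1)
w, b = train_linear_svm(train_x, train_y)
print("held-out accuracy:", accuracy(w, b, test_x, test_y))
```

Because the backbone is never updated, the held-out accuracy of the classifier reflects only how linearly separable the pretrained features are, which is the point of the probe.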

Video retrieval

bash shell/eval_retrieval.sh
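
The protocol used by shell/eval_retrieval.sh is not described here. A common setup for video retrieval, assumed in this sketch, is nearest-neighbor search: each query clip's feature is compared with every gallery feature by cosine similarity, and a query counts as a hit when one of its k nearest gallery clips shares its class. The function names and toy features below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k):
    """Fraction of queries whose k nearest gallery items include the query's class."""
    hits = 0
    for q, q_label in zip(query_feats, query_labels):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine_similarity(q, gallery_feats[i]),
            reverse=True,
        )
        if any(gallery_labels[i] == q_label for i in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Toy 2-D features standing in for clip embeddings.
gallery_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
gallery_labels = ["run", "run", "jump"]
query_feats = [[0.8, 0.2], [0.2, 0.9]]
query_labels = ["run", "run"]

for k in (1, 2):
    print(k, recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k))
```

In this toy example the second query lies closest to the "jump" clip, so it is a miss at k=1 and a hit at k=2.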

Action recognition & action localization

Here, we use mmaction2 for both tasks. If you are not familiar with mmaction2, you can read the official documentation.

Installation

  • Step 1: Install mmaction2

    To make sure the results can be reproduced, please use our forked version of mmaction2 (version: 0.11.0):

    conda activate vclr
    cd ~
    git clone https://github.com/KuangHaofei/mmaction2
    
    cd mmaction2
    pip install -v -e .
  • Step 2: Prepare the pretrained weights

    Our pretrained backbone uses a different format from the mmaction2 backbone, so it must be converted to the mmaction2 format. We provide converted versions of our K400 pretrained weights for both TSN and TSM. We also provide the conversion script, which you can find here.

    Download the pretrained weights into the checkpoints directory:

    cd ~/mmaction2
    mkdir -p checkpoints
    wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm.pth
    wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm_tsm.pth
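
For readers converting their own checkpoints, the idea behind the format change in Step 2 can be sketched as a key-renaming pass over the state dict. This is not the provided conversion script: the source prefix "module.encoder." is a hypothetical placeholder for whatever prefix the pretraining checkpoint uses, and "backbone." is assumed as the target because mmaction2 recognizers keep the backbone under a `backbone` attribute. Strings stand in for tensors so the example runs without PyTorch.

```python
def convert_backbone_keys(state_dict, src_prefix="module.encoder.", dst_prefix="backbone."):
    """Return a new dict in which `src_prefix` is replaced by `dst_prefix`.

    Keys that do not start with `src_prefix` (for example a projection head)
    are dropped, since only the backbone is reused for fine-tuning.
    """
    converted = {}
    for key, value in state_dict.items():
        if key.startswith(src_prefix):
            converted[dst_prefix + key[len(src_prefix):]] = value
    return converted


# Toy stand-in for a checkpoint; the key names are illustrative only.
toy_checkpoint = {
    "module.encoder.conv1.weight": "tensor-a",
    "module.encoder.layer1.0.conv1.weight": "tensor-b",
    "module.head.fc.weight": "tensor-c",  # not part of the backbone, so dropped
}

for name in sorted(convert_backbone_keys(toy_checkpoint)):
    print(name)
```

With a real checkpoint, the dict would typically come from `torch.load(path, map_location="cpu")` and the converted result would be written back with `torch.save`. Inspect the actual key names first, because the prefixes above are assumptions.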

Action recognition

Make sure you have prepared the datasets and the environment as described in the previous steps. From the root directory of mmaction2, follow the steps below to fine-tune the TSN or TSM models for action recognition.

For each dataset, the training and test settings can be found in its configuration file.

  • UCF101

    • config file: tsn_ucf101.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_ucf101.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_ucf101.py \
        work_dirs/vclr/ucf101/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • HMDB51

    • config file: tsn_hmdb51.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_hmdb51.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_hmdb51.py \
        work_dirs/vclr/hmdb51/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • SomethingSomethingV2: TSN

    • config file: tsn_sthv2.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_sthv2.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_sthv2.py \
        work_dirs/vclr/tsn_sthv2/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • SomethingSomethingV2: TSM

    • config file: tsm_sthv2.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsm/vclr/tsm_sthv2.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsm/vclr/tsm_sthv2.py \
        work_dirs/vclr/tsm_sthv2/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • ActivityNet

    • config file: tsn_activitynet.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_activitynet.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_activitynet.py \
        work_dirs/vclr/tsn_activitynet/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • Results

    Arch | Dataset              | Finetuned model | Acc. (%)
    TSN  | UCF101               | Download link   | 85.6
    TSN  | HMDB51               | Download link   | 54.1
    TSN  | SomethingSomethingV2 | Download link   | 33.3
    TSM  | SomethingSomethingV2 | Download link   | 52.0
    TSN  | ActivityNet          | Download link   | 71.9

Action localization

  • Step 1: Fine-tune TSN on ActivityNet as described in the previous section. The commands below assume the fine-tuned model is saved at work_dirs/vclr/tsn_activitynet/latest.pth.

  • Step 2: Extract ActivityNet features

    cd ~/mmaction2/tools/data/activitynet/
    
    python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
      --data-list /home/ubuntu/data/ActivityNet/anet_train_video.txt \
      --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
      --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
    
    python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
      --data-list /home/ubuntu/data/ActivityNet/anet_val_video.txt \
      --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
      --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
    
    python activitynet_feature_postprocessing.py \
      --rgb /home/ubuntu/data/ActivityNet/rgb_feat \
      --dest /home/ubuntu/data/ActivityNet/mmaction_feat

    Note that the root directory of ActivityNet is /home/ubuntu/data/ActivityNet/ in our case; replace it with the path to your own copy.

  • Step 3: Train and test the BMN model

    • train
      cd ~/mmaction2
      ./tools/dist_train.sh configs/localization/bmn/bmn_acitivitynet_feature_vclr.py 2 \
        --work-dir work_dirs/vclr/bmn_activitynet --validate --seed 0 --deterministic --bmn
    • test
      python tools/test.py configs/localization/bmn/bmn_acitivitynet_feature_vclr.py \
        work_dirs/vclr/bmn_activitynet/latest.pth \
        --bmn --eval AR@AN --out result.json
  • Results

    Arch | Dataset     | Finetuned model | AUC  | AR@100
    BMN  | ActivityNet | Download link   | 65.5 | 73.8

Feature visualization

Our feature visualization code is available here.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
