This is the official implementation of the paper "ActionCLIP: A New Paradigm for Video Action Recognition"

Last update: Jan 09, 2023

Overview

This is an official PyTorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv]

Content

Prerequisites
Data Preparation
Updates
Pretrained Models
- Kinetics-400
- HMDB51 && UCF101
Testing
Training
Contributors
Citing ActionCLIP
Acknowledgments

Prerequisites

The code is built with the following libraries:

PyTorch >= 1.8
wandb
RandAugment
pprint
tqdm
dotmap
yaml
csv

For video data pre-processing, you may need ffmpeg.

For more detailed information about the libraries, see INSTALL.md.
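As a quick sanity check before running anything, the sketch below reports which of the listed libraries cannot be found in the current environment. The import names are assumptions (for example, PyTorch is assumed to import as `torch` and the YAML library as `yaml`), and RandAugment is left out because its import name depends on how it was installed; INSTALL.md remains the reference.

```python
import importlib.util

# Import names are assumptions, not taken from INSTALL.md.
# RandAugment is omitted: its import name depends on the package installed.
REQUIRED_MODULES = ["torch", "wandb", "tqdm", "dotmap", "yaml", "pprint", "csv"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

`pprint` and `csv` ship with Python, so they should never appear in the missing list.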

Data Preparation

Videos must first be extracted into frames for fast reading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on Kinetics, UCF101, HMDB51 and Charades.
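The frame extraction step can be sketched as a thin wrapper around ffmpeg. This is only an illustration, not the TSN pipeline itself: the helper names, the `img_%05d.jpg` file pattern, the JPEG quality setting and the one-folder-per-video layout are all assumptions, so check them against the TSN guide and your dataset config before use.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, pattern="img_%05d.jpg"):
    """Build an ffmpeg command that writes every frame of a video as a JPEG.

    The file pattern is an assumed naming scheme, not one confirmed by the repo.
    """
    return [
        "ffmpeg",
        "-i", str(video_path),      # input video
        "-q:v", "2",                # high JPEG quality (assumed setting)
        str(Path(out_dir) / pattern),
    ]


def extract_frames(video_path, out_root):
    """Extract frames into out_root/<video file name without extension>/."""
    out_dir = Path(out_root) / Path(video_path).stem
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir), check=True)
    return out_dir
```

Calling `extract_frames("videos/clip.mp4", "frames")` would write the frames under `frames/clip/`, provided ffmpeg is on the `PATH`.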

Updates

We now support single-crop validation (including zero-shot) on Kinetics-400, UCF101 and HMDB51. See MODEL_ZOO.md for the pretrained models.
We now support model training on Kinetics-400, UCF101 and HMDB51 with 8, 16 and 32 frames. See configs/README.md for the training configs.
We now support model training on your own datasets. See configs/README.md for details.
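The 8, 16 and 32 frame settings refer to how many frames are fed to the model per video. One common way to choose them is segment-based sampling in the style of TSN: split the video into equal segments and take one frame from each. The sketch below takes the center frame of each segment; it is an assumption for illustration, and the sampler actually used is defined in the repository's dataset code.

```python
def sample_frame_indices(num_frames, num_segments):
    """Return one zero-based frame index per segment (the segment center).

    Assumed TSN-style sampling, not necessarily what this repo does.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    segment_length = num_frames / num_segments
    return [
        min(num_frames - 1, int(segment_length * i + segment_length / 2))
        for i in range(num_segments)
    ]
```

When a video has fewer frames than segments, some indices simply repeat, so the output always has `num_segments` entries.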

Pretrained Models

Training video models is computationally expensive, so we provide some pretrained models here. A larger set of trained models is listed in the ActionCLIP MODEL_ZOO.md.

Kinetics-400

We evaluate ActionCLIP with different backbones and input-frame configurations on K400 (we choose Transf as our final visual prompt since it obtains the best results). Here is a list of the pre-trained models that we provide (see Table 6 of the paper).

model	n-frame	top1 Acc(single-crop)	top5 Acc(single-crop)	checkpoint
ViT-B/32	8	78.36%	94.25%	link pwd:8hg2
ViT-B/16	8	81.09%	95.49%	link
ViT-B/16	16	81.68%	95.87%	link
ViT-B/16	32	82.32%	96.20%	link pwd:v7nn
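For readers unfamiliar with the metrics in the table, top-1 accuracy counts a video as correct when the highest-scoring class is the true label, and top-5 when the true label is among the five highest-scoring classes. A minimal sketch of that definition follows; the function name and the plain-list inputs are illustrative and not part of the repository.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: one list of per-class scores per sample.
    labels: the true class index of each sample.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```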

HMDB51 && UCF101

On the HMDB51 and UCF101 datasets, the accuracy (K400 pretrained) is reported under the accurate setting.

HMDB51

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	76.2%	link

UCF101

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	97.1%	link

Testing

To test the downloaded pretrained models on Kinetics, HMDB51 or UCF101, you can run scripts/run_test.sh. For example:

# test
bash scripts/run_test.sh  ./configs/k400/k400_ft_tem.yaml

Zero-shot

We provide several examples of zero-shot validation on Kinetics-400, UCF101 and HMDB51.

To do zero-shot validation on Kinetics from CLIP pretrained models, you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/k400/k400_ft_zero_shot.yaml

To do zero-shot validation on UCF101 and HMDB51 from Kinetics pretrained models, you first need to prepare the K400 pretrained model, and then you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/hmdb51/hmdb_ft_zero_shot.yaml
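Conceptually, zero-shot validation in a CLIP-style model scores a video against the text embedding of every class name and picks the best match, so no classifier has to be trained for the new label set. The sketch below illustrates this on plain Python lists. It is a simplification under stated assumptions: frame features are fused by mean pooling rather than the Transf visual prompt the repo uses, and the function names and toy features are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def mean_pool(frame_features):
    """Average per-frame features into one video feature (assumed fusion step)."""
    count = len(frame_features)
    return [sum(column) / count for column in zip(*frame_features)]


def zero_shot_predict(frame_features, text_features, class_names):
    """Return the class whose text feature is most similar to the video feature."""
    video = l2_normalize(mean_pool(frame_features))
    similarities = [
        sum(a * b for a, b in zip(video, l2_normalize(text)))
        for text in text_features
    ]
    best = max(range(len(similarities)), key=similarities.__getitem__)
    return class_names[best], similarities
```

In practice the frame and text features would come from the pretrained visual and text encoders; here they are just lists of floats.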

Training

We provide several examples of training ActionCLIP with this repo:

To train on Kinetics from CLIP pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/k400/k400_ft_tem_test.yaml

To train on HMDB51 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/hmdb51/hmdb_ft.yaml

To train on UCF101 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/ucf101/ucf_ft.yaml

More training details can be found in configs/README.md.

Contributors

ActionCLIP is written and maintained by Mengmeng Wang and Jiazheng Xing.

Citing ActionCLIP

If you find ActionCLIP useful in your research, please use the following BibTeX entry for citation.

@inproceedings{wang2022ActionCLIP,
  title={ActionCLIP: A New Paradigm for Video Action Recognition},
  author={Mengmeng Wang and Jiazheng Xing and Yong Liu},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Acknowledgments

Our code is based on CLIP and STM.
