This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

Last update: Jan 09, 2023

Related tags

Deep Learning ActionCLIP

Overview

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv]

Overview

Content

Prerequisites
Data Preparation
Uodates
Pretrained Models
- Kinetics-400
- Hmdb51 && UCF101
Testing
Training
Contributors
Citing_ActionClip
Acknowledgments

Prerequisites

The code is built with following libraries:

PyTorch >= 1.8
wandb
RandAugment
pprint
tqdm
dotmap
yaml
csv

For video data pre-processing, you may need ffmpeg.

More detail information about libraries see INSTALL.md.

Data Preparation

We need to first extract videos into frames for fast reading. Please refer to TSN repo for the detailed guide of data pre-processing. We have successfully trained on Kinetics, UCF101, HMDB51, Charades.

Updates

We now support single crop validation(including zero-shot) on Kinetics-400, UCF101 and HMDB51. The pretrained models see MODEL_ZOO.md for more information.
we now support the model-training on Kinetics-400, UCF101 and HMDB51 on 8, 16 and 32 frames. The model-training configs see configs/README.md for more information.
We now support the model-training on your own datasets. The detail information see configs/README.md.

Pretrained Models

Training video models is computationally expensive. Here we provide some of the pretrained models. We provide a large set of trained models in the ActionCLIP MODEL_ZOO.md.

Kinetics-400

We experiment ActionCLIP with different backbones(we choose Transf as our final visual prompt since it obtains the best results) and input frames configurations on k400. Here is a list of pre-trained models that we provide (see Table 6 of the paper).

model	n-frame	top1 Acc(single-crop)	top5 Acc(single-crop)	checkpoint
ViT-B/32	8	78.36%	94.25%	link pwd:8hg2
ViT-B/16	8	81.09%	95.49%	link
ViT-B/16	16	81.68%	95.87%	link
ViT-B/16	32	82.32%	96.20%	link pwd:v7nn

HMDB51 && UCF101

On HMDB51 and UCF101 datasets, the accuracy(k400 pretrained) is reported under the accurate setting.

HMDB51

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	76.2%	link

UCF101

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	97.1%	link

Testing

To test the downloaded pretrained models on Kinetics or HMDB51 or UCF101, you can run scripts/run_test.sh. For example:

# test
bash scripts/run_test.sh  ./configs/k400/k400_ft_tem.yaml

Zero-shot

We provide several examples to do zero-shot validation on kinetics-400, UCF101 and HMDB51.

To do zero-shot validation on Kinetics from CLIP pretrained models, you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/k400/k400_ft_zero_shot.yaml

To do zero-shot validation on UCF101 and HMDB51 from Kinetics pretrained models, you need first prepare the k400 pretrained model and then you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/hmdb51/hmdb_ft_zero_shot.yaml

Training

We provided several examples to train ActionCLIP with this repo:

To train on Kinetics from CLIP pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/k400/k400_ft_tem_test.yaml

To train on HMDB51 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/hmdb51/hmdb_ft.yaml

To train on UCF101 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/ucf101/ucf_ft.yaml

More training details, you can find in configs/README.md

Contributors

ActionCLIP is written and maintained by Mengmeng Wang and Jiazheng Xing.

Citing ActionCLIP

If you find ActionClip useful in your research, please use the following BibTex entry for citation.

@inproceedings{wang2022ActionCLIP,
  title={ActionCLIP: A New Paradigm for Video Action Recognition},
  author={Mengmeng Wang, Jiazheng Xing and Yong Liu},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Acknowledgments

Our code is based on CLIP and STM.

This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

Related tags

Overview

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv]

Overview

Content

Prerequisites

Data Preparation

Updates

Pretrained Models

Kinetics-400

HMDB51 && UCF101

HMDB51

UCF101

Testing

Zero-shot

Training

Contributors

Citing ActionCLIP

Acknowledgments

Owner

[ICML 2021] Towards Understanding and Mitigating Social Biases in Language Models

Learning where to learn - Gradient sparsity in meta and continual learning

QueryFuzz implements a metamorphic testing approach to test Datalog engines.

Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Check out the StyleGAN repo and place it in the same directory hierarchy as the present repo

Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network

Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

A simple python module to generate anchor (aka default/prior) boxes for object detection tasks.

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

A collection of inference modules for fastai2

This repo contains the pytorch implementation for Dynamic Concept Learner (accepted by ICLR 2021).

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

Code for the paper: Adversarial Training Against Location-Optimized Adversarial Patches. ECCV-W 2020.

MAterial del programa Misión TIC 2022

PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

Interactive Image Segmentation via Backpropagating Refinement Scheme

Experiments and examples converting Transformers to ONNX

AI Summer's complete catalog of articles