Diverse Object-Scene Compositions For Zero-Shot Action Recognition

This repository contains the source code for using object-scene compositions for zero-shot action recognition.

This repository includes:

object and scene predictions for UCF-101, UCF-Sports and J-HMDB
script to retrieve object and scene predictions for Kinetics
scripts to obtain word and sentence embeddings for all datasets used and for object-scene compositions
script to obtain action predictions for any given action dataset, given the object and scene predictions and the respective action labels

Software used

python 3.8.8
pytorch 1.7.1
numpy 1.19.2
fasttext 0.9.2
sentence-transformers 1.2.0
scikit-learn 0.24.1

Downloading the object and scene predictions for Kinetics

While the action labels and video annotations for Kinetics are already present in the repo, the object and scene predictions need to be retrieved using:

bash kineticsdownload.sh

Obtaining word and sentence embeddings for all datasets

To compute the word and sentence embeddings for all the video and image datasets, run:

python getfasttextembs.py; python getbertembs.py

This will additionally compute the embeddings for all object-scene compositions and the similarities between all action labels and object-scene compositions.
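The similarity step can be pictured with a minimal, dependency-free sketch. It assumes that every object is paired with every scene, that a composition is embedded as the mean of its object and scene vectors, and that similarity is cosine similarity. The tiny `WORD_VECTORS` table and the helper names (`embed`, `cosine`, `composition_similarities`) are hypothetical stand-ins for the fastText and sentence-transformer embeddings computed by `getfasttextembs.py` and `getbertembs.py`; this is not the repository's code.

```python
import math
from itertools import product

# Hypothetical 3-d word vectors standing in for real fastText embeddings.
WORD_VECTORS = {
    "riding": [0.8, 0.1, 0.1],
    "horse": [0.9, 0.0, 0.2],
    "playing": [0.1, 0.6, 0.2],
    "soccer": [0.0, 0.9, 0.3],
    "ball": [0.1, 0.8, 0.1],
    "stable": [0.7, 0.1, 0.4],
    "field": [0.3, 0.7, 0.4],
}


def embed(text):
    """Average the vectors of the known words in `text` (toy encoder)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def composition_similarities(actions, objects, scenes):
    """Return {action: {(object, scene): similarity}} for every object-scene pair."""
    # Assumption: a composition is embedded as the mean of its object and scene vectors.
    comp_vecs = {
        (o, s): [(x + y) / 2 for x, y in zip(embed(o), embed(s))]
        for o, s in product(objects, scenes)
    }
    return {
        action: {comp: cosine(embed(action), vec) for comp, vec in comp_vecs.items()}
        for action in actions
    }


sims = composition_similarities(
    actions=["riding horse", "playing soccer"],
    objects=["horse", "ball"],
    scenes=["stable", "field"],
)
best = max(sims["riding horse"], key=sims["riding horse"].get)
print(best)
```

With these toy vectors, `best` is `("horse", "stable")`, the composition whose embedding lies closest to "riding horse".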

Using the main script

The main script can be run with the default arguments as follows:

python zero-shot-actions.py

There are several flags that can be used. Descriptions for these can be shown by running:

python zero-shot-actions.py --help
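As a rough illustration of how action predictions could follow from object and scene predictions, the sketch below scores each action for one video by combining, over its `top_k` most similar compositions, the label-composition similarity with the product of the object and scene probabilities. The product assumes that object and scene predictions are independent, and `score_actions`, `SIMILARITIES` and all numbers are hypothetical. It is not necessarily how `zero-shot-actions.py` works, and it omits any diversity criterion for selecting compositions that the project title suggests.

```python
# Hypothetical label-composition similarities (made-up values, not repository output).
SIMILARITIES = {
    "riding horse": {
        ("horse", "stable"): 0.98,
        ("horse", "field"): 0.88,
        ("ball", "stable"): 0.71,
        ("ball", "field"): 0.35,
    },
    "playing soccer": {
        ("ball", "field"): 0.98,
        ("ball", "stable"): 0.81,
        ("horse", "field"): 0.61,
        ("horse", "stable"): 0.30,
    },
}


def score_actions(object_probs, scene_probs, similarities, top_k=2):
    """Score every action for a single video.

    object_probs: {object: probability} predicted for the video
    scene_probs:  {scene: probability} predicted for the video
    similarities: {action: {(object, scene): similarity}}
    """
    scores = {}
    for action, comp_sims in similarities.items():
        # Keep only the top_k compositions most similar to this action label.
        top = sorted(comp_sims.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        # Assumption: p(object, scene | video) = p(object | video) * p(scene | video).
        scores[action] = sum(
            sim * object_probs.get(obj, 0.0) * scene_probs.get(scene, 0.0)
            for (obj, scene), sim in top
        )
    return scores


object_probs = {"horse": 0.7, "ball": 0.2}
scene_probs = {"field": 0.6, "stable": 0.3}
scores = score_actions(object_probs, scene_probs, SIMILARITIES, top_k=2)
prediction = max(scores, key=scores.get)
print(prediction, scores)
```

With these made-up numbers the video is assigned to "riding horse" (score about 0.575 versus 0.166 for "playing soccer").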

Lastly, a helper script is available to compute results for different datasets and different flag values:

python make_results.py
