Towards Long-Form Video Understanding

Last update: Dec 26, 2022

Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021

[Paper] [Project Page] [Dataset]

Citation

@inproceedings{lvu2021,
  Author    = {Chao-Yuan Wu and Philipp Kr\"{a}henb\"{u}hl},
  Title     = {{Towards Long-Form Video Understanding}},
  Booktitle = {{CVPR}},
  Year      = {2021}}

Overview

This repo implements Object Transformers for long-form video understanding.

Getting Started

Please organize data/ as follows:

data
|_ ava
|_ features
|_ instance_meta
|_ lvu_1.0

ava, features, and instance_meta can be found in this Google Drive folder. lvu_1.0 can be found here.

Please also download the pre-trained weights from this Google Drive folder and put them in pretrained_models/.
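Before launching any of the scripts below, it can help to confirm that the directories described above are in place. The following is a minimal sketch, not part of the repo: the helper name `missing_paths` is hypothetical, and it assumes pretrained_models/ sits next to data/ at the repo root. It checks only that the directories exist, not what they contain.

```python
from pathlib import Path

# Sub-directories of data/ listed in the layout above.
EXPECTED_DATA_DIRS = ("ava", "features", "instance_meta", "lvu_1.0")


def missing_paths(repo_root="."):
    """Return the expected directories that do not exist under repo_root.

    Hypothetical helper: assumes pretrained_models/ lives beside data/.
    """
    root = Path(repo_root)
    expected = [root / "data" / name for name in EXPECTED_DATA_DIRS]
    expected.append(root / "pretrained_models")
    return [str(path) for path in expected if not path.is_dir()]
```

Calling `missing_paths()` from the repo root returns an empty list when everything is in place, and otherwise lists the directories that still need to be downloaded.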

Pre-training

python3 -u run_pretrain.py

This pre-trains on a small demo dataset, data/instance_meta/instance_meta_pretrain_demo.pkl, as an example. Please follow its file format if you'd like to pre-train on a larger dataset (e.g., the latest full version of MovieClips).
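The file format of the demo dataset is not documented here, so one way to learn it is to load the file and look at its top-level structure. The sketch below assumes only that the .pkl file is a standard Python pickle; the helpers `describe` and `describe_pickle` are hypothetical and make no claim about what the file actually contains. Unpickling may also require the same packages (for example NumPy) that were used to create the file.

```python
import pickle


def describe(obj, max_items=3):
    """Return a short, one-level description of a Python object."""
    if isinstance(obj, dict):
        keys = list(obj)[:max_items]
        return f"dict with {len(obj)} keys, e.g. {keys!r}"
    if isinstance(obj, (list, tuple)):
        kinds = sorted({type(x).__name__ for x in obj[:max_items]})
        return f"{type(obj).__name__} of length {len(obj)}, element types {kinds}"
    return type(obj).__name__


def describe_pickle(path):
    """Load a pickle file and describe its top-level structure."""
    # Only unpickle files from a source you trust: pickle can execute code.
    with open(path, "rb") as f:
        return describe(pickle.load(f))
```

For example, `print(describe_pickle("data/instance_meta/instance_meta_pretrain_demo.pkl"))` run from the repo root gives a one-line summary to start from when preparing a larger dataset in the same format.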

Training and evaluating on AVA v2.2

python3 -u run_ava.py

This should achieve 31.0 mAP.

Training and evaluating on LVU tasks

python3 -u run.py [1-9]

The argument selects a task to run on. Please see run.py for details.
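To run every LVU task in sequence rather than one at a time, a small driver along the following lines may be convenient. This is a sketch under the assumption that the argument is an integer from 1 to 9, as the command above suggests; the function names `lvu_commands` and `run_all` are hypothetical, and which task each index selects is defined in run.py, not here.

```python
import subprocess
import sys


def lvu_commands(tasks=range(1, 10), python=sys.executable):
    """Build one `run.py` command per task index (1 through 9 by default)."""
    return [[python, "-u", "run.py", str(task)] for task in tasks]


def run_all(tasks=range(1, 10)):
    """Run the tasks one after another, stopping at the first failure."""
    for cmd in lvu_commands(tasks):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_all()` from the repo root runs all nine tasks; `run_all([1, 2])` runs a subset.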

Acknowledgment

This implementation borrows largely from Hugging Face Transformers. Please consider citing it if you use this repo.

Owner

Chao-Yuan Wu
