ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.

Related tags

Deep Learningpytorch
Overview

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

This repository contains the code for our ICCV 2021 paper:

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee*, Jiwan Chung*, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song (*: equal contribution)
[paper]

@inproceedings{lee2021acav100m,
    title="{ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning}",
    author={Sangho Lee and Jiwan Chung and Youngjae Yu and Gunhee Kim and Thomas Breuel and Gal Chechik and Yale Song},
    booktitle={ICCV},
    year=2021
}

System Requirements

  • Python >= 3.8.5
  • FFMpeg 4.3.1

Installation

  1. Install PyTorch 1.6.0, torchvision 0.7.0 and torchaudio 0.6.0 for your environment. Follow the instructions in HERE.

  2. Install the other required packages.

pip install -r requirements.txt
python -m nltk.downloader 'punkt'
pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/<cuda version>/torch1.6/index.html
pip install git+https://github.com/jiwanchung/slowfast
pip install torch-scatter==2.0.5 -f https://pytorch-geometric.com/whl/torch-1.6.0+<cuda version>.html

e.g. Replace <cuda version> with cu102 for CUDA 10.2.

Input File Structure

  1. Create the data directory
mkdir data
  1. Prepare the input file.

data/metadata.tsv should be structured as follows. We provide an example input file in examples/metadata.tsv

YOUTUBE_ID\t{"LatestDAFeature": {"Title": TITLE, "Description": DESCRIPTION, "YouTubeCategory": YOUTUBE_CATEGORY, "VideoLength": VIDEO_LENGTH}, "MediaVersionList": [{"Duration": DURATION}]}

Data Curation Pipeline

One-Liner

bash ./run.sh

To enable GPU computation, modify the CUDA_VISIBLE_DEVICES environment variable accordingly. For example, run the above command as export CUDA_VISIBLE_DEVICES=2,3; bash ./run.sh.

Step-by-Step

  1. Filter the videos with metadata.
bash ./metadata_filtering/code/run.sh

The above command will build the data/filtered.tsv file.

  1. Download the actual video files from youtube.
bash ./video_download/code/run.sh

Although we provide a simple download script, we recommend more scalable solutions for downloading large-scale data.

The above command will download the files to data/videos/raw directory.

  1. Segment the videos into 10-second clips.
bash ./clip_segmentation/code/run.sh

The above command will save the segmented clips to data/videos directory.

  1. Extract features from the clips.
bash ./feature_extraction/code/run.sh

The above command will save the extracted features to data/features directory.

This step requires GPU for faster computation.

  1. Perform clustering with the extracted features.
bash ./clustering/code/run.sh

The above command will save the extracted features to data/clusters directory.

This step requires GPU for faster computation.

  1. Select subset with high audio-visual correspondence using the clustering results.
bash ./subset_selection/code/run.sh

The above command will save the selected clip indices to data/datasets directory.

This step requires GPU for faster computation.

The final output should be saved in the data/output.csv file.

Output File Structure

output.csv is structured as follows. We provide an example output file at examples/output.csv.

# SHARD_NAME,FILENAME,YOUTUBE_ID,SEGMENT
shard-000009,qpxektwhzra_292.mp4,qpxektwhzra,"[292.3329999997, 302.3329999997]"

Evaluation

Instructions on downstream evaluation are provided in Evaluation.

Correspondence Retrieval

Instructions on correspondence retrieval experiments are provided in Correspondence Retrieval.

Owner
sangho.lee
sangho.lee
PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks This repo contains the PyTorch implementation of the ACL, 2021 pa

Rabeeh Karimi Mahabadi 98 Dec 28, 2022
Official PyTorch implementation of "The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation" (ICCV 21).

CenterGroup This the official implementation of our ICCV 2021 paper The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person P

Dynamic Vision and Learning Group 43 Dec 25, 2022
Tensorboard for pytorch (and chainer, mxnet, numpy, ...)

tensorboardX Write TensorBoard events with simple function call. The current release (v2.3) is tested on anaconda3, with PyTorch 1.8.1 / torchvision 0

Tzu-Wei Huang 7.5k Dec 28, 2022
Make a surveillance camera from your raspberry pi!

rpi-surveillance Make a surveillance camera from your Raspberry Pi 4! The surveillance is built as following: the camera records 10 seconds video and

Vladyslav 62 Feb 03, 2022
render sprites into your desktop environment as shaped windows using GTK

spritegtk render static or animated sprites into your desktop environment as dynamic shaped windows using GTK requires pycairo and PYGobject: pip inst

hermit 20 Oct 27, 2022
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.

The Ultimate PyTorch Source-Build Template Translations: 한국어 TL;DR PyTorch built from source can be x4 faster than a naïve PyTorch install. This repos

Joonhyung Lee/이준형 651 Dec 12, 2022
Repository features UNet inspired architecture used for segmenting lungs on chest X-Ray images

Lung Segmentation (2D) Repository features UNet inspired architecture used for segmenting lungs on chest X-Ray images. Demo See the application of the

163 Sep 21, 2022
A Python Package for Convex Regression and Frontier Estimation

pyStoNED pyStoNED is a Python package that provides functions for estimating multivariate convex regression, convex quantile regression, convex expect

Sheng Dai 17 Jan 08, 2023
SFD implement with pytorch

S³FD: Single Shot Scale-invariant Face Detector A PyTorch Implementation of Single Shot Scale-invariant Face Detector Description Meanwhile train hand

Jun Li 251 Dec 22, 2022
Project NII pytorch scripts

project-NII-pytorch-scripts By Xin Wang, National Institute of Informatics, since 2021 I am a new pytorch user. If you have any suggestions or questio

Yamagishi and Echizen Laboratories, National Institute of Informatics 184 Dec 23, 2022
Non-stationary GP package written from scratch in PyTorch

NSGP-Torch Examples gpytorch model with skgpytorch # Import packages import torch from regdata import NonStat2D from gpytorch.kernels import RBFKernel

Zeel B Patel 1 Mar 06, 2022
This is an official implementation for "PlaneRecNet".

PlaneRecNet This is an official implementation for PlaneRecNet: A multi-task convolutional neural network provides instance segmentation for piece-wis

yaxu 50 Nov 17, 2022
A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run.

Minimal Hand A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run. This project provides the

Yuxiao Zhou 824 Jan 07, 2023
An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

CV Lab @ Yonsei University 35 Oct 26, 2022
An Extendible (General) Continual Learning Framework based on Pytorch - official codebase of Dark Experience for General Continual Learning

Mammoth - An Extendible (General) Continual Learning Framework for Pytorch NEWS STAY TUNED: We are working on an update of this repository to include

AImageLab 277 Dec 28, 2022
Open source code for Paper "A Co-Interactive Transformer for Joint Slot Filling and Intent Detection"

A Co-Interactive Transformer for Joint Slot Filling and Intent Detection This repository contains the PyTorch implementation of the paper: A Co-Intera

67 Dec 05, 2022
Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF)

Graph Convolutional Gated Recurrent Neural Network (GCGRNN) Improved from Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF

Lei Lin 21 Dec 18, 2022
Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

EgoNet Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo inclu

Shichao Li 138 Dec 09, 2022
Training a deep learning model on the noisy CIFAR dataset

Training-a-deep-learning-model-on-the-noisy-CIFAR-dataset This repository contai

1 Jun 14, 2022
Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

keven 198 Dec 20, 2022