Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Last update: Dec 23, 2022

Related tags

Deep Learning PanoAVQA

Overview

Pano-AVQA

Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

[Paper] [Poster] [Video]

Getting Started

This code is based on following libraries:

python=3.8
pytorch=1.7.0 (with cuda 10.2)

To create virtual environment with all necessary libraries:

conda env create -f environment.yml

By default data should be saved under data/feat/{audio,label,visual} directory and logs (w/ cache, checkpoint) are saved under data/{cache,ckpt,log} directory. Using symbolic link is recommended:

ln -s {path_to_your_data_directory} data

We use single TITAN RTX for training, but GPUs with less memory are still doable with smaller batch size (provided precomputed features).

Dataset

We plan to release the Pano-AVQA dataset public within this year, including Q&A annotation, precomputed features, etc. Please stay tuned!

Model

Training

Default configuration is provided in code/config.py. To run with this configuration:

python cli.py

To run with custom configuration, either modify code/config.py or execute:

python cli.py with {{flags_at_your_disposal}}

Inference

Model weight is saved under ./data/log directory. To run inference only:

python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yun2021PanoAVQA,
    author = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year = {2021}
}

Contact

If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.

Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Related tags

Overview

Pano-AVQA

[Paper] [Poster] [Video]

Getting Started

Dataset

Model

Training

Inference

Citation

Contact

Owner

Heeseung Yun

TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline.

This repository contains the implementation of the following paper: Cross-Descriptor Visual Localization and Mapping

Self-describing JSON-RPC services made easy

RATCHET is a Medical Transformer for Chest X-ray Diagnosis and Reporting

Using Machine Learning to Create High-Res Fine Art

Colab notebook and additional materials for Python-driven analysis of redlining data in Philadelphia

Code for EMNLP2021 paper "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training"

Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Covid19-Forecasting - An interactive website that tracks, models and predicts COVID-19 Cases

Deep learning PyTorch library for time series forecasting, classification, and anomaly detection

Code for KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

PartImageNet is a large, high-quality dataset with part segmentation annotations

[NeurIPS 2020] Code for the paper "Balanced Meta-Softmax for Long-Tailed Visual Recognition"

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Supplementary code for the experiments described in the 2021 ISMIR submission: Leveraging Hierarchical Structures for Few Shot Musical Instrument Recognition.

All-in-one Docker container that allows a user to explore Nautobot in a lab environment.

Human segmentation models, training/inference code, and trained weights, implemented in PyTorch

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition"

A3C LSTM Atari with Pytorch plus A3G design

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.