🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"

Overview

SGLKT-VisDial

PyTorch implementation for the paper:

Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer
Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang*, and Jin-Hwa Kim* (* corresponding authors)
In EMNLP 2021 Findings

Setup and Dependencies

This code is implemented using PyTorch v1.0+ and provides out-of-the-box support for CUDA 9+ and cuDNN 7+. Anaconda/Miniconda is the recommended way to set up this codebase:

  1. Install the Anaconda or Miniconda distribution based on Python 3+ from their download site.
  2. Clone this repository and create an environment:
git clone https://www.github.com/gicheonkang/sglkt-visdial
conda create -n sglkt python=3.6

# activate the environment and install all dependencies
conda activate sglkt
cd sglkt-visdial/
pip install -r requirements.txt

# install this codebase as a package in development version
python setup.py develop
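After activating the environment, it can help to confirm that the interpreter, PyTorch, CUDA, and cuDNN are visible before downloading the large feature files. The helper below is a hypothetical sanity-check script, not part of this repository; it only relies on importlib, sys, and the standard torch calls torch.cuda.is_available() and torch.backends.cudnn.version().

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6)):
    """Report whether the interpreter and (optionally) PyTorch look usable."""
    report = {
        "python_version": "%d.%d" % sys.version_info[:2],
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "cudnn_version": None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            # cuDNN version as an integer, e.g. 7xxx for cuDNN 7
            report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a GPU machine, the installed PyTorch build most likely does not match the local CUDA driver.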

Download Data

  1. We use image features extracted by a Faster R-CNN pre-trained on Visual Genome. Download the image features below and put each file under the $PROJECT_ROOT/data/{SPLIT_NAME}_feature directory. We also need a file that maps each image_id to its R-CNN bounding box indices ({SPLIT_NAME}_imgid2idx.pkl), because the number of bounding boxes per image is not fixed (it ranges from 10 to 100).
  • train_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of train split (32GB).
  • val_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of validation split (0.5GB).
  • test_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of test split (2GB).
  2. Download the pre-trained, pre-processed word vectors from here (glove840b_init_300d.npy) and keep them under the $PROJECT_ROOT/data/ directory. Alternatively, you can extract the vectors yourself by running data/init_glove.py.

  3. Download the Visual Dialog dataset from here (visdial_1.0_train.json, visdial_1.0_val.json, visdial_1.0_test.json, and visdial_1.0_val_dense_annotations.json) and place the files under the $PROJECT_ROOT/data/ directory.

  4. Download the additional data for Sparse Graph Learning and Knowledge Transfer and place it under the $PROJECT_ROOT/data/ directory.
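Because each image has between 10 and 100 boxes, the features cannot be stored as one fixed-size tensor per image, which is why the {SPLIT_NAME}_imgid2idx.pkl mapping is needed. The sketch below shows one common way such a store is indexed, assuming (not confirmed by this README) that all boxes are stacked row-wise and a hypothetical pos_boxes table gives the (start, end) row range of each image. Plain Python lists stand in for the arrays that would be read from the .hdf5 files; get_image_boxes and pad_boxes are illustrative names, not functions from this repository.

```python
from typing import Dict, List, Sequence, Tuple

Feature = List[float]


def get_image_boxes(
    image_id: int,
    imgid2idx: Dict[int, int],
    pos_boxes: Sequence[Tuple[int, int]],
    features: Sequence[Feature],
) -> List[Feature]:
    """Return the box features of one image from a flattened feature store.

    Assumed layout (hypothetical): boxes of every image are stacked row-wise
    in `features`, and pos_boxes[k] = (start, end) is the row range of the
    k-th image, where k = imgid2idx[image_id].
    """
    start, end = pos_boxes[imgid2idx[image_id]]
    return list(features[start:end])


def pad_boxes(boxes: List[Feature], max_boxes: int = 100) -> Tuple[List[Feature], List[int]]:
    """Zero-pad (or truncate) a variable number of boxes to max_boxes rows.

    Returns the padded rows and a mask with 1 for real boxes and 0 for padding,
    so a batch of images can share one fixed shape.
    """
    dim = len(boxes[0]) if boxes else 0
    kept = [list(b) for b in boxes[:max_boxes]]
    num_pad = max_boxes - len(kept)
    padded = kept + [[0.0] * dim for _ in range(num_pad)]
    mask = [1] * len(kept) + [0] * num_pad
    return padded, mask


# Toy example: three images with 2, 3, and 1 boxes, each box a 2-d feature.
features = [[0.1, 0.2], [0.3, 0.4], [1.0, 1.1], [1.2, 1.3], [1.4, 1.5], [2.0, 2.1]]
pos_boxes = [(0, 2), (2, 5), (5, 6)]
imgid2idx = {101: 0, 202: 1, 303: 2}

boxes = get_image_boxes(202, imgid2idx, pos_boxes, features)
padded, mask = pad_boxes(boxes, max_boxes=4)
print(len(boxes), mask)  # 3 [1, 1, 1, 0]
```

The mask lets attention layers ignore padded rows, so images with few proposals are not swamped by zero vectors.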

Training

Train the model provided in this repository with:

python train.py --gpu-ids 0 1  # provide more ids for multi-GPU execution; other args...
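The --gpu-ids flag takes a list of device ids. A minimal sketch of how such a flag is commonly parsed, assuming argparse with nargs="+" and a -1 convention for CPU-only runs; build_parser and pick_device are hypothetical helpers, and the repository's own train.py may define these flags differently.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags shown in this README.
    parser = argparse.ArgumentParser(description="Train a visual dialog model (sketch)")
    parser.add_argument(
        "--gpu-ids", nargs="+", type=int, default=[0],
        help="List of GPU ids; -1 means CPU (assumed convention).",
    )
    parser.add_argument(
        "--save-dirpath", default="checkpoints/",
        help="Directory for per-epoch checkpoints, relative to the project root.",
    )
    return parser


def pick_device(gpu_ids: List[int]) -> str:
    """Map the GPU id list to a device string; the first id is the primary device."""
    if not gpu_ids or gpu_ids[0] < 0:
        return "cpu"
    return f"cuda:{gpu_ids[0]}"


args = build_parser().parse_args(["--gpu-ids", "0", "1"])
print(args.gpu_ids, pick_device(args.gpu_ids))  # [0, 1] cuda:0
```

With more than one id, a PyTorch model is typically wrapped as torch.nn.DataParallel(model, device_ids=args.gpu_ids), which splits each batch across the listed GPUs and gathers the outputs on the first one.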

Saving model checkpoints

This script saves a model checkpoint at every epoch to the path specified by --save-dirpath. The default path is $PROJECT_ROOT/checkpoints.
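To evaluate the most recent epoch, you need the newest file in --save-dirpath. A small helper, assuming checkpoints are named checkpoint_{epoch}.pth (an assumption about the naming; adjust CKPT_PATTERN to whatever train.py actually writes). latest_checkpoint is a hypothetical function, not part of this repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed file-name pattern for per-epoch checkpoints.
CKPT_PATTERN = re.compile(r"checkpoint_(\d+)\.pth$")


def latest_checkpoint(save_dirpath: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    root = Path(save_dirpath)
    if not root.is_dir():
        return None
    best_epoch, best_path = -1, None
    for path in root.glob("*.pth"):
        match = CKPT_PATTERN.search(path.name)
        # Compare epochs numerically so checkpoint_10 beats checkpoint_9.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("checkpoints"))
```

The returned path can then be passed to evaluate.py through --load-pthpath.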

Evaluation

Evaluation of a trained model checkpoint can be done as follows:

python evaluate.py --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0 1

Validation scores can be computed offline. To obtain test split scores, however, you have to submit a JSON file to the EvalAI online evaluation server. You can generate this JSON file with the --save-ranks True option.
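The submission file stores, for each evaluated dialog round, the rank the model assigns to every candidate answer. A sketch assuming the VisDial v1.0 EvalAI format as we understand it (a JSON list of {"image_id", "round_id", "ranks"} entries, with ranks covering the 100 candidates); make_submission_entry and scores_to_ranks are hypothetical helpers, not code from evaluate.py.

```python
import json
from typing import Dict, List


def scores_to_ranks(scores: List[float]) -> List[int]:
    """Convert candidate scores (higher = better) to 1-based ranks per candidate."""
    # Python's sort is stable, so tied scores keep their original candidate order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def make_submission_entry(image_id: int, round_id: int, ranks: List[int]) -> Dict:
    """Build one entry of the (assumed) submission format after a sanity check."""
    if sorted(ranks) != list(range(1, len(ranks) + 1)):
        raise ValueError("ranks must be a permutation of 1..N")
    return {"image_id": image_id, "round_id": round_id, "ranks": ranks}


# Toy example with 3 candidates instead of 100 and a made-up image_id.
ranks = scores_to_ranks([0.1, 0.9, 0.5])
submission = [make_submission_entry(image_id=1, round_id=10, ranks=ranks)]
print(json.dumps(submission))  # [{"image_id": 1, "round_id": 10, "ranks": [3, 1, 2]}]
```

Note that ranks[i] is the rank of candidate i, not the index of the i-th best candidate; mixing the two up is a common cause of near-random test scores.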

Pre-trained model & Results

We provide the pre-trained models for SGL+KT and SGL.
To reproduce the results reported in the paper, please run the command below.

python evaluate.py --load-pthpath SGL+KT.pth --split test --gpu-ids 0 1 --save-ranks True

Performance on v1.0 test-std (trained on v1.0 train):

Model    Overall  NDCG   MRR    R@1    R@5    R@10   Mean
SGL+KT   65.31    72.60  58.01  46.20  71.01  83.20  5.85
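The R@1/R@5/R@10, MRR, and Mean columns are the standard VisDial retrieval metrics, computed from the rank of the ground-truth answer among the candidate answers in each round. A minimal sketch of these three metric families; retrieval_metrics is an illustrative function, the table reports R@k and MRR multiplied by 100, and NDCG (which uses the dense relevance annotations such as visdial_1.0_val_dense_annotations.json) is not covered here.

```python
from statistics import mean
from typing import Dict, Sequence


def retrieval_metrics(gt_ranks: Sequence[int]) -> Dict[str, float]:
    """Sparse VisDial metrics from the 1-based rank of the correct answer per round.

    R@k: fraction of rounds whose correct answer is ranked in the top k.
    MRR: mean reciprocal rank. Mean: average rank (lower is better).
    """
    n = len(gt_ranks)
    return {
        "r@1": sum(r <= 1 for r in gt_ranks) / n,
        "r@5": sum(r <= 5 for r in gt_ranks) / n,
        "r@10": sum(r <= 10 for r in gt_ranks) / n,
        "mrr": mean(1.0 / r for r in gt_ranks),
        "mean": mean(gt_ranks),
    }


# Four toy rounds where the correct answer was ranked 1st, 2nd, 4th, and 10th.
print(retrieval_metrics([1, 2, 4, 10]))
```

For these four rounds, R@1 is 0.25, R@5 is 0.75, R@10 is 1.0, and the mean rank is 4.25; MRR is (1 + 1/2 + 1/4 + 1/10) / 4.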

Citation

If you use this code in your published research, please consider citing:

@article{kang2021reasoning,
  title={Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer},
  author={Kang, Gi-Cheon and Park, Junseok and Lee, Hwaran and Zhang, Byoung-Tak and Kim, Jin-Hwa},
  journal={arXiv preprint arXiv:2004.06698},
  year={2021}
}

License

MIT License

Acknowledgements

We use the Visual Dialog Challenge Starter Code and MCAN-VQA as reference code.
