Code for the paper "Adaptively Aligned Image Captioning via Adaptive Attention Time"

Last update: Aug 27, 2022

Overview

Adaptively Aligned Image Captioning via Adaptive Attention Time

This repository contains the implementation of Adaptively Aligned Image Captioning via Adaptive Attention Time (AAT).

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
cider
coco-caption
tensorboardX

Training AAT

Prepare data (with python2)

See details in data/README.md.

(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
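
To illustrate what a word-count threshold does to the vocabulary, here is a minimal sketch. It is not the code in scripts/prepro_labels.py: the function names build_vocab and encode, the toy captions, and the strict "greater than" comparison are assumptions made for illustration, so check the script itself for the exact rule.

```python
from collections import Counter


def build_vocab(captions, count_threshold):
    """Keep words that occur strictly more than count_threshold times.

    Assumption: strict '>' comparison; the real script may differ.
    """
    counts = Counter(w for cap in captions for w in cap)
    vocab = sorted(w for w, n in counts.items() if n > count_threshold)
    # If any word was dropped, reserve one UNK token to stand in for all of them.
    if len(vocab) < len(counts):
        vocab.append("UNK")
    return vocab


def encode(caption, vocab):
    """Replace out-of-vocabulary words with UNK."""
    known = set(vocab)
    return [w if w in known else "UNK" for w in caption]


# Toy, already-tokenized captions (hypothetical data).
captions = [
    ["a", "dog", "runs"],
    ["a", "dog", "sleeps"],
    ["a", "cat", "sleeps"],
]
vocab = build_vocab(captions, count_threshold=1)
print(vocab)
print(encode(["a", "cat", "runs"], vocab))
```

Raising the threshold shrinks the vocabulary and maps more rare words to UNK, which is why the vocabulary size quoted above depends on the threshold value.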

You should also preprocess the dataset to build the cache used to compute the CIDEr score for SCST (self-critical sequence training):

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
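
CIDEr weights n-grams by TF-IDF, so the score needs n-gram document frequencies computed over the training references. The sketch below shows that idea on toy data. It is an assumption about the kind of statistics such a cache holds, not the code in scripts/prepro_ngrams.py; the names ngrams and document_frequency and the toy captions are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(tokens, max_n=4):
    """Count all n-grams of length 1..max_n in one tokenized caption."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def document_frequency(refs_per_image, max_n=4):
    """For each n-gram, count how many images have it in any reference caption."""
    df = defaultdict(int)
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, max_n))
        # Each image counts at most once per n-gram.
        for gram in seen:
            df[gram] += 1
    return dict(df)


# Toy data: one list of reference captions per image (hypothetical).
refs_per_image = [
    [["a", "dog", "runs"], ["the", "dog", "runs"]],
    [["a", "cat", "sleeps"]],
]
df = document_frequency(refs_per_image, max_n=2)
print(df[("a",)], df[("dog", "runs")])
```

Computing these counts once and storing them avoids rescanning every training caption each time a sampled caption is scored during SCST.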

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year = {2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.

Owner

Lun Huang