Code accompanying the paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (Chen et al., CVPR 2020, Oral).

Related tags

Deep Learningasg2cap
Overview

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

This repository contains PyTorch implementation of our paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (CVPR 2020).

Overview of ASG2Caption Model

Prerequisites

Python 3 and PyTorch 1.3.

# clone the repository
git clone https://github.com/cshizhe/asg2cap.git
cd asg2cap
# clone caption evaluation codes
git clone https://github.com/cshizhe/eval_cap.git
export PYTHONPATH=$(pwd):${PYTHONPATH}

Training & Inference

cd controlimcap/driver

# support caption models: [node, node.role, 
# rgcn, rgcn.flow, rgcn.memory, rgcn.flow.memory]
# see our paper for details
mtype=rgcn.flow.memory 

# setup config files
# you should modify data paths in configs/prepare_*_imgsg_config.py
python configs/prepare_coco_imgsg_config.py $mtype
resdir='' # copy the output string of the previous step

# training
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_loss --is_train --num_workers 8

# inference
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_set tst --num_workers 8

Datasets

Annotations

Annotations for MSCOCO and VisualGenome datasets can be download from GoogleDrive.

  • (Image, ASG, Caption) annotations: regionfiles/image_id.json
JSON Format:
{
	"region_id": {
		"objects":[
			{
	     		"object_id": int, 
	     		"name": str, 
	     		"attributes": [str],
				"x": int,
				"y": int, 
				"w": int, 
				"h": int
			}],
  	  "relationships": [
			{
				"relationship_id": int,
				"subject_id": int,
				"object_id": int,
				"name": str
			}],
  	  "phrase": str,
  }
}
  • vocabularies int2word.npy: [word] word2int.json: {word: int}

  • data splits: public_split directory trn_names.npy, val_names.npy, tst_names.npy

Features

Features for MSCOCO and VisualGenome datasets are available at BaiduNetdisk (code: 6q32).

We also provide pretrained models and codes to extract features for new images.

format: npy array, shape=(num_fts, dim_ft) corresponding to the order in data_split names

format: hdf5 files, "image_id".jpg.hdf5

key: 'image_id'.jpg

attrs: {"image_w": int, "image_h": int, "boxes": 4d array (x1, y1, x2, y2)}

Result Visualization

Examples

Citations

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@article{chen2020say,
  title={Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs},
  author={Chen, Shizhe and Jin, Qin and Wang, Peng and Wu, Qi},
  journal={CVPR},
  year={2020}
}

License

MIT License

Owner
Shizhe Chen
Shizhe Chen
Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy

Cognitive Tools Lab 38 Jan 06, 2023
For the paper entitled ''A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining''

Summary This is the source code for the paper "A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining", which was accepted as fu

1 Nov 10, 2021
Tensorflow python implementation of "Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos"

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos This repository is the official tensorflow python implementation

Yasamin Jafarian 287 Jan 06, 2023
A project that uses optical flow and machine learning to detect aimhacking in video clips.

waldo-anticheat A project that aims to use optical flow and machine learning to visually detect cheating or hacking in video clips from fps games. Che

waldo.vision 542 Dec 03, 2022
Train emoji embeddings based on emoji descriptions.

emoji2vec This is my attempt to train, visualize and evaluate emoji embeddings as presented by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko

Miruna Pislar 17 Sep 03, 2022
Crossover Learning for Fast Online Video Instance Segmentation (ICCV 2021)

TL;DR: CrossVIS (Crossover Learning for Fast Online Video Instance Segmentation) proposes a novel crossover learning paradigm to fully leverage rich c

Hust Visual Learning Team 79 Nov 25, 2022
Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)

An Image is Worth 16x16 Words, What is a Video Worth? paper Official PyTorch Implementation Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor DAMO Academy, Al

213 Nov 12, 2022
Capsule endoscopy detection DACON challenge

capsule_endoscopy_detection (DACON Challenge) Overview Yolov5, Yolor, mmdetection기반의 모델을 사용 (총 11개 모델 앙상블) 모든 모델은 학습 시 Pretrained Weight을 yolov5, yolo

MAILAB 11 Nov 25, 2022
Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Vansh Wassan 15 Jun 17, 2021
the code of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021)

RMA-Net This repo is the implementation of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021). Paper

Wanquan Feng 205 Nov 09, 2022
Fusion-in-Decoder Distilling Knowledge from Reader to Retriever for Question Answering

This repository contains code for: Fusion-in-Decoder models Distilling Knowledge from Reader to Retriever Dependencies Python 3 PyTorch (currently tes

Meta Research 323 Dec 19, 2022
Some bravo or inspiring research works on the topic of curriculum learning.

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN Official code for NeurIPS 2021 paper "Towards Scalable Unpaired Virtu

131 Jan 07, 2023
Related resources for our EMNLP 2021 paper

Plan-then-Generate: Controlled Data-to-Text Generation via Planning Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier Code

Yixuan Su 61 Jan 03, 2023
Pytorch Implementations of large number classical backbone CNNs, data enhancement, torch loss, attention, visualization and some common algorithms.

Torch-template-for-deep-learning Pytorch implementations of some **classical backbone CNNs, data enhancement, torch loss, attention, visualization and

Li Shengyan 270 Dec 31, 2022
A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

DeepFilterNet A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering. libDF contains Rust code used for dat

Hendrik Schröter 292 Dec 25, 2022
Spectralformer: Rethinking hyperspectral image classification with transformers

The code in this toolbox implements the "Spectralformer: Rethinking hyperspectral image classification with transformers". More specifically, it is detailed as follow.

Danfeng Hong 104 Jan 04, 2023
VIsually-Pivoted Audio and(N) Text

VIP-ANT: VIsually-Pivoted Audio and(N) Text Code for the paper Connecting the Dots between Audio and Text without Parallel Data through Visual Knowled

Yän.PnG 16 Nov 04, 2022
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders Are Scalable Vision Learners A TensorFlow implementation of Masked Autoencoders Are Scalable Vision Learners [1]. Our implementati

Aritra Roy Gosthipaty 59 Dec 10, 2022
SEC'21: Sparse Bitmap Compression for Memory-Efficient Training onthe Edge

Training Deep Learning Models on The Edge Training on the Edge enables continuous learning from new data for deployed neural networks on memory-constr

Brown University Scale Lab 4 Nov 18, 2022