An unreferenced image captioning metric (ACL-21)

Last update: Nov 20, 2022

UMIC

This repository provides an unreferenced image captioning metric from our ACL 2021 paper UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning.
Here, we provide the code to compute UMIC.

Usage (Updating the Descriptions)

Our code is based on UNITER, so please follow the UNITER installation guideline and use Docker to load it. In the next few weeks, we will try to release a version that does not require Docker.

1. Install Prerequisites

We use the Docker image provided by the official UNITER repo. Please install Docker by following the guideline in that repo.

2. Download the Visual Features

The COCO dataset is widely used for image captioning. To get the visual features for COCO captions, download the image features for the COCO validation split with the following command.

wget https://acvrpublicycchen.blob.core.windows.net/uniter/img_db/coco_val2014.tar

Please refer to the official UNITER repo to download other visual features.

3. Pre-processing the Textual Features (Captions)

The textual feature file is a Python dictionary stored in JSON format, with the following keys:
'cands' : [list of candidate captions]
'img_fs' : [list of image file names]
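As an illustration of the format above, here is a minimal sketch that builds such a dictionary and serializes it as JSON. The helper name build_caption_file, the captions, and the image file names are hypothetical, and the sketch assumes the two lists are aligned so that the i-th caption describes the i-th image; the README does not state this explicitly, nor the file name or location that compute_score.py expects.

```python
import json


def build_caption_file(candidates, image_files):
    """Return a dict with the 'cands' and 'img_fs' keys described above.

    Assumption: the lists are aligned index by index (one caption per image).
    """
    if len(candidates) != len(image_files):
        raise ValueError("each candidate caption needs exactly one image file name")
    return {"cands": list(candidates), "img_fs": list(image_files)}


# Hypothetical captions and image file names, for illustration only.
example = build_caption_file(
    ["a man riding a wave on a surfboard", "two dogs playing in the snow"],
    ["example_image_1.jpg", "example_image_2.jpg"],
)

# Serialize to JSON text; json.dump(example, file_object) writes it to a file instead.
serialized = json.dumps(example, indent=2)
print(serialized)
```

Checking the list lengths before writing the file catches a misaligned caption list early, rather than at scoring time.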

4. Running the Script

Launching Docker

source launch_activate.sh $PATH_TO_STORAGE

Compute Score

python compute_score.py --data_type capeval1k \
                              --ckpt /storage/umic.pt \
                              --img_type coco_val2014

Reference

If you find this repo useful, please consider citing:

@inproceedings{lee-etal-2021-umic,
    title = "{UMIC}: An Unreferenced Metric for Image Captioning via Contrastive Learning",
    author = "Lee, Hwanhee  and
      Yoon, Seunghyun  and
      Dernoncourt, Franck  and
      Bui, Trung  and
      Jung, Kyomin",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-short.29",
    doi = "10.18653/v1/2021.acl-short.29",
    pages = "220--226",
}

Owner

hwanheelee