UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Last update: Dec 30, 2022

Overview

UNION

UNION is the automatic evaluation metric described in the paper UNION: An UNreferenced MetrIc for Evaluating Open-eNded Story Generation (EMNLP 2020). Please refer to the Paper List for more information about Open-eNded Language Generation (ONLG) tasks; we hope it helps you learn more about this field.

Prerequisites
Computing Infrastructure
Quick Start
Data Instruction
Citation

Prerequisites

The code is written with the TensorFlow library. To run it, install the following prerequisites:

Python 3.7.0
tensorflow-gpu 1.14.0
numpy 1.18.1
regex 2020.2.20
nltk 3.4.5

Computing Infrastructure

We trained UNION on the following platform:

OS: Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-98-generic x86_64)
GPU: NVIDIA TITAN Xp

Quick Start

1. Constructing Negative Samples

Execute the following commands:

cd ./Data
python3 ./get_vocab.py your_mode
python3 ./gen_train_data.py your_mode

your_mode is roc for the ROCStories corpus or wp for the WritingPrompts dataset. A summary of the vocabulary, with the frequency and POS tag of each entry, is then written to ROCStories/ini_data/entity_vocab.txt or WritingPrompts/ini_data/entity_vocab.txt.
Negative samples and the corresponding human-written stories are constructed from the original training set. The resulting training data is written to ROCStories/train_data or WritingPrompts/train_data.
Note: currently only 10 samples of the full original data and training data are provided. The full data can be downloaded from THUcloud or GoogleDrive.
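
To give a rough idea of what a negative sample is, the sketch below applies two toy perturbations (reordering and repetition of sentences) to a human-written story. It only illustrates the general idea of perturbing human-written stories; it is not the logic of gen_train_data.py, which also relies on resources such as negation.txt and conceptnet_antonym.txt. The function names and the example story are hypothetical.

```python
import random


def reorder_sentences(sentences, rng):
    """Return a copy of the story with two randomly chosen sentences swapped."""
    if len(sentences) < 2:
        return list(sentences)
    i, j = rng.sample(range(len(sentences)), 2)  # two distinct positions
    perturbed = list(sentences)
    perturbed[i], perturbed[j] = perturbed[j], perturbed[i]
    return perturbed


def repeat_sentence(sentences, rng):
    """Return a copy of the story where one sentence is overwritten by another."""
    if len(sentences) < 2:
        return list(sentences)
    src, dst = rng.sample(range(len(sentences)), 2)  # two distinct positions
    perturbed = list(sentences)
    perturbed[dst] = perturbed[src]
    return perturbed


# Hypothetical five-sentence story, used only for illustration.
story = [
    "tom wanted to bake a cake .",
    "he went to the store to buy flour .",
    "he mixed the batter at home .",
    "he put the cake in the oven .",
    "the cake tasted great .",
]

rng = random.Random(0)  # fixed seed so the perturbations are reproducible
reordered = reorder_sentences(story, rng)
repeated = repeat_sentence(story, rng)
print(" ".join(reordered))
print(" ".join(repeated))
```

Each perturbed story keeps the surface fluency of the original while breaking its coherence, which is the kind of contrast a classifier can learn from.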

2. Training of UNION

Execute the following command:

python3 ./run_union.py --data_dir your_data_dir \
    --output_dir ./model/union \
    --task_name train \
    --init_checkpoint ./model/uncased_L-12_H-768_A-12/bert_model.ckpt

your_data_dir is ./Data/ROCStories or ./Data/WritingPrompts.
The initial BERT checkpoint can be downloaded from bert. We use the uncased base version of BERT (about 110M parameters). We train the model for at most 40000 steps. Training takes about 1~2 days.

3. Prediction with UNION

Execute the following command:

python3 ./run_union.py --data_dir your_data_dir \
    --output_dir ./model/output \
    --task_name pred \
    --init_checkpoint your_model_name

your_data_dir is ./Data/ROCStories or ./Data/WritingPrompts. If you want to evaluate your own texts, you only need to convert your files into our format.
your_model_name is ./model/union_roc/union_roc or ./model/union_wp/union_wp. The fine-tuned checkpoints can be downloaded from the following links:

Dataset	Fine-tuned Model
ROCStories	THUcloud; GoogleDrive
WritingPrompts	THUcloud; GoogleDrive

The UNION scores of the stories under your_data_dir/ant_data are written to the output_dir, ./model/output.

4. Correlation Calculation

Execute the following command:

python3 ./correlation.py your_mode

The script then prints the correlation between the human judgments under your_data_dir/ant_data and the metric scores under your_data_dir/metric_output. The figures under ./figure plot the metric scores against the human judgments for the ROCStories corpus.
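
As an illustration of what such a correlation measures, here is a minimal sketch that computes the Pearson coefficient between human judgments and metric scores in plain Python. Which coefficients correlation.py actually reports is not stated here, so Pearson is an assumption chosen for illustration, and all numbers below are made up. The sketch also shows the sign flip that turns Perplexity into minus Perplexity, as noted for ppl.txt in the data instruction.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for five stories (not taken from the released data).
human = [1, 0, 1, 1, 0]                  # binary human judgments
union_scores = [0.9, 0.2, 0.7, 0.8, 0.1]  # higher means more reasonable
ppl = [12.0, 85.0, 20.0, 15.0, 90.0]      # lower means more fluent

# Flip the sign so that, like the other metrics, higher is better.
minus_ppl = [-p for p in ppl]

print("union     :", round(pearson(human, union_scores), 4))
print("minus ppl :", round(pearson(human, minus_ppl), 4))
```

Flipping the sign only changes the sign of the coefficient, so Perplexity and minus Perplexity always have the same absolute correlation with human judgments.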

Data Instruction for files under `./Data`

├── Data
   └── `negation.txt`             # manually constructed negation word vocabulary.
   └── `conceptnet_antonym.txt`   # triples with antonym relations extracted from ConceptNet.
   └── `conceptnet_entity.csv`    # entities acquired from ConceptNet.
   └── `ROCStories`
       ├── `ant_data`        # sampled stories and corresponding human annotation.
              └── `ant_data.txt`        # include only binary annotation for reasonable(1) or unreasonable(0)
              └── `ant_data_all.txt`    # include the annotation for specific error types: reasonable(0), repeated plots(1), bad coherence(2), conflicting logic(3), chaotic scenes(4), and others(5). 
              └── `reference.txt`       # human-written stories with the same leading context with annotated stories.
              └── `reference_ipt.txt`
              └── `reference_opt.txt`
       ├── `ini_data`        # original dataset for training/validation/testing.
              └── `train.txt`
              └── `dev.txt`
              └── `test.txt`
              └── `entity_vocab.txt`    # generated by `get_vocab.py`, consisting of all the entities and the corresponding tagged POS followed by the mention frequency in the dataset.
       ├── `train_data`      # negative samples and corresponding human-written stories for training, which are constructed by `gen_train_data.py`.
              └── `train_human.txt`
              └── `train_negative.txt`
              └── `dev_human.txt`
              └── `dev_negative.txt`
              └── `test_human.txt`
              └── `test_negative.txt`
       ├── `metric_output`   # the scores of different metrics, which can be used to replicate the correlation in Table 5 of the paper. 
              └── `bleu.txt`
              └── `bleurt.txt`
              └── `ppl.txt`             # the sign of the result of Perplexity needs to be changed to get the result for *minus* Perplexity.
              └── `union.txt`
              └── `union_recon.txt`     # the ablated model without the reconstruction task
              └── ...
   └── `WritingPrompts`
       ├── ...

The annotated data files ant_data.txt and ant_data_all.txt are formatted as Story ID ||| Story ||| Seven Annotated Scores.
The error-type annotation in ant_data_all.txt is only available for the ROCStories corpus; for the WritingPrompts dataset, ant_data_all.txt is the same as ant_data.txt.
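
The format above can be read and written with a few lines of Python, which may help when converting custom texts for prediction. This is a minimal sketch: it assumes the seven scores are separated by whitespace, which the description above does not specify, and the helper names and the sample line are hypothetical.

```python
def parse_annotated_line(line, sep="|||", expected_scores=7):
    """Split one 'Story ID ||| Story ||| scores' line into its three fields."""
    parts = [part.strip() for part in line.rstrip("\n").split(sep)]
    if len(parts) != 3:
        raise ValueError("expected 3 fields, got %d" % len(parts))
    story_id, story, raw_scores = parts
    # Assumption: the annotated scores are whitespace-separated numbers.
    scores = [float(s) for s in raw_scores.split()]
    if len(scores) != expected_scores:
        raise ValueError("expected %d scores, got %d" % (expected_scores, len(scores)))
    return story_id, story, scores


def format_annotated_line(story_id, story, scores):
    """Build one line in the same 'Story ID ||| Story ||| scores' layout."""
    score_text = " ".join(str(s) for s in scores)
    return " ||| ".join([str(story_id), story, score_text])


# Hypothetical sample line, not taken from the released data.
sample = "0 ||| tom went to the store . he bought milk . ||| 1 1 0 1 1 1 0"
story_id, story, scores = parse_annotated_line(sample)
print(story_id, "|", story, "|", sum(scores) / len(scores))

# Writing a custom story back out in the same layout.
print(format_annotated_line("1", "she missed the bus .", [1, 1, 1, 1, 1, 1, 1]))
```

Check a few lines of the released ant_data.txt before relying on the score separator or the number of scores used here.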

Citation

Please cite our paper if you find the paper and the code helpful.

@misc{guan2020union,
    title={UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation},
    author={Jian Guan and Minlie Huang},
    year={2020},
    eprint={2009.07602},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
