Findings of ACL 2021

Last update: Feb 24, 2022

Overview

Assessing Dialogue Systems with Distribution Distances

We propose to measure the performance of a dialogue system by computing the distribution-wise distance between its generated conversations and real-world conversations.
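As a rough illustration of what a distribution-wise distance looks like, the sketch below computes a Fréchet-style distance (in the spirit of FID [1]) between two sets of feature vectors. It is a simplification, not the repository's implementation: it assumes each set is summarized by a Gaussian with diagonal covariance, so the full matrix term Tr(S1 + S2 - 2(S1 S2)^(1/2)) reduces to a sum of squared gaps between per-dimension standard deviations. The function name and the toy feature values are hypothetical; in practice the features would come from a pretrained encoder such as roberta-base.

```python
import math
from statistics import fmean, pvariance


def diagonal_frechet_distance(real_feats, gen_feats):
    """Squared Frechet distance between two diagonal-covariance Gaussians.

    Each argument is a list of equal-length feature vectors. This is a
    simplified stand-in for the full-covariance formula (assumption).
    """
    dims = len(real_feats[0])
    total = 0.0
    for d in range(dims):
        real_col = [row[d] for row in real_feats]
        gen_col = [row[d] for row in gen_feats]
        # Gap between the per-dimension means.
        mean_gap = fmean(real_col) - fmean(gen_col)
        # Gap between the per-dimension standard deviations.
        std_gap = math.sqrt(pvariance(real_col)) - math.sqrt(pvariance(gen_col))
        total += mean_gap ** 2 + std_gap ** 2
    return total


# Toy 2-dimensional "features" (hypothetical values).
real = [[0.0, 1.0], [2.0, 3.0]]
generated = [[1.0, 2.0], [3.0, 4.0]]
print(diagonal_frechet_distance(real, generated))
```

A smaller value means the generated conversations sit closer to the real ones in feature space; identical sets give a distance of zero.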

To appear in Findings of ACL 2021.

Note that this is not an officially supported Tencent product.

1. Configuration

This repository requires the following packages:

pytorch
huggingface/transformers
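A quick way to confirm that both dependencies are importable before running anything is sketched below. It relies only on the standard library; the helper name `missing_packages` is hypothetical, and the import names `torch` and `transformers` are the usual module names for these two packages.

```python
from importlib.util import find_spec

# Package label -> module name used at import time.
REQUIRED = {
    "pytorch": "torch",
    "huggingface/transformers": "transformers",
}


def missing_packages(required=REQUIRED):
    """Return the labels of packages whose module cannot be found."""
    return [label for label, module in required.items() if find_spec(module) is None]


missing = missing_packages()
print("missing: " + ", ".join(missing) if missing else "all required packages found")
```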

2. Usage

To evaluate how well a metric correlates with human judgments at the system level:

python eval_metric.py \
  --data_path ./datasets/convai2_annotation.json \
  --metric fbd \
  --sample_num 10 \
  --model_type roberta-base \
  --batch_size 32
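For readers unfamiliar with system-level correlation, the sketch below shows the general idea: each dialogue system gets one metric score and one averaged human rating, and the two lists are compared with Pearson and Spearman coefficients. This is an illustration under assumptions, not the logic of eval_metric.py; the function names and all scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# One entry per dialogue system (hypothetical numbers).
metric_scores = [0.9, 0.5, 0.7, 0.2]   # e.g. a distance, lower is better
human_ratings = [2.1, 3.4, 2.8, 4.0]   # mean human score, higher is better
print(pearson(metric_scores, human_ratings))
print(spearman(metric_scores, human_ratings))
```

For a distance-style metric, a strongly negative correlation is the desirable outcome, since lower distances should accompany higher human ratings.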

The repo currently supports the common metrics used in text generation, including bleu, meteor, rouge, greedy, average, extrema, bert_score, fbd, and prd.
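To compare all supported metrics on one dataset, the command above can be generated once per metric. The sketch below only builds and prints the command lines, reusing the flags and default values from the usage example; the helper `build_command` is hypothetical, and whether every metric needs `--model_type` or `--batch_size` is an assumption not confirmed here.

```python
import shlex

# Metric names as listed in this README.
METRICS = ["bleu", "meteor", "rouge", "greedy", "average",
           "extrema", "bert_score", "fbd", "prd"]


def build_command(metric,
                  data_path="./datasets/convai2_annotation.json",
                  sample_num=10,
                  model_type="roberta-base",
                  batch_size=32):
    """Return the eval_metric.py invocation for one metric as an argv list."""
    return [
        "python", "eval_metric.py",
        "--data_path", data_path,
        "--metric", metric,
        "--sample_num", str(sample_num),
        "--model_type", model_type,
        "--batch_size", str(batch_size),
    ]


# Print one shell-ready command per metric; nothing is executed here.
for metric in METRICS:
    print(shlex.join(build_command(metric)))
```

Each printed line can be pasted into a shell, or the argv lists can be passed to `subprocess.run` from the repository root.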

Here are some details of the six corpora compared in the main paper:

File Name	Dataset Name	Num. of Samples	Reference
`personam_annotation.json`	Persona(M)	60	Shikib/usr
`dailyh_annotation.json`	Daily(H)	150	li3cmz/GRADE
`convai2_annotation.json`	Convai2	150	li3cmz/GRADE
`empathetic_annotation.json`	Empathetic	150	li3cmz/GRADE
`dailyz_annotation.json`	Daily(Z)	100	ZHAOTING/dialog-processing
`personaz_annotation.json`	Persona(Z)	150	ZHAOTING/dialog-processing
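Before running an evaluation, it can help to confirm that all six annotation files are in place. The sketch below assumes they live in `./datasets`, as the path in the usage example suggests for convai2; the helper `missing_files` is hypothetical, and the sample counts are copied from the table above for reference only (the JSON layout is not documented here, so the files are not parsed).

```python
from pathlib import Path

# File name -> number of samples, as listed in the table above.
EXPECTED_SAMPLES = {
    "personam_annotation.json": 60,
    "dailyh_annotation.json": 150,
    "convai2_annotation.json": 150,
    "empathetic_annotation.json": 150,
    "dailyz_annotation.json": 100,
    "personaz_annotation.json": 150,
}


def missing_files(data_dir="./datasets"):
    """Return the annotation files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SAMPLES if not (root / name).is_file()]


for name in missing_files():
    print("missing:", name)
```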

Citation

If you use this research/codebase/dataset, please cite our paper:

@article{xiang2021assessing,
  title={Assessing Dialogue Systems with Distribution Distances},
  author={Xiang, Jiannan and Liu, Yahui and Cai, Deng and Li, Huayang and Lian, Defu and Liu, Lemao},
  journal={arXiv preprint arXiv:2105.02573},
  year={2021}
}

Other related papers:

[1] FID, GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, NIPS 2017
[2] PRD, Assessing Generative Models via Precision and Recall, NIPS 2018
[3] BERTScore, BERTScore: Evaluating Text Generation with BERT, ICLR 2020


Owner

Yahui Liu
