Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Overview

Evaluating the Factual Consistency of Abstractive Text Summarization

Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher

Introduction

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly for three tasks:

  1. identify whether sentences remain factually consistent after transformation,
  2. extract a span in the source documents to support the consistency prediction,
  3. extract a span in the summary sentence that is inconsistent if one exists. Transferring this model to summaries generated by several state-of-the art models reveals that this highly scalable approach substantially outperforms previous models, including those trained with strong supervision using standard datasets for natural language inference and fact checking. Additionally, human evaluation shows that the auxiliary span extraction tasks provide useful assistance in the process of verifying factual consistency.

Paper link: https://arxiv.org/abs/1910.12840

Table of Contents

  1. Updates
  2. Citation
  3. License
  4. Usage
  5. Get Involved

Updates

1/27/2020

Updated manually annotated data files - fixed filepaths in misaligned examples.

Updated model checkpoint files - recomputed evaluation metrics for fixed examples.

Citation

@article{kryscinskiFactCC2019,
  author    = {Wojciech Kry{\'s}ci{\'n}ski and Bryan McCann and Caiming Xiong and Richard Socher},
  title     = {Evaluating the Factual Consistency of Abstractive Text Summarization},
  journal   = {arXiv preprint arXiv:1910.12840},
  year      = {2019},
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from violence, hate, and division, environmental destruction, abuse of human rights, or the destruction of people's physical and mental health.

Usage

Code repository uses Python 3. Prior to running any scripts please make sure to install required Python packages listed in the requirements.txt file.

Example call: pip3 install -r requirements.txt

Training and Evaluation Datasets

Generated training data can be found here.

Manually annotated validation and test data can be found here.

Both generated and manually annotated datasets require pairing with the original CNN/DailyMail articles.

To recreate the datasets follow the instructions:

  1. Download CNN Stories and Daily Mail Stories from https://cs.nyu.edu/~kcho/DMQA/
  2. Create a cnndm directory and unpack downloaded files into the directory
  3. Download and unpack FactCC data (do not rename directory)
  4. Run the pair_data.py script to pair the data with original articles

Example call:

python3 data_pairing/pair_data.py <dir-with-factcc-data> <dir-with-stories>

Generating Data

Synthetic training data can be generated using code available in the data_generation directory.

The data generation script expects the source documents input as one jsonl file, where each source document is embedded in a separate json object. The json object is required to contain an id key which stores an example id (uniqness is not required), and a text field that stores the text of the source document.

Certain transformations rely on NER tagging, thus for best results use source documents with original (proper) casing.

The following claim augmentations (transformations) are available:

  • backtranslation - Paraphrasing claim via backtranslation (requires Google Translate API key; costs apply)
  • pronoun_swap - Swapping a random pronoun in the claim
  • date_swap - Swapping random date/time found in the claim with one present in the source article
  • number_swap - Swapping random number found in the claim with one present in the source article
  • entity_swap - Swapping random entity name found in the claim with one present in the source article
  • negation - Negating meaning of the claim
  • noise - Injecting noise into the claim sentence

For a detailed description of available transformations please refer to Section 3.1 in the paper.

To authenticate with the Google Cloud API follow these instructions.

Example call:

python3 data_generation/create_data.py <source-data-file> [--augmentations list-of-augmentations]

Model Code

FactCC and FactCCX models can be trained or initialized from a checkpoint using code available in the modeling directory.

Quickstart training, fine-tuning, and evaluation scripts are shared in the scripts directory. Before use make sure to update *_PATH variables with appropriate, absolute paths.

To customize training or evaluation settings please refer to the flags in the run.py file.

To utilize Weights&Biases dashboards login to the service using the following command: wandb login <API KEY>.

Trained FactCC model checkpoint can be found here.

Trained FactCCX model checkpoint can be found here.

IMPORTANT: Due to data pre-processing, the first run of training or evaluation code on a large dataset can take up to a few hours before the actual procedure starts.

Running on other data

To run pretrained FactCC or FactCCX models on your data follow the instruction:

  1. Download pre-trained model checkpoint, linked above
  2. Prepare your data in jsonl format. Each example should be a separate json object with id, text, claim keys representing example id, source document, and claim sentence accordingly. Name file as data-dev.jsonl
  3. Update corresponding *-eval.sh script

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!

Owner
Salesforce
A variety of vendor agnostic projects which power Salesforce
Salesforce
A Pytorch Implementation of Source Data-free Domain Adaptation for a Faster R-CNN

A Pytorch Implementation of Source Data-free Domain Adaptation for a Faster R-CNN Please follow Faster R-CNN and DAF to complete the environment confi

2 Jan 12, 2022
pq is a jq-like Pickle file viewer

pq PQ is a jq-like viewer/processing tool for pickle files. howto # pq '' file.pkl {'other': 456, 'test': 123} # pq 'table' file.pkl |other|test| | 45

3 Mar 15, 2022
TransPrompt - Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

TransPrompt This code is implement for our EMNLP 2021's paper 《TransPrompt:Towards an Automatic Transferable Prompting Framework for Few-shot Text Cla

WangJianing 23 Dec 21, 2022
Bootstrapped Unsupervised Sentence Representation Learning (ACL 2021)

Install first pip3 install -e . Training python3 training/unsupervised_tuning.py python3 training/supervised_tuning.py python3 training/multilingual_

yanzhang_nlp 26 Jul 22, 2022
CVPR 2021: "The Spatially-Correlative Loss for Various Image Translation Tasks"

Spatially-Correlative Loss arXiv | website We provide the Pytorch implementation of "The Spatially-Correlative Loss for Various Image Translation Task

Chuanxia Zheng 89 Jan 04, 2023
"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

This is code repo for our EMNLP 2017 paper "Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback", which implements the A2C algorithm on top of a neural encoder-

Khanh Nguyen 131 Oct 21, 2022
Consensus Learning from Heterogeneous Objectives for One-Class Collaborative Filtering

Consensus Learning from Heterogeneous Objectives for One-Class Collaborative Filtering This repository provides the source code of "Consensus Learning

SeongKu-Kang 6 Apr 29, 2022
BrainGNN - A deep learning model for data-driven discovery of functional connectivity

A deep learning model for data-driven discovery of functional connectivity https://doi.org/10.3390/a14030075 Usman Mahmood, Zengin Fu, Vince D. Calhou

Usman Mahmood 3 Aug 28, 2022
Like a cowsay but without cows!

Foxsay This is a simple program that generates pictures of a cute fox with a message. It is like a cowsay but without cows! Fox girls are better! Usag

Anastasia Kim 28 Feb 20, 2022
offical implement of our Lifelong Person Re-Identification via Adaptive Knowledge Accumulation in CVPR2021

LifelongReID Offical implementation of our Lifelong Person Re-Identification via Adaptive Knowledge Accumulation in CVPR2021 by Nan Pu, Wei Chen, Yu L

PeterPu 76 Dec 08, 2022
PyTorch experiments with the Zalando fashion-mnist dataset

zalando-pytorch PyTorch experiments with the Zalando fashion-mnist dataset Project Organization ├── LICENSE ├── Makefile - Makefile with co

Federico Baldassarre 31 Sep 25, 2021
Repository aimed at compiling code, papers, demos etc.. related to my PhD on 3D vision and machine learning for fruit detection and shape estimation at the university of Lincoln

PhD_3DPerception Repository aimed at compiling code, papers, demos etc.. related to my PhD on 3D vision and machine learning for fruit detection and s

lelouedec 2 Oct 06, 2022
Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization This is an official implementation in PyTorch of AFSD. Our paper

Tencent YouTu Research 146 Dec 24, 2022
Code for the paper "Multi-task problems are not multi-objective"

Multi-Task problems are not multi-objective This is the code for the paper "Multi-Task problems are not multi-objective" in which we show that the com

Michael Ruchte 5 Aug 19, 2022
Dynamical movement primitives (DMPs), probabilistic movement primitives (ProMPs), spatially coupled bimanual DMPs.

Movement Primitives Movement primitives are a common group of policy representations in robotics. There are many different types and variations. This

DFKI Robotics Innovation Center 63 Jan 06, 2023
The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue.

The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue. How do I cite D-REX? For now, cite

Alon Albalak 6 Mar 31, 2022
Aspect-Sentiment-Multiple-Opinion Triplet Extraction (NLPCC 2021)

The code and data for the paper "Aspect-Sentiment-Multiple-Opinion Triplet Extraction" Requirements Python 3.6.8 torch==1.2.0 pytorch-transformers==1.

慢半拍 5 Jul 02, 2022
《Fst Lerning of Temporl Action Proposl vi Dense Boundry Genertor》(AAAI 2020)

Update 2020.03.13: Release tensorflow-version and pytorch-version DBG complete code. 2019.11.12: Release tensorflow-version DBG inference code. 2019.1

Tencent 338 Dec 16, 2022
Oscar and VinVL

Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks VinVL: Revisiting Visual Representations in Vision-Language Models Updates

Microsoft 938 Dec 26, 2022
Synthetic structured data generators

Join us on What is Synthetic Data? Synthetic data is artificially generated data that is not collected from real world events. It replicates the stati

YData 850 Jan 07, 2023