Improving Factual Consistency of Abstractive Text Summarization

Overview

Improving Factual Consistency of Abstractive Text Summarization

We provide the code for the papers:

  1. "Entity-level Factual Consistency of Abstractive Text Summarization", EACL 2021.
    • We provide a set of new metrics to quantify the entity-level factual consistency of generated summaries. We also provide code for the two methods in our paper:
      • JAENS: joint entity and summary generation, and
      • Summary-worthy entity classification with summarization (multi-task learning)
  2. "Improving Factual Consistency of Abstractive Summarization via Question Answering", ACL-IJCNLP 2021
    • QUALS, a new automatic metric for factual consistency.
    • CONSEQ, a new contrastive learning algorithm for Seq2seq models to optimize sequence level objectives such as QUALS.

Our code is based on the fairseq library and we added support for model training on Sagemaker.

Requirements and setup

  • python==3.6: conda create -n entity_fact python=3.6
  • pytorch==1.4.0: pip install torch==1.4.0 torchvision==0.5.0
  • run pip install --editable ./
  • install file2rouge following instructions here
  • download en_core_web_lg: python -m spacy download en_core_web_lg

Entity-level Factual Consistency of Abstractive Text Summarization

Data preprocessing:

We provide three options to preprocess summarization data through the filter_level option.

  • filter_level=0: no special processing
  • filter_level=1: remove corruption text in source articles and summaries. (Undesirable texts included as a result of imperfect data collection. e.g. "Share this with Email, Facebook, Messenger". Undesirable summaries such as "Collection of all USATODAY.com coverage of People, including articles, videos, photos, and quotes.")
  • filter_level=2: entity hallucination filtering in addition to corruption text removal. A summary sentence is removed if it contains a named entity not in the source document.

XSUM:

  1. Follow the instructions here to download and extract text from HTML files and establish the xsum-extracts-from-downloads directory.
  2. Let be the directory that contains the xsum-extracts-from-downloads directory and XSum-TRAINING-DEV-TEST-SPLIT-90-5-5.json.
  3. Run python preprocess/data_prepro_clean.py --mode preprocess_xsum --input_dir --output_dir /processed-data --filter_level 0 with filter_level set to 0, 1, or 2.

CNNDM:

  1. Download and unzip the stories directories from here for both CNN and Daily Mail. Put all .story files in a directory /raw_stories .
  2. Download the url files mapping_train.txt, mapping_test.txt and mapping_valid.txt from here to .
  3. Run python preprocess/data_prepro_clean.py --mode preprocess_cnndm --input_dir --output_dir /processed-data --filter_level 0 with filter_level set to 0, 1, or 2.

NEWSROOM:

  1. Download the datasets following instructions from here.
  2. Run python preprocess/data_prepro_clean.py --mode preprocess_newsroom --input_dir --output_dir /processed-data --filter_level 0 with filter_level set to 0, 1, or 2.

Tokenize and binarize the data:

Download bpe encoder.json, vocabulary and fairseq dictionary to a directory, say ; then tokenize and binarize the data.

wget -O <bpe-dir>/encoder.json 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json'
wget -O <bpe-dir>/vocab.bpe 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe'
wget -N <bpe-dir>/dict.txt' https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt'

cd preprocess
python data_prepro_clean.py --mode bpe_binarize --input_dir <processed-data-dir> --tokenizer_dir <bpe-dir>

This generates the binary input files as well as the dictionaries under /data_bin for fairseq.

JAENS: Joint Entity and Summary Generation

The idea is to train the seq2seq model to generate .

1. prepare the train/val data by generating entity augmented targets:

python preprocess/create_entity_classification_labels.py --base_dir <processed-data-dir> --type entity_augment --tokenizer_dir <bpe-dir>

Binarize the augmented targets: cd preprocess

python data_prepro_clean.py --mode binarize --input_dir 
   
    /entity_augment --tokenizer_dir 
    

    
   

Since we already binarized the source documents, we just need to create symbolic links to put all binary input files together for fairseq training:

ln -s <processed-data-dir>/data_bin/train.source-target.source.idx <processed-data-dir>/entity_augment/data_bin/train.source-target.source.idx
ln -s <processed-data-dir>/data_bin/train.source-target.source.bin <processed-data-dir>/entity_augment/data_bin/train.source-target.source.bin
ln -s <processed-data-dir>/data_bin/valid.source-target.source.bin <processed-data-dir>/entity_augment/data_bin/valid.source-target.source.bin
ln -s <processed-data-dir>/data_bin/valid.source-target.source.idx <processed-data-dir>/entity_augment/data_bin/valid.source-target.source.idx

2. Fine-tune the BART-large model on the generated data:

Run the launch scripts scripts/launch_xsum.py, scripts/launch_cnndm.py or scripts/launch_newsroom.py to fine-tune the BART-large model. Note you need to modify the following in the scripts:

  • hyperparameters.
  • train_path: location of the binary input files. e.g. /entity_augment/data_bin .
  • init_path: location of the pre-trained BART-large model checkpoint. Please rename the checkpoint to pretrained_model.pt
  • output_path: location for the model outputs.

If training locally, you need to specify ngpus - the number of GPUS in the local machine. Example command:

python scripts/launch_xsum.py --datatype ner_filtered --epoch 8 --exp_type local

If training on Sagemaker, you need to specify the docker image name (image_name) as well as execution role (role). To create Sagemaker docker container and push to ECR:

./build_and_push.sh 
   

   

To launch training job:

python scripts/launch_xsum.py --datatype ner_filtered --epoch 8 --exp_type sagemaker

3. Generate the summaries from the fine-tuned models:

preprocess/multi_gpu_generate.py is used to generate summaries.

Since the JAENS models generates the named entities before the summaries, we need to remove the named entities before evaluating the summaries. Example command:

python evaluate_hypo.py --mode remove_ent_from_hypo --base_dir 
   
     --sub_dir 
    
      --split val --pattern .*.hypo

    
   

4. To evaluate the generated summaries for ROUGE as well as entity level factual scores:

We use the tokenizer from Stanford CoreNLP package. Example command:

export CLASSPATH=path/to/stanford-corenlp-full-2018-10-05/stanford-corenlp-3.9.2.jar
python evaluate_hypo.py --mode evaluate_summary --base_dir <output-dir> --sub_dir <output-sub-dir> --split val --pattern .*.hypo

See preprocess/run_*_eval.sh for examples.

Summary-worthy entity classification with summarization (multi-task learning)

We perform summary-worthy entity classification at a classification head on the encoder while keeping the seq2seq objective at the decoder. For training, we need to preprocess the input document by create B-I-O labels to identify summary-worthy entities:

python create_entity_classification_labels.py --base_dir 
   
     --type cls_labels --tokenizer_dir 
    
     
python data_prepro_clean.py --mode binarize_cls_labels --input_dir 
     
       --output_dir 
      
       /data_bin --tokenizer_dir 
        
       
      
     
    
   

Launch training jobs using scripts scripts/launch_multitask_*.py.

Improving Factual Consistency of Abstractive Summarization via Question Answering

QUALS (QUestion Answering with Language model score for Summarization)

To evaluate the QUALS of summaries (e.g. test.target) given original input (e.g. test.source), we execute the following steps in the preprocess sub-directory.

0. Prepare summaries into jsonl format

python evaluate_hypo.py --mode convert_hypo_to_json --base_dir 
   
     --sub_dir 
    
      --split test --pattern .target

    
   

1. Generating question and answer pairs from summaries

python sm_inference_asum.py --task gen_qa --base_dir 
   
     --source_dir 
    
      --output_dir 
     
       --num_workers 
      
        --bsz 5 --beam 60 --max_len 60 --min_len 8 --checkpoint_dir 
       
         --ckp_file checkpoint2.pt --bin_dir 
        
         /data_bin --diverse_beam_groups 60 --diverse_beam_strength 0.5 --batch_lines True --input_file test.target.hypo --return_token_scores True 
        
       
      
     
    
   

Here, we use diverse beam search to generate 60 question-answer pairs for each summary. The batch_lines option is set to True to batch bsz input summaries together for efficient generation. The QAGen model is trained by fine-tuning BART on the SQuAD and NewsQA datasets by concatenating the question-answer pairs using a separator.

To train the QAGen model, place the dev-v1.1.json and train-v1.1.json of SQuAD and the combined-newsqa-data-v1.json of the NewsQA under . The following command generates the binarized input for fine-tuning BART using Fairseq.

python data_prepro_clean.py --mode newsqa_squad_prepro --input_dir 
   
     --output_dir 
    

    
   

You can also download our trained QAGen model from s3 by running:

aws s3 cp s3://fact-check-summarization/newsqa-squad-qagen-checkpoint/checkpoint2.pt 
   
    /

   

Alternatively, you can download here if you don't have awscli.

2. Filter the generated question and answer for high quality pairs

python evaluate_hypo.py --mode filter_qas_dataset_lm_score --base_dir 
   
     --sub_dir 
    
      --pattern test.target.hypo.beam60.qas

    
   

3. Evaluate the generated question and answer pairs using the source document as input

python sm_inference_asum.py --task qa_eval --base_dir 
   
     --output_dir 
    
      --num_workers 
     
       --bsz 30 --checkpoint_dir 
      
        --ckp_file checkpoint2.pt --bin_dir 
       
        /data_bin --qas_dir 
        
          --source_file test.source --target_file test.target --input_file test.target.qas_filtered --prepend_target False 
        
       
      
     
    
   

4. Compute QUALS scores for each summary

python evaluate_hypo.py --mode compute_hypos_lm_score --base_dir 
   
     --sub_dir 
    
      --pattern test.*.source_eval_noprepend

    
   

CONSEQ (CONtrastive SEQ2seq learning)

To use QUALS to improve factual consistency of the summarization model using the CONSEQ algorithm, we follow the steps:

  1. Obtain the MLE summarization baseline by fine-tuning the BART model. Note that in the ACL paper, we used the corruption-filtered CNNDM and XSUM datasets (filter_level=1).
  2. Use the MLE summarization model to sample summaries on the training data.
  3. Evaluate the QUALS for the generated summaries as well as the ground truth summaries of the training data.
  4. Form the positive and negative sets for contrastive learning.
  5. Fine-tune the MLE summarization model using the positive and negative examples. Example scripts for launching training jobs locally or on Sagemaker are preprocess/run_generate_unlikelihood_train_cnndm.sh and preprocess/run_generate_unlikelihood_train_xsum.sh.

We provide an example script preprocess/run_generate_unlikelihood_train_xsum.sh to illustrate steps 2-4. Note that

  • To avoid running for a long time and encountering OOM errors and then restarting the whole process, we split the input files into smaller ones. We do this by splitting the source file by line (e.g. each sub-file has 10000 lines):
split -l 10000 train.source train.source.split   
  • The script has to make repeated calls of python sm_inference_asum.py --task gen_qa to generate question-ansewr pairs, as many times as there are sub-files as a result of line splits. The python function automatically checks which sub-files have been processed (based on output files) so it always processes the next available sub-file. If all sub-files have been processed, it will simply do nothing so it's safe if it's called more times than there are available sub-files.
  • Similarly, sm_inference_asum.py --task qa_eval needs to be repeated called to cover all sub-files.
  • The speed of question-answer pairs generation depends on the batch size setting. Depending on the summary file and the batch_lines setting, batching is handled differently. If the summary file contains only a single summary per input document, batch_lines should be set to True and bsz number of input lines are batched together as input to the QAGen model. If the summary file contains multiple summaries per input document, batch_lines should be set to False and the batching is done using bsz within each input example (line). For example, if there are 6 summaries per line in the summary file, we should set batch_lines to False; setting bsz to 7 will batch all the 7 summaries in a line together, which gives the best speed. (setting it higher won't improve speed since we do not do batching over different lines of input as batch_lines is False). On CNN-DM, bsz of 7 would sometimes result in OOM errors with 16G GPU memory so I use 3 or 4; on XSUM, it is safe to use 7.
  • num_workers should be the number of GPUs available on the machine. The lines in each input files will be distributed per GPU.
  • Finally, run the following to concatenate the QUALS scores from the sub-files: cat *.quals > train.source.source_eval_noprepend.quals

Citations

@inproceedings{nan-etal-2021-entity,
    title = "Entity-level Factual Consistency of Abstractive Text Summarization",
    author = "Nan, Feng  and
      Nallapati, Ramesh  and
      Wang, Zhiguo  and
      Nogueira dos Santos, Cicero  and
      Zhu, Henghui  and
      Zhang, Dejiao  and
      McKeown, Kathleen  and
      Xiang, Bing",
    booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.eacl-main.235",
    pages = "2727--2733",
}

@inproceedings{nan-etal-2021-improving,
    title = {Improving Factual Consistency of Abstractive Summarization via Question Answering},
    author = "Nan, Feng  and
      Nogueira dos Santos, Cicero  and
      Zhu, Henghui  and
      Ng, Patrick  and
      McKeown, Kathleen  and
      Nallapati, Ramesh  and
      Zhang, Dejiao  and
      Wang, Zhiguo  and
      Arnold, Andrew  and
      Xiang, Bing",
    booktitle = {Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP)},
    address = {Virtual},
    month = {August},
    url = {https://arxiv.org/abs/2105.04623},
    year = {2021}
}
HairCLIP: Design Your Hair by Text and Reference Image

Overview This repository hosts the official PyTorch implementation of the paper: "HairCLIP: Design Your Hair by Text and Reference Image". Our single

322 Jan 06, 2023
An experiment to bait a generalized frontrunning MEV bot

Honeypot 🍯 A simple experiment that: Creates a honeypot contract Baits a generalized fronturnning bot with a unique transaction Analyze bot behaviour

0x1355 14 Nov 24, 2022
PyTorch Implementation of ECCV 2020 Spotlight TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images

TuiGAN-PyTorch Official PyTorch Implementation of "TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images" (ECCV 2020 Spotligh

181 Dec 09, 2022
Revisiting Temporal Alignment for Video Restoration

Revisiting Temporal Alignment for Video Restoration [arXiv] Kun Zhou, Wenbo Li, Liying Lu, Xiaoguang Han, Jiangbo Lu We provide our results at Google

52 Dec 25, 2022
Competitive Programming Club, Clinify's Official repository for CP problems hosting by club members.

Clinify-CPC_Programs This repository holds the record of the competitive programming club where the competitive coding aspirants are thriving hard and

Clinify Open Sauce 4 Aug 22, 2022
Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'

Filtration Curves for Graph Representation This repository provides the code from the KDD'21 paper Filtration Curves for Graph Representation. Depende

Machine Learning and Computational Biology Lab 16 Oct 16, 2022
DC540 hacking challenge 0x00005a.

dc540-0x00005a DC540 hacking challenge 0x00005a. PROMOTIONAL VIDEO - WATCH NOW HERE ON YOUTUBE CRITICAL PART 5A VIDEO - WATCH NOW HERE ON YOUTUBE Prio

Kevin Thomas 3 May 09, 2022
Pytorch code for semantic segmentation using ERFNet

ERFNet (PyTorch version) This code is a toolbox that uses PyTorch for training and evaluating the ERFNet architecture for semantic segmentation. For t

Edu 394 Jan 01, 2023
"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

FGVC8 Exploring Vision Transformers for Fine-grained Classification paper presented at the CVPR 2021, The Eight Workshop on Fine-Grained Visual Catego

Marcos V. Conde 19 Dec 06, 2022
Technical experimentations to beat the stock market using deep learning :chart_with_upwards_trend:

DeepStock Technical experimentations to beat the stock market using deep learning. Experimentations Deep Learning Stock Prediction with Daily News Hea

Keon 449 Dec 29, 2022
Exploring the Dual-task Correlation for Pose Guided Person Image Generation

Dual-task Pose Transformer Network The source code for our paper "Exploring Dual-task Correlation for Pose Guided Person Image Generation“ (CVPR2022)

63 Dec 15, 2022
robomimic: A Modular Framework for Robot Learning from Demonstration

robomimic [Homepage]   [Documentation]   [Study Paper]   [Study Website]   [ARISE Initiative] Latest Updates [08/09/2021] v0.1.0: Initial code and pap

ARISE Initiative 178 Jan 05, 2023
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Overview PyTorch 0.4.1 | Python 3.6.5 Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein g

Shayne O'Brien 471 Dec 16, 2022
Implementation of Kronecker Attention in Pytorch

Kronecker Attention Pytorch Implementation of Kronecker Attention in Pytorch. Results look less than stellar, but if someone found some context where

Phil Wang 16 May 06, 2022
Code for Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Piggyback: https://arxiv.org/abs/1801.06519 Pretrained masks and backbones are available here: https://uofi.box.com/s/c5kixsvtrghu9yj51yb1oe853ltdfz4q

Arun Mallya 165 Nov 22, 2022
(CVPR 2021) PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds by Mutian Xu*, Runyu Ding*, Hengshuang Zhao, and Xiaojuan Qi. Int

CVMI Lab 228 Dec 25, 2022
Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

The Stem Cell Hypothesis Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP

Emory NLP 5 Jul 08, 2022
Repository for "Exploring Sparsity in Image Super-Resolution for Efficient Inference", CVPR 2021

SMSR Reposity for "Exploring Sparsity in Image Super-Resolution for Efficient Inference" [arXiv] Highlights Locate and skip redundant computation in S

Longguang Wang 225 Dec 26, 2022
An SE(3)-invariant autoencoder for generating the periodic structure of materials

Crystal Diffusion Variational AutoEncoder This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic st

Tian Xie 94 Dec 10, 2022
Referring Video Object Segmentation

Awesome-Referring-Video-Object-Segmentation Welcome to starts ⭐ & comments 💹 & sharing 😀 !! - 2021.12.12: Recent papers (from 2021) - welcome to ad

Explorer 57 Dec 11, 2022