A New Approach to Overgenerating and Scoring Abstractive Summaries

Last update: Apr 03, 2022

We provide the source code for the paper "A New Approach to Overgenerating and Scoring Abstractive Summaries" accepted at NAACL'21. If you find the code useful, please cite the following paper.

@inproceedings{song2021new, 
    title={A New Approach to Overgenerating and Scoring Abstractive Summaries},
    author={Song, Kaiqiang and Wang, Bingqing and Feng, Zhe and Liu, Fei},
    booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
    pages={1392--1404},
    year={2021}
}

Presentation Video

Demo

Source Input:

The Bank of Japan appealed to financial markets to remain calm Friday following the US decision to order Daiwa Bank Ltd. to close its US operations.

Summaries with varying lengths:

Dependencies

The code is written in Python (v3.7+) and PyTorch (v1.7+). We suggest the following environment:

A Linux machine (Ubuntu) with GPU
Python (v3.7+)
PyTorch (v1.7+)
Pyrouge
transformers (v2.3.0)

HINT: Hugging Face transformers changes quickly, so you may need to modify a lot of code if you want to use a newer version. Contact me if you run into any trouble with it.

To install pyrouge and transformers, run the command below:

pip install pyrouge transformers==2.3.0
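
After installing, a quick version check can catch mismatches before a long run. The sketch below is not part of the repository; `version_tuple` and `check_environment` are hypothetical helper names, and the only thresholds it encodes are the ones listed above (Python 3.7+, PyTorch 1.7+, transformers 2.3.0). It does not check pyrouge.

```python
import re
import sys


def version_tuple(version):
    """Parse '1.7.1+cu110' into (1, 7, 1), ignoring local build suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment():
    """Return a list of problems; an empty list means the versions look fine."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or newer is required")
    try:
        import torch
        if version_tuple(torch.__version__) < (1, 7):
            problems.append("PyTorch 1.7 or newer is required")
    except ImportError:
        problems.append("PyTorch is not installed")
    try:
        import transformers
        if version_tuple(transformers.__version__) != (2, 3, 0):
            problems.append("transformers 2.3.0 is expected")
    except ImportError:
        problems.append("transformers is not installed")
    return problems
```

Calling `check_environment()` returns human-readable messages for anything that does not match the list above; an empty list means the installed versions match.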

For generating summaries with varying length

Step 1: Clone this repo. Download our trained model, move it to the working folder, and uncompress it.

git clone https://github.com/ucfnlp/varying-length-summ.git
mv models.zip varying-length-summ
cd varying-length-summ
unzip models.zip

Step 2: Generate summaries of varying lengths from a raw input file.

python run.py --do_test --parallel --input data/input.txt

This generates summaries of varying lengths, together with their order information.

For Selecting summaries with best quality binary classifier

Step 1: Follow the section above on generating summaries with varying length.

Step 2: Prepare a test set in the same format as the data/gigaword_cls/test500* files:

a source input file test500_input.txt
a target output file test500_output.txt
a label file test500_label.txt indicating whether the target summary is admissible for the source input (use all 0s if you don't have those labels)

HINT: one instance per line
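
The three files can be written programmatically. The sketch below is an illustration, not code from the repository: `write_test_set` is a hypothetical helper, and it assumes that line i of each file describes the same instance. It fills the label file with 0 when no labels are given and collapses whitespace so that every instance stays on one line.

```python
import os


def write_test_set(sources, summaries, out_dir, labels=None, prefix="test500"):
    """Write <prefix>_input.txt, <prefix>_output.txt and <prefix>_label.txt."""
    if labels is None:
        # No admissibility labels available: use 0 for every instance.
        labels = [0] * len(sources)
    if not (len(sources) == len(summaries) == len(labels)):
        raise ValueError("sources, summaries and labels must have equal length")

    os.makedirs(out_dir, exist_ok=True)
    columns = {"input": sources, "output": summaries, "label": labels}
    paths = {}
    for name, values in columns.items():
        path = os.path.join(out_dir, f"{prefix}_{name}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            for value in values:
                # Collapse newlines and repeated spaces: one instance per line.
                handle.write(" ".join(str(value).split()) + "\n")
        paths[name] = path
    return paths
```

The returned dictionary maps "input", "output" and "label" to the paths that were written, which can then be referenced from the dataset settings.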

Step 3: Modify the test500 settings in settings/dataset/gigaword_cls.

Step 4: Run the code below.

python run_classifier.py --do_test --parallel

This writes the predicted admissibility probabilities to predict.txt.
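
One way to use these probabilities is to keep, for each source input, the candidate summary with the highest predicted probability. The sketch below makes several assumptions that are not confirmed by the repository: predict.txt holds one probability per line, aligned with the lines of the input and output files, and every candidate line repeats its source text. `select_most_admissible` is a hypothetical name, and this is only one simple selection rule, not necessarily the one used in the paper.

```python
def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def select_most_admissible(sources, summaries, probabilities):
    """Map each source to its (summary, probability) pair with the top score."""
    if not (len(sources) == len(summaries) == len(probabilities)):
        raise ValueError("all three inputs must have the same number of lines")
    best = {}
    for source, summary, prob in zip(sources, summaries, probabilities):
        # Strict comparison keeps the earliest candidate when scores tie.
        if source not in best or prob > best[source][1]:
            best[source] = (summary, prob)
    return best


def select_from_files(input_path, output_path, predict_path):
    """Load the three aligned files (assumed layout) and pick the best rows."""
    probabilities = [float(line) for line in read_lines(predict_path)]
    return select_most_admissible(
        read_lines(input_path), read_lines(output_path), probabilities
    )
```

If predict.txt uses a different layout, only `select_from_files` needs to change; the selection logic works on plain lists.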

For Selecting summaries with length reward reranking method

Step 1: Follow the section above on generating summaries with varying length.

Step 2: Run the code below.

python run_rerank.py

This re-ranks the summaries with length rewards. The predicted length is written to length.txt.
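
To give a feel for what re-ranking with a length reward means, here is a minimal sketch of a bounded length reward, a common formulation in which each candidate's score is increased by a fixed reward per word, up to a predicted length. Whether run_rerank.py uses exactly this formula is an assumption; the function name, the `reward` value, and the use of whitespace tokenization are all illustrative.

```python
def rerank_with_length_reward(candidates, predicted_length, reward=1.0):
    """Sort (summary, log_prob) pairs by log_prob plus a bounded length reward.

    adjusted = log_prob + reward * min(number_of_words, predicted_length)
    """
    def adjusted(item):
        summary, log_prob = item
        # Words beyond the predicted length earn no additional reward.
        return log_prob + reward * min(len(summary.split()), predicted_length)

    return sorted(candidates, key=adjusted, reverse=True)
```

With a predicted length of 4 and a reward of 1.0, a four-word candidate with log-probability -2.5 (adjusted score 1.5) ranks above a two-word candidate with log-probability -1.0 (adjusted score 1.0), because the reward offsets the lower model score of the longer summary.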

For Data Downloading (500 inputs x 7 lengths)

Please refer to this link

Owner

Kaiqiang Song
