Code for the NAACL 2021 paper "Unsupervised Multi-hop Question Answering by Question Generation"

Overview

Unsupervised-Multi-hop-QA

This repository contains code and models for the paper: Unsupervised Multi-hop Question Answering by Question Generation (NAACL 2021).

  • We propose MQA-QG, an unsupervised question answering framework that can generate human-like multi-hop training pairs from both homogeneous and heterogeneous data sources.

  • We find that we can train a competent multi-hop QA model with only generated data. The F1 gap between the unsupervised and fully-supervised models is less than 20 on both the HotpotQA and HybridQA datasets.

  • Pretraining a multi-hop QA model with our generated data would greatly reduce the demand for human-annotated training data for multi-hop QA.

Introduction

The model first defines a set of basic operators that retrieve or generate relevant information from each input source, or aggregate information from different sources.

Afterwards, we define six reasoning graphs. Each corresponds to one type of multi-hop question and is formulated as a computation graph built upon the operators. We generate multi-hop question-answer pairs by executing the reasoning graphs.
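
As a rough illustration of the idea, a reasoning graph can be viewed as an ordered list of operator calls whose outputs feed later calls. The sketch below is not the repository's implementation: `run_graph` and the `toy_*` functions are hypothetical stand-ins (plain string functions instead of neural models), and the input data is made up.

```python
from typing import Callable, Dict, List, Tuple

# A node is (output_name, operator, input_names).
# Every operator below is a hypothetical toy, not one of the real operators.
Node = Tuple[str, Callable[..., str], List[str]]


def run_graph(nodes: List[Node], inputs: Dict[str, str]) -> Dict[str, str]:
    """Execute nodes in order, storing each output under its name."""
    values = dict(inputs)
    for name, op, arg_names in nodes:
        values[name] = op(*(values[a] for a in arg_names))
    return values


def toy_find_bridge(cells: str, passage: str) -> str:
    # Toy rule: the first '|'-separated cell that also occurs in the passage.
    return next(c.strip() for c in cells.split("|") if c.strip() in passage)


def toy_describe_entity(cells: str, entity: str) -> str:
    # Toy stand-in for DescribeEnt: a fixed template instead of GPT-2.
    others = [c.strip() for c in cells.split("|") if c.strip() != entity]
    return f"the person associated with {' and '.join(others)}"


def toy_blend(description: str, entity: str, question: str) -> str:
    # Toy blending: swap the bridge entity for its description.
    return question.replace(entity, description)


nodes = [
    ("bridge", toy_find_bridge, ["cells", "passage"]),
    ("description", toy_describe_entity, ["cells", "bridge"]),
    ("multi_hop", toy_blend, ["description", "bridge", "single_hop"]),
]
inputs = {
    "cells": "Alice Example | Bronze | 2011",           # made-up table row
    "passage": "Alice Example was born in Springfield.",  # made-up passage
    "single_hop": "Where was Alice Example born?",        # stands in for a QGwithAns output
}
result = run_graph(nodes, inputs)
print(result["multi_hop"])
```

Keeping the graph as data (a list of nodes) rather than hard-coded calls makes it easy to define one graph per question type while reusing the same operators.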

Requirements

  • Python 3.7.3
  • torch 1.7.1
  • tqdm 4.49.0
  • transformers 4.3.3
  • stanza 1.1.1
  • nltk 3.5
  • dateparser 1.0.0
  • scikit-learn 0.23.2
  • fuzzywuzzy 0.18.0
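
To confirm that an environment matches the pinned versions, a small check like the following may help. It is only a sketch: `check_versions` is a hypothetical helper, and on Python 3.7 it assumes the `importlib_metadata` backport is installed, since `importlib.metadata` ships with Python 3.8 and later.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    import importlib_metadata as metadata

# Pinned versions copied from the list above (distribution name -> version).
REQUIRED = {
    "torch": "1.7.1",
    "tqdm": "4.49.0",
    "transformers": "4.3.3",
    "stanza": "1.1.1",
    "nltk": "3.5",
    "dateparser": "1.0.0",
    "scikit-learn": "0.23.2",
    "fuzzywuzzy": "0.18.0",
}


def check_versions(required):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(REQUIRED).items():
        print(f"{name}: want {wanted}, found {installed}")
```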

Data Preparation

Make the following data directories:

mkdir -p ./Data
mkdir -p ./Data/HotpotQA
mkdir -p ./Data/HybridQA

a) HotpotQA

First, download the raw HotpotQA dataset.

HOTPOT_HOME=./Data/HotpotQA
mkdir -p $HOTPOT_HOME/raw
mkdir -p $HOTPOT_HOME/dataset
cd $HOTPOT_HOME/raw
wget http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_train_v1.1.json
wget http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json

Then, run the following code to preprocess the raw dataset.

python prep_data_hotpotQA.py \
  --train_dir $HOTPOT_HOME/raw/hotpot_train_v1.1.json \
  --dev_dir $HOTPOT_HOME/raw/hotpot_dev_distractor_v1.json \
  --output_dir $HOTPOT_HOME/dataset/

This produces the following files in ./Data/HotpotQA/dataset/:

train.src.json
train.qa.json
dev.src.json
dev.qa.json

b) HybridQA

Download all the tables and passages of HybridQA into your data folder.

HYBRID_HOME=./Data/HybridQA
cd $HYBRID_HOME
git clone https://github.com/wenhuchen/WikiTables-WithLinks

The human-annotated questions can be found here. Download train.json, dev.json, and dev_reference.json. Rename train.json to train.human.json and dev.json to dev.human.json, then put them into the ./Data/HybridQA folder.
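
The renaming step can also be scripted. The sketch below assumes the question files were downloaded into some local folder; `stage_hybridqa_questions` and its `download_dir` argument are hypothetical names, and dev_reference.json is left for you to place as needed.

```python
from pathlib import Path
import shutil

# Mapping taken from the instructions above: downloaded name -> expected name.
RENAMES = {"train.json": "train.human.json", "dev.json": "dev.human.json"}


def stage_hybridqa_questions(download_dir, hybrid_home="./Data/HybridQA"):
    """Copy the downloaded question files into hybrid_home under new names."""
    src, dst = Path(download_dir), Path(hybrid_home)
    dst.mkdir(parents=True, exist_ok=True)
    staged = []
    for old, new in RENAMES.items():
        shutil.copyfile(src / old, dst / new)  # raises if the download is missing
        staged.append(dst / new)
    return staged
```

Copying instead of moving keeps the original downloads intact in case the step has to be repeated.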

Operators

The following code tests our key operators: QGwithAns and DescribeEnt.

a) QGwithAns

QGwithAns generates a single-hop question Q with answer A from the input text D. We implement this module based on the pretrained QG model from patil-suraj, a Google T5 model fine-tuned on the SQuAD 1.1 dataset.

You can test this module by running the following Python code:

from MQA_QG.Operators import T5_QG

test_passage = '''Jenson Alexander Lyons Button (born 19 January 1980) is a British racing driver and former Formula One driver. He won the 2009 Formula One World Championship, driving for Brawn GP.'''

nlp = T5_QG.pipeline("question-generation", model='valhalla/t5-base-qg-hl', qg_format="highlight")

print(nlp.qg_without_answer(test_passage))
print(nlp.qg_with_answer_text(test_passage, "19 January 1980"))

b) DescribeEnt

DescribeEnt generates a sentence S that describes the given entity E based on the information in the table T. We implement this using the GPT-TabGen model (Chen et al., 2020a). The model first uses a template to flatten the table T into a document PT and then feeds PT to the pre-trained GPT-2 model to generate the output sentence S.

We fine-tune the GPT-2 model on the ToTTo dataset (Parikh et al., 2020), a large-scale dataset for controlled table-to-text generation. Our fine-tuned model can be downloaded here. After downloading the fine-tuned model, put it under the Pretrained_Models directory. You can then test this module by running the following Python code:

from MQA_QG.Operators.Table_to_Text import get_GPT2_Predictor

predictor = get_GPT2_Predictor('./Pretrained_Models/table2text_GPT2_medium_ep9.pt', num_samples = 3)
flattened_table = '''The table title is Netherlands at the European Track Championships . The Medal is Bronze . The Championship is 2011 Apeldoorn . The Name is Kirsten Wild . The Event is Women's omnium . Start describing Kirsten Wild : '''
results = predictor.predict_output(flattened_table)
print(results)
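
The flattened input above follows a simple template. A minimal sketch of how a table row might be flattened into that form is shown below. `flatten_row` is a hypothetical helper inferred from the example string, not the function used in the repository.

```python
def flatten_row(title: str, row: dict, entity: str) -> str:
    """Flatten one table row into the template seen in the example above."""
    parts = [f"The table title is {title} ."]
    # One short sentence per (column, value) pair, in column order.
    parts += [f"The {col} is {val} ." for col, val in row.items()]
    # The trailing prompt tells the generator which entity to describe.
    parts.append(f"Start describing {entity} : ")
    return " ".join(parts)


row = {
    "Medal": "Bronze",
    "Championship": "2011 Apeldoorn",
    "Name": "Kirsten Wild",
    "Event": "Women's omnium",
}
flattened = flatten_row(
    "Netherlands at the European Track Championships", row, "Kirsten Wild"
)
print(flattened)
```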

Multi-hop Question Generation

After preparing the data and testing the operators, you can generate different types of multi-hop questions from (table, passage) pairs in HybridQA or from passages in HotpotQA. Configure your experimental setting in MQA_QG/config.py as follows:

###### Global Settings
EXPERIMENT = 'HybridQA' # The experiment you want to run, choose 'HotpotQA' or 'HybridQA'
QG_DEVICE = 5  # gpu device to run the QG module
BERT_DEVICE = 3 # gpu device to run the BERT module
TABLE2TEXT_DEVICE = 3 # gpu device to run the Table2Text module
QUESTION_TYPE = 'table2text' # the type of question you want to generate
# for hybridQA, the options are: 'table2text', 'text2table', 'text_only', 'table_only'
# for hotpotQA, the options are: 'text2text', 'comparison'
QUESTION_NUM = 3 # the number of questions to generate for each input

###### User-specified data directory
DATA_PATH = '../Data/HybridQA/WikiTables-WithLinks/' # root data directory, '../Data/HybridQA/WikiTables-WithLinks/' for HybridQA; '../Data/HotpotQA/dataset/train.src.txt' for HotpotQA
OUTPUT_PATH = '../Outputs/train_table_to_text.json' # the json file to store the generated questions
DATA_RANGE = [0, 20] # for debug use: the range of the dataset you considered (use [0, -1] to use the full dataset)
Table2Text_Model_Path = '../Pretrained_Models/table2text_GPT2_medium_ep9.pt' # the path to the pretrained Table2Text model

Key parameters:

  • EXPERIMENT: the dataset you want to generate questions from, choose 'HotpotQA' or 'HybridQA'.
  • QG_DEVICE, BERT_DEVICE, TABLE2TEXT_DEVICE: the GPU devices used to run the QG module, BERT module, and Table2Text module, respectively.
  • QUESTION_TYPE: the type of question you want to generate. There are 6 different types of questions you can generate. For hybridQA, the options are: 'table2text', 'text2table', 'text_only', 'table_only'. For hotpotQA, the options are: 'text2text', 'comparison'.
  • QUESTION_NUM: the number of questions to generate for each input.
  • DATA_PATH: root data directory, the defaults are: '../Data/HybridQA/WikiTables-WithLinks/' for HybridQA; '../Data/HotpotQA/dataset/train.src.txt' for HotpotQA.
  • OUTPUT_PATH: the JSON file to store the generated questions.
  • Table2Text_Model_Path: the path to the pretrained Table2Text model.
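
For comparison, a HotpotQA configuration might look like the sketch below. The device indices and the output file name are illustrative assumptions, and `validate_config` is a hypothetical helper (not part of MQA_QG/config.py) that checks the question type against the options listed above.

```python
###### Example settings for HotpotQA (values are illustrative)
EXPERIMENT = 'HotpotQA'
QG_DEVICE = 0                 # assumed GPU index
BERT_DEVICE = 0               # assumed GPU index
QUESTION_TYPE = 'text2text'   # or 'comparison'
QUESTION_NUM = 3
DATA_PATH = '../Data/HotpotQA/dataset/train.src.txt'   # default quoted above
OUTPUT_PATH = '../Outputs/train.hotpot.generated.json'  # assumed file name
DATA_RANGE = [0, 20]          # small range for a first debugging run

# Question types allowed for each experiment, as listed in the README.
VALID_TYPES = {
    'HybridQA': {'table2text', 'text2table', 'text_only', 'table_only'},
    'HotpotQA': {'text2text', 'comparison'},
}


def validate_config(experiment, question_type):
    """Raise ValueError if the question type does not fit the experiment."""
    if experiment not in VALID_TYPES:
        raise ValueError(f"unknown experiment: {experiment!r}")
    if question_type not in VALID_TYPES[experiment]:
        raise ValueError(
            f"{question_type!r} is not a valid question type for {experiment}"
        )


validate_config(EXPERIMENT, QUESTION_TYPE)
```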

After configuration, run the following python code to generate multi-hop questions.

cd MQA_QG
python run_multihop_generation.py

A sample generated (question, answer) pair for HybridQA:

{
  "table_id": "\"Weird_Al\"_Yankovic_0",
  "question": "In what film did the Dollmaker play the role of Batman?",
  "answer-text": "Batman vs. Robin",
  "answer-node": [
    [
      "Batman vs. Robin",
      [
        12,
        1
      ],
      "/wiki/Batman_vs._Robin",
      "table"
    ]
  ],
  "question_id": "6",
  "where": "table",
  "question_postag": "IN WDT NN VBD DT NN VB DT NN IN NNP ."
}

A sample of generated (question, answer) pairs for HotpotQA:

{
  "passage_id": "5a70f0c05542994082a3e404",
  "ques_ans": [
    {
      "question": "When did the name that is the nickname of Baz Ashmawy begin filming Culture Clash?",
      "answer": "September 2008"
    },
    {
      "question": "How did the book that is the nickname of Baz Ashmawy travel to film Culture Clash?",
      "answer": "travelled the world"
    },
    {
      "question": "What is the common name of the song that is the name of Bazil Ashmawy 's first television show?",
      "answer": "Baz Ashmawy"
    }
  ]
}
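
Each HotpotQA record groups several generated questions under one passage. The sketch below flattens such a record into (passage_id, question, answer) triples. It assumes only the field names visible in the sample above, uses a shortened copy of that sample, and `iter_qa_pairs` is a hypothetical helper.

```python
import json

# Shortened copy of the HotpotQA sample shown above.
sample = json.loads("""
{
  "passage_id": "5a70f0c05542994082a3e404",
  "ques_ans": [
    {
      "question": "When did the name that is the nickname of Baz Ashmawy begin filming Culture Clash?",
      "answer": "September 2008"
    },
    {
      "question": "How did the book that is the nickname of Baz Ashmawy travel to film Culture Clash?",
      "answer": "travelled the world"
    }
  ]
}
""")


def iter_qa_pairs(record):
    """Yield (passage_id, question, answer) for every generated pair."""
    for qa in record["ques_ans"]:
        yield record["passage_id"], qa["question"], qa["answer"]


pairs = list(iter_qa_pairs(sample))
for passage_id, question, answer in pairs:
    print(passage_id, "|", question, "->", answer)
```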

(Optional) You can then rank the generated questions by their perplexity (PPL) under the pretrained GPT-medium model by running the following command:

python run_ppl_ranking.py \
  --input_dir ../Outputs/train_text_to_table.json \
  --output_dir ../Outputs/PPL_rank_train_text_to_table.json
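
For reference, perplexity is the exponential of the negative mean token log-probability, so lower values indicate more fluent text under the language model. The sketch below shows only the arithmetic and the ranking. The log-probabilities are made-up numbers rather than real model outputs, and `perplexity` and `rank_by_ppl` are hypothetical helpers, not the contents of run_ppl_ranking.py.

```python
import math


def perplexity(token_logprobs):
    """PPL = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rank_by_ppl(scored_questions):
    """Sort (question, token_logprobs) pairs from lowest to highest PPL."""
    return sorted(scored_questions, key=lambda item: perplexity(item[1]))


# Made-up log-probabilities, for illustration only.
candidates = [
    ("question A", [math.log(0.1)] * 5),
    ("question B", [math.log(0.5)] * 5),
]
ranked = rank_by_ppl(candidates)
print([question for question, _ in ranked])
```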

Unsupervised Multi-hop QA

a) HotpotQA

We use SpanBERT (Joshi et al., 2020) as the QA model for HotpotQA.

Data Preparation

First, in the project root directory, run the following scripts to prepare the data.

# Prepare the human-labeled training set
python Multihop_QA/HotpotQA/prepare_qa_data.py \
  --src_path ./Data/HotpotQA/dataset/train.src.json \
  --qa_path ./Data/HotpotQA/dataset/train.qa.json \
  --output_path ./Multihop_QA/HotpotQA/data/train.human.json

# Prepare the human-labeled dev set
python Multihop_QA/HotpotQA/prepare_qa_data.py \
  --src_path ./Data/HotpotQA/dataset/dev.src.json \
  --qa_path ./Data/HotpotQA/dataset/dev.qa.json \
  --output_path ./Multihop_QA/HotpotQA/data/dev.human.json

# Prepare the generated training set
# (save the questions generated in the previous multi-hop QG step as `train.hotpot.generated.json`)
python Multihop_QA/HotpotQA/prepare_qa_data.py \
  --src_path ./Data/HotpotQA/dataset/train.src.json \
  --qa_path ./Data/HotpotQA/dataset/train.hotpot.generated.json \
  --output_path ./Multihop_QA/HotpotQA/data/train.generated.json

This will create three datasets in the ./Multihop_QA/HotpotQA/data/ directory:

  • train.human.json: the human-labeled HotpotQA training set (90442 samples).
  • dev.human.json: the human-labeled HotpotQA validation set (7405 samples).
  • train.generated.json: the QA pairs generated by our MQA-QG model.

You could skip this data preparation process by directly downloading the above three files here.

Model Training

In the ./Multihop_QA/HotpotQA/ folder, run bash train.sh to train the SpanBERT QA model. Here is an example configuration of train.sh:

#!/bin/bash
set -x

DATAHOME=./data
MODELHOME=./outputs/supervised

mkdir -p ${MODELHOME}

export CUDA_VISIBLE_DEVICES=2

python code/run_mrqa.py \
  --do_train \
  --do_eval \
  --model spanbert-large-cased \
  --train_file ${DATAHOME}/train.human.json \
  --dev_file ${DATAHOME}/dev.human.json \
  --train_batch_size 32 \
  --eval_batch_size 32 \
  --gradient_accumulation_steps 8 \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --eval_per_epoch 10 \
  --output_dir ${MODELHOME}

There are two typical settings:

  • Supervised QA Setting: train the SpanBERT model on the human-labeled training set (train.human.json) and then evaluate the performance on the human-labeled validation set (dev.human.json).

  • Unsupervised QA Setting: train the SpanBERT model on the generated training set (train.generated.json) and then evaluate the performance on the human-labeled validation set (dev.human.json).
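
The `--max_seq_length` and `--doc_stride` flags in train.sh control how passages longer than the model's input limit are split into overlapping windows. The sketch below follows the sliding-window scheme commonly used in BERT-style QA scripts. It is an illustration rather than code from run_mrqa.py, and `sliding_windows` and its parameters are hypothetical names. The per-window token budget is smaller than `max_seq_length`, because the question and special tokens also occupy positions.

```python
def sliding_windows(num_doc_tokens, max_tokens_per_window, doc_stride):
    """Return (start, length) spans that cover a document with overlap."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_per_window)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the document
        start += min(length, doc_stride)
    return spans


# A 1000-token passage with room for 500 passage tokens per window.
print(sliding_windows(1000, 500, 128))
```

A smaller stride produces more windows and more overlap, which raises the chance that an answer span falls fully inside at least one window, at the cost of more computation.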

Evaluation

In the ./Multihop_QA/HotpotQA/ folder, run bash evaluate.sh to evaluate the SpanBERT QA model. Here is an example configuration of evaluate.sh:

set -x

DATAHOME=./data/dev.human.json
MODELHOME=./outputs/unsupervised

export CUDA_VISIBLE_DEVICES=4

python code/run_mrqa.py \
  --do_eval \
  --eval_test \
  --model spanbert-large-cased \
  --test_file ${DATAHOME} \
  --eval_batch_size 32 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --output_dir ${MODELHOME}

After evaluation, two files are written to the model path:

  • test_results.txt: reporting the EM and F1.
  • predictions.txt: saving the QA results.
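
For readers unfamiliar with the two metrics, the sketch below shows the usual SQuAD-style definitions of exact match (EM) and token-level F1. It is a generic illustration and may differ in detail from the evaluation code in run_mrqa.py.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Batman vs. Robin", "batman vs robin"))
print(f1_score("September 2008", "2008"))
```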

b) HybridQA

We use HYBRIDER (Chen et al., 2020b) as the QA model for HybridQA.

Data Preparation

First, in the project root directory, run the following scripts to prepare the data. We assume that the questions generated in the previous multi-hop QG step are saved as train.generated.json in the ./Data/HybridQA/ folder.

# Prepare the human-labeled train set
python Multihop_QA/HybridQA/prepare_qa_data.py \
  --input_path ./Data/HybridQA/train.human.json \
  --data_split train \
  --output_path ./Multihop_QA/HybridQA/data/human

# Prepare the human-labeled dev set
python Multihop_QA/HybridQA/prepare_qa_data.py \
  --input_path ./Data/HybridQA/dev.human.json \
  --data_split dev \
  --output_path ./Multihop_QA/HybridQA/data/human

# Prepare the generated training set 
python Multihop_QA/HybridQA/prepare_qa_data.py \
  --input_path ./Data/HybridQA/train.generated.json \
  --data_split train \
  --output_path ./Multihop_QA/HybridQA/data/generated

This will create two folders in the ./Multihop_QA/HybridQA/data/ directory:

  • generated: the processed generated train set.
  • human: the processed human-labeled train and dev set.

You could skip this data preparation process by directly downloading the above two folders here.

Model Training

Note that training the HYBRIDER model requires transformers==2.6.0, which differs from the transformers 4.3.3 version listed under Requirements.

In the ./Multihop_QA/HybridQA/ folder, run bash train.sh to train the HYBRIDER QA model. Here is an example configuration of train.sh:

python train_stage12.py \
    --do_lower_case \
    --do_train \
    --train_file ./data/human/stage1_train_data.json \
    --resource_dir ../../Data/HybridQA/WikiTables-WithLinks \
    --learning_rate 2e-6 \
    --option stage1 \
    --num_train_epochs 3.0 \
    --gpu_index 6 \
    --cache_dir ./tmp/

python train_stage12.py \
    --do_lower_case \
    --do_train \
    --train_file ./data/human/stage2_train_data.json \
    --resource_dir ../../Data/HybridQA/WikiTables-WithLinks \
    --learning_rate 5e-6 \
    --option stage2 \
    --num_train_epochs 3.0 \
    --gpu_index 6 \
    --cache_dir ./tmp/

python train_stage3.py \
    --do_train  \
    --do_lower_case \
    --train_file ./data/human/stage3_train_data.json \
    --resource_dir ../../Data/HybridQA/WikiTables-WithLinks \
    --per_gpu_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 4.0 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --threads 8 \
    --gpu_index 6 \
    --cache_dir ./tmp/

There are two typical settings:

  • Supervised QA Setting: train the HYBRIDER model on the human-labeled training set. For each stage N (1, 2, or 3), set train_file to ./data/human/stageN_train_data.json.

  • Unsupervised QA Setting: train the HYBRIDER model on the generated training set. For each stage N (1, 2, or 3), set train_file to ./data/generated/stageN_train_data.json.
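
The mapping from setting to training files can be written out explicitly. The helper below is a hypothetical convenience function, not part of the repository. It only builds the three `--train_file` paths named in the two settings above.

```python
def hybrider_train_files(setting):
    """Return the stage 1-3 training files for the given setting."""
    folders = {"supervised": "human", "unsupervised": "generated"}
    if setting not in folders:
        raise ValueError("setting must be 'supervised' or 'unsupervised'")
    folder = folders[setting]
    return [f"./data/{folder}/stage{stage}_train_data.json" for stage in (1, 2, 3)]


for path in hybrider_train_files("unsupervised"):
    print(path)
```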

Evaluation

In the ./Multihop_QA/HybridQA/ folder, run bash evaluate.sh to evaluate the HYBRIDER QA model.

Reference

Please cite the paper in the following format if you use this code in your research.

@inproceedings{pan-etal-2021-MQA-QG,
  title={Unsupervised Multi-hop Question Answering by Question Generation},
  author={Liangming Pan and Wenhu Chen and Wenhan Xiong and Min-Yen Kan and William Yang Wang},
  booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  address = {Online},
  month = {June},
  year = {2021}
}

Q&A

If you encounter any problems, please either contact the first author directly or open an issue in the GitHub repository.

Owner
Liangming Pan
I am a third year Computer Science Ph.D. student at National University of Singapore.