Code for coreference-aware machine reading comprehension

Last update: Sep 29, 2022

Overview

Data and code for the paper "Tracing Origins: Coreference-aware Machine Reading Comprehension" (ACL 2022).

Dataset

There are three folders, one for each of the three models described in the paper: Coref_additive_spacy for Coref_additive_attention, Coref_dgl_spacy for GNN, and Coref_multiplication_spacy for Coref_multiplication_attention. Each folder contains the training set and the dev set under its quoref subfolder.
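The folder-to-model mapping above can be expressed as a small lookup when scripting experiments across all three models. This is a sketch, not part of the repository: the helper name data_files is hypothetical, and the file names train.json and dev.json are assumed from the training command further down this page.

```python
from pathlib import Path
from typing import Dict

# Folder name -> model name used in the paper (mapping taken from this README).
MODEL_FOLDERS: Dict[str, str] = {
    "Coref_additive_spacy": "Coref_additive_attention",
    "Coref_dgl_spacy": "GNN",
    "Coref_multiplication_spacy": "Coref_multiplication_attention",
}


def data_files(repo_root: str, folder: str) -> Dict[str, Path]:
    """Return the expected train/dev paths for one model folder.

    Hypothetical helper. ASSUMPTION: the files are named train.json and
    dev.json, as in the training command of this README.
    """
    if folder not in MODEL_FOLDERS:
        raise KeyError(f"unknown model folder: {folder}")
    quoref_dir = Path(repo_root) / folder / "quoref"
    return {"train": quoref_dir / "train.json", "dev": quoref_dir / "dev.json"}


# Report, for each model, where its training file is expected and whether it exists.
for folder, model in MODEL_FOLDERS.items():
    paths = data_files(".", folder)
    print(f"{model}: {paths['train']} exists={paths['train'].exists()}")
```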

Each sample contains:

context: the paragraph text
context_id: the unique identifier of the context
qas: a group of questions, each containing
    question: the question text
    id: the unique identifier of the question
    answers: a group of answers to the question, each containing
        text: the answer text
        answer_start: the start position of the answer
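The sample layout above can be walked with the standard json module. This is a minimal sketch under two assumptions not stated in this README: that a data file is either a plain list of samples or a SQuAD-style wrapper of the form {"data": [{"paragraphs": [...]}]}, and that answer_start is a character offset into context. The function names are hypothetical, and the toy sample is made up for illustration rather than taken from Quoref.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def iter_samples(dataset: Any) -> Iterator[Dict[str, Any]]:
    """Yield samples (dicts with context, context_id and qas).

    ASSUMPTION: the file is either a list of samples or SQuAD-style
    {"data": [{"paragraphs": [sample, ...]}, ...]}.
    """
    if isinstance(dataset, list):
        yield from dataset
    else:
        for article in dataset["data"]:
            yield from article["paragraphs"]


def load_samples(path: str) -> List[Dict[str, Any]]:
    """Read a JSON data file and return its samples as a list."""
    with open(path, encoding="utf-8") as f:
        return list(iter_samples(json.load(f)))


def answer_spans(sample: Dict[str, Any]) -> List[Tuple[str, str, int, int]]:
    """Return (question id, answer text, start, end) for every answer.

    ASSUMPTION: answer_start is a character offset, so
    context[start:end] should reproduce the answer text exactly.
    """
    context = sample["context"]
    spans = []
    for qa in sample["qas"]:
        for ans in qa["answers"]:
            start = ans["answer_start"]
            end = start + len(ans["text"])
            # Reject answers whose offset does not line up with the context.
            if context[start:end] != ans["text"]:
                raise ValueError(f"misaligned answer in question {qa['id']}")
            spans.append((qa["id"], ans["text"], start, end))
    return spans


# Toy sample for illustration only (not from the Quoref data).
toy = {
    "context": "Anna met Ben in Paris. She gave him a book.",
    "context_id": "toy-ctx-1",
    "qas": [{
        "question": "Who gave Ben a book?",
        "id": "toy-q-1",
        "answers": [{"text": "Anna", "answer_start": 0}],
    }],
}

for sample in iter_samples([toy]):
    print(answer_spans(sample))  # [('toy-q-1', 'Anna', 0, 4)]
```

On real data, load_samples("quoref/dev.json") would return the samples, provided the file matches one of the two assumed layouts.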

Models

If you want to use our trained model, please download it from Google Drive.

Training

python run_quoref.py \
  --train_file "quoref/train.json" \
  --predict_file "quoref/dev.json" \
  --model_type "roberta_multi" \
  --model_name_or_path "roberta-large" \
  --output_dir "out" \
  --do_train --do_eval --eval_all_checkpoints \
  --learning_rate 1e-5 \
  --num_train_epochs 6 \
  --overwrite_output_dir \
  --per_gpu_train_batch_size 4 \
  --save_steps 6000 \
  --coref_weight 0.4
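When trying several settings (for example different values of --coref_weight), it can help to build the training command programmatically. The sketch below only assembles the flags shown in the command above; the names TRAIN_ARGS, build_command and launch are hypothetical, and this page does not say whether each model folder expects a different --model_type.

```python
import shlex
import subprocess
import sys
from typing import Dict, List, Optional

# Valued flags copied from the training command in this README.
# Values are kept as strings so that e.g. "1e-5" is passed through unchanged.
TRAIN_ARGS: Dict[str, str] = {
    "--train_file": "quoref/train.json",
    "--predict_file": "quoref/dev.json",
    "--model_type": "roberta_multi",
    "--model_name_or_path": "roberta-large",
    "--output_dir": "out",
    "--learning_rate": "1e-5",
    "--num_train_epochs": "6",
    "--per_gpu_train_batch_size": "4",
    "--save_steps": "6000",
    "--coref_weight": "0.4",
}
# Switches that take no value.
BOOL_FLAGS: List[str] = ["--do_train", "--do_eval", "--eval_all_checkpoints", "--overwrite_output_dir"]


def build_command(overrides: Optional[Dict[str, object]] = None) -> List[str]:
    """Return the argv list for run_quoref.py, with optional flag overrides."""
    args: Dict[str, object] = dict(TRAIN_ARGS)
    if overrides:
        args.update(overrides)
    cmd = [sys.executable, "run_quoref.py"]
    for flag, value in args.items():
        cmd += [flag, str(value)]
    return cmd + BOOL_FLAGS


def launch(overrides: Optional[Dict[str, object]] = None) -> None:
    """Run training; assumes the current directory is one of the model folders."""
    subprocess.run(build_command(overrides), check=True)


# Print a shell-ready command line instead of launching anything.
print(shlex.join(build_command({"--coref_weight": 0.2})))
```

Calling launch() from inside a model folder would start training with the README settings; passing a dict of overrides changes only those flags.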

Note

There is an open issue about the compatibility of NeuralCoref with spaCy 3.0. If you intend to use the latest spaCy models, please keep an eye on that issue.
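Before installing NeuralCoref, you can check which spaCy version is present. This is a sketch that assumes, in line with the open issue, that NeuralCoref only works with spaCy 2.x. It reads the installed version through importlib.metadata (Python 3.8+), so it does not need to import spaCy itself; the function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def spacy_major(ver: str) -> int:
    """Return the major version number from a version string such as '2.3.7'."""
    return int(ver.split(".")[0])


def check_spacy_for_neuralcoref() -> Optional[str]:
    """Return a warning if the installed spaCy looks incompatible, else None.

    ASSUMPTION: NeuralCoref requires spaCy 2.x (see the open issue).
    """
    try:
        installed = version("spacy")
    except PackageNotFoundError:
        return "spaCy is not installed"
    if spacy_major(installed) >= 3:
        return f"spaCy {installed} found; NeuralCoref is assumed to need spaCy 2.x"
    return None


print(check_spacy_for_neuralcoref())
```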

Cite

If you extend or use this work, please cite the paper where it was introduced:

@article{Huang2021TracingOC,
  title={Tracing Origins: Coref-aware Machine Reading Comprehension},
  author={Baorong Huang and Zhuosheng Zhang and Hai Zhao},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.07961}
}
