Diagnostic tests for linguistic capacities in language models

Overview

LM diagnostics

This repository contains the diagnostic datasets and experimental code for What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models, by Allyson Ettinger.

Diagnostic test data

The datasets folder contains TSV files with data for each diagnostic test, along with explanatory README files for each dataset.

Code

[All code now updated to be run with Python 3.]

The code in this section can be used to process the diagnostic datasets for input to a language model, and then to run the diagnostic tests on that language model's predictions. The code should be used in three steps:

Step 1: Process datasets to produce inputs for LM

proc_datasets.py can be used to process the provided datasets into 1) <testname>-contextlist files containing contexts (one per line) on which the LM's predictions should be conditioned, and b) <testname>-targetlist files containing target words (one per line, aligned with the contexts in *-contextlist) for which you will need probabilities conditioned on the corresponding contexts. Repeats in *-contextlist are intentional, to align with the targets in *-targetlist.

Basic usage:

python proc_datasets.py \
  --outputdir <location for output files> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv \
  --add_mask_tok
  • add_mask_tok flag will append '[MASK]' to the contexts in *-contextlist, for use with BERT.
  • <testname> comes from the following list: cprag, role, negsimp, negnat for CPRAG-34, ROLE-88, NEG-88-SIMP and NEG-88-NAT, respectively.

Step 2: Get LM predictions/probabilities

You will need to produce two files: one containing top word predictions conditioned on each context, and one containing the probabilities for each target word conditioned on its corresponding context.

Predictions: Model word predictions should be written to a file with naming modelpreds-<testname>-<modelname>. Each line of this file should contain the top word predictions conditioned on the context in the corresponding line in *-contextlist. Word predictions on a given line should be separated by whitespace. Number of predictions per line should be no less than the highest k that you want to use for accuracy tests.

Probabilities Model target probabilities should be written to a file with naming modeltgtprobs-<testname>-<modelname>. Each line of this file should contain the probability of the target word on the corresponding line of *-targetlist, conditioned on the context on the corresponding line of *-contextlist.

  • <testname> list is as above. <modelname> should be the name of the model that will be input to the code in Step 3.

Step 3: Run accuracy and sensitivity tests for each diagnostic

prediction_accuracy_tests.py takes modelpreds-<testname>-<modelname> as input and runs word prediction accuracy tests.

Basic usage:

python prediction_accuracy_tests.py \
  --preddir <location of modelpreds-<testname>-<modelname>> \
  --resultsdir <location for results files> \
  --models <names of models to be tested, e.g., bert-base-uncased bert-large-uncased> \
  --k_values <list of k values to be tested, e.g., 1 5> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv

sensitivity_tests.py takes modeltgtprobs-<testname>-<modelname> as input and runs sensitivity tests.

Basic usage:

python sensitivity_tests.py \
  --probdir <location of modelpreds-<testname>-<modelname>> \
  --resultsdir <location for results files> \
  --models <names of models to be tested, e.g., bert-base-uncased bert-large-uncased> \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv

Experimental code

run_diagnostics_bert.py is the code that was used for the experiments on BERTBASE and BERTLARGE reported in the paper, including perturbations.

Example usage:

python run_diagnostics_bert.py \
  --cprag_stim datasets/CPRAG-34/CPRAG-34.tsv \
  --role_stim datasets/ROLE-88/ROLE-88.tsv \
  --negnat_stim datasets/NEG-88/NEG-88-NAT.tsv \
  --negsimp_stim datasets/NEG-88/NEG-88-SIMP.tsv \
  --resultsdir <location for results files> \
  --bertbase <BERT BASE location> \
  --bertlarge <BERT LARGE location> \
  --incl_perturb
  • bertbase and bertlarge specify locations for PyTorch BERTBASE and BERTLARGE models -- each folder is expected to include vocab.txt, bert_config.json, and pytorch_model.bin for the corresponding PyTorch BERT model. (Note that experiments were run with the original pytorch-pretrained-bert version, so I can't guarantee identical results with the updated pytorch-transformers.)
  • incl_perturb runs experiments with all perturbations reported in the paper. Without this flag, only runs experiments without perturbations.
A2LP for short, ECCV2020 spotlight, Investigating SSL principles for UDA problems

Label-Propagation-with-Augmented-Anchors (A2LP) Official codes of the ECCV2020 spotlight (label propagation with augmented anchors: a simple semi-supe

20 Oct 27, 2022
Reference implementation for Structured Prediction with Deep Value Networks

Deep Value Network (DVN) This code is a python reference implementation of DVNs introduced in Deep Value Networks Learn to Evaluate and Iteratively Re

Michael Gygli 55 Feb 02, 2022
HugsVision is a easy to use huggingface wrapper for state-of-the-art computer vision

HugsVision is an open-source and easy to use all-in-one huggingface wrapper for computer vision. The goal is to create a fast, flexible and user-frien

Labrak Yanis 166 Nov 27, 2022
Neural networks applied in recognizing guitar chords using python, AutoML.NET with C# and .NET Core

Chord Recognition Demo application The demo application is written in C# with .NETCore. As of July 9, 2020, the only version available is for windows

Andres Mauricio Rondon PatiƱo 24 Oct 22, 2022
Clean Machine Learning, a Coding Kata

Kata: Clean Machine Learning From Dirty Code First, open the Kata in Google Colab (or else download it) You can clone this project and launch jupyter-

Neuraxio 13 Nov 03, 2022
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

Auto-Seg-Loss By Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai This is the official implementation of the ICLR 2021 paper Auto

61 Dec 21, 2022
Answer a series of contextually-dependent questions like they may occur in natural human-to-human conversations.

SCAI-QReCC-21 [leaderboards] [registration] [forum] [contact] [SCAI] Answer a series of contextually-dependent questions like they may occur in natura

19 Sep 28, 2022
FaceOcc: A Diverse, High-quality Face Occlusion Dataset for Human Face Extraction

FaceExtraction FaceOcc: A Diverse, High-quality Face Occlusion Dataset for Human Face Extraction Occlusions often occur in face images in the wild, tr

16 Dec 14, 2022
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

Graph ConvNets in PyTorch October 15, 2017 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbresson

Xavier Bresson 287 Jan 04, 2023
Simple keras FCN Encoder/Decoder model for MS-COCO (food subset) segmentation

FCN_MSCOCO_Food_Segmentation Simple keras FCN Encoder/Decoder model for MS-COCO (food subset) segmentation Input data: [http://mscoco.org/dataset/#ove

Alexander Kalinovsky 11 Jan 08, 2019
Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

MAUVE MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE

Krishna Pillutla 182 Jan 02, 2023
[NeurIPS 2021] The PyTorch implementation of paper "Self-Supervised Learning Disentangled Group Representation as Feature"

IP-IRM [NeurIPS 2021] The PyTorch implementation of paper "Self-Supervised Learning Disentangled Group Representation as Feature". Codes will be relea

Wang Tan 67 Dec 24, 2022
A library for uncertainty quantification based on PyTorch

Torchuq [logo here] TorchUQ is an extensive library for uncertainty quantification (UQ) based on pytorch. TorchUQ currently supports 10 representation

TorchUQ 96 Dec 12, 2022
Official PyTorch implementation of "Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient".

Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient This repository is the official PyTorch implementation of "Edge Rewiring Go

Shanchao Yang 4 Dec 12, 2022
My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot

Deep Q&A Table of Contents Presentation Installation Running Chatbot Web interface Results Pretrained model Improvements Upgrade Presentation This wor

Conchylicultor 2.9k Dec 28, 2022
PyTorch implementation of the ideas presented in the paper Interaction Grounded Learning (IGL)

Interaction Grounded Learning This repository contains a simple PyTorch implementation of the ideas presented in the paper Interaction Grounded Learni

Arthur Juliani 4 Aug 31, 2022
Simple implementation of Mobile-Former on Pytorch

Simple-implementation-of-Mobile-Former At present, only the model but no trained. There may be some bug in the code, and some details may be different

Acheung 103 Dec 31, 2022
Meaningful titles for tabs and PDF downloads! Also supports tab search.

arxiv-utils If you are a researcher that reads a lot on ArXiv, you'll benefit a lot from this web extension. Renames the title of PDF page to the pape

Johnson 174 Dec 20, 2022
Quantized tflite models for ailia TFLite Runtime

ailia-models-tflite Quantized tflite models for ailia TFLite Runtime About ailia TFLite Runtime ailia TF Lite Runtime is a TensorFlow Lite compatible

ax Inc. 13 Dec 23, 2022
[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation

[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation [Paper] Prerequisites To install requirements: pip install -r requirements.txt

Guangrui Li 84 Dec 26, 2022