NAACL'2021: Factual Probing Is [MASK]: Learning vs. Learning to Recall

Overview

OptiPrompt

This is the PyTorch implementation of the paper Factual Probing Is [MASK]: Learning vs. Learning to Recall.

We propose OptiPrompt, a simple and effective approach for Factual Probing. OptiPrompt optimizes the prompts on the input embedding space directly. It outperforms previous prompting methods on the LAMA benchmark. Furthermore, in order to better interprete probing results, we propose control experiments based on the probing results on randomly initialized models. Please check our paper for details.

Quick links

Setup

Install dependecies

Our code is based on python 3.7. All experiments are run on a single GPU.

Please install all the dependency packages using the following command:

pip install -r requirements.txt

Download the data

We pack all datasets we used in our experiments here. Please download it and extract the files to ./data, or run the following commands to autoamtically download and extract it.

bash script/download_data.sh

The datasets are structured as below.

data
├── LAMA-TREx                         # The original LAMA-TREx test set (34,039 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx_UHN                     # The LAMA-TREx_UHN test set (27,102 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx-easy-hard               # The easy and hard partitions of the LAMA-TREx dataset (check the paper for details)
│   ├── Easy                          # The LAMA-easy partition (10,546 examples)
│   │   ├── P17.jsonl                 # Testing file for the relation `P17`
│   │   └── ...
│   └── Hard                          # The LAMA-hard partition (23,493 examples)
│       ├── P17.jsonl                 # Testing file for the relation `P17`
│       └── ...
├── autoprompt_data                   # Training data collected by AutoPrompt
│   ├── P17                           # Train/dev/test files for the relation `P17`
│   │   ├── train.jsonl               # Training examples
│   │   ├── dev.jsonl                 # Development examples
│   │   └── test.jsonl                # Test examples (the same as LAMA-TREx test set)
│   └── ...
└── cmp_lms_data                      # Training data collected by ourselves which can be used for BERT, RoBERTa, and ALBERT (we only use this dataset in Table 6 in the paper)
    ├── P17                           # Train/dev/test files for the relation `P17`
    │   ├── train.jsonl               # Training examples
    │   ├── dev.jsonl                 # Development examples
    │   ├── test.jsonl                # Test examples (a subset of the LAMA-TREx test set, filtered using the common vocab of three models)
    └── ...

Run OptiPrompt

Train/evaluate OptiPrompt

You can use code/run_optiprompt.py to train or evaluate the prompts on a specific relation. A command template is as follow:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_optiprompt.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions \
    [--init_manual_template] [--num_vectors 5 | 10]

Arguments:

  • relation_profile: the meta information for each relation, containing the manual templates.
  • relation: the relation type (e.g., P101) considered in this experiment.
  • common_vocab_filename: the vocabulary used to filter out facts; it should be the intersection of different models' for fair comparison.
  • model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
  • do_train: whether to train the prompts on a training and development set.
  • do_eval: whether to test the trained prompts on a testing set.
  • {train|dev|test}_data: the file path of training/development/testing dataset.
  • random_init: how do we random initialize the model before training, there are three settings:
    • none: use the pre-trained model, no random initialization is used;
    • embedding: the Rand E control setting, where we random initialize the embedding layer of the model;
    • all: the Rand M control setting, where we random initialize all the parameters of the model.
  • init_manual_template: whether initialize the dense vectors in OptiPrompt using the manual prompts.
  • num_vectors: how many dense vectors are added in OptiPrompt (this argument is valid only when init_manual_template is not set).
  • output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).

Run experiments on all relations

We provide an example script (scripts/run_optiprompt.sh) to run OptiPrompt on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_opti.sh

The default setting of this script is to run OptiPromot initialized with manual prompts on the pre-trained bert-base-cased model (no random initialization is used). The results will be stored in the outputs directory.

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_optiprompt.sh if you want to run experiments on other settings.

Run Fine-tuning

We release the code that we used in our experiments (check Section 4 in the paper).

Fine-tuning language models on factual probing

You can use code/run_finetune.py to fine-tune a language model on a specific relation. A command template is as follow:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_finetune.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions

Arguments:

  • relation_profile: the meta information for each relation, containing the manual templates.
  • relation: the relation type (e.g., P101) considered in this experiment.
  • common_vocab_filename: the vocabulary used to filter out facts; it should be the intersection of different models' for fair comparison.
  • model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
  • do_train: whether to train the prompts on a training and development set.
  • do_eval: whether to test the trained prompts on a testing set.
  • {train|dev|test}_data: the file path of training/development/testing dataset.
  • random_init: how do we random initialize the model before training, there are three settings:
    • none: use the pre-trained model, no random initialization is used;
    • embedding: the Rand E control setting, where we random initialize the embedding layer of the model;
    • all: the Rand M control setting, where we random initialize all the parameters of the model.
  • output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).

Run experiments on all relations

We provide an example script (scripts/run_finetune.sh) to run fine-tuning on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_finetune.sh

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_finetune.sh if you want to run experiments on other settings.

Evaluate LAMA/LPAQA/AutoPrompt prompts

We provide a script to evaluate prompts released in previous works (based on code/run_finetune.py with only --do_eval). Please use the foolowing command:

bash scripts/run_eval_prompts {lama | lpaqa | autoprompt}

Questions?

If you have any questions related to the code or the paper, feel free to email Zexuan Zhong ([email protected]) or Dan Friedman ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!

Citation

@inproceedings{zhong2021factual,
   title={Factual Probing Is [MASK]: Learning vs. Learning to Recall},
   author={Zhong, Zexuan and Friedman, Dan and Chen, Danqi},
   booktitle={North American Association for Computational Linguistics (NAACL)},
   year={2021}
}
Owner
Princeton Natural Language Processing
Princeton Natural Language Processing
DeepStochlog Package For Python

DeepStochLog Installation Installing SWI Prolog DeepStochLog requires SWI Prolog to run. Run the following commands to install: sudo apt-add-repositor

KU Leuven Machine Learning Research Group 17 Dec 23, 2022
DexterRedTool - Dexter's Red Team Tool that creates cronjob/task scheduler to consistently creates users

DexterRedTool Author: Dexter Delandro CSEC 473 - Spring 2022 This tool persisten

2 Feb 16, 2022
Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

Retrieval-Augmented Denoising Diffusion Probabilistic Models (wip) Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in P

Phil Wang 55 Jan 01, 2023
A MNIST-like fashion product database. Benchmark

Fashion-MNIST Table of Contents Why we made Fashion-MNIST Get the Data Usage Benchmark Visualization Contributing Contact Citing Fashion-MNIST License

Zalando Research 10.5k Jan 08, 2023
Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis, including human motion imitation, appearance transfer, and novel view synthesis. Currently the paper is under review

2.3k Jan 05, 2023
The Official PyTorch Implementation of DiscoBox.

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision Paper | Project page | Demo (Youtube) | Demo (Bilib

NVIDIA Research Projects 89 Jan 09, 2023
[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Near-Duplicate Video Retrieval with Deep Metric Learning This repository contains the Tensorflow implementation of the paper Near-Duplicate Video Retr

Liming Jiang 238 Nov 25, 2022
Code for "Unsupervised Layered Image Decomposition into Object Prototypes" paper

DTI-Sprites Pytorch implementation of "Unsupervised Layered Image Decomposition into Object Prototypes" paper Check out our paper and webpage for deta

40 Dec 22, 2022
ExCon: Explanation-driven Supervised Contrastive Learning

ExCon: Explanation-driven Supervised Contrastive Learning Contributors of this repo: Zhibo Zhang ( Zhibo (Darren) Zhang 18 Nov 01, 2022

Christmas face app for Decathlon xmas coding party!

Christmas Face Application Use this library to create the perfect picture for your christmas cards! Done by Hasib Zunair, Guillaume Brassard and Samue

Hasib Zunair 4 Dec 20, 2021
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

DistMIS Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation. DistriMIS Distributing Deep Learning Hyperparameter Tuning

HiEST 2 Sep 09, 2022
Learning Modified Indicator Functions for Surface Reconstruction

Learning Modified Indicator Functions for Surface Reconstruction In this work, we propose a learning-based approach for implicit surface reconstructio

4 Apr 18, 2022
TrTr: Visual Tracking with Transformer

TrTr: Visual Tracking with Transformer We propose a novel tracker network based on a powerful attention mechanism called Transformer encoder-decoder a

趙 漠居(Zhao, Moju) 66 Dec 27, 2022
Nodule Generation Algorithm Baseline and template code for node21 generation track

Nodule Generation Algorithm This codebase implements a simple baseline model, by following the main steps in the paper published by Litjens et al. for

node21challenge 10 Apr 21, 2022
DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021]

DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021] Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng

Automation and Intelligence for Civil Engineering (AI4CE) Lab @ NYU 98 Dec 21, 2022
PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

samplernn-pytorch A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. It's based on the reference implem

DeepSound 261 Dec 14, 2022
constructing maps of intellectual influence from publication data

Influencemap Project @ ANU Influence in the academic communities has been an area of interest for researchers. This can be seen in the popularity of a

CS Metrics 13 Jun 18, 2022
An introduction to satellite image analysis using Python + OpenCV and JavaScript + Google Earth Engine

A Gentle Introduction to Satellite Image Processing Welcome to this introductory course on Satellite Image Analysis! Satellite imagery has become a pr

Edward Oughton 32 Jan 03, 2023
An NLP library with Awesome pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.

简体中文 | English News [2021-10-12] PaddleNLP 2.1版本已发布!新增开箱即用的NLP任务能力、Prompt Tuning应用示例与生成任务的高性能推理! 🎉 更多详细升级信息请查看Release Note。 [2021-08-22]《千言:面向事实一致性的生

6.9k Jan 01, 2023
Implementation of "GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings" in PyTorch

PyGAS: Auto-Scaling GNNs in PyG PyGAS is the practical realization of our G NN A uto S cale (GAS) framework, which scales arbitrary message-passing GN

Matthias Fey 139 Dec 25, 2022