LAnguage Model Analysis

Related tags

Deep LearningLAMA
Overview

LAMA: LAnguage Model Analysis

LAMA

LAMA is a probe for analyzing the factual and commonsense knowledge contained in pretrained language models.

The dataset for the LAMA probe is available at https://dl.fbaipublicfiles.com/LAMA/data.zip

LAMA contains a set of connectors to pretrained language models.
LAMA exposes a transparent and unique interface to use:

  • Transformer-XL (Dai et al., 2019)
  • BERT (Devlin et al., 2018)
  • ELMo (Peters et al., 2018)
  • GPT (Radford et al., 2018)
  • RoBERTa (Liu et al., 2019)

Actually, LAMA is also a beautiful animal.

Reference:

The LAMA probe is described in the following papers:

@inproceedings{petroni2019language,
  title={Language Models as Knowledge Bases?},
  author={F. Petroni, T. Rockt{\"{a}}schel, A. H. Miller, P. Lewis, A. Bakhtin, Y. Wu and S. Riedel},
  booktitle={In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019},
  year={2019}
}

@inproceedings{petroni2020how,
  title={How Context Affects Language Models' Factual Predictions},
  author={Fabio Petroni and Patrick Lewis and Aleksandra Piktus and Tim Rockt{\"a}schel and Yuxiang Wu and Alexander H. Miller and Sebastian Riedel},
  booktitle={Automated Knowledge Base Construction},
  year={2020},
  url={https://openreview.net/forum?id=025X0zPfn}
}

The LAMA probe

To reproduce our results:

1. Create conda environment and install requirements

(optional) It might be a good idea to use a separate conda environment. It can be created by running:

conda create -n lama37 -y python=3.7 && conda activate lama37
pip install -r requirements.txt

2. Download the data

wget https://dl.fbaipublicfiles.com/LAMA/data.zip
unzip data.zip
rm data.zip

3. Download the models

DISCLAIMER: ~55 GB on disk

Install spacy model

python3 -m spacy download en

Download the models

chmod +x download_models.sh
./download_models.sh

The script will create and populate a pre-trained_language_models folder. If you are interested in a particular model please edit the script.

4. Run the experiments

python scripts/run_experiments.py

results will be logged in output/ and last_results.csv.

Other versions of LAMA

LAMA-UHN

This repository also provides a script (scripts/create_lama_uhn.py) to create the data used in (Poerner et al., 2019).

Negated-LAMA

This repository also gives the option to evalute how pretrained language models handle negated probes (Kassner et al., 2019), set the flag use_negated_probes in scripts/run_experiments.py. Also, you should use this version of the LAMA probe https://dl.fbaipublicfiles.com/LAMA/negated_data.tar.gz

What else can you do with LAMA?

1. Encode a list of sentences

and use the vectors in your downstream task!

pip install -e git+https://github.com/facebookresearch/LAMA#egg=LAMA
import argparse
from lama.build_encoded_dataset import encode, load_encoded_dataset

PARAMETERS= {
        "lm": "bert",
        "bert_model_name": "bert-large-cased",
        "bert_model_dir":
        "pre-trained_language_models/bert/cased_L-24_H-1024_A-16",
        "bert_vocab_name": "vocab.txt",
        "batch_size": 32
        }

args = argparse.Namespace(**PARAMETERS)

sentences = [
        ["The cat is on the table ."],  # single-sentence instance
        ["The dog is sleeping on the sofa .", "He makes happy noises ."],  # two-sentence
        ]

encoded_dataset = encode(args, sentences)
print("Embedding shape: %s" % str(encoded_dataset[0].embedding.shape))
print("Tokens: %r" % encoded_dataset[0].tokens)

# save on disk the encoded dataset
encoded_dataset.save("test.pkl")

# load from disk the encoded dataset
new_encoded_dataset = load_encoded_dataset("test.pkl")
print("Embedding shape: %s" % str(new_encoded_dataset[0].embedding.shape))
print("Tokens: %r" % new_encoded_dataset[0].tokens)

2. Fill a sentence with a gap.

You should use the symbol [MASK] to specify the gap. Only single-token gap supported - i.e., a single [MASK].

python lama/eval_generation.py  \
--lm "bert"  \
--t "The cat is on the [MASK]."

cat_on_the_phone

cat_on_the_phone

source: https://commons.wikimedia.org/wiki/File:Bluebell_on_the_phone.jpg

Note that you could use this functionality to answer cloze-style questions, such as:

python lama/eval_generation.py  \
--lm "bert"  \
--t "The theory of relativity was developed by [MASK] ."

Install LAMA with pip

Clone the repo

git clone [email protected]:facebookresearch/LAMA.git && cd LAMA

Install as an editable package:

pip install --editable .

If you get an error in mac os x, please try running this instead

CFLAGS="-Wno-deprecated-declarations -std=c++11 -stdlib=libc++" pip install --editable .

Language Model(s) options

Option to indicate which language model(s) to use:

  • --language-models/--lm : comma separated list of language models (REQUIRED)

BERT

BERT pretrained models can be loaded both: (i) passing the name of the model and using huggingface cached versions or (ii) passing the folder containing the vocabulary and the PyTorch pretrained model (look at convert_tf_checkpoint_to_pytorch in here to convert the TensorFlow model to PyTorch).

  • --bert-model-dir/--bmd : directory that contains the BERT pre-trained model and the vocabulary
  • --bert-model-name/--bmn : name of the huggingface cached versions of the BERT pre-trained model (default = 'bert-base-cased')
  • --bert-vocab-name/--bvn : name of vocabulary used to pre-train the BERT model (default = 'vocab.txt')

RoBERTa

  • --roberta-model-dir/--rmd : directory that contains the RoBERTa pre-trained model and the vocabulary (REQUIRED)
  • --roberta-model-name/--rmn : name of the RoBERTa pre-trained model (default = 'model.pt')
  • --roberta-vocab-name/--rvn : name of vocabulary used to pre-train the RoBERTa model (default = 'dict.txt')

ELMo

  • --elmo-model-dir/--emd : directory that contains the ELMo pre-trained model and the vocabulary (REQUIRED)
  • --elmo-model-name/--emn : name of the ELMo pre-trained model (default = 'elmo_2x4096_512_2048cnn_2xhighway')
  • --elmo-vocab-name/--evn : name of vocabulary used to pre-train the ELMo model (default = 'vocab-2016-09-10.txt')

Transformer-XL

  • --transformerxl-model-dir/--tmd : directory that contains the pre-trained model and the vocabulary (REQUIRED)
  • --transformerxl-model-name/--tmn : name of the pre-trained model (default = 'transfo-xl-wt103')

GPT

  • --gpt-model-dir/--gmd : directory that contains the gpt pre-trained model and the vocabulary (REQUIRED)
  • --gpt-model-name/--gmn : name of the gpt pre-trained model (default = 'openai-gpt')

Evaluate Language Model(s) Generation

options:

  • --text/--t : text to compute the generation for
  • --i : interactive mode
    one of the two is required

example considering both BERT and ELMo:

python lama/eval_generation.py \
--lm "bert,elmo" \
--bmd "pre-trained_language_models/bert/cased_L-24_H-1024_A-16/" \
--emd "pre-trained_language_models/elmo/original/" \
--t "The cat is on the [MASK]."

example considering only BERT with the default pre-trained model, in an interactive fashion:

python lamas/eval_generation.py  \
--lm "bert"  \
--i

Get Contextual Embeddings

python lama/get_contextual_embeddings.py \
--lm "bert,elmo" \
--bmn bert-base-cased \
--emd "pre-trained_language_models/elmo/original/"

Unified vocabulary

The intersection of the vocabularies for all considered models

Troubleshooting

If the module cannot be found, preface the python command with PYTHONPATH=.

If the experiments fail on GPU memory allocation, try reducing batch size.

Acknowledgements

Other References

  • (Kassner et al., 2019) Nora Kassner, Hinrich Schütze. Negated LAMA: Birds cannot fly. arXiv preprint arXiv:1911.03343, 2019.

  • (Poerner et al., 2019) Nina Poerner, Ulli Waltinger, and Hinrich Schütze. BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA. arXiv preprint arXiv:1911.03681, 2019.

  • (Dai et al., 2019) Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan Salakhutdi. Transformer-xl: Attentive language models beyond a fixed-length context. CoRR, abs/1901.02860.

  • (Peters et al., 2018) Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. NAACL-HLT 2018

  • (Devlin et al., 2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

  • (Radford et al., 2018) Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.

  • (Liu et al., 2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.

Licence

LAMA is licensed under the CC-BY-NC 4.0 license. The text of the license can be found here.

Owner
Meta Research
Meta Research
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au

14 Nov 28, 2022
On the Adversarial Robustness of Visual Transformer

On the Adversarial Robustness of Visual Transformer Code for our paper "On the Adversarial Robustness of Visual Transformers"

Rulin Shao 35 Dec 14, 2022
A Collection of LiDAR-Camera-Calibration Papers, Toolboxes and Notes

A Collection of LiDAR-Camera-Calibration Papers, Toolboxes and Notes

443 Jan 06, 2023
Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"

CSRA This is the official code of ICCV 2021 paper: Residual Attention: A Simple But Effective Method for Multi-Label Recoginition Demo, Train and Vali

163 Dec 22, 2022
DetCo: Unsupervised Contrastive Learning for Object Detection

DetCo: Unsupervised Contrastive Learning for Object Detection arxiv link News Sparse RCNN+DetCo improves from 45.0 AP to 46.5 AP(+1.5) with 3x+ms trai

Enze Xie 234 Dec 18, 2022
PyTorch implementation for our paper Learning Character-Agnostic Motion for Motion Retargeting in 2D, SIGGRAPH 2019

Learning Character-Agnostic Motion for Motion Retargeting in 2D We provide PyTorch implementation for our paper Learning Character-Agnostic Motion for

Rundi Wu 367 Dec 22, 2022
Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit

Yu Bai 43 Nov 07, 2022
Code to reproduce results from the paper "AmbientGAN: Generative models from lossy measurements"

AmbientGAN: Generative models from lossy measurements This repository provides code to reproduce results from the paper AmbientGAN: Generative models

Ashish Bora 87 Oct 19, 2022
Companion code for the paper Theoretical characterization of uncertainty in high-dimensional linear classification

Companion code for the paper Theoretical characterization of uncertainty in high-dimensional linear classification Usage The required packages are lis

0 Feb 07, 2022
A Pytorch implementation of CVPR 2021 paper "RSG: A Simple but Effective Module for Learning Imbalanced Datasets"

RSG: A Simple but Effective Module for Learning Imbalanced Datasets (CVPR 2021) A Pytorch implementation of our CVPR 2021 paper "RSG: A Simple but Eff

120 Dec 12, 2022
Contains code for Deep Kernelized Dense Geometric Matching

DKM - Deep Kernelized Dense Geometric Matching Contains code for Deep Kernelized Dense Geometric Matching We provide pretrained models and code for ev

Johan Edstedt 83 Dec 23, 2022
Code for EMNLP 2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training"

SCAPT-ABSA Code for EMNLP2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training" Overvie

Zhengyan Li 66 Dec 04, 2022
HistoKT: Cross Knowledge Transfer in Computational Pathology

HistoKT: Cross Knowledge Transfer in Computational Pathology Exciting News! HistoKT has been accepted to ICASSP 2022. HistoKT: Cross Knowledge Transfe

Mahdi S. Hosseini 5 Jan 05, 2023
Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge.

KAIROS MineRL BASALT Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL B

Vinicius G. Goecks 37 Oct 30, 2022
Peek-a-Boo: What (More) is Disguised in a Randomly Weighted Neural Network, and How to Find It Efficiently

Peek-a-Boo: What (More) is Disguised in a Randomly Weighted Neural Network, and How to Find It Efficiently This repository is the official implementat

VITA 4 Dec 20, 2022
FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE)

FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

18 Sep 01, 2022
CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

CLIP-Indonesian CLIP (Radford et al., 2021) is a multimodal model that can connect images and text by training a vision encoder and a text encoder joi

Galuh 17 Mar 10, 2022
Use evolutionary algorithms instead of gridsearch in scikit-learn

sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter

rsteca 709 Jan 03, 2023
Really awesome semantic segmentation

really-awesome-semantic-segmentation A list of all papers on Semantic Segmentation and the datasets they use. This site is maintained by Holger Caesar

Holger Caesar 400 Nov 28, 2022
Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight)

About Code release for Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy (ICLR 2022 Spotlight)

THUML @ Tsinghua University 221 Dec 31, 2022