"Inductive Entity Representations from Text via Link Prediction" @ The Web Conference 2021

Related tags

Deep Learningblp
Overview

Inductive entity representations from text via link prediction





This repository contains the code used for the experiments in the paper "Inductive entity representations from text via link prediction", presented at The Web Conference 2021. To cite our work, please use the following:

@inproceedings{daza2021inductive,
    title = {Inductive Entity Representations from Text via Link Prediction},
    author = {Daniel Daza and Michael Cochez and Paul Groth},
    booktitle = {Proceedings of The Web Conference 2021},
    year = {2021},
    doi = {10.1145/3442381.3450141},
}

In this work, we show how a BERT-based text encoder can be fine-tuned with a link prediction objective on a graph where entities have an associated textual description. We call the resulting model BLP. A trained BLP model has three interesting properties (a rough sketch of the idea follows this list):

  • It can predict links between entities, even if one or both were not present during training.
  • It produces representations that are useful features for a classifier, without retraining the encoder.
  • It improves an information retrieval system by better matching entities with questions about them.
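
As a rough illustration (a minimal sketch, not the implementation in this repository), the following scores a triple by encoding entity descriptions with BERT and applying a TransE-style score; the embedding size and relation count are placeholder values:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoder = AutoModel.from_pretrained('bert-base-uncased')
proj = torch.nn.Linear(768, 128)       # hypothetical embedding size
rel_emb = torch.nn.Embedding(11, 128)  # hypothetical number of relations

def encode(description):
    # Embed an entity by projecting the [CLS] vector of its textual description
    inputs = tokenizer(description, return_tensors='pt', truncation=True)
    cls = encoder(**inputs).last_hidden_state[:, 0]
    return proj(cls)

def transe_score(head_text, rel_id, tail_text):
    # TransE scores a triple (h, r, t) with -||h + r - t||; higher is more plausible
    h, t = encode(head_text), encode(tail_text)
    r = rel_emb(torch.tensor([rel_id]))
    return -torch.norm(h + r - t, p=1, dim=-1)

Because entities are embedded from their text, the same encoder can embed entities that never appeared during training, which is what makes the model inductive.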

Usage

Follow the instructions below to reproduce our experiments, or to train a model with your own data.

1. Install the requirements

Creating a new environment (e.g. with conda) is recommended. Use requirements.txt to install the dependencies:

conda create -n blp python=3.7
conda activate blp
pip install -r requirements.txt

2. Download the data

Download the required compressed datasets into the data folder:

Download link                 Size (compressed)
UMLS (small graph for tests)  121 KB
WN18RR                        6.6 MB
FB15k-237                     21 MB
Wikidata5M                    1.4 GB
GloVe embeddings              423 MB
DBpedia-Entity                1.3 GB

Then use tar to extract the files, e.g.

tar -xzvf WN18RR.tar.gz

Note that the KG-related files above contain both transductive and inductive splits. Transductive splits are commonly used to evaluate lookup-table methods like ComplEx, while in inductive splits the test set contains entities that are not present in the training set. Files with triples for the inductive case have the ind prefix, e.g. ind-train.txt.
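
Each of these files stores one tab-separated triple per line. A minimal loader (illustrative only, and assuming the archive extracts to data/WN18RR/) looks like:

def load_triples(path):
    # Each line: head <tab> relation <tab> tail
    triples = []
    with open(path) as f:
        for line in f:
            head, rel, tail = line.rstrip('\n').split('\t')
            triples.append((head, rel, tail))
    return triples

triples = load_triples('data/WN18RR/ind-train.txt')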

3. Reproduce the experiments

Link prediction

To check that all dependencies are correctly installed, run a quick test on a small graph (this should take less than a minute on a GPU):

./scripts/test-umls.sh

The following table is adapted from our paper. The "Script" column contains the name of the script that reproduces the experiment for the corresponding model and dataset. For example, if you want to reproduce the results of BLP-TransE on FB15k-237, run

./scripts/blp-transe-fb15k237.sh

              WN18RR                         FB15k-237                        Wikidata5M
Model         MRR    Script                  MRR    Script                    MRR    Script
GloVe-BOW     0.170  glove-bow-wn18rr.sh     0.172  glove-bow-fb15k237.sh     0.343  glove-bow-wikidata5m.sh
BE-BOW        0.180  bert-bow-wn18rr.sh      0.173  bert-bow-fb15k237.sh      0.362  bert-bow-wikidata5m.sh
GloVe-DKRL    0.115  glove-dkrl-wn18rr.sh    0.112  glove-dkrl-fb15k237.sh    0.282  glove-dkrl-wikidata5m.sh
BE-DKRL       0.139  bert-dkrl-wn18rr.sh     0.144  bert-dkrl-fb15k237.sh     0.322  bert-dkrl-wikidata5m.sh
BLP-TransE    0.285  blp-transe-wn18rr.sh    0.195  blp-transe-fb15k237.sh    0.478  blp-transe-wikidata5m.sh
BLP-DistMult  0.248  blp-distmult-wn18rr.sh  0.146  blp-distmult-fb15k237.sh  0.472  blp-distmult-wikidata5m.sh
BLP-ComplEx   0.261  blp-complex-wn18rr.sh   0.148  blp-complex-fb15k237.sh   0.489  blp-complex-wikidata5m.sh
BLP-SimplE    0.239  blp-simple-wn18rr.sh    0.144  blp-simple-fb15k237.sh    0.493  blp-simple-wikidata5m.sh

Entity classification

After training for link prediction, a tensor of embeddings for all entities is computed and saved in a file named ent_emb-[ID].pt, where [ID] is the ID of the experiment in the database (we use Sacred to manage experiments). Another file, ents-[ID].pt, contains the entity identifier for each row in the embedding tensor.
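
For example (illustrative, assuming an experiment with ID 199 and the default output folder), these files can be inspected with:

import torch

ent_emb = torch.load('output/ent_emb-199.pt', map_location='cpu')
ent_ids = torch.load('output/ents-199.pt', map_location='cpu')
print(ent_emb.shape)  # (number of entities, embedding dimension)
print(ent_ids[:5])    # identifiers aligned with the rows of ent_emb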

To ease reproducibility, we provide these tensors, which are required for the entity classification task. Click on the ID, download the file into the output folder, and decompress it. An experiment can then be reproduced with the following command:

python train.py node_classification with checkpoint=ID dataset=DATASET

where DATASET is either WN18RR or FB15k-237. For example:

python train.py node_classification with checkpoint=199 dataset=WN18RR

              WN18RR       FB15k-237
Model         Acc.   ID    Bal. Acc.  ID
GloVe-BOW     55.3   219   34.4       293
BE-BOW        60.7   218   28.3       296
GloVe-DKRL    55.5   206   26.6       295
BE-DKRL       48.8   207   30.9       294
BLP-TransE    81.5   199   42.5       297
BLP-DistMult  78.5   200   41.0       298
BLP-ComplEx   78.1   201   38.1       300
BLP-SimplE    83.0   202   45.7       299

Information retrieval

This task uses a pre-trained model saved from the link prediction task. For example, if a BLP model with the TransE relational model was trained and saved as model.pt, run the following command to evaluate it on information retrieval:

python retrieval.py with model=blp rel_model=transe \
checkpoint='output/model.pt'

Using your own data

If you have a knowledge graph where entities have textual descriptions, you can train a BLP model for inductive link prediction, and for entity classification if you also have labels for entities.

To do this, create a new folder inside the data folder (let's call it my-kg), and store in it a file containing the triples in your KG. This should be a text file with one tab-separated triple per line (let's call it all-triples.tsv).
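
A quick sanity check of the expected format (illustrative; the path data/my-kg/all-triples.tsv is an assumption based on the layout above):

# Each line must contain exactly three tab-separated fields: head, relation, tail
with open('data/my-kg/all-triples.tsv') as f:
    for i, line in enumerate(f, 1):
        fields = line.rstrip('\n').split('\t')
        assert len(fields) == 3, f'line {i} has {len(fields)} fields'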

To generate inductive splits, you can use data/utils.py. If you run

python utils.py drop_entities --file=my-kg/all-triples.tsv

this will generate ind-train.tsv, ind-dev.tsv, and ind-test.tsv inside my-kg (see Appendix A in our paper for details on how these splits are generated). You can then train BLP-TransE with

python train.py with dataset='my-kg'

Alternative implementations
