K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters (2020)

Overview


This repository is the implementation of the paper "K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters".

In the K-Adapter paper, we present a flexible approach that supports continual knowledge infusion into large pre-trained models (e.g., RoBERTa in this work). We infuse factual knowledge and linguistic knowledge, and show that adapters for both kinds of knowledge work well on downstream tasks.

For more details, please check the latest version of the paper: https://arxiv.org/abs/2002.01808

Prerequisites

  • Python 3.6
  • PyTorch 1.3.1
  • tensorboardX
  • transformers

We use the huggingface/transformers framework; the environment can be set up with:

conda create -n kadapter python=3.6
conda activate kadapter
pip install -r requirements.txt

Pre-training Adapters

In the pre-training procedure, each knowledge-specific adapter is trained individually on its own pre-training task.

1. Process Dataset

  • ./scripts/clean_T_REx.py: clean the raw T-REx dataset (32 GB) and save the cleaned T-REx in JSON format
  • ./scripts/create_subdataset-relation-classification.ipynb: create the sub-dataset from T-REx for pre-training the factual adapter on the relation classification task. This sub-dataset can be found here.
  • refer to this code to create the dependency parsing dataset from Book Corpus for pre-training the linguistic adapter on the dependency parsing task.

2. Factual Adapter

To pre-train fac-adapter, run

bash run_pretrain_fac-adapter.sh

3. Linguistic Adapter

To pre-train lin-adapter, run

bash run_pretrain_lin-adapter.sh

The pre-trained fac-adapter and lin-adapter models can be found here.

Fine-tuning on Downstream Tasks

Adapter Structure

  • The fac-adapter (lin-adapter) consists of two transformer layers (L=2, H=768, A=12).
  • The RoBERTa layers where adapters plug in: 0, 11, 23 or 0, 11, 22.
  • When using a single adapter:
    • Use the concatenation of the last hidden feature of RoBERTa and the last hidden feature of the adapter as the input representation for the task-specific layer.
  • When combining both adapters (see the sketch after this list):
    • For each adapter, first concatenate the last hidden feature of RoBERTa with that adapter's last hidden feature and feed it into a separate linear layer; then concatenate the resulting representations as the input for the task-specific layer.

How to load pre-trained RoBERTa and the pre-trained adapters

  • The pre-trained adapters are in ./pretrained_models/fac-adapter/pytorch_model.bin and ./pretrained_models/lin-adapter/pytorch_model.bin. To use only a single adapter, for example the fac-adapter, set the argument meta_fac_adaptermodel='./pretrained_models/fac-adapter/pytorch_model.bin' and set meta_lin_adaptermodel="". To use both adapters, set both meta_fac_adaptermodel and meta_lin_adaptermodel to the paths of the adapters.
  • The pre-trained RoBERTa will be downloaded automatically when you run the pipeline.

1. Entity Typing

1.1 OpenEntity

A single 16G P100 GPU

(1) run the pipeline

bash run_finetune_openentity_adapter.sh

(2) result

Each tuple is (precision, recall, F1).

  • with fac-adapter: dev (0.7967123287671233, 0.7580813347236705, 0.7769169115682607), test (0.7929708951125755, 0.7584033613445378, 0.7753020134228187)
  • with lin-adapter: dev (0.8071672354948806, 0.7398331595411888, 0.7720348204570185), test (0.8001135718341851, 0.7400210084033614, 0.7688949522510232)
  • with fac-adapter + lin-adapter: dev (0.8001101321585903, 0.7575599582898853, 0.7782538832351366), test (0.7899568034557235, 0.7627737226277372, 0.7761273209549072)

The results may vary across machines but should not differ much. We only searched over per_gpu_train_batch_size in [4, 8], lr in [1e-5, 5e-6], and warmup in [0, 200, 500, 1000, 1200]; you can try other parameters and see the results.

  • For w/ fac-adapter, the best performance is achieved at gpu_num=1, per_gpu_train_batch_size=4, lr=5e-6, warmup=500 (it takes about 2 hours to get the best result on a single 16G P100).
  • For w/ lin-adapter, the best performance is achieved at gpu_num=1, per_gpu_train_batch_size=4, lr=5e-6, warmup=1000 (it takes about 2 hours to get the best result on a single 16G P100).

(3) Data format

Add the special token "@" before and after the entity mention; the representation of the first "@" is then used for classification. There are 9 entity types: ['entity', 'location', 'time', 'organization', 'object', 'event', 'place', 'person', 'group'], and each entity can belong to several of them or none. The label is a binary vector such as [0,1,1,0,1,0,0,0,0], where 1 means the entity belongs to the corresponding type and 0 means it does not.
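A minimal sketch of this input/label format; the function names are illustrative, not the repository's code.

# 9 entity types in the order used for the label vector.
TYPES = ['entity', 'location', 'time', 'organization', 'object',
         'event', 'place', 'person', 'group']

def mark_entity(tokens, start, end):
    # Insert "@" before and after the entity span tokens[start:end];
    # the representation of the first "@" is later used for classification.
    return tokens[:start] + ['@'] + tokens[start:end] + ['@'] + tokens[end:]

def encode_types(entity_types):
    # Multi-label target: 1 if the entity has that type, else 0.
    return [1 if t in entity_types else 0 for t in TYPES]

tokens = 'He visited Paris last summer'.split()
print(mark_entity(tokens, 2, 3))            # ['He', 'visited', '@', 'Paris', '@', 'last', 'summer']
print(encode_types({'location', 'place'}))  # [0, 1, 0, 0, 0, 0, 1, 0, 0]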

1.2 FIGER

(1) run the pipeline

bash run_finetune_figer_adapter.sh

The detailed hyperparameters are listed in the running script.

2. Relation Classification

Four 16G P100 GPUs

(1) run the pipeline

bash run_finetune_tacred_adapter.sh

(2) result

Each tuple is (precision, recall, F1).

  • with fac-adapter
    • dev: (0.6686945083853996, 0.7481604120676968, 0.7061989928807085)
    • test: (0.693900391717963, 0.7458646616541353, 0.7189447746050153)
  • with lin-adapter
    • dev: (0.6679165308118683, 0.7536791758646063, 0.7082108902333621)
    • test: (0.6884615384615385, 0.7536842105263157, 0.7195979899497488)
  • with fac-adapter + lin-adapter
    • dev: (0.6793893129770993, 0.7367549668874173, 0.7069102462271645)
    • test: (0.7014245014245014, 0.7404511278195489, 0.7204096561814192)
  • The results may vary across machines but should not differ much.
  • We only searched over per_gpu_train_batch_size in [4, 8], lr in [1e-5, 5e-6], and warmup in [0, 200, 1000, 1200]; you can try other parameters and see the results.
  • The best performance is achieved at gpu_num=4, per_gpu_train_batch_size=8, lr=1e-5, warmup=200 (it takes about 7 hours to get the best result on four 16G P100s).
  • The detailed hyperparameters are listed in the running script.

(3) Data format

Add the special token "@" before and after the first entity and "#" before and after the second entity. The representations of "@" and "#" are then concatenated to perform relation classification, as sketched below.
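A minimal sketch of how the marker representations could be gathered; the variable names and shapes are our assumptions, not the repository's exact code.

import torch

def relation_features(hidden_states, input_ids, at_id, hash_id):
    # hidden_states: (batch, seq_len, hidden) final features
    # input_ids:     (batch, seq_len) token ids
    # at_id, hash_id: vocabulary ids of the '@' and '#' marker tokens
    feats = []
    for b in range(input_ids.size(0)):
        at_pos = (input_ids[b] == at_id).nonzero()[0].item()      # first '@'
        hash_pos = (input_ids[b] == hash_id).nonzero()[0].item()  # first '#'
        feats.append(torch.cat([hidden_states[b, at_pos],
                                hidden_states[b, hash_pos]], dim=-1))
    return torch.stack(feats)  # (batch, 2*hidden), fed to the relation classifier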

3. Question Answering

3.1 CosmosQA

A single 16G P100 GPU

(1) run the pipeline

bash run_finetune_cosmosqa_adapter.sh

(2) result

CosmosQA dev accuracy: 80.9
CosmosQA test accuracy: 81.8

The best performance is achieved at gpu_num=1, per_gpu_train_batch_size=64, GRADIENT_ACC=32, lr=1e-5, warmup=0 (it takes about 8 hours to get the best result on a single 16G P100). The detailed hyperparameters are listed in the running script.

(3) Data format

For each answer, the input is the concatenation of the context, the question, and that answer, and the model produces a score for the answer. After scoring all four candidates, the answer with the highest score is selected (see the sketch below).
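A minimal sketch of this answer-selection step; score_fn stands in for the fine-tuned model and is an assumption, not the repository's API.

def pick_answer(score_fn, context, question, answers):
    # Score each (context, question, answer) input, keep the highest-scoring answer.
    scores = [score_fn(context, question, answer) for answer in answers]
    return answers[scores.index(max(scores))]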

3.2 SearchQA and Quasar-T

The source code for fine-tuning on the SearchQA and Quasar-T datasets is modified from the code of the paper "Denoising Distantly Supervised Open-Domain Question Answering".

Use K-Adapter just like RoBERTa

  • You can use K-Adapter (RoBERTa with adapters) just like RoBERTa; it has almost the same inputs and outputs. Specifically, we add a class RobertawithAdapter in pytorch_transformers/my_modeling_roberta.py.
  • Demo code (run_example.sh and examples/run_example.py) shows how to use RobertawithAdapter, run inference, and save and load the model. You can leave the adapter arguments at their defaults.
  • It is now very easy to use RoBERTa with adapters. If you only want to use a single adapter, for example the fac-adapter, set the argument meta_fac_adaptermodel='./pretrained_models/fac-adapter/pytorch_model.bin' and set meta_lin_adaptermodel="". If you want to use both adapters, set both meta_fac_adaptermodel and meta_lin_adaptermodel to the paths of the adapters.
bash run_example.sh
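For orientation, a rough sketch of the pattern the demo follows; the argument object and the constructor call are assumptions based on the description above, so check examples/run_example.py for the actual interface.

from types import SimpleNamespace
from pytorch_transformers.my_modeling_roberta import RobertawithAdapter

# Assumed argument object; the real demo builds its arguments with argparse.
args = SimpleNamespace(
    meta_fac_adaptermodel='./pretrained_models/fac-adapter/pytorch_model.bin',
    meta_lin_adaptermodel='',  # empty string: do not load the lin-adapter
)
model = RobertawithAdapter(args)  # assumed constructor; see examples/run_example.py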

TODO

  • Remove and merge redundant code
  • Support other pre-trained models, such as BERT...

Contact

Feel free to contact Ruize Wang ([email protected]) if you have any further questions.
