Adapter-BERT: Parameter-Efficient Transfer Learning for NLP.

Last update: Jan 03, 2023

Adapter-BERT

Introduction

This repository contains a version of BERT that can be trained using adapters. Our ICML 2019 paper contains a full description of this technique: Parameter-Efficient Transfer Learning for NLP.

Adapters allow one to train a model to solve new tasks, but adjust only a few parameters per task. This technique yields compact models that share many parameters across tasks, whilst performing similarly to fine-tuning the entire model independently for every task.
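As a rough illustration of the idea, the paper describes an adapter as a small bottleneck module with a skip connection: project down, apply a nonlinearity, project back up, and add the input. The sketch below is written in NumPy and is not the TensorFlow code in this repository. The function names, the bottleneck size of 64, the GELU nonlinearity and the initialization scale are illustrative assumptions; the hidden size of 768 matches the `H-768` checkpoint used later in this README.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU (assumed nonlinearity for this sketch).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def init_adapter(hidden_size, bottleneck_size, rng, stddev=1e-3):
    # Small random weights keep the adapter close to the identity function
    # at the start of training (stddev is a hypothetical choice).
    return {
        "w_down": rng.normal(0.0, stddev, (hidden_size, bottleneck_size)),
        "b_down": np.zeros(bottleneck_size),
        "w_up": rng.normal(0.0, stddev, (bottleneck_size, hidden_size)),
        "b_up": np.zeros(hidden_size),
    }


def adapter_forward(x, params):
    # Down-project, apply the nonlinearity, up-project, then add the input back.
    h = gelu(x @ params["w_down"] + params["b_down"])
    return x + h @ params["w_up"] + params["b_up"]


def num_params(params):
    # Number of parameters added by one adapter.
    return sum(p.size for p in params.values())


rng = np.random.default_rng(0)
adapter = init_adapter(hidden_size=768, bottleneck_size=64, rng=rng)
activations = rng.normal(size=(2, 5, 768))  # (batch, sequence, hidden)
output = adapter_forward(activations, adapter)
print(output.shape, num_params(adapter))
```

With these illustrative sizes, one adapter adds 2 x 768 x 64 weights plus 64 + 768 biases. Only parameters of this kind need to be stored per task, while the pre-trained BERT weights are shared.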

The code here is forked from the original BERT repo. It provides our version of BERT with adapters, and the capability to train it on the GLUE tasks. For additional details on BERT, and support for additional tasks, see the original repo.

Tuning BERT with Adapters

Fine-tuning may be run on a GPU with at least 12GB of RAM, or on a Cloud TPU. The same constraints apply as for full fine-tuning of BERT. For additional details, and for instructions on downloading a pre-trained checkpoint and the GLUE tasks, see https://github.com/google-research/bert.

The following command provides an example of tuning with adapters on the GLUE MRPC task.

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue

python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=3e-4 \
  --num_train_epochs=5.0 \
  --output_dir=/tmp/adapter_bert_mrpc/

You should see output like this:

***** Eval results *****
  eval_accuracy = 0.85784316
  eval_loss = 0.48347527
  global_step = 573
  loss = 0.48347527

This means that the Dev set accuracy was 85.78%. Small datasets like MRPC show high variance in Dev set accuracy, even when starting from the same pre-training checkpoint, so results may deviate from this figure by around 2%.
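The reported numbers can be sanity-checked with a little arithmetic. The sketch below makes several assumptions that are not stated in this README: that the step count is computed as in the upstream BERT `run_classifier.py` (examples divided by batch size, times epochs, truncated to an integer), and that the GLUE MRPC splits contain 3668 training and 408 Dev examples. The binomial standard error only measures noise from the finite Dev set; it is related to, but not the same as, the run-to-run variance described above.

```python
import math


def num_train_steps(num_examples, batch_size, num_epochs):
    # Assumed to mirror the step-count formula in the upstream BERT script.
    return int(num_examples / batch_size * num_epochs)


def accuracy_standard_error(accuracy, num_eval_examples):
    # Binomial standard error of an accuracy measured on a finite eval set.
    return math.sqrt(accuracy * (1.0 - accuracy) / num_eval_examples)


MRPC_TRAIN_EXAMPLES = 3668  # assumption: size of the GLUE MRPC train split
MRPC_DEV_EXAMPLES = 408     # assumption: size of the GLUE MRPC Dev split

# Flags taken from the example command: train_batch_size=32, num_train_epochs=5.0.
steps = num_train_steps(MRPC_TRAIN_EXAMPLES, batch_size=32, num_epochs=5.0)
stderr = accuracy_standard_error(0.85784316, MRPC_DEV_EXAMPLES)

print(f"training steps: {steps}")
print(f"Dev accuracy standard error: {stderr:.4f}")
```

Under these assumptions the step count matches the `global_step` shown in the example output. The standard error comes out at roughly 0.017, the same order as the deviation quoted above.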

Citation

Please use the following citation for this work:

@inproceedings{houlsby2019parameter,
  title = {Parameter-Efficient Transfer Learning for {NLP}},
  author = {Houlsby, Neil and Giurgiu, Andrei and Jastrzebski, Stanislaw and Morrone, Bruna and De Laroussilhe, Quentin and Gesmundo, Andrea and Attariyan, Mona and Gelly, Sylvain},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  year = {2019},
}

The paper is available on arXiv.

Disclaimer

This is not an official Google product.

Contact information

For personal communication, please contact Neil Houlsby ([email protected]).

Owner

Google Research
