Prefix Tuning

Files:

.
├── gpt2                          # Code for GPT2 style autoregressive LM
│   ├── train_e2e.py              # high-level scripts to train.
│   ├── train_control.py          # code that implements prefix-tuning.
│   ├── trainer_prefix.py         # trainer code for the training loop. 
│   ├── run_language_modeling.py  # training code (contains data loading, model loading, and calls trainer)
│   ├── gen.py                    # high-level scripts to decode. 
│   └── run_generation.py         # decoding code. 
│
├── seq2seq                       # Code for encoder-decoder architecture
│   ├── train_bart.py             # high-level scripts to train.
│   ├── prefixTuning.py           # code that implements prefix-tuning.
│   ├── finetune.py               # training code (contains data loading, model loading, and calls trainer)   
│   ├── lightning_base.py         # helper code
│   ├── utils.py                  # helper code
│   └── callbacks.py              # helper code
└── ...

To run the code for GPT2 style autoregressive LM, the code is in gpt2/. This corresponds to the table-to-text experiments in the paper.

To run the code for encoder-decoder architecture like BART, the code is in seq2seq. This corresponds to the summarization experiments in the paper.

The two primary scripts I used to run my codes are gpt2/train_e2e.py (for table-to-text) and seq2seq/train_bart.py(for summarization). they are set to default of good hyperparameters, and can be used to tune hyperparameter :)

Setup:

cd transformer; pip install -e .

Train via prefix-tuning:

cd gpt2;

python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq; 

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800

Other baseline approaches

cd gpt2;

python train_e2e.py --tuning_mode {finetune/adaptertune} --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_e2e.py --tuning_mode finetune --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

Decode:

cd gpt2;

python gen.py {data2text/webnlg/...} yes test {checkpoint_path} no

cd seq2seq; 

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training}

For details of the methods and results, please refer to our paper.

@misc{li2021prefixtuning,
      title={Prefix-Tuning: Optimizing Continuous Prompts for Generation}, 
      author={Xiang Lisa Li and Percy Liang},
      year={2021},
      eprint={2101.00190},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Related tags

Overview

Prefix Tuning

Files:

Setup:

Train via prefix-tuning:

Decode:

Owner

Differentiable Abundance Matching With Python

The implementation our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction".

SiamMOT is a region-based Siamese Multi-Object Tracking network that detects and associates object instances simultaneously.

Facial recognition project

StellarGraph - Machine Learning on Graphs

Code for "Single-view robot pose and joint angle estimation via render & compare", CVPR 2021 (Oral).

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

A small library of 3D related utilities used in my research.

Simple Dynamic Batching Inference

PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos

Official Implementation for Fast Training of Neural Lumigraph Representations using Meta Learning.

Learning Logic Rules for Document-Level Relation Extraction

StrongSORT: Make DeepSORT Great Again

Aerial Imagery dataset for fire detection: classification and segmentation (Unmanned Aerial Vehicle (UAV))

Time Series Cross-Validation -- an extension for scikit-learn

Bayesian Meta-Learning Through Variational Gaussian Processes

[ICCV2021] Official Pytorch implementation for SDGZSL (Semantics Disentangling for Generalized Zero-Shot Learning)

Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PConv-Keras - Unofficial implementation of "Image Inpainting for Irregular Holes Using Partial Convolutions". Try at: www.fixmyphoto.ai

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch