Completing the prefix-tuning code for the low-data setting

Last update: Jul 11, 2022

Overview

Prefix Tuning

Note:

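In the paper, the authors describe initializing the prefix with real words ("Initializing the prefix with activations of real words significantly improves generation"). I ran into some problems when using the authors' released code, so I added real-word initialization, following the approach of the existing code.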
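The paper initializes the prefix from the frozen LM's activations for a real word. As a rough, self-contained sketch of that idea (not this repository's implementation), the function below assumes the per-token activations for the init word are already computed and tiles them out to `preseqlen` positions. The function name and the cyclic tiling policy are hypothetical choices for illustration.

```python
from typing import List

Vector = List[float]


def init_prefix_from_activations(word_activations: List[Vector], preseqlen: int) -> List[Vector]:
    """Build `preseqlen` initial prefix vectors from a real word's per-token activations.

    `word_activations` stands in for activations the frozen LM produces for the
    init word (e.g. 'summarize'). If the word has fewer tokens than `preseqlen`,
    its vectors are repeated cyclically; this tiling is an assumed policy.
    """
    if not word_activations:
        raise ValueError("the initialization word must produce at least one token")
    prefix = []
    for position in range(preseqlen):
        source = word_activations[position % len(word_activations)]
        # Copy each vector so later training updates never alias the source activations.
        prefix.append(list(source))
    return prefix


# Toy activations for a two-token word with hidden size 2.
toy_activations = [[0.1, 0.2], [0.3, 0.4]]
print(init_prefix_from_activations(toy_activations, preseqlen=5))
# [[0.1, 0.2], [0.3, 0.4], [0.1, 0.2], [0.3, 0.4], [0.1, 0.2]]
```

Copying rather than referencing the source vectors keeps the initial prefix independent of whatever produced the activations, which matters once the prefix is updated by the optimizer.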

It can be run as follows:

Train

cd seq2seq; 

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800 --use_lowdata_token 'yes' --lowdata_token 'summarize'

Here, use_lowdata_token controls whether real-word initialization is used, and lowdata_token is the real word to initialize with.
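These flags follow the same 'yes'/'no' string convention as --do_train and --fp16. A minimal sketch of how such options could be parsed with the standard argparse module; the parser, its defaults, and the helper `lowdata_init_word` are hypothetical and not taken from train_bart.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the training options; defaults are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the low-data init flags")
    parser.add_argument("--use_lowdata_token", type=str, default="no", choices=["yes", "no"])
    parser.add_argument("--lowdata_token", type=str, default=None)
    parser.add_argument("--preseqlen", type=int, default=200)
    return parser


def lowdata_init_word(args: argparse.Namespace):
    """Return the init word when real-word initialization is enabled, else None."""
    if args.use_lowdata_token == "yes":
        if not args.lowdata_token:
            raise ValueError("--use_lowdata_token yes requires --lowdata_token")
        return args.lowdata_token
    return None


args = build_parser().parse_args(["--use_lowdata_token", "yes", "--lowdata_token", "summarize"])
print(lowdata_init_word(args))  # summarize
```

The single quotes in the commands above are removed by the shell, so the script receives plain yes and summarize.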

Decode

cd seq2seq; 

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training} --use_lowdata_token 'yes' --lowdata_token 'summarize'
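Because decoding must reuse the training values of --preseqlen and --mid_dim, one way to avoid drift is to build the decode command from the stored training options. A sketch, assuming the low-data flags should also be repeated unchanged (as the command above does); `decode_command` and the checkpoint path are hypothetical.

```python
import shlex
from typing import Dict


def decode_command(checkpoint_path: str, train_opts: Dict[str, str]) -> str:
    """Build the decode command, reusing training values so they cannot diverge."""
    argv = [
        "python", "train_bart.py",
        "--mode", train_opts["mode"],
        "--do_train", "no",
        "--prefix_model_path", checkpoint_path,
        # These must match the values used for training.
        "--preseqlen", train_opts["preseqlen"],
        "--mid_dim", train_opts["mid_dim"],
        # Assumed to need the same values as training as well.
        "--use_lowdata_token", train_opts["use_lowdata_token"],
        "--lowdata_token", train_opts["lowdata_token"],
    ]
    # shlex.join quotes any argument containing shell-special characters.
    return shlex.join(argv)


train_opts = {"mode": "xsum", "preseqlen": "200", "mid_dim": "800",
              "use_lowdata_token": "yes", "lowdata_token": "summarize"}
print(decode_command("path/to/checkpoint", train_opts))
```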

Files:

.
├── gpt2                          # Code for GPT2 style autoregressive LM
│   ├── train_e2e.py              # high-level scripts to train.
│   ├── train_control.py          # code that implements prefix-tuning.
│   ├── trainer_prefix.py         # trainer code for the training loop. 
│   ├── run_language_modeling.py  # training code (contains data loading, model loading, and calls trainer)
│   ├── gen.py                    # high-level scripts to decode. 
│   └── run_generation.py         # decoding code. 
│
├── seq2seq                       # Code for encoder-decoder architecture
│   ├── train_bart.py             # high-level scripts to train.
│   ├── prefixTuning.py           # code that implements prefix-tuning.
│   ├── finetune.py               # training code (contains data loading, model loading, and calls trainer)   
│   ├── lightning_base.py         # helper code
│   ├── utils.py                  # helper code
│   └── callbacks.py              # helper code
└── ...
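prefixTuning.py (seq2seq) and train_control.py (gpt2) hold the prefix-tuning logic itself. To illustrate the reparameterization described in the paper, here is a toy pure-Python sketch: a small prefix matrix is passed through Linear, tanh, Linear so that each prefix position yields a key and a value vector for every layer. Treating --mid_dim as the MLP's hidden width, and the key/value layout in `split_key_value`, are assumptions about this codebase; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map; `weight` has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def reparameterized_prefix(prefix_params: Matrix, n_layer: int, n_embd: int,
                           mid_dim: int, seed: int = 0) -> Matrix:
    """Map each prefix position through Linear -> tanh -> Linear.

    The MLP weights are randomly initialized here; in prefix-tuning they are
    trained together with `prefix_params` while the LM itself stays frozen.
    Each output row has n_layer * 2 * n_embd values: a key and a value per layer.
    """
    rng = random.Random(seed)
    in_dim = len(prefix_params[0])
    out_dim = n_layer * 2 * n_embd
    w1, b1 = random_matrix(mid_dim, in_dim, rng), [0.0] * mid_dim
    w2, b2 = random_matrix(out_dim, mid_dim, rng), [0.0] * out_dim
    return [linear([math.tanh(h) for h in linear(p, w1, b1)], w2, b2)
            for p in prefix_params]


def split_key_value(row: Vector, n_layer: int, n_embd: int) -> List[Tuple[Vector, Vector]]:
    """Split one output row into (key, value) vectors per layer (assumed layout)."""
    chunks = [row[i * n_embd:(i + 1) * n_embd] for i in range(n_layer * 2)]
    return [(chunks[2 * layer], chunks[2 * layer + 1]) for layer in range(n_layer)]


# Toy sizes: preseqlen=3, n_embd=4, n_layer=2, mid_dim=8.
params = [[0.5] * 4 for _ in range(3)]
rows = reparameterized_prefix(params, n_layer=2, n_embd=4, mid_dim=8)
print(len(rows), len(rows[0]))  # 3 16
```

After training, only the MLP outputs (the per-layer keys and values) are needed at inference time, so the MLP itself can be discarded.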

The code for GPT-2-style autoregressive LMs is in gpt2/. It corresponds to the table-to-text experiments in the paper.

The code for encoder-decoder architectures such as BART is in seq2seq/. It corresponds to the summarization experiments in the paper.

The two primary scripts I used to run my code are gpt2/train_e2e.py (for table-to-text) and seq2seq/train_bart.py (for summarization). They default to good hyperparameters and can also be used for hyperparameter tuning :)

Setup:

cd transformer; pip install -e .

Train via prefix-tuning:

cd gpt2;

python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101
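The number of prefix values stored per task is small compared with the LM. A quick count, assuming GPT-2 medium dimensions (24 layers, hidden size 1024) and one key plus one value vector per layer and prefix position; check which model train_e2e.py actually loads before relying on the number.

```python
def prefix_value_count(preseqlen: int, n_layer: int, n_embd: int) -> int:
    """Prefix values: a key and a value vector of size n_embd per layer and position."""
    return preseqlen * n_layer * 2 * n_embd


# Assumed GPT-2 medium dimensions with --preseqlen 5 from the command above.
print(prefix_value_count(preseqlen=5, n_layer=24, n_embd=1024))  # 245760
```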

cd seq2seq; 

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800
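With gradient accumulation, each optimizer step in the command above sees --bsz times --gradient_accumulation_step examples. A one-line check, assuming a single GPU (multiply by the device count otherwise); the helper name is hypothetical.

```python
def effective_batch_size(bsz: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer step under gradient accumulation."""
    return bsz * grad_accum_steps * num_devices


print(effective_batch_size(bsz=16, grad_accum_steps=3))  # 48
```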

Other baseline approaches

cd gpt2;

python train_e2e.py --tuning_mode {finetune/adaptertune} --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_e2e.py --tuning_mode finetune --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

Decode:

cd gpt2;

python gen.py {data2text/webnlg/...} yes test {checkpoint_path} no

cd seq2seq; 

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training}

For details of the methods and results, please refer to our paper.

@misc{li2021prefixtuning,
      title={Prefix-Tuning: Optimizing Continuous Prompts for Generation}, 
      author={Xiang Lisa Li and Percy Liang},
      year={2021},
      eprint={2101.00190},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


Owner

Andrew Zeng
