Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Last update: Dec 27, 2022

Overview

DART

Implementation of the ICLR 2022 paper "Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners".

Environment

[email protected]
Use pip install -r requirements.txt to install dependencies.
A wandb account is required if you want to search for the best hyper-parameter combinations.

Data source

16-shot GLUE dataset from LM-BFF.
The generated data consists of 5 random splits per task (seeds 13/21/42/87/100), each with 16 samples.
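
A small sketch of how the five splits might be enumerated. The directory layout `data/k-shot/<task>/16-<seed>` follows the LM-BFF convention and is an assumption here, as is the helper name `split_dirs`; adjust it to wherever the generated data actually lives.

```python
from pathlib import Path

# Seeds of the 5 random splits listed above.
SEEDS = (13, 21, 42, 87, 100)
K_SHOT = 16


def split_dirs(task, root="data/k-shot"):
    """Return the expected directory of each split for `task`.

    The `<root>/<task>/<k>-<seed>` layout is an assumption borrowed from
    LM-BFF, not something this repository guarantees.
    """
    return [Path(root) / task / f"{K_SHOT}-{seed}" for seed in SEEDS]


if __name__ == "__main__":
    # Report which of the expected split directories exist locally.
    for path in split_dirs("SST-2"):
        print(path, "(found)" if path.is_dir() else "(missing)")
```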

How to run

To run across all 5 splits of a task, use run.py:
- Among the arguments, encoder="inner" selects the method proposed in the paper, where the verbalizers are other trainable tokens; encoder="manual" means the verbalizers are fixed, manually selected tokens; encoder="lstm" refers to the P-Tuning method.

$ python run.py -h
usage: run.py [-h] [--encoder {manual,lstm,inner,inner2}] [--task TASK]
              [--num_splits NUM_SPLITS] [--repeat REPEAT] [--load_manual]
              [--extra_mask_rate EXTRA_MASK_RATE]
              [--output_dir_suffix OUTPUT_DIR_SUFFIX]

optional arguments:
  -h, --help            show this help message and exit
  --encoder {manual,lstm,inner,inner2}
  --task TASK
  --num_splits NUM_SPLITS
  --repeat REPEAT
  --load_manual
  --extra_mask_rate EXTRA_MASK_RATE
  --output_dir_suffix OUTPUT_DIR_SUFFIX, -o OUTPUT_DIR_SUFFIX
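
The help text above can be mirrored with a small `argparse` parser, which also shows what a typical invocation looks like. This is a sketch, not the contents of run.py: the flag names and encoder choices come from the usage message, while every type and default value is an assumption chosen for illustration.

```python
import argparse


def build_parser():
    """Rebuild a parser that mirrors the usage message of run.py.

    Flag names and encoder choices come from the help text; all types
    and defaults below are illustrative assumptions.
    """
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--encoder", choices=["manual", "lstm", "inner", "inner2"],
                        default="inner")
    parser.add_argument("--task", type=str, default="SST-2")
    parser.add_argument("--num_splits", type=int, default=5)
    parser.add_argument("--repeat", type=int, default=1)
    parser.add_argument("--load_manual", action="store_true")
    parser.add_argument("--extra_mask_rate", type=float, default=0.0)
    parser.add_argument("--output_dir_suffix", "-o", type=str, default="")
    return parser


# Equivalent to: python run.py --encoder inner --task SST-2 --num_splits 5 -o demo
args = build_parser().parse_args(
    ["--encoder", "inner", "--task", "SST-2", "--num_splits", "5", "-o", "demo"]
)
print(args)
```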

To train and evaluate on a single split with details recorded, use inference.py.
- Before running, configure [task_name, label_list, prompt_type] in the code.
- prompt_type="none" refers to training with a fixed verbalizer, while "inner" refers to the method proposed in the paper. ("inner2" is a deprecated 2-stage training mode.)
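
To make the role of a verbalizer concrete, the sketch below turns the logits a masked language model produces at the [MASK] position into class probabilities by looking only at the label tokens. It is a toy illustration in plain Python, not the repository's code: the vocabulary, the logit values and the `class_probs` helper are made up. With a fixed verbalizer the label tokens are hand-picked words; in the "inner" setting they are trainable tokens instead.

```python
import math

# Toy vocabulary and toy logits at the [MASK] position (made-up numbers).
vocab = ["great", "terrible", "movie", "the"]
mask_logits = [2.0, 0.5, 1.0, 0.1]

# Fixed verbalizer: each class is represented by one hand-picked token.
verbalizer = {"positive": "great", "negative": "terrible"}


def class_probs(logits, vocab, verbalizer):
    """Softmax restricted to the logits of the label tokens."""
    scores = {label: logits[vocab.index(tok)] for label, tok in verbalizer.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


probs = class_probs(mask_logits, vocab, verbalizer)
prediction = max(probs, key=probs.get)
print(prediction, probs)
```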
To find the optimal hyper-parameters for each task-split and reproduce our results, use sweep.py:
- Refer to the WandB documentation for more details.

$ python sweep.py -h
usage: sweep.py [-h]
                [--task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}]
                [--encoder {none,mlp,lstm,inner,inner2}]
                [--seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]]
                [--batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]]
                [--sweep_id SWEEP_ID]

optional arguments:
  -h, --help            show this help message and exit
  --task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}
  --encoder {none,mlp,lstm,inner,inner2}
  --seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]
  --batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]
  --sweep_id SWEEP_ID
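
For orientation, a WandB sweep is described by a configuration dictionary and started with `wandb.sweep` and `wandb.agent`. The sketch below is not taken from sweep.py: the search method, metric name, learning-rate grid, project name and `train` function are all placeholders, and only the batch-size choices mirror the help text above.

```python
# Hypothetical sweep configuration; only the batch sizes come from the
# sweep.py help text, everything else is an illustrative placeholder.
sweep_config = {
    "method": "grid",
    "metric": {"name": "eval_accuracy", "goal": "maximize"},
    "parameters": {
        "batch_size": {"values": [4, 8, 16, 24, 32]},
        "learning_rate": {"values": [1e-5, 5e-5, 1e-4]},
    },
}


def train():
    """Placeholder objective: a real run would train and log its metric."""
    import wandb

    with wandb.init() as run:
        # run.config holds the hyper-parameters chosen for this trial.
        run.log({"eval_accuracy": 0.0})


def launch(project="dart-sweep-demo", count=None):
    """Register the sweep and run its trials in this process."""
    import wandb  # imported lazily so the config can be inspected without it

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train, project=project, count=count)
    return sweep_id
```

Calling `launch()` requires a logged-in wandb account; the returned identifier is the kind of value a `--sweep_id` argument would typically take to attach further agents to an existing sweep.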

To train and evaluate with more customized configurations, use cli.py.
To analyze and visualize the results produced by inference.py, use visualize.py and visualize_word_emb.py.

How to Cite

@article{DBLP:journals/corr/abs-2108-13161,
  author    = {Ningyu Zhang and
               Luoqiu Li and
               Xiang Chen and
               Shumin Deng and
               Zhen Bi and
               Chuanqi Tan and
               Fei Huang and
               Huajun Chen},
  title     = {Differentiable Prompt Makes Pre-trained Language Models Better Few-shot
               Learners},
  journal   = {CoRR},
  volume    = {abs/2108.13161},
  year      = {2021},
  url       = {https://arxiv.org/abs/2108.13161},
  eprinttype = {arXiv},
  eprint    = {2108.13161},
  timestamp = {Thu, 13 Jan 2022 17:33:17 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2108-13161.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Owner

ZJUNLP
