
Making Pre-trained Language Models Better Few-Shot Learners

2022-08-10 17:49:00 hithithithithit

Table of Contents

Abstract

Introduction

Methods

Label words

Prompt template

Demonstrations

Experiments


Abstract

Inserting natural language prompts and task demonstrations into the input text as additional information makes good use of the knowledge stored in GPT-3. Building on this idea, this paper applies few-shot learning to smaller models. The approach combines prompt-based fine-tuning with automatically generated prompts; for task demonstrations, it also defines a dynamic and selective way to incorporate them into the context.

Introduction

While GPT-3 can perform well using only prompts and task demonstrations, without updating any weights, the model is so large that it cannot be fine-tuned in realistic settings. This paper therefore proposes fine-tuning smaller models such as BERT with only a small number of samples. The authors take inspiration from GPT-3 and use prompts and in-context demonstrations to improve both the input and the output. They use a brute-force search to find better-performing answer words, and use T5 to generate prompt templates, which they claim is cheap. Is it really cheap to run a separate T5 model just to generate templates? Due to the input-length limit, they select one good demonstration for each class. Feeling like there is nothing new here? It really is copied from GPT-3!!!
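To make the prompt-based idea concrete, here is a minimal sketch of wrapping an input in a cloze-style template and classifying it by comparing label words at the mask position. The template "It was [MASK] ." and the label words are illustrative assumptions, not necessarily the paper's exact choices for any task; in prompt-based fine-tuning, the same masked-LM head would then be trained on the few labeled examples instead of a new classification head.

```python
# Minimal sketch of prompt-based prediction with a masked LM.
# Template and label words below are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

# Map each class to a single "answer word".
label_words = {"positive": "great", "negative": "terrible"}

def build_prompt(sentence: str) -> str:
    # Wrap the raw input with a cloze-style template containing the mask token.
    return f"{sentence} It was {tokenizer.mask_token} ."

sentence = "A delightful, beautifully shot film."
inputs = tokenizer(build_prompt(sentence), return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and score each label word there.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
scores = {}
for label, word in label_words.items():
    # Assumes the label word is a single token when preceded by a space.
    word_id = tokenizer(" " + word, add_special_tokens=False)["input_ids"][0]
    scores[label] = logits[0, mask_pos, word_id].item()

print(max(scores, key=scores.get))  # predicted class
```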

Methods

Label words

Gao et al. (2021) first use the pre-trained model, without fine-tuning, to obtain the top-k candidate words, which form a pruned answer-word space. They then fine-tune the model on the training set to search this space for the n best-performing answer words. Finally, the optimal answer word is selected according to results on the validation set.
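A rough sketch of the pruning step only, under simplifying assumptions (a fixed manual template, a tiny illustrative training set, and single-token label words): the frozen masked LM scores every vocabulary word at the mask position, the log-probabilities are summed over a class's training examples, and the top-k words are kept. The later fine-tuning search and dev-set selection are omitted.

```python
# Sketch of pruning the answer-word space with a frozen masked LM.
# Template, data, and k are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

train_set = {
    "positive": ["A delightful, beautifully shot film."],
    "negative": ["A tedious and pointless sequel."],
}

def candidate_label_words(sentences, k=10):
    # Sum log-probabilities of every vocabulary word at the mask position
    # over the class's few training examples, then keep the top-k words.
    total = 0.0
    for sent in sentences:
        prompt = f"{sent} It was {tokenizer.mask_token} ."
        enc = tokenizer(prompt, return_tensors="pt")
        mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
        with torch.no_grad():
            total = total + model(**enc).logits[0, mask_pos].log_softmax(-1)
    return tokenizer.convert_ids_to_tokens(total.topk(k).indices.tolist())

for label, sents in train_set.items():
    print(label, candidate_label_words(sents))
```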

Prompt template

Gao et al. (2021) treat prompt-template generation as a text-generation task and use T5 (Raffel et al., 2020) as the generator. They concatenate the raw input and the label word as the input to T5, then use beam search to generate multiple prompt templates and select the one that performs best on the dev set; the templates obtained by beam search are also used for ensemble learning.
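The sketch below shows the basic mechanics under simplified assumptions: the raw sentence and its label word are joined with T5 sentinel tokens, and beam search fills in the spans around the label word; each filled-in output is a candidate template that would then be evaluated on the dev set. The paper's actual input formatting is more involved than this.

```python
# Hedged sketch of generating template candidates with T5 and beam search.
# The input format and hyperparameters are simplifying assumptions.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

sentence, label_word = "A delightful, beautifully shot film.", "great"
# T5 is asked to fill <extra_id_0> and <extra_id_1>, i.e. the template
# pieces placed before and after the label word.
src = f"{sentence} <extra_id_0> {label_word} <extra_id_1>"
inputs = tokenizer(src, return_tensors="pt")

outputs = model.generate(
    **inputs,
    num_beams=20,            # wide beam search to collect many candidates
    num_return_sequences=5,  # keep several templates to evaluate later
    max_new_tokens=20,
)
for out in outputs:
    # Keep sentinel tokens visible so the filled spans are easy to read off.
    print(tokenizer.decode(out, skip_special_tokens=False))
```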

Demonstrations

I don't want to read this part in detail, it's boring: they simply sample one example from each class and insert these demonstrations into the input, following GPT-3. A minimal sketch is below.
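For what it's worth, a small sketch of the idea: sample one labeled example per class, verbalize it with the same template and its label word, and concatenate the demonstrations after the query. Plain random sampling stands in here for whatever selection the paper uses; all names and data are illustrative.

```python
# Sketch of building an input with per-class demonstrations.
# Data, template, and random selection are illustrative assumptions.
import random

label_words = {"positive": "great", "negative": "terrible"}
support_set = {
    "positive": ["A delightful, beautifully shot film."],
    "negative": ["A tedious and pointless sequel."],
}

def verbalize(sentence, label=None, mask_token="[MASK]"):
    # Labeled demonstrations get their answer word; the query keeps the mask.
    word = label_words[label] if label else mask_token
    return f"{sentence} It was {word} ."

def build_input(query):
    demos = [verbalize(random.choice(sents), label)
             for label, sents in support_set.items()]
    return " ".join([verbalize(query)] + demos)

print(build_input("An unexpectedly moving story."))
```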

Experiments

The paper runs a lot of experiments, but I'm not very familiar with these datasets, so go look at the results yourself.


Copyright notice
This article was written by [hithithithithit]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/222/202208101723074534.html