Making Pre-trained Language Models Better Few-Shot Learners
2022-08-10 17:49:00 【hithithithithit】
Abstract
Inserting natural-language prompts and task demonstrations into the input text is an effective way to exploit the knowledge stored in a pre-trained model, as GPT-3 showed. This paper therefore brings few-shot learning to smaller models. The approach combines prompt-based fine-tuning with automatically generated prompts, and it also redefines a dynamic, selective way of incorporating task demonstrations into the context.
Introduction
While GPT-3 performs well using only prompts and task demonstrations, without updating any weights, the model is so large that it cannot be fine-tuned for real-world scenarios. This paper therefore proposes fine-tuning smaller models such as BERT with only a small number of samples. Taking inspiration from GPT-3, the authors use prompts and in-context demonstrations to optimize both the input and the output: a brute-force search finds better-performing answer words, and T5 generates the prompt templates. They claim this method is cheap, but is running a separate T5 just to generate templates really cheap? Because of the input-length limit, they pick one good demonstration for each class. Feeling like nothing here is new? It really is GPT-3 copied over!
Methods
Label words
Gao et al. (2021) first use the pre-trained model, without any fine-tuning, to pick the top-k candidate words for each class, which form a pruned answer-word space. They then fine-tune the model on the training set to search for the n best-performing answer-word assignments within this space. Finally, the single best assignment is chosen according to the results on the validation set.
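The pruning step only needs forward passes through the frozen masked language model. Here is a minimal sketch of it, assuming a BERT-style MLM and the illustrative template "<sentence> It was [MASK]." (the template and model name are my assumptions, not the exact LM-BFF code):

```python
# Minimal sketch of label-word pruning: score every vocabulary word by the
# frozen MLM's [MASK] log-probability, summed over each class's examples.
# Template "It was [MASK]." is an illustrative assumption.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def topk_label_words(examples, k=10):
    """For each class, return the k words with the highest summed [MASK] log-probability."""
    scores = {}  # label -> summed log-probs over the whole vocabulary
    for text, label in examples:
        prompt = f"{text} It was {tokenizer.mask_token}."
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_pos]
        scores[label] = scores.get(label, 0) + torch.log_softmax(logits, dim=-1)
    return {label: [tokenizer.decode([int(i)]) for i in s.topk(k).indices]
            for label, s in scores.items()}

few_shot = [("a deeply moving film.", 1), ("a boring, pointless mess.", 0)]
print(topk_label_words(few_shot, k=5))
```

The later steps (searching for the n best assignments, then the final validation-set pick) would each fine-tune the model once per surviving candidate, which is where the real compute cost sits.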
Prompt template
Gao et al. (2021) cast prompt-template generation as a text-generation task and use T5 (Raffel et al., 2020) as the generator model. They concatenate the raw input and its output label word into a T5 input, then run beam search to generate multiple candidate prompt templates; each candidate is fine-tuned and evaluated on the dev set to select the best-performing template. The templates obtained by beam search are also used to train an ensemble model.
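As a rough illustration of the generation step, the sketch below asks T5 to fill in template text around the input sentence and its label word, using T5's sentinel tokens to mark the slots. The input format and the lack of decoding constraints are my simplifications; it only shows the beam-search mechanics:

```python
# Hedged sketch of template generation with T5: sentinel tokens
# <extra_id_0>/<extra_id_1> mark the template spans to be filled around
# the input and its label word (illustrative format, not LM-BFF's exact one).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tok = T5Tokenizer.from_pretrained("t5-base")
t5 = T5ForConditionalGeneration.from_pretrained("t5-base")

def candidate_templates(sentence, label_word, num=8):
    # T5 fills the spans before and after the label word.
    source = f"{sentence} <extra_id_0> {label_word} <extra_id_1>"
    inputs = tok(source, return_tensors="pt")
    outputs = t5.generate(**inputs, num_beams=num,
                          num_return_sequences=num, max_new_tokens=20)
    return [tok.decode(o, skip_special_tokens=False) for o in outputs]

for cand in candidate_templates("a deeply moving film.", "great"):
    print(cand)
```

Each decoded span pair becomes one candidate template, and dev-set accuracy after a quick fine-tune decides which one survives.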
Demonstrations
I don't really want to read this part, it's boring: they just sample one example per class and insert it into the input, following GPT-3's in-context format. The twist, per the abstract, is that the sampling is dynamic and selective rather than random, as sketched below.
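For what it's worth, the "dynamic and selective" part can be sketched as: embed the query and each class's training pool, keep only the most similar half of each pool, and sample one demonstration per class from it. The sentence-transformers encoder here is my assumption for the similarity model:

```python
# Sketch of selective demonstration sampling: rank each class's pool by
# cosine similarity to the query, keep the closest half, sample one demo
# per class. Encoder choice is an assumption, not the paper's exact model.
import random
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def sample_demonstrations(query, train_set, top_frac=0.5):
    q_emb = encoder.encode(query, convert_to_tensor=True)
    demos = []
    for label in sorted({l for _, l in train_set}):
        pool = [text for text, l in train_set if l == label]
        sims = util.cos_sim(q_emb, encoder.encode(pool, convert_to_tensor=True))[0]
        ranked = [pool[int(i)] for i in sims.argsort(descending=True)]
        keep = ranked[: max(1, int(len(ranked) * top_frac))]
        demos.append((random.choice(keep), label))
    return demos

train = [("a deeply moving film.", 1), ("an instant classic.", 1),
         ("a boring mess.", 0), ("two wasted hours.", 0)]
print(sample_demonstrations("quietly beautiful and sad.", train))
```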
Experiments
The paper runs a lot of experiments. I don't know these datasets well enough to comment in depth, so go look at the result tables yourself.