AI21 Labs | Standing on the Shoulders of Giant Frozen Language Models
2022-04-23 13:30:00 【Zhiyuan community】
Authors: Yoav Levine, Itay Dalmedigos, Ori Ram, et al.
Summary: Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, the current leading techniques for using a "frozen" LM (that is, leaving its weights untouched) still often underperform fine-tuning approaches, which modify those weights in a task-dependent way. Fine-tuning, in turn, suffers from forgetting and compromises versatility, which suggests a trade-off between performance and versatility. The main message of the paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg: more powerful methods for using frozen LMs can match fine-tuning in challenging domains without sacrificing the versatility of the underlying model. To demonstrate this, the authors introduce three new methods for using frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs. Each greatly improves on current frozen-model approaches, and some even outperform fine-tuning in domains that fine-tuning currently dominates. The computational cost of each method is higher than that of existing frozen-model methods, but still negligible compared with a single pass through a huge frozen LM. Each method is a meaningful contribution in its own right. See the paper for details.
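To make the frozen-model idea concrete, here is a minimal sketch of plain prompt tuning, the baseline technique the summary mentions, not any of the three methods from the paper. Everything in it is a hypothetical toy: `frozen_lm` is a mean-pool plus fixed linear head standing in for a real language model, and `tune_prompt`, `mean_squared_error`, the data, and the hyperparameters are illustrative assumptions. The point it shows is only that the trainable prompt vectors are updated while the model weights are never touched.

```python
import random


def frozen_lm(seq, weights):
    """Toy stand-in for a frozen LM (an assumption, not a real model):
    mean-pool the input vectors, then apply a fixed linear head."""
    n = len(seq)
    dim = len(weights)
    pooled = [sum(vec[i] for vec in seq) / n for i in range(dim)]
    return sum(w * p for w, p in zip(weights, pooled))


def tune_prompt(examples, weights, prompt_len=2, lr=0.1, steps=200, seed=0):
    """Learn prompt vectors that are prepended to every input.
    Only the prompt is updated; `weights` is read but never modified."""
    rng = random.Random(seed)
    dim = len(weights)
    prompt = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
              for _ in range(prompt_len)]
    for _ in range(steps):
        for tokens, target in examples:
            seq = prompt + tokens
            error = frozen_lm(seq, weights) - target
            # Squared-error gradient for this toy model:
            # d(loss)/d(prompt[j][i]) = 2 * error * weights[i] / len(seq)
            scale = 2.0 * error / len(seq)
            for vec in prompt:
                for i in range(dim):
                    vec[i] -= lr * scale * weights[i]
    return prompt


def mean_squared_error(examples, weights, prompt):
    """Average squared error of the frozen model with the given prompt."""
    total = sum((frozen_lm(prompt + x, weights) - y) ** 2 for x, y in examples)
    return total / len(examples)


if __name__ == "__main__":
    weights = [1.0, -2.0, 0.5]  # frozen: never updated below
    examples = [
        ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], 1.0),
        ([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]], 1.5),
    ]
    zero_prompt = [[0.0] * 3 for _ in range(2)]
    print("error with an untrained prompt:",
          mean_squared_error(examples, weights, zero_prompt))
    learned = tune_prompt(examples, weights)
    print("error with the tuned prompt:",
          mean_squared_error(examples, weights, learned))
```

Because the toy model is linear, the learned prompt can only shift the output by a constant, which is far weaker than what a prompt does in a real transformer. The sketch is meant to illustrate the training setup (frozen weights, trainable inputs), not the achievable quality.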




Paper download: https://arxiv.org/pdf/2204.10019