5-Minute NLP: Text-To-Text Transfer Transformer (T5), a Unified Text-to-Text Task Model
2022-04-23 16:35:00 【deephub】
This article explains the following terms: T5, C4, and the unified text-to-text task format.
The effectiveness of transfer learning in NLP comes from pretraining models on abundant unlabeled text with self-supervised tasks, such as language modeling or filling in missing words. After pretraining, the model can be fine-tuned on a smaller labeled dataset, usually yielding better performance than training on the labeled data alone. The success of transfer learning has been demonstrated by models such as GPT, BERT, XLNet, RoBERTa, ALBERT, and Reformer.
Text-To-Text Transfer Transformer (T5)
The paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (published in 2019) presents a large-scale empirical survey of which transfer learning techniques work best, and applies those insights to create a new model called the Text-To-Text Transfer Transformer (T5).
An important ingredient of transfer learning is the unlabeled dataset used for pretraining, which should be not only high-quality and diverse but also large. Previous pretraining datasets did not meet all three criteria, because:
- Text from Wikipedia is high-quality, but uniform in style and relatively small for this purpose.
- Text scraped from Common Crawl is enormous and highly diverse, but of relatively low quality.
The paper therefore develops a new dataset, the Colossal Clean Crawled Corpus (C4): a "cleaned" version of Common Crawl that is two orders of magnitude larger than Wikipedia.
A T5 model pretrained on C4 achieves state-of-the-art results on many NLP benchmarks, while being flexible enough to be fine-tuned for several downstream tasks.
Unified text-to-text format
With T5, all NLP tasks can be cast into a unified text-to-text format in which the input and output of every task are text strings.
This framework provides a consistent training objective for both pretraining and fine-tuning: whatever the task, the model is trained with a maximum-likelihood objective. To tell the model which task to perform, a task-specific text prefix is added to the original input sequence before it is fed to the model.
This framework allows the same model, loss function, and hyperparameters to be used on any NLP task, for example machine translation, document summarization, question answering, and classification.
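The text-to-text framing can be sketched in a few lines. The helper below is hypothetical (not part of any library), but the prefixes shown ("translate English to German:", "summarize:", "cola sentence:") are among those actually used in the T5 paper:

```python
def to_t5_input(prefix: str, text: str) -> str:
    """Turn a raw task input into a single prefixed input string.

    T5 reads this string and emits another string as the answer, so
    translation, summarization, and classification all share one interface.
    """
    return f"{prefix}: {text}"

# Every task becomes string -> string:
translation = to_t5_input("translate English to German", "That is good.")
summary = to_t5_input("summarize", "state authorities dispatched emergency crews to survey the damage")
acceptability = to_t5_input("cola sentence", "The course is jumping well.")

print(translation)  # translate English to German: That is good.
```

Even classification works this way: the model literally generates the label text (e.g. "acceptable" or "not acceptable"), rather than producing a class index through a task-specific output head.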
Comparing different models and training strategies
The T5 paper compares a variety of model architectures, pretraining objectives, datasets, training strategies, and scales. The baseline model for the comparison is a standard encoder-decoder Transformer.
- Model architecture: although some work on transfer learning for NLP has considered architectural variants of the Transformer, the original encoder-decoder form worked best in the text-to-text experiments.
- Pretraining objective: most denoising objectives, which train the model to reconstruct randomly corrupted text, performed similarly in the T5 setting. It is therefore suggested to use unsupervised objectives that improve computational efficiency, such as the fill-in-the-blank span-corruption objective.
- Unlabeled datasets: training on in-domain data can be beneficial, but pretraining on small datasets can lead to harmful overfitting, especially when the dataset is small enough to be repeated several times during pretraining. This motivates using a large and diverse dataset such as C4 for general-purpose language understanding tasks.
- Training strategy: fine-tuning after pretraining on a multi-task mixture produces performance comparable to the usual approach of unsupervised pretraining followed by fine-tuning.
- Scaling: various strategies for spending additional compute were compared, including training on more data, training larger models, and using ensembles of models. Each method improves performance, but training a smaller model on more data was often outperformed by training a larger model for fewer steps.
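The span-corruption objective from the list above replaces contiguous spans of input tokens with sentinel tokens and trains the model to generate only the dropped-out spans. Below is a simplified deterministic sketch (the `span_corrupt` helper is hypothetical; real T5 samples the spans randomly, corrupting roughly 15% of tokens, and the `<extra_id_N>` sentinel names follow the Hugging Face T5 tokenizer convention):

```python
def span_corrupt(tokens, spans):
    """Replace each (start, length) span in `tokens` with a sentinel token.

    Returns (input_text, target_text): the input keeps the uncorrupted
    tokens plus sentinels; the target lists each sentinel followed by the
    tokens it replaced, ending with a final sentinel.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep uncorrupted tokens
        inputs.append(sentinel)              # mark the dropped span
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])  # the span to predict
        cursor = start + length
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return " ".join(inputs), " ".join(targets)

tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens, [(2, 2), (8, 1)])  # mask "for inviting", "last"
print(inp)  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(tgt)  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

Because the target contains only the corrupted spans rather than the full sequence, the decoder processes far fewer tokens per example, which is where the computational efficiency of this objective comes from.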
The results show that the text-to-text approach can be successfully applied to generative tasks (e.g., abstractive summarization), classification tasks (e.g., natural language inference), and even regression tasks, with performance comparable to task-specific architectures and the state of the art.
The final T5 Model
Combining these experimental insights, the authors trained models of various sizes (up to 11 billion parameters) and achieved state-of-the-art results on many benchmarks. These models were pretrained on the C4 dataset and then pretrained on a multi-task mixture before being fine-tuned on individual tasks.
The largest model achieved state-of-the-art results on the GLUE, SuperGLUE, SQuAD, and CNN/Daily Mail benchmarks.
Summary
This article introduced the Text-To-Text Transfer Transformer (T5) model and the Colossal Clean Crawled Corpus (C4) dataset. It also showed examples of casting different tasks into the unified text-to-text format, and reviewed the experimental results comparing different model architectures and training strategies.
If you are interested in this topic, you can explore the following on your own:
- Learn about subsequent improvements to the T5 model, such as T5v1.1 (an improved version of T5 with some architectural tweaks), mT5 (a multilingual T5 model), and ByT5 (a T5 model pretrained on byte sequences rather than token sequences)
- Check out the Hugging Face implementation of T5 and try fine-tuning it
https://www.overfit.cn/post/a0e9aaeaabf04087a278aea6f06d14d6
Author: Fabio Chiusano