5-Minute NLP: Text-To-Text Transfer Transformer (T5), a unified model for text-to-text tasks
2022-04-23 16:35:00 【deephub】
This article explains the following terms: T5, C4, and the unified text-to-text task format.
The effectiveness of transfer learning in NLP comes from pre-training a model on abundant unlabeled text with a self-supervised task, such as language modeling or filling in missing words. After pre-training, the model can be fine-tuned on a smaller labeled dataset, typically yielding better performance than training on the labeled data alone. The effectiveness of transfer learning has been demonstrated by models such as GPT, BERT, XLNet, RoBERTa, ALBERT, and Reformer.
Text-To-Text Transfer Transformer (T5)
The paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (published in 2019) presents a large-scale empirical study of which transfer learning techniques work best, and applies these insights to create a new model called the Text-To-Text Transfer Transformer (T5).
An important ingredient of transfer learning is the unlabeled dataset used for pre-training, which should not only be of high quality and diversity, but also very large. Previous pre-training datasets did not meet all three of these criteria, because:
- Wikipedia text is of high quality, but its style is uniform and it is relatively small for our purposes.
- Text scraped from Common Crawl is huge and highly diverse, but of relatively low quality.
For this reason, the paper develops a new dataset: the Colossal Clean Crawled Corpus (C4), a "cleaned" version of Common Crawl that is two orders of magnitude larger than Wikipedia.
A T5 model pre-trained on C4 achieves state-of-the-art results on many NLP benchmarks, while being flexible enough to be fine-tuned on several downstream tasks.
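For readers who want to look at the data, C4 is publicly available. A minimal sketch for streaming a few records with the Hugging Face datasets library (assuming the allenai/c4 dataset id and its en configuration) could look like this:

```python
# Minimal sketch: stream a few examples from C4 without downloading the full corpus.
# Assumes the Hugging Face `datasets` library and the "allenai/c4" dataset id.
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

for i, example in enumerate(c4):
    # each record contains "text", "url" and "timestamp" fields
    print(example["url"])
    print(example["text"][:120], "...")
    if i == 2:
        break
```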
The unified text-to-text format
With T5, all NLP tasks can be cast into a unified text-to-text format, where the input and output of every task are always text strings.
This framework provides a consistent training objective for both pre-training and fine-tuning: whatever the task, the model is trained with a maximum-likelihood objective. To tell the model which task it should perform, a task-specific text prefix is added to the original input sequence before it is fed to the model.
This makes it possible to use the same model, loss function, and hyperparameters for any NLP task, such as machine translation, document summarization, question answering, and classification.
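As a concrete illustration, with a pre-trained T5 checkpoint from the Hugging Face transformers library, switching between tasks is just a matter of changing the text prefix. This is a minimal sketch using t5-small and prefixes from the original T5 training mixture:

```python
# Minimal sketch: one T5 model handles different tasks, selected by a text prefix.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",              # translation
    "summarize: studies have shown that owning a dog is good for you.",  # summarization
    "cola sentence: The course is jumping well.",                        # acceptability classification
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Both the input (prefix plus source text) and the output are plain strings, which is exactly the unified text-to-text format described above.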

Comparing models and training strategies
The T5 paper compares a variety of model architectures, pre-training objectives, unlabeled datasets, training strategies, and levels of scale. The baseline for the comparison is a standard encoder-decoder Transformer.
- Model architecture: although some work on transfer learning for NLP has considered architectural variants of the Transformer, the original encoder-decoder form worked best in the text-to-text experiments.
- Pre-training objectives: most denoising objectives, which train the model to reconstruct randomly corrupted text, performed similarly in the T5 setting. It is therefore suggested to use objectives that make unsupervised pre-training more computationally efficient, such as fill-in-the-blank-style denoising objectives (see the sketch after this list).
- Unlabeled datasets: training on in-domain data can be beneficial, but pre-training on small datasets can lead to harmful overfitting, especially when the dataset is small enough to be repeated several times during pre-training. This motivates using a large and diverse dataset like C4 for general-purpose language understanding tasks.
- Training strategies: fine-tuning after multi-task pre-training can yield a good performance improvement over purely unsupervised pre-training.
- Scaling: various strategies for using additional compute were compared, including training on more data, training larger models, and using ensembles of models. Each approach improves performance, but training a smaller model on more data was often outperformed by training a larger model for fewer steps.
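To make the fill-in-the-blank denoising objective mentioned above more concrete, here is a rough illustrative sketch of T5-style span corruption: randomly chosen spans of the input are replaced with sentinel tokens, and the target consists of the dropped-out spans, each preceded by its sentinel. The details below (fixed span length, whitespace tokenization) are simplifications of the actual T5 implementation.

```python
# Illustrative sketch of a T5-style span-corruption (fill-in-the-blank) objective.
# Simplified: real T5 corrupts about 15% of SentencePiece tokens in spans of mean
# length 3; here we use whitespace tokens and fixed-length, non-overlapping spans.
import random

def span_corrupt(text, n_spans=2, span_len=2, seed=0):
    rng = random.Random(seed)
    tokens = text.split()
    # pick span start positions spaced far enough apart that spans never overlap
    candidates = list(range(0, len(tokens) - span_len + 1, span_len + 1))
    starts = sorted(rng.sample(candidates, n_spans))
    inputs, targets = [], []
    i = sentinel = 0
    while i < len(tokens):
        if starts and i == starts[0]:
            starts.pop(0)
            inputs.append(f"<extra_id_{sentinel}>")    # sentinel replaces the span
            targets.append(f"<extra_id_{sentinel}>")   # target repeats the sentinel...
            targets.extend(tokens[i:i + span_len])     # ...followed by the dropped tokens
            sentinel += 1
            i += span_len
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")           # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)

src, tgt = span_corrupt("Thank you for inviting me to your party last week .")
print("input :", src)
print("target:", tgt)
```

The model is trained with maximum likelihood to produce the target given the corrupted input; because the target is much shorter than the original text, this objective is computationally attractive.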
The results show that the text-to-text approach can be successfully applied to generative tasks (e.g., abstractive summarization), classification tasks (e.g., natural language inference), and even regression tasks, achieving performance comparable to task-specific architectures and, in many cases, state-of-the-art results.
The final T5 Model
Combining these experimental insights, the authors trained models of various sizes (up to 11 billion parameters) and achieved state-of-the-art results on many benchmarks. The models were pre-trained on the C4 dataset and then pre-trained on a multi-task mixture before being fine-tuned on the individual tasks.
The largest model reached state-of-the-art results on the GLUE, SuperGLUE, SQuAD, and CNN/Daily Mail benchmarks.
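For reference, the checkpoints released with the paper are available on the Hugging Face Hub under the following model ids (parameter counts are approximate, listed here only for orientation):

```python
# Approximate sizes of the released T5 checkpoints (Hugging Face model ids).
T5_CHECKPOINTS = {
    "t5-small": "~60M parameters",
    "t5-base":  "~220M parameters",
    "t5-large": "~770M parameters",
    "t5-3b":    "~3B parameters",
    "t5-11b":   "~11B parameters",  # the largest model, used for the benchmark results above
}
```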
Summary
This article introduced the Text-To-Text Transfer Transformer (T5) model and the Colossal Clean Crawled Corpus (C4) dataset. It also showed how different tasks are cast into the unified text-to-text format, and reviewed the qualitative findings about performance under different model architectures and training strategies.
If you are interested in this topic, you can try the following on your own:
- Learn about subsequent improvements to the T5 model, such as T5v1.1 (an improved version of T5 with some architectural tweaks), mT5 (a multilingual T5 model), and ByT5 (a T5 model pre-trained on byte sequences rather than token sequences).
- Look at the Hugging Face T5 implementation and fine-tune it yourself (a minimal sketch is given below).
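As a starting point, here is a minimal, illustrative fine-tuning sketch using the Hugging Face Seq2SeqTrainer. The toy dataset and hyperparameters are placeholders, and checkpoints such as google/t5-v1_1-small, google/mt5-small, or google/byt5-small can be swapped in for t5-small:

```python
# Minimal sketch: fine-tune a T5 checkpoint on a toy text-to-text (summarization-style) task.
# The data and hyperparameters are placeholders, not a recommended setup.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "t5-small"  # or e.g. "google/t5-v1_1-small", "google/mt5-small", "google/byt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Every task is phrased as text-to-text, here with a "summarize:" prefix.
raw = Dataset.from_dict({
    "source": ["summarize: The quick brown fox jumped over the lazy dog near the river bank."],
    "target": ["A fox jumped over a dog."],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["source"], truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="t5-finetuned",
                                  num_train_epochs=1,
                                  per_device_train_batch_size=8,
                                  learning_rate=1e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```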
https://www.overfit.cn/post/a0e9aaeaabf04087a278aea6f06d14d6
Author: Fabio Chiusano