Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (paper summary)
2022-04-23 08:22:00 【A grain of sand in the vast sea of people】
Paper: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Code: Transformer-XL code
1. Brief introduction of the paper
Transformer-XL = Transformer Extra Long
2. What is Transformer-XL?
XLNet uses the Segment Recurrence Mechanism and Relative Positional Encoding from Transformer-XL for optimization.
Segment Recurrence Mechanism: the segment recurrence mechanism caches the hidden states produced for the previous segment and reuses them when computing the current segment, so that the model has access to broader context information.
After the previous segment's information is introduced, two tokens may carry the same position information; for example, the first word of the previous segment has the same absolute position as the first word of the current segment. Transformer-XL therefore adopts Relative Positional Encoding: instead of fixed absolute positions, it encodes the relative positions between words.
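To make the collision concrete, here is a small numpy sketch; the sinusoidal formula is the standard absolute position embedding, and the distance computation at the end is an illustrative assumption about how a relative scheme indexes positions, not the paper's full attention reparameterization. With absolute encodings that restart at 0 in every segment, the first token of the previous segment and the first token of the current segment receive identical vectors, while a relative scheme distinguishes key positions by their distance from the query:

```python
import numpy as np

def absolute_sinusoid(pos, d_model=8):
    # standard sinusoidal absolute position embedding
    i = np.arange(d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))
    return np.concatenate([np.sin(angles), np.cos(angles)])

# Absolute positions restart at 0 in each segment, so the first token
# of the previous segment and the first token of the current segment
# become indistinguishable once the segments are concatenated:
print(np.allclose(absolute_sinusoid(0), absolute_sinusoid(0)))  # True

# A relative scheme instead indexes position information by the
# distance between the query position i and each key position j,
# which stays unambiguous across the cached previous segment:
seg_len = 4
query_i = seg_len  # first position of the current segment
distances = [query_i - j for j in range(2 * seg_len)]
print(distances)  # [4, 3, 2, 1, 0, -1, -2, -3]
```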
3. Vanilla Transformer language models: brief introduction and disadvantages
3.1 Brief introduction
3.2 shortcoming
3.2.1 Training with the Vanilla Model
1. Tokens at the beginning of each segment do not have sufficient context for proper optimization.
2. Limited by a fixed-length context
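Both shortcomings above follow from how the vanilla model prepares its training data. A minimal sketch (toy token stream and segment length are assumptions for illustration) of splitting a corpus into independent fixed-length segments:

```python
# Hedged sketch: vanilla training chops the token stream into
# independent fixed-length segments (segment length 4 here).
tokens = list(range(10))  # a toy stream of token ids
seg_len = 4
segments = [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]
print(segments)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Token 4 begins segment 2 with no access to tokens 0-3: the first
# tokens of every segment are optimized with insufficient context,
# and no dependency longer than seg_len can ever be learned.
```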
3.2.2 Evaluation with the Vanilla Model
1. Longest context limited by segment length.
2. Very expensive due to recomputation.
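The recomputation cost comes from the vanilla evaluation procedure: to give every predicted token the longest possible context, the window advances one position at a time and the whole segment is recomputed from scratch for each prediction. A sketch under assumed toy sizes:

```python
# Sketch of vanilla sliding-window evaluation: one forward pass over
# a full segment per predicted token, with nothing reused.
tokens = list(range(10))
seg_len = 4
steps = 0
for pos in range(seg_len - 1, len(tokens)):
    window = tokens[pos - seg_len + 1:pos + 1]  # recomputed each step
    steps += len(window)  # cost of one forward pass over the window
# 7 predictions each pay for a full 4-token pass: 28 token computations,
# versus ~10 if hidden states were cached and reused across steps.
print(steps)  # 28
```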
3.2.3 Temporal Incoherence
4. Transformer-XL: contributions and major improvements
4.1 Transformer-XL introduction
4.1.1 Training with Transformer-XL
4.1.2 Evaluation with Transformer-XL
4.1.3 Solution: Relative Positional Encodings
Benefits:
1. Allows recurrence mechanism
2. Better generalization
-> WordLM: train with memory length 150, evaluate with 640
-> CharLM: train with memory length 680, evaluate with 3800
4.2 Segment-level Recurrence
Cache and reuse the hidden states from the last segment.
Analogous to truncated BPTT for RNNs: pass the last hidden state to the next segment as the initial hidden state.
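The caching loop above can be sketched in a few lines of numpy. This is a minimal illustration, not the actual Transformer-XL code: `layer` here is a stand-in for a transformer layer (any map from a sequence of hidden vectors to a sequence of hidden vectors), and in the real model the cached states are consumed as attention keys/values with gradients stopped through them, which is the truncated-BPTT analogy:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seg_len = 16, 4
W = rng.standard_normal((d_model, d_model)) * 0.1

def layer(x):
    # stand-in for one transformer layer: [seq, d] -> [seq, d]
    return np.tanh(x @ W)

memory = None  # hidden states cached from the previous segment
stream = rng.standard_normal((3 * seg_len, d_model))  # 3 segments
for s in range(3):
    segment = stream[s * seg_len:(s + 1) * seg_len]
    # extended context: cached states are consumed as extra input but
    # would be excluded from backprop (the truncation point of BPTT)
    context = segment if memory is None else np.concatenate([memory, segment])
    hidden = layer(context)[-seg_len:]  # outputs for current positions only
    memory = hidden  # cache for the next segment
print(memory.shape)  # (4, 16)
```

Because each cached state was itself computed from the segment before it, the effective context grows with depth and segment count even though each update only attends one segment back.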
4.3 Keep temporal information coherent
5. Summary
Reference material
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context (bilibili video)
Copyright notice
This article was written by [A grain of sand in the vast sea of people]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/04/202204230704087027.html