Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (paper summary)
2022-04-23 07:04:00 【茫茫人海一粒沙】
Paper: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Code: Transformer-XL code
1. Paper overview
Transformer-XL = Transformer Extra Long
2. What is the Transformer
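XLNet adopts two techniques from Transformer-XL as optimizations: the Segment Recurrence Mechanism and Relative Positional Encoding.
The Segment Recurrence Mechanism caches the hidden states computed for the previous text segment and reuses them when processing the current segment, which gives the model access to a much wider context.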
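To make the idea concrete, here is a minimal single-head sketch in NumPy. It is not the authors' implementation: the function name `attend_with_memory`, the weight names, and the toy sizes are assumptions made for illustration, and positional encoding is left out entirely. The point is only that queries come from the current segment, while keys and values are computed over the cached segment concatenated with the current one.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(h_cur, mem, w_q, w_k, w_v):
    """Single-head attention whose keys/values span [memory; current segment].

    h_cur: (cur_len, d) hidden states of the current segment
    mem:   (mem_len, d) cached hidden states of the previous segment
    """
    cur_len, mem_len = h_cur.shape[0], mem.shape[0]
    h_ext = np.concatenate([mem, h_cur], axis=0)   # (mem_len + cur_len, d)
    q = h_cur @ w_q                                # queries: current segment only
    k = h_ext @ w_k                                # keys: memory + current segment
    v = h_ext @ w_v                                # values: memory + current segment
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (cur_len, mem_len + cur_len)
    # Causal mask: query i sees all of the memory plus current positions <= i.
    col = np.arange(mem_len + cur_len)[None, :]
    row = np.arange(cur_len)[:, None]
    scores = np.where(col <= row + mem_len, scores, -np.inf)
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d, seg_len = 8, 4                                  # toy sizes (hypothetical)
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
prev_segment = rng.normal(size=(seg_len, d))
cur_segment = rng.normal(size=(seg_len, d))

out = attend_with_memory(cur_segment, prev_segment, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Even the first token of the current segment now attends over the whole cached segment, so its output depends on the previous segment's states.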
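Once information from the previous segment is introduced, two tokens can end up with the same positional information: for example, the first word of the previous segment and the first word of the current segment share the same position. Transformer-XL therefore uses Relative Positional Encoding: instead of fixed absolute positions, it encodes the relative distance between words.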
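The collision is easy to see with index arithmetic alone. The snippet below is only an illustration with a hypothetical segment length of 4. It shows absolute positions repeating across two segments, and the relative distance `i - j` between a query `i` and a key `j` staying unambiguous. It does not build the actual encoding vectors.

```python
import numpy as np

seg_len = 4  # hypothetical segment length

# Absolute positions restart at 0 in every segment, so they collide
# once the previous segment is kept around as memory.
abs_pos_prev = np.arange(seg_len)
abs_pos_cur = np.arange(seg_len)
abs_pos_ext = np.concatenate([abs_pos_prev, abs_pos_cur])
print(abs_pos_ext)   # [0 1 2 3 0 1 2 3]

# A relative scheme only looks at the distance i - j between the query
# token i (current segment) and the key token j (memory + current segment).
key_idx = np.arange(2 * seg_len)[None, :]             # keys over the extended context
query_idx = (np.arange(seg_len) + seg_len)[:, None]   # queries sit after the memory
rel_dist = query_idx - key_idx
print(rel_dist[0])   # [ 4  3  2  1  0 -1 -2 -3]
```

Negative distances correspond to future tokens, which a causal language model masks out.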
3. Vanilla Transformer language models: brief introduction and drawbacks
3.1 Brief introduction
3.2 Drawbacks
3.2.1 Training with the Vanilla Model (problems in the training phase)
1. Tokens at the beginning of each segment do not have sufficient context for proper optimization.
2. Limited by a fixed-length context.
3.2.2 Evaluation with the Vanilla Model
1. Longest context limited by segment length.
2. Very expensive due to recomputation: the window shifts by one position per prediction, so every window is re-encoded from scratch.
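A rough count shows why the recomputation hurts. The helper names and the numbers below are hypothetical. The count only tracks how many token positions are encoded and ignores the extra attention cost over a longer context, so treat it as a back-of-the-envelope sketch rather than a benchmark.

```python
def vanilla_eval_token_passes(num_tokens: int, seg_len: int) -> int:
    """Sliding window shifted by one: each prediction re-encodes a full window."""
    return num_tokens * seg_len

def cached_eval_token_passes(num_tokens: int) -> int:
    """With cached hidden states, each token is encoded once."""
    return num_tokens

n, seg = 10_000, 512  # hypothetical evaluation size and segment length
print(vanilla_eval_token_passes(n, seg))  # 5120000
print(cached_eval_token_passes(n))        # 10000
```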
3.2.3. Temporal Incoherence
4. Transformer-XL contributions and main improvements
4.1 Introduction to Transformer-XL
4.1.1 Training with Transformer-XL
4.1.2 Evaluation with Transformer-XL
4.1.3. Solution: Relative Positional Encodings
Benefits:
1. Allows recurrence mechanism
2. Better generalization
-> WordLM: Train with memory length 150, evaluate with 640
-> CharLM: Train with memory length 680, evaluate with 3800
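Because relative encodings depend on distances rather than absolute indices, the memory length can be treated as a runtime setting instead of something baked into the weights. The sketch below assumes a simple "keep the newest rows" cache. `update_memory`, `run`, and the toy sizes are hypothetical names and values, and the hidden states are placeholders rather than real model outputs.

```python
import numpy as np

def update_memory(mem, h_cur, mem_len):
    """Append the current hidden states and keep only the newest mem_len rows."""
    if mem_len == 0:
        return h_cur[:0]                      # empty memory, same feature width
    return np.concatenate([mem, h_cur], axis=0)[-mem_len:]

d, seg_len = 8, 64                            # toy sizes (hypothetical)
# Placeholder hidden states: segment i is filled with the value i.
segments = [np.full((seg_len, d), float(i)) for i in range(12)]

def run(mem_len):
    mem = np.zeros((0, d))
    for h in segments:
        mem = update_memory(mem, h, mem_len)
    return mem

print(run(150).shape)  # (150, 8)
print(run(640).shape)  # (640, 8)
```

The same loop serves both settings; only the `mem_len` argument changes between the training-style and evaluation-style runs.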
4.1 Segment-level Recurrence
Cache and reuse hidden states from the last batch (the previous segment).
Analogous to truncated BPTT for RNNs: pass the last hidden state to the next segment as the initial hidden state.
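In the paper's notation, the recurrence for layer n and segment τ+1 can be written as below. SG(·) denotes stop-gradient and [· ∘ ·] denotes concatenation along the length dimension. The stop-gradient is what makes this resemble truncated BPTT: the cached states are used in the forward pass, but no gradient flows back into the previous segment.

```latex
\begin{aligned}
\tilde{h}_{\tau+1}^{\,n-1} &= \left[\, \mathrm{SG}\!\left(h_{\tau}^{\,n-1}\right) \circ h_{\tau+1}^{\,n-1} \,\right] \\
q_{\tau+1}^{\,n},\; k_{\tau+1}^{\,n},\; v_{\tau+1}^{\,n} &= h_{\tau+1}^{\,n-1} W_q^{\top},\;\; \tilde{h}_{\tau+1}^{\,n-1} W_k^{\top},\;\; \tilde{h}_{\tau+1}^{\,n-1} W_v^{\top} \\
h_{\tau+1}^{\,n} &= \text{Transformer-Layer}\!\left(q_{\tau+1}^{\,n},\, k_{\tau+1}^{\,n},\, v_{\tau+1}^{\,n}\right)
\end{aligned}
```

Unlike an RNN, which passes state within the same layer, layer n here reads the cached states of layer n-1, so the reachable context grows with depth.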
4.2. Keep temporal information coherent
5. Summary
References
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context (video on bilibili)
Copyright notice
This article was written by [茫茫人海一粒沙]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/keeppractice/article/details/119790553