Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context: Paper Summary
2022-04-23 07:04:00 【茫茫人海一粒沙】
Paper: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Code: Transformer-XL code
1. Paper overview
Transformer-XL = Transformer Extra Long
2. What is a Transformer

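XLNet adopts two techniques from Transformer-XL as optimizations: the Segment Recurrence Mechanism and Relative Positional Encoding.
The Segment Recurrence Mechanism saves the information output for the previous text segment and uses it when computing the current segment, which gives the model a broader context.
Once information from the previous segment is brought in, two tokens may carry the same positional information; for example, the first word of the previous segment and the first word of the current segment have identical positions. Transformer-XL therefore uses Relative Positional Encoding: instead of fixed positions, it encodes the relative positions between words.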
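The position collision described above can be illustrated with a small sketch. The function names and the tiny lengths are hypothetical, chosen only for illustration; the sketch assumes the cached segment is placed in front of the current one and that attention is causal.

```python
def absolute_positions(mem_len, seg_len):
    # Positions restart at 0 in every segment, so a cached token and a
    # current token can end up with the same index.
    return list(range(mem_len)) + list(range(seg_len))


def relative_distances(mem_len, seg_len):
    # Row q holds the distance from query q (in the current segment) to each
    # key (cached tokens first, then current tokens). Keys that come after
    # the query are hidden by the causal mask and marked as None.
    total = mem_len + seg_len
    rows = []
    for q in range(seg_len):
        q_abs = mem_len + q  # index of the query in the concatenated sequence
        rows.append([q_abs - k if k <= q_abs else None for k in range(total)])
    return rows


print(absolute_positions(3, 3))      # [0, 1, 2, 0, 1, 2]
for row in relative_distances(3, 3):
    print(row)
# [3, 2, 1, 0, None, None]
# [4, 3, 2, 1, 0, None]
# [5, 4, 3, 2, 1, 0]
```

With absolute indices the two segments are indistinguishable, while every (query, key) pair gets an unambiguous distance under the relative scheme.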

3. Vanilla Transformer language models: brief introduction and drawbacks
3.1 Brief introduction

3.2 Drawbacks
3.2.1 Training with the Vanilla Model (problems in the vanilla training phase)
1. Tokens at the beginning of each segment do not have sufficient context for proper optimization.
2. Limited by a fixed-length context
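Both training problems follow from cutting the corpus into independent fixed-length segments. A minimal sketch, with hypothetical helper names and a toy token list:

```python
def split_into_segments(tokens, seg_len):
    # Cut the corpus into consecutive fixed-length segments.
    return [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]


def context_sizes(segments):
    # For each token, count the preceding tokens it can attend to.
    # Attention never crosses a segment boundary, so the count restarts at 0.
    return [list(range(len(seg))) for seg in segments]


tokens = ["the", "cat", "sat", "on", "the", "mat", "today", "again"]
segments = split_into_segments(tokens, seg_len=4)
print(context_sizes(segments))  # [[0, 1, 2, 3], [0, 1, 2, 3]]
```

The first token of every segment sees no context at all, and no token ever sees more than `seg_len - 1` predecessors.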

3.2.2 Evaluation with the Vanilla Model
1. Longest context limited by segment length.
2. Very expensive due to recomputation.
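The recomputation cost can be estimated with a rough count. This sketch assumes the evaluation window slides by one position per prediction and is re-encoded from scratch each time; the cost unit ("token passes") and the function names are illustrative assumptions, not measurements.

```python
def vanilla_eval_token_passes(num_tokens, seg_len):
    # At step t the window holds min(t, seg_len) tokens, and all of them
    # are encoded again from scratch.
    return sum(min(t, seg_len) for t in range(1, num_tokens + 1))


def cached_eval_token_passes(num_tokens):
    # If past hidden states are kept, every token is encoded exactly once.
    return num_tokens


print(vanilla_eval_token_passes(1000, 100))  # 95050
print(cached_eval_token_passes(1000))        # 1000
```

Under these assumptions the sliding-window scheme does roughly `seg_len` times more work for the same number of predictions.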

3.2.3. Temporal Incoherence


4. Transformer-XL contributions and main improvements
4.1 Introduction to Transformer-XL
4.1.1 Training with Transformer-XL


4.1.2 Evaluation with Transformer-XL

4.1.3. Solution: Relative Positional Encodings
Benefits:
1. Allows recurrence mechanism
2. Better generalization
-> WordLM: train with memory length 150, evaluate with 640
-> CharLM: train with memory length 680, evaluate with 3800
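One way to see why a longer memory can be used at evaluation time: if the encoding of a distance is a fixed sinusoidal function of that distance, it does not depend on the memory length. The sketch below assumes such a parameter-free sinusoidal form; `d_model=8` and `seg_len=4` are hypothetical toy values, and only the memory lengths 150 and 640 come from the notes above.

```python
import math


def relative_encoding(distance, d_model):
    # Sinusoidal vector for one relative distance; no learned parameters.
    inv_freq = [1.0 / (10000 ** (i / d_model)) for i in range(0, d_model, 2)]
    angles = [distance * f for f in inv_freq]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


def encoding_table(mem_len, seg_len, d_model):
    # One vector per distance between a query and any visible key.
    return {d: relative_encoding(d, d_model) for d in range(mem_len + seg_len)}


train_table = encoding_table(mem_len=150, seg_len=4, d_model=8)
eval_table = encoding_table(mem_len=640, seg_len=4, d_model=8)

# Every distance seen in training keeps the same vector at evaluation;
# the longer memory only adds vectors for larger distances.
print(all(train_table[d] == eval_table[d] for d in train_table))  # True
print(len(train_table), len(eval_table))                          # 154 644
```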





4.1 Segment-level Recurrence
Cache the hidden states from the previous segment (the last batch) and reuse them.
Analogous to truncated BPTT for RNNs: the last hidden state is passed to the next segment as its initial hidden state.
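The caching idea can be sketched in plain Python. This is a toy under stated assumptions, not the authors' implementation: `toy_attention` stands in for a real attention layer by averaging over everything it may see, and all function names are hypothetical. In a real model the cached states would also be detached so that no gradient flows into them.

```python
def toy_attention(current, memory):
    # Each position averages over the cached states plus the current
    # states up to and including itself (causal).
    extended = memory + current
    out = []
    for i in range(len(current)):
        visible = extended[: len(memory) + i + 1]
        out.append(sum(visible) / len(visible))
    return out


def update_memory(old, new, mem_len):
    # Keep only the most recent mem_len states.
    combined = old + new
    return combined[-mem_len:] if mem_len > 0 else []


def forward_segment(segment, mems, mem_len):
    hidden = list(segment)
    new_mems = []
    for layer_mem in mems:
        # Cache this layer's input so the next segment can attend to it.
        new_mems.append(update_memory(layer_mem, hidden, mem_len))
        hidden = toy_attention(hidden, layer_mem)
    return hidden, new_mems


def run(segments, num_layers, mem_len):
    mems = [[] for _ in range(num_layers)]
    outputs = []
    for seg in segments:
        out, mems = forward_segment(seg, mems, mem_len)
        outputs.append(out)
    return outputs


segments = [[1.0, 0.0], [0.0, 0.0]]
with_memory = run(segments, num_layers=1, mem_len=2)
without_memory = run(segments, num_layers=1, mem_len=0)
print(with_memory[1][1])   # 0.25
print(without_memory[1])   # [0.0, 0.0]
```

With `mem_len=0` the signal from the first segment never reaches the second one; with a cache, the second segment's outputs still reflect it.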
4.2. Keep temporal information coherent

5. Summary

References
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context (video on bilibili)
Copyright notice
This article was written by [茫茫人海一粒沙]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/keeppractice/article/details/119790553