Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Paper Summary)
2022-04-23 07:04:00 【茫茫人海一粒沙】
Paper: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Code:Transformer-XL code
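1. Paper Overview
Transformer-XL = Transformer Extra Long
2. What Is a Transformer
XLNet uses the Segment Recurrence Mechanism and Relative Positional Encoding from Transformer-XL as optimizations.
The segment recurrence mechanism stores the information output for the previous text segment and reuses it when computing the current segment, which gives the model access to a wider context.
Once information from the previous segment is brought in, two tokens may carry the same positional information. For example, the first word of the previous segment and the first word of the current segment have identical positions. Transformer-XL therefore adopts Relative Positional Encoding: instead of fixed positions, it encodes the relative positions between words.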
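The position collision described above can be illustrated with a small sketch. It assumes that absolute indices restart at 0 in every segment and that the cache holds exactly one previous segment; the function names are hypothetical and exist only for this illustration.

```python
def absolute_positions(mem_len, seg_len):
    """Per-segment absolute indices: every segment restarts at 0 (assumption)."""
    return list(range(mem_len)) + list(range(seg_len))


def relative_distances(mem_len, seg_len):
    """Distance from each query in the current segment to every key it can see.

    Keys are the cached tokens followed by the current tokens, so a query at
    current-segment offset i sits at key index mem_len + i.
    """
    rows = []
    for i in range(seg_len):
        q = mem_len + i
        # Causal masking: the query only sees keys at or before its own index.
        rows.append([q - k for k in range(q + 1)])
    return rows


abs_pos = absolute_positions(mem_len=3, seg_len=3)
rel = relative_distances(mem_len=3, seg_len=3)
print(abs_pos)  # cached and current tokens share the indices 0, 1, 2
print(rel[0])   # distances from the first current token to each visible key
```

With absolute indices, the first cached token and the first current token both get index 0, so the model cannot tell them apart. The relative distances are distinct for every visible key, which is why relative encoding pairs naturally with the cache.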
3. Vanilla Transformer Language Models: Brief Introduction and Drawbacks
3.1 Brief Introduction
3.2 Drawbacks
3.2.1 Training with the Vanilla Model (training-phase problems)
1. Tokens at the beginning of each segment do not have sufficient context for proper optimization.
2. Limited by a fixed-length context
3.2.2 Evaluation with the Vanilla Model
1. Longest context limited by segment length.
2. Very expensive due to recomputation.
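To make the recomputation cost concrete, here is a rough counting sketch. It assumes the vanilla model shifts its window by one token per prediction and re-encodes the whole window from scratch each time, while a cached scheme encodes each token only once. The function names are illustrative, and counting token encodings is a simplification that ignores the cost of attention itself.

```python
def vanilla_eval_token_passes(num_tokens, seg_len):
    """Token encodings when every prediction re-encodes its full window."""
    total = 0
    for t in range(1, num_tokens + 1):
        # The window ending at token t holds at most seg_len tokens.
        total += min(t, seg_len)
    return total


def cached_eval_token_passes(num_tokens):
    """Token encodings when earlier hidden states are reused from a cache."""
    return num_tokens


vanilla_cost = vanilla_eval_token_passes(num_tokens=8, seg_len=4)
cached_cost = cached_eval_token_passes(num_tokens=8)
print(vanilla_cost, cached_cost)
```

Under these assumptions the vanilla cost grows roughly as the number of tokens times the segment length, whereas the cached cost grows linearly with the number of tokens.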
3.2.3. Temporal Incoherence
4. Transformer-XL Contributions and Main Improvements
4.1 Introduction to Transformer-XL
4.1.1 Training with Transformer-XL
4.1.2 Evaluation with Transformer-XL
4.1.3. Solution: Relative Positional Encodings
Benefits:
1. Allows recurrence mechanism
2. Better generalization
-> WordLM: Train with memory length 150, evaluate with 640
-> CharLM: Train with memory length 680, evaluate with 3800
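For reference, the relative attention score between query position i and key position j can be written as below. This is the four-term decomposition as I recall it from the Transformer-XL paper, so please check the notation against the original. Here E_{x_i} is the token embedding, R_{i-j} is a sinusoidal encoding of the relative distance, W_{k,E} and W_{k,R} are separate key projections for content and position, and u and v are learned vectors that replace the absolute-position query terms.

```latex
A^{\mathrm{rel}}_{i,j}
  = \underbrace{E_{x_i}^{\top} W_q^{\top} W_{k,E}\, E_{x_j}}_{\text{(a) content-content}}
  + \underbrace{E_{x_i}^{\top} W_q^{\top} W_{k,R}\, R_{i-j}}_{\text{(b) content-position}}
  + \underbrace{u^{\top} W_{k,E}\, E_{x_j}}_{\text{(c) global content bias}}
  + \underbrace{v^{\top} W_{k,R}\, R_{i-j}}_{\text{(d) global position bias}}
```

Because the score depends on positions only through i - j, a memory longer than the one used in training can be attached at evaluation time, which is consistent with the generalization benefit listed above.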
4.1 Segment-level Recurrence
Cache and reuse hidden states from last batch
Analogous to truncated BPTT for RNNs: pass the last hidden state to the next segment as the initial hidden state.
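The caching idea can be sketched in plain Python. This is a toy under loud assumptions, not the authors' implementation: `toy_layer` replaces real self-attention with a uniform causal average, hidden states are scalars, the cache keeps only the single previous segment, and the stop-gradient on the cache is not modeled because there is no training here. All names are hypothetical.

```python
def toy_layer(prev_layer_mem, prev_layer_cur):
    """Stand-in for self-attention: uniform causal average over memory + current."""
    context = prev_layer_mem + prev_layer_cur
    m = len(prev_layer_mem)
    out = []
    for i in range(len(prev_layer_cur)):
        visible = context[: m + i + 1]  # all cached states plus current positions <= i
        out.append(sum(visible) / len(visible))
    return out


def forward_segment(segment, mems, num_layers):
    """Run one segment; return the top-layer output and the new per-layer cache."""
    hidden = list(segment)
    new_mems = []
    for layer in range(num_layers):
        # Cache the input to this layer so the next segment can attend to it.
        new_mems.append(hidden)
        hidden = toy_layer(mems[layer], hidden)
    return hidden, new_mems


def run(segments, num_layers=2):
    """Process segments in order, carrying the cache from one to the next."""
    mems = [[] for _ in range(num_layers)]  # empty cache before the first segment
    outputs = []
    for seg in segments:
        out, mems = forward_segment(seg, mems, num_layers)
        outputs.append(out)
    return outputs


outputs = run([[1.0, 2.0], [3.0, 4.0]], num_layers=1)
print(outputs)
```

In this toy run, the second segment's outputs average over the cached values 1.0 and 2.0 as well as its own tokens, so context flows across the segment boundary without re-encoding the first segment.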
4.2. Keep Temporal Information Coherent
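5. Summary
References
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (video, bilibili)
Copyright Notice
This article was written by [茫茫人海一粒沙]. Please include a link to the original when reposting. Thank you.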
https://blog.csdn.net/keeppractice/article/details/119790553