A text generation method based on Brownian motion: LANGUAGE MODELING VIA STOCHASTIC PROCESSES
2022-08-09 06:54:00 【just do it now】
Title: LANGUAGE MODELING VIA STOCHASTIC PROCESSES
Paper: https://arxiv.org/abs/2203.11370
Code: https://github.com/rosewang2008/language_modeling_via_stochastic_processes
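This paper can be seen as another pioneering work for open-domain dialogue. Open-domain dialogue is stateless: unlike task-oriented dialogue, there is no dialogue state to track, so the generation process is hard to control. The paper proposes a text generation method based on the Brownian bridge: it encodes the dialogue process into a latent space and constructs a Brownian bridge there to control how the dialogue unfolds.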
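1. Encoder based on the Brownian bridge process
First, an encoder is trained to map sentences from the text space X to a latent space Z, written f: X -> Z. The trajectory traced in the latent space should follow a Brownian bridge. That is, the start and end points of the trajectory are fixed at z0 and zT, and at time t (0 <= t <= T) the latent zt follows the normal distribution N((1 - t/T) z0 + (t/T) zT, t(T - t)/T).
The mean is a linear interpolation between z0 and zT that moves with time. The variance can be understood intuitively: it is small near the start and end points and larger in the middle (shown on the left of the figure in the original post).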
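The mean and variance described above can be checked numerically. The following is a minimal sketch in plain Python, assuming a unit diffusion coefficient and the same scalar variance t(T - t)/T in every latent dimension; the function names `bridge_marginal` and `sample_zt` are hypothetical and are not taken from the paper's code.

```python
import random


def bridge_marginal(z0, zT, t, T):
    """Mean vector and scalar variance of z_t for a Brownian bridge
    pinned at z0 (time 0) and zT (time T)."""
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    alpha = t / T
    # Mean: linear interpolation between the two endpoints.
    mean = [(1 - alpha) * a + alpha * b for a, b in zip(z0, zT)]
    # Variance: zero at both endpoints, largest at t = T / 2.
    var = t * (T - t) / T
    return mean, var


def sample_zt(z0, zT, t, T, rng):
    """Draw one sample of z_t (isotropic Gaussian, an assumption of this sketch)."""
    mean, var = bridge_marginal(z0, zT, t, T)
    std = var ** 0.5
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0, 2, 5, 8, 10):
        mean, var = bridge_marginal([0.0, 0.0], [2.0, 4.0], t, 10)
        print(t, mean, var, sample_zt([0.0, 0.0], [2.0, 4.0], t, 10, rng))
```

Printing the variance for several values of t shows the "narrow at the ends, wide in the middle" shape that the text describes.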
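- How is an encoder trained to fit this process?
For a sequence of sentences, three sentences (x0, xt, xT) are sampled at random, in order but not necessarily adjacent. The optimization goal is for f(xt) to follow the Brownian bridge trajectory pinned at f(x0) and f(xT). The objective is a contrastive loss.
It can be read as follows: make (x0, xt, xT) closer to the Brownian bridge process, while making the other, negative triplets deviate more from it. Here the function d(.) measures the distance from the encoder's prediction to the Brownian bridge trajectory.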
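One way to make this contrastive idea concrete is sketched below. It is an illustration under stated assumptions, not the authors' implementation: d(.) is taken to be the negative squared distance between the encoded middle sentence and the bridge mean, scaled by twice the bridge variance, and the negatives are assumed to differ from the positive triplet only in the middle latent. The names `bridge_distance` and `contrastive_loss` are hypothetical, and the latents are plain lists standing in for encoder outputs.

```python
import math


def bridge_distance(z0, zt, zT, t, T):
    """Assumed form of d(.): negative squared distance of zt from the
    bridge mean, divided by 2 * variance. Requires 0 < t < T."""
    alpha = t / T
    var = t * (T - t) / T
    sq = sum(
        (c - ((1 - alpha) * a + alpha * b)) ** 2
        for a, c, b in zip(z0, zt, zT)
    )
    return -sq / (2 * var)


def contrastive_loss(z0, zT, t, T, zt_pos, zt_negs):
    """Softmax-style contrastive loss: the positive middle latent should
    score higher under d(.) than every negative middle latent."""
    scores = [bridge_distance(z0, z, zT, t, T) for z in [zt_pos] + list(zt_negs)]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[0]


if __name__ == "__main__":
    z0, zT = [0.0, 0.0], [2.0, 2.0]
    on_bridge, off_bridge = [1.0, 1.0], [5.0, 5.0]
    print(contrastive_loss(z0, zT, 1, 2, on_bridge, [off_bridge]))  # near zero
    print(contrastive_loss(z0, zT, 1, 2, off_bridge, [on_bridge]))  # large
```

The loss is small when the true middle sentence lies close to the interpolated bridge mean and the negatives lie far from it, which matches the intuition given in the text.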
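2. Generation with a decoder fine-tuned from GPT
Once the encoder above has produced a Brownian bridge trajectory in the latent space, a decoder is needed to generate the corresponding text conditioned on that trajectory. This decoder is trained by fine-tuning GPT2 directly.
At inference time, given a start point z0 and an end point zT in the latent space, it is enough to randomly sample a Brownian bridge between the two points and then generate text with the decoder above (illustrated by a figure in the original post).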
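The sampling step at inference can be sketched as follows. This is a minimal illustration, assuming unit time steps, T discrete steps, and independent noise per latent dimension; `sample_bridge_trajectory` is a hypothetical name and the GPT2 decoding itself is not shown. Each step uses the standard conditional of a Brownian bridge: given z_t and the pinned end point zT, z_{t+1} has mean z_t + (zT - z_t) / (T - t) and variance (T - t - 1) / (T - t).

```python
import random


def sample_bridge_trajectory(z0, zT, T, rng):
    """Return [z_0, z_1, ..., z_T], a discretized Brownian bridge that
    starts at z0 and is pinned to end exactly at zT."""
    if T < 1:
        raise ValueError("T must be at least 1")
    z = list(z0)
    traj = [list(z0)]
    for t in range(T - 1):
        remaining = T - t
        # Standard deviation of one unit step, conditioned on the end point.
        std = ((remaining - 1) / remaining) ** 0.5
        z = [
            zi + (ze - zi) / remaining + std * rng.gauss(0.0, 1.0)
            for zi, ze in zip(z, zT)
        ]
        traj.append(z)
    # The final latent is the pinned end point itself.
    traj.append(list(zT))
    return traj


if __name__ == "__main__":
    rng = random.Random(0)
    for step, z in enumerate(sample_bridge_trajectory([0.0, 0.0], [3.0, -1.0], 8, rng)):
        print(step, [round(v, 3) for v in z])
```

In the setup described in the text, each latent in the sampled trajectory would then be handed to the fine-tuned decoder as the condition for generating one sentence.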
3. Results
RQ1: Can Time Control model local text dynamics?
Section 4.1 investigates this question using a sentence ordering prediction task: given two sentences from the same document, we evaluate whether different models can predict their original order.
RQ2: Can Time Control generate locally coherent text?
Section 4.2 investigates this question using the text-infilling task: given prefix and suffix, we evaluate how well different models can fill in between.
RQ3: Can Time Control model global text dynamics?
Section 4.3 investigates this question on text generation for Wikipedia city articles by examining the length of generated sections.
RQ4: Can Time Control generate long coherent documents?
Section 4.4 investigates this question on forced long text generation: we evaluate how well models preserve global text statistics (such as typical section orders and lengths) when forced to extrapolate during generation.