R-Drop: a more powerful Dropout regularization method
2022-04-23 11:11:00 【Graduate students are not late】
Preface: this article studies and quotes the Microsoft Research AI Headlines article and the blog post "R-Drop: a more powerful Dropout" by "Mr meatball".
1 Background introduction
1.1 Dropout technology
Deep neural networks (DNNs) have recently achieved remarkable success in many fields. When training these large-scale DNN models, regularization techniques such as L2 Normalization, Batch Normalization, and Dropout are indispensable modules: they prevent the model from over-fitting and improve its generalization ability. Among them, Dropout only needs to discard a portion of the neurons during training, and it has become the most widely used regularization technique.
Regularization techniques: L2 Normalization, Batch Normalization, Dropout, and so on.
- However, the way Dropout operates means that the trained model is, to some extent, a constrained combination of multiple sub-models. R-Drop was proposed for this reason.

1.2 Regularized Dropout (R-Drop) technology
Microsoft Research Asia and Soochow University proposed a further regularization method built on Dropout: Regularized Dropout, or R-Drop for short.
Unlike traditional constraint methods that act on neurons (Dropout) or on model parameters (DropConnect), R-Drop acts on the output layer of the model and compensates for the inconsistency of Dropout between training and testing.
Simply put, in each mini-batch every data sample passes twice through the same model with Dropout, and R-Drop then uses KL-divergence to constrain the two outputs to be consistent. In other words, R-Drop constrains the output consistency of the two random sub-models produced by Dropout. [This is a little obscure; we will come back to it later.]
Compared with traditional training, R-Drop simply adds one KL-divergence loss term and changes nothing else. Although the method looks simple, experiments show that on 5 commonly used tasks covering NLP and CV (18 datasets in total) R-Drop achieved very good results, and it reached the best results at the time on tasks such as machine translation and text summarization.
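For a single training pair $(x, y)$, the added loss term can be written out roughly as follows. This is a sketch rather than a quotation: $P_1^w$ and $P_2^w$ stand for the two Dropout passes through the same model with parameters $w$, and the weighting coefficient $\alpha$ and the factor $\frac{1}{2}$ are notational assumptions that this article does not state.

```latex
\begin{align}
\mathcal{L}_{\mathrm{NLL}} &= -\log P_1^w(y \mid x) - \log P_2^w(y \mid x) \\
\mathcal{L}_{\mathrm{KL}}  &= \frac{1}{2}\Big[ D_{\mathrm{KL}}\big(P_1^w(y \mid x) \,\|\, P_2^w(y \mid x)\big)
                              + D_{\mathrm{KL}}\big(P_2^w(y \mid x) \,\|\, P_1^w(y \mid x)\big) \Big] \\
\mathcal{L} &= \mathcal{L}_{\mathrm{NLL}} + \alpha \, \mathcal{L}_{\mathrm{KL}}
\end{align}
```

With $\alpha = 0$ this reduces to ordinary training on two Dropout passes, which matches the description of R-Drop as one extra loss term and no other change.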
2 Introduction to the principle of R-Drop
Because DNNs over-fit very easily, Dropout randomly discards some of the neurons in each layer to avoid over-fitting during training.
Because the discarding is random, each discard produces a different sub-model, so to some extent the model trained with Dropout is a constrained combination of multiple sub-models.
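The random discarding described above can be illustrated with a small standard-library sketch. It assumes the common "inverted Dropout" convention (survivors are rescaled by 1/(1-p)), which this article does not specify; the activations, weights, and function names are hypothetical.

```python
import random

def dropout(values, p, rng):
    """Inverted Dropout: zero each value with probability p, rescale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def linear(inputs, weights, bias):
    """A single fully connected output unit."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

rng = random.Random(0)
hidden = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]   # hypothetical hidden activations
weights = [0.4, 0.1, -0.7, 0.3, 0.9, -0.2]  # hypothetical weights
bias = 0.05

# Two passes over the same activations draw two independent masks.
# Each mask corresponds to one sub-model, and the masks usually differ.
pass_1 = dropout(hidden, p=0.5, rng=rng)
pass_2 = dropout(hidden, p=0.5, rng=rng)
out_1 = linear(pass_1, weights, bias)
out_2 = linear(pass_2, weights, bias)
print(out_1, out_2)
```

Whenever the two masks differ, the same input gives two different outputs, which is the sense in which each Dropout pass is its own sub-model.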
Building on the randomness that Dropout introduces into the network, the researchers proposed R-Drop to further apply a regularization constraint to the output predictions of the (sub-model) networks.
R-Drop adds one more KL-divergence loss term.
- The overall framework is as follows:
[Figure: overall framework of R-Drop]

2.1 Model explanation
- The model proposed in the paper is the one on the right in the figure above. As you can see, the same data goes through two computations with random Dropout, which yields two different sub-models. In the figure, $P_1(y|x)$ and $P_2(y|x)$ are the output distributions of the two sub-models. (There are two sub-models because Dropout discards different neurons in each pass.)
- Therefore, for the same input data, the distributions $P_1^w(y|x)$ and $P_2^w(y|x)$ are different.
- Therefore, in each training step, R-Drop tries to minimize the bidirectional Kullback-Leibler (KL) divergence between these two output distributions of the same sample, which regularizes the model's predictions.
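The bidirectional KL term in the last step can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the logits are hypothetical, and the weight `alpha` and the averaging of the two KL directions are assumptions that the text above does not state.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def r_drop_loss(logits_1, logits_2, target, alpha=1.0):
    """Negative log-likelihood of both passes plus alpha times the bidirectional KL term."""
    p1, p2 = softmax(logits_1), softmax(logits_2)
    nll = -math.log(p1[target]) - math.log(p2[target])
    kl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    return nll + alpha * kl, kl

# Hypothetical logits from two Dropout passes over the same sample, true class 0.
loss, kl = r_drop_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], target=0)
# If both passes agree exactly, the KL term vanishes.
same_loss, same_kl = r_drop_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], target=0)
print(loss, kl, same_kl)
```

Averaging the two directions makes the penalty symmetric, so neither sub-model is treated as the reference distribution.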
3 Summary
- Dropout has an obvious problem: the inconsistency between prediction and training, which is also quite intuitive.
- R-Drop adds a regularization term to strengthen the model's robustness to Dropout, so that the outputs of the model under different Dropout masks are basically the same. This reduces the inconsistency and promotes the similarity between "model averaging" and "weight averaging", so that simply turning Dropout off has an effect equivalent to fusing many Dropout models, which improves the model's final performance.
- Overall, R-Drop is simple in form and gives excellent results, and it is a very innovative idea. Why R-Drop achieves such good results, and how to guide a model to find a suitable R-Drop, are still worth exploring.
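The terms "model averaging" and "weight averaging" in the summary can be made concrete with a toy example. The sketch below rests on assumptions not stated in the article: a single linear layer with softmax, inverted Dropout on the inputs with rate 0.5, and hypothetical numbers. It only shows that the two averages differ for an untrained model; it does not train anything with R-Drop.

```python
import itertools
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(inputs, weights):
    """Linear layer (one weight row per class) followed by softmax."""
    logits = [sum(x * w for x, w in zip(inputs, row)) for row in weights]
    return softmax(logits)

inputs = [1.0, -2.0, 0.5]                       # hypothetical features
weights = [[0.8, 0.1, -0.4], [-0.3, 0.6, 0.9]]  # hypothetical 2-class weights
keep = 0.5                                      # keep probability (Dropout rate 0.5)

# "Model averaging": average the predictions of every sub-model.
# With keep = 0.5 all masks are equally likely, so a plain mean is exact.
masks = list(itertools.product([0.0, 1.0], repeat=len(inputs)))
model_avg = [0.0, 0.0]
for mask in masks:
    dropped = [x * m / keep for x, m in zip(inputs, mask)]  # inverted Dropout
    probs = predict(dropped, weights)
    model_avg = [a + p / len(masks) for a, p in zip(model_avg, probs)]

# "Weight averaging": a single pass with Dropout switched off.
weight_avg = predict(inputs, weights)

gap = max(abs(a - b) for a, b in zip(model_avg, weight_avg))
print(model_avg, weight_avg, gap)
```

For these values the two predictions do not coincide. The summary's claim is that the R-Drop regularizer pushes sub-model outputs together, which would shrink this gap.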
Copyright notice
This article was written by [Graduate students are not late]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231106345777.html