R-Drop: a more powerful Dropout regularization method
2022-04-23 11:11:00 【Graduate students are not late】
Foreword: this article studies and quotes the Microsoft Research AI headlines post and the blog post "R-Drop: more powerful Dropout" by "Mr meatball".
1 Background introduction
1.1 Dropout technology
Deep neural networks (DNNs) have recently achieved remarkable success in many fields. When training these large-scale DNN models, regularization techniques such as L2 regularization, Batch Normalization, and Dropout are indispensable: they prevent the model from over-fitting and improve its generalization ability. Among them, Dropout only needs to discard a portion of the neurons during training, and it has become the most widely used regularization technique.
Regularization techniques: L2 regularization, Batch Normalization, Dropout, and so on.
- To some extent, however, Dropout turns the trained model into a constrained combination of multiple sub-models. This is the motivation for R-Drop.
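To make the "sub-models" idea concrete, here is a minimal sketch of Dropout on one layer's activations. It assumes the common "inverted Dropout" variant (survivors are rescaled by 1/(1-p)); the function name `dropout` and the values in `hidden` are made up for illustration.

```python
import random

def dropout(values, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale the survivors."""
    if not training or p == 0.0:
        return list(values)  # inference: every unit is kept, nothing is rescaled
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
hidden = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]  # hypothetical activations of one layer

# Each training pass draws its own random mask, i.e. its own sub-model.
sub_model_a = dropout(hidden, p=0.5, rng=rng)
sub_model_b = dropout(hidden, p=0.5, rng=rng)

# At test time Dropout is switched off and the full model is used.
full_model = dropout(hidden, p=0.5, rng=rng, training=False)
```

The two training passes see the same activations but use independently drawn masks, while the test-time pass uses all units. This gap between training and testing is what the rest of the post is about.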
1.2 Regularized Dropout (R-Drop) technology
Microsoft Research Asia and Soochow University proposed a further regularization method built on Dropout: Regularized Dropout, abbreviated R-Drop.
Unlike traditional constraints that act on neurons (Dropout) or on model parameters (DropConnect), R-Drop acts on the output layer of the model and makes up for the inconsistency of Dropout between training and testing.
Simply put, in every mini-batch each data sample passes twice through the same model with Dropout enabled, and R-Drop then uses KL-divergence to constrain the two outputs to be consistent. In other words, R-Drop constrains the output consistency of the two sub-models that Dropout randomly produces. [This part is a little obscure; we will come back to it later.]
Compared with the traditional training method, R-Drop simply adds one KL-divergence loss term and changes nothing else. Although the method looks simple, experiments show that on 5 common tasks covering NLP and CV (18 datasets in total) R-Drop achieves very good results, and it reached the best results at the time on tasks such as machine translation and text summarization.
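The training objective described above can be sketched for a single sample as follows. This is a minimal sketch, not the authors' code: it assumes the form given in the R-Drop paper (the negative log-likelihood of both passes plus a weighted, averaged bidirectional KL term). The weight `alpha`, the function names, and the logits are assumptions for illustration and do not appear in this post.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def kl_divergence(log_p, log_q):
    """KL(p || q) for two distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def r_drop_loss(logits1, logits2, label, alpha=1.0):
    """Loss for one sample from two Dropout passes of the same model."""
    log_p1 = log_softmax(logits1)
    log_p2 = log_softmax(logits2)
    # Ordinary negative log-likelihood, once per pass.
    nll = -(log_p1[label] + log_p2[label])
    # Bidirectional KL: average of KL(p1 || p2) and KL(p2 || p1).
    kl = 0.5 * (kl_divergence(log_p1, log_p2) + kl_divergence(log_p2, log_p1))
    return nll + alpha * kl

# Hypothetical logits of the same sample from two passes with different masks.
logits_pass1 = [2.0, 0.5, -1.0]
logits_pass2 = [1.5, 0.9, -0.7]
loss = r_drop_loss(logits_pass1, logits_pass2, label=0, alpha=1.0)
```

When both passes give identical outputs the KL term is zero and only the usual classification loss remains, so the extra term penalizes disagreement between sub-models and nothing else.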
2 Introduction to the principle of R-Drop
Because DNNs over-fit very easily, we use Dropout to randomly discard some neurons in each layer and so avoid over-fitting during training.
Because the discarding is random, the sub-model produced after each discard is different. To some extent, therefore, Dropout makes the trained model a constrained combination of multiple sub-models.
Building on the randomness that Dropout brings to the network, the researchers proposed R-Drop to place a further regularization constraint on the output predictions of the (sub-model) networks.
In short: add one more KL-divergence loss term.
- The overall framework is as follows (figure not reproduced here):
2.1 Model explanation
- The model proposed in the paper is the one on the right of the figure above. As you can see, the same data goes through two forward passes, each with random Dropout, which yields two different sub-models. In the figure, $P_1(y|x)$ and $P_2(y|x)$ are the output distributions of the two sub-models. (There are two sub-models because Dropout discards different neurons in each pass.)
- Therefore, for the same input data, the distributions $P_1^w(y|x)$ and $P_2^w(y|x)$ are different.
- Therefore, in each training step, R-Drop tries to minimize the bidirectional Kullback-Leibler (KL) divergence between these two output distributions of the same sample, which regularizes the model's predictions.
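Written out with the symbols above, the bidirectional KL term for one sample looks like this. The factor $\frac{1}{2}$ (averaging the two directions) is taken from the R-Drop paper and is an assumption here, since this post does not spell it out. $D_{KL}$ is the standard KL divergence over the labels $y$:

$$
D_{KL}(P \,\|\, Q) = \sum_{y} P(y) \log \frac{P(y)}{Q(y)}
$$

$$
\mathcal{L}_{KL} = \frac{1}{2} \Big( D_{KL}\big(P_1^w(y|x) \,\|\, P_2^w(y|x)\big) + D_{KL}\big(P_2^w(y|x) \,\|\, P_1^w(y|x)\big) \Big)
$$

Both directions are used because KL divergence is not symmetric: $D_{KL}(P \,\|\, Q)$ and $D_{KL}(Q \,\|\, P)$ generally differ, and neither sub-model is the "reference" for the other.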
3 Summary
- Dropout has an obvious problem: the inconsistency between prediction and training, which is also quite intuitive.
- R-Drop adds a regularization term to make the model more robust to Dropout, so that the model's outputs under different Dropout masks are basically the same. This reduces the inconsistency and brings "model averaging" and "weight averaging" closer together, so that simply turning Dropout off gives an effect equivalent to fusing the results of many Dropout sub-models, which improves the final performance of the model.
- In general, R-Drop is simple in form, excellent in results, and a very innovative idea. But why R-Drop achieves such excellent results, and how to guide the model to find the right R-Drop, are still worth exploring.
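The difference between "model averaging" and "weight averaging" mentioned above can be seen on a toy example. The sketch below assumes a hypothetical one-layer softmax model with Dropout on its 4 inputs and a drop probability of 0.5, so all 16 masks are equally likely and can be enumerated exactly. All names and numbers are made up for illustration.

```python
import math
from itertools import product

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, weights, mask, keep):
    """Softmax output of a one-layer model whose inputs are masked by inverted dropout."""
    dropped = [xi * mi / keep for xi, mi in zip(x, mask)]
    logits = [sum(w * d for w, d in zip(row, dropped)) for row in weights]
    return softmax(logits)

# Hypothetical toy model: 4 input units, 3 classes.
x = [1.0, -2.0, 0.5, 3.0]
weights = [[0.4, -0.3, 0.8, 0.1],
           [-0.6, 0.2, 0.1, 0.5],
           [0.3, 0.7, -0.5, -0.2]]
keep = 0.5  # with p = 0.5 every mask has the same probability

# "Model averaging": average the predictions of all 2^4 sub-models.
masks = list(product([0, 1], repeat=len(x)))
model_avg = [sum(col) / len(masks)
             for col in zip(*(predict(x, weights, m, keep) for m in masks))]

# "Weight averaging": one pass with Dropout turned off, as done at test time.
weight_avg = predict(x, weights, [1] * len(x), keep=1.0)

# Largest per-class difference between the two predictions.
gap = max(abs(a - b) for a, b in zip(model_avg, weight_avg))
```

For this untrained toy model `gap` is non-zero: the ensemble of sub-models and the single Dropout-off model disagree. The argument in the summary is that the R-Drop regularizer pushes the sub-model outputs together during training, which should shrink this kind of gap; the sketch only illustrates the gap itself, not the effect of training.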
Copyright notice
This article was written by [Graduate students are not late]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231106345777.html