On cliff-like growth of the loss function during training
2022-04-23 14:13:00 【All the names I thought of were used】
Reasons for cliff-like growth of the loss function during training
(One) Since the loss function is non-convex, setting the learning rate too large can make the optimizer jump out of the region around a good solution. An optimization algorithm that adapts the learning rate dynamically, such as Adam, helps here (see the sketch after these two points).
(Two) When gradient explosion occurs during training, the loss will also grow like a cliff.
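A minimal sketch of point (One), assuming PyTorch; the toy model, data, and hyperparameters are made up for illustration. Adam adapts the per-parameter step size, and a ReduceLROnPlateau scheduler additionally shrinks the learning rate when the loss stops improving, making it less likely that a single large step jumps out of a good region of the non-convex loss surface.

```python
import torch
import torch.nn as nn

# toy model and data, only to make the sketch runnable
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(256, 10)
y = torch.randint(0, 2, (256,))

# Adam adapts the learning rate per parameter; the scheduler lowers the
# base learning rate when the loss plateaus
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=3)

for epoch in range(20):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # shrink the lr when the loss stops improving
```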
Causes of exploding or vanishing gradients
Root cause: with an improper training setup, the gradients in the front layers vanish, so the model compensates with large adjustments to the parameters of the last few layers; the gradients there grow too large and eventually explode.
Note: gradients tend to vanish in the first few layers, while gradient explosions occur in the later layers. The sketch below shows one way to check this layer by layer.
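A rough diagnostic sketch, assuming PyTorch; the deep tanh stack and the dummy loss are made up just to make the effect visible. Printing the gradient norm of every layer after one backward pass shows where the problem sits: tiny norms in the front layers point to vanishing gradients, very large norms in the back layers point to explosion.

```python
import torch
import torch.nn as nn

# a deep stack of tanh layers, where saturation makes the effect easy to see
layers = [nn.Linear(64, 64) for _ in range(10)]
model = nn.Sequential(*sum([[l, nn.Tanh()] for l in layers], []))

x = torch.randn(32, 64)
loss = model(x).pow(2).mean()   # dummy loss, only to get gradients
loss.backward()

# gradient norm of each Linear layer, front to back
for i, layer in enumerate(layers):
    print(f"layer {i:2d}  grad norm = {layer.weight.grad.norm().item():.3e}")
```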
Solutions
Note: gradient clipping (truncating the gradient norm) is also an important means of preventing gradient explosion, as sketched below.
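A minimal sketch of gradient clipping, assuming PyTorch; the stand-in model and data are hypothetical. clip_grad_norm_ rescales the gradients in place whenever their global norm exceeds the chosen threshold, so one bad batch cannot produce a huge parameter update.

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

model = nn.Linear(10, 2)                                # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# rescale the gradients in place if their global norm exceeds max_norm,
# then take the optimizer step on the clipped gradients
clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```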
1. Initialize the parameters from an appropriate distribution. Weights w that are too large easily lead to exploding or vanishing gradients; for example, with the tanh activation, a large w makes z large, and the derivative of tanh then tends to 0 (the combined sketch after this list shows one common initialization).
2. Use batch normalization (BN) so that the inputs and outputs of each layer keep roughly the same distribution; this slows down the onset of vanishing gradients and also helps avoid gradient explosion (very practical).
3. By the chain rule, when the weights w are small, the derivatives flowing through the activations a are also small, and the smaller those derivatives, the smaller the gradients of the previous layers' weights w. L1/L2 regularization keeps the weights small and can therefore slow down gradient explosion (see the regularization sketch after this list).
4. Choose an appropriate activation function; ReLU is the most commonly used.
5. When the performance is similar, the simpler the neural network, the less prone it is to gradient explosion and vanishing gradients.
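The sketch below pulls together points 1, 2 and 4: a small MLP with ReLU activations, BatchNorm between layers, and scale-aware (Kaiming) initialization. It assumes PyTorch, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# small MLP: ReLU activations with BatchNorm between the layers,
# so each layer keeps seeing inputs with a similar distribution
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Kaiming (He) initialization is matched to ReLU; Xavier would be the
# usual choice for tanh/sigmoid layers
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

out = model(torch.randn(32, 64))   # quick forward pass to check shapes
```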
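Point 3 in code, again assuming PyTorch with made-up hyperparameters: L2 regularization via the optimizer's weight_decay, plus an L1 penalty added to the loss by hand, both of which keep the weights small.

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 2)   # stand-in model

# L2 regularization: weight_decay adds a penalty proportional to ||w||^2
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
loss = nn.functional.cross_entropy(model(x), y)

# L1 regularization added by hand: penalize the absolute size of the weights
l1_lambda = 1e-5
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())

loss.backward()
optimizer.step()
```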
Copyright notice
This article was written by [All the names I thought of were used]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231404419479.html