On the problem of cliff-like growth of the loss function during training
2022-04-23 14:13:00 【All the names I thought of were used】
Why the loss function can grow cliff-like during training
(1) Because the loss function is non-convex, a learning rate that is set too large can make the optimizer jump out of the region around a good solution. Choosing an optimization algorithm that adapts the learning rate dynamically, such as Adam, mitigates this.
(2) A gradient explosion during training will also make the loss jump up cliff-like.
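The learning-rate point can be made concrete with a toy comparison (an illustrative sketch, not code from the article): on the quadratic f(w) = w^2, plain gradient descent with an oversized learning rate multiplies w by a factor of magnitude greater than 1 each step and diverges, while Adam's per-parameter normalization keeps its step size roughly bounded by the learning rate.

```python
import math

# Plain SGD on f(w) = w^2 (gradient 2w) with lr = 1.1:
# each update multiplies w by (1 - 2 * 1.1) = -1.2, so |w| grows without bound.
w_sgd = 1.0
for _ in range(50):
    w_sgd -= 1.1 * (2 * w_sgd)

# Adam on the same problem with the same oversized lr:
# the step is m_hat / sqrt(v_hat), so its magnitude stays on the order of lr.
w_adam, m, v = 1.0, 0.0, 0.0
beta1, beta2, lr, eps = 0.9, 0.999, 1.1, 1e-8
for t in range(1, 51):
    g = 2 * w_adam
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(abs(w_sgd))   # diverged: astronomically large
print(abs(w_adam))  # still bounded near the optimum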
Why gradients explode or vanish
Root cause: with a poorly chosen training setup, the gradients in the early layers vanish; the model then compensates by making large adjustments to the parameters of the later layers, whose gradients become too large, eventually producing a gradient explosion.
Note: gradients tend to vanish in the early layers and explode in the later layers.
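To see why early layers are the ones that vanish, recall that backpropagation multiplies one local derivative per layer. The following is an illustrative pure-Python sketch (not from the original article): the sigmoid's derivative is at most 0.25, so a chain of them shrinks the gradient exponentially with depth.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Backprop through a stack of sigmoid activations multiplies the gradient
# by sigmoid'(x) = s * (1 - s) <= 0.25 once per layer (chain rule).
grad = 1.0
per_layer = []
for layer in range(20):
    s = sigmoid(0.5)              # a fixed pre-activation, for illustration
    grad *= s * (1 - s)           # local derivative of this layer
    per_layer.append(grad)

print(per_layer[0])   # after one layer: about 0.235
print(per_layer[-1])  # after 20 layers: vanishingly small
```

The same multiplication works in reverse: if the per-layer factor (driven by large weights) exceeds 1, the product grows exponentially instead, which is the explosion case.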
Solutions
Note: gradient clipping (truncation) is also an important means of preventing gradient explosion.
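A common form of clipping is by global L2 norm: if the gradient vector's norm exceeds a threshold, rescale the whole vector so its norm equals the threshold, preserving its direction. A minimal pure-Python sketch (illustrative, not the article's code):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale the whole gradient vector if its L2 norm exceeds max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return grads

g = [3.0, 4.0]                        # L2 norm 5
clipped = clip_by_global_norm(g, 1.0) # direction preserved, norm capped at 1
print(clipped)
```

Clipping the global norm (rather than each component independently) keeps the update pointing in the same direction as the true gradient, only shorter.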
1. Initialize the parameters from an appropriate distribution. Weights w that are too large easily cause gradients to explode or vanish; for example, with the tanh activation, a large w makes the pre-activation z large, where tanh's derivative tends to 0.
2. Use batch normalization (BN) to keep each layer's input and output distributions as similar as possible; this slows the onset of vanishing gradients and also helps avoid exploding gradients (very effective in practice).
3. By the chain rule, when w is small, the derivative flowing through the activation a is also smaller, and the smaller that derivative, the smaller the gradient of the previous layer's w. So L1 or L2 regularization, which keeps weights small, can mitigate gradient explosion.
4. Choose an appropriate activation function; ReLU is the most commonly used.
5. When the performance is similar, a simpler network is less prone to gradient explosion and vanishing gradients.
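Point 1 above can be demonstrated numerically. The sketch below (illustrative, not from the article) passes the same inputs through one tanh layer twice: once with oversized weights, which push the pre-activations deep into tanh's saturated region where the derivative is near 0, and once with Xavier-style weights of standard deviation 1/sqrt(fan_in), which keep the pre-activations in tanh's near-linear range.

```python
import math
import random

random.seed(0)

def layer_out(xs, fan_out, std):
    """One tanh layer whose weights are drawn i.i.d. from N(0, std^2)."""
    out = []
    for _ in range(fan_out):
        z = sum(random.gauss(0.0, std) * x for x in xs)  # pre-activation
        out.append(math.tanh(z))
    return out

def mean_abs(xs):
    return sum(abs(v) for v in xs) / len(xs)

n = 256
x = [random.gauss(0.0, 1.0) for _ in range(n)]

# std = 1.0: pre-activations have std ~sqrt(n) = 16, so tanh saturates at +/-1
# and tanh' = 1 - tanh^2 is ~0 -- the vanishing-gradient regime.
saturated = layer_out(x, n, std=1.0)
# Xavier-style std = 1/sqrt(fan_in): pre-activations have std ~1.
xavier = layer_out(x, n, std=1.0 / math.sqrt(n))

print(mean_abs(saturated))  # close to 1: saturated
print(mean_abs(xavier))     # clearly below 1: still in the responsive range
```

The closer the mean |tanh| output is to 1, the closer the layer's derivative is to 0 everywhere, which is exactly how oversized initial weights kill the gradient.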
Copyright notice
This article was written by [All the names I thought of were used]. When reposting, please include a link to the original. Thanks!
https://yzsam.com/2022/04/202204231404419479.html