Forward Propagation and Back Propagation
2022-08-08 09:33:00 [ZhangJiQun & MXP]
Why use gradient descent to optimize neural network parameters?
Backpropagation (used to optimize neural network parameters): based on the error computed by the loss function, backpropagation guides the update and optimization of the deep network's parameters.
The reason for using backpropagation: a deep network is a stack of many linear and nonlinear layers, and each nonlinear layer can be regarded as a nonlinear function (the nonlinearity comes from the nonlinear activation function), so the entire deep network can be viewed as a composite nonlinear multivariate function.
Our ultimate goal is for this composite function to map inputs to outputs well, which means finding parameters that minimize the loss function. The problem thus becomes one of finding the minimum of a function, and the natural mathematical tool for that is gradient descent.
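To make this concrete, here is a minimal sketch of gradient descent in plain NumPy, fitting a single linear map under a squared-error loss. The data, learning rate, and step count are illustrative choices, not part of the original article:

```python
import numpy as np

# Minimal sketch: gradient descent on a single linear map with
# squared-error loss. All names and constants are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                           # targets from a known linear map

w = np.zeros(3)                          # parameters to optimize
lr = 0.1                                 # learning rate
for step in range(200):
    y_hat = X @ w                        # forward pass
    loss = np.mean((y_hat - y) ** 2)     # squared-error loss
    grad = 2 * X.T @ (y_hat - y) / len(y)  # gradient of loss w.r.t. w
    w -= lr * grad                       # gradient descent update

print(w)  # approaches true_w as the loss is driven toward its minimum
```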
What are the effects of vanishing and exploding gradients?
Take, for example, a simple neural network with three hidden layers. When gradients vanish, the hidden layer close to the output layer still receives a relatively normal gradient, so its weights update normally. The closer a hidden layer is to the input, however, the more the vanishing gradient slows or even stalls its weight updates. During training, the network then behaves like a shallow network in which only the later layers actually learn.
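The effect is easy to observe. The sketch below (using PyTorch; the depth and layer sizes are arbitrary choices for illustration) stacks sigmoid layers and prints each linear layer's gradient norm, which shrinks sharply toward the input:

```python
import torch
import torch.nn as nn

# Illustrative sketch: a deep stack of sigmoid layers. Gradient norms
# shrink toward the input, demonstrating vanishing gradients.
torch.manual_seed(0)
layers = []
for _ in range(10):
    layers += [nn.Linear(32, 32), nn.Sigmoid()]
net = nn.Sequential(*layers)

x = torch.randn(16, 32)
loss = net(x).pow(2).mean()   # any scalar loss will do for the demo
loss.backward()

for i, m in enumerate(net):
    if isinstance(m, nn.Linear):
        # layers near the input (small i) have much smaller gradients
        print(f"layer {i:2d}  grad norm = {m.weight.grad.norm().item():.2e}")
```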
Reason
Vanishing and exploding gradients are essentially the same phenomenon: both are multiplicative effects that arise when gradients are backpropagated through too many layers.
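As a simplified illustration of this multiplicative effect, consider a scalar chain of one-unit layers (an assumption made purely for clarity, not the general multivariate case):

```latex
% Scalar sketch: layers a_i = \sigma(z_i) with z_i = w_i a_{i-1},
% input a_0, output a_n. By the chain rule,
\[
  \frac{\partial L}{\partial w_1}
  = \frac{\partial L}{\partial a_n}
    \left( \prod_{i=2}^{n} \sigma'(z_i)\, w_i \right)
    \sigma'(z_1)\, a_0 .
\]
% If |\sigma'(z_i) w_i| < 1 in most layers, the product shrinks
% exponentially with depth n (vanishing gradient); if it exceeds 1,
% the product grows exponentially (exploding gradient).
```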
Solution
There are mainly the following solutions to vanishing and exploding gradients:
Switch to activation functions such as ReLU, LeakyReLU, or ELU.
ReLU: its derivative is 1 for positive inputs (and 0 otherwise), so backpropagation no longer multiplies in factors below 1, unlike sigmoid, whose derivative never exceeds 0.25 (see the sketch after this list).
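The sketch below compares the two derivatives numerically; the mathematical facts (sigmoid's derivative peaks at 0.25, ReLU's is exactly 1 for positive inputs) are standard, and the code itself is just illustrative:

```python
import numpy as np

# Illustrative comparison of activation derivatives.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 (at x = 0), so a deep
                                  # product of these factors vanishes

def d_relu(x):
    return (x > 0).astype(float)  # exactly 1 for positive inputs

x = np.linspace(-5, 5, 11)
print("max sigmoid' :", d_sigmoid(x).max())   # <= 0.25
print("relu' at x>0 :", d_relu(x[x > 0]))     # all ones
```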