Forward Propagation and Back Propagation
2022-08-08 09:33:00 [ZhangJiQun & MXP]
Why use gradient descent to optimize neural network parameters?
Backpropagation (used to optimize neural network parameters): based on the error computed by the loss function, backpropagation guides the update and optimization of the deep network's parameters.
The reason for using backpropagation: a deep network is a stack of many linear and nonlinear layers, and each nonlinear layer can be regarded as a nonlinear function (the nonlinearity comes from the nonlinear activation function), so the entire deep network can be viewed as a composite nonlinear multivariate function.
Our ultimate goal is for this composite function to map inputs to outputs well, which means finding parameters that minimize the loss function. The problem thus becomes one of finding the minimum of a function, and the natural mathematical tool for that is gradient descent.
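To make this concrete, here is a minimal sketch of gradient descent in plain NumPy, fitting a single linear map under a squared-error loss. The data, learning rate, and step count are illustrative choices, not part of the original article:

```python
import numpy as np

# Minimal sketch: gradient descent on a single linear map with
# squared-error loss. All names and constants are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                           # targets from a known linear map

w = np.zeros(3)                          # parameters to optimize
lr = 0.1                                 # learning rate
for step in range(200):
    y_hat = X @ w                        # forward pass
    loss = np.mean((y_hat - y) ** 2)     # squared-error loss
    grad = 2 * X.T @ (y_hat - y) / len(y)  # gradient of loss w.r.t. w
    w -= lr * grad                       # gradient descent update

print(w)  # approaches true_w as the loss is driven toward its minimum
```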
What are the effects of vanishing and exploding gradients?
Take, for example, a simple neural network with three hidden layers. When gradients vanish, the hidden layer close to the output layer still receives a relatively normal gradient, so its weights update normally. The closer a hidden layer is to the input, however, the more the vanishing gradient slows or even stalls its weight updates. During training, the network then behaves like a shallow network in which only the later layers actually learn.
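The effect is easy to observe. The sketch below (using PyTorch; the depth and layer sizes are arbitrary choices for illustration) stacks sigmoid layers and prints each linear layer's gradient norm, which shrinks sharply toward the input:

```python
import torch
import torch.nn as nn

# Illustrative sketch: a deep stack of sigmoid layers. Gradient norms
# shrink toward the input, demonstrating vanishing gradients.
torch.manual_seed(0)
layers = []
for _ in range(10):
    layers += [nn.Linear(32, 32), nn.Sigmoid()]
net = nn.Sequential(*layers)

x = torch.randn(16, 32)
loss = net(x).pow(2).mean()   # any scalar loss will do for the demo
loss.backward()

for i, m in enumerate(net):
    if isinstance(m, nn.Linear):
        # layers near the input (small i) have much smaller gradients
        print(f"layer {i:2d}  grad norm = {m.weight.grad.norm().item():.2e}")
```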
Reason
Vanishing and exploding gradients are essentially the same phenomenon: both are multiplicative effects that arise when gradients are backpropagated through too many layers.
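As a simplified illustration of this multiplicative effect, consider a scalar chain of one-unit layers (an assumption made purely for clarity, not the general multivariate case):

```latex
% Scalar sketch: layers a_i = \sigma(z_i) with z_i = w_i a_{i-1},
% input a_0, output a_n. By the chain rule,
\[
  \frac{\partial L}{\partial w_1}
  = \frac{\partial L}{\partial a_n}
    \left( \prod_{i=2}^{n} \sigma'(z_i)\, w_i \right)
    \sigma'(z_1)\, a_0 .
\]
% If |\sigma'(z_i) w_i| < 1 in most layers, the product shrinks
% exponentially with depth n (vanishing gradient); if it exceeds 1,
% the product grows exponentially (exploding gradient).
```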
Solution
There are mainly the following solutions to vanishing and exploding gradients:
Switch to activation functions such as ReLU, LeakyReLU, or ELU.
ReLU: its derivative is 1 for positive inputs (and 0 otherwise), so backpropagation no longer multiplies in factors below 1, unlike sigmoid, whose derivative never exceeds 0.25 (see the sketch after this list).
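The sketch below compares the two derivatives numerically; the mathematical facts (sigmoid's derivative peaks at 0.25, ReLU's is exactly 1 for positive inputs) are standard, and the code itself is just illustrative:

```python
import numpy as np

# Illustrative comparison of activation derivatives.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 (at x = 0), so a deep
                                  # product of these factors vanishes

def d_relu(x):
    return (x > 0).astype(float)  # exactly 1 for positive inputs

x = np.linspace(-5, 5, 11)
print("max sigmoid' :", d_sigmoid(x).max())   # <= 0.25
print("relu' at x>0 :", d_relu(x[x > 0]))     # all ones
```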