
Advantages, disadvantages, and selection of activation functions

2022-04-23 15:27:00 moletop

Activation functions:

  • Significance: activation functions give the network its nonlinear modeling capacity. Without an activation function the network can only express a linear mapping, so no matter how many hidden layers are stacked, the whole network is still equivalent to a single-layer (linear) network.

  • Desired properties: 1. continuous and differentiable; 2. as simple as possible, to keep the network computationally efficient; 3. outputs in a suitable range, otherwise training efficiency and stability suffer.

  • Saturating activation functions: Sigmoid, Tanh. Non-saturating activation function: ReLU. For the output layer (classifier): softmax.

  • Choosing an activation function: in hidden layers, ReLU > Tanh > Sigmoid. In RNNs: Tanh or Sigmoid. Output layer: softmax (for classification tasks). If dead neurons occur, PReLU can be used instead; see the sketch below.
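As a concrete illustration of these selection rules, here is a minimal sketch (assuming PyTorch; the model, layer sizes, and names are illustrative and not from the original post) that uses ReLU in the hidden layer, softmax at the classification output, and swaps in PReLU when dead neurons are a concern:

```python
# Illustrative PyTorch sketch: ReLU in hidden layers, softmax at the
# classification output, PReLU as a drop-in replacement for dying ReLUs.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=784, hidden=128, n_classes=10, use_prelu=False):
        super().__init__()
        # ReLU is the usual default for hidden layers; PReLU keeps a small
        # learnable slope on the negative side so neurons cannot "die".
        act = nn.PReLU() if use_prelu else nn.ReLU()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            act,
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        logits = self.net(x)
        # Softmax turns logits into class probabilities. In practice
        # nn.CrossEntropyLoss applies it internally; it is shown here only
        # to match the "softmax at the output layer" rule above.
        return torch.softmax(logits, dim=-1)

model = MLP(use_prelu=True)          # switch to PReLU if ReLU units die
probs = model(torch.randn(4, 784))   # each row of probs sums to 1
```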

1. Sigmoid

(Sigmoid function plot)


Advantages:

<1> Sigmoid outputs values in (0, 1), which can be interpreted as probabilities, and it is monotonically increasing, which makes it easier to optimize.

<2> Sigmoid is easy to differentiate: its derivative can be written directly in terms of its output, sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)).

Disadvantages:

<1> Sigmoid converges slowly.

<2> Sigmoid saturates softly (its gradient goes to zero for large |x|), so it easily produces vanishing gradients and is not well suited to training deep networks.

<3> Sigmoid is not zero-centered: all outputs are positive, which distorts the distribution of data passed to later layers and slows convergence.
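To make the points about the easy derivative and about saturation concrete, here is a small NumPy sketch (illustrative, not part of the original post) of sigmoid and its derivative sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)):

```python
# Illustrative NumPy sketch: sigmoid, its derivative, and why it saturates.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # maximum value 0.25, reached at x = 0

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))        # outputs squashed into (0, 1)
print(sigmoid_grad(x))   # ~0 for large |x|: soft saturation -> vanishing gradients
```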

2. Tanh

(Tanh function plot)

Advantages: <1> The output is zero-centered (in the range (-1, 1)).

Disadvantages: <1> Tanh does not solve Sigmoid's vanishing gradient problem; it still saturates for large |x|.
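A short NumPy sketch (illustrative) shows both points: the output is zero-centered, but the derivative 1 - tanh(x)^2 still goes to zero in the tails:

```python
# Illustrative NumPy sketch: tanh is zero-centered but still saturates.
import numpy as np

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.tanh(x))      # outputs in (-1, 1), centered at 0
print(tanh_grad(x))    # ~0 in the saturated tails, 1 at the origin
```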

3. ReLU


(ReLU function plot)
Advantages:

<1> With SGD, ReLU converges much faster than Sigmoid and Tanh.

<2> It effectively alleviates the vanishing gradient problem.

Disadvantages:

<1> Neurons can "die" during training: if a unit's inputs stay on the negative half-axis, its output and its gradient are always 0, so it stops updating permanently.

<2> The derivative is 1 on the positive side, which alleviates vanishing gradients but makes exploding gradients (and unbounded activations) easier.
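The following NumPy sketch (illustrative) shows the gradient behaviour behind both points: exactly 1 on the positive side, exactly 0 on the negative side:

```python
# Illustrative NumPy sketch: ReLU and its (sub)gradient.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))        # negative half-axis clamped to 0
print(relu_grad(x))   # 0 on the negative side: no gradient flows back ("dead" unit)
```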

4. ReLU improvements
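Two common improvements are Leaky ReLU (a fixed small negative slope) and PReLU (a learnable negative slope, as suggested above for dead neurons): negative inputs still pass some gradient, so units cannot die. A minimal NumPy sketch, with illustrative slope values not taken from the original post:

```python
# Illustrative NumPy sketch of Leaky ReLU and PReLU.
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Fixed small slope on the negative side.
    return np.where(x > 0, x, alpha * x)

def prelu(x, a):
    # In a real network `a` is a learnable parameter; here it is just a value.
    return np.where(x > 0, x, a * x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(x))        # small but nonzero response for x < 0
print(prelu(x, a=0.25))     # PReLU with an example slope of 0.25
```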

Copyright notice
This article was written by [moletop]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231523160750.html