### Activation functions

• Significance: Activation functions give the network its nonlinear modeling ability. Without an activation function, the network can only express a linear mapping; no matter how many hidden layers are stacked, the whole network is still equivalent to a single-layer neural network.

• Required characteristics: (1) continuous and differentiable (at least almost everywhere, as with ReLU at 0); (2) as simple as possible, to keep network computation efficient; (3) an output range within an appropriate interval, otherwise training efficiency and stability suffer.

• Saturating activation functions: Sigmoid, Tanh. Non-saturating activation function: ReLU. The output layer (classifier) typically uses softmax.

• Choosing an activation function: in hidden layers, ReLU > Tanh > Sigmoid. In RNNs: Tanh, Sigmoid. Output layer: softmax (classification tasks). If neurons die during training, PReLU can be used instead.
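The claim in the first bullet, that stacked layers without an activation collapse into a single linear layer, can be checked numerically. The following is a minimal sketch in plain Python; the weights `W1`, `b1`, `W2`, `b2` and the input `x` are made-up illustrative values, not taken from any real model.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def linear(W, b, x):
    """Affine layer: W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]

def relu(v):
    return [max(0.0, vi) for vi in v]

# Hypothetical weights for a 2 -> 2 -> 1 network.
W1, b1 = [[1.0, -2.0], [0.5, 1.0]], [0.0, -1.0]
W2, b2 = [[2.0, -1.0]], [0.5]

# Collapsed single layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = [y + bi for y, bi in zip(matvec(W2, b1), b2)]

x = [1.0, 2.0]
two_layers = linear(W2, b2, linear(W1, b1, x))       # no activation
one_layer = linear(W, b, x)                          # collapsed layer
with_relu = linear(W2, b2, relu(linear(W1, b1, x)))  # ReLU in between

print(two_layers, one_layer, with_relu)  # [-7.0] [-7.0] [-1.0]
```

Without an activation the two-layer output matches the collapsed layer exactly; with ReLU in between, the collapsed layer no longer reproduces the network's output on this input.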

**1. Sigmoid**

Advantages:

<1> Sigmoid's output range is (0, 1), which matches the range of a probability, and it is monotonically increasing, which makes it easier to optimize.

<2> Sigmoid's derivative is easy to compute: it can be derived directly from the function's own output.
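As a sketch of the derivative point, the standard identity σ'(x) = σ(x)(1 − σ(x)) lets the gradient be computed from the forward output alone. The helper names below are illustrative, and the code is not hardened against overflow for very large negative inputs; it simply compares the identity with a central finite difference.

```python
import math

def sigmoid(x):
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative via the identity s'(x) = s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x, h=1e-5):
    """Central finite difference, used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Print x, sigmoid(x), analytic derivative, numeric derivative.
for x in (-4.0, 0.0, 4.0):
    print(x, sigmoid(x), sigmoid_grad(x), numeric_grad(sigmoid, x))
```

The derivative reaches its maximum of 0.25 at x = 0 and shrinks on both sides.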

Disadvantages:

<1> Sigmoid converges slowly.

<2> Because Sigmoid is soft-saturating, it easily produces vanishing gradients, which makes it unsuitable for training deep networks.

<3> Sigmoid's output is not centered on (0, 0) (it is not zero-centered), which disrupts the data distribution.
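To make the vanishing-gradient point concrete: the sigmoid derivative never exceeds 0.25, so backpropagating through a chain of sigmoid units scales the gradient by at most 0.25 per unit. The sketch below assumes all weights are 1 (a simplification, so only the activation's contribution is visible), and the depths are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def chain_gradient(x, depth):
    """Product of sigmoid derivatives along a chain of `depth` units.

    Weights are assumed to be 1 so that only the activation's effect
    on the backpropagated gradient is tracked.
    """
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(x)
        grad *= s * (1.0 - s)  # local derivative, at most 0.25
        x = s                  # the output feeds the next unit
    return grad

# The gradient shrinks at least geometrically with depth.
for depth in (1, 5, 10, 20):
    print(depth, chain_gradient(0.0, depth))
```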

**2. Tanh function**

Advantages: <1> The function's output is centered on (0, 0) (zero-centered).

Disadvantages: <1> Tanh does not solve Sigmoid's vanishing gradient problem.
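Both points can be sketched with the standard library alone: tanh is an odd function, so symmetric inputs give outputs that average to zero, but its derivative 1 − tanh²(x) still decays toward 0 as |x| grows, so it saturates much like Sigmoid. The sample inputs are arbitrary.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]
outputs = [math.tanh(x) for x in inputs]

# Symmetric inputs give outputs that average to zero (zero-centered).
mean_output = sum(outputs) / len(outputs)
print(mean_output)

# The gradient peaks at 1 for x = 0 and shrinks quickly as |x| grows.
for x in (0.0, 2.0, 5.0):
    print(x, math.tanh(x), tanh_grad(x))
```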

**3. ReLU function**

Advantages:

<1> With SGD, it converges much faster than Sigmoid and Tanh.

<2> It effectively alleviates the vanishing gradient problem.

Disadvantages:

<1> Neuron death easily occurs during training: once a neuron's input stays on the negative half-axis, its gradient is always 0, which causes irreversible death.

<2> For positive inputs the derivative is 1, which alleviates vanishing gradients but makes exploding gradients more likely.
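Neuron death can be sketched with a single hypothetical neuron: once its pre-activation is negative for every input, the ReLU gradient is 0, so a gradient-descent step leaves its parameters unchanged. The weights, inputs, upstream gradient, and learning rate below are made up for illustration.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Subgradient of ReLU: 1 for z > 0, else 0 (0 is chosen at z == 0)."""
    return 1.0 if z > 0 else 0.0

def sgd_step(w, b, x, upstream_grad, lr=0.1):
    """One SGD step for a single neuron y = relu(w * x + b)."""
    z = w * x + b
    local = relu_grad(z) * upstream_grad
    return w - lr * local * x, b - lr * local

# A neuron pushed far into the negative half-axis (hypothetical values).
w, b = -1.0, -5.0
for x in (0.5, 1.0, 2.0):  # every input gives z < 0
    w, b = sgd_step(w, b, x, upstream_grad=1.0)

print(w, b)  # parameters are unchanged: the neuron no longer learns
```

A neuron whose pre-activation is positive, by contrast, receives the full upstream gradient and keeps updating.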

**4. ReLU improvements**

