The Role, Types, and Design Principles of Neural Network Activation Functions
2022-04-22 06:56:00 【Indignant teasing】
I. Activation Functions
The activation function is a crucial component of a neuron: it is what gives the network its expressive power and learning ability.
1. An activation function should have the following properties:
(1) Continuous, differentiable (non-differentiability at a few points is allowed), and nonlinear. A differentiable activation function allows the network parameters to be learned directly with numerical optimization methods.
(2) The activation function and its derivative should be as simple as possible, to improve the computational efficiency of the network.
(3) The range of the derivative should lie in a suitable interval, neither too large nor too small, otherwise the efficiency and stability of training suffer.
2. Role: to "squash" the range of a linear function from the whole real line into the interval (0, 1), so that the output can be interpreted as a probability. (This can be viewed as a mapping that transforms the domain; some texts call such functions "squashing functions".) The basic expression is
σ(x) = 1 / (1 + exp(−x))    (taking the logistic function as the standard example)
3. Characteristics: a large derivative and consistent gradients; the second derivative is almost everywhere 0 and the first derivative is almost everywhere 1.
4. Goal: introduce nonlinearity so the network can solve nonlinear problems; without an activation function, stacked linear layers collapse into a single linear map (see the sketch below).
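A minimal numpy sketch (my addition, not from the original article; all names are illustrative) showing why the nonlinearity matters: two linear layers with no activation are exactly equivalent to one linear layer, while inserting ReLU breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two stacked linear layers collapse into the single linear map W2 @ W1.
two_linear = W2 @ (W1 @ x)
one_linear = (W2 @ W1) @ x
print(np.allclose(two_linear, one_linear))  # True: no extra expressive power

# Inserting a nonlinearity (here ReLU) breaks this equivalence.
relu = lambda z: np.maximum(0.0, z)
print(np.allclose(W2 @ relu(W1 @ x), one_linear))  # generally False
```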
5. Types:
5.1 Sigmoid (Logistic)
σ(x) = 1 / (1 + exp(−x))
Advantages: output mapped into (0, 1); monotonic and continuous; easy to differentiate.
Disadvantages: when the input falls outside roughly (−5, 5), the derivative is close to 0, which causes vanishing gradients.
Properties: 1) The output can be interpreted directly as a probability distribution, which lets the neural network combine naturally with statistical learning models. 2) It can be viewed as a soft gate (Soft Gate) that controls how much information other neurons pass on.
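A small sketch of the sigmoid and its derivative (my addition, assuming standard numpy) to illustrate the saturation just described: the derivative peaks at 0.25 and is essentially 0 outside roughly (−5, 5).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # sigma'(x) = sigma(x) * (1 - sigma(x))

for x in [-10.0, -5.0, 0.0, 5.0, 10.0]:
    print(f"x={x:+6.1f}  sigmoid={sigmoid(x):.4f}  grad={sigmoid_grad(x):.6f}")
# grad is 0.25 at x=0 but ~0.000045 at |x|=10: the vanishing-gradient regime.
```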
5.2 Tanh
tanh(x) = (exp(x) − exp(−x)) / (exp(x) + exp(−x))
Advantages: the output range is (−1, 1) and is zero-centered, which speeds up convergence.
Disadvantages: it still saturates, so gradients vanish.
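A short sketch (my addition) comparing tanh with sigmoid on a symmetric input range: tanh is zero-centered, but it saturates just like sigmoid.

```python
import numpy as np

x = np.linspace(-4, 4, 9)
sigmoid = 1.0 / (1.0 + np.exp(-x))
tanh = np.tanh(x)

print("mean of sigmoid outputs:", sigmoid.mean())  # 0.5: not zero-centered
print("mean of tanh outputs:   ", tanh.mean())     # 0.0: zero-centered
print("tanh'(4) =", 1 - np.tanh(4.0) ** 2)         # ~0.0013: saturation
```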

5.3 ReLU: a ramp function
ReLU(x) = max(0, x)
Advantages: ReLU only requires addition, multiplication, and comparison, so it is computationally more efficient.
Disadvantages: the output is not zero-centered, which introduces a bias into the next layer and hurts the efficiency of gradient descent.
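A minimal sketch (my addition) showing both points above: ReLU needs only a comparison, and its outputs are all non-negative, so their mean is positive rather than zero-centered.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # elementwise comparison, no exp needed

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = relu(x)
print(y)         # [0.  0.  0.  0.5 2. ]
print(y.mean())  # 0.5: positive mean, a bias fed into the next layer
```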
(1) Leaky ReLU: when the input x < 0, keep a small slope λ. This way an inactive neuron still has a non-zero gradient to update its parameters, so it avoids becoming permanently deactivated (a code sketch of all three variants follows after this list).
LeakyReLU(x) = max(0, x) + λ·min(0, x)
(2) PReLU: introduces a learnable parameter, so different neurons can have different slopes.
PReLU_i(x) = max(0, x) + λ_i·min(0, x),  where λ_i is learned per neuron
(3) ELU (Exponential Linear Unit): smooth for x ≤ 0 and approximately zero-centered.
ELU(x) = max(0, x) + min(0, γ·(exp(x) − 1)),  where γ is a hyperparameter
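A sketch of the three ReLU variants above (my addition; the λ and γ values are illustrative, not prescribed by the article).

```python
import numpy as np

def leaky_relu(x, lam=0.01):
    # Small slope lam for x < 0, so "dead" units still receive gradients.
    return np.where(x > 0, x, lam * x)

def prelu(x, lam):
    # Same shape as Leaky ReLU, but lam is a learnable, per-neuron parameter.
    return np.where(x > 0, x, lam * x)

def elu(x, gamma=1.0):
    # Smooth for x <= 0; saturates to -gamma for very negative inputs.
    return np.where(x > 0, x, gamma * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(leaky_relu(x))
print(prelu(x, lam=np.array([0.1, 0.2, 0.1, 0.2, 0.1])))  # per-unit slopes
print(elu(x))
```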
5.4 Softplus
Softplus(x) = log(1 + exp(x))
Advantages: like ReLU, it has one-sided suppression and a wide excitation boundary.
Disadvantages: no sparse activation (the output is never exactly 0).
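A Softplus sketch (my addition); the rewritten form below is a common numerical-stability trick for large |x|, not something from the original article.

```python
import numpy as np

def softplus(x):
    # log(1 + e^x), rewritten as max(x, 0) + log1p(exp(-|x|)) for stability
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(softplus(x))               # a smooth approximation of ReLU
print(softplus(np.array([0.0]))) # log(2) ~ 0.6931: outputs never reach 0,
                                 # hence no sparse activation
```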

5.5 Swish: a self-gated (Self-Gated) activation function
Swish(x) = x · σ(βx)
where σ(·) is the logistic function.
Swish can be viewed as a nonlinear interpolation between the linear function and ReLU, with the degree of interpolation controlled by the parameter β.
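A Swish sketch (my addition) illustrating the interpolation view: β = 0 gives the scaled linear function x/2, and large β approaches ReLU.

```python
import numpy as np

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))  # x * sigmoid(beta * x)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(swish(x, beta=0.0))   # exactly x/2: a scaled linear function
print(swish(x, beta=1.0))   # smooth and slightly non-monotonic for x < 0
print(swish(x, beta=10.0))  # ~ReLU: negative inputs squashed toward 0
```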
6. Summary
Questions to consider when designing an activation function:
(1) The activation function should be monotonic and continuous over its input domain.
(2) The output is preferably zero-centered, which makes convergence faster.
(3) The output preferably does not saturate, which avoids vanishing gradients.
(4) The function should be easy to differentiate, ideally with a first derivative of 1 and a second derivative of almost everywhere 0.
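To compare the shapes and saturation behavior of the functions discussed above, here is an optional plotting sketch (my addition; assumes numpy and matplotlib are installed).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 401)
funcs = {
    "sigmoid": 1 / (1 + np.exp(-x)),
    "tanh": np.tanh(x),
    "relu": np.maximum(0, x),
    "leaky_relu": np.where(x > 0, x, 0.01 * x),
    "elu": np.where(x > 0, x, np.exp(x) - 1),
    "softplus": np.log1p(np.exp(x)),
    "swish": x / (1 + np.exp(-x)),
}
for name, y in funcs.items():
    plt.plot(x, y, label=name)
plt.legend()
plt.grid(True)
plt.title("Activation functions")
plt.show()
```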
Copyright notice
This article was written by [Indignant teasing]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220600162619.html