Advantages, disadvantages and selection of activation function
2022-04-23 15:27:00 【moletop】
Activation function:
- Significance: an activation function gives the network nonlinear modeling capacity. Without one, the network can only express a linear mapping; no matter how many hidden layers are stacked, the whole network is equivalent to a single-layer network (see the sketch after this list).
- Desired characteristics: 1. continuous and differentiable; 2. as simple as possible, to keep network computation efficient; 3. an output range within a suitable interval, otherwise training efficiency and stability suffer.
- Saturating activation functions: Sigmoid, Tanh. Non-saturating activation functions: ReLU. Plus softmax for the output layer (classifier).
- Choosing an activation function: in hidden layers, ReLU > Tanh > Sigmoid. In RNNs: Tanh or Sigmoid. Output layer: softmax (for classification tasks). If dying neurons occur, PReLU can be used.
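To make the first point above concrete, here is a minimal NumPy sketch (the layer shapes and random values are made up purely for illustration) showing that stacked layers with no activation collapse into a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "hidden layers" with no activation function in between.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=(3,))

# Stacked linear layers ...
h = W2 @ (W1 @ x)

# ... are exactly equivalent to one linear layer with weights W2 @ W1.
W_single = W2 @ W1
print(np.allclose(h, W_single @ x))  # True
```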
1. Sigmoid function
Advantages: <1> The output range of Sigmoid is (0, 1), which matches a probability interpretation, and the function is monotonically increasing, so it is easy to optimize.
<2> Sigmoid is easy to differentiate: its derivative can be written directly in terms of its output.
Disadvantages:
<1> The Sigmoid function converges slowly.
<2> Because Sigmoid soft-saturates, it easily produces vanishing gradients, so it is not well suited to training deep networks.
<3> The output of Sigmoid is not centered at 0, which distorts the distribution of activations passed to the next layer and slows convergence.
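A short NumPy sketch of Sigmoid and its derivative (the function names are just illustrative, and the numbers in the comments are approximate); it shows both the "derivative written from the output" property and the soft saturation described above:

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)), output range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # The derivative is expressed directly from the output:
    # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), maximum 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

# Soft saturation: for large |x| the gradient is close to 0,
# which is what causes vanishing gradients in deep networks.
print(sigmoid_grad(np.array([0.0, 5.0, -5.0])))  # ~[0.25, 0.0066, 0.0066]
```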
2. Tanh function
Advantages: <1> The output is centered at 0.
Disadvantages: <1> Tanh does not solve Sigmoid's vanishing-gradient problem.
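The same kind of sketch for Tanh (illustrative helper name, approximate values in the comments): the output is zero-centered, but the gradient still saturates for large |x|:

```python
import numpy as np

def tanh_grad(x):
    # tanh'(x) = 1 - tanh(x)^2; zero-centered output, but it still saturates:
    # for large |x| the gradient approaches 0, as with Sigmoid.
    t = np.tanh(x)
    return 1.0 - t * t

print(np.tanh(np.array([-2.0, 0.0, 2.0])))  # outputs centered around 0
print(tanh_grad(np.array([0.0, 5.0])))      # ~[1.0, 0.00018]: saturation
```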
3. ReLU function
Advantages: <1> With SGD, convergence is much faster than with Sigmoid and Tanh.
<2> It effectively alleviates the vanishing-gradient problem.
Disadvantages:
<1> Neurons can die during training (inputs falling on the negative half-axis), so their gradient stays at 0 forever and they never recover.
<2> The derivative on the positive half-axis is 1, which alleviates vanishing gradients, but because the output is unbounded it can easily lead to exploding activations.
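A minimal ReLU sketch (illustrative function names) showing the trade-off above: the gradient is 1 on the positive half-axis, but exactly 0 on the negative half-axis, which is what produces dead neurons:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 1 on the positive half-axis and 0 on the negative one.
    # A neuron whose inputs stay negative receives a gradient of exactly 0
    # forever -- the "dying ReLU" problem described above.
    return (x > 0).astype(x.dtype)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 1. 1.]
```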
4. ReLU improvements
Variants such as Leaky ReLU and PReLU keep a small slope on the negative half-axis, so the gradient is never exactly 0 and neurons do not die when their inputs are negative (this is the PReLU mentioned in the selection notes above).
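A sketch of these two variants (illustrative function names; alpha = 0.01 is just a common default, and in real PReLU layers alpha is a learned parameter rather than the fixed scalar used here):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Keep a small slope alpha on the negative half-axis, so the
    # gradient is never exactly 0 and neurons cannot "die".
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU has the same form, but alpha is learned during training
    # instead of being fixed in advance.
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(x))        # [-0.03  -0.005  0.5  3.  ]
print(prelu(x, alpha=0.2))  # [-0.6  -0.1  0.5  3.  ]
```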
Copyright notice: this article was written by [moletop]; please include a link to the original when reposting.
https://yzsam.com/2022/04/202204231523160750.html