Neural Network Learning Notes 56 -- The Principle and Function of the Batch Normalization Layer
2022-04-21 21:20:00 【Bubbliiiing】
Foreword
Batch Normalization is a layer commonly used in neural networks. It solves many of the problems encountered when training deep networks. Let's study it together.

What is Batch Normalization?
Batch Normalization is a training optimization method proposed by Google. Reference paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
As the name suggests, Batch Normalization normalizes data in batches. Its function is to make the input data X conform to a common distribution, which makes training easier and faster.
In general, Batch Normalization is placed after a convolution layer, i.e., convolution + normalization + activation function, as in the sketch below.
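For illustration, here is a minimal sketch of such a block using PyTorch's built-in layers (the channel counts and input shape are placeholders chosen only for this example, not values from the original article):

    import torch
    from torch import nn

    # convolution -> batch normalization -> activation
    block = nn.Sequential(
        nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(16),    # normalizes each of the 16 channels over the batch
        nn.ReLU(inplace=True),
    )

    x = torch.randn(4, 3, 32, 32)   # a dummy batch of 4 RGB images
    y = block(x)                    # output shape: (4, 16, 32, 32)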
The calculation process can be summarized in three steps:
1. Compute the mean of the data.
2. Compute the variance of the data.
3. Normalize the data.
The Batch Normalization Calculation Formula
The calculation formula of Batch Normalization consists of the following four lines:
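Written out (these are the standard equations from the original paper, for a mini-batch B = {x_1, ..., x_m}):

    \mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i

    \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

    \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}

    y_i = \gamma \hat{x}_i + \beta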

Let's look at this formula carefully; it can be divided into four lines:
1. Compute the mean of the input data X.
2. Subtract the mean from step 1 from the input data X, square the result, and average it to obtain the variance of the input X.
3. Normalize the data using the input X, the mean from step 1, and the variance from step 2: subtract the mean from X, then divide by the square root of the variance. A small constant ε is added to the variance under the square root to avoid division by zero.
4. Introduce the variables γ and β to scale and shift the normalized data. With these two parameters, the network can learn to recover the original feature distribution.
The first three steps are the normalization process; the last step is a de-normalization (scale-and-shift) step.
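As a minimal numeric sketch of these four steps, the snippet below computes per-channel statistics on a dummy 2D feature map, the way nn.BatchNorm2d does in training mode (the tensor shapes are placeholders for illustration):

    import torch

    x = torch.randn(8, 3, 4, 4)        # dummy batch: 8 samples, 3 channels, 4x4 feature maps
    eps = 1e-5
    gamma = torch.ones(1, 3, 1, 1)      # scale, learnable in a real BN layer
    beta = torch.zeros(1, 3, 1, 1)      # shift, learnable in a real BN layer

    mean = x.mean(dim=(0, 2, 3), keepdim=True)                  # step 1: per-channel mean
    var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)   # step 2: per-channel variance
    x_hat = (x - mean) / torch.sqrt(var + eps)                  # step 3: normalize
    y = gamma * x_hat + beta                                    # step 4: scale and shift

    print(y.mean().item(), y.var(unbiased=False).item())        # close to 0 and 1 with gamma=1, beta=0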
Benefits of the BN Layer
1. It accelerates network convergence. Neural networks suffer from internal covariate shift: if the data distribution of each layer is different, convergence becomes very difficult. If the data of every layer is transformed to have zero mean and unit variance, the distributions of all layers are the same and training converges more easily.
2. It helps prevent exploding and vanishing gradients. Regarding vanishing gradients, take the Sigmoid function as an example: it squashes its output into [0, 1], and once x grows large the gradient of the sigmoid activation becomes very small, which makes training difficult. Normalizing the data keeps the gradient at relatively large values and rates of change (see the short sketch after this list). Regarding exploding gradients, during backpropagation the gradient of each layer is obtained by multiplying the incoming gradient by this layer's data; with normalization, the data mean stays near 0, so the per-layer gradients will not blow up.
3. It helps prevent overfitting. During training, BN ties all the samples in a minibatch together, so the network does not derive a particular result from any single training sample and will not push hard in that one direction. This avoids overfitting to some extent.
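To make the sigmoid point above concrete, here is a tiny sketch (the sample points are arbitrary, chosen only for illustration) showing how quickly the sigmoid gradient shrinks as x grows:

    import torch

    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); it peaks at 0.25 when x = 0
    for v in [0.0, 2.0, 5.0, 10.0]:
        x = torch.tensor(v, requires_grad=True)
        torch.sigmoid(x).backward()
        print(v, x.grad.item())   # roughly 0.25, 0.105, 0.0066, 0.000045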
Why Introduce the γ and β Variables
After the first three steps, the BN layer introduces the variables γ and β to scale and shift the normalized data.
γ and β are network parameters and are learnable.
Introducing γ and β to scale and shift gives the neural network adaptive ability: when normalization helps, the network tries not to cancel its effect; when normalization does not help, it can cancel part of that effect. In other words, it lets the network learn whether to normalize and how to trade off between the two.
BN Layer Code Implementation
The PyTorch code is simple and follows the formula above very closely, so it is worth studying. Reference (Zhihu): https://zhuanlan.zhihu.com/p/269465213:
import torch
from torch import nn

def batch_norm(is_training, x, gamma, beta, moving_mean, moving_var, eps=1e-5, momentum=0.9):
    if not is_training:
        # inference: normalize with the accumulated moving statistics
        x_hat = (x - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        # training: compute per-channel mean and variance over the batch and spatial dims
        mean = x.mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
        var = ((x - mean) ** 2).mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        # update the running statistics (detached so they stay out of the autograd graph)
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean.detach()
        moving_var = momentum * moving_var + (1.0 - momentum) * var.detach()
    # scale and shift with the learnable gamma and beta
    Y = gamma * x_hat + beta
    return Y, moving_mean, moving_var

class BatchNorm2d(nn.Module):
    def __init__(self, num_features):
        super(BatchNorm2d, self).__init__()
        shape = (1, num_features, 1, 1)
        # learnable scale and shift, one value per channel
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # running statistics used at inference time, saved with the model but not trained
        self.register_buffer('moving_mean', torch.zeros(shape))
        self.register_buffer('moving_var', torch.ones(shape))

    def forward(self, x):
        # keep the running statistics on the same device as the input
        if self.moving_mean.device != x.device:
            self.moving_mean = self.moving_mean.to(x.device)
            self.moving_var = self.moving_var.to(x.device)
        y, self.moving_mean, self.moving_var = batch_norm(self.training,
            x, self.gamma, self.beta, self.moving_mean,
            self.moving_var, eps=1e-5, momentum=0.9)
        return y
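As a quick sanity check, the hand-written layer above can be used like the built-in one; a minimal sketch (the shapes below are placeholders for illustration):

    # assumes the batch_norm function and BatchNorm2d class above are already defined
    bn = BatchNorm2d(num_features=16)
    x = torch.randn(4, 16, 8, 8)

    bn.train()
    y_train = bn(x)   # normalizes with this batch's statistics and updates moving_mean / moving_var

    bn.eval()
    y_eval = bn(x)    # normalizes with the accumulated moving statistics instead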
Copyright Notice
This article was written by [Bubbliiiing]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204212114258528.html