Neural Network Learning Notes 56 -- The Principle and Function of the Batch Normalization Layer
2022-04-21 21:20:00 【Bubbliiiing】
Foreword
Batch Normalization is a layer commonly used in neural networks. It solves many of the problems encountered when training deep networks. Let's study it together.

What is Batch Normalization?
Batch Normalization is a training optimization method proposed by Google. Reference paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
As the name suggests, Batch Normalization normalizes data in batches. Its function is to make the input data X conform to a common distribution, which makes training easier and faster.
In general, Batch Normalization is placed after a convolution layer, i.e., convolution + normalization + activation function, as in the sketch below.
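For illustration, here is a minimal sketch of such a block using PyTorch's built-in layers (the channel counts and input shape are placeholders chosen only for this example, not values from the original article):

    import torch
    from torch import nn

    # convolution -> batch normalization -> activation
    block = nn.Sequential(
        nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(16),    # normalizes each of the 16 channels over the batch
        nn.ReLU(inplace=True),
    )

    x = torch.randn(4, 3, 32, 32)   # a dummy batch of 4 RGB images
    y = block(x)                    # output shape: (4, 16, 32, 32)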
The calculation process can be summarized in three steps:
1. Compute the mean of the data.
2. Compute the variance of the data.
3. Normalize the data.
The Batch Normalization Calculation Formula
The calculation formula of Batch Normalization consists of the following four lines:
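Written out (these are the standard equations from the original paper, for a mini-batch B = {x_1, ..., x_m}):

    \mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i

    \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

    \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}

    y_i = \gamma \hat{x}_i + \beta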

Let's look at this formula carefully; it can be divided into four lines:
1. Compute the mean of the input data X.
2. Subtract the mean from step 1 from the input data X, square the result, and average it to obtain the variance of the input X.
3. Normalize the data using the input X, the mean from step 1, and the variance from step 2: subtract the mean from X, then divide by the square root of the variance. A small constant ε is added to the variance under the square root to avoid division by zero.
4. Introduce the variables γ and β to scale and shift the normalized data. With these two parameters, the network can learn to recover the original feature distribution.
The first three steps are the normalization process; the last step is a de-normalization (scale-and-shift) step.
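As a minimal numeric sketch of these four steps, the snippet below computes per-channel statistics on a dummy 2D feature map, the way nn.BatchNorm2d does in training mode (the tensor shapes are placeholders for illustration):

    import torch

    x = torch.randn(8, 3, 4, 4)        # dummy batch: 8 samples, 3 channels, 4x4 feature maps
    eps = 1e-5
    gamma = torch.ones(1, 3, 1, 1)      # scale, learnable in a real BN layer
    beta = torch.zeros(1, 3, 1, 1)      # shift, learnable in a real BN layer

    mean = x.mean(dim=(0, 2, 3), keepdim=True)                  # step 1: per-channel mean
    var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)   # step 2: per-channel variance
    x_hat = (x - mean) / torch.sqrt(var + eps)                  # step 3: normalize
    y = gamma * x_hat + beta                                    # step 4: scale and shift

    print(y.mean().item(), y.var(unbiased=False).item())        # close to 0 and 1 with gamma=1, beta=0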
Benefits of the BN Layer
1. It accelerates network convergence. Neural networks suffer from internal covariate shift: if the data distribution of each layer is different, convergence becomes very difficult. If the data of every layer is transformed to have zero mean and unit variance, the distributions of all layers are the same and training converges more easily.
2. It helps prevent exploding and vanishing gradients. Regarding vanishing gradients, take the Sigmoid function as an example: it squashes its output into [0, 1], and once x grows large the gradient of the sigmoid activation becomes very small, which makes training difficult. Normalizing the data keeps the gradient at relatively large values and rates of change (see the short sketch after this list). Regarding exploding gradients, during backpropagation the gradient of each layer is obtained by multiplying the incoming gradient by this layer's data; with normalization, the data mean stays near 0, so the per-layer gradients will not blow up.
3. It helps prevent overfitting. During training, BN ties all the samples in a minibatch together, so the network does not derive a particular result from any single training sample and will not push hard in that one direction. This avoids overfitting to some extent.
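To make the sigmoid point above concrete, here is a tiny sketch (the sample points are arbitrary, chosen only for illustration) showing how quickly the sigmoid gradient shrinks as x grows:

    import torch

    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); it peaks at 0.25 when x = 0
    for v in [0.0, 2.0, 5.0, 10.0]:
        x = torch.tensor(v, requires_grad=True)
        torch.sigmoid(x).backward()
        print(v, x.grad.item())   # roughly 0.25, 0.105, 0.0066, 0.000045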
Why Introduce the γ and β Variables
After the first three steps, the BN layer introduces the variables γ and β to scale and shift the normalized data.
γ and β are network parameters and are learnable.
Introducing γ and β to scale and shift gives the neural network adaptive ability: when normalization helps, the network tries not to cancel its effect; when normalization does not help, it can cancel part of that effect. In other words, it lets the network learn whether to normalize and how to trade off between the two.
BN Layer Code Implementation
The PyTorch code is simple and follows the formula above very closely, so it is worth studying. Reference (Zhihu): https://zhuanlan.zhihu.com/p/269465213:
import torch
from torch import nn

def batch_norm(is_training, x, gamma, beta, moving_mean, moving_var, eps=1e-5, momentum=0.9):
    if not is_training:
        # inference: normalize with the accumulated moving statistics
        x_hat = (x - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        # training: compute per-channel mean and variance over the batch and spatial dims
        mean = x.mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
        var = ((x - mean) ** 2).mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        # update the running statistics (detached so they stay out of the autograd graph)
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean.detach()
        moving_var = momentum * moving_var + (1.0 - momentum) * var.detach()
    # scale and shift with the learnable gamma and beta
    Y = gamma * x_hat + beta
    return Y, moving_mean, moving_var

class BatchNorm2d(nn.Module):
    def __init__(self, num_features):
        super(BatchNorm2d, self).__init__()
        shape = (1, num_features, 1, 1)
        # learnable scale and shift, one value per channel
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # running statistics used at inference time, saved with the model but not trained
        self.register_buffer('moving_mean', torch.zeros(shape))
        self.register_buffer('moving_var', torch.ones(shape))

    def forward(self, x):
        # keep the running statistics on the same device as the input
        if self.moving_mean.device != x.device:
            self.moving_mean = self.moving_mean.to(x.device)
            self.moving_var = self.moving_var.to(x.device)
        y, self.moving_mean, self.moving_var = batch_norm(self.training,
            x, self.gamma, self.beta, self.moving_mean,
            self.moving_var, eps=1e-5, momentum=0.9)
        return y
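As a quick sanity check, the hand-written layer above can be used like the built-in one; a minimal sketch (the shapes below are placeholders for illustration):

    # assumes the batch_norm function and BatchNorm2d class above are already defined
    bn = BatchNorm2d(num_features=16)
    x = torch.randn(4, 16, 8, 8)

    bn.train()
    y_train = bn(x)   # normalizes with this batch's statistics and updates moving_mean / moving_var

    bn.eval()
    y_eval = bn(x)    # normalizes with the accumulated moving statistics instead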
Copyright Notice
This article was written by [Bubbliiiing]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204212114258528.html