Pytorch learning record (7): skills in processing data and training models
2022-04-23 05:54:00 【Zuo Xiaotian ^ o^】
Good data preprocessing and parameter initialization can get you twice the result with half the effort, and the right training tricks can push a model toward state-of-the-art performance. In this section we cover techniques for processing data and training models.
Data preprocessing
1. Centering
Subtracting the corresponding mean from each feature dimension centers the data, giving it zero mean. For image data in particular, it is common for convenience to subtract a single mean value from all of the data.
What is zero mean?
In deep learning, the training images fed to a network model are usually preprocessed, and the most common step is zero-mean centering: the pixel value range becomes [-128, 127], centered at 0.
Effect
The benefit is that it accelerates the convergence of each layer's weight parameters during backpropagation.
It also avoids zig-zag weight updates, which further speeds up the convergence of the neural network. A minimal sketch of centering follows.
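A minimal sketch of centering with PyTorch tensors; the toy data and shapes here are assumptions for illustration, not from the original post:

```python
import torch

# Toy "dataset": 100 samples with 3 feature dimensions,
# deliberately shifted away from zero.
data = torch.randn(100, 3) * 5 + 10

# Per-feature mean, computed over the sample dimension.
mean = data.mean(dim=0)

# Centering: subtract the mean so every feature has zero mean.
centered = data - mean

print(centered.mean(dim=0))  # each entry is numerically close to 0
```

For image data, the analogous shortcut is to subtract one scalar mean (or one mean per channel) from all pixels.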
2. Standardization
After the data has been made zero-mean, we also need standardization so that the different feature dimensions have the same scale.
There are two common methods: one divides by the standard deviation, so that the new data is distributed close to a standard Gaussian; the other scales each feature dimension so that its minimum and maximum lie between -1 and 1. A sketch of both follows.
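A minimal sketch of both methods on assumed toy data:

```python
import torch

data = torch.randn(100, 3) * 5 + 10   # toy data, rows are samples

# Method 1: divide the centered data by the standard deviation
# (z-score standardization: zero mean, unit variance per feature).
standardized = (data - data.mean(dim=0)) / data.std(dim=0)

# Method 2: min-max scaling of each feature dimension to [-1, 1].
d_min = data.min(dim=0).values
d_max = data.max(dim=0).values
scaled = 2 * (data - d_min) / (d_max - d_min) - 1
```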
3. PCA (principal component analysis)
For a detailed explanation, see https://blog.csdn.net/program_developer/article/details/80632779
4. Whitening
First project the data into the eigenspace with PCA, then divide each dimension by its eigenvalue to normalize the scale. Intuitively, this transforms a multivariate Gaussian distribution into one with zero mean and an identity covariance matrix. A sketch follows.
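A minimal sketch of PCA whitening on assumed toy data; note that dividing by the square root of each eigenvalue is what yields (approximately) unit covariance:

```python
import torch

X = torch.randn(100, 3)        # toy data, rows are samples
X = X - X.mean(dim=0)          # center first

# Covariance matrix and its eigendecomposition.
cov = X.T @ X / X.shape[0]
eigvals, eigvecs = torch.linalg.eigh(cov)

# PCA: project the data onto the eigenbasis.
X_rot = X @ eigvecs

# Whitening: rescale each dimension; the small constant guards
# against division by (near-)zero eigenvalues.
X_white = X_rot / torch.sqrt(eigvals + 1e-5)
```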
When processing data in practice, centering and standardization are particularly important. We compute statistics such as the mean on the training set, and then apply those same statistics to the test and validation sets (see the sketch below). PCA and whitening, however, are not used in convolutional networks, because a convolutional network can learn to extract these features automatically without manual intervention.
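As a concrete example of reusing training-set statistics, torchvision's Normalize transform can be built once from the training data and shared across splits. The mean/std below are the well-known ImageNet values, used purely as placeholders; in practice you would compute them from your own training set:

```python
import torchvision.transforms as T

# Placeholder statistics (ImageNet); compute these on YOUR training set.
train_mean = [0.485, 0.456, 0.406]
train_std = [0.229, 0.224, 0.225]
normalize = T.Normalize(mean=train_mean, std=train_std)

# The same normalize step (training-set statistics) is reused everywhere.
train_tf = T.Compose([T.RandomHorizontalFlip(), T.ToTensor(), normalize])
test_tf = T.Compose([T.ToTensor(), normalize])
```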
Weight initialization
For a detailed explanation of initialization methods, see https://zhuanlan.zhihu.com/p/72374385
Parameter initialization
1. All-zero initialization
Initializing all parameters to 0 must not be done: every unit would then compute the same output and receive identical gradient updates, so the symmetry between units would never be broken.
2. Random initialization
We now know that the initial weights should be as close to 0 as possible, but they cannot all equal 0, so we can initialize the weights to small random values near 0, which breaks the symmetry. Common randomization strategies include Gaussian randomization and uniform randomization. Note, however, that smaller random values are not necessarily better: the smaller the initial weights, the smaller the gradients with respect to the weights in backpropagation (the gradient is proportional to the size of the parameters), so very small values greatly weaken the gradient signal and become a hidden danger in neural network training. A sketch follows.
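A minimal sketch of both strategies with PyTorch's nn.init helpers; the layer sizes and scales are arbitrary illustrations:

```python
import torch.nn as nn

layer = nn.Linear(256, 128)

# Gaussian randomization: small random weights centered at 0.
nn.init.normal_(layer.weight, mean=0.0, std=0.01)

# Uniform randomization is another common choice:
# nn.init.uniform_(layer.weight, a=-0.01, b=0.01)
```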
3. Sparse initialization
Sparse initialization sets all weights to 0, then, to break the symmetry, randomly selects some parameters and assigns them small random values. The advantage of this method is that the parameters occupy less memory, since most of them are 0, but it is rarely used in practice. A sketch follows.
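PyTorch ships a helper for this scheme, nn.init.sparse_, which fills a 2D tensor so that a given fraction of each column is zero and the rest is Gaussian. The sparsity and std values below are illustrative:

```python
import torch.nn as nn

layer = nn.Linear(256, 128)

# 90% of the entries in each column are set to 0;
# the remaining entries are drawn from N(0, 0.01).
nn.init.sparse_(layer.weight, sparsity=0.9, std=0.01)
```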
4. Initializing the bias
The bias is usually initialized to 0: since the weights have already broken the symmetry, initializing with 0 is the simplest choice. A sketch combining weight and bias initialization follows.
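A sketch combining the two conventions, small random weights plus zero biases, applied across a whole model via model.apply; the architecture is an arbitrary example:

```python
import torch.nn as nn

def init_weights(m):
    # Gaussian weights break the symmetry; biases start at 0.
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)  # applies init_weights to every submodule
```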
5. Batch normalization (Batch Normalization)
The core idea of batch normalization is that standardization is a differentiable process, which alleviates many problems caused by unreasonable initialization. We can therefore apply standardization to every layer of the neural network in both the forward and backward passes. Batch normalization is usually placed after the fully connected layer and before the nonlinearity. A sketch follows.
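A minimal sketch of that placement (fully connected, then batch norm, then nonlinearity); the layer sizes are arbitrary:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),  # after the fully connected layer...
    nn.ReLU(),            # ...and before the nonlinearity
    nn.Linear(256, 10),
)
```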
Preventing overfitting
1. Regularization
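One common way to add L2 regularization in PyTorch is the optimizer's weight_decay argument; a minimal sketch with a placeholder model and hyperparameters:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # placeholder model

# weight_decay adds an L2 penalty on the parameters at each update.
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```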
2. Dropout
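A minimal Dropout sketch; the architecture and drop probability are placeholders. Remember that model.train() enables dropout and model.eval() disables it:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations in training
    nn.Linear(256, 10),
)

model.train()  # dropout active during training
model.eval()   # dropout disabled during evaluation
```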