Introduction to standardization, regularization and normalization

2022-04-23 20:31:00 【zjt597778912】

1. Standardization

Standardization formula (z-score):

$z = \frac{X - mean}{std}$

The calculation is performed on each attribute (each column) separately.

For each column, subtract the column's mean from every value, then divide by the column's standard deviation.
The resulting values in each column have a mean of 0 and a variance of 1.

Implementation: sklearn.preprocessing.scale()

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
X = preprocessing.scale(X)
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
   [ 0.          0.          0.        ]
   [ 1.22474487  1.22474487  1.22474487]]
'''

Calculation

The standard deviation formula: subtract the mean from each number in the column, square the differences, sum them, divide by the count of numbers, and take the square root: $\sqrt{\frac{\sum (x_i - mean)^2}{n}}$
The mean of the first column is $\frac{1+4+7}{3} = 4$
The standard deviation of the first column is $\sqrt{\frac{(1-4)^2+(4-4)^2+(7-4)^2}{3}} = \sqrt{6}$
The first number in the first column becomes $\frac{1-4}{\sqrt{6}} = -1.22474487$
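
The hand calculation can also be checked with plain NumPy. This is a minimal sketch: the names `mean`, `std`, and `Z` are illustrative, and `ddof=0` is spelled out to make explicit that the population standard deviation (divide by n) is assumed, matching the formula used here.

```python
import numpy as np

X = np.linspace(1, 9, 9).reshape((3, 3))

# Column-wise mean and population standard deviation (divide by n, not n - 1)
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)

# z-score: subtract the column mean, then divide by the column standard deviation
Z = (X - mean) / std

print(Z[0, 0])         # first number in the first column
print(Z.mean(axis=0))  # each column mean should be (numerically) 0
print(Z.var(axis=0))   # each column variance should be (numerically) 1
```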

Implementation: sklearn.preprocessing.StandardScaler()

Estimators packaged in sklearn must be fitted with fit() before they are used; the fitted result then serves the subsequent API calls.

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
# fit() learns each column's mean and standard deviation; transform() applies them
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
   [ 0.          0.          0.        ]
   [ 1.22474487  1.22474487  1.22474487]]
'''

Simply put, fit() computes the inherent properties of the training set X, such as its mean, variance, maximum, and minimum.
Operations such as standardization, dimensionality reduction, and normalization are then carried out on the basis of fit().
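
To make the role of fit() concrete, the sketch below fits a StandardScaler on one array and reuses the learned statistics on another. The array `X_new` is made up for illustration. `mean_`, `var_`, and `scale_` are the attributes StandardScaler stores after fitting; the maximum and minimum mentioned above are stored by other scalers such as MinMaxScaler.

```python
import numpy as np
from sklearn import preprocessing

X = np.linspace(1, 9, 9).reshape((3, 3))
scaler = preprocessing.StandardScaler().fit(X)

# Statistics learned from X during fit()
print(scaler.mean_)   # per-column mean
print(scaler.var_)    # per-column variance
print(scaler.scale_)  # per-column standard deviation

# Hypothetical new data: it is transformed with the statistics of X,
# not with its own mean and standard deviation
X_new = np.array([[10.0, 11.0, 12.0]])
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled)
```

Fitting once and calling transform() on later data keeps the training data and any new data on the same scale.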

2. Regularization

Regularization:

Scale each sample to unit norm: compute the p-norm of each sample, then divide every number in the sample by that norm.

p-norm formula: $\|x\|_p = \sqrt[p]{\sum_i |x_i|^p}$

In general, the l1-norm (p=1) or the l2-norm (p=2) is used.
This is applied to one sample, i.e. one row of data.
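
As a quick illustration of the p-norm, the sketch below computes it for a single row with NumPy. The helper `p_norm` is a hypothetical name introduced here, not part of any library; it takes absolute values before raising to the power p, which matters when a sample contains negative entries.

```python
import numpy as np

def p_norm(x, p):
    """p-norm of a 1-D array: (sum(|x_i|^p))^(1/p)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

row = np.array([1.0, 2.0, 3.0])  # one sample, i.e. one row

l1 = p_norm(row, 1)  # sum of absolute values
l2 = p_norm(row, 2)  # Euclidean length
print(l1, l2)

# Dividing the row by its norm gives a sample with unit norm
unit_row = row / l2
print(p_norm(unit_row, 2))
```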

Implementation: sklearn.preprocessing.Normalizer()

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
# Normalizer works row by row; fit() is kept for API consistency
normalizer = preprocessing.Normalizer().fit(X)
X = normalizer.transform(X)
'''
X= [[0.26726124 0.53452248 0.80178373]
   [0.45584231 0.56980288 0.68376346]
   [0.50257071 0.57436653 0.64616234]]
'''

Calculation

The default is the l2-norm.
The 2-norm of the first row is $\sqrt{1^2+2^2+3^2} = \sqrt{14}$
The first number in the first row becomes $\frac{1}{\sqrt{14}} = 0.26726124$
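
The same row-wise calculation can be written for the whole matrix and compared with Normalizer. This is a sketch: `row_norms`, `X_l2`, and `X_l1` are illustrative names, and the `norm='l1'` call is included only to show the alternative norm mentioned earlier.

```python
import numpy as np
from sklearn import preprocessing

X = np.linspace(1, 9, 9).reshape((3, 3))

# l2 norm of every row; keepdims=True keeps shape (3, 1) so the division broadcasts per row
row_norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
X_l2 = X / row_norms

# Should match Normalizer with its default norm='l2'
print(np.allclose(X_l2, preprocessing.Normalizer().fit_transform(X)))

# With norm='l1', each row is divided by the sum of its absolute values
X_l1 = preprocessing.Normalizer(norm='l1').fit_transform(X)
print(X_l1.sum(axis=1))  # every row sums to 1 for non-negative data
```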

3. Normalization

Scale each attribute to a specified range.

A common choice is min-max standardization, also called deviation standardization.

$X' = \frac{X - min}{max - min}$
This is applied to one attribute, i.e. one column of data.

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
# fit() records each column's minimum and maximum; transform() rescales to [0, 1]
min_max_scaler = preprocessing.MinMaxScaler().fit(X)
X = min_max_scaler.transform(X)
'''
X= [[0.  0.  0. ]
   [0.5 0.5 0.5]
   [1.  1.  1. ]]
'''
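
The min-max formula can be checked by hand, and the target range can be changed through MinMaxScaler's `feature_range` parameter. A minimal sketch, with the range (-1, 1) chosen arbitrarily for illustration and `X_manual`, `X_ranged` as illustrative names:

```python
import numpy as np
from sklearn import preprocessing

X = np.linspace(1, 9, 9).reshape((3, 3))

# Manual min-max scaling, computed per column
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# Should match MinMaxScaler with the default range (0, 1)
print(np.allclose(X_manual, preprocessing.MinMaxScaler().fit_transform(X)))

# Scaling to another range, here (-1, 1)
scaler = preprocessing.MinMaxScaler(feature_range=(-1, 1)).fit(X)
X_ranged = scaler.transform(X)
print(X_ranged)
```

Note that the manual version divides by zero when a column is constant (max equals min), so it is only a sketch for data like this example.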