Introduction to standardization, regularization and normalization

2022-04-23 20:31:00 zjt597778912

1. Standardization

Standardization formula (z-score):

$X = \frac{X - mean}{std}$

The calculation is applied to each attribute (each column) separately.

For each column, subtract the column's mean from every value, then divide by the column's standard deviation.
The resulting values are centered around 0 and have a variance of 1.

Implementation: sklearn.preprocessing.scale()

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
X = preprocessing.scale(X)
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
   [ 0.          0.          0.        ]
   [ 1.22474487  1.22474487  1.22474487]]
'''

Calculation

The standard deviation is computed as follows: subtract the column mean from each number in the column, square the differences, sum them, divide by the count of numbers, and take the square root: $std = \sqrt{\frac{\sum (x_i - mean)^2}{n}}$
The mean of the first column is $\frac{1+4+7}{3} = 4$
The standard deviation of the first column is $\sqrt{\frac{(1-4)^2+(4-4)^2+(7-4)^2}{3}} = \sqrt{6}$
The first number in the first column becomes $\frac{1-4}{\sqrt{6}} = -1.22474487$
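The calculation above can be reproduced with plain NumPy as a sanity check. This is a minimal sketch rather than how sklearn is implemented internally; the names col_mean, col_std, and Z are illustrative, and ddof=0 is chosen to match the divide-by-n formula used here.

```python
import numpy as np

# Same 3x3 example matrix as in the article.
X = np.linspace(1, 9, 9).reshape((3, 3))

# Per-column mean and population standard deviation (divide by n, so ddof=0).
col_mean = X.mean(axis=0)
col_std = X.std(axis=0, ddof=0)

# z-score: subtract each column's mean, then divide by each column's std.
Z = (X - col_mean) / col_std

print(col_mean)  # per-column means
print(col_std)   # per-column standard deviations
print(Z)         # standardized matrix
```

Using ddof=1 instead would divide by n-1 and give different numbers from the ones shown in this section.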

Implementation: sklearn.preprocessing.StandardScaler()

The estimators packaged in sklearn must be fitted with fit() before they are used; the fitted state then serves the subsequent API calls.

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
   [ 0.          0.          0.        ]
   [ 1.22474487  1.22474487  1.22474487]]
'''

Simply put, fit() computes the inherent properties of the training set X, such as its mean, variance, maximum, and minimum.
Building on what fit() has computed, transform() then carries out standardization, dimensionality reduction, normalization, and similar operations.
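To make the role of fit() concrete, the sketch below fits a StandardScaler on the example matrix, inspects the statistics it stored, and then reuses them on new data. The array X_new is a hypothetical extra sample that does not appear in the article; mean_, var_, and scale_ are the attributes StandardScaler exposes after fitting.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.linspace(1, 9, 9).reshape((3, 3))
# Hypothetical new sample, used only to illustrate reusing fitted statistics.
X_new = np.array([[10.0, 11.0, 12.0]])

scaler = preprocessing.StandardScaler().fit(X_train)

print(scaler.mean_)   # per-column mean learned from X_train
print(scaler.var_)    # per-column variance learned from X_train
print(scaler.scale_)  # per-column standard deviation used for scaling

# transform() applies the statistics learned from X_train, not those of X_new.
X_new_scaled = scaler.transform(X_new)

# Manual equivalent using the stored attributes.
manual = (X_new - scaler.mean_) / scaler.scale_
print(X_new_scaled)
```

Fitting once and transforming many times keeps training data and later data on the same scale.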

2. Regularization

Regularization:

  • Scale each sample to unit norm: compute the p-norm of each sample, then divide every element of the sample by that norm

p-norm formula: $\|x\|_p = \sqrt[p]{\sum |x_i|^p}$

In general, the l1-norm (p=1) or the l2-norm (p=2) is used.
The norm is computed per sample, i.e. per row of data.

Implementation: sklearn.preprocessing.Normalizer()

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
normalizer = preprocessing.Normalizer().fit(X)
X = normalizer.transform(X)
'''
X= [[0.26726124 0.53452248 0.80178373]
   [0.45584231 0.56980288 0.68376346]
   [0.50257071 0.57436653 0.64616234]]
'''

Calculation

  • The default is the l2-norm
    The 2-norm of the first row is $\sqrt{1^2+2^2+3^2} = \sqrt{14}$
    The first number in the first row becomes $\frac{1}{\sqrt{14}} = 0.26726124$
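The row-wise calculation above can be checked by hand for both common norms. The helper normalize_rows below is a hypothetical function written for illustration, not an sklearn API, and the sketch assumes no row consists entirely of zeros (which would cause a division by zero).

```python
import numpy as np

X = np.linspace(1, 9, 9).reshape((3, 3))

def normalize_rows(X, p=2):
    """Divide every row by its p-norm (illustrative helper, not part of sklearn)."""
    # p-norm per row: (sum of |x_i|^p) ^ (1/p); keepdims lets it broadcast over columns.
    norms = np.sum(np.abs(X) ** p, axis=1, keepdims=True) ** (1.0 / p)
    return X / norms

X_l2 = normalize_rows(X, p=2)  # each row ends up with Euclidean length 1
X_l1 = normalize_rows(X, p=1)  # absolute values in each row sum to 1

print(X_l2)
print(X_l1)
```

With p=2 the first row is divided by the square root of 14, matching the worked example; with p=1 it is divided by 1+2+3=6.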

3. Normalization

  • Scale each attribute to a specified range

The common min-max scaling is also called deviation standardization.

$X = \frac{X - min}{max - min}$
This is applied per attribute, i.e. per column of data.

from sklearn import preprocessing
import numpy as np

X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
'''
min_max_scaler = preprocessing.MinMaxScaler().fit(X)
X = min_max_scaler.transform(X)
'''
X= [[0.  0.  0. ]
   [0.5 0.5 0.5]
   [1.  1.  1. ]]
'''

Calculation

The first number in the first column becomes $\frac{1-1}{7-1} = 0$
The second number in the first column becomes $\frac{4-1}{7-1} = 0.5$
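The column-wise min-max calculation above can be sketched in NumPy, including the case where the target range is something other than 0 to 1. The helper min_max_scale and its low/high parameters are hypothetical names introduced here, and the sketch assumes no column is constant (max equal to min would divide by zero).

```python
import numpy as np

X = np.linspace(1, 9, 9).reshape((3, 3))

def min_max_scale(X, low=0.0, high=1.0):
    """Scale each column into [low, high] (illustrative helper, not part of sklearn)."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # First map each column onto [0, 1] with (X - min) / (max - min) ...
    unit = (X - col_min) / (col_max - col_min)
    # ... then stretch and shift onto the requested range.
    return unit * (high - low) + low

X_unit = min_max_scale(X)             # default range 0 to 1
X_sym = min_max_scale(X, -1.0, 1.0)   # range -1 to 1

print(X_unit)
print(X_sym)
```

sklearn's MinMaxScaler offers a comparable option through its feature_range parameter.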

Copyright notice
This article was written by [zjt597778912]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204210550240962.html