Introduction to standardization, regularization and normalization
2022-04-23 20:31:00 【zjt597778912】
1. Standardization
Standardization formula (z-score):
$$X = \frac{X - mean}{std}$$
The calculation is carried out on each attribute (each column) separately.
For each column, subtract the column's mean from every value, then divide by the column's standard deviation.
As a result, the values of each column cluster around 0 and have a variance of 1.
Implementation: sklearn.preprocessing.scale()
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
X = preprocessing.scale(X)
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
[ 0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487]]
'''
Calculation
The standard deviation is computed as follows: subtract the column mean from each number in the column, square the differences, sum them, divide by the number of values, and take the square root: $std = \sqrt{\frac{\sum (x_i - mean)^2}{n}}$
The mean of the first column is $\frac{1+4+7}{3} = 4$
The standard deviation of the first column is $\sqrt{\frac{(1-4)^2+(4-4)^2+(7-4)^2}{3}} = \sqrt{6}$
The first number in the first column becomes $\frac{1-4}{\sqrt{6}} = -1.22474487$
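The hand calculation above can be checked with a short NumPy sketch. It assumes the population standard deviation (dividing by n, which is NumPy's default `ddof=0`), as in the formula given; the names `col_mean`, `col_std` and `X_manual` are illustrative only.

```python
import numpy as np
from sklearn import preprocessing

X = np.linspace(1, 9, 9).reshape((3, 3))

# Column-wise mean and population standard deviation (divide by n, ddof=0)
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

# z-score by hand; broadcasting applies the column statistics to every row
X_manual = (X - col_mean) / col_std

# Compare with the sklearn result
X_sklearn = preprocessing.scale(X)
print(np.allclose(X_manual, X_sklearn))

# After standardization each column has mean 0 and variance 1
print(X_manual.mean(axis=0), X_manual.var(axis=0))
```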
Implementation: sklearn.preprocessing.StandardScaler()
The algorithms encapsulated in sklearn must be fitted with fit() before use; the fitted result then serves the subsequent API calls.
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
[ 0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487]]
'''
Simply put, fit() obtains the inherent properties of the training set X, such as its mean, variance, maximum, and minimum.
transform() then builds on what fit() computed to carry out standardization, dimensionality reduction, normalization, and other operations.
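As a minimal sketch of this split, the statistics learned by fit() can be inspected on the fitted StandardScaler and then reused on data that was not part of the fit. The sample `X_new` is a hypothetical unseen row made up for illustration; `mean_`, `var_` and `scale_` are the attributes StandardScaler exposes after fitting.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.linspace(1, 9, 9).reshape((3, 3))
X_new = np.array([[10.0, 11.0, 12.0]])  # hypothetical unseen sample

# fit() only learns the per-column statistics of X_train
scaler = preprocessing.StandardScaler().fit(X_train)
print(scaler.mean_)   # per-column mean
print(scaler.var_)    # per-column variance
print(scaler.scale_)  # per-column standard deviation

# transform() applies the learned statistics, here to data not seen by fit()
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled)
```

Reusing the fitted scaler this way keeps new data on the same scale as the training set instead of recomputing the statistics from the new data.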
2. Regularization
Regularization:
- Scale each sample to unit norm: compute the p-norm of each sample, then divide every number in the sample by that norm
The p-norm formula: $\|x\|_p = \sqrt[p]{\sum |x_i|^p}$
The l1-norm (p=1) or the l2-norm (p=2) is generally used.
The operation is applied per sample, i.e. per row of data.
Implementation: sklearn.preprocessing.Normalizer()
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
normalizer = preprocessing.Normalizer().fit(X)
X = normalizer.transform(X)
'''
X= [[0.26726124 0.53452248 0.80178373]
[0.45584231 0.56980288 0.68376346]
[0.50257071 0.57436653 0.64616234]]
'''
Calculation
- The default is the l2-norm
The 2-norm of the first row is $\sqrt{1^2+2^2+3^2} = \sqrt{14}$
The first number in the first row becomes $\frac{1}{\sqrt{14}} = 0.26726124$
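The row-wise calculation can be reproduced by hand as a quick check. The sketch below assumes the same 3x3 matrix and compares a manual NumPy computation with Normalizer for both the l2-norm and the l1-norm; the variable names are illustrative only.

```python
import numpy as np
from sklearn import preprocessing

X = np.linspace(1, 9, 9).reshape((3, 3))

# l2-norm of each row; keepdims=True keeps a column shape for broadcasting
row_l2 = np.linalg.norm(X, ord=2, axis=1, keepdims=True)
X_l2_manual = X / row_l2
X_l2_sklearn = preprocessing.Normalizer(norm="l2").fit_transform(X)
print(np.allclose(X_l2_manual, X_l2_sklearn))

# l1-norm of each row: the sum of absolute values
row_l1 = np.abs(X).sum(axis=1, keepdims=True)
X_l1_manual = X / row_l1
X_l1_sklearn = preprocessing.Normalizer(norm="l1").fit_transform(X)
print(np.allclose(X_l1_manual, X_l1_sklearn))
```

Unlike the column-wise scalers, the statistics here come from each row on its own, so every row ends up with norm 1 regardless of the other rows.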
3. Normalization
- Scale each attribute to a specified range
The common min-max standardization is also called deviation standardization
$$X = \frac{X - min}{max - min}$$
The operation is applied per attribute, i.e. per column of data.
Implementation: sklearn.preprocessing.MinMaxScaler()
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
min_max_scaler = preprocessing.MinMaxScaler().fit(X)
X = min_max_scaler.transform(X)
'''
X= [[0. 0. 0. ]
[0.5 0.5 0.5]
[1. 1. 1. ]]
'''
Calculation
The first number in the first column becomes $\frac{1-1}{7-1} = 0$
The second number in the first column becomes $\frac{4-1}{7-1} = 0.5$
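The same arithmetic can be sketched for the whole matrix with NumPy and compared with MinMaxScaler. The second part uses the `feature_range` parameter of MinMaxScaler to scale to a range other than the default [0, 1]; the range (-1, 1) is chosen here only as an illustration.

```python
import numpy as np
from sklearn import preprocessing

X = np.linspace(1, 9, 9).reshape((3, 3))

# Column-wise minimum and maximum
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# Min-max scaling by hand, column by column via broadcasting
X_manual = (X - col_min) / (col_max - col_min)
X_sklearn = preprocessing.MinMaxScaler().fit_transform(X)
print(np.allclose(X_manual, X_sklearn))

# Scaling to another range, here [-1, 1]
X_range = preprocessing.MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
print(X_range)
```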
Copyright notice
This article was written by [zjt597778912]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204210550240962.html