Introduction to standardization, regularization and normalization
2022-04-23 20:31:00 【zjt597778912】
1. Standardization
Standardization formula (z-score):
$X = \frac{X - mean}{std}$
The calculation is performed on each attribute (each column) separately.
For each column, subtract the column's mean from every value, then divide by the column's standard deviation.
After this, the values in each column are centered around 0 and have a variance of 1.
Implementation: sklearn.preprocessing.scale()
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
X = preprocessing.scale(X)
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
[ 0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487]]
'''
Calculation
The standard deviation formula: subtract the mean from each value in the column, square the differences, sum them, divide by the number of values, and take the square root: $std = \sqrt{\frac{\sum (x_i - mean)^2}{n}}$
The mean of the first column is $\frac{1+4+7}{3} = 4$
The standard deviation of the first column is $\sqrt{\frac{(1-4)^2+(4-4)^2+(7-4)^2}{3}} = \sqrt{6}$
The first number in the first column becomes $\frac{1-4}{\sqrt{6}} = -1.22474487$
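The hand calculation above can be reproduced with plain NumPy. This is a minimal sketch, not scikit-learn's own code; it assumes the population standard deviation (dividing by n), which is what the formula above uses and what NumPy's `std` computes by default.

```python
import numpy as np

# Same 3x3 example matrix as in the text.
X = np.linspace(1, 9, 9).reshape((3, 3))

# Column-wise statistics (axis=0 means "down each column").
mean = X.mean(axis=0)
std = X.std(axis=0)  # default ddof=0: divide by n, as in the formula above

# z-score: subtract the column mean, divide by the column standard deviation.
Z = (X - mean) / std

print(std)             # standard deviation of each column
print(Z)               # standardized matrix
print(Z.mean(axis=0))  # column means after standardization
print(Z.var(axis=0))   # column variances after standardization
```

Each column of `Z` should come out with mean 0 and variance 1, matching the values shown for `preprocessing.scale(X)`.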
Implementation: sklearn.preprocessing.StandardScaler()
The algorithms encapsulated in sklearn must be fitted with fit() before use, so that the subsequent API calls have the statistics they need.
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
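# fit() learns the per-column mean and standard deviation from X
scaler = preprocessing.StandardScaler().fit(X)
# transform() applies the learned statistics; assign the result to keep it
X = scaler.transform(X)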
'''
X= [[-1.22474487 -1.22474487 -1.22474487]
[ 0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487]]
'''
Simply put, fit() computes the inherent properties of the training set X, such as its mean, variance, maximum, and minimum.
On the basis of fit(), operations such as standardization, dimensionality reduction, and normalization are then carried out.
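To make the split between learning statistics and applying them concrete, here is a small NumPy-only sketch. `SimpleStandardScaler` and the `X_new` row are hypothetical, written only for illustration; this is not scikit-learn's implementation, and it does not guard against columns whose standard deviation is zero.

```python
import numpy as np

class SimpleStandardScaler:
    """Hypothetical stand-in that illustrates the fit/transform split."""

    def fit(self, X):
        # Learn the per-column statistics from the training data only.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X):
        # Reuse the stored statistics; nothing is recomputed here.
        return (X - self.mean_) / self.std_

X_train = np.linspace(1, 9, 9).reshape((3, 3))
X_new = np.array([[10.0, 11.0, 12.0]])  # made-up unseen row

scaler = SimpleStandardScaler().fit(X_train)
train_scaled = scaler.transform(X_train)
new_scaled = scaler.transform(X_new)  # scaled with the training statistics

print(train_scaled)
print(new_scaled)
```

The point of the design is that new data is scaled with the mean and standard deviation learned from the training set, rather than with its own.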
2. Regularization
Regularization:
- Scale each sample to unit norm: compute the p-norm of each sample, then divide every value in the sample by that norm.
p-norm formula: $\|x\|_p = \sqrt[p]{\sum |x_i|^p}$
In general, the l1-norm (p=1) or the l2-norm (p=2) is used.
This is applied to one sample at a time, i.e. one row of data.
Implementation: sklearn.preprocessing.Normalizer()
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
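# Normalizer works row by row, so fit() has no statistics to learn here
normalizer = preprocessing.Normalizer().fit(X)
# transform() divides each row by its l2-norm; assign the result to keep it
X = normalizer.transform(X)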
'''
X= [[0.26726124 0.53452248 0.80178373]
[0.45584231 0.56980288 0.68376346]
[0.50257071 0.57436653 0.64616234]]
'''
Calculation
- The default norm is the l2-norm.
The 2-norm of the first row is $\sqrt{1^2+2^2+3^2} = \sqrt{14}$
The first number in the first row becomes $\frac{1}{\sqrt{14}} = 0.26726124$
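The row-wise calculation above can be checked with a short NumPy sketch. `normalize_rows` is a hypothetical helper written for this illustration, not a scikit-learn function, and it assumes no row consists entirely of zeros (which would cause a division by zero).

```python
import numpy as np

X = np.linspace(1, 9, 9).reshape((3, 3))

def normalize_rows(X, p=2):
    """Divide every row by its p-norm (hypothetical helper)."""
    # keepdims=True keeps the norms as a column so they broadcast per row.
    norms = np.sum(np.abs(X) ** p, axis=1, keepdims=True) ** (1.0 / p)
    return X / norms

X_l2 = normalize_rows(X, p=2)  # each row ends up with Euclidean length 1
X_l1 = normalize_rows(X, p=1)  # absolute values in each row sum to 1

print(X_l2)
print(X_l1)
```

With `p=2` the first row should match the `Normalizer` output shown earlier; with `p=1` each row is divided by the sum of its absolute values instead.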
3. Normalization
- Scale each attribute to a specified range
The common approach, min-max standardization, is also called deviation standardization.
$X = \frac{X - min}{max - min}$
This is applied to one attribute at a time, i.e. one column of data.
Implementation: sklearn.preprocessing.MinMaxScaler()
from sklearn import preprocessing
import numpy as np
X = np.linspace(1,9,9).reshape((3,3))
'''
X = [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
'''
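# fit() learns the per-column minimum and maximum from X
min_max_scaler = preprocessing.MinMaxScaler().fit(X)
# transform() maps each column onto [0, 1]; assign the result to keep it
X = min_max_scaler.transform(X)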
'''
X= [[0. 0. 0. ]
[0.5 0.5 0.5]
[1. 1. 1. ]]
'''
Calculation
The first number in the first column becomes $\frac{1-1}{7-1} = 0$
The second number in the first column becomes $\frac{4-1}{7-1} = 0.5$
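The same column-wise arithmetic can be sketched in NumPy, including scaling to a range other than [0, 1]. `min_max_scale` and its `low`/`high` parameters are hypothetical names chosen for this illustration (scikit-learn's `MinMaxScaler` exposes a `feature_range` parameter for the same purpose), and the sketch assumes no column is constant, since that would make max − min zero.

```python
import numpy as np

X = np.linspace(1, 9, 9).reshape((3, 3))

def min_max_scale(X, low=0.0, high=1.0):
    """Scale each column of X to the interval [low, high] (hypothetical helper)."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # First map every column onto [0, 1] ...
    unit = (X - col_min) / (col_max - col_min)
    # ... then stretch and shift it onto [low, high].
    return unit * (high - low) + low

X_01 = min_max_scale(X)                        # default range [0, 1]
X_pm1 = min_max_scale(X, low=-1.0, high=1.0)   # range [-1, 1]

print(X_01)
print(X_pm1)
```

With the default range, the result should match the `MinMaxScaler` output shown above.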
Copyright notice
This article was written by [zjt597778912]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204210550240962.html