Feature Engineering - feature preprocessing (normalization, standardization)
2022-04-22 04:47:00 【weixin_ thirty-eight million eight hundred and seventy-one thou】
Why do we normalize / standardize?
Unify feature weights
Make features dimensionless
When the units or magnitudes of features differ greatly, or the variance of one feature is several orders of magnitude larger than that of the others, that feature tends to dominate the target result, and some algorithms cannot learn from the other features.
We therefore need a way to make the data dimensionless, converting data of different scales to the same scale.
Covered here (making numerical data dimensionless):
normalization
standardization
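To make the "dominating feature" problem concrete, here is a minimal sketch using a distance-based comparison. The three samples and the meaning of their two columns are hypothetical and were chosen only for illustration; they are not taken from dating.txt.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical samples: (large-scale feature, small-scale feature)
a = (40000.0, 0.5)
b = (40100.0, 9.5)   # close to a on feature 1, very different on feature 2
c = (45000.0, 0.6)   # far from a on feature 1, almost identical on feature 2

d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# The large-scale feature decides the result: the 9-unit gap on feature 2
# adds less than 1 to d_ab, while feature 1 alone puts c far away from a.
print("distance a-b:", round(d_ab, 2))
print("distance a-c:", round(d_ac, 2))
```

Any algorithm that relies on such distances would effectively ignore the second feature until both columns are brought to a comparable scale.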

Feature preprocessing API
sklearn.preprocessing
sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), …)
MinMaxScaler.fit_transform(X)
X: data in numpy array format, shape [n_samples, n_features]
Return value: the transformed array, with the same shape as X
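The arithmetic behind min-max scaling can be sketched by hand with numpy. Each column is first mapped to [0, 1] using its own minimum and maximum, then stretched to the requested feature_range. The function name min_max_scale and the sample matrix are illustrative assumptions, not part of sklearn.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Rescale each column of X linearly into feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    span[span == 0] = 1.0            # constant column: avoid division by zero
    unit = (X - col_min) / span      # every column now lies in [0, 1]
    return unit * (hi - lo) + lo     # stretch and shift to [lo, hi]

# Hypothetical data: 3 samples, 2 features
X = np.array([[90.0, 2.0],
              [60.0, 4.0],
              [75.0, 3.0]])
scaled = min_max_scale(X, feature_range=(2, 3))
print(scaled)
```

With feature_range=(2, 3), the smallest value in each column maps to 2 and the largest maps to 3.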
Normalization demonstration
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def minmax_demo():
    """Normalization demonstration.

    :return: None
    """
    data = pd.read_csv("./dating.txt")
    print(data)
    # 1. Instantiate a transformer
    transfer = MinMaxScaler(feature_range=(2, 3))
    # 2. Call fit_transform on the three numeric feature columns
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of min-max normalization:\n", data)
    return None


if __name__ == '__main__':
    minmax_demo()


Robustness is poor
Normalization depends entirely on the maximum and minimum values, so it is particularly vulnerable to outliers and anomalous points, for example an age mistakenly recorded in the hundreds.
Its robustness is therefore poor.
Standardization:
For this reason, we need standardization.
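The sensitivity to a single bad record can be seen in a small sketch. The ages below are made up, and the extra value 450 stands in for a hypothetical data-entry error.

```python
def min_max(values):
    """Scale a list of numbers into [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [22, 25, 31, 38, 45]
ages_with_typo = ages + [450]   # hypothetical entry error

clean = min_max(ages)
skewed = min_max(ages_with_typo)

# Without the bad record the ages spread over the whole [0, 1] range.
print([round(v, 3) for v in clean])
# With it, the five genuine ages are squeezed into a narrow band near 0.
print([round(v, 3) for v in skewed])
```

One wrong maximum is enough to compress every legitimate value, which is the weakness described above.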

API
sklearn.preprocessing.StandardScaler()
After processing, the data in each column is centered around a mean of 0 with a standard deviation of 1
StandardScaler.fit_transform(X)
X: data in numpy array format, shape [n_samples, n_features]
Return value: the transformed array, with the same shape as X
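As a rough sketch of what standardization computes, each column has its mean subtracted and is then divided by its standard deviation. The helper name standardize and the sample matrix are assumptions made for illustration. The population standard deviation (ddof=0) is used here, which is also what StandardScaler uses.

```python
import numpy as np

def standardize(X):
    """Return (z-scores, column means, column variances) for a 2-D array."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    var = X.var(axis=0)                        # population variance (ddof=0)
    std = np.sqrt(var)
    safe_std = np.where(std == 0, 1.0, std)    # constant column: avoid division by zero
    return (X - mean) / safe_std, mean, var

# Hypothetical data: 3 samples, 2 features on very different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
z, mean, var = standardize(X)
print(z)
print("column means:", mean)
print("column variances:", var)
```

Both columns end up with identical z-scores even though their original scales differ by a factor of ten.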
Standardization demonstration:
import pandas as pd
from sklearn.preprocessing import StandardScaler


def stand_demo():
    """Standardization demonstration.

    :return: None
    """
    data = pd.read_csv("dating.txt")
    print(data)
    # 1. Instantiate a transformer
    transfer = StandardScaler()
    # 2. Call fit_transform on the three numeric feature columns
    data = transfer.fit_transform(data[['milage', 'Liters', 'Consumtime']])
    print("Result of standardization:\n", data)
    print("Mean of each feature column:\n", transfer.mean_)
    print("Variance of each feature column:\n", transfer.var_)
    return None


if __name__ == '__main__':
    stand_demo()

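Standardization is the generally used choice and suits large, noisy datasets: with enough samples, a few outliers have little effect on the mean and standard deviation.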
Copyright notice
This article was written by [weixin_ thirty-eight million eight hundred and seventy-one thou]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204210804321430.html