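Experiment 4: Data Preprocessing
2022-04-22 07:01:00 [学到地中海]
Introduction to Data Science: Data Preprocessing
Level 1: Introduction (A tree with deep roots does not fear the wind; a spring with deep water never runs dry)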

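Level 2: Data Cleaning (Filling the Gaps)
Task: A model's baseline is only as good as its data. This level practices common cleaning operations: filling missing values, removing outliers, and plotting the cleaned data.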
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def student():
    train = pd.read_csv('Task1/diabetes_null.csv', na_values=['#NAME?'])
    # Fill missing values: a constant for Insulin, the median or mean for the rest
    train['Insulin'] = train['Insulin'].fillna(100)
    train['SkinThickness'] = train['SkinThickness'].fillna(train['SkinThickness'].median())
    train['BloodPressure'] = train['BloodPressure'].fillna(train['BloodPressure'].median())
    train['BMI'] = train['BMI'].fillna(train['BMI'].mean())
    train['Glucose'] = train['Glucose'].fillna(train['Glucose'].mean())
    #********* Begin *********#
    # Show the oldest record (a bare expression inside a function has no effect)
    print(train.sort_values(by='Age', ascending=False)[:1])
    # Treat ages of 80 and above as outliers and drop those rows
    train = train.drop(train[train['Age'] >= 80].index)
    # Scatter plot of Age against Pregnancies
    plt.figure(figsize=(10, 10))
    plt.scatter(x=train['Age'], y=train['Pregnancies'])
    plt.savefig("Task1/img/T1.png")
    plt.show()
    #********* End *********#
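The cleaning steps above are a constant fill, a median fill, a mean fill, and then dropping rows by an age threshold. You can try them without the Task1 CSV. What follows is a minimal sketch on a small DataFrame. The helper name `clean` is hypothetical. The column names mirror the lab file, but the values are invented for illustration only.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, age_cutoff: int = 80) -> pd.DataFrame:
    """Hypothetical helper: fill missing values, then drop rows with Age >= age_cutoff."""
    df = df.copy()
    # Constant fill, as the lab does for Insulin
    df["Insulin"] = df["Insulin"].fillna(100)
    # Median fill is less sensitive to extreme values than the mean
    df["BloodPressure"] = df["BloodPressure"].fillna(df["BloodPressure"].median())
    # Mean fill, as the lab does for BMI and Glucose
    df["BMI"] = df["BMI"].fillna(df["BMI"].mean())
    # Boolean mask equivalent to df.drop(df[df["Age"] >= age_cutoff].index)
    return df[~(df["Age"] >= age_cutoff)]


# Invented toy data; NaN marks a missing entry
toy = pd.DataFrame({
    "Insulin": [np.nan, 94.0, 168.0],
    "BloodPressure": [72.0, np.nan, 40.0],
    "BMI": [33.6, 26.6, np.nan],
    "Age": [50, 31, 81],
})
cleaned = clean(toy)
print(cleaned)
```

The fill statistics are computed before the outlier rows are dropped, which matches the order used in the lab code.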
Level 3: Data Integration (All Rivers Run to the Sea)
Task: Write a program that merges two datasets.
import numpy as np
import pandas as pd


def student():
    #********* Begin *********#
    train = pd.read_csv('Task2/diabetes_null.csv', na_values=['#NAME?'])
    another_train = pd.read_csv('Task2/diabetes_zero.csv', na_values=['#NAME?'])
    # Stack the rows of both files; columns are aligned by name
    merge_data = pd.concat([train, another_train])
    # (total number of rows, number of distinct columns)
    print(merge_data.shape)
    #********* End *********#
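`pd.concat` stacks rows and aligns columns by name. A column that exists in only one input is filled with NaN for the rows of the other input. The sketch below shows this on two invented frames standing in for the two CSV files. It also uses `ignore_index=True`, which is not in the lab code; without it, the original row labels (0, 1, 0, ...) would repeat.

```python
import pandas as pd

# Invented frames standing in for diabetes_null.csv and diabetes_zero.csv
a = pd.DataFrame({"Age": [50, 31], "BMI": [33.6, 26.6]})
b = pd.DataFrame({"Age": [32], "BMI": [23.3], "Glucose": [183]})

# Row-wise stack; ignore_index=True renumbers the rows 0..n-1
merged = pd.concat([a, b], ignore_index=True)
print(merged.shape)  # (3, 3): rows add up, columns are the union of both frames
print(merged)        # Glucose is NaN for the two rows that came from `a`
```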
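Level 4: Data Transformation (Same Source, One Stream)
Task: Practice common data transformations and print normalized versions of the given data.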
import numpy as np
import pandas as pd
from sklearn.preprocessing import normalize, MinMaxScaler


def student():
    train = pd.read_csv('Task3/diabetes_null.csv', na_values=['#NAME?'])
    # Fill missing values first; normalize() raises an error on NaN input
    train['Insulin'] = train['Insulin'].fillna(100)
    train['SkinThickness'] = train['SkinThickness'].fillna(train['SkinThickness'].median())
    train['BloodPressure'] = train['BloodPressure'].fillna(train['BloodPressure'].median())
    train['BMI'] = train['BMI'].fillna(train['BMI'].mean())
    train['Glucose'] = train['Glucose'].fillna(train['Glucose'].mean())
    #********* Begin *********#
    # normalize(axis=0) divides each column by its L2 norm. This is NOT a
    # z-score; sklearn's StandardScaler computes (x - mean) / std instead.
    data_normalized = normalize(train, axis=0)
    print("L2 (unit-norm) normalization:\n", data_normalized)
    # MinMaxScaler rescales each column linearly to the range [0, 1]
    data_scaler = MinMaxScaler()
    data_scaled = data_scaler.fit_transform(train)
    print("\nMin-max normalization:\n", data_scaled)
    #********* End *********#
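`normalize`, `MinMaxScaler` and a true z-score (`StandardScaler`) produce different results. The sketch below applies all three to a small invented matrix and checks the defining property of each:

- z-score columns have mean 0 and standard deviation 1.
- Min-max columns span exactly [0, 1].
- Columns from `normalize(axis=0)` have unit L2 norm.

`StandardScaler` is not used in the lab code; it is included here as the standard z-score transformer.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, normalize

# Invented data: 3 samples, 2 features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# z-score: (x - column mean) / column std, using the population std (ddof=0)
z = StandardScaler().fit_transform(X)
# min-max: (x - column min) / (column max - column min)
mm = MinMaxScaler().fit_transform(X)
# normalize(axis=0): each column divided by its L2 norm
l2 = normalize(X, axis=0)

print(z.mean(axis=0), z.std(axis=0))      # approximately 0 and 1 per column
print(mm.min(axis=0), mm.max(axis=0))     # 0 and 1 per column
print(np.linalg.norm(l2, axis=0))         # 1 per column
```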
Introduction to Data Science: Advanced Data Preprocessing
Level 1: Data Reduction
Task: Use a histogram to show the number of cases at each age.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def student():
    train = pd.read_csv('Task1/diabetes_null.csv', na_values=['#NAME?'])
    train['Insulin'] = train['Insulin'].fillna(100)
    train['SkinThickness'] = train['SkinThickness'].fillna(train['SkinThickness'].median())
    train['BloodPressure'] = train['BloodPressure'].fillna(train['BloodPressure'].median())
    train['BMI'] = train['BMI'].fillna(train['BMI'].mean())
    train['Glucose'] = train['Glucose'].fillna(train['Glucose'].mean())
    #********* Begin *********#
    plt.figure(figsize=(10, 10))
    # Number of records at each distinct age (all rows, not only positive cases).
    # value_counts() sorts by frequency, so bars are ordered by count, not by age.
    count = train['Age'].value_counts()
    count.plot(kind='bar')
    plt.savefig("Task1/img/T1.png")
    plt.show()
    #********* End *********#
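The task asks for a histogram, but the lab code draws a bar chart of per-age record counts. The sketch below draws a binned histogram with `plt.hist` instead. Several parts of it are assumptions, not from the lab:

- The optional positives-only filter assumes a 0/1 `Outcome` label column, as in the Pima Indians diabetes data. The lab code never references this column.
- The helper name `age_histogram`, the 10-year bin edges, and the toy data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def age_histogram(df, bins=range(20, 90, 10), only_positive=False, path=None):
    """Hypothetical helper: histogram of Age; optionally only rows with Outcome == 1."""
    if only_positive:
        # ASSUMPTION: a 0/1 'Outcome' column marks positive cases
        df = df[df["Outcome"] == 1]
    fig, ax = plt.subplots(figsize=(10, 6))
    counts, edges, _ = ax.hist(df["Age"], bins=list(bins), edgecolor="black")
    ax.set_xlabel("Age")
    ax.set_ylabel("Number of cases")
    if path:
        fig.savefig(path)
    plt.close(fig)
    return counts, edges


# Invented toy data
toy = pd.DataFrame({"Age": [21, 25, 33, 41, 47, 52, 63],
                    "Outcome": [0, 1, 1, 0, 1, 1, 0]})
counts, edges = age_histogram(toy, only_positive=True)
print(counts, edges)
```

Binning ages this way is a simple form of data reduction: many distinct ages collapse into a handful of counts.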
Level 2: Data Discretization
Task: Partition the Age column into intervals.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def student():
    train = pd.read_csv('Task1/diabetes_null.csv', na_values=['#NAME?']).dropna()
    #********* Begin *********#
    # Right-closed intervals: (0, 30], (30, 50], (50, 90]
    bins = [0, 30, 50, 90]
    age_groups = pd.cut(train['Age'], bins)
    # Count records in each age group
    print(age_groups.value_counts())
    #********* End *********#
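`pd.cut` closes intervals on the right by default, so an age exactly on an edge (30 or 50) falls into the lower group. Values outside all bins become NaN. The sketch below adds readable labels and sorts the counts in interval order rather than by frequency. The label names and ages are hypothetical, not part of the lab.

```python
import pandas as pd

ages = pd.Series([21, 30, 31, 50, 51, 85])   # invented ages
bins = [0, 30, 50, 90]
labels = ["young", "middle", "senior"]       # hypothetical group names

# right=True (default): (0, 30] -> young, (30, 50] -> middle, (50, 90] -> senior
groups = pd.cut(ages, bins=bins, labels=labels)
# sort_index() lists groups in category order instead of by count
print(groups.value_counts().sort_index())
```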
Copyright notice
This article was written by [学到地中海]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/C_white_llj/article/details/124317163