Logistic regression -- case: cancer classification and prediction
2022-04-22 04:47:00 【weixin_ thirty-eight million eight hundred and seventy-one thou】
Case study: cancer classification, predicting whether a breast tumor is benign or malignant
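Data introduction:
The data is the Breast Cancer Wisconsin dataset from the UCI repository. Each row holds a sample code number, nine cell features, and a Class label (2 = benign, 4 = malignant). Missing values are marked with "?".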

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
# 1. Load the data
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape',
         'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
         'Normal Nucleoli', 'Mitoses', 'Class']
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",
                   names=names)
# print(data.head())
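# 2. Basic data processing
# 2.1 Handle missing values: the file marks them with "?"
data = data.replace(to_replace="?", value=np.nan)
data = data.dropna()
# 2.2 Separate features and target
# Features: every column except the sample ID (first) and the class label (last)
x = data.iloc[:, 1:-1]
print(x.head())
# Target: the class label
y = data["Class"]
print(y.head())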
# 2.3 Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=22, test_size=0.2)
# 3. Feature engineering (standardization)
# Fit the scaler on the training set only, then apply it to the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# 4. Machine learning (logistic regression)
estimator = LogisticRegression()
estimator.fit(x_train, y_train)
# 5. Model evaluation
# 5.1 Accuracy
ret = estimator.score(x_test, y_test)
print("Accuracy:\n", ret)
# 5.2 Predicted values
y_pre = estimator.predict(x_test)
print("Predicted values:\n", y_pre)
# 5.3 Precision and recall
ret = classification_report(y_test, y_pre, labels=(2, 4), target_names=("Benign", "Malignant"))
print(ret)
# 5.4 AUC
# Map the true labels to binary: 4 (malignant) becomes 1, 2 (benign) becomes 0
y_test = np.where(y_test > 3, 1, 0)
# y_pre holds hard class predictions (2 or 4), so this AUC reflects a single decision threshold
print(roc_auc_score(y_test, y_pre))
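The AUC in step 5.4 is computed from hard class predictions. AUC is a ranking metric, so it is usually more informative when computed from predicted probabilities; in the pipeline above that could mean passing `estimator.predict_proba(x_test)[:, 1]` to `roc_auc_score` instead of `y_pre`. The sketch below shows the difference on made-up numbers. `pairwise_auc` is a hypothetical helper written for this illustration (not a scikit-learn function), and the labels and probabilities are invented.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    y_true holds 0/1 labels. A tie between a positive and a negative
    score counts as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = malignant, 0 = benign
y_true = [0, 0, 0, 1, 1, 1]
proba = [0.10, 0.40, 0.60, 0.55, 0.80, 0.90]    # hypothetical predicted probabilities
labels = [1 if p >= 0.5 else 0 for p in proba]  # hard predictions at threshold 0.5

auc_from_proba = pairwise_auc(y_true, proba)
auc_from_labels = pairwise_auc(y_true, labels)
print("AUC from probabilities:", auc_from_proba)
print("AUC from hard labels:  ", auc_from_labels)
```

Thresholding collapses every score to 0 or 1, which creates ties between positives and negatives and discards the ordering within each group, so the two values generally differ.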
Note:
Evaluation works differently for regression and for classification.
Regression is evaluated with MSE, the mean squared error.
Classification is evaluated with score, which compares the predicted labels with the true labels (accuracy).
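The contrast between the two metrics can be sketched in a few lines of plain Python. `toy_mse` and `toy_accuracy` are hypothetical helpers written for this illustration, and all values are invented; in scikit-learn the corresponding functions are `mean_squared_error` and `accuracy_score` in `sklearn.metrics`.

```python
def toy_mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_accuracy(y_true, y_pred):
    """Accuracy: fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Regression: targets are continuous, so the size of each error matters
values_true = [3.0, 5.0, 2.5]
values_pred = [2.5, 5.0, 4.0]
mse = toy_mse(values_true, values_pred)

# Classification: targets are labels (2 = benign, 4 = malignant),
# so a prediction is simply right or wrong
classes_true = [2, 4, 2, 2]
classes_pred = [2, 4, 4, 2]
acc = toy_accuracy(classes_true, classes_pred)

print("MSE:", mse)
print("Accuracy:", acc)
```

Applying MSE to the 2/4 labels would produce a number, but it would depend on the arbitrary label encoding, which is one reason classification is scored by comparing labels instead.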
Copyright notice
This article was written by [weixin_ thirty-eight million eight hundred and seventy-one thou]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204210804320948.html