当前位置:网站首页>Machine learning III: classification prediction based on logistic regression
Machine learning III: classification prediction based on logistic regression
2022-04-23 07:19:00 【Amyniez】
1 Introduction and application of logistic regression
1.1 Introduction of logical regression
Logical regression (Logistic regression, abbreviation LR) Although it has " Return to " Two words , But logistic regression is actually a classification Model , And widely used in various fields . Although deep learning is now more popular than these traditional methods , But in fact, these traditional methods are still widely used in various fields due to their unique advantages .
And for logistic regression, and , The most prominent two points are its Simple model and The model has strong explanatory power .
Advantages and disadvantages of logistic regression model :
- advantage : Implement a simple , Easy to understand and implement ; It's not expensive to calculate , fast , Low storage resources ;
- shortcoming : Easy under fitting , Classification accuracy may not be high
1.1 Application of logistic regression
Logistic regression model is widely used in various fields , Including machine learning , Most medical fields and Social Sciences . for example , By the first Boyd The trauma and injury severity score developed by et al (TRISS) It is widely used to predict the mortality of injured patients , Use logical regression Based on the observed patient characteristics ( Age , Gender , Body mass index , Results of various blood tests, etc ) Analysis predicts the occurrence of specific diseases ( For example, diabetes , Coronary heart disease (CHD) ) The risk of . Logistic regression models are also used to predict in a given process , The possibility of system or product failure . It's also used for marketing applications , For example, predict the tendency of customers to buy products or stop ordering . In economics, it can be used to predict the possibility of a person choosing to enter the labor market , And commercial applications can be used to predict the likelihood that homeowners will default on their mortgage . Conditional random fields are extensions of logistic regression to sequential data , For natural language processing .
Logistic regression model is also the basic component of many classification algorithms , such as The classification task is based on GBDT Algorithm +LR Credit card transaction anti fraud realized by logistic regression ,CTR( Click through rate ) Estimate, etc , The advantage is that the output value naturally falls in 0 To 1 Between , And it has probabilistic significance . The model is clear , There is a corresponding theoretical basis of probability . The parameters it fits represent every feature (feature) The effect on the result . It's also a good tool for understanding data . But at the same time, because it is essentially a linear classifier , So we can't deal with the more complicated data situation . A lot of times, we will also use the logistic regression model to do some baseline tasks baseline( Basic level ).
4.1 Demo
Step1: Library function import
- notes : Seaborn Is in matplotlib On the basis of a more advanced API encapsulation , So it's easier to draw , Use... In most cases seaborn Can make a very attractive picture . Such as ,
- 1.set_style( ) Used to set the theme ;
- 2.distplot( ) by hist Enhanced Edition 、kdeplot( ) Is the density curve ;
- 3. Box figure boxplot( ): The biggest advantage is that it is not affected by outliers , It can describe the discrete distribution of data in a relatively stable way ;
- 4. Joint distribution jointplot( )
- 5. Hot spots heatmap( )
# Basic function library
import numpy as np
# Import drawing library
import matplotlib.pyplot as plt
import seaborn as sns
# Import logistic regression model function
from sklearn.linear_model import LogisticRegression
Step2: model training
# Construct data set
x_fearures = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y_label = np.array([0, 0, 0, 1, 1, 1])
# Call the logistic regression model
lr_clf = LogisticRegression()
# A logistic regression model is used to fit the constructed data set
lr_clf = lr_clf.fit(x_fearures, y_label) # The fitting equation is y=w0+w1*x1+w2*x2,w For weight ,w0 Is the offset or error value
Step3: Model parameters view
# Look at the corresponding model of w
# coef_:“ Slope ” Parameters (w, Also called weight )
print('the weight of Logistic Regression:',lr_clf.coef_)
# Look at the corresponding model of w0
# intercept_ : Offset or intercept (b)
print('the intercept(w0) of Logistic Regression:',lr_clf.intercept_)
result :
the weight of Logistic Regression: [[0.73462087 0.6947908 ]]
the intercept(w0) of Logistic Regression: [-0.03643213]
Step4: Data and Model Visualization
# Visual construction of data sample points
plt.figure()
# Scatter plot
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')
plt.show()
# Visualizing decision boundaries
plt.figure()
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')
nx, ny = 200, 100
# Set the boundaries of the image limit, That is, the maximum and the minimum
x_min, x_max = plt.xlim()
y_min, y_max = plt.ylim()
x_grid, y_grid = np.meshgrid(np.linspace(x_min, x_max, nx),np.linspace(y_min, y_max, ny))
z_proba = lr_clf.predict_proba(np.c_[x_grid.ravel(), y_grid.ravel()])
z_proba = z_proba[:, 1].reshape(x_grid.shape)
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')
plt.show()
Visual prediction of new samples :
plt.figure()
# new point 1
x_fearures_new1 = np.array([[0, -1]])
plt.scatter(x_fearures_new1[:,0],x_fearures_new1[:,1], s=50, cmap='viridis')
# annotate notes
# Annotate points with text x、y;
# By defining *arrowprops*( Arrow prop ) add to
plt.annotate(s='New point 1',xy=(0,-1),xytext=(-2,0),color='blue',arrowprops=dict(arrowstyle='-|>',connectionstyle='arc3',color='red'))
# new point 2
x_fearures_new2 = np.array([[1, 2]])
plt.scatter(x_fearures_new2[:,0],x_fearures_new2[:,1], s=50, cmap='viridis')
plt.annotate(s='New point 2',xy=(1,2),xytext=(-1.5,2.5),color='red',arrowprops=dict(arrowstyle='-|>',connectionstyle='arc3',color='red'))
# The training sample
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')
# Visualizing decision boundaries
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')
plt.show()
Step5: Model to predict
# In the training set and test set, the trained model is used to predict
y_label_new1_predict = lr_clf.predict(x_fearures_new1)
y_label_new2_predict = lr_clf.predict(x_fearures_new2)
print('The New point 1 predict class:\n',y_label_new1_predict)
print('The New point 2 predict class:\n',y_label_new2_predict)
# Since the logistic regression model is a probabilistic prediction model ( Previously described p = p(y=1|x,\theta)), So we can use predict_proba Function predicts its probability
y_label_new1_predict_proba = lr_clf.predict_proba(x_fearures_new1)
y_label_new2_predict_proba = lr_clf.predict_proba(x_fearures_new2)
print('The New point 1 predict Probability of each class:\n',y_label_new1_predict_proba)
print('The New point 2 predict Probability of each class:\n',y_label_new2_predict_proba)
The New point 1 predict class:
[0]
The New point 2 predict class:
[1]
The New point 1 predict Probability of each class:
[[0.67507358 0.32492642]]
The New point 2 predict Probability of each class:
[[0.11029117 0.88970883]]
It can be found that the trained regression model will X_new1 Forecast for category 0( The lower left side of the discriminant plane ),X_new2 Forecast for category 1( The upper right side of the discriminant plane ). The probability of the logistic regression model trained is 0.5 The discrimination surface is the blue line in the above figure .
版权声明
本文为[Amyniez]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204230610096384.html
边栏推荐
- Android暴露组件——被忽略的组件安全
- PyTorch 模型剪枝实例教程三、多参数与全局剪枝
- Easyui combobox 判断输入项是否存在于下拉列表中
- SSL/TLS应用示例
- [2021 book recommendation] learn winui 3.0
- C connection of new world Internet of things cloud platform (simple understanding version)
- Android room database quick start
- MySQL notes 2_ data sheet
- Keras如何保存、加载Keras模型
- 组件化学习(3)ARouter中的Path和Group注解
猜你喜欢
adb shell top 命令详解
[2021 book recommendation] Red Hat Certified Engineer (RHCE) Study Guide
[recommendation of new books in 2021] enterprise application development with C 9 and NET 5
ArcGIS License Server Administrator 无法启动解决方法
C connection of new world Internet of things cloud platform (simple understanding version)
PyMySQL连接数据库
Fill the network gap
1.2 preliminary pytorch neural network
【2021年新书推荐】Practical Node-RED Programming
Component based learning (3) path and group annotations in arouter
随机推荐
Handler进阶之sendMessage原理探索
微信小程序 使用wxml2canvas插件生成图片部分问题记录
AVD Pixel_ 2_ API_ 24 is already running. If that is not the case, delete the files at C:\Users\admi
机器学习 三: 基于逻辑回归的分类预测
【2021年新书推荐】Red Hat RHCSA 8 Cert Guide: EX200
MySQL笔记2_数据表
红外传感器控制开关
PyMySQL连接数据库
adb shell top 命令详解
[Andorid] 通过JNI实现kernel与app进行spi通讯
Project, how to package
Visual Studio 2019安装与使用
Bottom navigation bar based on bottomnavigationview
三子棋小游戏
Kotlin征途之data class [数据类]
一款png生成webp,gif, apng,同时支持webp,gif, apng转化的工具iSparta
GEE配置本地开发环境
[2021 book recommendation] practical node red programming
Cancel remote dependency and use local dependency
[dynamic programming] triangle minimum path sum