
Machine learning III: classification prediction based on logistic regression

2022-04-23 07:19:00 【Amyniez】

1 Introduction and application of logistic regression

1.1 Introduction to logistic regression

Logistic regression (LR) has "regression" in its name, but it is actually a classification model, and it is widely used across many fields. Although deep learning is currently more popular than traditional methods like this one, these traditional methods are still widely used because of their distinctive advantages.

For logistic regression, the two most prominent advantages are that the model is simple and that it is highly interpretable.

Advantages and disadvantages of the logistic regression model:

Advantages: simple to implement and easy to understand; computationally cheap, fast, and light on storage.
Disadvantages: prone to underfitting; classification accuracy may not be high.
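To illustrate the underfitting point, here is a minimal sketch on XOR-style data. The data and the names `X_xor`, `y_xor`, and `clf_xor` are illustrative assumptions, not part of the original article. No straight line separates these two classes, so a plain logistic regression cannot fit them well.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical XOR-style data: the two classes cannot be separated by a straight line
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

clf_xor = LogisticRegression()
clf_xor.fit(X_xor, y_xor)

# A linear decision boundary can classify at most 3 of these 4 points correctly
acc_xor = clf_xor.score(X_xor, y_xor)
print('training accuracy on XOR-style data:', acc_xor)
```

Even on its own training data the model cannot exceed 75% accuracy here, which is the kind of underfitting the list above refers to.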

1.2 Applications of logistic regression

Logistic regression is widely used in many fields, including machine learning, most medical fields, and the social sciences. For example, the Trauma and Injury Severity Score (TRISS), originally developed by Boyd et al., is widely used to predict mortality in injured patients. Logistic regression is also used to predict the risk of developing a specific disease (for example diabetes or coronary heart disease (CHD)) from observed patient characteristics (age, sex, body mass index, results of various blood tests, and so on). Logistic regression models are also used to predict the probability that a given process, system, or product will fail. In marketing, they are used to predict, for example, a customer's propensity to buy a product or cancel a subscription. In economics, they can be used to predict the likelihood that a person chooses to enter the labor market, and in business applications they can be used to predict the likelihood that a homeowner defaults on a mortgage. Conditional random fields extend logistic regression to sequential data and are used in natural language processing.

Logistic regression is also a basic building block of many classification systems, for example credit card anti-fraud systems built on GBDT + LR, and CTR (click-through rate) estimation. Its advantages are that the output naturally falls between 0 and 1 and has a probabilistic meaning, that the model is clear and has a corresponding probabilistic foundation, and that the fitted parameters indicate the effect of each feature on the result, which also makes it a good tool for understanding data. At the same time, because it is essentially a linear classifier, it cannot handle more complex data well. In practice, logistic regression is often used as a baseline model.
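The reason the output falls between 0 and 1 is that logistic regression passes a linear score through the logistic (sigmoid) function. A minimal sketch of that function follows; the helper name `sigmoid` and the sample scores in `z` are illustrative choices, not taken from the article.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative linear scores, from strongly negative to strongly positive
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
p = sigmoid(z)

# Larger scores give probabilities closer to 1; a score of 0 gives exactly 0.5
print(p)
```

A score of 0 corresponds to a probability of 0.5, which is why the set of points with score 0 acts as the decision boundary.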

4.1 Demo

Step1: Import libraries

Note: Seaborn is a higher-level API built on top of matplotlib, which makes plotting easier; in most cases seaborn alone is enough to produce attractive figures. For example:
1. set_style() sets the theme;
2. distplot() is an enhanced version of hist (it is deprecated in recent seaborn releases in favor of histplot() and displot()), and kdeplot() draws a density curve;
3. boxplot() draws a box plot: its biggest advantage is that it is robust to outliers, so it describes the spread of the data in a relatively stable way;
4. jointplot() draws a joint distribution;
5. heatmap() draws a heat map.

# Basic numerical library
import numpy as np

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Logistic regression model
from sklearn.linear_model import LogisticRegression

Step2: Model training

# Construct the dataset
x_fearures = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y_label = np.array([0, 0, 0, 1, 1, 1])

# Instantiate the logistic regression model
lr_clf = LogisticRegression()

# Fit the logistic regression model to the constructed dataset.
# The fitted linear part is z = w0 + w1*x1 + w2*x2, where w1 and w2 are the weights
# and w0 is the bias (intercept); the predicted probability is p(y=1|x) = 1 / (1 + exp(-z)).
lr_clf = lr_clf.fit(x_fearures, y_label)

Step3: View model parameters

# View the weights w of the fitted model
# coef_: the "slope" parameters (w, also called weights)
print('the weight of Logistic Regression:',lr_clf.coef_)

# View the intercept w0 of the fitted model
# intercept_: the bias or intercept (b)
print('the intercept(w0) of Logistic Regression:',lr_clf.intercept_)

Result:
the weight of Logistic Regression: [[0.73462087 0.6947908 ]]
the intercept(w0) of Logistic Regression: [-0.03643213]
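The weights and intercept printed above fully determine the model's predicted probabilities. As a sanity check, the sketch below recomputes the class 1 probability by hand as sigmoid(w·x + w0) and compares it with predict_proba. It refits the same six-point dataset so that it runs on its own; the variable names (`X`, `y`, `clf`, `p_manual`, `p_sklearn`) are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same six training points as in the demo
X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

w = clf.coef_[0]        # weights (w1, w2)
w0 = clf.intercept_[0]  # intercept w0

# Linear score z = w0 + w1*x1 + w2*x2, then the sigmoid gives p(y=1|x)
z = X @ w + w0
p_manual = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is the probability of class 1
p_sklearn = clf.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```

For a binary problem the two computations agree, which shows that each weight directly scales the contribution of its feature to the score.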

Step4: Data and Model Visualization

# Visualize the constructed sample points
plt.figure()

# Scatter plot, colored by class label
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')
plt.show()

[Figure: Dataset]

# Visualize the decision boundary
plt.figure()
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')

# Grid resolution along the x and y axes
nx, ny = 200, 100
# Read the current axis limits (minimum and maximum) to define the grid extent
x_min, x_max = plt.xlim()
y_min, y_max = plt.ylim()
x_grid, y_grid = np.meshgrid(np.linspace(x_min, x_max, nx),np.linspace(y_min, y_max, ny))

# Predict the class 1 probability at every grid point
z_proba = lr_clf.predict_proba(np.c_[x_grid.ravel(), y_grid.ravel()])
z_proba = z_proba[:, 1].reshape(x_grid.shape)
# Draw the contour where the predicted probability equals 0.5
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')

plt.show()

[Figure: Dataset with the decision boundary]
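The 0.5 contour drawn above is a straight line, because p = 0.5 exactly when the linear score w0 + w1*x1 + w2*x2 equals 0. A minimal sketch, assuming w2 is non-zero, solves this for x2 and checks that points on the resulting line receive probability 0.5. The dataset is refit so the block runs on its own, and the x1 range and variable names are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same six training points as in the demo
X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

w1, w2 = clf.coef_[0]
w0 = clf.intercept_[0]

# Decision boundary: w0 + w1*x1 + w2*x2 = 0  =>  x2 = -(w0 + w1*x1) / w2
x1_line = np.linspace(-3, 3, 5)          # arbitrary illustrative range
x2_line = -(w0 + w1 * x1_line) / w2      # assumes w2 != 0

# Points on the line should receive a class 1 probability of 0.5
boundary_points = np.c_[x1_line, x2_line]
p_boundary = clf.predict_proba(boundary_points)[:, 1]
print(p_boundary)
```

Plotting `x1_line` against `x2_line` with plt.plot would give the same line as the contour, without evaluating the model on a grid.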

Visualizing predictions for new samples:

plt.figure()

# New point 1
x_fearures_new1 = np.array([[0, -1]])
plt.scatter(x_fearures_new1[:,0],x_fearures_new1[:,1], s=50)

# Annotate the point: xy is the annotated point and xytext is the position of the label;
# arrowprops defines the arrow drawn between them.
# The label is passed positionally because the `s` keyword was renamed to `text` in newer matplotlib releases.
plt.annotate('New point 1',xy=(0,-1),xytext=(-2,0),color='blue',arrowprops=dict(arrowstyle='-|>',connectionstyle='arc3',color='red'))

# New point 2
x_fearures_new2 = np.array([[1, 2]])
plt.scatter(x_fearures_new2[:,0],x_fearures_new2[:,1], s=50)
plt.annotate('New point 2',xy=(1,2),xytext=(-1.5,2.5),color='red',arrowprops=dict(arrowstyle='-|>',connectionstyle='arc3',color='red'))

# Training samples
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')

# Visualize the decision boundary
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')

plt.show()

[Figure: Dataset with the decision boundary and the two new points]

Step5: Model prediction

# Use the trained model to predict the classes of the two new points
y_label_new1_predict = lr_clf.predict(x_fearures_new1)
y_label_new2_predict = lr_clf.predict(x_fearures_new2)

print('The New point 1 predict class:\n',y_label_new1_predict)
print('The New point 2 predict class:\n',y_label_new2_predict)

# Logistic regression is a probabilistic model (as described earlier, p = p(y=1|x,\theta)), so predict_proba can be used to get the predicted probability of each class
y_label_new1_predict_proba = lr_clf.predict_proba(x_fearures_new1)
y_label_new2_predict_proba = lr_clf.predict_proba(x_fearures_new2)

print('The New point 1 predict Probability of each class:\n',y_label_new1_predict_proba)
print('The New point 2 predict Probability of each class:\n',y_label_new2_predict_proba)

The New point 1 predict class:
[0]
The New point 2 predict class:
[1]
The New point 1 predict Probability of each class:
[[0.67507358 0.32492642]]
The New point 2 predict Probability of each class:
[[0.11029117 0.88970883]]

The trained logistic regression model predicts class 0 for X_new1 (on the lower-left side of the decision boundary) and class 1 for X_new2 (on the upper-right side). The decision boundary, where the predicted probability equals 0.5, is the blue line in the figure above.
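The link between the predicted probabilities and the predicted classes can be checked directly: predict returns the class with the highest probability, which for two classes is the same as thresholding the class 1 probability at 0.5. The sketch below refits the same dataset so that it runs on its own; the names `X_new`, `proba`, and `pred_from_proba` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same six training points and two new points as in the demo
X = np.array([[-1, -2], [-2, -1], [-3, -2], [1, 3], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
X_new = np.array([[0, -1], [1, 2]])

# One row per sample, one column per class; each row sums to 1
proba = clf.predict_proba(X_new)

# Pick the class with the highest probability and compare with predict()
pred_from_proba = clf.classes_[np.argmax(proba, axis=1)]
pred = clf.predict(X_new)
print(pred, pred_from_proba)
```

If a different trade-off between the two kinds of error is needed, the class 1 column of `proba` can be compared against a threshold other than 0.5 instead of calling predict.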

Copyright notice
This article was written by [Amyniez]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230610096384.html
