Linear Regression and Logistic Regression
2022-08-06 18:32:00 【sweetheart7-7】
Linear regression is generally used for predicting values; the predicted result is usually a real number.
Logistic regression is generally used for classification; the prediction is usually the probability of belonging to some class.

Linear Regression
Step 1: Model
Define the model.
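A sketch of the usual single-feature form (generic feature name, not specific to this article's data):

$$ y = b + w \cdot x $$

where $w$ is the weight and $b$ is the bias; with several features this generalizes to $y = b + \sum_i w_i x_i$.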
Step 2: Goodness of Function
Define a Loss function to judge how good the model is; MSE is chosen here.
Minimizing the Loss function yields a better model.
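For this model the MSE loss is commonly written as follows, where $\hat{y}^n$ is the target of the $n$-th training example:

$$ L(w, b) = \sum_n \left( \hat{y}^n - \left( b + w \cdot x^n \right) \right)^2 $$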
Step 3: Gradient Descent
Parameters are optimized by gradient descent
Gradient descent with two parameters
Visualization
Linear regression has no local optima (the MSE loss here is convex).
Take the partial derivatives with respect to w and b separately.
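For the MSE loss above they are:

$$ \frac{\partial L}{\partial w} = \sum_n 2\left( \hat{y}^n - (b + w x^n) \right)(-x^n), \qquad \frac{\partial L}{\partial b} = \sum_n 2\left( \hat{y}^n - (b + w x^n) \right)(-1) $$

A minimal sketch of the update loop in Python; `x`, `y_hat`, `lr`, and `steps` are hypothetical data and settings, not values from this article:

```python
import numpy as np

def fit_linear(x, y_hat, lr=1e-4, steps=10000):
    """Batch gradient descent for y = b + w * x under an MSE loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = y_hat - (b + w * x)        # residual for every sample
        grad_w = -2.0 * np.sum(err * x)  # dL/dw
        grad_b = -2.0 * np.sum(err)      # dL/db
        w -= lr * grad_w                 # step against the gradient
        b -= lr * grad_b
    return w, b
```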
How are the results?

Model Selection
Introduce higher-order terms to define a more complex Model.
When the model is more complex, Overfitting may occur.
Back to step 1: Redesign the Model
Redefine the model, taking the effect of species on the result into account.

Consider the effect of other features on the result and redefine the Model.
Back to step 2: Regularization
Add regularization to the Loss function to address the Overfitting problem.

Regularization
Regularization: we prefer a function with smaller parameters. The smaller the parameters, the smoother the function, the less sensitive the output is to changes in the input, and therefore the less sensitive the model is to noise.
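A standard way to write the regularized loss, matching the description above:

$$ L = \sum_n \left( \hat{y}^n - \left( b + \sum_i w_i x_i^n \right) \right)^2 + \lambda \sum_i (w_i)^2 $$

The extra $\lambda \sum_i (w_i)^2$ term pushes the weights toward small values, which is exactly what makes the function smooth.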

The larger λ is, the more the size of w itself is taken into account and the less the original Loss matters, so performance on the training data gets worse.
Why is b not regularized? Because what we want is a smooth function, and the size of b does not affect how smooth the function is.
Logistic Regression
Ideally, the model function for the classification task is defined as follows:

Solving it with a Gaussian distribution
The data are assumed to follow a Gaussian distribution (other distributions could be assumed as well; this is a subjective choice), and the problem is then solved with a Gaussian model.
Generative Model

Maximize the Likelihood

Solve for μ and Σ
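The likelihood being maximized is $L(\mu, \Sigma) = \prod_n f_{\mu,\Sigma}(x^n)$, and its solution has the standard closed form for a Gaussian:

$$ \mu^* = \frac{1}{N} \sum_{n=1}^{N} x^n, \qquad \Sigma^* = \frac{1}{N} \sum_{n=1}^{N} \left( x^n - \mu^* \right)\left( x^n - \mu^* \right)^\top $$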

Using all features, the result was still poor.

Consider having the two class Models share a covariance matrix; this requires fewer parameters (and is less prone to overfitting).

Solve for μ and Σ
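With a shared covariance matrix, the two means are computed as before, and $\Sigma$ becomes the weighted average of the two class covariances (the standard result when the classes contain $N_1$ and $N_2$ examples):

$$ \Sigma = \frac{N_1}{N_1 + N_2} \Sigma^1 + \frac{N_2}{N_1 + N_2} \Sigma^2 $$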

After adopting the shared Σ, the boundary becomes linear, and the accuracy improves a lot.
Three Steps
In summary, there are 3 steps: find the function set (model), define the goodness of a function, and find the best function.
Naive Bayes Classifier
Assume all features are independent; then the probability can be written in the following form. This model is a Naive Bayes Classifier.
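Under the independence assumption, the class-conditional probability factorizes into one-dimensional terms (a sketch for a $K$-dimensional feature vector):

$$ P(x \mid C) = P(x_1 \mid C)\, P(x_2 \mid C) \cdots P(x_K \mid C) $$

where each factor can be modeled, for example, by a one-dimensional Gaussian.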

Posterior Probability
Analyze the Posterior Probability.

The derivation shows that it can eventually be written as σ(w · x + b).
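A sketch of that derivation, starting from Bayes' rule:

$$ P(C_1 \mid x) = \frac{P(x \mid C_1) P(C_1)}{P(x \mid C_1) P(C_1) + P(x \mid C_2) P(C_2)} = \frac{1}{1 + e^{-z}} = \sigma(z), \qquad z = \ln \frac{P(x \mid C_1) P(C_1)}{P(x \mid C_2) P(C_2)} $$

With Gaussian class-conditionals sharing one covariance matrix $\Sigma$, the quadratic terms cancel and $z$ reduces to the linear form $z = w \cdot x + b$, with $w^\top = (\mu^1 - \mu^2)^\top \Sigma^{-1}$ and $b = -\frac{1}{2} (\mu^1)^\top \Sigma^{-1} \mu^1 + \frac{1}{2} (\mu^2)^\top \Sigma^{-1} \mu^2 + \ln \frac{N_1}{N_2}$.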
Step 1: Function Set
The σ derived above is the sigmoid function, with the familiar S-shaped graph:
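For reference, its formula is:

$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$

which maps any real $z$ into $(0, 1)$, so the output can be read as a probability.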

It can be represented in the following form:

Step 2: Goodness of a Function

Maximizing the Likelihood is the same as minimizing $-\ln L(w, b)$, which expands as follows:
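With $\hat{y}^n = 1$ for class 1 and $\hat{y}^n = 0$ for class 2, the expansion is:

$$ -\ln L(w, b) = \sum_n -\left[ \hat{y}^n \ln f(x^n) + \left( 1 - \hat{y}^n \right) \ln\left( 1 - f(x^n) \right) \right] $$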

This Loss function is the cross entropy. It measures how close two distributions are: the smaller it is, the closer they are.

Step 3: Find the best function

The update of w depends on three things:
- the learning rate
- $x_i$, which comes from the data
- $\hat{y}^n - f(x^n)$, which represents how far the output of f is from the ideal target $\hat{y}^n$: the farther from the target, the larger the update
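The resulting update rule is $w_i \leftarrow w_i - \eta \sum_n -\left( \hat{y}^n - f(x^n) \right) x_i^n$. A minimal sketch of it in Python; `X`, `y_hat`, `lr`, and `steps` are hypothetical, and labels are assumed to be 0/1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y_hat, lr=0.1, steps=1000):
    """Gradient descent on the cross-entropy loss for f(x) = sigmoid(w.x + b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        f = sigmoid(X @ w + b)  # model output in (0, 1)
        err = y_hat - f         # the (y_hat - f(x)) factor from the list above
        w += lr * (X.T @ err)   # each w_i moves by lr * sum_n err_n * x_i^n
        b += lr * np.sum(err)
    return w, b
```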

Why can't logistic regression use MSE as the Loss function?

- When $\hat{y} = 1$: if the prediction is $f(x) = 0$, it is clearly far from the target, yet the gradient is 0!
- When $\hat{y} = 0$: if the prediction is $f(x) = 1$, it is clearly far from the target, yet the gradient is 0!
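The missing step: with $L = \sum_n \left( f(x^n) - \hat{y}^n \right)^2$ and $f = \sigma(w \cdot x + b)$, the chain rule gives

$$ \frac{\partial L}{\partial w_i} = \sum_n 2\left( f(x^n) - \hat{y}^n \right) f(x^n) \left( 1 - f(x^n) \right) x_i^n $$

and the factor $f(x^n)\left( 1 - f(x^n) \right)$ is 0 whenever $f(x^n) = 0$ or $f(x^n) = 1$, no matter how wrong the prediction is.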
Cross Entropy vs Square Error

If square error is used for logistic regression, the gradient can be 0 even when far from the optimum, so the parameters cannot be updated.
Discriminative vs Generative
A Discriminative Model simply defines the function directly and then optimizes it, letting the machine find the distribution by itself.
A Generative Model first assumes a distribution and then finds the parameter values (μ and Σ) to plug into the Model.

Accuracy varies

The Generative Model makes some assumptions.
Example:
In this example, the Generative Model gives data1 a probability of less than 0.5 (because the Naive Bayes Model assumes the two features are independent).

When there is little training data, the Generative Model may perform better and is less easily affected by noise.

Multi-class Classification
Multi-class classification can be solved with the following model.
Softmax can also be derived from a Gaussian Model.

The Loss function for the multi-class problem can also be defined using Cross Entropy.
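A minimal sketch of softmax plus cross-entropy; the scores `z` and the one-hot target below are made-up numbers, just to show the shapes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / np.sum(e)

def cross_entropy(y_hat, y):
    """y_hat: one-hot target, y: softmax output."""
    return -np.sum(y_hat * np.log(y))

z = np.array([3.0, 1.0, -2.0])  # raw class scores
y = softmax(z)                  # approx. [0.88, 0.12, 0.006]
loss = cross_entropy(np.array([1.0, 0.0, 0.0]), y)  # approx. 0.13
```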

Limitation of Logistic Regression
The following situation is difficult to solve with logistic regression.

So we can consider transforming the features first and then using logistic regression to solve the problem.

Multiple logistic regression models can be cascaded, letting the machine find the feature transformation by itself.

What the machine finds out is as follows:

These Logistic Regressions can be stacked together, so that the input of one Logistic Regression can be the output of other Logistic Regressions.

We can give this Model a new name: Neural Network (Deep Learning).
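A hedged illustration of this idea: two logistic units transform the inputs and a third classifies in the transformed space. The weights are hand-picked for an XOR-style layout (an assumption, since the original figure is not reproduced here), not learned:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden(x):
    """Two logistic units acting as the feature transformation."""
    h1 = sigmoid(10 * (x[0] - x[1]) - 5)  # fires for (1, 0)-like points
    h2 = sigmoid(10 * (x[1] - x[0]) - 5)  # fires for (0, 1)-like points
    return h1, h2

def classify(x):
    h1, h2 = hidden(x)
    return sigmoid(10 * (h1 + h2) - 5)    # linearly separable after the transform

for x in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(x, round(float(classify(x)), 3))
```

This prints values near 0 for (0, 0) and (1, 1) and near 1 for the mixed points, a boundary no single logistic regression can produce.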