
Machine learning - logistic regression

2022-04-23 15:18:00 Please call me Lei Feng

I. Binomial logistic regression
1. Binomial logistic regression is a function whose final output is a value between 0 and 1. It is used to solve "either/or" problems such as "success or failure" and "yes or no".
2. Logistic regression is a model that maps a linear regression model to a probability, that is, it maps an output from the real line (-∞, +∞) to (0, 1), thereby obtaining a probability. (Personal understanding: the meaning of "regression" is the process of using observations to bring our understanding closer to the truth, returning to the original.)
3. We can understand this mapping intuitively by plotting it. First, define a two-feature linear regression model:
$\hat{y}=\theta_1x_1+\theta_2x_2+bias$, where $\hat{y}\in(-\infty,+\infty)$
[Figure: plot of the linear regression model]
[Figure: plot of the logistic regression model]
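To preview the mapping numerically, here is a minimal Python sketch; the parameter values and feature pairs below are made up for illustration, and the logistic (sigmoid) function used here is only derived later in this post:

```python
import numpy as np

# Hypothetical parameters for y_hat = theta1*x1 + theta2*x2 + bias
theta1, theta2, bias = 2.0, -3.0, 0.5

def linear(x1, x2):
    # Plain linear regression output: can be any real number
    return theta1 * x1 + theta2 * x2 + bias

def logistic(z):
    # Maps a real number into (0, 1); derived as the sigmoid later in this post
    return 1.0 / (1.0 + np.exp(-z))

for x1, x2 in [(-5, 4), (0, 0), (6, -2)]:
    z = linear(x1, x2)
    print(f"x=({x1},{x2})  linear output z={z:+.2f}  mapped probability={logistic(z):.4f}")
```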
II. Definitions of probability and odds
1. Probability is the number of occurrences divided by the total number of trials. Take a coin toss as an example:
[Figure: coin-toss probability example]
The range of p is [0, 1].
2. Odds is a ratio: the probability that an event happens divided by the probability that it does not happen, i.e., the number of occurrences divided by the number of non-occurrences. Take a coin toss as an example:
[Figure: coin-toss odds example]
The range of odds is [0, +∞).
3. Recall the Bernoulli distribution: if X is a Bernoulli random variable, X takes values in {0, 1} (it is either 0 or 1), like the heads and tails of a coin flip.
Then P(X=1)=p and P(X=0)=1-p.
Substituting into the odds:
$odds=\frac{P(X=1)}{P(X=0)}=\frac{p}{1-p}$
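A quick sanity check of these definitions in code (the probabilities below are arbitrary illustrative values):

```python
def odds_from_prob(p):
    # odds = p / (1 - p); undefined at p = 1
    return p / (1.0 - p)

def prob_from_odds(odds):
    # inverse conversion: p = odds / (1 + odds)
    return odds / (1.0 + odds)

for p in [0.1, 0.5, 0.9]:
    o = odds_from_prob(p)
    print(f"p={p:.1f}  odds={o:.3f}  back to p={prob_from_odds(o):.1f}")
```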
III. The logit function, the sigmoid function, and their properties
1. The logarithm of the odds is called the logit (also written log-it).
2. Taking the log of the odds expands its range from [0, +∞) to the whole real line (-∞, +∞); this is the logit function:
$logit(p)=\log_e(odds)=\log_e\left(\frac{p}{1-p}\right),\quad p\in(0,1),\ logit(p)\in(-\infty,+\infty)$
3. We can express logit(p) with a linear regression model, because the linear regression model and the logit function have the same output range.
For example: $logit(p)=\theta_1x_1+\theta_2x_2+bias$
Below is the graph of logit(p). Note that p∈(0,1); when p=0 or p=1, logit(p) is undefined.
[Figure: graph of the logit function]
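A minimal sketch of the logit function, evaluated at a few arbitrarily chosen probabilities to show that its output spans the whole real line and diverges as p approaches 0 or 1:

```python
import math

def logit(p):
    # logit(p) = ln(p / (1 - p)), defined only for 0 < p < 1
    return math.log(p / (1.0 - p))

for p in [0.001, 0.1, 0.5, 0.9, 0.999]:
    print(f"p={p:<6}  logit(p)={logit(p):+.3f}")
```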
From $logit(p)=\theta_1x_1+\theta_2x_2+bias$
we get
$\log\left(\frac{p}{1-p}\right)=\theta_1x_1+\theta_2x_2+bias$
Note: some readers may be confused about how this conversion works. logit(p) denotes the logarithmic function of the parameter p; here logit(p)=log(p/(1-p)).

Let $z=\theta_1x_1+\theta_2x_2+bias$
so that $\log\left(\frac{p}{1-p}\right)=z$
Exponentiating both sides of the equation (base e):
$\frac{p}{1-p}=e^{z}$
$p=e^{z}(1-p)=e^{z}-e^{z}p$
$p(1+e^z)=e^z$
$p=\frac{e^z}{1+e^z}$
Dividing both numerator and denominator by $e^z$ gives
$p=\frac{1}{1+e^{-z}},\quad p\in(0,1)$
Through the above derivation we arrive at the sigmoid function, which finally maps the real-valued output of the linear regression model into a probability.
$sigmoid(z)=\frac{1}{1+e^{-z}},\quad sigmoid(z)\in(0,1)$
Below is the graph of the sigmoid function; note that its range is (0, 1).
[Figure: graph of the sigmoid function]
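Below is a minimal implementation of the sigmoid, with a numeric check (at a few arbitrary probabilities) that it inverts the logit, exactly as the derivation shows:

```python
import math

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^(-z)), output always in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # logit(p) = ln(p / (1 - p))
    return math.log(p / (1.0 - p))

# sigmoid should undo logit: sigmoid(logit(p)) == p
for p in [0.05, 0.3, 0.5, 0.8, 0.95]:
    print(f"p={p:.2f}  sigmoid(logit(p))={sigmoid(logit(p)):.2f}")
```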
IV. Maximum likelihood estimation
1. Introduce the hypothesis function $h_\theta(X)$, and let $\theta^TX$ be the linear regression model.
In $\theta^TX$, both $\theta$ and $X$ are column vectors ($\theta^T$ is the corresponding row vector), for example:
$\theta^T=\begin{bmatrix} bias & \theta_1 & \theta_2 \end{bmatrix}$
$X=\begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix}$
Taking the dot product gives:
$\theta^TX=bias\cdot 1+\theta_1 x_1+\theta_2 x_2=\theta_1x_1+\theta_2x_2+bias$
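The same dot product in code, with NumPy and made-up values for bias, θ1, θ2, x1, x2, just to illustrate the shapes:

```python
import numpy as np

bias, theta1, theta2 = 0.5, 2.0, -3.0
x1, x2 = 1.5, 0.7

theta = np.array([bias, theta1, theta2])    # [bias, theta1, theta2]
X = np.array([1.0, x1, x2])                 # leading 1 absorbs the bias term

z = theta @ X                               # theta^T X
print(z, theta1 * x1 + theta2 * x2 + bias)  # both print the same value
```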
Let $\theta^TX=z$; then the hypothesis function is:
$h_\theta(X)=\frac{1}{1+e^{-z}}=P(Y=1\mid X;\theta)$
This expression is the probability that Y=1 given X and $\theta$;
$P(Y=0\mid X;\theta)=1-h_\theta(X)$
This expression is the probability that Y=0 given X and $\theta$.
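In code, the hypothesis function simply applies the sigmoid to $\theta^TX$, and the two class probabilities sum to 1 (the parameters and features below are hypothetical):

```python
import numpy as np

def h(theta, X):
    # h_theta(X) = sigmoid(theta^T X) = P(Y=1 | X; theta)
    z = theta @ X
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, 2.0, -3.0])    # [bias, theta1, theta2]
X = np.array([1.0, 1.5, 0.7])         # [1, x1, x2]

p1 = h(theta, X)        # P(Y=1 | X; theta)
p0 = 1.0 - p1           # P(Y=0 | X; theta)
print(p1, p0, p1 + p0)  # the last value is always 1.0
```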
2. Recall the Bernoulli distribution:
$f(k;p)=\begin{cases} p, & \text{if } k=1 \\ q=1-p, & \text{if } k=0 \end{cases}$
or equivalently $f(k;p)=p^k(1-p)^{1-k}$ for $k\in\{0,1\}$. Note that $f(k;p)$ is the probability that $k$ equals 0 or 1, i.e., $P(k)$.
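A tiny sketch of the Bernoulli pmf in its exponent form, checked against the case-by-case definition (p = 0.3 is an arbitrary choice):

```python
def bernoulli_pmf(k, p):
    # f(k; p) = p^k * (1 - p)^(1 - k), for k in {0, 1}
    return (p ** k) * ((1.0 - p) ** (1 - k))

p = 0.3
print(bernoulli_pmf(1, p))  # 0.3 == p
print(bernoulli_pmf(0, p))  # 0.7 == 1 - p
```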
3. The purpose of maximum likelihood estimation is to find the probability distribution that best matches the data.
In the figures below, the X marks are data points, and the product of the lengths of all the red arrows is the value of the likelihood function. Clearly the likelihood of the distribution in the upper figure is larger than that of the lower figure, so the upper distribution fits the data better. Maximum likelihood estimation looks for the distribution that best matches the current data.
[Figures: two candidate distributions over the same data points, with red arrows indicating each point's likelihood]
4. Define the likelihood function
$L(\theta|x)=P(Y|X;\theta)=\prod_{i}^{m} P(y_i|x_i;\theta)=\prod_{i}^{m} h_\theta(x_i)^{y_i}\,(1-h_\theta(x_i))^{1-y_i}$
where $i$ indexes the data samples and there are $m$ samples in total. The goal of maximum likelihood estimation is to make the output of the formula above as large as possible. Taking the log of the formula makes the calculation easier, because the log turns the product into a sum without affecting our optimization goal:
$L(\theta|x)=\log(P(Y|X;\theta))=\sum_{i=1}^{m} y_i\log(h_\theta(x_i))+(1-y_i)\log(1-h_\theta(x_i))$
We only need to add a minus sign in front of the formula to turn maximization into minimization. Letting $h_\theta(X)=\hat{Y}$, we obtain the loss function $J(\theta)$; minimizing this function (by taking the derivative with respect to $\theta$) gives the $\theta$ we want:
$J(\theta)=-\sum_{i}^{m}\left[\,Y\log(\hat{Y})+(1-Y)\log(1-\hat{Y})\,\right]$
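To close the loop, here is a minimal sketch (not from the original article) that minimizes $J(\theta)$ by plain gradient descent on a made-up toy dataset; the learning rate and iteration count are arbitrary, and the gradient $X^T(\hat{y}-y)$ is the standard one for this loss (the article itself stops at defining $J(\theta)$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    # J(theta) = -sum[ y*log(y_hat) + (1-y)*log(1-y_hat) ]
    y_hat = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # clip to avoid log(0)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def grad(theta, X, y):
    # gradient of J(theta) with respect to theta: X^T (y_hat - y)
    return X.T @ (sigmoid(X @ theta) - y)

# Hypothetical toy dataset: a column of 1s for the bias, two features, labels in {0, 1}
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 2.0, 0.3],
              [1.0, 1.5, 2.8],
              [1.0, 3.1, 0.9]])
y = np.array([0.0, 1.0, 0.0, 1.0])

theta = np.zeros(3)
lr = 0.1                       # made-up learning rate
for _ in range(1000):          # plain gradient descent on J(theta)
    theta = theta - lr * grad(theta, X, y)

print("theta:", theta, " J(theta):", loss(theta, X, y))
```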

Copyright notice
This article was created by [Please call me Lei Feng]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231508367089.html