3.1 Classification: Probabilistic Generative Model
2022-08-11 07:52:00 【A boa constrictor. 6666】
1. Classification
- A classification task, like regression, comes down to finding a function f. Its input is an object x and its output is one of the classes n.
- Classification tasks are common in practice, for example:
  - Credit scoring
    - Input: income, savings, occupation, age, past financial history
    - Output: accept or reject
  - Medical diagnosis
    - Input: current symptoms, age, gender, past medical records
    - Output: which disease
  - Handwritten character recognition
    - Input: a handwritten character
    - Output: the corresponding character
  - Face recognition
    - Input: an image of a face
    - Output: the corresponding person
1.1 Application Example
- The following example classifies Pokémon. Every Pokémon can be described by 7 attributes: Total, HP, Attack, Defense, SP Atk, SP Def, and Speed. We will use these attributes to predict which type a Pokémon belongs to.
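- Suppose we had not yet learned any classification method and tried to solve the problem with regression instead. Take binary classification as an example: use target +1 for class 1 and -1 for class 2, then classify a new input by whether the regression output is closer to +1 or to -1.
- As the figure shows, regression penalizes examples that are "too correct". Points on the class-1 side whose output is far above 1 still produce a large squared error, so the fitted function is pulled toward them and the decision boundary shifts. The resulting classifier performs poorly.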
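Here is a small sketch of the "too correct" effect on hypothetical 1-D data, not the Pokémon set. Class 1 gets target +1 and class 2 gets target -1. A line is fit by least squares, and the decision boundary is the point where the fitted line crosses 0. Adding class-1 points far to the right still drags the boundary toward class 1, even though those points are already on the correct side.

```python
import numpy as np

def fit_boundary(x, y):
    """Least-squares fit of y ~ w*x + b; return the x where w*x + b = 0."""
    X = np.column_stack([x, np.ones_like(x)])  # design matrix with columns [x, 1]
    (w, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return -b / w

# Hypothetical data: class 2 -> target -1, class 1 -> target +1
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])
print(fit_boundary(x, y))            # symmetric data: boundary at (about) 0

# Add class-1 points that are "too correct" (far on the correct side)
x_far = np.append(x, [10.0, 11.0])
y_far = np.append(y, [1.0, 1.0])
print(fit_boundary(x_far, y_far))    # about 0.875: boundary moves toward class 1
```

The far points are classified correctly, but squared error still treats their large outputs as mistakes. That is why the fit moves.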
- Ideal alternatives:
- Define a function g(x): for input x, output class 1 if g(x) > 0, otherwise output class 2.
- The loss function counts the number of wrong predictions on the training set.
- Methods for finding the best function include the Perceptron, the Support Vector Machine (SVM), and the Generative Model.
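The ideal alternative can be written down directly. The text leaves g unspecified, so this sketch uses a hypothetical linear score g(x) = w·x + b purely for illustration. The loss is the number of training errors. This count is a step function of w and b, so it offers no useful gradient to follow.

```python
import numpy as np

def g(x, w, b):
    """Hypothetical linear score g(x) = w.x + b (any real-valued g would do)."""
    return x @ w + b

def f(x, w, b):
    """Output class 1 if g(x) > 0, otherwise class 2."""
    return np.where(g(x, w, b) > 0, 1, 2)

def zero_one_loss(x, labels, w, b):
    """Number of training examples whose predicted class differs from the label."""
    return int(np.sum(f(x, w, b) != labels))

# Hypothetical training set with 2 features
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.0], [0.5, -2.0]])
labels = np.array([1, 1, 2, 2])
w = np.array([1.0, 1.0])
print(zero_one_loss(X, labels, w, b=0.0))   # 0 errors
print(zero_one_loss(X, labels, w, b=-3.5))  # 2 errors: everything predicted as class 2
```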
Generative Model
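- Estimate from the training set the probability P(x) of observing x, using P(x) = P(x|C1)P(C1) + P(x|C2)P(C2). A model that can do this is called a generative model. The same quantities give the class probability through Bayes' rule: P(C1|x) = P(x|C1)P(C1) / P(x).
- Prior probabilities: P(C1) and P(C2) can be computed directly from the training set, as the fraction of training examples that belong to each class.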
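The prior and Bayes' rule fit together in a few lines. This sketch uses hypothetical class counts and likelihood values rather than anything from the Pokémon data.

```python
def priors(n1, n2):
    """P(C1), P(C2) as the fraction of training examples in each class."""
    total = n1 + n2
    return n1 / total, n2 / total

def evidence(px_c1, px_c2, p_c1, p_c2):
    """P(x) = P(x|C1)P(C1) + P(x|C2)P(C2)."""
    return px_c1 * p_c1 + px_c2 * p_c2

def posterior_c1(px_c1, px_c2, p_c1, p_c2):
    """Bayes' rule: P(C1|x) = P(x|C1)P(C1) / P(x)."""
    return px_c1 * p_c1 / evidence(px_c1, px_c2, p_c1, p_c2)

p1, p2 = priors(30, 10)                 # hypothetical counts -> 0.75, 0.25
print(posterior_c1(0.2, 0.6, p1, p2))   # about 0.5: larger prior offsets smaller likelihood
```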
Gaussian distribution: every Pokémon can be represented by a vector of its attributes. Below we use the two-dimensional vector (Defense, SP Def) to represent a Pokémon. The turtle in the test set does not appear in the training set, so we cannot estimate its class-conditional probability P(x|C) just by counting. We therefore assume that the training examples of each class are sampled from a Gaussian distribution. This lets us estimate the probability for the turtle as well.
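A Gaussian distribution takes a vector x as input and outputs the probability density of sampling x:

$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$$

Here D is the dimension of x. The shape of the function is determined by the mean μ and the covariance matrix Σ.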
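The density formula above translates directly to NumPy. This is a minimal sketch. It uses `np.linalg.solve` instead of forming Σ⁻¹ explicitly, and it assumes Σ is a valid covariance matrix (symmetric, positive definite).

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density f_{mu,Sigma}(x) for a D-dimensional x."""
    x = np.asarray(x, float)
    mu = np.asarray(mu, float)
    sigma = np.asarray(sigma, float)
    d = mu.shape[0]
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    # solve(sigma, diff) computes Sigma^{-1} (x - mu) without inverting Sigma
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))  # 1/(2*pi), about 0.159
```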
Maximum likelihood: each point is assumed to be sampled independently from a Gaussian distribution, but these 79 points could have come from any Gaussian. Different Gaussians assign the data different likelihoods:

$$L(\mu,\Sigma) = \prod_{n=1}^{79} f_{\mu,\Sigma}(x^n)$$

So we look for the Gaussian with the highest likelihood:

$$(\mu^*,\Sigma^*) = \arg\max_{\mu,\Sigma} L(\mu,\Sigma), \qquad \mu^* = \frac{1}{79}\sum_{n=1}^{79} x^n, \qquad \Sigma^* = \frac{1}{79}\sum_{n=1}^{79}(x^n-\mu^*)(x^n-\mu^*)^{\top}$$
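The maximum-likelihood estimates have the closed forms given above. Note that Σ* divides by N, not N - 1. This sketch computes them on a hypothetical four-point sample. It also evaluates the log of L(μ, Σ), so you can check that moving away from (μ*, Σ*) lowers the likelihood.

```python
import numpy as np

def gaussian_mle(points):
    """Maximum-likelihood mean and covariance (divides by N, not N - 1)."""
    X = np.asarray(points, float)
    mu = X.mean(axis=0)
    diff = X - mu
    sigma = diff.T @ diff / X.shape[0]
    return mu, sigma

def log_likelihood(points, mu, sigma):
    """log L(mu, Sigma) = sum_n log f_{mu,Sigma}(x^n), assuming i.i.d. samples."""
    X = np.asarray(points, float)
    d = X.shape[1]
    diff = X - mu
    # (x^n - mu)^T Sigma^{-1} (x^n - mu) for every row n
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
    _, logdet = np.linalg.slogdet(sigma)
    return float(np.sum(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

pts = [[0, 0], [2, 0], [0, 2], [2, 2]]  # hypothetical samples
mu_star, sigma_star = gaussian_mle(pts)
print(mu_star)      # [1. 1.]
print(sigma_star)   # the 2x2 identity matrix
```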
The figure on the right shows the mean μ and covariance matrix Σ that we actually computed for the Water-type and Normal-type Pokémon.
2. Prediction
- The figure on the right shows the accuracy of the model's predictions on the test set: 47% with the 2 features (Defense, SP Def) and 54% with all 7 features. The model clearly performs poorly and needs further improvement.
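Prediction means assigning x to class 1 when P(C1|x) > 0.5. This sketch fits one Gaussian per class, each with its own covariance, applies Bayes' rule, and measures accuracy. It uses tiny hypothetical data rather than the Pokémon set. The posterior is computed in log space, which is a numerical-stability choice rather than part of the method.

```python
import numpy as np

def fit_class(points):
    """MLE mean and covariance for the examples of one class."""
    X = np.asarray(points, float)
    mu = X.mean(axis=0)
    diff = X - mu
    return mu, diff.T @ diff / len(X)

def log_gaussian(x, mu, sigma):
    """log f_{mu,Sigma}(x)."""
    diff = np.asarray(x, float) - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(sigma, diff))

def posterior_class1(x, params1, params2, p1, p2):
    """P(C1|x) = 1 / (1 + exp(b - a)), evaluated stably with logaddexp."""
    a = log_gaussian(x, *params1) + np.log(p1)
    b = log_gaussian(x, *params2) + np.log(p2)
    return float(np.exp(-np.logaddexp(0.0, b - a)))

def accuracy(xs, labels, params1, params2, p1, p2):
    """Fraction of inputs whose predicted class (1 if P(C1|x) > 0.5, else 2) is correct."""
    preds = [1 if posterior_class1(x, params1, params2, p1, p2) > 0.5 else 2 for x in xs]
    return float(np.mean(np.array(preds) == np.asarray(labels)))

class1 = [[0, 0], [1, 0], [0, 1], [1, 1]]          # hypothetical training data
class2 = [[10, 10], [11, 10], [10, 11], [11, 11]]
params1, params2 = fit_class(class1), fit_class(class2)
p1 = p2 = 0.5                                      # equal class counts
test_x = [[0.5, 0.5], [10.5, 10.5], [2, 1]]
test_y = [1, 2, 1]
print(accuracy(test_x, test_y, params1, params2, p1, p2))  # 1.0 on this toy set
```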
- Modifying the model: a common practice is to let different classes share the same covariance matrix Σ. Fewer parameters reduce the variance and give a simpler model. μ1 and μ2 are computed exactly as before, while Σ becomes the average of the previous Σ1 and Σ2, weighted by class size: $\Sigma = \frac{N_1}{N_1+N_2}\Sigma_1 + \frac{N_2}{N_1+N_2}\Sigma_2$. With a shared Σ the decision boundary becomes linear, and accuracy improves from 54% to 73%.
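The shared covariance is the class-size-weighted average of the per-class MLE covariances. The means are unchanged. This is a minimal sketch on hypothetical data.

```python
import numpy as np

def mle(points):
    """Per-class MLE mean and covariance (divides by N)."""
    X = np.asarray(points, float)
    mu = X.mean(axis=0)
    diff = X - mu
    return mu, diff.T @ diff / len(X)

def shared_covariance(class1, class2):
    """mu1, mu2 as before; Sigma = N1/(N1+N2) * Sigma1 + N2/(N1+N2) * Sigma2."""
    mu1, s1 = mle(class1)
    mu2, s2 = mle(class2)
    n1, n2 = len(class1), len(class2)
    sigma = (n1 * s1 + n2 * s2) / (n1 + n2)
    return mu1, mu2, sigma

c1 = [[0, 0], [2, 0], [0, 2], [2, 2]]  # hypothetical class 1 (Sigma1 = identity)
c2 = [[0, 0], [4, 0]]                  # hypothetical class 2
mu1, mu2, sigma = shared_covariance(c1, c2)
print(sigma)  # [[2, 0], [0, 2/3]]
```

With D features, two separate symmetric covariance matrices have D(D+1) free entries, and sharing one matrix halves that to D(D+1)/2. This is where the reduction in parameters comes from.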
3. Summary
- The figure on the left summarizes the three-step process for the classification task. The figure on the right notes that the probability distribution does not have to be Gaussian. For binary features, Bernoulli distributions can be used. If all dimensions are assumed to be independent, so that $P(x|C_1) = \prod_k P(x_k|C_1)$, the result is a Naive Bayes classifier.
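The Bernoulli and Naive Bayes ideas above can be combined for binary features. Each dimension gets its own probability p_d = P(x_d = 1 | C), and the independence assumption turns P(x|C) into a product over dimensions. This is a minimal sketch on hypothetical data. It applies no smoothing, so a value never seen in training gets probability 0.

```python
import numpy as np

def fit_bernoulli_nb(X):
    """Per-dimension estimate p_d = P(x_d = 1 | C): the fraction of ones in each column."""
    return np.asarray(X, float).mean(axis=0)

def likelihood(x, p):
    """Naive Bayes: P(x|C) = prod_d p_d^x_d * (1 - p_d)^(1 - x_d)."""
    x = np.asarray(x, float)
    return float(np.prod(p ** x * (1 - p) ** (1 - x)))

# Hypothetical binary features for one class
X_c = [[1, 0, 1], [1, 1, 1], [1, 0, 0], [1, 1, 1]]
p = fit_bernoulli_nb(X_c)       # [1.0, 0.5, 0.75]
print(likelihood([1, 1, 0], p))  # 1 * 0.5 * 0.25 = 0.125
print(likelihood([0, 1, 1], p))  # 0.0: x_0 = 0 never occurred in this class
```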
- Posterior probability: after a long and tedious derivation with the shared covariance matrix, the posterior takes the form $P(C_1|x) = \sigma(w \cdot x + b)$, where $\sigma(z) = 1/(1+e^{-z})$ is the sigmoid function. To obtain w and b, however, the generative model estimates N1, N2, μ1, μ2, and Σ, which seems roundabout. Why not look for w and b directly from the start? We will explore this question in the next chapter on logistic regression.
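The derivation is not reproduced in the source. For two Gaussians sharing Σ, it is the standard result that P(C1|x) = σ(w·x + b), with w = Σ⁻¹(μ1 - μ2) and b = -½ μ1ᵀΣ⁻¹μ1 + ½ μ2ᵀΣ⁻¹μ2 + ln(N1/N2). This sketch checks numerically that the sigmoid form matches Bayes' rule with Gaussian likelihoods. All parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_params(mu1, mu2, sigma, n1, n2):
    """w and b such that P(C1|x) = sigmoid(w.x + b) when both classes share sigma."""
    inv = np.linalg.inv(sigma)
    w = inv @ (mu1 - mu2)
    b = -0.5 * mu1 @ inv @ mu1 + 0.5 * mu2 @ inv @ mu2 + np.log(n1 / n2)
    return w, b

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density f_{mu,Sigma}(x)."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

# Hypothetical parameters
mu1, mu2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
n1, n2 = 30, 20
x = np.array([0.4, -0.7])

w, b = linear_params(mu1, mu2, sigma, n1, n2)
via_sigmoid = sigmoid(w @ x + b)
p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)
num = gaussian_pdf(x, mu1, sigma) * p1
via_bayes = num / (num + gaussian_pdf(x, mu2, sigma) * p2)
print(via_sigmoid, via_bayes)  # the two values agree
```

The quadratic terms xᵀΣ⁻¹x cancel only because both classes use the same Σ. With separate Σ1 and Σ2, the boundary stays quadratic.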