Machine learning -- naive Bayes
2022-04-23 13:16:00 【DCGJ666】
Advantages:
- The algorithm's logic is simple and easy to implement.
- Classification costs little time and memory, so prediction is fast, and accuracy is often high.
- The naive Bayes model is grounded in classical probability theory, and its classification performance is stable.
- It is not very sensitive to missing data and is often used for text classification.
- It performs well on small datasets, handles multi-class tasks, and is suitable for incremental training.
Disadvantages:
- In theory, the naive Bayes model has the lowest error rate among classification methods. In practice this does not always hold, because the model assumes that attributes are independent of each other given the class. This assumption often fails in real applications, and classification performance degrades when the number of attributes is large or the attributes are strongly correlated.
- The prior probabilities must be known, and they usually come from assumptions or from the existing training data. A poorly chosen prior can lead to wrong classification decisions.
Naive Bayes
Naive Bayes is a classification algorithm based on the assumption of conditional independence between features and on Bayes' theorem. It learns the joint distribution of X and y from the training data; then, for an input X to be predicted, it applies Bayes' formula and outputs the y with the largest posterior probability.
Naive Bayes is a generative learning algorithm: it models the joint distribution of X and Y, under the assumption that the features are independent of each other given y.
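Written out, the prediction rule described here takes the following form. The symbols x_1, ..., x_n for the individual features of X and \hat{y} for the predicted class are notation introduced for illustration. The denominator does not depend on y, so it can be dropped when taking the maximum:

$$
\hat{y} = \arg\max_{y} P(y \mid x_1, \dots, x_n)
        = \arg\max_{y} \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
        = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$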
Bayes' formula
$$P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(A)}$$
In the formula, P(B) is the probability of event B, P(A|B) is the probability of event A given that event B has occurred, and P(B|A) is the probability of event B given that event A has occurred.
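A small numeric sketch may help make the formula concrete. The function name `posterior` and the spam figures below are made up for illustration; P(A) is expanded with the law of total probability over B and not-B.

```python
def posterior(prior_b: float, p_a_given_b: float, p_a_given_not_b: float) -> float:
    """Return P(B | A) from P(B), P(A | B) and P(A | not B)."""
    # P(A) = P(B) P(A | B) + P(not B) P(A | not B)
    p_a = prior_b * p_a_given_b + (1.0 - prior_b) * p_a_given_not_b
    # Bayes' formula: P(B | A) = P(B) P(A | B) / P(A)
    return prior_b * p_a_given_b / p_a


# Hypothetical numbers: 20% of messages are spam (event B); the word "free"
# (event A) appears in 60% of spam and in 5% of non-spam messages.
p_spam_given_free = posterior(0.2, 0.6, 0.05)
print(round(p_spam_given_free, 3))
```

With these assumed numbers, seeing the word raises the probability of spam from the prior of 0.2 to a posterior of about 0.75.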
How to understand the "naive" in Naive Bayes
The "naive" in Naive Bayes can be read as "simple" or "naive": the model assumes that all features are equally important, independent of each other, and do not affect one another. In the real world, however, attributes are not always independent.
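One way to see what the independence assumption costs is to feed the model the same feature twice. The sketch below is an illustration with made-up probabilities, and `naive_posterior` is a hypothetical helper: a perfectly correlated copy adds no new information, yet naive Bayes multiplies its likelihood in again and becomes more confident.

```python
def naive_posterior(prior_pos: float, like_pos: float, like_neg: float, copies: int) -> float:
    """Posterior of the positive class when `copies` identical copies of one
    observed feature are (wrongly) treated as independent evidence."""
    score_pos = prior_pos * like_pos ** copies
    score_neg = (1.0 - prior_pos) * like_neg ** copies
    return score_pos / (score_pos + score_neg)


# Assumed values: equal priors, P(x | positive) = 0.8, P(x | negative) = 0.4.
one_copy = naive_posterior(0.5, 0.8, 0.4, copies=1)    # the correct posterior
two_copies = naive_posterior(0.5, 0.8, 0.4, copies=2)  # duplicate counted twice
print(round(one_copy, 3), round(two_copies, 3))
```

Under these assumptions the posterior moves from about 0.667 to 0.8 even though nothing new was observed, which is the kind of distortion correlated attributes introduce.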
What is Laplace smoothing
Laplace smoothing is the correction Naive Bayes uses to deal with the zero-probability problem. During classification, an attribute value may never appear together with some class in the training set; computing directly from the naive Bayes classifier's expression then yields a probability of zero. To keep an attribute value unseen in the training set from "erasing" the information carried by the other attributes, the estimates are corrected with the Laplace estimator. Concretely: add 1 to the numerator; for the prior probability, add the number of possible classes in the training set to the denominator; for the conditional probability, add the number of possible values of the i-th attribute to the denominator.
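The correction can be sketched in a few lines of Python. This is a minimal illustration, not a full classifier: the function names, the "outlook" attribute, and the toy data are all assumptions made for the example.

```python
def smoothed_prior(labels, label, n_classes):
    """P(y = label): (count of label + 1) / (N + number of possible classes)."""
    return (labels.count(label) + 1) / (len(labels) + n_classes)


def smoothed_conditional(values, labels, value, label, n_values):
    """P(x_i = value | y = label):
    (joint count + 1) / (count of label + number of possible values of attribute i)."""
    in_class = [v for v, y in zip(values, labels) if y == label]
    return (in_class.count(value) + 1) / (len(in_class) + n_values)


# Toy data (hypothetical): one attribute "outlook" with 3 possible values, 2 classes.
outlook = ["sunny", "sunny", "rainy", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes"]

# "overcast" never occurs with class "no", so the unsmoothed estimate would be 0.
p_overcast_given_no = smoothed_conditional(outlook, play, "overcast", "no", n_values=3)
p_yes = smoothed_prior(play, "yes", n_classes=2)
print(p_overcast_given_no, p_yes)
```

Here the unseen combination gets probability (0 + 1) / (2 + 3) = 0.2 instead of 0, so it can no longer zero out the product of the other attributes' probabilities.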
Applications of naive Bayes
Naive Bayes is most widely used for document classification, spam filtering, sentiment analysis, recommendation systems, spelling correction, and similar tasks.
Naive Bayes is not sensitive to outliers
Naive Bayes is not sensitive to outliers. During data preprocessing we can therefore leave outliers in: keeping them preserves the overall accuracy of the naive Bayes algorithm, whereas removing them may reduce the model's generalization ability, because the information they carried is missing at prediction time.
Prior probability and posterior probability
Prior probability: the probability of an event taken directly, before any other information is observed.
Posterior probability: the probability of an event given that something else is known to have happened.
Copyright notice
This article was written by [DCGJ666]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/04/202204230611343284.html