Machine learning -- naive Bayes
2022-04-23 13:16:00 【DCGJ666】
Advantages:
- The algorithm's logic is simple and easy to implement
- Classification has low time and space cost; it is fast and achieves high classification accuracy
- The naive Bayes model is grounded in classical probability theory and has stable classification performance
- It is relatively insensitive to missing data and is often used for text classification
- It performs well on small datasets, handles multi-class tasks, and supports incremental training
Disadvantages:
- In theory, when its assumptions hold, the naive Bayes model has the smallest error rate compared with other classification methods. In practice this is often not the case, because the model assumes that attributes are mutually independent given the class. This assumption rarely holds in real applications, and classification performance suffers when there are many attributes or when the attributes are strongly correlated.
- It requires the prior probabilities, which are often based on assumptions or estimated from the existing training data. In some cases, a poorly assumed prior can lead to wrong classification decisions.
Naive Bayes
Naive Bayes is a classification algorithm based on Bayes' theorem and the assumption that features are conditionally independent. It learns the joint distribution of X and y from the training data; then, for an input X to be predicted, it applies Bayes' formula and outputs the y with the largest posterior probability.
Naive Bayes is a generative learning algorithm: it models the joint distribution of X and Y, assuming that the features are mutually independent given y.
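The training and prediction steps described above can be sketched for categorical features as follows. This is a minimal illustration, not the author's code: the function names `fit`/`predict` and the toy weather data are hypothetical, and no smoothing is applied.

```python
from collections import Counter

def fit(X, y):
    """Estimate P(y) and the counts behind P(x_j = v | y), with no smoothing."""
    n = len(y)
    class_counts = Counter(y)
    priors = {c: k / n for c, k in class_counts.items()}
    # cond[(j, v, c)]: number of class-c samples whose feature j equals v
    cond = Counter()
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            cond[(j, v, c)] += 1
    return priors, cond, class_counts

def predict(model, x):
    """Return the class with the largest unnormalized posterior P(y) * prod_j P(x_j | y)."""
    priors, cond, class_counts = model
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            score *= cond[(j, v, c)] / class_counts[c]
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
     ("rainy", "cool"), ("sunny", "cool")]
y = ["no", "no", "yes", "yes", "yes"]
model = fit(X, y)
print(predict(model, ("rainy", "mild")))  # yes
```

Comparing unnormalized scores is enough because the denominator P(X) is the same for every class. Since there is no smoothing, a single (feature value, class) pair never seen in training drives that class's score to zero; this is the zero-probability problem that Laplace smoothing is meant to fix.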
Bayes' formula
$$P(B \mid A)=\frac{P(B)\,P(A \mid B)}{P(A)}$$
In the formula, P(B) is the probability of event B, P(A) is the probability of event A, P(A|B) is the probability of A given that B has occurred, and P(B|A) is the probability of B given that A has occurred.
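A small worked example of the formula, as a sketch: here P(A) is expanded with the law of total probability, P(A) = P(A|B)P(B) + P(A|not B)P(not B). The screening-test numbers below are purely hypothetical.

```python
def posterior(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(B | A) via Bayes' formula.

    P(A) is computed with the law of total probability:
    P(A) = P(A | B) P(B) + P(A | not B) P(not B).
    """
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_b * p_a_given_b / p_a

# Hypothetical screening test: 1% prevalence (prior P(B)),
# 90% sensitivity P(A | B), 5% false-positive rate P(A | not B)
print(round(posterior(0.01, 0.90, 0.05), 4))  # 0.1538
```

Even after a positive result A, B remains fairly unlikely because its prior P(B) is small, which illustrates how strongly the prior shapes the posterior.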
How to understand the "naive" in naive Bayes
The "naive" in naive Bayes can be read as "simple" or "naive": the model assumes that all features are equally important and mutually independent, with no influence on one another. In the real world, however, attributes are not always independent of each other.
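Written out, the independence assumption lets the class-conditional joint probability of a feature vector factor into per-feature terms, and the classifier keeps only the numerator of Bayes' formula, since P(x) is the same for every class. The notation x = (x_1, ..., x_n) for the features and y for the class is ours, not the original post's:

$$P(x_1,\dots,x_n \mid y)=\prod_{i=1}^{n} P(x_i \mid y), \qquad \hat{y}=\arg\max_{y}\; P(y)\prod_{i=1}^{n} P(x_i \mid y)$$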
What is Laplace smoothing
Laplace smoothing is a way for naive Bayes to correct the zero-probability problem. During classification, some attribute value may never co-occur with a given class in the training set; computing directly with the naive Bayes classifier's expression then yields a probability of zero. To prevent an attribute value that never appeared in the training set from "erasing" the information carried by the other attributes, the estimates are corrected with the Laplace estimator: add 1 to the numerator; for the prior probability, add the number of possible classes in the training set to the denominator; for the conditional probability, add the number of possible values of the i-th attribute to the denominator.
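The correction rule above can be sketched directly on counts. The function names and the toy outlook/play columns are hypothetical, and as an assumption the number of possible values of an attribute is taken to be the number of distinct values seen in the training column.

```python
from collections import Counter

def laplace_prior(labels, c):
    """P(c) = (|D_c| + 1) / (|D| + N), where N is the number of classes."""
    counts = Counter(labels)
    return (counts[c] + 1) / (len(labels) + len(counts))

def laplace_conditional(column, labels, value, c):
    """P(x_i = value | c) = (|D_{c,value}| + 1) / (|D_c| + N_i).

    Assumption: N_i is the number of distinct values of attribute i
    observed in the training column.
    """
    n_c = sum(1 for y in labels if y == c)
    n_c_value = sum(1 for x, y in zip(column, labels) if y == c and x == value)
    n_values = len(set(column))
    return (n_c_value + 1) / (n_c + n_values)

outlook = ["sunny", "sunny", "rainy", "rainy", "sunny"]
play = ["no", "no", "yes", "yes", "yes"]
print(laplace_conditional(outlook, play, "rainy", "no"))  # 0.25 instead of 0.0
print(laplace_prior(play, "yes"))  # 4/7, about 0.571
```

The smoothed estimates still sum to 1 over classes and over attribute values, so they remain valid probability distributions.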
Applications of naive Bayes
The most common applications of naive Bayes include document classification, spam filtering, sentiment analysis, recommendation systems, and spelling correction.
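To illustrate the text-classification use case (spam filtering), here is a minimal multinomial naive Bayes sketch over word counts. The four-document corpus and the `train`/`classify` names are hypothetical; the multinomial event model, add-one smoothing and log-space scoring are common choices, not something the original post specifies.

```python
import math
from collections import Counter

def train(docs, labels):
    """Multinomial naive Bayes on bag-of-words counts, with add-one smoothing."""
    vocab = {w for d in docs for w in d.split()}
    log_prior, log_lik = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.split())
        total = sum(counts.values())
        # Add-one smoothing so a word unseen in class c does not zero it out
        log_lik[c] = {w: math.log((counts[w] + 1) / (total + len(vocab)))
                      for w in vocab}
    return log_prior, log_lik

def classify(model, doc):
    log_prior, log_lik = model
    # Summing logs instead of multiplying probabilities avoids underflow;
    # words outside the training vocabulary are skipped.
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc.split()
                                    if w in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)

docs = ["win money now", "cheap money offer",
        "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train(docs, labels)
print(classify(model, "money offer now"))  # spam
```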
Naive Bayes is not sensitive to outliers
Naive Bayes is insensitive to outliers. Therefore, during data preprocessing we need not remove outliers: keeping them helps preserve the overall accuracy of the naive Bayes algorithm, whereas removing them may discard information needed at prediction time and reduce the model's generalization ability.
Prior probability and posterior probability
Prior probability: the probability of an event itself, assessed directly without conditioning on any observed evidence.
Posterior probability: the probability of an event given that some other event is already known to have occurred.
Copyright notice
This article was written by [DCGJ666]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230611343284.html