
Machine learning practice - naive Bayes

2022-04-23 18:34:00 Xuanche_


Naive Bayes

I. Overview

The Bayesian classification algorithm is a statistical, probability-based classification method, and naive Bayes is the simplest Bayesian classifier. Its principle is to use Bayes' formula to compute the posterior probability of each class from the prior probability and the observed features, and then assign the sample to the class with the largest posterior probability. It is called "naive" because it makes the most primitive, simplest assumption: all features are statistically independent of one another.

Suppose a sample $X$ has attributes $a_1, a_2, a_3, \ldots, a_n$. Then

$$P(X) = P(a_1, a_2, \ldots, a_n) = P(a_1)\,P(a_2)\cdots P(a_n)$$

If this formula holds, the features are statistically independent.
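To make the decision rule concrete, here is a minimal Python sketch of how a naive Bayes classifier scores each class as prior × product of per-feature likelihoods and picks the maximum. It is only an illustration: the class priors and likelihood tables below are made-up toy values, not learned from data.

# A minimal sketch of the naive Bayes decision rule (illustrative toy values).
# Score each class c as P(c) * prod_i P(a_i | c) and return the argmax.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"buy": 0.3, "hello": 0.1},
    "ham": {"buy": 0.05, "hello": 0.2},
}

def predict(features):
    scores = {}
    for c, prior in priors.items():
        score = prior
        for f in features:
            # Use a tiny probability for features never seen with this class
            score *= likelihoods[c].get(f, 1e-6)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict(["buy", "hello"]))  # 0.4*0.3*0.1 = 0.012 > 0.6*0.05*0.2 = 0.006, so "spam"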

1. Conditional probability formula

Conditional probability is the probability that event A occurs given that event B has occurred, written P(A|B).

[Figure: Venn diagram of events A and B]

As the Venn diagram shows, given that event B has occurred, the probability that event A occurs is P(A∩B) divided by P(B):

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(A \mid B)\,P(B) = P(A \cap B)$$

Likewise: $P(B \mid A)\,P(A) = P(A \cap B)$

Therefore $P(B \mid A)\,P(A) = P(A \mid B)\,P(B)$, which gives

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Next, consider the law of total probability. If the events $A_1, A_2, \ldots, A_n$ form a complete partition of the sample space, each with positive probability, then for any event $B$:

$$P(B) = P(BA_1) + P(BA_2) + \cdots + P(BA_n)$$

$$P(B) = \sum_{i=1}^{n} P(A_i)\,P(B \mid A_i)$$
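As a quick numeric illustration (not from the original text): if $A_1$ and $A_2$ partition the sample space with $P(A_1)=0.4$, $P(A_2)=0.6$, $P(B \mid A_1)=0.5$ and $P(B \mid A_2)=0.2$, then $P(B)=0.4\times 0.5+0.6\times 0.2=0.32$.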


2. Bayesian inference

Combining the conditional probability formula with the law of total probability gives Bayes' formula:

$$P(A \mid B) = P(A)\,\frac{P(B \mid A)}{P(B)}$$

$$P(A_i \mid B) = P(A_i)\,\frac{P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}$$

P(A) is called the prior probability: our assessment of the probability of event A before event B is observed.

P(A|B) is called the posterior probability: our reassessment of the probability of event A after event B has been observed.
P(B|A)/P(B) is called the likelihood function (likelihood); it is an adjustment factor that brings the estimated probability closer to the true probability.

Bayes' formula can therefore be read as: posterior probability = prior probability × adjustment factor.
If the likelihood function > 1, the prior probability is strengthened and event A becomes more likely.
If the likelihood function = 1, event B gives no help in judging how likely event A is.
If the likelihood function < 1, the prior probability is weakened and event A becomes less likely.


II. Types of naive Bayes

scikit-learn provides three naive Bayes classification algorithms: GaussianNB, MultinomialNB, and BernoulliNB.

1. GaussianNB

GaussianNB is naive Bayes with a **Gaussian (normal) distribution** prior: it assumes that, within each class, each feature follows a simple normal distribution.

$$P(X_j = x_j \mid Y = C_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}}\exp\!\left(-\frac{(x_j - \mu_k)^2}{2\sigma_k^2}\right)$$

where $C_k$ is the $k$-th class of $Y$, and $\mu_k$ and $\sigma_k^2$ are values that need to be estimated from the training set.
Here is a simple GaussianNB implementation using scikit-learn.

# Import packages
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Import dataset 
from sklearn import datasets
iris=datasets.load_iris()
# Split the dataset into training and test sets
Xtrain, Xtest, ytrain, ytest = train_test_split(iris.data,
                                                iris.target, 
                                                random_state=12)
# Build and fit the model
clf = GaussianNB()
clf.fit(Xtrain, ytrain)

# Predict on the test set; predict_proba gives the probability that each sample belongs to each class
clf.predict(Xtest)
clf.predict_proba(Xtest)

# Test accuracy 
accuracy_score(ytest, clf.predict(Xtest))

2. MultinomialNB

MultinomialNB is naive Bayes with a multinomial prior: it assumes the features are generated by a simple multinomial distribution. The multinomial distribution describes the probability of observing each of several kinds of outcomes, so multinomial naive Bayes is well suited to features that record occurrence counts or occurrence proportions.
This model is often used in text classification, where each feature is a count, for example the number of times a word appears.
The multinomial formula is as follows:

$$P(X_j = x_{jl} \mid Y = C_k) = \frac{x_{jl} + \xi}{m_k + n\xi}$$

where $P(X_j = x_{jl} \mid Y = C_k)$ is the conditional probability that the $j$-th feature of the $k$-th class takes its $l$-th value, and $m_k$ is the number of training samples belonging to class $k$. $\xi$ is a constant greater than 0, often taken to be 1, which gives Laplace smoothing; other values can also be used.
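Analogous to the GaussianNB example above, here is a minimal scikit-learn sketch of MultinomialNB on word-count features; the tiny corpus and its labels are made-up illustrative data, not from the original article.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus with labels: 1 = spam, 0 = normal (made-up illustrative data)
texts = ["win money now", "cheap money win", "meeting at noon", "lunch at noon"]
labels = [1, 1, 0, 0]

# Turn each document into a vector of word counts
vec = CountVectorizer()
X = vec.fit_transform(texts)

# alpha plays the role of the smoothing constant above; alpha=1 is Laplace smoothing
clf = MultinomialNB(alpha=1.0)
clf.fit(X, labels)

print(clf.predict(vec.transform(["win cheap money"])))  # expected: [1]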


3. BernoulliNB

BernoulliNB is naive Bayes with a Bernoulli prior: it assumes the distribution of each feature is a binary Bernoulli distribution, as follows:
$$P(X_j = x_{jl} \mid Y = C_k) = P(j \mid Y = C_k)\,x_{jl} + \bigl(1 - P(j \mid Y = C_k)\bigr)(1 - x_{jl})$$
Here each feature takes only two values, and $x_{jl}$ can only be 0 or 1.
In the Bernoulli model the value of each feature is Boolean, i.e. true or false, or 1 or 0.

In text classification, this corresponds to whether or not a word appears in a document.
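A minimal scikit-learn sketch of BernoulliNB on word-presence features; the 0/1 matrix and labels are made-up illustrative data, not from the original article.

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Each column marks whether a particular word appears (1) or not (0) in a document
# (made-up illustrative data); labels: 1 = spam, 0 = normal
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])

# binarize=0.0 (the default) would threshold real-valued features; these are already 0/1
clf = BernoulliNB(alpha=1.0)
clf.fit(X, y)

print(clf.predict([[1, 1, 0]]))  # expected: [1]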


Summary

  • Generally speaking, if most of the sample features are continuously distributed, GaussianNB works better.
  • If most of the sample features are multi-valued discrete features, MultinomialNB is more appropriate.
  • If the sample features are binary discrete values, or very sparse multi-valued discrete values, BernoulliNB should be used.

Copyright notice
This article was written by [Xuanche_]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231824246309.html