
Data mining -- naive Bayesian classification

2022-04-22 05:44:00 Muxi dare

Data Mining (National University of Defense Technology)
Data Mining (Qingdao University)
Python: Bayesian classification
Bayesian classification is based on Bayes' theorem and is one of the core methods of machine learning.
Four commonly used kinds of Bayesian classifiers are:

  • Naive Bayes classifier (Naive Bayes Classifier, or NBC)
  • TAN (tree-augmented naive Bayes)
  • BAN (Bayesian-network-augmented naive Bayes)
  • GBN (general Bayesian network)

Naive Bayes classification in data mining

• The naive Bayes classifier has a solid mathematical foundation and stable classification efficiency. It also requires only a few parameters to be estimated, is not very sensitive to missing data, and uses a relatively simple algorithm.

Bayes' theorem

The ultimate goal is to compute P(category | features).
The "naive" part of naive Bayes is the assumption that the features are conditionally independent of each other given the class.
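Written out, with class y and features x = (x_1, ..., x_n), Bayes' theorem gives the posterior, the naive independence assumption factorizes the likelihood into per-feature terms, and because P(x) is the same for every class, the predicted class is simply the one that maximizes the numerator:

```latex
\begin{aligned}
P(y \mid x) &= \frac{P(y)\,P(x \mid y)}{P(x)} \\
P(x \mid y) &= \prod_{i=1}^{n} P(x_i \mid y) \quad \text{(naive independence assumption)} \\
\hat{y} &= \arg\max_{y}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{aligned}
```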

Working process

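The working process can be sketched from scratch for categorical features: count class frequencies to get the priors P(y), count feature values per class to get the likelihoods P(x_i | y), then predict the class with the largest product. This is a minimal sketch under stated assumptions: the function names `train_nb` and `predict_nb`, the Laplace smoothing constant `alpha` (added here to avoid zero probabilities), and the toy weather-style data are all hypothetical, not the author's code.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Estimate class priors P(y) and smoothed likelihoods P(x_i = v | y)."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}

    # counts[c][i][v]: number of class-c rows whose feature i has value v
    counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each feature i
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[c][i][v] += 1
            values[i].add(v)

    def likelihood(c, i, v):
        # Laplace-smoothed estimate of P(x_i = v | y = c); alpha is an assumption
        return (counts[c][i][v] + alpha) / (class_counts[c] + alpha * len(values[i]))

    return priors, likelihood


def predict_nb(model, row):
    """Return the class maximizing P(y) * prod_i P(x_i | y)."""
    priors, likelihood = model
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(row):
            score *= likelihood(c, i, v)  # independence: multiply per-feature terms
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: (outlook, temperature) -> label
rows = [("sunny", "hot"), ("sunny", "hot"), ("sunny", "mild"),
        ("rainy", "mild"), ("rainy", "cool"), ("rainy", "cool")]
labels = ["no", "no", "no", "yes", "yes", "yes"]

model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "hot")))   # → no
print(predict_nb(model, ("rainy", "cool")))  # → yes
```

Multiplying many small probabilities can underflow when there are many features. Practical implementations, scikit-learn's included, add log-probabilities instead, which gives the same argmax.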

Advantages and disadvantages

  • Advantages:
    (1) The logic is simple and easy to implement, and the algorithm's time and space overhead during classification is relatively small;
    (2) The algorithm is relatively stable and robust.
  • Disadvantage: it assumes that the attributes are conditionally independent. In many practical problems this assumption does not hold, and if the correlation between attributes is ignored, classification performance drops.
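A small, hypothetical experiment shows the effect of violating the independence assumption. Duplicating a feature creates two perfectly correlated columns. Naive Bayes then multiplies their likelihoods as though they were independent evidence, so the same information is counted twice and the predicted probability becomes overconfident. The toy data below is made up for illustration. It uses scikit-learn's `GaussianNB` and its `predict_proba` method.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy 1-D data (illustrative only): class 0 clusters around 2, class 1 around 7
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([[4.0]])

# Model on the single original feature
p_single = GaussianNB().fit(X, y).predict_proba(x_new)[0, 0]

# Model on the same feature duplicated: two perfectly correlated columns
X_dup = np.hstack([X, X])
p_dup = GaussianNB().fit(X_dup, y).predict_proba(np.hstack([x_new, x_new]))[0, 0]

print(f"P(class 0 | x) with one copy:   {p_single:.3f}")
print(f"P(class 0 | x) with two copies: {p_dup:.3f}")
```

The duplicated column contributes no new information, yet the posterior for class 0 rises from about 0.977 to about 0.999, because the log-likelihood ratio is counted twice.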

Supplement

[Figure omitted: supplementary derivation. Source: Whiteboard Derivation notes]

Python implementation

sklearn.naive_bayes: the naive Bayes module

Depending on the distribution assumed for the feature data (the class-conditional likelihood P(x_i | y)), the scikit-learn library provides five different naive Bayes classifiers:

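  1. Bernoulli naive Bayes (BernoulliNB): binary (0/1) features
  2. Categorical naive Bayes (CategoricalNB): categorical features
  3. Gaussian naive Bayes (GaussianNB): continuous features, assumed normally distributed within each class
  4. Multinomial naive Bayes (MultinomialNB): count features such as word counts
  5. Complement naive Bayes (ComplementNB): a variant of multinomial naive Bayes suited to imbalanced data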
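The sketch below shows what kind of feature data each classifier expects. It fits all five classifiers on tiny, made-up data sets: continuous values for `GaussianNB`, non-negative counts for `MultinomialNB` and `ComplementNB`, binary values for `BernoulliNB`, and integer-coded categories for `CategoricalNB`. The data are hypothetical and only demonstrate the shared `fit`/`predict` API. Predicting on the training rows is done purely to show the call.

```python
import numpy as np
from sklearn.naive_bayes import (BernoulliNB, CategoricalNB, ComplementNB,
                                 GaussianNB, MultinomialNB)

y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
# Non-negative counts (e.g. word counts) -> MultinomialNB / ComplementNB
X_count = np.array([[3, 0, 1], [4, 1, 0], [0, 3, 2], [1, 4, 3]])
# Binary presence/absence features -> BernoulliNB
X_bin = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
# Integer-coded categories (0, 1, 2, ...) -> CategoricalNB
X_cat = np.array([[0, 2], [0, 1], [1, 0], [2, 0]])

models = {
    "GaussianNB": (GaussianNB(), X_cont),
    "MultinomialNB": (MultinomialNB(), X_count),
    "ComplementNB": (ComplementNB(), X_count),
    "BernoulliNB": (BernoulliNB(), X_bin),
    "CategoricalNB": (CategoricalNB(), X_cat),
}

predictions = {}
for name, (clf, X) in models.items():
    clf.fit(X, y)
    predictions[name] = clf.predict(X)  # training rows, only to show the call
    print(name, predictions[name])
```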

Example

import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Read the diabetes data: columns 0-7 are features, column 8 is the class label
data_url = "../1.  data mining - National Defense University of science and technology / Data packets /diabetes.csv"
df = pd.read_csv(data_url)

# Rows 0-734 are used for training, rows 735-767 for testing
x = df.iloc[0:735, 0:8]
y = df.iloc[0:735, 8]
clf = GaussianNB().fit(x, y)
dftest = df.iloc[735:768, 0:8]

# Put predicted and true labels side by side (to_numpy drops the 735.. row index)
df1 = pd.DataFrame({'test': clf.predict(dftest),
                    'true': df.iloc[735:768, 8].to_numpy()})

# Accuracy calculation: fraction of rows where the prediction equals the true label
acc = (df1['test'] == df1['true']).mean()
print("Accuracy:", acc)
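
A second example on the iris data, this time wrapped in a function and split with train_test_split:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Function encapsulation: train Gaussian naive Bayes and return the test accuracy
def Bayes_test(df):
    X = df.iloc[:, 0:4]  # the four measurement columns (index_col=0 already consumed the row index)
    y = df.iloc[:, 4]    # the species label
    # Hold out 20% of the rows for testing; random_state fixes the split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    print(y_test)
    # Fit Gaussian naive Bayes on the training split
    clf = GaussianNB()
    clf.fit(X_train, y_train)
    # Evaluate: fraction of test rows predicted correctly
    y_pred = clf.predict(X_test)
    print(y_pred)
    acc = np.sum(y_test == y_pred) / X_test.shape[0]
    return acc

# Read the data (the first CSV column is a row index) and compute the accuracy
data_url = "../1.  data mining - National Defense University of science and technology / Data packets /iris.csv"
df = pd.read_csv(data_url, index_col=0)
acc = Bayes_test(df)
print("Accuracy:", acc)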
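The CSV paths above point to the author's local files. The same steps can be run anywhere with scikit-learn's bundled copy of the iris data (`sklearn.datasets.load_iris`), using `sklearn.metrics.accuracy_score` in place of the manual accuracy calculation. This substitution is part of the sketch, not the original author's code. The sketch also prints `predict_proba`, which returns the posterior P(category | features) that the model maximizes.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Bundled iris data: 150 samples, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)

# Posterior P(class | features) for the first three test rows (each row sums to 1)
proba = clf.predict_proba(X_test[:3])
print(proba.round(3))
```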

Copyright notice
This article was written by [Muxi dare]. Please include a link to the original article when reposting. Thank you.
https://yzsam.com/2022/04/202204220535577182.html