Naïve Bayes Classifier Algorithm
Why is it called Naïve Bayes?
The Naïve Bayes algorithm is comprised of two words Naïve and Bayes, Which can be described as:
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a hypothesis is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
Working of Naïve Bayes' Classifier:
Working of Naïve Bayes' Classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and corresponding target variable "Play". So using this dataset we need to decide that whether we should play or not on a particular day according to the weather conditions. So to solve this problem, we need to follow the below steps:
Problem: If the weather is sunny, then the Player should play or not?
Solution: To solve this, first consider the below dataset:
Frequency table for the Weather Conditions:
Likelihood table weather condition:
P(Sunny|Yes)= 3/10= 0.3
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
So as we can see from the above calculation that P(Yes|Sunny)>P(No|Sunny)
Hence on a Sunny day, Player can play the game.
Advantages of Naïve Bayes Classifier:
Disadvantages of Naïve Bayes Classifier:
Applications of Naïve Bayes Classifier:
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:
Python Implementation of the Naïve Bayes algorithm:
Now we will implement a Naive Bayes Algorithm using Python. So for this, we will use the "user_data" dataset, which we have used in our other classification model. Therefore we can easily compare the Naive Bayes model with the other models.
Steps to implement:
1) Data Pre-processing step:
In this step, we will pre-process/prepare the data so that we can use it efficiently in our code. It is similar as we did in data-pre-processing. The code for this is given below:
In the above code, we have loaded the dataset into our program using "dataset = pd.read_csv('user_data.csv'). The loaded dataset is divided into training and test set, and then we have scaled the feature variable.
The output for the dataset is given as:
2) Fitting Naive Bayes to the Training Set:
After the pre-processing step, now we will fit the Naive Bayes model to the Training set. Below is the code for it:
In the above code, we have used the GaussianNB classifier to fit it to the training dataset. We can also use other classifiers as per our requirement.
Out: GaussianNB(priors=None, var_smoothing=1e-09)
3) Prediction of the test set result:
Now we will predict the test set result. For this, we will create a new predictor variable y_pred, and will use the predict function to make the predictions.
The above output shows the result for prediction vector y_pred and real vector y_test. We can see that some predications are different from the real values, which are the incorrect predictions.
4) Creating Confusion Matrix:
Now we will check the accuracy of the Naive Bayes classifier using the Confusion matrix. Below is the code for it:
As we can see in the above confusion matrix output, there are 7+3= 10 incorrect predictions, and 65+25=90 correct predictions.
5) Visualizing the training set result:
Next we will visualize the training set result using Na´ve Bayes Classifier. Below is the code for it:
In the above output we can see that the Na´ve Bayes classifier has segregated the data points with the fine boundary. It is Gaussian curve as we have used GaussianNB classifier in our code.
6) Visualizing the Test set result:
The above output is final output for test set data. As we can see the classifier has created a Gaussian curve to divide the "purchased" and "not purchased" variables. There are some wrong predictions which we have calculated in Confusion matrix. But still it is pretty good classifier.