Naive Bayes Classifier — Mathematical intuition
What is the Naive Bayes algorithm?
It is a classification technique based on Bayes' Theorem, with the assumption that the presence of a particular feature in a class is unrelated to the presence of any other feature.
For example, a fruit may be classified as a banana if it is yellow and long, and a round, red fruit may be classified as an apple. Here, the color and shape of the fruit are treated as independent features.
The Naive Bayes classifier is built upon Bayes' Theorem, which in turn relies on conditional probability.
1. Conditional Probability
Conditional probability is defined as the likelihood of an event or outcome occurring, given that a previous event or outcome has occurred. It is calculated by dividing the probability that both events occur by the probability of the conditioning event:
P(A|B) = P(A∩B) / P(B)
This equation is read as: the probability of event A given that event B has occurred equals the probability that both events A and B have occurred, divided by the probability that event B has occurred.
To better understand conditional probability, consider an example: in a family with two children, what is the probability that both children are girls, given that at least one child is a girl?
The sample space ‘S’ will be {BG, GB, GG, BB} where G means girl and B means boy.
Event ‘A’ -> Both the children are girls ; Event ‘B’ -> At least one child is a girl
P(A∩B) = 1/4 ; P(B) = 3/4
∴ P(A|B) = P(A∩B) / P(B) = (1/4) / (3/4) = 1/3
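The same calculation can be reproduced in a few lines of code. Below is a minimal Python sketch that enumerates the sample space and applies the conditional probability formula directly; the event definitions mirror the example above.

```python
from fractions import Fraction

# Sample space for a family with two children, listed in birth order.
sample_space = ["BB", "BG", "GB", "GG"]

def prob(event):
    """Probability of an event: a subset of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

A = {s for s in sample_space if s == "GG"}   # both children are girls
B = {s for s in sample_space if "G" in s}    # at least one child is a girl

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
print(prob(A & B) / prob(B))  # 1/3
```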
2. Bayes' Theorem
Bayes' Theorem can mathematically be stated as:
P(y|x) = P(x|y) · P(y) / P(x)
where y is the class of the dependent variable and x is an independent feature.
For multiple independent features x1, x2, …, xn, this formula can be rewritten as:
P(y|x1, x2, …, xn) = [P(x1|y) · P(x2|y) · … · P(xn|y) · P(y)] / [P(x1) · P(x2) · … · P(xn)]
Since the denominator is constant for any given input, we can ignore it for mathematical simplicity and compare the classes using the numerator alone:
P(y|x1, x2, …, xn) ∝ P(y) · P(x1|y) · P(x2|y) · … · P(xn|y)
The predicted class is the y that maximizes this product.
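To make this decision rule concrete, here is a small sketch that scores each class by P(y) · ∏ P(xi|y) and picks the largest score. The priors and likelihood values are made-up numbers purely for illustration, not from any real dataset.

```python
# Made-up priors and per-feature likelihoods, purely for illustration.
priors = {"yes": 0.5, "no": 0.5}
likelihoods = {
    "yes": {"x1": 0.6, "x2": 0.3},
    "no":  {"x1": 0.2, "x2": 0.4},
}

def score(label, features):
    """Unnormalized posterior: P(y) multiplied by each P(x_i | y)."""
    s = priors[label]
    for x in features:
        s *= likelihoods[label][x]
    return s

features = ["x1", "x2"]
prediction = max(priors, key=lambda y: score(y, features))
print(prediction)  # "yes": 0.5*0.6*0.3 = 0.09 beats 0.5*0.2*0.4 = 0.04
```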
3. Working of Naive Bayes algorithm with an example
Let us take a dataset for predicting whether an animal can be kept as a pet, with three features (animal, size, color), where our test data = (Cow, Medium, Black)
Steps to be followed:
1. Calculate the prior and conditional probabilities
2. Apply Naive Bayes algorithm
3. Normalize the result
Step 1: The assumption to be made is that all the features are independent. From the training data, we precompute the prior probabilities P(yes) and P(no), as well as the conditional probability of each feature value given the class, e.g. P(Cow|yes), P(Medium|yes), P(Black|yes), and their counterparts for the "no" class.
Step 2: Now, applying the Naive Bayes formula, the probability that the animal is a pet is:
P(yes|test) ∝ P(Cow|yes) · P(Medium|yes) · P(Black|yes) · P(yes)
and similarly:
P(no|test) ∝ P(Cow|no) · P(Medium|no) · P(Black|no) · P(no)
Step 3: Since P(yes|test) + P(no|test) must equal 1, we normalize the above results by dividing each score by their sum.
Result: Since P(yes|test) > P(no|test), we predict that the test animal can be kept as a pet.
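Since the original training table is not reproduced above, the end-to-end sketch below uses a small made-up pet-animal dataset with the same (animal, size, color) layout; the rows, and therefore the resulting numbers, are illustrative assumptions rather than the article's actual data. It walks through the same three steps: estimate the probabilities from counts, apply the Naive Bayes product, and normalize.

```python
from collections import Counter, defaultdict

# Hypothetical training rows in the same (animal, size, color) layout.
# These rows are made up for illustration, not the article's original table.
data = [
    (("Dog", "Medium", "Black"), "yes"),
    (("Cat", "Small",  "Black"), "yes"),
    (("Cow", "Medium", "White"), "yes"),
    (("Cow", "Small",  "Black"), "yes"),
    (("Cow", "Large",  "Brown"), "no"),
    (("Dog", "Large",  "White"), "no"),
    (("Cat", "Medium", "Brown"), "no"),
    (("Cow", "Large",  "Black"), "no"),
]

# Step 1: precompute the priors P(y) and conditionals P(x_i | y) from counts.
class_counts = Counter(label for _, label in data)
value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
for features, label in data:
    for i, value in enumerate(features):
        value_counts[(i, label)][value] += 1

def prior(label):
    return class_counts[label] / len(data)

def conditional(i, value, label):
    return value_counts[(i, label)][value] / class_counts[label]

# Step 2: apply Naive Bayes -- score(y) = P(y) * product of P(x_i | y).
def score(features, label):
    s = prior(label)
    for i, value in enumerate(features):
        s *= conditional(i, value, label)
    return s

test = ("Cow", "Medium", "Black")
scores = {label: score(test, label) for label in class_counts}

# Step 3: normalize so that P(yes|test) + P(no|test) = 1.
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
print(posteriors)                                # e.g. {'yes': ~0.86, 'no': ~0.14}
print("Prediction:", max(posteriors, key=posteriors.get))
```

In practice, one would also add Laplace smoothing to the counts so that a feature value never seen with a given class does not zero out the entire product.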