MachineX: One more step towards NAIVE BAYES


I hope our previous blog gave you a good understanding of conditional probability and Bayes' theorem. Now let's use that understanding to find out more about the Naive Bayes classifier.

NAIVE BAYES CLASSIFIER

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set.

The Naive Bayes classifier is a straightforward and powerful algorithm for the classification task. Even if we are working on a data set with millions of records and several attributes, it is worth trying the Naive Bayes approach.

The Naive Bayes classifier gives great results when we use it for textual data analysis, such as in Natural Language Processing.

Naive Bayes is a kind of classifier that uses Bayes' theorem. It predicts membership probabilities for each class, i.e. the probability that a given record or data point belongs to a particular class. The class with the highest probability is considered the most likely class. This is also known as the Maximum A Posteriori (MAP) decision rule.

MAP(H) = max( P(H|E) )
       = max( P(E|H) * P(H) / P(E) )
       = max( P(E|H) * P(H) )

P(E) is the evidence probability, and it is used to normalize the result. It stays the same for every class, so removing it does not affect which class wins.
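To see why dropping P(E) is safe, here is a minimal Scala sketch (the hypothesis names and scores are made up purely for illustration): dividing every hypothesis by the same P(E) never changes which one comes out on top.

object MapRuleSketch extends App {
  // Hypothetical unnormalized scores P(E|H) * P(H) for three made-up hypotheses
  val scores = Map("H1" -> 0.30, "H2" -> 0.12, "H3" -> 0.06)

  val pE = scores.values.sum                       // P(E) = sum over all hypotheses
  val posteriors = scores.map { case (h, s) => h -> s / pE }

  println(scores.maxBy(_._2)._1)                   // H1 wins without normalizing
  println(posteriors.maxBy(_._2)._1)               // H1 still wins after normalizing
}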

The Naive Bayes classifier assumes that all the features are unrelated to each other: the presence or absence of one feature does not influence the presence or absence of any other feature.

For example:
“A fruit may be considered to be an apple if it is red, round, and about 4″ in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.”

In real data sets, we test a hypothesis given multiple pieces of evidence (features), so the calculations become complicated. To simplify the work, the feature-independence assumption is used to 'uncouple' the pieces of evidence and treat each one as independent.

P(H | Multiple Evidence) = P(E1|H) * P(E2|H) * ... * P(En|H) * P(H) / P(Multiple Evidence)
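As a quick sketch of this 'uncoupled' score, here is some Scala with hypothetical class names, priors and per-feature likelihoods (none of these numbers come from a real data set): each class's score is simply the product of its feature likelihoods times its prior, and the class with the largest score wins.

object NaiveScoreSketch extends App {
  // P(H): hypothetical class priors
  val prior = Map("ClassA" -> 0.6, "ClassB" -> 0.4)

  // P(Ei|H): one hypothetical likelihood per observed feature, per class
  val likelihoods = Map(
    "ClassA" -> List(0.8, 0.3, 0.9),
    "ClassB" -> List(0.2, 0.7, 0.5)
  )

  // Naive Bayes score: P(E1|H) * P(E2|H) * ... * P(En|H) * P(H)
  def score(h: String): Double = likelihoods(h).product * prior(h)

  println("Predicted class: " + prior.keys.maxBy(score))
}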

The best way to understand a theoretical concept is to try it out on an example.

So, let's say we have data on 1,000 pieces of fruit. Each fruit is a Banana, an Orange or some Other fruit, and imagine we know 3 features of each fruit: whether it's long or not, sweet or not and yellow or not, as displayed in the table below:

Fruit      Long    Sweet    Yellow    Total
Banana      400     350       450      500
Orange        0     150       300      300
Other       100     150        50      200
Total       500     650       800     1000

So from the table what do we already know?

  • 50% of the fruits are bananas
  • 30% are oranges
  • 20% are other fruits

Based on our training set we can also say the following:

  • From 500 bananas, 400 (0.8) are Long, 350 (0.7) are Sweet and 450 (0.9) are Yellow
  • Out of 300 oranges, 0 are Long, 150 (0.5) are Sweet and 300 (1) are Yellow
  • From the remaining 200 fruits, 100 (0.5) are Long, 150 (0.75) are Sweet and 50 (0.25) are Yellow

This should provide enough evidence to predict the class of a new fruit as it's introduced.

So let's say we're given the features of a piece of fruit and we need to predict its class. If we're told that the fruit is Long, Sweet and Yellow, we can classify it using the following formula, subbing in the values for each outcome, whether it's a Banana, an Orange or some Other fruit. The one with the highest probability (score) is the winner.

Banana:                                      

P(Banana | Long, Sweet, Yellow) = P(Long|Banana) * P(Sweet|Banana) * P(Yellow|Banana) * P(Banana) / P(Long, Sweet, Yellow)

=  0.8 * 0.7 * 0.9 * 0.5 / P(Long, Sweet, Yellow)

= 0.252 / P(Long, Sweet, Yellow)

Orange:

P(Orange | Long, Sweet, Yellow) = P(Long|Orange) * P(Sweet|Orange) * P(Yellow|Orange) * P(Orange) / P(Long, Sweet, Yellow)

= 0 * 0.5 * 1 * 0.3 / P(Long, Sweet, Yellow)

= 0    (since P(Long|Orange) = 0, the whole product is 0)

Other Fruit:

P(Other | Long, Sweet, Yellow) = P(Long|Other) * P(Sweet|Other) * P(Yellow|Other) * P(Other) / P(Long, Sweet, Yellow)

=  0.5 * 0.75 * 0.25 * 0.2 / P(Long, Sweet, Yellow)

= 0.01875 / P(Long, Sweet, Yellow)

In this case, based on the highest score (0.252), we can assume this Long, Sweet and Yellow fruit is a Banana.
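If you'd like to check this by hand (or by machine), here is a small Scala sketch that reproduces the calculation above using the same priors and likelihoods from the table; only the object and variable names are mine.

object FruitExample extends App {
  // P(H): class priors from the table (500 / 300 / 200 out of 1000)
  val prior = Map("Banana" -> 0.5, "Orange" -> 0.3, "Other" -> 0.2)

  // P(Long|H), P(Sweet|H), P(Yellow|H) for each class
  val likelihoods = Map(
    "Banana" -> List(0.8, 0.7, 0.9),
    "Orange" -> List(0.0, 0.5, 1.0),
    "Other"  -> List(0.5, 0.75, 0.25)
  )

  // Unnormalized posterior for a Long, Sweet, Yellow fruit
  def score(h: String): Double = likelihoods(h).product * prior(h)

  prior.keys.foreach(h => println(f"$h%-6s -> ${score(h)}%.5f"))
  // Banana -> 0.25200, Orange -> 0.00000, Other -> 0.01875
  println("Prediction: " + prior.keys.maxBy(score))   // Banana
}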

Now that we’ve seen a basic example of Naive Bayes in action, you can easily see how it can be applied to Text Classification problems such as spam detection, sentiment analysis, and categorization.

There you have it, a simple explanation of Naive Bayes along with an example. We hope this helps you get your head around this simple but common classifying method.


Written by

Nitin Aggarwal is a software consultant at Knoldus Software INC with more than 1.5 years of experience. Nitin likes to explore new technologies and learn new things every day. He loves watching cricket, Marvel movies, playing the guitar and exploring new places. Nitin is familiar with programming languages such as Java, Scala, C and C++, along with HTML and CSS, technologies like Lagom, Akka, Kafka and Spark, databases like Cassandra, MySQL and PostgreSQL, and graph databases like Titan DB.
