MachineX: Unfolding Mystery Behind NAIVE BAYES CLASSIFIER


In machine learning, Naive Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.

The Naive Bayes classifier technique is based on Bayes’ theorem and is particularly suited to problems where the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.



In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, the classifier treats each of them as contributing independently to the probability that the fruit is an apple, which is why it is known as “naive”.

To understand the Naive Bayes classifier we need to understand Bayes’ theorem, and to understand Bayes’ theorem we need to understand what conditional probability is.

This blog gives you a brief overview of both conditional probability and Bayes’ theorem. Let’s first quickly discuss conditional probability and then move on to Bayes’ theorem.

What is Conditional Probability?

In probability theory, the conditional probability is a measure of the probability of an event given that another event has already occurred.

If the event of interest is A and the event B is assumed to have occurred, “the conditional probability of A given B”, or “the probability of A under the condition B”, is usually written as P(A|B), or sometimes P_B(A).

For example:

Chances of a cough
The probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a cold, then they are much more likely to be coughing. The conditional probability of coughing given that you have a cold might be a much higher 75%.

Marbles in a Bag
2 blue and 3 red marbles are in a bag.

What are the chances of getting a blue marble?

The chance is 2 in 5

But after taking one marble out, the chances change!

So the next time:

  1. if we got a red marble before, then the chance of a blue marble next is 2 in 4
  2. if we got a blue marble before, then the chance of a blue marble next is 1 in 4
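
The marble numbers above can be checked by brute-force counting. The sketch below is only an illustration: it enumerates every ordered pair of draws without replacement and counts the matching outcomes. The helper name `conditional` is my own and not part of any library.

```python
from fractions import Fraction
from itertools import permutations

# The bag from the example: 2 blue and 3 red marbles.
bag = ["blue", "blue", "red", "red", "red"]

# Every ordered way to draw two marbles without replacement (all equally likely).
draws = list(permutations(range(len(bag)), 2))

def conditional(second_colour, first_colour):
    """P(second draw is second_colour | first draw is first_colour), by counting."""
    given = [d for d in draws if bag[d[0]] == first_colour]
    both = [d for d in given if bag[d[1]] == second_colour]
    return Fraction(len(both), len(given))

p_blue_first = Fraction(bag.count("blue"), len(bag))
p_blue_after_red = conditional("blue", "red")
p_blue_after_blue = conditional("blue", "blue")

print(p_blue_first)       # 2/5
print(p_blue_after_red)   # 1/2, that is 2 in 4
print(p_blue_after_blue)  # 1/4
```

Counting within the outcomes where the first marble had the given colour is the same as dividing P(A and B) by P(B), which is the usual definition of conditional probability.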

A few more examples:

  • Drawing a second ace from a deck given that the first card drawn was an ace
  • Finding the probability of having a disease given that you tested positive
  • Finding the probability of liking Harry Potter given that we know the person likes fiction

All these are instances of conditional probability.

Now let us move on to Bayes’ theorem.

What is Bayes’ Theorem?

In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes’ theorem, a person’s age can be used to more accurately assess the probability that they have cancer, compared to the assessment of the probability of cancer made without knowledge of the person’s age.

Bayes’ theorem is named after Rev. Thomas Bayes. It works on conditional probability, which is the probability that something will happen given that something else has already occurred. Using conditional probability, we can calculate the probability of an event from prior knowledge about it.

The formula for Bayes’ theorem

P(H|E) = P(E|H) * P(H)/ P(E)


  • P(H) is the probability of hypothesis H being true. This is known as the prior probability.
  • P(E) is the probability of the evidence (regardless of the hypothesis).
  • P(E|H) is the probability of the evidence given that the hypothesis is true.
  • P(H|E) is the probability of the hypothesis given the evidence. This is known as the posterior probability.
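
As a minimal sketch, the formula translates almost directly into code. The function name `posterior` and the input numbers below are made up purely for illustration; they do not come from any example in this blog.

```python
from fractions import Fraction

def posterior(prior, likelihood, evidence):
    """Return P(H|E) = P(E|H) * P(H) / P(E)."""
    if evidence <= 0:
        raise ValueError("P(E) must be greater than zero")
    return likelihood * prior / evidence

# Hypothetical inputs, chosen only to show the arithmetic:
p_h = Fraction(1, 5)          # P(H), the prior
p_e_given_h = Fraction(1, 2)  # P(E|H)
p_e = Fraction(1, 4)          # P(E)

p_h_given_e = posterior(p_h, p_e_given_h, p_e)
print(p_h_given_e)  # 2/5
```

Using `Fraction` instead of floats keeps the result exact, which makes it easy to compare against a calculation done by hand.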


Suppose there are three bowls, B1, B2 and B3. Bowl B1 has 2 red and 4 blue coins; bowl B2 has 1 red and 2 blue coins; bowl B3 has 5 red and 4 blue coins.
Suppose the probabilities of selecting each bowl are not equal but are:

  • P(B1) = 1/3
  • P(B2) = 1/6
  • P(B3) = 1/2

Now let us compute the probability that a red coin, once drawn, came from bowl B1.

In other words, we are given that a red coin was drawn, and based on this event we need to calculate the probability that it was drawn from bowl B1.

In mathematical terms, we need to find P(B1|R).

And according to Bayes’ theorem

P(B1|R) = P(R|B1) * P(B1) / P(R) 

For that, we need three probabilities:

  • The probability of drawing a red coin, i.e. P(R)
  • The probability of selecting bowl B1, i.e. P(B1), which is already given as 1/3
  • The probability of drawing a red coin from B1, i.e. P(R|B1)

  1. P(R) = P(B1 ⋂ R) + P(B2 ⋂ R) + P(B3 ⋂ R)
    • P(B1 ⋂ R) is the probability of selecting bowl B1 and drawing a red coin
    • P(B2 ⋂ R) is the probability of selecting bowl B2 and drawing a red coin
    • P(B3 ⋂ R) is the probability of selecting bowl B3 and drawing a red coin

    Each term is the probability of selecting the bowl multiplied by the fraction of red coins in that bowl:

    P(R) = P(B1) * P(R|B1) + P(B2) * P(R|B2) + P(B3) * P(R|B3)
    = 1/3 * 2/6 + 1/6 * 1/3 + 1/2 * 5/9
    = 1/9 + 1/18 + 5/18
    = 8/18
    = 4/9

  2. P(R|B1)
    The probability of drawing a red coin given that it is drawn from B1 is 2/6.
  3. P(B1) was given, i.e. 1/3.
By putting all the values in the formula:

P(B1|R) = (2/6 * 1/3) / (4/9)

= (2/18) / (8/18) = 2/8 = 0.25

So we can say that if a red coin was drawn, there is a 25% chance that it was drawn from bowl B1.
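
One way to sanity-check this result is to simulate the experiment many times and look only at the trials where a red coin came out. The sketch below is a rough check, not a proof; the seed and the number of trials are arbitrary choices of mine.

```python
import random

rng = random.Random(42)  # fixed seed so repeated runs give the same estimate

# Bowl contents from the example.
bowls = {
    "B1": ["red"] * 2 + ["blue"] * 4,
    "B2": ["red"] * 1 + ["blue"] * 2,
    "B3": ["red"] * 5 + ["blue"] * 4,
}
names = ["B1", "B2", "B3"]
weights = [2, 1, 3]  # proportional to P(B1) = 1/3, P(B2) = 1/6, P(B3) = 1/2

red_draws = 0
red_from_b1 = 0
for _ in range(200_000):
    bowl = rng.choices(names, weights=weights)[0]  # pick a bowl
    coin = rng.choice(bowls[bowl])                 # pick a coin from that bowl
    if coin == "red":
        red_draws += 1
        if bowl == "B1":
            red_from_b1 += 1

# Among all red draws, the share that came from B1 estimates P(B1|R).
estimate = red_from_b1 / red_draws
print(f"Estimated P(B1|R): {estimate:.3f}")
```

With this many trials the estimate should land close to 0.25, in line with the value calculated from Bayes’ theorem.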

I hope that through this blog we have now understood conditional probability and Bayes’ theorem. In my next blog, we will make use of this knowledge to understand the Naive Bayes classifier algorithm.


This blog is all about building the basic knowledge that will help you understand Naive Bayes classifiers, one of the simplest and most popular machine learning algorithms. It explains what is actually working under the hood of Naive Bayes classifiers and what the “Bayes” in the Naive Bayes classifier algorithm refers to.




Written by 

Nitin Aggarwal is a software consultant at Knoldus Software INC with more than 1.5 years of experience. Nitin likes to explore new technologies and learn new things every day. He loves watching cricket and Marvel movies, playing guitar and exploring new places. Nitin is familiar with programming languages such as Java, Scala, C, C++, HTML and CSS, technologies like Lagom, Akka, Kafka and Spark, databases like Cassandra, MySQL and PostgreSQL, and graph databases like Titan DB.
