In this blog, we are going to learn about the Jaccard Index, one of the evaluation metrics used for classification ML models. But first, let’s see what evaluation metrics are.
Evaluation metrics tell us how well our ML models perform. They quantify how good or bad a model is, i.e., how it is likely to perform on unseen data samples, based on what it has learned from the training set. To evaluate an ML model, we need a test set, which should be different from the training set and whose correct outputs are already known. We feed the test set into the model and compare its outputs with those known outputs. Now that we are clear on what evaluation metrics are, let’s move on to the actual topic of our blog, the Jaccard Index.
The Jaccard Index is one of the simplest ways to measure the accuracy of a classification ML model. Let’s understand it with an example. Suppose we have a labelled test set with the following labels:
y = [0,0,0,0,0,1,1,1,1,1]
And our model has predicted the following labels:
y1 = [1,1,0,0,0,1,1,1,1,1]
The Venn diagram above shows the labels of the test set, the labels predicted by the model, and their intersection and union.
The Jaccard Index is defined as the size of the intersection divided by the size of the union of the two label sets:

$$J(y, y_1) = \frac{|y \cap y_1|}{|y \cup y_1|} = \frac{|y \cap y_1|}{|y| + |y_1| - |y \cap y_1|}$$
For our example, the intersection has size 8, since eight of the ten labels are predicted correctly (the two label lists are compared position by position), and the union has size 10 + 10 - 8 = 12. The Jaccard Index is therefore:

$$J(y, y_1) = \frac{8}{10 + 10 - 8} = \frac{8}{12} \approx 0.67$$

So the accuracy of our model, according to the Jaccard Index, is about 0.67, or 67%.
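To check the arithmetic, here is a minimal Python sketch of the calculation exactly as described in this blog: count the positions where the prediction matches the true label (the intersection), then divide by the size of the union. The function name `jaccard_index` is just an illustrative choice, not part of any library.

```python
def jaccard_index(y_true, y_pred):
    """Jaccard index as described in this post (illustrative helper, not a library API)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("label lists must not be empty")
    # Intersection: positions where the predicted label equals the true label.
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Union: |y_true| + |y_pred| - |intersection|.
    union = len(y_true) + len(y_pred) - intersection
    return intersection / union


y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # true labels from the example
y1 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]  # predicted labels from the example

score = jaccard_index(y, y1)
print(f"Jaccard index: {score:.2f}")  # 8 / 12
```

If you use scikit-learn instead, keep in mind that `sklearn.metrics.jaccard_score` defaults to `average="binary"`, which scores only the positive class and can therefore give a different number for the same labels. As far as I know, `average="micro"` is the setting that corresponds to the pooled counting used here.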
That covers the basics of the Jaccard Index. Hope this blog was helpful to you. Thanks for reading.