Introduction
Supervised machine learning algorithms can be broadly classified into regression and classification algorithms. Regression algorithms predict continuous output values, while classification algorithms predict categorical values.
Classification can be performed on structured or unstructured data. The main goal of a classification problem is to identify the category, or class, that a new observation falls under.
We use the training data set to learn boundary conditions that can be used to separate the target classes. Once the boundary conditions are determined, the next task is to predict the target class of new observations. The whole process is known as classification.
In this blog we will discuss some of the popular classification models, including:
1) Support Vector Classifiers
2) Decision Trees
3) Random Forest Classifiers
There are also various evaluation methods to measure the accuracy of these models. We will discuss these models, the evaluation methods, and a technique for improving these models called hyperparameter tuning in detail.
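As a preview of hyperparameter tuning, here is a minimal sketch using scikit-learn's GridSearchCV on a synthetic data set (the data and the parameter grid are made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic two-class data, just for demonstration
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Try a few candidate values for each hyperparameter via cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)   # the best combination found
print(search.best_score_)    # its mean cross-validated accuracy
```

GridSearchCV exhaustively evaluates every combination in the grid and keeps the one with the best cross-validated score.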
Let us first dive into the classification types: binary classification, multi-class classification, and multi-label classification.
Binary Classification
Binary classification has only two categories. Usually they are boolean values: 1 or 0, True or False, High or Low. Some examples where such a classification could be used are cancer detection and email spam detection, where the labels would be positive or negative for cancer, and spam or not spam for spam detection.
Models that can be used for Binary classification are:
> Logistic Regression
> Support Vector Classifiers
You can also use Decision Trees, Random Forests, and other algorithms, but Logistic Regression and Support Vector Classifiers are most typically used for binary classification.
We are using a breast cancer detection data-set that can be downloaded from here.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the breast cancer data set
data = pd.read_csv("data.csv")
data.head()
Now we will draw a scatter plot of two of the features:
sns.scatterplot(x="radius_mean",y="texture_mean",hue="diagnosis",data=data)

Here you can see the two ‘classes’ – ‘M’ stands for malignant and ‘B’ stands for benign. As you can see, the classes are well divided and are easily differentiable to the naked eye for these two features. However, this will not be true for all pairs of features.
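To see how a model separates these two classes, here is a sketch that trains a support vector classifier on the same two features. It uses scikit-learn's built-in breast cancer data as a stand-in for the downloaded data.csv, so the column names differ slightly:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer(as_frame=True)
X = cancer.data[["mean radius", "mean texture"]]  # the two plotted features
y = cancer.target                                 # 0 = malignant, 1 = benign

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy using only these two features
```

Even with just two features the classes separate reasonably well, as the scatter plot suggests.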
Multi-Class Classification
Multi-class classifiers, or multinomial classifiers, can distinguish between more than two classes.
Examples: classification of types of crops, or classification of types of music. Algorithms such as Random Forests and Naive Bayes can easily build a multi-class classifier model.
import seaborn as sns
penguins = sns.load_dataset("penguins")
penguins.head()
sns.scatterplot(x="bill_length_mm",y="flipper_length_mm",hue="species",data=penguins)



Algorithms such as Random Forests and Naive Bayes handle multiple classes natively. Other algorithms, like Support Vector Classifiers and Logistic Regression, are binary by nature and need a strategy such as one-vs-rest to handle more than two classes.
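A minimal multi-class example with a random forest, using the iris data set (three classes) rather than the penguins data shown above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # y has three classes: 0, 1, 2
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(sorted(set(clf.predict(X_test))))  # the distinct classes predicted
print(clf.score(X_test, y_test))         # accuracy on the held-out set
```

Nothing about the fit/predict interface changes between the binary and multi-class cases; the random forest simply learns one of three labels per observation.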
Multi Label Classification
This type of classification occurs when a single observation carries multiple labels. For example, a single image might contain a car, a truck, and a human. The algorithm must be able to classify each of them separately. Thus it has to be trained on many labels and should report True for car, truck, and human, and False for any other labels it has been trained on.
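A minimal multi-label sketch, where each sample can carry several labels at once. It uses a synthetic multi-label data set; the interpretation of the three label columns as car/truck/human is just an illustration:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# y is an (n_samples, 3) binary matrix, e.g. columns for car / truck / human
X, y = make_multilabel_classification(n_samples=200, n_classes=3,
                                      random_state=0)

# Fit one binary classifier per label column
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:2]))  # one 0/1 flag per label for each sample
```

Internally this reduces the multi-label problem to several independent binary problems, one per label.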
Classification algorithms can be further divided into two main categories:
–> Linear models: Logistic Regression, Support Vector Machines
–> Non-linear models: K-Nearest Neighbours, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
For this section of the blog you can access the code at this link: https://github.com/mohana-sai/presentationcodes
Classifier Models:
We need to prepare the data for training the algorithm. The first step is to pre-process and clean the data. The cleaning we need for this data set is to change the string names of the flowers to integer values so the algorithm can classify them properly. We also need to drop the observations having "NaN" values.
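The two cleaning steps described above can be sketched as follows; the tiny DataFrame and its column names are made up for illustration:

```python
import pandas as pd

# A toy data set with a missing value and string class names
df = pd.DataFrame({
    "species": ["setosa", "versicolor", None, "virginica"],
    "petal_length": [1.4, 4.7, 5.1, 6.0],
})

df = df.dropna()  # drop observations containing NaN values

# Map the string class names to integer codes
df["species"], labels = pd.factorize(df["species"])

print(list(labels))            # the original class names, in code order
print(df["species"].tolist())  # now integers: [0, 1, 2]
```

`pd.factorize` returns both the integer codes and the original labels, so the mapping can be inverted later when reporting predictions.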
In the next part we will discuss classifier models in more detail and how they are implemented. That is all for now.