All You Need To Know About Decision Tree Algorithm

In this blog, we are going to discuss the Decision Tree algorithm, a supervised learning algorithm that can be used to solve both regression and classification problems.

A classification algorithm, in general, is a function that weighs the input features so that the output separates one class into positive values and the other into negative values.

Introduction to Decision Tree Algorithm

A decision tree is a graphical representation of all possible solutions to a decision.

The objective of using a Decision Tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from training data.

It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.

Let’s understand this using a scenario. Suppose a person plans to go out on the weekend.

Representation of a Decision Tree

Important Terminologies:

Decision Tree Explanation
  1. Root Node: Represents the entire population, sample, or dataset, which is further divided into two or more homogeneous sets.
  2. Decision Node: A sub-node that splits into further sub-nodes.
  3. Leaf / Terminal Node: A final output node. It cannot be split any further.
  4. Splitting: The process of dividing a node into two or more sub-nodes.
  5. Pruning: The process of removing unwanted sub-nodes of a decision node. It can be seen as the opposite of splitting.
  6. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
  7. Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes,
    whereas the sub-nodes are the children of the parent node.

Building a Decision Tree Classifier

Before building the decision tree classifier, let's first understand how it works.

  1. Begin the tree with the root node, say S, which contains the complete dataset.
  2. Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
  3. Divide S into subsets, one for each possible value of the best attribute.
  4. Generate the decision tree node, which contains the best attribute.
  5. Recursively make new decision trees using the subsets of the dataset created in step 3.
    Continue this process until a stage is reached where the nodes cannot be classified further; each such final node is called a leaf node.
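The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation used by scikit-learn: it assumes categorical attributes stored in dictionaries, uses entropy reduction as the selection measure, and the toy "weekend" data and the function names `best_attribute` and `build_tree` are made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute whose split reduces entropy the most."""
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Recursively build a tree as nested dicts; leaves are class labels."""
    # Stop when the node is pure or no attributes remain: return a leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)        # step 2
    node = {"attribute": attr, "children": {}}             # step 4
    remaining = [a for a in attributes if a != attr]
    for value in sorted(set(row[attr] for row in rows)):   # step 3
        keep = [i for i, row in enumerate(rows) if row[attr] == value]
        node["children"][value] = build_tree(              # step 5
            [rows[i] for i in keep], [labels[i] for i in keep], remaining
        )
    return node


# Hypothetical toy data: should we go out this weekend?
rows = [
    {"weather": "sunny", "budget": "high"},
    {"weather": "sunny", "budget": "low"},
    {"weather": "rainy", "budget": "high"},
    {"weather": "rainy", "budget": "low"},
]
labels = ["go out", "go out", "stay home", "stay home"]

tree = build_tree(rows, labels, ["weather", "budget"])
print(tree)
```

In this toy data the weather alone decides the outcome, so the sketch places "weather" at the root and never needs to split on "budget".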

Attribute Selection Measures

If the dataset consists of N attributes, then deciding which attribute to place at the root or at different levels of the tree as internal nodes is a complicated step. Randomly selecting an attribute for the root node generally gives poor results and low accuracy.

So a big question here is how to select the best attribute for the root node and for sub-nodes?

The answer to this is Attribute Selection Measures i.e., ASM.

Using ASM, we select the best attribute for the nodes of the tree. There are multiple techniques for ASM, such as Entropy, Information Gain, Gini Index, Gain Ratio, Reduction in Variance, and Chi-Square. Among these, two popular techniques are:

  • Information Gain
  • Gini Index

Information Gain

Information gain is the reduction in entropy (or surprise) obtained by splitting a dataset on an attribute.
It is commonly used in training decision trees.

Constructing a decision tree is all about finding, at each node, the attribute that returns the highest information gain, i.e. the lowest entropy in the resulting child nodes.

Information Gain = entropy(parent) - [weighted average entropy(children)]
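Written out with the standard definition of entropy, the formula looks as follows. The symbols S (the samples at the parent node), A (the attribute being tested), S_v (the subset with value v) and p_j (the proportion of class j) are notation introduced here for illustration, and the 5/5 example counts are made up.

```latex
H(S) = -\sum_{j} p_j \log_2 p_j
\qquad
\text{Information Gain}(S, A) = H(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

% Worked example: the parent S has 5 positive and 5 negative samples;
% attribute A splits it into S_1 = (4+, 1-) and S_2 = (1+, 4-).
H(S) = 1, \qquad
H(S_1) = H(S_2) = -\left(0.8 \log_2 0.8 + 0.2 \log_2 0.2\right) \approx 0.722

\text{Information Gain}(S, A) \approx 1 - \left(0.5 \cdot 0.722 + 0.5 \cdot 0.722\right) = 0.278
```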

Gini Index

The Gini index measures how impure the nodes produced by a split are. Basically, it helps us determine which split is best so that we can build a tree with nodes that are as pure as possible.
For a two-class problem, Gini impurity ranges from 0 (a pure node) to 0.5 (an evenly mixed node).

An attribute with a lower Gini index should be preferred over one with a higher Gini index.

$\text{Gini Index} = 1 - \sum_j P_j^2$
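As a quick illustration of the formula, the following sketch computes the Gini index of a node and the size-weighted Gini index of a split. The helper names `gini` and `weighted_gini` and the example labels are made up for this illustration.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def weighted_gini(children):
    """Size-weighted average Gini index of the child nodes produced by a split."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# Hypothetical class labels in the child nodes of two candidate splits.
pure_split = [["acc", "acc", "acc"], ["unacc", "unacc", "unacc"]]
mixed_split = [["acc", "unacc", "acc"], ["unacc", "acc", "unacc"]]

print(gini(["acc", "acc", "acc"]))   # pure node -> 0.0
print(gini(["acc", "unacc"]))        # evenly mixed two-class node -> 0.5
print(weighted_gini(pure_split))     # lower value: the better split
print(weighted_gini(mixed_split))    # higher value: the worse split
```

The split with the lower weighted Gini index is the one the tree would prefer.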

Now, let's build our decision tree.

Loading the Dataset

We are using the Car Evaluation Data Set to build our decision tree classifier model, which will predict the evaluation class of a car.

You can download the data from here.

Let's load the dataset into a pandas DataFrame.

import pandas as pd

data = 'car_evaluation.csv'

df = pd.read_csv(data, header=None)

Viewing dataset

After loading the dataset we will do some data pre-processing like changing the column names.
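A minimal sketch of the renaming step is shown below. The column names are an assumption (they are not listed in the text and follow the usual description of the Car Evaluation data), and the two rows are made-up stand-ins so that the snippet runs without the CSV file.

```python
import io

import pandas as pd

# Two made-up rows in the same comma-separated layout, standing in for car_evaluation.csv.
sample = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "low,med,4,4,big,high,vgood\n"
)
df = pd.read_csv(sample, header=None)

# Assumed column names; without a header row pandas labels the columns 0..6.
col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
df.columns = col_names

print(df.head())
print(df['class'].value_counts())
```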

Metadata of Dataset

Splitting the data

Now we will define our target variable and split our dataset.

Our target variable will be ‘class’; the remaining columns form our feature vector.

X = df.drop(['class'], axis=1)

y = df['class']

Now, let's split our dataset in an 8:2 ratio, i.e., 80% for training and 20% for testing.
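One common way to do this split is scikit-learn's `train_test_split`. The sketch below assumes that approach; the small DataFrame is made up so that the snippet is self-contained, and `random_state=42` is an arbitrary choice that only makes the split reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Small made-up frame standing in for the car data.
df = pd.DataFrame({
    'safety': ['low', 'med', 'high', 'low', 'med', 'high', 'low', 'med', 'high', 'low'],
    'persons': ['2', '4', 'more', '4', '2', 'more', 'more', '4', '2', '4'],
    'class': ['unacc', 'acc', 'acc', 'unacc', 'unacc', 'acc', 'unacc', 'acc', 'unacc', 'unacc'],
})

X = df.drop(['class'], axis=1)
y = df['class']

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```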

The shape of training and testing data will be:

Shape of the split data

Training Decision Tree Classifier Model

We will be training our classifier model using two ASMs (i.e., Attribute Selection Measures).

Before training, we need to encode the categorical variables of the training dataset.
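The text does not say which encoder is used, so the following is only one possible approach: scikit-learn's `OrdinalEncoder`, fitted on the training features and then applied unchanged to the test features. The tiny frames and the category orderings are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up training and test features with two categorical columns.
X_train = pd.DataFrame({
    'safety': ['low', 'med', 'high', 'low'],
    'lug_boot': ['small', 'big', 'med', 'big'],
})
X_test = pd.DataFrame({
    'safety': ['high', 'low'],
    'lug_boot': ['med', 'small'],
})

# Spell out the category order so the integer codes follow the natural ranking.
encoder = OrdinalEncoder(categories=[['low', 'med', 'high'], ['small', 'med', 'big']])

# Fit on the training data only, then reuse the same mapping for the test data.
X_train_enc = pd.DataFrame(encoder.fit_transform(X_train), columns=X_train.columns)
X_test_enc = pd.DataFrame(encoder.transform(X_test), columns=X_test.columns)

print(X_train_enc)
print(X_test_enc)
```

Fitting the encoder on the training data alone keeps information from the test set out of the preprocessing step.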

Decision Tree Classifier with ASM Gini index

Let's build and train our classifier model using the Gini index as the criterion.

# import DecisionTreeClassifier

from sklearn.tree import DecisionTreeClassifier

# instantiate the classifier with the Gini criterion
gini_classifier = DecisionTreeClassifier(criterion='gini', max_depth=3)

# fit the model
gini_classifier.fit(X_train, y_train)

Okay, let's check the accuracy of the trained model on the training and testing data.
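A sketch of how such a check might look with scikit-learn's `accuracy_score` follows. The features and labels are synthetic (already-encoded stand-ins for two columns of the car data, with a made-up labelling rule), and `random_state=0` is an arbitrary choice, so the printed numbers say nothing about the real dataset.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, already-encoded features [safety, persons];
# the made-up rule labels a car 'acc' only when both values are at least 1.
X = [[s, p] for s in range(3) for p in range(3)] * 5
y = ['acc' if s >= 1 and p >= 1 else 'unacc' for s, p in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Compare predictions with the true labels on both splits.
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))

print(f'Training accuracy: {train_acc:.4f}')
print(f'Test accuracy: {test_acc:.4f}')
```

A training accuracy much higher than the test accuracy would be a sign of overfitting; similar values suggest the tree generalises reasonably well.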

Accuracy of trained classifier model on gini index
Decision tree visualisation of trained classifier model on gini index

Decision Tree Classifier with ASM Entropy

Let's build and train our classifier model using entropy as the criterion.

entropy_classifier = DecisionTreeClassifier(criterion='entropy', max_depth=3)

# fit the model
entropy_classifier.fit(X_train, y_train)

Let's check the accuracy of the trained model on the training and testing data.

Accuracy of classification model on entropy
Decision tree visualisation of trained classifier model on entropy

Based on the accuracy scores above, we can conclude that our trained classification model is good at predicting the class labels.

We can also generate the classification report of the model to evaluate it.

It will tell us the underlying distribution of values, and about the type of errors our classifier is making.
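As a rough illustration, scikit-learn provides `classification_report` for per-class precision, recall and F1-score, and `confusion_matrix` for the counts of each type of error. The true and predicted labels below are made up; in practice they would be `y_test` and the model's predictions on the test features.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true labels and model predictions.
y_test = ['unacc', 'unacc', 'acc', 'acc', 'unacc', 'good', 'acc', 'unacc']
y_pred = ['unacc', 'unacc', 'acc', 'unacc', 'unacc', 'acc', 'acc', 'unacc']

# zero_division=0 avoids warnings for classes that are never predicted.
report = classification_report(y_test, y_pred, zero_division=0)
print(report)

# Rows are true classes, columns are predicted classes, in the order given by `labels`.
labels = ['unacc', 'acc', 'good']
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)
```

The "support" column of the report shows how many samples of each class are present, and the off-diagonal cells of the confusion matrix show which classes get confused with one another.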

Classification report


So, in this blog we have learned about the Decision Tree algorithm (CART). It can be used to solve both regression and classification problems. We also looked at how the algorithm works, what attribute selection measures are, and what role ASM plays in building a decision tree classifier. Then we built our own classification model to predict the evaluation class of a car using two ASMs, got good accuracy, and prepared the classification report of the model.


Written by 

Durgesh Gupta is a Software Consultant working in the domain of AI/ML.
