Analysis of campus placement dataset using decision tree


KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.

With KNIME Analytics Platform, you can create visual workflows with an intuitive, drag and drop style graphical interface, without the need for coding.

Hello, folks! In this blog, we will analyse the campus placement data using KNIME Analytics Platform and find some exciting results. I hope you will enjoy the blog.


Exploring the Dataset:

This dataset contains placement data for students at XYZ campus. It includes secondary and higher secondary school percentages and specialization, degree specialization and type, work experience, and the salary offered to placed students.

Sample of dataset:

[Image: screenshot-223.png]

The dataset consists of several features:

Feature | Description | Data Type
gender | Gender of student | String
ssc_p | Secondary school (10th grade) percentage | Number (double)
ssc_b | Secondary school board | String
hsc_p | Higher secondary (12th grade) percentage | Number (double)
hsc_b | Higher secondary board | String
hsc_s | Higher secondary stream | String
degree_p | Percentage in degree education | Number (double)
degree_t | Degree stream | String
workex | Work experience | String
etest_p | Percentage in employability test | Number (double)
specialisation | MBA specialisation | String
mba_p | Percentage in MBA | Number (double)
status | Placement status (Placed / Unplaced) | String
salary | Salary offered in placement | Number (integer)

First, we read the data from the given .csv file with the KNIME File Reader node.
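For readers who want to peek at the same file outside KNIME, here is a minimal Python sketch of what the reading step amounts to, using only the standard library. The sample rows are made up, the helper name `read_placement_rows` is hypothetical, and the exact status label strings are an assumption; the column names follow the feature table above.

```python
import csv
import io

# Columns parsed as numbers (names taken from the feature table).
NUMERIC_COLUMNS = {"ssc_p", "hsc_p", "degree_p", "etest_p", "mba_p", "salary"}

def read_placement_rows(text_stream):
    """Read CSV rows into dicts, converting numeric columns to float.

    Empty numeric cells become None, similar to a missing value in KNIME.
    """
    rows = []
    for raw in csv.DictReader(text_stream):
        row = {}
        for name, value in raw.items():
            if name in NUMERIC_COLUMNS:
                row[name] = float(value) if value.strip() else None
            else:
                row[name] = value
        rows.append(row)
    return rows

# Made-up sample data, only to show the shape of the file.
SAMPLE_CSV = """gender,ssc_p,degree_t,status,salary
M,67.0,Sci&Tech,Placed,270000
F,55.5,Comm&Mgmt,Not Placed,
"""

rows = read_placement_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["salary"], rows[1]["salary"])
```

With a real file, `open(path, newline="")` would take the place of the `io.StringIO` wrapper.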

Pre-Processing the data:

Our goal is to predict the placement status of the students and their expected salary in the placement, using the Decision Tree algorithm.

In pre-processing, we first used the Numeric Outliers node, which detects and treats outliers in each selected column individually by means of the interquartile range (IQR).
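To make the IQR idea concrete, here is a small Python sketch. It assumes the common multiplier of 1.5 and assumes that outliers are treated by replacing them with the nearest allowed value; the post does not say which settings the workflow used, and the sample values are made up.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clamp_outliers(values, k=1.5):
    """Replace values outside the fences with the nearest fence value."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

# Made-up percentages with one obvious outlier (5.0).
hsc_p = [60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 5.0]
lower, upper = iqr_bounds(hsc_p)
cleaned = clamp_outliers(hsc_p)
print(lower, upper)
print(cleaned)
```

Each column gets its own fences, which matches the "individually" in the description above. Other treatments, such as dropping the row or marking the cell as missing, would only change `clamp_outliers`.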

First, we prepared the data for predicting the placement status of the students. We used the Column Filter node to remove the extra columns from the data and the Missing Value node to fill in the missing values.

[Image: screenshot-225.png]
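The column filtering and missing value steps can be sketched in plain Python as follows. Which columns were removed and which fill strategy was chosen are not stated in the post, so dropping `salary` and filling with the mean (numbers) or the most frequent value (strings) are assumptions made for illustration, as are the sample rows.

```python
from collections import Counter
from statistics import mean

def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

def fill_missing(rows, column):
    """Fill None in one column: mean for numbers, most frequent value for strings."""
    present = [row[column] for row in rows if row[column] is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]

# Made-up rows; None marks a missing cell.
rows = [
    {"ssc_p": 60.0, "workex": "Yes", "salary": 250000},
    {"ssc_p": None, "workex": "No", "salary": None},
    {"ssc_p": 80.0, "workex": None, "salary": 300000},
    {"ssc_p": 70.0, "workex": "No", "salary": None},
]

prepared = drop_columns(rows, {"salary"})  # assumption: salary is the dropped column
prepared = fill_missing(prepared, "ssc_p")
prepared = fill_missing(prepared, "workex")
print(prepared[1], prepared[2])
```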

Then, to predict the expected salary of the students, we used the Row Filter node to keep only the placed candidates. After that, we used the Missing Value node to fill in the missing values and the Rule Engine node to divide the salary into 5 slabs.

[Image: screenshot-226.png]
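The row filtering and slab rules can be pictured with a short Python sketch. The slab boundaries and labels below are hypothetical, since the post does not list the actual rules, and the sample rows are made up. The sketch also assumes every placed student already has a salary value.

```python
# Hypothetical slab upper bounds; the real rules are not given in the post.
SLABS = [
    (200_000, "slab 1"),
    (250_000, "slab 2"),
    (300_000, "slab 3"),
    (400_000, "slab 4"),
]
TOP_SLAB = "slab 5"

def salary_slab(salary):
    """Return the label of the first slab whose upper bound covers the salary."""
    for upper, label in SLABS:
        if salary <= upper:
            return label
    return TOP_SLAB

def placed_with_slabs(rows):
    """Keep placed students only and attach a salary_slab column."""
    return [
        {**row, "salary_slab": salary_slab(row["salary"])}
        for row in rows
        if row["status"] == "Placed"
    ]

# Made-up rows; the status label strings are an assumption.
rows = [
    {"status": "Placed", "salary": 200000},
    {"status": "Not Placed", "salary": None},
    {"status": "Placed", "salary": 275000},
    {"status": "Placed", "salary": 940000},
]
result = placed_with_slabs(rows)
print([r["salary_slab"] for r in result])
```

Turning the salary into slabs makes it a categorical target, which is what lets a classification tree predict it.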

We also used the Rank Correlation node to find the correlation between the columns.
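As an illustration of rank correlation, here is a Python sketch of Spearman's rho, one widely used rank correlation measure. The post does not say which measure was selected in the workflow, so Spearman is an assumption here, and the sample values are made up.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up values: mba_p rises with degree_p except for one swapped pair.
degree_p = [58.0, 64.5, 66.0, 72.0, 77.6]
mba_p = [55.5, 59.4, 58.8, 63.7, 66.3]
print(round(spearman(degree_p, mba_p), 2))
```

Because only the ordering matters, a rank correlation is less sensitive to extreme values than a plain Pearson correlation.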

Look over the results:

We used the Decision Tree algorithm to predict the placement status of the students and their expected salary in the placement.

[Image: screenshot-227.png]
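To give a feel for how a decision tree chooses a split, here is a Python sketch that finds the best threshold on a single numeric column using Gini impurity. Gini is a common split criterion, but the post does not say which quality measure the workflow used, so treat it as an assumption. The data is made up, and a real tree repeats this search over all columns and then recurses into each branch.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_threshold(values, labels):
    """Find the numeric threshold that minimises the weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted_impurity); threshold is None if no split helps.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        threshold = (a + b) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up data: students with a higher ssc_p tend to be placed.
ssc_p = [45.0, 52.0, 58.0, 63.0, 70.0, 76.0, 81.0]
status = ["Not Placed", "Not Placed", "Placed", "Not Placed", "Placed", "Placed", "Placed"]
threshold, impurity = best_threshold(ssc_p, status)
print(threshold, round(impurity, 3))
```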

We got the following accuracy from the models:

[Image: screenshot-228.png]
[Image: screenshot-229.png]
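For reference, accuracy is simply the share of predictions that match the actual labels. The Python sketch below shows that calculation together with the counts behind a confusion matrix; the labels are made up and are not the results from the workflow.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs, i.e. the confusion matrix cells."""
    counts = {}
    for pair in zip(actual, predicted):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

# Made-up labels, not the results of the workflow.
actual = ["Placed", "Placed", "Not Placed", "Placed", "Not Placed"]
predicted = ["Placed", "Not Placed", "Not Placed", "Placed", "Not Placed"]
print(accuracy(actual, predicted))
print(confusion_counts(actual, predicted))
```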

We also visualized the data to get some interesting information.

  • Number of students placed/unplaced in different streams, shown as a bar chart.
[Image: screenshot-230.png]
  • Number of students from different streams in each salary package segment.
[Image: screenshot-231.png]
  • Salary packages received by male and female candidates.
[Image: screenshot-232.png]
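The counts behind charts like these can be reproduced with a few lines of Python. The sketch below groups made-up rows by stream and placement status and prints a rough text bar chart; using `degree_t` as the stream column is an assumption, and the helper name `count_by` is hypothetical.

```python
from collections import Counter

def count_by(rows, *columns):
    """Count rows for each combination of values in the given columns."""
    return Counter(tuple(row[c] for c in columns) for row in rows)

# Made-up rows; the stream and status values are for illustration only.
rows = [
    {"degree_t": "Sci&Tech", "status": "Placed"},
    {"degree_t": "Sci&Tech", "status": "Not Placed"},
    {"degree_t": "Comm&Mgmt", "status": "Placed"},
    {"degree_t": "Comm&Mgmt", "status": "Placed"},
    {"degree_t": "Others", "status": "Not Placed"},
]

counts = count_by(rows, "degree_t", "status")
for (stream, status), n in sorted(counts.items()):
    # One '#' per student gives a quick text version of the bar chart.
    print(f"{stream:<10} {status:<11} {'#' * n} {n}")
```

The same helper works for the other two charts by passing different column names, for example a salary slab column together with the stream or gender.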

You can download and view the complete workflow on the KNIME Hub.

Note: I hope our blogs help you to enhance your learning. I’ll post more blogs on KNIME. Stay Tuned.


Written by 

Pankaj Chaudhary is a Software Consultant at Knoldus LLP. He has 1.5+ years of experience with good knowledge of Rust, Python, Java, and C. He currently works as a Rust developer and also works on machine learning and data analysis because he loves to play with data and extract useful information from it. His hobbies are bike riding and exploring new places.