KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.
With KNIME Analytics Platform, you can create visual workflows through an intuitive drag-and-drop graphical interface, without writing any code.
Hello, folks! In this blog, we will analyse campus placement data using KNIME Analytics Platform and uncover some interesting results. I hope you enjoy it.
Exploring the Dataset:
This dataset contains placement data for students at an XYZ campus. It includes their secondary and higher secondary school percentages and specialization, their degree specialization and type, their work experience, and the salary offers made to placed students.
The dataset has the following features:

| Feature | Description | Type |
|---|---|---|
| gender | Gender of the student | String |
| ssc_p | Secondary school percentage | Number (double) |
| ssc_b | Secondary school board | String |
| hsc_p | Higher secondary school percentage | Number (double) |
| hsc_b | Higher secondary school board | String |
| hsc_s | Higher secondary school stream | String |
| degree_p | Degree percentage | Number (double) |
| etest_p | E-test percentage | Number (double) |
| mba_p | MBA percentage | Number (double) |
| status | Placement status (Placed/Unplaced) | String |
| salary | Salary offered in placement | Number (integer) |
First, we read the data from the given .csv file with the KNIME File Reader node.
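If you want to reproduce this step outside KNIME, the following minimal Python sketch shows what reading the file with typed columns amounts to. It assumes the CSV header uses the column names from the table above. The two rows in `SAMPLE` are made up for illustration and are not taken from the real dataset, and `read_placement_csv` is a hypothetical helper name.

```python
import csv
import io

def _to_int(value):
    # Salary may be written as "270000" or "270000.0" in a CSV file.
    return int(float(value))

# Numeric columns and their types, following the feature table;
# every other column is kept as a string.
COLUMN_TYPES = {
    "ssc_p": float, "hsc_p": float, "degree_p": float,
    "etest_p": float, "mba_p": float, "salary": _to_int,
}

# Hypothetical sample rows, made up for illustration (not the real data).
SAMPLE = """gender,ssc_p,hsc_p,hsc_s,status,salary
M,67.0,91.0,Commerce,Placed,270000
F,56.0,52.0,Science,Unplaced,
"""

def read_placement_csv(text):
    """Parse CSV text into dicts; empty numeric cells become None (missing)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            convert = COLUMN_TYPES.get(name)
            if convert is None:
                row[name] = value              # string column, unchanged
            elif value.strip() == "":
                row[name] = None               # missing value
            else:
                row[name] = convert(value)
        rows.append(row)
    return rows

rows = read_placement_csv(SAMPLE)
print(rows[0]["salary"], rows[1]["salary"])
```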
Pre-Processing the data:
Our goal is to predict the placement status of the students and the expected salary of the placed students with the help of the Decision Tree algorithm.
As the first pre-processing step, we used the Numeric Outliers node, which detects and treats the outliers in each selected column individually by means of the interquartile range (IQR).
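The IQR rule behind the Numeric Outliers node can be sketched in a few lines of Python. This is not KNIME's implementation. It assumes the common multiplier k = 1.5 and estimates quartiles with Python's `statistics.quantiles(..., method="inclusive")`, which may differ from how KNIME estimates them. It also shows only one possible treatment: clipping each outlier to the nearest fence.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clip_outliers(values, k=1.5):
    """Replace values outside the fences with the closest fence value."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

marks = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(clip_outliers(marks))
```

Clipping keeps the row count unchanged. Dropping the outlier rows instead would be an equally valid treatment, depending on the analysis.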
Next, to prepare the data for predicting the placement status, we used the Column Filter node to remove the columns not needed for the model and the Missing Value node to fill in the missing values.
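A rough Python equivalent of the Column Filter and Missing Value steps is shown below. The imputation strategies (column mean for numeric columns, most frequent value for string columns) are assumptions chosen for illustration, because the post does not say which strategies were configured. The helper names are hypothetical.

```python
from statistics import mean, mode

def column_filter(rows, drop):
    """Keep every column except those listed in `drop` (an exclude list)."""
    return [{k: v for k, v in r.items() if k not in drop} for r in rows]

def fill_missing(rows):
    """Fill None values with the column mean (numeric columns) or the
    most frequent value (string columns). Assumes all rows share keys."""
    columns = list(rows[0].keys())
    fill = {}
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        if not present:
            fill[col] = None                  # nothing to impute from
        elif all(isinstance(v, (int, float)) for v in present):
            fill[col] = mean(present)
        else:
            fill[col] = mode(present)
    return [{c: (fill[c] if r[c] is None else r[c]) for c in columns}
            for r in rows]
```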
Then, to predict the expected salary, we used the Row Filter node to keep only the placed candidates, the Missing Value node to fill in the missing values, and the Rule Engine node to group the salaries into 5 slabs.
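The Row Filter and Rule Engine steps can be sketched in Python as follows. The post does not list the slab boundaries, so the thresholds in `SLAB_RULES` are purely hypothetical. The sketch only mirrors the idea that rules are checked from top to bottom, the first matching rule assigns the slab, and a catch-all rule comes last.

```python
# Hypothetical slab boundaries; the actual ones are not given in the post.
SLAB_RULES = [
    (250_000, "Slab 1"),
    (300_000, "Slab 2"),
    (400_000, "Slab 3"),
    (500_000, "Slab 4"),
]
DEFAULT_SLAB = "Slab 5"  # catch-all for anything above the last bound

def salary_slab(salary):
    """Return the label of the first rule whose upper bound covers salary
    (rules are checked top to bottom and the first match wins)."""
    for upper, label in SLAB_RULES:
        if salary <= upper:
            return label
    return DEFAULT_SLAB

def placed_with_slabs(rows):
    """Row filter (keep status == 'Placed'), then add a salary_slab column."""
    placed = [r for r in rows if r["status"] == "Placed"]
    return [{**r, "salary_slab": salary_slab(r["salary"])} for r in placed]
```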
We also used the Rank Correlation node to find the correlation between the columns.
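Rank correlation measures how well a monotonic relationship describes two columns. The following Python sketch computes Spearman's rank correlation, which is the Pearson correlation computed on ranks, with tied values given their average rank. Whether the workflow used Spearman or another rank-based measure is an assumption here.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant column
    return cov / (sx * sy)
```

Because only ranks matter, a monotonic but non-linear relationship such as `y = x ** 2` for positive `x` still gives a coefficient of 1.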
Looking at the results:
We used the Decision Tree algorithm to predict the placement status of the students and the expected salary slab of the placed students.
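To show how a decision tree chooses its splits, the following Python sketch scores candidate thresholds on one numeric column with the Gini impurity, a split quality measure commonly used by decision tree learners. It also includes the plain accuracy metric (the share of correctly predicted rows). This is a simplified illustration, not the KNIME Decision Tree Learner, and the labels in the example below are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Try every binary split 'value <= t' on one numeric column and return
    (t, weighted Gini) for the split with the lowest weighted impurity.
    Returns (None, parent impurity) if no split improves on the parent."""
    n = len(values)
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

def accuracy(actual, predicted):
    """Share of rows whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

For example, `best_threshold([55, 60, 65, 70, 75, 80], ["Unplaced"] * 3 + ["Placed"] * 3)` returns `(65, 0.0)`, because splitting at 65 separates the two classes perfectly. A full tree learner applies this search recursively to each resulting subset.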
We obtained the following accuracy from the model:
We also visualized the data to draw out some interesting insights:
- Number of students placed/unplaced in each stream, shown as a bar chart.
- Number of students from each stream who received a package in each salary slab.
- Salary packages received by male and female candidates.
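The counts behind the first chart and a simple salary summary for the third can be sketched in Python as below. The sketch assumes that "stream" refers to the `hsc_s` column and that unplaced students have no salary value. The mean is only one possible summary of the salary distribution, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def placement_counts_by_stream(rows):
    """Count (hsc_s, status) pairs: the bar heights of a grouped bar chart."""
    return Counter((r["hsc_s"], r["status"]) for r in rows)

def mean_salary_by_gender(rows):
    """Average salary of placed students, grouped by gender."""
    salaries = defaultdict(list)
    for r in rows:
        if r["status"] == "Placed" and r["salary"] is not None:
            salaries[r["gender"]].append(r["salary"])
    return {g: sum(s) / len(s) for g, s in salaries.items()}
```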
You can download and view the complete workflow on the KNIME Hub.
Note: I hope our blogs help you enhance your learning. I'll post more blogs on KNIME, so stay tuned.