ICC Test Cricket Data Analysis using KNIME


KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.

With KNIME Analytics Platform, you can create visual workflows with an intuitive, drag-and-drop graphical interface, without the need for coding.

Hello, folks! In this blog, we will analyse ICC Test cricket data using KNIME Analytics Platform and look at some interesting results. I hope you enjoy the blog.


Exploring the Dataset:

ICC Test cricket is the form of the sport of cricket with the longest match duration and is considered the game’s highest standard. Test matches are played between national representative teams that have been granted ‘Test status’, as determined and conferred by the International Cricket Council (ICC). The term Test stems from the fact that the long, gruelling matches are mentally and physically testing. Two teams of 11 players each play a four-innings match, which may last up to five days (or longer in some historical cases). It is generally considered the most complete examination of a team’s endurance and ability.

The data consists of the runs scored by batsmen in Test matches from 1877 to December 2019.

Sample of dataset:

[Image: sample of the dataset, screenshot-208.png]

The dataset consists of several features:

Feature  Description                                                   Data Type
Player   Player details, including name and country                    String
Span     Time span over which the player played for the country        String
Mat      Total number of matches played                                Integer
Inn      Total number of innings batted                                Integer
NO       Number of innings in which the player remained not out        Integer
Runs     Total number of runs scored                                   Integer
HS       Highest score in the whole span, with its out/not-out status  String
Avg      Batting average (runs scored per dismissal)                   Double
100      Number of centuries in the whole span                         Integer
50       Number of fifties in the whole span                           Integer
0        Number of ducks (dismissals for zero) in the whole span       Integer

First, we read the data from the given .csv file using the KNIME File Reader node.
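For readers who think in code, here is a rough plain-Python picture of what reading the file gives us. This is only a sketch and not part of the KNIME workflow: the column names follow the feature table above, and the two sample rows are made up for illustration (they are not real player records).

```python
import csv
import io

# Made-up rows in the assumed layout of the dataset; not real player data.
SAMPLE_CSV = """Player,Span,Mat,Inn,NO,Runs,HS,Avg,100,50,0
A Example (IND),2001-2012,50,85,7,3900,201*,50.00,10,18,5
B Sample (AUS),1995-2003,30,52,4,1680,134,35.00,3,9,6
"""

def read_rows(text):
    """Parse CSV text into a list of dicts, one per player (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE_CSV)
for row in rows:
    print(row["Player"], row["Span"], row["HS"])
```

Every value arrives as a string here, which is why the later steps have to parse numbers and split combined fields.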

Pre-Processing the data:

[Image: screenshot-210.png]

After reading the data, we find a profile column that is of no use to us, so we used the Column Filter node to remove that column from the data.

Next, the HS (high score) column contains both the highest score and the status at that score, that is, whether the player was out or not out. So in the next node we extracted that status and saved it in a separate column.
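The logic of this step can be sketched in a few lines of Python. We assume here that a not-out high score is written with a trailing asterisk (for example 201*), which is the usual cricket scorecard notation; the function name and the "Out" / "Not Out" labels are our own choices for illustration, not something taken from the workflow.

```python
def split_high_score(hs):
    """Split an HS value into (score, status).

    Assumption: a trailing '*' marks a not-out innings, e.g. '201*'.
    """
    hs = hs.strip()
    if hs.endswith("*"):
        return int(hs[:-1]), "Not Out"
    return int(hs), "Out"

for value in ["201*", "134"]:
    score, status = split_high_score(value)
    print(value, "->", score, status)
```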

Now we have to find the time span (number of years) over which each player played for his country. For this we used the Column Expressions node, which computes the time span and stores it in a new column.
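As a hedged illustration of the calculation, the sketch below assumes the Span value has the form "first year-last year" (for example 2001-2012) and takes the difference between the two years. Whether the workflow counts the span this way or inclusively (adding one) is not stated, so treat the choice here as an assumption.

```python
def span_years(span):
    """Return last year minus first year for a span written as 'YYYY-YYYY'."""
    start, end = (int(part) for part in span.strip().split("-"))
    return end - start

print(span_years("2001-2012"))
print(span_years("1995-2003"))
```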

After that, the Player column holds both the player's name and the country, so we have to split these into separate columns. For that we built a component that does the split with the help of several Column Expressions nodes.
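A minimal Python sketch of the same split is shown below. It assumes the country appears in parentheses at the end of the Player value, as in "A Example (IND)"; that format, the made-up player name, and the function name are assumptions for illustration.

```python
import re

# Assumed layout: "<name> (<country>)" with the country at the end of the string.
PLAYER_PATTERN = re.compile(r"^(?P<name>.*?)\s*\((?P<country>[^()]*)\)\s*$")

def split_player(player):
    """Return (name, country); country is None if no '(...)' suffix is found."""
    match = PLAYER_PATTERN.match(player)
    if match is None:
        return player.strip(), None
    return match.group("name"), match.group("country")

print(split_player("A Example (IND)"))
print(split_player("No Country Given"))
```

Returning None for the country keeps rows that do not match the expected layout visible, so they can be handled in the cleaning step instead of silently disappearing.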

Cleaning the Data:

We are done with the pre-processing part. Now we have to clean the data, which means handling the missing values and removing some extra columns from the dataset.
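To make the cleaning step concrete, here is a small sketch in plain Python. It assumes missing values show up as an empty string or a "-" and replaces them with a default; the post does not say which strategy the workflow uses, so both the markers and the replacement are assumptions, as are the helper names.

```python
MISSING_MARKERS = {"", "-"}  # assumption: how missing values appear in the file

def to_number(value, default=0.0):
    """Convert a CSV string to float, using `default` for missing markers."""
    value = value.strip()
    if value in MISSING_MARKERS:
        return default
    return float(value)

def drop_columns(row, unwanted):
    """Return a copy of the row without the unwanted columns."""
    return {key: val for key, val in row.items() if key not in unwanted}

row = {"Player": "A Example (IND)", "Avg": "-", "Extra": "unused"}
cleaned = drop_columns(row, {"Extra"})
cleaned["Avg"] = to_number(cleaned["Avg"])
print(cleaned)
```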

Look over the results:

1. Top 5 batsmen in ICC Test cricket history.

To find the top 5 batsmen in ICC Test cricket history, we used the Top k Selector node with the following configuration:

[Image: Top k Selector configuration, screenshot-213-1.png]
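In code terms, a top-k selection is just "keep the k rows with the largest value in one column". The sketch below assumes the ranking is on the Runs column (the actual setting lives in the node configuration) and uses made-up rows.

```python
import heapq

def top_k(rows, column, k=5):
    """Return the k rows with the largest numeric value in `column`, largest first."""
    return heapq.nlargest(k, rows, key=lambda row: float(row[column]))

# Made-up rows for illustration only.
players = [
    {"Name": "A Example", "Runs": "3900", "Avg": "50.00"},
    {"Name": "B Sample", "Runs": "1680", "Avg": "35.00"},
    {"Name": "C Demo", "Runs": "5200", "Avg": "41.50"},
]

for row in top_k(players, "Runs", k=2):
    print(row["Name"], row["Runs"])
```

Calling the same helper with a different column name, such as "Avg", ranks the players by that column instead.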

Result data:

[Image: result data, screenshot-214.png]

2. Top 5 players with the highest average in ICC Test history.

To find the top 5 players with the highest average in ICC Test history, we used the Top k Selector node again, this time configured to pick the highest values of the average (Avg) column.

Result data:

[Image: result data, screenshot-215.png]

3. Top 5 countries with the highest average runs per player.

To find the top 5 countries with the highest average runs scored by their players over their ICC Test careers, we first grouped the players by country and computed the average player score for each country with the GroupBy node. After that, we used the Top k Selector node to pick the top 5 countries.
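The group-then-rank idea can be sketched as follows. The Country column name is hypothetical (it stands for the column produced when the player name and country were split), we assume the Runs column is the one being averaged, and the rows are made up.

```python
from collections import defaultdict
from statistics import mean

def average_by_country(rows, column="Runs"):
    """Group rows by country and return {country: mean of `column`}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Country"]].append(float(row[column]))
    return {country: mean(values) for country, values in grouped.items()}

def top_countries(rows, column="Runs", k=5):
    """Return the k (country, average) pairs with the highest average."""
    averages = average_by_country(rows, column)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:k]

# Made-up rows; "Country" is a hypothetical column name.
players = [
    {"Country": "IND", "Runs": "3900"},
    {"Country": "IND", "Runs": "2100"},
    {"Country": "AUS", "Runs": "1680"},
]
print(top_countries(players, k=2))
```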

Result data:

[Image: result data, screenshot-216.png]

4. Top 5 countries with the most players who were not out on their highest score.

To find the top 5 countries with the most players who remained not out on their highest score, we first used the Pivoting node to count, for each country, the players in each high-score status. After that, we used the Top k Selector node to pick the top 5 countries by the not-out count.
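Here is a small sketch of what the pivot and the ranking amount to. The column names Country and HS Status, and the labels "Out" / "Not Out", are hypothetical stand-ins for the columns created during pre-processing; the rows are made up.

```python
from collections import Counter, defaultdict

def pivot_status(rows):
    """Build {country: Counter({status: number of players})}."""
    table = defaultdict(Counter)
    for row in rows:
        table[row["Country"]][row["HS Status"]] += 1
    return table

def top_not_out(rows, k=5):
    """Return the k (country, not-out count) pairs with the highest count."""
    table = pivot_status(rows)
    ranked = sorted(table.items(), key=lambda item: item[1]["Not Out"], reverse=True)
    return [(country, counts["Not Out"]) for country, counts in ranked[:k]]

# Made-up rows; column names and status labels are assumptions.
players = [
    {"Country": "IND", "HS Status": "Not Out"},
    {"Country": "IND", "HS Status": "Not Out"},
    {"Country": "IND", "HS Status": "Out"},
    {"Country": "AUS", "HS Status": "Not Out"},
    {"Country": "ENG", "HS Status": "Out"},
]
print(top_not_out(players, k=2))
```

A Counter returns 0 for a status it has never seen, so a country with no not-out high scores still gets a row in the ranking instead of raising an error.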

Result data:

[Image: result data, screenshot-217.png]

You can download and view the complete workflow on the KNIME Hub.

Note: I hope our blogs help you enhance your learning. I'll post more blogs on KNIME. Stay tuned.


Written by 

Pankaj Chaudhary is a Software Consultant at Knoldus LLP with good knowledge of Rust, Python, Java, and C. He currently works as a Rust developer and also works on machine learning and data analysis, because he loves to play with data and extract useful information from it. His hobbies are bike riding and exploring new places.