ICC Test Cricket Data Analysis using KNIME


KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.

With KNIME Analytics Platform, you can create visual workflows through an intuitive drag-and-drop graphical interface, without the need for coding.

Hello, folks! In this blog, we will analyse ICC Test cricket data using KNIME Analytics Platform and find some interesting results. I hope you enjoy it.


Exploring the Dataset:

ICC Test cricket is the form of the sport of cricket with the longest match duration and is considered the game’s highest standard. Test matches are played between national representative teams that have been granted ‘Test status’, as determined and conferred by the International Cricket Council (ICC). The term Test stems from the fact that the long, gruelling matches are mentally and physically testing. Two teams of 11 players each play a four-innings match, which may last up to five days (or longer in some historical cases). It is generally considered the most complete examination of a team’s endurance and ability.

The data contains the career batting records of Test batsmen from 1877 to December 2019.

A sample of the dataset:

[Screenshot: sample rows from the dataset]

The dataset contains the following features:

Feature	Description	Data Type
Player	Player name and country	String
Span	First and last year in which the player played Test cricket for his country	String
Mat	Total number of matches played	Integer
Inn	Total number of innings batted	Integer
NO	Number of innings in which the player was not out	Integer
Runs	Total number of runs scored	Integer
HS	Highest score, including whether the player was not out in that innings	String
Avg	Batting average (Runs divided by the number of dismissals, i.e. Inn minus NO)	Double
100	Number of centuries	Integer
50	Number of fifties	Integer
0	Number of ducks (dismissals for zero)	Integer

So, first, we read the data from the .csv file using the KNIME File Reader node.
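KNIME does this without code, but if you want to reproduce the step in plain Python, a minimal sketch with the standard `csv` module could look like this. It assumes a comma-separated file whose header row uses the feature names from the table above; `read_batting_csv` is a hypothetical helper and the two sample rows are made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_batting_csv(handle: TextIO) -> List[Dict[str, str]]:
    """Read the batting CSV into a list of dicts keyed by the header row."""
    # DictReader takes the column names (Player, Span, Mat, ...) from the first line.
    return list(csv.DictReader(handle))


# Hypothetical two-row sample standing in for the real file; with the real data
# you would pass open(path, newline="") instead of a StringIO.
SAMPLE = io.StringIO(
    "Player,Span,Mat,Inn,NO,Runs,HS,Avg,100,50,0\n"
    "Player A (AUS),1990-2004,100,170,20,7000,250*,46.67,20,30,8\n"
    "Player B (IND),2001-2012,80,140,10,5000,180,38.46,12,25,10\n"
)
rows = read_batting_csv(SAMPLE)
print(len(rows), rows[0]["Player"])  # 2 Player A (AUS)
```

Every value is read as a string at this point; the conversions to numbers happen in later steps.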

Pre-Processing the data:

[Screenshot: pre-processing part of the workflow]

After reading the data, we found a profile column that is of no use for this analysis, so we used the Column Filter node to remove that column.
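In plain Python, the Column Filter step amounts to copying each row without the unwanted key. This sketch assumes the rows are dicts as produced by `csv.DictReader`; the column name `Profile` and the helper `drop_columns` are hypothetical, since the post does not give the exact column name.

```python
from typing import Dict, Iterable, List


def drop_columns(rows: Iterable[Dict[str, str]], unwanted: Iterable[str]) -> List[Dict[str, str]]:
    """Copy each row without the unwanted columns, like a Column Filter exclude list."""
    drop = set(unwanted)
    return [{key: value for key, value in row.items() if key not in drop} for row in rows]


# "Profile" is a hypothetical name for the unused column.
rows = [{"Player": "Player A (AUS)", "Profile": "unused", "Runs": "7000"}]
print(drop_columns(rows, ["Profile"]))  # [{'Player': 'Player A (AUS)', 'Runs': '7000'}]
```

Building new dicts instead of deleting keys leaves the original rows untouched, much like a KNIME node produces a new output table.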

Next, the HS (High Score) column contains both the highest score and the player's status in that innings, i.e. whether he was out or not out. In the next node, we separated that status from the score and stored it in a separate column.
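The split can be sketched in Python as follows. It assumes the common scorecard convention that a not-out score carries a trailing asterisk (e.g. `250*`); the post does not spell out the exact format, and `split_high_score` is a hypothetical helper, not the expression used in the workflow.

```python
from typing import Tuple


def split_high_score(hs: str) -> Tuple[int, str]:
    """Split an HS value such as '250*' into (250, 'Not Out'); '180' gives (180, 'Out')."""
    hs = hs.strip()
    if hs.endswith("*"):
        # Assumed convention: a trailing asterisk marks a not-out innings.
        return int(hs[:-1]), "Not Out"
    return int(hs), "Out"


print(split_high_score("250*"))  # (250, 'Not Out')
print(split_high_score("180"))   # (180, 'Out')
```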

Now we needed the time span (number of years) for which each player played for his country, so we used the Column Expressions node to compute it and store it in a new column.
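A plain-Python version of this calculation might look like the sketch below. It assumes Span values of the form `first-last` (e.g. `1990-2004`) and counts years inclusively; the post does not say which convention the Column Expressions node used, so treat both the format and the hypothetical helper `span_in_years` as assumptions.

```python
def span_in_years(span: str) -> int:
    """Turn a Span such as '1990-2004' into the number of years played."""
    first, last = (int(part) for part in span.split("-"))
    # Inclusive count: a career confined to a single year (e.g. '2005-2005') gives 1.
    return last - first + 1


print(span_in_years("1990-2004"))  # 15
```

Dropping the `+ 1` gives the plain difference between the first and last year instead.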

After that, since the Player column contains both the player's name and his country, we had to split the two apart. We did this in a component built from several Column Expressions nodes.
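Outside KNIME, the same split can be done with a single regular expression. This sketch assumes the Player column looks like `Name (COUNTRY)` and keeps everything inside the parentheses as the country; `split_player` and the sample names are hypothetical.

```python
import re
from typing import Tuple

# Assumed format: "<name> (<country>)", e.g. "Player A (AUS)".
PLAYER_RE = re.compile(r"^(?P<name>.+?)\s*\((?P<country>[^()]+)\)\s*$")


def split_player(player: str) -> Tuple[str, str]:
    """Return (name, country); the country is empty if no parentheses are found."""
    match = PLAYER_RE.match(player)
    if match is None:
        return player.strip(), ""
    return match.group("name"), match.group("country")


print(split_player("Player A (AUS)"))  # ('Player A', 'AUS')
```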

Cleaning the Data:

With the pre-processing done, we cleaned the data: we handled the missing values and removed some extra columns from the data set.
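The post does not say how missing values were treated, so the following is only one possible approach in plain Python: parse the numeric columns and substitute a per-column default for missing cells, similar in spirit to KNIME's Missing Value node. The missing markers (empty string and `-`), the defaults, and the helper names are all assumptions.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Union

# Assumption: missing entries appear as empty strings or a dash.
MISSING_MARKERS = {"", "-"}


def to_number(value: str) -> Optional[float]:
    """Parse a numeric cell, returning None for a missing marker."""
    value = value.strip()
    return None if value in MISSING_MARKERS else float(value)


def fill_missing(
    rows: Iterable[Dict[str, str]], defaults: Mapping[str, float]
) -> List[Dict[str, Union[str, float]]]:
    """Parse the given numeric columns and replace missing values with a per-column default."""
    cleaned = []
    for row in rows:
        new_row: Dict[str, Union[str, float]] = dict(row)
        for column, default in defaults.items():
            parsed = to_number(row.get(column, ""))
            new_row[column] = default if parsed is None else parsed
        cleaned.append(new_row)
    return cleaned


# Hypothetical player with no batting average.
rows = [{"Player": "Player C (ENG)", "Avg": "-", "Runs": "0"}]
print(fill_missing(rows, {"Avg": 0.0, "Runs": 0.0}))
```

Filling a missing average with 0 is a simple choice but can distort an average-based ranking; skipping such rows is an alternative.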

Looking over the results:

1. Top 5 Batsmen in ICC Test Cricket History.

To find the top 5 batsmen in ICC Test cricket history, we used the Top k Selector node with the following configuration:

[Screenshot: Top k Selector node configuration]

Result data:

[Screenshot: top 5 batsmen]
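The Top k Selector step boils down to picking the k largest rows by one column. Here is a minimal Python sketch using `heapq.nlargest`; it assumes the ranking column is `Runs` (the actual configuration is only visible in the screenshot) and that the values have already been converted to numbers. The helper `top_k` and the player records are made up.

```python
import heapq
from typing import Any, Dict, List, Sequence


def top_k(rows: Sequence[Dict[str, Any]], column: str, k: int = 5) -> List[Dict[str, Any]]:
    """Return the k rows with the largest values in `column`, largest first."""
    return heapq.nlargest(k, rows, key=lambda row: row[column])


# Hypothetical players; Runs already numeric.
players = [
    {"Player": "Player A", "Runs": 7000},
    {"Player": "Player B", "Runs": 5000},
    {"Player": "Player C", "Runs": 9000},
    {"Player": "Player D", "Runs": 3000},
    {"Player": "Player E", "Runs": 6000},
    {"Player": "Player F", "Runs": 8000},
]
print([p["Player"] for p in top_k(players, "Runs")])
# ['Player C', 'Player F', 'Player A', 'Player E', 'Player B']
```

`heapq.nlargest` avoids sorting the whole table when k is small; for a few thousand players a full `sorted(...)[:k]` would work just as well.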

2. Top 5 Players with the Highest Average in ICC Test History.

To find the top 5 players with the highest average in ICC Test history, we again used the Top k Selector node, this time configured to select the highest values of the Avg column.

Result data:

[Screenshot: top 5 players by average]
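Ranking by average works the same way, with one extra detail worth handling: a player who was never dismissed has no defined average and should not be ranked. This sketch, with the hypothetical helper `top_by_average`, skips rows whose `Avg` is `None` and relies on the stability of `sorted` so that tied players keep their input order. The sample values are made up.

```python
from typing import Any, Dict, List, Sequence


def top_by_average(rows: Sequence[Dict[str, Any]], k: int = 5) -> List[Dict[str, Any]]:
    """Rank players by Avg, skipping rows whose average is missing (None)."""
    rated = [row for row in rows if row.get("Avg") is not None]
    # sorted() is stable, so players with equal averages keep their input order.
    return sorted(rated, key=lambda row: row["Avg"], reverse=True)[:k]


# Hypothetical players; Player B has no average.
players = [
    {"Player": "Player A", "Avg": 60.0},
    {"Player": "Player B", "Avg": None},
    {"Player": "Player C", "Avg": 45.5},
    {"Player": "Player D", "Avg": 52.0},
]
print([p["Player"] for p in top_by_average(players, k=2)])  # ['Player A', 'Player D']
```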

3. Top 5 Countries with the Highest Average Runs by Their Players.

To find the top 5 countries whose players have the highest average runs over their Test careers, we first used the GroupBy node to compute, for each country, the average score of its players. We then used the Top k Selector node to keep the top 5 countries.

Result data:

[Screenshot: top 5 countries by average runs]
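The GroupBy plus Top k Selector combination can be sketched in plain Python as a per-country mean followed by a descending sort. This assumes a `Country` column produced by the name/country split and an already numeric `Avg` column; both helper names and the sample figures are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def mean_by_country(rows: Sequence[Dict[str, Any]], value_column: str = "Avg") -> Dict[str, float]:
    """Average `value_column` over all players of each country (a GroupBy with Mean)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["Country"]] += row[value_column]
        counts[row["Country"]] += 1
    return {country: totals[country] / counts[country] for country in totals}


def top_countries(means: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Sort the per-country means in descending order and keep the first k."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical players.
players = [
    {"Country": "AUS", "Avg": 50.0},
    {"Country": "AUS", "Avg": 40.0},
    {"Country": "IND", "Avg": 38.0},
]
print(top_countries(mean_by_country(players)))  # [('AUS', 45.0), ('IND', 38.0)]
```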

4. Top 5 Countries with the Most Players Not Out at Their Highest Score.

To find the top 5 countries with the most players who were not out when they made their highest score, we first used the Pivoting node to count, for each country, the players in each high-score status (out or not out). We then used the Top k Selector node to keep the 5 countries with the highest not-out counts.
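The Pivoting step is essentially a count of players per country and high-score status. A minimal Python sketch using `collections.Counter` follows; the column names `Country` and `HS Status` (with values `Out` and `Not Out`) are assumptions, as are the helper names and sample rows.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def pivot_status_counts(rows: Sequence[Dict[str, Any]]) -> Dict[str, Counter]:
    """Count players per (country, high-score status) pair, like a Pivoting node with Count."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        table[row["Country"]][row["HS Status"]] += 1
    return table


def top_not_out(table: Dict[str, Counter], k: int = 5) -> List[Tuple[str, int]]:
    """Rank countries by how many of their players were not out at their highest score."""
    # A Counter returns 0 for a status that never occurred for that country.
    counts = [(country, statuses["Not Out"]) for country, statuses in table.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical players.
players = [
    {"Country": "AUS", "HS Status": "Not Out"},
    {"Country": "AUS", "HS Status": "Not Out"},
    {"Country": "IND", "HS Status": "Out"},
    {"Country": "IND", "HS Status": "Not Out"},
]
print(top_not_out(pivot_status_counts(players)))  # [('AUS', 2), ('IND', 1)]
```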

Result data: