Data Science Project Life Cycle


Overview

The development life cycle of a data science project differs from the traditional software development life cycle. Development methodologies and practices vary across organisations, but most of them follow similar processes. One well-known process is the Cross Industry Standard Process for Data Mining (CRISP-DM), and this blog presents a summarised version of it.

Data Science Life Cycle

The life cycle of a data science project is divided into six phases.

Business understanding – Understanding the business context and the objectives, both short and long term

Data understanding – Understanding what data is available, and its quality and quantity

Data preparation – Preparing the right datasets, with data and feature engineering, for use in the models

Modeling – Choosing the right modeling techniques, algorithms and frameworks

Evaluation – Model evaluation, benchmarking and metrics

Deployment – Deployment of the final model

The following diagram shows a typical data science project life cycle.

Figure: Data Science Project Life Cycle

Business Understanding

In this phase, business requirements and goals are understood. This phase is about assessment, planning, defining the governance model and the success criteria.

Data Understanding

In this phase, data is acquired and examined. Data understanding can include exploratory data analysis, data visualisation, and assessing the quality and quantity of the data.
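As a rough illustration of assessing data quality and quantity, the sketch below counts rows, missing values and basic statistics per column using only the Python standard library. The `summarize` helper and the sample records are hypothetical and invented for this example; a real project would more likely use a dataframe library.

```python
from statistics import mean, median


def summarize(rows, columns):
    """Report row count, missing values and basic numeric stats per column."""
    summary = {"row_count": len(rows), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}
        # Only compute statistics when every present value is a number.
        if present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        ):
            info["min"] = min(present)
            info["max"] = max(present)
            info["mean"] = mean(present)
            info["median"] = median(present)
        summary["columns"][col] = info
    return summary


# Hypothetical sample records, invented for illustration.
records = [
    {"age": 34, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 29, "city": None},
    {"age": 41, "city": "Pune"},
]

report = summarize(records, ["age", "city"])
print(report)
```

A summary like this gives an early signal on whether there is enough usable data to proceed, or whether more data has to be acquired first.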

Data Preparation

The data preparation phase is one of the most important phases in the data science project life cycle. Typical activities in this phase include determining the right datasets, data cleansing, levelling, and data and feature engineering.
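A minimal sketch of two of these activities, data cleansing and feature engineering, might look like the following. The helper names (`clean`, `min_max_scale`, `one_hot`) and the sample data are assumptions made for illustration, and dropping incomplete rows is only one of several possible cleansing strategies.

```python
def clean(rows, required):
    """Data cleansing: drop rows that are missing any required field."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


def min_max_scale(values):
    """Feature engineering: rescale numeric values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Feature engineering: encode categories as 0/1 indicator vectors."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


# Hypothetical raw records, invented for illustration.
raw = [
    {"age": 20, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 30, "city": "Delhi"},
    {"age": 40, "city": "Pune"},
]

cleaned = clean(raw, ["age", "city"])
scaled_age = min_max_scale([r["age"] for r in cleaned])
categories, city_vectors = one_hot([r["city"] for r in cleaned])

print(scaled_age)
print(categories, city_vectors)
```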

Modeling

This is one of the most exciting phases in the life cycle. Datasets are usually split into training, validation and test sets. The algorithms to be used are determined. Models are built and assessed continuously. Results of different models are interpreted based on the success and test criteria. This is an iterative phase and continues until the results reach the expected benchmarks.
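The dataset split mentioned above can be sketched as follows. The 70/15/15 proportions, the fixed seed and the `split_dataset` name are assumptions chosen for illustration; suitable ratios depend on the size and nature of the data.

```python
import random


def split_dataset(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows reproducibly and split into train/validation/test sets."""
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must leave room for a test set")
    shuffled = list(rows)
    # A seeded generator keeps the split identical across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    train_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train_set, validation_set, test_set


# Hypothetical dataset: 100 record identifiers.
data = list(range(100))
train_set, validation_set, test_set = split_dataset(data)
print(len(train_set), len(validation_set), len(test_set))
```

The training set is used to fit the models, the validation set to compare and tune them during the iterations, and the test set is held back for a final, unbiased check.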

Evaluation

The evaluation phase focuses mainly on assessing the model against the business objectives. This evaluation is different from the one done in the previous phase, where models are assessed technically. The overall evaluation involves validating and measuring the model against the success criteria and the metrics defined.
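One way to picture measuring against success criteria is to compute a few metrics and compare each with an agreed threshold. In the sketch below, the choice of precision and recall, the threshold values and the labels are all assumptions made for illustration; the actual criteria come from the business understanding phase.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


def meets_criteria(metrics, criteria):
    """Check each metric against its minimum acceptable value."""
    return {name: metrics[name] >= minimum for name, minimum in criteria.items()}


# Hypothetical labels and predictions, invented for illustration.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

# Hypothetical success criteria agreed with the business.
success_criteria = {"precision": 0.75, "recall": 0.9}

metrics = precision_recall(y_true, y_pred)
verdict = meets_criteria(metrics, success_criteria)
print(metrics, verdict)
```

In this made-up case the model would pass on precision but fall short on recall, so it would return to the modeling phase rather than move on to deployment.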

Deployment

In this phase, the model is deployed and made operational. Machine Learning models are usually integrated and coupled with products and applications. These can be web, desktop or mobile applications. Machine Learning models are also deployed on devices, an approach that is gaining adoption and popularity in the field of edge computing.
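A very simplified picture of integrating a model with an application is to persist the trained parameters and load them inside the application that serves predictions. The sketch below assumes a hypothetical linear model stored as JSON; the parameter names, file name and `predict` function are illustrative only, and real deployments typically rely on a model-serving framework or a format specific to the chosen library.

```python
import json
import os
import tempfile


def save_model(params, path):
    """Persist model parameters so another application can load them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    """Load previously saved model parameters."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(params, features):
    """Score features with a linear model and apply a decision threshold."""
    score = params["bias"] + sum(
        w * x for w, x in zip(params["weights"], features)
    )
    return 1 if score >= params["threshold"] else 0


# Hypothetical parameters standing in for a trained model.
trained = {"weights": [0.5, -0.25], "bias": 0.1, "threshold": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.json")
    save_model(trained, model_path)      # done at the end of training
    deployed = load_model(model_path)    # done when the application starts

positive = predict(deployed, [2.0, 1.0])
negative = predict(deployed, [0.0, 2.0])
print(positive, negative)
```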

Summary

The content in this article references the CRISP-DM process. There are other known processes for data science and data mining projects, such as SEMMA and Knowledge Discovery in Databases (KDD). With the wide adoption of Agile and Scaled Agile methodologies, most of these data science life cycle processes are tailored to meet specific business needs, with a focus on iterative and incremental development and visibility.

References

https://www.datascience-pm.com/

Written by 

Rohit Jagati is a technologist, experienced in designing enterprise-grade solutions. Rohit has been designing and implementing solutions based on Machine Learning, IoT and custom/bespoke applications.