Building a machine learning model is an iterative process: most of the steps are repeated several times to achieve optimal results. The model must also be maintained after deployment and adapted to a changing environment. Let’s look at the details of the lifecycle of a machine learning model.
What is the machine learning lifecycle?
The machine learning lifecycle is the process of developing, deploying, and managing a machine learning model for a specific application. The lifecycle typically consists of:
- Determining business objective: The process typically starts by determining the business objective of implementing a machine learning model. For example, a business objective for a bank can be decreasing fraudulent transactions under a certain percentage of total transactions.
- Data collection and exploration: Guided by the established business objective, you collect the relevant data for the machine learning task. Then, you perform exploratory data analysis and data visualizations to understand what the available data provides, and which processes are needed to make the data ready for model training.
- Data processing and feature engineering: The data is then transformed to better satisfy the business objective and to make it ready for the model. This stage includes data cleaning, splitting the data into training, validation, and test sets, and feature engineering. Feature engineering is the process of transforming data to better represent the business objective. There are AutoML tools that offer automated feature engineering.
- Model training: Then, the new machine learning model is trained on the prepared data. This is an iterative process: you can test several different algorithms, select the most suitable model, and fine-tune its hyperparameters to achieve the best performance. Hyperparameters are model parameters that influence the learning process (e.g. the size of a neural network) and are set before training rather than learned from the data.
- Model testing and validation: The model is evaluated on a test set to ensure that its predictive performance is adequate for the use case. Other potential performance issues to resolve before deployment include:
- Excessive resource requirements: The model can consume a large amount of memory or require long processing times. Software engineers and data scientists can work on this problem together to optimize the model performance.
- Insufficient performance: The cost of deploying the model can outweigh its benefits to the business. For example,
- the model may not estimate the confidence of its own predictions accurately. This may require every prediction to be reviewed by a human if false positives are costly for the process
- the model may not achieve high accuracy and may therefore offer limited benefits
- Feel free to read our article on ML accuracy for more.
- Model deployment: The selected and fine-tuned model is deployed to make predictions. Deployment options include:
- Online deployment: The model is deployed via an API to respond to requests in real-time and serve predictions.
- Batch deployment: The model is integrated into a batch prediction system.
- Embedded model: The model is embedded in an edge or mobile device.
- Model monitoring: After deployment, the model’s performance is monitored to ensure that it performs well over time. For example, a machine learning model developed a year ago to detect fraud may not capture a new type of fraud if it has not been continuously improved. For models that are retrained at specific intervals, a new iteration of the development process can be launched.
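The core steps above can be sketched in code. The following is a minimal illustration using scikit-learn on synthetic data, not a production pipeline: the synthetic dataset stands in for real collected data, and logistic regression is an arbitrary choice of algorithm.

```python
# A minimal sketch of the lifecycle's core steps: data splitting,
# feature engineering, model training, and model testing.
# The synthetic dataset and model choice are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data collection: a synthetic binary-classification dataset stands in
# for real data (e.g. transaction records in a fraud-detection task).
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)

# Data processing: split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature engineering: scale features to zero mean and unit variance.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)  # reuse the training-set statistics

# Model training.
model = LogisticRegression()
model.fit(X_train, y_train)

# Model testing: evaluate on held-out data before deployment.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Note that the scaler is fitted only on the training set; applying the same transformation to the test set avoids leaking test-set information into training.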
What are the challenges of ML lifecycle management?
- Manual labor: Every step, and the transition between steps, is manual. Data scientists need to collect, analyze, and process data for each application by hand. They need to examine their older models to develop new ones, and fine-tune them manually each time. A large amount of time is allocated to model monitoring to prevent performance degradation.
- Scalability: As data size or the number of deployed machine learning models grows, it becomes challenging to manage the whole process manually. It may require different teams of data scientists to develop, manage, and monitor each model. So there is a limit to how far an organization can scale its machine learning applications while relying on manual processes.
1. Automation of the lifecycle
A successful deployment of machine learning models at scale requires automation of steps of the lifecycle. Automation decreases the time allocated to resource-consuming steps such as feature engineering, model training, monitoring, and retraining. It frees up time to rapidly experiment with new models.
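As a concrete illustration of automating feature engineering and hyperparameter tuning, the sketch below bundles both into a single reusable object with scikit-learn's Pipeline and GridSearchCV. The dataset and parameter grid are illustrative assumptions.

```python
# A sketch of lifecycle automation: preprocessing and model training are
# bundled into one pipeline, and hyperparameter tuning runs automatically
# via cross-validated grid search. The parameter grid is an assumption.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The pipeline makes retraining on new data a single fit() call instead
# of a manual sequence of preprocessing and training steps.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Automated hyperparameter search with 5-fold cross-validation.
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["clf__C"])
```

Because the fitted pipeline carries its own preprocessing, the same object can be reused for retraining or serving without manually repeating each step.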
2. Standardization of the process
Data scientists must collaborate with different teams and collaboration requires a common language between teams. Standardization of the ML development and management platform within an organization enables efficient communication between diverse teams.
3. Continuous training
Real-world data changes continuously, so ML models should also be retrained continuously to maintain their performance.