MLOps, or ML Ops, is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of "machine learning" and DevOps, the continuous development practice from the software field.
MLOps is slowly evolving into an independent approach to ML lifecycle management. It applies to the entire lifecycle – data gathering, model creation (software development lifecycle, continuous integration/continuous delivery), orchestration, deployment, health, diagnostics, governance, and business metrics.
The key phases of MLOps are:
- Data gathering
- Data analysis
- Data transformation/preparation
- Model training and development
- Model validation
- Model serving
- Model monitoring
- Model re-training
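The first few phases above can be sketched as a minimal end-to-end loop in plain Python. Everything here is illustrative: the toy dataset, the one-parameter "model", and the validation threshold are stand-ins, not a real MLOps framework.

```python
# Illustrative lifecycle sketch: gather -> prepare -> train -> validate.
def gather_data():
    # In practice: pull from a warehouse, lake, or streaming source.
    return [(float(x), 2.0 * x) for x in range(1, 11)]  # (feature, label)

def prepare(rows):
    # Transformation/preparation: drop rows with missing labels.
    return [(x, y) for x, y in rows if y is not None]

def train(rows):
    # Toy one-parameter model: label ~= slope * feature.
    slope = sum(y for _, y in rows) / sum(x for x, _ in rows)
    return lambda x: slope * x

def validate(model, rows, max_mae=0.1):
    # Model validation: mean absolute error must stay under a threshold.
    mae = sum(abs(model(x) - y) for x, y in rows) / len(rows)
    return mae <= max_mae

rows = prepare(gather_data())
model = train(rows)
assert validate(model, rows)
```

Serving, monitoring, and re-training are the operational phases that the later sections cover in more detail.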
DevOps vs MLOps
MLOps is an offshoot of DevOps. It implements pipelines and automation for the smooth flow of training operations and the integration of final models into software products.
It may require steps such as data validation, model validation, and model quality testing. When it comes to deployment, depending on the type of ML model, the developer needs to set up pipelines for ongoing data handling and training; this calls for multi-step pipelines that handle retraining, verification, and redeployment.
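One such verification step can be sketched as a model quality gate: a candidate model replaces the currently serving one only if it does not regress on a held-out set. The `accuracy` metric and the stand-in models below are illustrative assumptions, not a specific library API.

```python
# Hedged sketch of a pre-redeployment quality gate.
def accuracy(model, data):
    # Fraction of held-out examples the model predicts exactly.
    return sum(model(x) == y for x, y in data) / len(data)

def quality_gate(current, candidate, holdout, min_gain=0.0):
    """Return the model that should serve traffic after this run."""
    if accuracy(candidate, holdout) >= accuracy(current, holdout) + min_gain:
        return candidate  # candidate wins: redeploy it
    return current        # keep serving the existing model
```

In a real pipeline the gate would compare richer metrics (and possibly fairness or latency checks) before triggering redeployment.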
- MLOps helps in building an efficient machine learning strategy for a business by combining the business knowledge of an organisation’s operation team with the data science team’s expertise to drive maximum benefit.
- MLOps automates model development and deployment. This helps in faster release and lower operational costs, resulting in business agility and faster decision making.
- MLOps puts the operation team at the forefront of the regulatory process. This is important because insights gained from the data will not hold up if standard practices and regulations are disregarded.
- MLOps facilitates the collaboration between the operations and the data team to optimise labour division.
- The key phases of MLOps include data gathering, data analysis, data preparation and transformation, model training and development, model serving, model monitoring, and model retraining.
2. Architect ML and data solutions for the problem
Searching for data is one of the most strenuous parts of the job. It involves several steps:
- Look for any available relevant dataset.
- Check the credibility of the data and its source.
- Is the data source compliant with regulations such as GDPR?
- How will the dataset be made accessible?
- What is the type of source: static (files) or real-time streaming (sensors)?
- How do you build a data pipeline that can drive both training and optimization once the model is deployed in the production environment?
- Which cloud services will be used?
3. Data preparation and processing — part of data engineering.
Data preparation includes tasks like feature engineering, cleaning (formatting, checking for outliers, imputation, rebalancing, etc.), and then selecting the set of features that contribute to the output of the underlying problem.
An important part of deploying such pipelines is to choose the right combination of cloud services and architecture that is performant and cost-effective. For example, if you have a lot of data movement and huge amounts of data to store, you can look to build data lakes using AWS S3 and AWS Glue.
You might want to practice building a few different kinds of pipelines (batch vs. streaming) and try deploying them on the cloud.
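Two of the cleaning steps mentioned above, imputation and outlier handling, can be sketched with the standard library alone. In a real pipeline you would typically reach for pandas or scikit-learn transformers instead; the helpers below are illustrative stand-ins.

```python
# Illustrative data-cleaning helpers (stdlib only).
from statistics import median

def impute(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    return [med if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Winsorize values outside the Tukey fences (Q1 - k*IQR, Q3 + k*IQR)."""
    s = sorted(values)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, lo), hi) for v in values]
```

The quartile computation here is deliberately crude; a production pipeline would use a proper quantile routine and fit the fences on training data only, to avoid leakage.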
5. Building and automating ML pipelines
Keep the following points in mind:
- Identify system requirements — parameters, compute needs, triggers.
- Choose an appropriate cloud architecture — hybrid or multi-cloud.
- Construct training and testing pipelines.
- Track and audit the pipeline runs.
- Perform data validation.
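The last two items, run tracking and data validation, can be sketched as a tiny pipeline runner that checks incoming rows against an expected schema and appends an audit record per step. The schema, step names, and log format are hypothetical.

```python
# Sketch: pipeline runner with schema validation and an audit trail.
import time

EXPECTED_COLUMNS = {"age": float, "income": float}

def validate_row(row):
    # Data validation: right columns, right types.
    return (set(row) == set(EXPECTED_COLUMNS)
            and all(isinstance(row[c], t) for c, t in EXPECTED_COLUMNS.items()))

def run_pipeline(rows, steps, audit_log):
    assert all(validate_row(r) for r in rows), "schema validation failed"
    for step in steps:
        rows = step(rows)
        # Track each run for later auditing.
        audit_log.append({"step": step.__name__,
                          "rows": len(rows),
                          "ts": time.time()})
    return rows
```

In practice the audit log would go to an experiment tracker or a database rather than an in-memory list.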
6. Deploying models to the production system
There are mainly two ways of deploying an ML model: static deployment, where the model is packaged into the application at build time, and dynamic deployment, where the model is served and updated at runtime.
Within dynamic deployment, you can use different methods:
- deploying on a server (a virtual machine)
- deploying in a container
- serverless deployment
- model streaming — instead of REST APIs, all of the models and application code are registered on a stream processing engine
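As a minimal sketch of the "deploying on a server" option, the standard library alone can expose a model's predict function over HTTP. The `predict` stand-in is hypothetical; a real deployment would use a framework such as Flask or FastAPI, or a dedicated model server.

```python
# Minimal model-serving endpoint using only the standard library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for a trained model: returns the mean of the features.
    return sum(features) / len(features)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"features": [1.0, 3.0]}.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

# To serve: HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()
```

The same handler could run inside a container or behind a serverless trigger; only the surrounding packaging changes across those deployment methods.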
The main considerations are:
- Ensuring proper documentation and testing scores are met.
- Revalidating the model accuracy.
- Performing explainability checks.
- Ensuring all governance requirements have been met.
- Checking the quality of any data artifacts.
- Load testing — compute resource usage.
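The load-testing item can be approximated with a tiny latency probe: call the predict function repeatedly and report the 95th-percentile latency. The function name and call count are illustrative; a real load test would use a tool such as Locust or k6 against the deployed endpoint, under realistic concurrency.

```python
# Illustrative latency probe for a predict function.
import time

def p95_latency(predict, sample, n_calls=1000):
    """Time repeated predictions; return the 95th-percentile latency (s)."""
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[int(0.95 * len(timings)) - 1]
```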
7. Monitor, optimize and maintain models
An organization needs to keep an eye on the performance of its models in production while ensuring good and fair governance. Governance here means putting in place control measures to ensure that the models deliver on their responsibilities to all the stakeholders, employees, and users affected by them.
As part of this phase, we need data scientists and DevOps engineers to maintain the whole system in production by performing the following tasks:
- Keeping track of performance degradation and business quality of model predictions.
- Setting up logging strategies and establishing continuous evaluation metrics.
- Troubleshooting system failures and the introduction of bias.
- Tuning the model performance in both training and serving pipelines deployed in production.
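Performance-degradation tracking can be sketched as a rolling comparison of live prediction error against a training-time baseline; when the rolling error drifts past a tolerance, a retraining run would be triggered. The window size and tolerance below are illustrative choices, not recommended values.

```python
# Sketch of a degradation monitor over a rolling error window.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline_error, window=100, tolerance=0.10):
        self.baseline = baseline_error       # error measured at training time
        self.errors = deque(maxlen=window)   # most recent live errors
        self.tolerance = tolerance           # allowed relative degradation

    def observe(self, prediction, actual):
        # Record the absolute error of one live prediction.
        self.errors.append(abs(prediction - actual))

    def degraded(self):
        # True once the rolling error exceeds baseline * (1 + tolerance).
        if not self.errors:
            return False
        rolling = sum(self.errors) / len(self.errors)
        return rolling > self.baseline * (1 + self.tolerance)
```

A production monitor would also watch input-distribution drift (e.g. with a statistical test) since labels often arrive too late to compute live error directly.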
Until recently, we were dealing with manageable amounts of data and a very small number of models at a small scale. The tables are turning now: we are embedding decision automation in a wide range of applications, and this generates many technical challenges that come from building and deploying ML-based systems.
To understand MLOps, we must first understand the ML systems lifecycle. The lifecycle involves several different teams of a data-driven organization.
- Business development or Product team — defining business objective with KPIs
- Data Engineering — data acquisition and preparation.
- Data Science — architecting ML solutions and developing models.
- IT or DevOps — complete deployment setup, monitoring alongside scientists.
Major challenges that MLOps addresses
Following are the major challenges that teams commonly face:
- There is a shortage of Data Scientists who are good at developing and deploying scalable web applications. A new profile of ML Engineer has emerged in the market to serve this need, sitting at the sweet spot between Data Science and DevOps.
- Reflecting changing business objectives in the model: with the data continuously changing, there are many dependencies, and the model's performance standards and AI governance must be maintained. It is hard to keep up with continuous model training and evolving business objectives.
- Communication gaps between technical and business teams, with a hard-to-find common language for collaboration. Most often, this gap becomes the reason big projects fail.
- Risk assessment — there is a lot of debate going on around the black-box nature of such ML/DL systems. Assessing the risk/cost of such failures is a very important and meticulous step.
I hope you were able to follow along.
If you have any questions, recommendations, or critiques, feel free to reach out to me via mail.