In this post, we look at what MLOps (Machine Learning Operations) is and how it relates to, and differs from, traditional software development operations, better known as DevOps. We will also see why it is worth talking about and, lastly, where it is not really a necessity.
What is MLOps?
Machine Learning Operations (MLOps) is a set of core machine learning engineering activities. It aims to put a process in place for building a reusable pipeline (we will see later what an ML pipeline is) to explore, build, test, deploy, serve, and monitor machine learning models. The goal of the MLOps process is not very different from that of DevOps, which aims to automate the development, deployment, and monitoring of applications.
The steps in MLOps differ from those in DevOps because ML work is more complex than conventional software development and is not as standardized as the SDLC. In many cases, you will not be sure whether things are working, and the models keep evolving. Let us look at this more closely in the next section.
How does machine learning solve a problem?
At a very high level, solving a business problem with machine learning takes multiple steps. Let us take the example of a giant retail chain trying to forecast how much milk it will be able to sell next month at a store located in a particular zip code. Before we get into the details, let us see in the diagram below how each step fits into solving this problem.
Based on my experience, I have detailed the steps enough to show the process a data science team follows to build the required intelligence using ML and then use it to make predictions. If you are somewhat familiar with these processes, you can easily see how each one contributes to the overall goal.
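To make the idea of a pipeline concrete, here is a minimal sketch of the milk forecasting example in plain Python. The stage names (clean, train, evaluate, predict), the moving-average "model", and the sales figures are all assumptions made for illustration; they are not the steps from the diagram, and a real team would use a proper forecasting model.

```python
from statistics import mean

# Synthetic monthly milk sales (litres) for one store, invented only to
# make the sketch runnable. None marks a missing record.
RAW_SALES = [1200, 1250, None, 1300, 1280, 1350, 1400]


def clean(sales):
    """Data preparation: drop missing records."""
    return [s for s in sales if s is not None]


def train(sales, window=3):
    """'Training' here only fixes the window of a moving-average model."""
    if len(sales) < window:
        raise ValueError("not enough history to train")
    return {"window": window}


def predict(model, sales):
    """Forecast the next month as the mean of the last `window` months."""
    return mean(sales[-model["window"]:])


def evaluate(model, sales):
    """Backtest: forecast each month from the months before it, return MAE."""
    w = model["window"]
    errors = [abs(predict(model, sales[:i]) - sales[i])
              for i in range(w, len(sales))]
    return mean(errors)


def run_pipeline(raw):
    """Chain the stages so the whole flow can be re-run in one call."""
    sales = clean(raw)
    model = train(sales)
    return {"mae": evaluate(model, sales), "forecast": predict(model, sales)}


result = run_pipeline(RAW_SALES)
print(result)
```

The point of the sketch is the shape, not the model: because every stage is a function and the whole flow is one call, it can be re-run whenever new data arrives.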
Is it enough to get to production?
The answer is no! To reach production, we need some degree of automation so that we can minimize the time between one process and the next in the pipeline above, and quickly build, train, and serve the model to make predictions. In general software development terms, we need some form of continuous integration and continuous deployment to make this happen. Data science is a fairly complex game, and a model, once built, does not always keep working! Think about the situations below.
- What if the model we built starts giving wrong predictions on actual data in production?
- What if you need to retrain your model?
- Do we need to scale to meet a certain SLA?
Data science is not magic, and it will always require an iterative approach to tuning the pipeline to make it work!
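The first two situations above can be sketched as a simple check: compare the model's error on recent production data with the error measured at validation time, and flag the model for retraining when it degrades too far. The function names, the choice of mean absolute error, and the `tolerance` factor of 1.5 are assumptions made for this example, not a standard.

```python
from statistics import mean


def mean_absolute_error(actuals, predictions):
    """Average absolute gap between what happened and what was predicted."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must be non-empty and equal length")
    return mean([abs(a - p) for a, p in zip(actuals, predictions)])


def needs_retraining(actuals, predictions, baseline_mae, tolerance=1.5):
    """Return True when live error exceeds the validation-time error
    (baseline_mae) by more than the assumed tolerance factor."""
    live_mae = mean_absolute_error(actuals, predictions)
    return live_mae > baseline_mae * tolerance


# Toy values: predictions close to the actuals, so no retraining is flagged.
print(needs_retraining([100, 110, 120], [98, 112, 119], baseline_mae=5))
# Toy values: predictions far from the actuals, so retraining is flagged.
print(needs_retraining([100, 110, 120], [80, 130, 100], baseline_mae=5))
```

In an automated setup, a check like this would run on a schedule and kick off the training pipeline instead of printing a flag.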
What are the major MLOps activities?
To do commercial AI/ML, all sorts of big steps are required as soon as you get your data. MLOps is all about doing AI/ML at scale with predefined, repeatable processes that reduce the time taken to perform the steps in the ML pipeline. We can also say it is a set of practices that enables effective collaboration and communication within the data science team. MLOps principles become a tool for the different roles in that team.
The life cycle of an ML model across development, serving, and monitoring looks like this:
DevOps in general software development is more settled and established. Once the code is written and tested (most of the time with automated tests), it can be shipped to operations. In MLOps, the work is more experimental in nature and involves activities like:
- Looking carefully at the data before training -> requires data cleaning and versioning of the input data
- Feature building and hyperparameter tuning -> requires model versioning and storage
- Extensive research before you validate your model -> requires testing and packaging the model once done
- Releasing the model and asking Ops to serve it -> requires configuration (or configuration as code) to manage scale across multiple environments
- Serving the model -> requires monitoring the model while it is used in production
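The data versioning and model versioning activities above can be illustrated with a small sketch: fingerprint the training data with a hash, then record each trained model together with that fingerprint and its hyperparameters. `ModelRegistry` is a hypothetical in-memory class written for this example, not the API of any real MLOps tool, and the rows and parameters are toy values.

```python
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 over the JSON form of the rows.
    The hash changes whenever the training data changes."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ModelRegistry:
    """Hypothetical in-memory registry; a real one would persist entries."""

    def __init__(self):
        self._entries = []

    def register(self, name, data_hash, params):
        """Store a new version of `name`, tied to its data and parameters."""
        version = 1 + sum(1 for e in self._entries if e["name"] == name)
        entry = {"name": name, "version": version,
                 "data_hash": data_hash, "params": params}
        self._entries.append(entry)
        return entry

    def latest(self, name):
        """Return the most recently registered version, or None."""
        matches = [e for e in self._entries if e["name"] == name]
        return matches[-1] if matches else None


# Toy rows and hyperparameters, invented for the example.
rows = [{"month": "2024-01", "litres": 1200}, {"month": "2024-02", "litres": 1250}]
registry = ModelRegistry()
registry.register("milk-forecast", fingerprint(rows), {"window": 3})
registry.register("milk-forecast", fingerprint(rows), {"window": 6})
print(registry.latest("milk-forecast")["version"])
```

Keeping the data hash next to each model version is what lets a team answer "which data produced the model that is serving right now?" later on.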
These activities need to be aligned to some process model, and that is what MLOps takes care of. There is no silver bullet yet; it is still in the making! I hope I was able to provide a glimpse of what MLOps is about.
Where is MLOps not a real need?
I would say that having best practices helps and adds value, but sometimes they can be overkill.
- If you are working on a model that is already tested, that you expect to use only once, and that will not need much retraining or iteration after deployment, setting up the whole MLOps process can be overkill for your task.
- If a one-person team does every step of the data science work and needs no collaboration in terms of sharing data or versioning models, because nothing is in production yet, MLOps might turn out to be overkill for you.
If you want to dive deep, here are a few good reads!!