Kubeflow: A Complete Solution to MLOps.

Hey folks, in this blog we are going to shed some light on Kubeflow, an open-source platform that lets us orchestrate complex machine learning workflows as pipelines running on Kubernetes.

Kubeflow Emerges for ML Workflow Automation

Many data scientists today find it burdensome to manually execute all of the steps in a machine learning workflow: moving and transforming data, training models, and then promoting them into production.

Data scientists can use Kubeflow to build and experiment with machine learning pipelines.
Machine learning engineers and operations teams can use Kubeflow to train and deploy ML systems in various environments for development, testing, and production-level serving.

What is Kubeflow?

Kubeflow ML Workflow

Kubeflow is a free and open-source machine learning platform designed to use machine learning pipelines to orchestrate complicated workflows running on Kubernetes. Kubeflow was based on Google’s internal method for deploying TensorFlow models, called TensorFlow Extended (TFX).

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.

It is an end-to-end machine learning platform for Kubernetes.
It provides components for each stage of the ML lifecycle, from data exploration through model training to deployment.
Operators can choose the components that best suit their users; there is no requirement to deploy every component.

The set of tools available in Kubeflow helps ML engineers and data scientists with:

  • Data exploration.
  • Building and training machine learning models.
  • Analyzing model performance.
  • Hyperparameter tuning.
  • Versioning different models.
  • Managing compute power.
  • Serving infrastructure.
  • Deploying the best model to production.

It runs on Kubernetes clusters, so we can run it either locally or in the cloud.
It supports training machine learning models across multiple nodes (i.e., computers), which can reduce model training time.

Components of Kubeflow

Curious what’s inside Kubeflow?
Let’s unfold the treasure and explore the components of Kubeflow.

Kubeflow Central Dashboard

Yes, Kubeflow provides a beautiful user interface called the central dashboard.
It lets us quickly access the Kubeflow components deployed in our cluster.

Kubeflow Central Dashboard

The Kubeflow user interface consists of the following:

  • Home: Central hub to view recently used resources, active experiments, and useful documentation.
  • Notebook Servers: Manage notebook servers.
  • TensorBoards: Manage TensorBoard servers.
  • Models: Manage deployed KFServing models.
  • Volumes: Manage the cluster’s volumes.
  • AutoML Experiments: Manage Katib experiments.
  • KFP Experiments: Manage Kubeflow Pipelines (KFP) experiments.
  • Pipelines: Manage Kubeflow Pipelines.
  • Runs: Manage KFP runs.
  • Recurring Runs: Manage KFP recurring runs.
  • Artifacts: Track ML Metadata (MLMD) artifacts.
  • Executions: Track component executions in MLMD.
  • Manage Contributors: Configure user access sharing across namespaces in the Kubeflow deployment.

To access the central dashboard, you need to connect to the Istio gateway that provides access to the Kubeflow service mesh.
How you access the Istio gateway varies depending on how you’ve configured it.

Kubeflow Notebooks

Kubeflow Notebooks provides web-based development environments inside our Kubernetes cluster by running them inside Pods.

Users can spin up notebook servers using JupyterLab, RStudio, or Visual Studio Code (code-server).
This can be done directly from the dashboard, allocating the right storage, CPUs, and GPUs.

You can create notebook containers directly in the cluster, rather than locally on your workstation.
Admins can provide standard notebook images for their organization with the required packages pre-installed.
Kubeflow’s RBAC can be used to manage access control, which makes notebook sharing across the organization easier.
You can set up your notebook server by following this guide: Setup Kubeflow notebook server.
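Besides the dashboard, a notebook server can also be declared as a Kubernetes custom resource. The sketch below builds such a manifest as a Python dict, assuming the `kubeflow.org/v1` `Notebook` resource; the notebook name, namespace, image, and resource sizes are hypothetical placeholders, so check the fields against the Notebook CRD installed in your cluster before applying the printed JSON (for example with `kubectl apply -f`).

```python
import json


def notebook_manifest(name: str, namespace: str, image: str,
                      cpu: str = "1", memory: str = "2Gi", gpus: int = 0) -> dict:
    """Build a Notebook custom resource (assumed kubeflow.org/v1 schema)."""
    resources = {"cpu": cpu, "memory": memory}
    if gpus > 0:
        # GPUs are requested through the NVIDIA device-plugin resource name.
        resources["nvidia.com/gpu"] = str(gpus)
    container = {
        "name": name,
        "image": image,
        # Separate copies so requests and limits can be edited independently.
        "resources": {"requests": dict(resources), "limits": dict(resources)},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"template": {"spec": {"containers": [container]}}},
    }


if __name__ == "__main__":
    # Hypothetical values: use your own profile namespace and notebook image.
    manifest = notebook_manifest("my-notebook", "my-profile",
                                 "registry.example.com/jupyter-scipy:latest")
    print(json.dumps(manifest, indent=2))
```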

ML Libraries and Frameworks

It is compatible with popular machine learning libraries and frameworks such as TensorFlow, PyTorch, XGBoost, scikit-learn, MXNet, Keras, and many more.

Kubeflow Pipelines

Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning workflows based on Docker containers.

We can automate our ML workflow into pipelines by containerizing steps as pipeline components and defining inputs, outputs, parameters, and generated artifacts.

So, a big question comes to mind: what is a pipeline?
Let’s answer that question first.

What is a Pipeline?

An ML pipeline is a means of automating the machine learning workflow by enabling data to be transformed and correlated into a model that can then be analyzed to produce outputs. This type of ML pipeline makes the process of feeding data into the ML model fully automated.

An ML pipeline is the end-to-end construct that orchestrates the flow of data into, and output from, a machine learning model (or a set of multiple models). It includes raw data input, features, outputs, the machine learning model and its parameters, and prediction outputs.

In Kubeflow, the pipeline component is a self-contained set of user code, packaged as a Docker image, that performs one step in the pipeline. For example, a component can be responsible for data preprocessing, data transformation, model training, and so on.

When writing the code of a pipeline component, make sure that all the libraries it needs are imported inside the function itself, since only the function is packaged and run inside the component’s container.

Each pipeline component should be self-contained, with no dependencies on the code of other components. This helps us in many ways.
For example, if a step in the pipeline fails, we can easily identify the component that holds the issue and troubleshoot it without impacting other components.
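To make the idea of self-contained components concrete, here is a plain-Python sketch; it does not use the Kubeflow Pipelines SDK. Each hypothetical component does its own imports inside the function, takes explicit inputs, and returns explicit named outputs, and a tiny hypothetical runner (`run_pipeline`) executes the steps in order and reports exactly which step failed. In real Kubeflow Pipelines, each step would instead run in its own container.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is (step name, component function, names of earlier outputs it consumes).
Step = Tuple[str, Callable[..., Dict[str, Any]], List[str]]


def load_data() -> Dict[str, Any]:
    # Imports live inside the component, mirroring how a containerized
    # step cannot see imports made elsewhere in the pipeline file.
    import random

    rng = random.Random(0)
    xs = [rng.uniform(0.0, 10.0) for _ in range(50)]
    ys = [2.0 * x + 1.0 for x in xs]  # synthetic data: y = 2x + 1
    return {"xs": xs, "ys": ys}


def train(xs: List[float], ys: List[float]) -> Dict[str, Any]:
    # Ordinary least squares for a single feature.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def evaluate(xs: List[float], ys: List[float],
             slope: float, intercept: float) -> Dict[str, Any]:
    import math

    mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return {"rmse": math.sqrt(mse)}


def run_pipeline(steps: List[Step]) -> Dict[str, Any]:
    """Run steps in order, passing each step's named outputs to later steps."""
    outputs: Dict[str, Any] = {}
    for name, component, input_names in steps:
        try:
            inputs = {key: outputs[key] for key in input_names}
            outputs.update(component(**inputs))
        except Exception as exc:
            # A failure is attributed to exactly one step, so it can be
            # fixed without touching the other components.
            raise RuntimeError(f"step {name!r} failed: {exc!r}") from exc
    return outputs


PIPELINE: List[Step] = [
    ("load-data", load_data, []),
    ("train", train, ["xs", "ys"]),
    ("evaluate", evaluate, ["xs", "ys", "slope", "intercept"]),
]

if __name__ == "__main__":
    results = run_pipeline(PIPELINE)
    print({k: results[k] for k in ("slope", "intercept", "rmse")})
```

Passing data only through declared inputs and outputs is what makes each step replaceable and debuggable on its own, which is the property the paragraph above describes.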

This is what a Kubeflow pipeline looks like:

Kubeflow Pipeline Example

You can find sample Python code for this pipeline in xgboost-training-cm.py.

Katib for Hyperparameter Tuning/AutoML

Katib is the Kubeflow component used for hyperparameter tuning and neural architecture search.
Katib is a Kubernetes-native project for automated machine learning (AutoML).
It runs trials with different hyperparameter values, optimizing for the best ML model.
Katib is agnostic to machine learning (ML) frameworks.
It can tune hyperparameters of applications written in any language of the user’s choice and natively supports many ML frameworks, such as TensorFlow, MXNet, PyTorch, XGBoost, and others.

Automated Machine Learning (AutoML) is a way to automate the process of applying machine learning algorithms to solve real-world problems.
Basically, it automates the process of feature selection, composition, and parameterization of machine learning models.

Katib supports many AutoML algorithms, such as Bayesian optimization, Tree of Parzen Estimators, random search, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Hyperband, Efficient Neural Architecture Search (ENAS), Differentiable Architecture Search (DARTS), and more.
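Random search is the simplest of these algorithms. The plain-Python sketch below shows the idea behind it: sample hyperparameters from a feasible space, evaluate an objective for each trial, and keep the best result. It is not Katib’s implementation; in Katib each trial would be a Kubernetes job that reports a metric, whereas here the hypothetical `toy_objective` function stands in for training and validating a model.

```python
import math
import random
from typing import Callable, Dict, Optional, Tuple

# Feasible space: parameter name -> (min, max, sample_on_log_scale).
SearchSpace = Dict[str, Tuple[float, float, bool]]


def sample(space: SearchSpace, rng: random.Random) -> Dict[str, float]:
    """Draw one candidate assignment from the feasible space."""
    params: Dict[str, float] = {}
    for name, (low, high, log_scale) in space.items():
        if log_scale:
            # Uniform in log space, a common choice for learning rates.
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(
    objective: Callable[[Dict[str, float]], float],
    space: SearchSpace,
    max_trials: int,
    seed: int = 0,
) -> Tuple[Optional[Dict[str, float]], float]:
    """Minimize `objective` using independent random trials."""
    rng = random.Random(seed)
    best_params: Optional[Dict[str, float]] = None
    best_value = math.inf
    for _ in range(max_trials):
        params = sample(space, rng)
        value = objective(params)  # one trial: "train" and report a metric
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params: Dict[str, float]) -> float:
    # Hypothetical validation loss, smallest near lr=0.01 and dropout=0.2.
    return (math.log10(params["lr"]) + 2.0) ** 2 + (params["dropout"] - 0.2) ** 2


if __name__ == "__main__":
    space = {"lr": (1e-4, 1e-1, True), "dropout": (0.0, 0.5, False)}
    best, loss = random_search(toy_objective, space, max_trials=50)
    print(best, loss)
```

Because random-search trials do not depend on each other, they can run in parallel; Katib exposes this through the experiment’s `parallelTrialCount` setting. Smarter algorithms such as Bayesian optimization or TPE replace `sample` with a proposal that uses the results of earlier trials.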

KServe for Model Serving

For model serving, Kubeflow uses KServe (formerly called KFServing).

KServe is a multi-framework model deployment tool with serverless inferencing, canary rollouts, pre- and post-processing, and explainability.

It aims to solve the difficulties of deploying models to production through the “model as data” approach, i.e., by providing an API for inference requests.

KServe abstracts away the complexity of server configuration, networking, health checking, autoscaling of heterogeneous hardware (CPU, GPU, TPU), scaling from zero, and progressive (a.k.a. canary) rollouts.
It provides a complete story for production ML serving that includes prediction, pre-processing, post-processing, and explainability, in a way that is compatible with various frameworks: TensorFlow, PyTorch, XGBoost, scikit-learn, and ONNX.

KServe enables serverless inferencing on Kubernetes and provides performant, high-abstraction interfaces for common machine learning (ML) frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.
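As a sketch of what deploying a model with KServe looks like, the snippet below builds an `InferenceService` manifest as a Python dict, assuming the `serving.kserve.io/v1beta1` API with a scikit-learn model format. The service name and the `storageUri` bucket path are hypothetical, and older KServe/KFServing releases use a slightly different predictor layout, so verify the fields against the version installed in your cluster. The optional `canaryTrafficPercent` field is how a progressive rollout sends only part of the traffic to the newest revision.

```python
import json
from typing import Optional


def inference_service(name: str, storage_uri: str,
                      canary_percent: Optional[int] = None) -> dict:
    """Build a KServe InferenceService manifest (assumed v1beta1 schema)."""
    predictor = {
        # "Model as data": declare the format and where the model lives;
        # KServe selects a matching serving runtime and pulls the model.
        "model": {
            "modelFormat": {"name": "sklearn"},
            "storageUri": storage_uri,
        }
    }
    if canary_percent is not None:
        if not 0 <= canary_percent <= 100:
            raise ValueError("canary_percent must be between 0 and 100")
        # Share of traffic routed to the latest revision; the remainder
        # stays on the previously rolled-out revision.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }


if __name__ == "__main__":
    # Hypothetical bucket path: point this at your own exported model.
    manifest = inference_service("sklearn-iris", "gs://my-bucket/models/sklearn-iris",
                                 canary_percent=10)
    print(json.dumps(manifest, indent=2))
```

Once such a service is ready, the KServe V1 inference protocol accepts a POST to `/v1/models/sklearn-iris:predict` with a JSON body such as `{"instances": [[6.8, 2.8, 4.8, 1.4]]}`.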

and many more…

Kubeflow provides the integrations you need.
It integrates with MLflow for model registry, staging, and monitoring in production; Seldon Core for inference serving; and Apache Spark for parallel data processing.

Kubeflow trains ML models through operators such as TFJob, PyTorchJob, MXJob, XGBoostJob, and MPIJob.

You can also schedule your jobs with gang scheduling, so that all the Pods of a distributed training job are scheduled together or not at all.
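To give a feel for the training operators, the sketch below builds a `PyTorchJob` manifest as a Python dict, assuming the Training Operator’s `kubeflow.org/v1` schema with one master and N workers. The job name and image are hypothetical placeholders. Gang scheduling is configured separately and depends on a gang-capable batch scheduler being installed in the cluster, so it is left out of this sketch.

```python
import json


def pytorch_job(name: str, image: str, workers: int, gpus_per_replica: int = 0) -> dict:
    """Build a PyTorchJob manifest (assumed kubeflow.org/v1 Training Operator schema)."""

    def replica(replicas: int) -> dict:
        container = {
            # By default the operator looks for a container named "pytorch".
            "name": "pytorch",
            "image": image,
        }
        if gpus_per_replica > 0:
            container["resources"] = {"limits": {"nvidia.com/gpu": str(gpus_per_replica)}}
        return {
            "replicas": replicas,
            "restartPolicy": "OnFailure",
            "template": {"spec": {"containers": [container]}},
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical image: a container that runs your distributed training script.
    job = pytorch_job("mnist-ddp", "registry.example.com/mnist-ddp:latest", workers=3)
    print(json.dumps(job, indent=2))
```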


So, in this blog, we discussed Kubeflow and its components. We learned what machine learning pipelines are, how Kubeflow enables us to orchestrate complex machine learning workflows on Kubernetes, how to set up notebook servers, how to do hyperparameter tuning with Katib, and how to serve models in production with KServe.



Written by 

Durgesh Gupta is a Software Consultant working in the domain of AI/ML.