Migrating MLFlow Server To Cloud: Part 1

The cloud migration process involves moving all or part of an organization’s data, apps, and services from on-premises data centres to a public or private cloud, where they are accessible on demand over the Internet to authorized users. For most businesses considering cloud migration, the move is filled with promise and potential: scalability, flexibility, reliability, cost-effectiveness, improved performance and disaster recovery, and simpler, faster deployment.

Cloud migration does not mean simply lifting and shifting your applications to a cloud platform. Instead, it involves assessing the application’s architecture and checking whether it is compatible with the technology stack of the cloud platform. An application with a ‘stateful’ architecture will hardly benefit from the move: it will be difficult to deploy and, even once moved, it will not be able to scale. Hence, the first step towards cloud migration is to define the goals and objectives behind it and then assess whether the move to the cloud will be beneficial.

In this case study, we’ll explore how we migrated our machine learning infrastructure to the cloud. We’ll dive into the migration process that we followed. I’ll also cover the decisions that we made before and during the migration to make this happen.

Migration Objective

Our machine learning infrastructure ran on a standalone on-premises server. We struggled with the maintenance burden and lack of scalability of the bare-metal infrastructure, and with supporting operations from remote locations, especially while working from home during the coronavirus lockdown. So, we decided to migrate all of the infrastructure to the cloud.

We conducted a preliminary audit and found the following technologies currently in use:

  • Machine OS: Linux
  • Database: Postgres
  • MLFlow
  • Local Disk Persistence

Assessment Phase

The first step, before migration, is to calculate the cost of the move and the cost of what you are running in your data centers. This is useful if you’re planning a migration from an on-premises environment, a private hosting environment, or another cloud provider, or if you’re evaluating the opportunity to migrate and exploring what the assessment phase might look like. I have written a blog about assessment earlier. The following are the decisions we made for the migration:

  • Cloud Type: We chose a public cloud, as it’s cost-effective for small to medium-sized businesses. We went with Google Cloud Platform.
  • Approach: We chose a greenfield approach for the migration, as we are moving everything to the cloud.
  • Cloud Readiness: Since the data scientists generally use the UI for their operations, the move will be a cosmetic change for them, though it will involve some downtime.
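
The cost comparison at the start of this phase boils down to simple arithmetic. Here is a minimal sketch of it, assuming you can estimate a monthly on-premises cost, a monthly cloud cost, and a one-time cost of the move. The class, the function names, and all the figures are hypothetical placeholders, not numbers from our migration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostEstimate:
    """Hypothetical figures, all in the same currency."""
    on_prem_monthly: float     # e.g. hardware, power, maintenance effort
    cloud_monthly: float       # e.g. VMs, managed database, storage
    one_time_migration: float  # e.g. engineering effort, planned downtime


def monthly_saving(est: CostEstimate) -> float:
    """Positive when the cloud is cheaper to run per month."""
    return est.on_prem_monthly - est.cloud_monthly


def break_even_months(est: CostEstimate) -> Optional[float]:
    """Months until the one-time cost is recovered, or None if it never is."""
    saving = monthly_saving(est)
    if saving <= 0:
        return None
    return est.one_time_migration / saving


# Placeholder numbers purely for illustration.
estimate = CostEstimate(
    on_prem_monthly=1200.0,
    cloud_monthly=800.0,
    one_time_migration=4000.0,
)
print(monthly_saving(estimate))     # 400.0
print(break_even_months(estimate))  # 10.0
```

If `break_even_months` returns `None`, cost alone does not justify the move, and the decision has to rest on the other objectives such as scalability or remote operations.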

Planning Phase

After we had evaluated the cost of the move, we started looking at what to migrate. Moving all apps at the same time and the same way didn’t make sense. So, we created a migration plan on how we’ll carry out the migration.

  • Create an inventory of all the components being used.
  • Catalogue components according to their properties and dependencies.
  • Select the workloads that you want to migrate first.

Build an inventory

To scope your migration, you must first understand how many items, such as apps and hardware appliances, exist in your current environment, along with their dependencies. Building the inventory is a task that requires a significant effort, especially when you don’t have any automatic cataloguing system in place.

For each component in the environment, the following table highlights the most important technologies, its deployment procedure, and other requirements.

The following table is an example of the dependencies of the apps listed in the inventory. These dependencies are necessary for the apps to correctly function.
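
If you don’t have a cataloguing system, an inventory like this can also be kept in code. The following is a rough sketch, assuming a simple `Component` record. The component names come from this post, while the field names and helper functions are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    name: str
    technology: str
    depends_on: List[str] = field(default_factory=list)


# Components named in this post; the structure itself is an assumption.
inventory: Dict[str, Component] = {
    c.name: c
    for c in [
        Component("storage", "local disk persistence"),
        Component("database", "Postgres"),
        Component("mlflow-server", "MLFlow", depends_on=["storage", "database"]),
        Component("jenkins-ci", "Jenkins"),
    ]
}


def missing_dependencies(inv: Dict[str, Component]) -> Dict[str, List[str]]:
    """Return dependencies that are referenced but not catalogued."""
    missing: Dict[str, List[str]] = {}
    for comp in inv.values():
        unknown = [d for d in comp.depends_on if d not in inv]
        if unknown:
            missing[comp.name] = unknown
    return missing


def dependents_of(inv: Dict[str, Component], name: str) -> List[str]:
    """Return the components that would break if `name` were unavailable."""
    return sorted(c.name for c in inv.values() if name in c.depends_on)


print(missing_dependencies(inventory))       # {}
print(dependents_of(inventory, "database"))  # ['mlflow-server']
```

A check like `missing_dependencies` is a cheap way to notice that the inventory is incomplete before the migration plan is built on top of it.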

Select The Workloads

  • Storage data: High priority as a prerequisite for the MLFlow Server.
  • SQL server (Postgres): High priority as a prerequisite for the MLFlow Server.
  • MLFlow Server: Medium to high priority as it’ll incur a downtime.
  • Jenkins CI: Medium to low priority as it is standalone in our infrastructure.
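
The priorities above can be turned into a concrete migration order. Below is a minimal sketch that uses Python’s standard `graphlib` module (Python 3.9+) to sort the workloads so that prerequisites always come first, with priority as the tie-breaker. The numeric priority values and the `migration_order` helper are my own assumptions for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Workload -> prerequisites, as described in the list above.
prerequisites: Dict[str, Set[str]] = {
    "storage": set(),
    "sql-server": set(),
    "mlflow-server": {"storage", "sql-server"},
    "jenkins-ci": set(),
}

# Hypothetical numeric encoding: lower number = migrate earlier.
priority: Dict[str, int] = {
    "storage": 0,        # high
    "sql-server": 0,     # high
    "mlflow-server": 1,  # medium to high
    "jenkins-ci": 2,     # medium to low
}


def migration_order(prereqs: Dict[str, Set[str]], prio: Dict[str, int]) -> List[str]:
    """Order workloads so prerequisites come first, then by priority."""
    sorter = TopologicalSorter(prereqs)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready: List[str] = []
    order: List[str] = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        # Among the workloads whose prerequisites are done, pick the most urgent.
        ready.sort(key=lambda name: (prio[name], name))
        current = ready.pop(0)
        order.append(current)
        sorter.done(current)
    return order


print(migration_order(prerequisites, priority))
# ['sql-server', 'storage', 'mlflow-server', 'jenkins-ci']
```

Workloads with equal priority are ordered alphabetically here only to keep the output deterministic; storage and the SQL server could just as well be migrated in parallel.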

Mapping Resources to The Cloud

We decided which VMs and other services to use for the target machines in our move to the cloud.

Conclusion

In this blog, I’ve tried to provide simple guidance on the key decisions and steps required in the first two phases of a migration to the cloud, and on how to approach them. In the next blog, I’ll write about the last two phases: deployment and optimization of the environment.

Written by 

Sudeep James Tirkey is a software consultant with more than 2 years of experience. He likes to explore new technologies and trends in the IT world. His hobbies include playing football and badminton, reading, and travelling. Sudeep is familiar with programming languages such as Java, Scala, C, and C++, and he is currently working on DevOps and reactive technologies like Jenkins, DC/OS, Ansible, Scala, Java 8, Lagom and Kafka.