How to do Collaborative AI/ML Data Management with Studio9

What is Studio9?

Studio9 is an open source platform for doing collaborative Data Management & AI/ML anywhere. Whether your data is trapped in silos or you’re generating data at the edge, Studio9 gives you the flexibility to create AI and data engineering pipelines wherever your data is. You can also share your AI, Data, and Pipelines with anyone, anywhere. With Studio9, you can move between compute environments with little effort, while all your data and your work replicates automatically to wherever you want.

The major components of Studio9 are described below.

1. ORION – A service that consists of three components: Job Dispatcher, Job Supervisor, and Job Resource Cleaner. The Job Dispatcher forwards messages from RabbitMQ to the proper Job Supervisor, instantiating a new supervisor for each new job request. The Job Supervisor, one per job, is responsible for instantiating the job master for that job. The Job Resource Cleaner consumes messages from RabbitMQ and spins up a new JobResourcesCleanerWorker for each message, which then executes the tasks that clean up the resources.

2. ARIES – A microservice that allows read/write access to ElasticSearch. It stores Job Metadata, Heartbeats, and Job Results in ElasticSearch as documents. 

3. TAURUS – This service works as a message dispatcher using SQS/SNS.

4. BAILE – It receives messages from the UI service, Salsa, and forwards them to Cortex unless they are Online Prediction requests. In the case of Online Prediction, Salsa sends the messages to Taurus, which then forwards them to Cortex.

5. ARGO – A service designed to capture all configuration parameters for all job types or services. These parameters are saved by Argo in ElasticSearch. 

6. PEGASUS – A prediction storage service that receives messages from Taurus via Orion to upload data to RedShift. The messages contain metadata for the online prediction job and a CSV file with the prediction results.
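The Job Dispatcher behaviour described for Orion above (forward each message to the supervisor that owns the job, creating the supervisor on the first request) can be sketched in a few lines. This is only an illustration of the pattern, not Orion's actual code: the class names, the `job_id` message field, and the in-memory dictionary standing in for RabbitMQ delivery are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class JobSupervisor:
    """Hypothetical stand-in for Orion's per-job supervisor."""
    job_id: str
    received: list = field(default_factory=list)

    def handle(self, message):
        # A real supervisor would start and monitor the job master here.
        self.received.append(message)


class JobDispatcher:
    """Forwards each message to the supervisor that owns its job."""

    def __init__(self):
        self.supervisors = {}  # job_id -> JobSupervisor

    def dispatch(self, message):
        job_id = message["job_id"]  # assumed message field, for illustration
        # Instantiate a supervisor the first time a job id is seen.
        if job_id not in self.supervisors:
            self.supervisors[job_id] = JobSupervisor(job_id)
        supervisor = self.supervisors[job_id]
        supervisor.handle(message)
        return supervisor


dispatcher = JobDispatcher()
dispatcher.dispatch({"job_id": "job-1", "type": "submit"})
dispatcher.dispatch({"job_id": "job-1", "type": "heartbeat"})
dispatcher.dispatch({"job_id": "job-2", "type": "submit"})
print(sorted(dispatcher.supervisors))
```

Keeping one supervisor per job means that later messages for the same job reach the same object, which is the property the description above relies on.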

What are the use cases?

Computational Data Core that Automatically Scales and Adapts to You

Imagine never having to worry about how to keep your data organized, keep track of how, when, and where it was manipulated, keep track of where it came from, or keep track of all its meta-data. Now imagine being able to effortlessly and securely share your data and its lineage with your colleagues. Finally, imagine being able to do any Analytics or Machine Learning right where your data is. The Studio9 Computational Data Core makes this all possible.

The Data Science Replication Engine

Every step you perform in the Analytics & AI Lifecycle results in a valuable asset – a snippet of code, or a data transformation pipeline, or a table of newly engineered data, or an album of images or a new algorithm. Imagine having the power to instantly use any asset anyone creates to build bigger and better AI models that constantly expand your power to generate breakthroughs. Studio9 gives your team the frictionless ability to organize, track, share, and re-use all your Analytics & AI assets.

Automated Model Governance & Compliance

Studio9 allows the Model Risk Management, Regulatory Constraints, and Documentation Policies that your models must abide by to be encoded right into the pipeline and automatically reproduced every time Studio9 refreshes a model. This includes Model Explainability, Model Fairness & Bias Analytics, Model Uncertainty, and Model Drift analytics, all of which are performed automatically.

We don’t think AI makes machines smarter. It exists to make you smarter. The easier it is for you to make AI, the greater your ability to make breakthroughs. Whether you have unlimited compute resources in the cloud, or you are limited at the edge, your ability to make breakthroughs should be unencumbered. We are committed to giving you the breakthrough Data Management & AI/ML capabilities you need so you can create the breakthroughs you want – anywhere, anytime, and with anyone.

What can Studio9 do?

Reduce Your AI Workload 120x

Studio9 provides a large inventory of building blocks from which you can stitch together custom AI and Data Engineering pipelines. Rapidly assemble and test many different pipelines to create the AI you need. Turn your data into AI with near-zero effort and cost. Since Studio9 is an open platform, newer cutting-edge AI building blocks that are emerging every day are put right at your fingertips.

Studio9 helps you find the breakthroughs hidden in your data

Studio9 streamlines your burden of wrangling data. With its continuously expanding portfolio of building blocks, Studio9 makes it easier for you to clean, integrate, enrich, and harmonize your data. Do it all within your own infinitely scalable database environment without any of the hassles of managing your own database.

Push-button Model Deployment

You now have the power to deploy and run your Data Processing pipelines, Models, and AI anywhere – from infinitely scalable Cloud computing infrastructure to your own laptop to ultra-low power edge computing devices – with no additional programming or engineering effort required. We designed Studio9 for deployment flexibility so you can build, train, share, and execute your AI anywhere you want.

Studio9 Flow diagram

How to deploy Studio9 on Local?

To deploy Studio9 locally, we first need to understand the order in which the services must be deployed. Before deploying the services, however, a few prerequisites must be in place.

Prerequisites:

Mesos-Marathon Cluster Setup

Before you Begin:

  • Marathon : The Mesos-Marathon cluster works on a master-slave architecture, so we need one more machine. For this, we will create a VM on the local machine using Vagrant.
  • Vagrant : We will run Mesos-Master on the base machine, and Marathon as well as Mesos-Slave on the VM.
  • Mesos-Slave : The process to run mesos-slave on the slave is the same as for Mesos-Master; the only difference is the command used to start it. The address of the Mesos master goes before :5050 in the --master flag.
    • ./bin/mesos-slave.sh --master=:5050 --work_dir=/var/run/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --image_providers=appc,docker --isolation=filesystem/linux,docker/runtime

| Serial No. | Service | Version | Reference |
|---|---|---|---|
| 1 | Apache Zookeeper | 3.7.1 | Deploying Zookeeper on local |
| 2 | Apache Mesos | 1.7.2 | Deploying Mesos on local |
| 3 | Marathon | 1.5.0 | Deploying Marathon on local |
| 4 | Vagrant | Any | Deploying Vagrant on local |
| 5 | Mesos-Slave | | Same process as for Mesos-Master; the only difference is the start command shown above |
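Because the mesos-slave command above is long and easy to mistype (a stray space inside a flag is enough to break it), one option is to assemble it programmatically. The following is a minimal sketch that only rebuilds the flags listed above; the function name and the example master address `192.168.33.10` are hypothetical and should be replaced with your own values.

```python
import shlex


def mesos_slave_command(master_host, master_port=5050):
    """Build the mesos-slave start command as an argument list."""
    return [
        "./bin/mesos-slave.sh",
        f"--master={master_host}:{master_port}",
        "--work_dir=/var/run/mesos",
        "--log_dir=/var/log/mesos",
        "--containerizers=docker,mesos",
        "--image_providers=appc,docker",
        "--isolation=filesystem/linux,docker/runtime",
    ]


# 192.168.33.10 is a made-up address used only for illustration.
print(shlex.join(mesos_slave_command("192.168.33.10")))
```

The resulting list can be passed to `subprocess.run` from the Mesos installation directory on the slave VM.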

Now, we will deploy the below services:

Before you begin:

  • All services must be deployed in the order shown in the table.

| Serial No. | Service | Reference |
|---|---|---|
| 1 | Elasticsearch | Deploying Elastic Search on local |
| 2 | MongoDB | Deploying MongoDB on local |
| 3 | RabbitMQ | Deploying RabbitMQ on local |
| 4 | Postgres | Deploying Postgres on local |
| 5 | Aries | Deploying Aries Service on local |
| 6 | Argo | Deploying Argo Service on local |
| 7 | Orion | Deploying Orion Service on local |
| 8 | Cortex | Deploying Cortex Service on local |
| 9 | Pegasus | Deploying Pegasus Service on local |
| 10 | Taurus | Deploying Taurus Service on local |
| 11 | UM-Service | Deploying UM-Service on local |
| 12 | Baile | Deploying Baile Service on local |
| 13 | Salsa | Deploying Salsa Service on local |

How to create a Docker image?

Step 1: Whenever we change something in the code, we need to build a new Docker image.

Step 2: Run the command below to build the image from the Dockerfile.

If you are in the same directory as the Dockerfile:

 docker build -t <image_name>:<version> .

Example :-

docker build -t python:1.0 .

If the Dockerfile is in a different directory, pass the path to that directory (the build context) at the end of the command:

docker build -t <image_name>:<version> ./<PATH to Dockerfile directory>

Example :-

docker build -t python:1.0 ./<PATH to Dockerfile directory>

Step 3: Before pushing, tag the image with your registry user name and repository name:

docker tag <image_name>:<version> <user_name>/<repo_name>:<version>

Example :-

docker tag python:1.0 username/python:1.0

Step 4: Now we can push the image to the docker hub or other container registry:

docker push username/python:1.0 

Step 5: Now update the image name in the code, or wherever this particular image is used, so that it points to the new image.
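The build, tag, and push steps above can also be scripted so they always run in the same order. The sketch below only generates the three docker commands shown in this section and prints them by default; the helper names `image_release_commands` and `run_all` are made up for illustration, and `username` is the same placeholder used in the examples above.

```python
import subprocess


def image_release_commands(image, version, user, context="."):
    """Return the build, tag, and push commands for one image."""
    local_ref = f"{image}:{version}"
    remote_ref = f"{user}/{image}:{version}"
    return [
        ["docker", "build", "-t", local_ref, context],
        ["docker", "tag", local_ref, remote_ref],
        ["docker", "push", remote_ref],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


run_all(image_release_commands("python", "1.0", "username"))
```

Calling `run_all(..., dry_run=False)` executes the commands for real, which requires Docker to be installed and a prior `docker login` for the push step.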

How to deploy Studio9 using Docker-Compose?

We’ll be deploying Studio9 on local using a docker-compose file.

Prerequisites

  • OS: Ubuntu 16.04 LTS – 4vCPUs and 16GB memory.
  • Mesos-marathon Cluster
  • AWS account
  • AWS IAM
  • AWS S3 buckets
  • AWS S3 buckets accessible to AWS IAM
  • Docker should be installed on your local system. If you don’t have Docker installed, follow the Docker installation guide first.

Once the prerequisites are in place:

  • After successfully installing Docker, clone the Repository.
  • Run the Docker Compose file with the command below:
docker-compose up -d

or

docker compose up -d
  • If you want to see the logs, run it without -d so that the logs stream to the terminal:
docker-compose up
  • To stop the services, use the command below:
docker compose down

NOTE: Use the above commands in the directory where the docker-compose file exists.
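Since the section shows both spellings of the command (`docker-compose` and `docker compose`), a small helper can pick whichever is available on the machine. This is a sketch under the assumption that, when the standalone `docker-compose` binary is missing, the `docker compose` plugin is installed; the function names are hypothetical.

```python
import shutil


def compose_base(which=shutil.which):
    """Return the Compose invocation to use on this machine."""
    # Prefer the standalone binary if it is on PATH.
    if which("docker-compose"):
        return ["docker-compose"]
    # Assumption: otherwise the "docker compose" plugin is available.
    return ["docker", "compose"]


def compose_command(action, detached=False, which=shutil.which):
    """Build a full command such as 'docker-compose up -d'."""
    cmd = compose_base(which) + [action]
    if action == "up" and detached:
        cmd.append("-d")
    return cmd


print(" ".join(compose_command("up", detached=True)))
print(" ".join(compose_command("down")))
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the defaults are enough. As noted above, the resulting command must be run from the directory that contains the docker-compose file.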

Explanation of Docker Compose

To run Studio9 locally, we use docker-compose.

  • We use a single network, ‘studio9’, for all the services that run for Studio9.
  • There are 17 services that will be deployed on the local machine to run Studio9.
  • Four volumes are used in Studio9: three for Elasticsearch and one for MongoDB.
  • The Elasticsearch master node is accessible at port 9200.
  • The Kibana service will start after the Elasticsearch nodes are up and will be accessible at port 5601.
  • The Mongo Express service depends on Mongo and will be accessible at 8081.
  • Zookeeper uses the same network, ‘studio9’, and will be accessible at 2181.
  • RabbitMQ is accessible at ports 5672 and 15672.
  • The Aries service depends on the Elasticsearch nodes and will be accessible at 9000.
  • The Cortex service depends on Aries and RabbitMQ and will be accessible at 9000.
  • The Argo service also depends on the Elasticsearch nodes and will be accessible at 9000.
  • The Gemini service depends on Zookeeper and SQL-Server and will be accessible at 9000.
  • The Taurus service depends on RabbitMQ, Cortex, Baile, Argo, and Aries and will be accessible at 9000.
  • The Orion service depends on Cortex, Zookeeper, and RabbitMQ and will be accessible at 9000.
  • The Pegasus service depends on Taurus, RabbitMQ, and Postgres and will be accessible at 9000.
  • The UM service depends on Mongo and will be accessible at 9000.
  • The Baile service depends on Mongo, UM service, Aries, Cortex, SQL-Server, and Zookeeper and will be accessible at 9000.
  • SQL-Server depends on UM Service and will be accessible at 9000.
  • The Salsa service is responsible for the UI of Studio9; it depends on Baile and uses port 80.
  • The Postgres service depends on postgres-db and will be accessible at 8080.
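The "depends on" relationships listed above form a dependency graph, and a valid start order can be derived from it with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The service keys are illustrative names based on the list above and may not match the names in the actual docker-compose file.

```python
from graphlib import TopologicalSorter

# service -> services it depends on, transcribed from the list above.
DEPENDS_ON = {
    "kibana": {"elasticsearch"},
    "mongo-express": {"mongo"},
    "aries": {"elasticsearch"},
    "cortex": {"aries", "rabbitmq"},
    "argo": {"elasticsearch"},
    "gemini": {"zookeeper", "sql-server"},
    "taurus": {"rabbitmq", "cortex", "baile", "argo", "aries"},
    "orion": {"cortex", "zookeeper", "rabbitmq"},
    "pegasus": {"taurus", "rabbitmq", "postgres"},
    "um-service": {"mongo"},
    "baile": {"mongo", "um-service", "aries", "cortex",
              "sql-server", "zookeeper"},
    "sql-server": {"um-service"},
    "salsa": {"baile"},
    "postgres": {"postgres-db"},
}


def start_order(depends_on):
    """Return services ordered so dependencies come first."""
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(depends_on).static_order())


for position, service in enumerate(start_order(DEPENDS_ON), start=1):
    print(position, service)
```

Several orders can satisfy the same graph, so the printed sequence is one valid option rather than the only one. Docker Compose resolves this ordering itself from each service's dependencies; the sketch is mainly useful for checking that the graph has no cycles.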

Appendix

Refer to our Git Repo for complete details of the local setup of Studio9 (AI/ML).

Refer to our YouTube channel to see a live demo of Studio9.

Written by 

Rahul Miglani is Vice President at Knoldus and heads the DevOps Practice. He is a DevOps evangelist with a keen focus to build deep relationships with senior technical individuals as well as pre-sales from customers all over the globe to enable them to be DevOps and cloud advocates and help them achieve their automation journey. He also acts as a technical liaison between customers, service engineering teams, and the DevOps community as a whole. Rahul works with customers with the goal of making them solid references on the Cloud container services platforms and also participates as a thought leader in the docker, Kubernetes, container, cloud, and DevOps community. His proficiency includes rich experience in highly optimized, highly available architectural decision-making with an inclination towards logging, monitoring, security, governance, and visualization.
