What is Studio-9?

Studio9 is an open source platform for collaborative Data Management and AI/ML anywhere. Whether your data is trapped in silos or you’re generating it at the edge, Studio9 gives you the flexibility to create AI and data engineering pipelines wherever your data is, and to share your AI, data, and pipelines with anyone, anywhere. With Studio9 you can move between compute environments while your data and your work replicate automatically to wherever you want them.

The major components of Studio-9 are described below.

1. ORION – A service made up of three components: Job Dispatcher, Job Supervisor, and Job Resource Cleaner. The Job Dispatcher forwards messages from RabbitMQ to the proper Job Supervisor, instantiating a new supervisor for each new job request. The Job Supervisor is responsible for instantiating the job master for its job. The Job Resource Cleaner consumes messages from RabbitMQ and spins up a new JobResourcesCleanerWorker for each message, which then executes the tasks that clean up the resources.

2. ARIES – A microservice that allows read/write access to ElasticSearch. It stores Job Metadata, Heartbeats, and Job Results in ElasticSearch as documents.

3. TAURUS – This service works as a message dispatcher using SQS/SNS.

4. BAILE – Receives messages from Salsa, the UI service, and forwards them to Cortex unless they are Online Prediction requests. For Online Prediction, Salsa sends messages to Taurus, which then forwards them to Cortex.

5. ARGO – A service designed to capture all configuration parameters for all job types or services. These parameters are saved by Argo in ElasticSearch.

6. PEGASUS – A prediction storage service that receives messages from Taurus via Orion and uploads data to RedShift. The messages contain the metadata for the online prediction job and a CSV file with the prediction results.
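The Job Dispatcher behaviour described for Orion above (forward each message to the supervisor of its job, creating that supervisor on the first request) can be illustrated with a small in-memory sketch. This is not Orion's code: the class names, the `job_id` message field, and the use of plain dictionaries instead of RabbitMQ messages are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class JobSupervisor:
    """Hypothetical stand-in for a Job Supervisor: it owns exactly one job."""
    job_id: str
    received: list = field(default_factory=list)

    def handle(self, message: dict) -> None:
        # A real supervisor would start and monitor a job master here;
        # this sketch only records the message.
        self.received.append(message)


class JobDispatcher:
    """Routes each message to its job's supervisor, creating it on first sight."""

    def __init__(self) -> None:
        self.supervisors = {}  # job_id -> JobSupervisor

    def dispatch(self, message: dict) -> JobSupervisor:
        job_id = message["job_id"]  # assumed field name
        supervisor = self.supervisors.get(job_id)
        if supervisor is None:
            # First message for this job: instantiate a new supervisor.
            supervisor = JobSupervisor(job_id)
            self.supervisors[job_id] = supervisor
        supervisor.handle(message)
        return supervisor


if __name__ == "__main__":
    dispatcher = JobDispatcher()
    for msg in ({"job_id": "a"}, {"job_id": "b"}, {"job_id": "a"}):
        dispatcher.dispatch(msg)
    print(sorted(dispatcher.supervisors))
```

The point of the sketch is the one-supervisor-per-job mapping: two messages for the same job reach the same supervisor, while a new job gets a new one.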

What are the use cases?

Computational Data Core that Automatically Scales and Adapts to You

Imagine never having to worry about how to keep your data organized, keep track of how, when, and where it was manipulated, keep track of where it came from, or keep track of all its meta-data. Now imagine being able to effortlessly and securely share your data and its lineage with your colleagues. Finally, imagine being able to do any Analytics or Machine Learning right where your data is. The Studio9 Computational Data Core makes this all possible.

The Data Science Replication Engine

Every step you perform in the Analytics & AI Lifecycle results in a valuable asset – a snippet of code, or a data transformation pipeline, or a table of newly engineered data, or an album of images or a new algorithm. Imagine having the power to instantly use any asset anyone creates to build bigger and better AI models that constantly expand your power to generate breakthroughs. Studio9 gives your team the frictionless ability to organize, track, share, and re-use all your Analytics & AI assets.

Automated Model Governance & Compliance

Studio9 allows the Model Risk Management rules, Regulatory Constraints, and Documentation Policies that your models must abide by to be encoded right into the pipeline and automatically reproduced every time Studio9 refreshes a model. This includes Model Explainability, Model Fairness & Bias Analytics, Model Uncertainty, and Model Drift analytics, all of which are performed automatically.

We don’t think AI makes machines smarter. It exists to make you smarter. The easier it is for you to make AI, the greater your ability to make breakthroughs. Whether you have unlimited compute resources in the cloud or you are limited at the edge, your ability to make breakthroughs should be unencumbered. We are committed to giving you the Data Management & AI/ML capabilities you need to create the breakthroughs you want: anywhere, anytime, and with anyone.

What can Studio9 do?

Reduce Your AI Workload 120x

Studio9 provides a large inventory of building blocks from which you can stitch together custom AI and Data Engineering pipelines. Rapidly assemble and test many different pipelines to create the AI you need. Turn your data into AI with near-zero effort and cost. Since Studio9 is an open platform, newer cutting-edge AI building blocks that are emerging every day are put right at your fingertips.

Studio9 helps you find the breakthroughs hidden in your data

Studio9 eases the burden of wrangling data. With its continuously expanding portfolio of building blocks, Studio9 makes it easier for you to clean, integrate, enrich, and harmonize your data. Do it all within your own infinitely scalable database environment without any of the hassles of managing your own database.

Push-button Model Deployment

You now have the power to deploy and run your Data Processing pipelines, Models, and AI anywhere – from infinitely scalable Cloud computing infrastructure to your own laptop to ultra-low power edge computing devices – with no additional programming or engineering effort required. We designed Studio9 for deployment flexibility so you can build, train, share, and execute your AI anywhere you want.

Studio9 Flow diagram

How to deploy Studio9 on a local machine?

To deploy Studio-9 locally, we need to understand the order in which the services must be deployed. Before deploying any service, check the prerequisites below.

Prerequisites:

OS: Ubuntu 16.04 LTS – 4vCPUs and 16GB memory.
Mesos-marathon Cluster
AWS account
AWS IAM Role
AWS S3 buckets
AWS S3 buckets accessible to AWS IAM

Mesos-Marathon Cluster Setup

Before you begin:

Vagrant: The Mesos-Marathon cluster works on a master-slave architecture, so we need one more machine. We will create it as a VM on the local machine using Vagrant.
Marathon: We will run Mesos-Master on the base machine, and Marathon as well as Mesos-Slave on the VM.
Mesos-Slave: The process to run Mesos-Slave on the VM is the same as for Mesos-Master; the only difference is the command used to start it:
./bin/mesos-slave.sh --master=:5050 --work_dir=/var/run/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --image_providers=appc,docker --isolation=filesystem/linux,docker/runtime

Serial No.	Service	Version	Reference
1	Apache Zookeeper	3.7.1	Deploying Zookeeper on local
2	Apache Mesos	1.7.2	Deploying Mesos on local
3	Marathon	1.5.0	Deploying Marathon on local
4	Vagrant	Any	Deploying Vagrant on local
5	Mesos-Slave		Same process as Mesos-Master; only the start command differs (see the command above).

Now we will deploy the services below.

Before you begin:

All services must be deployed in the order shown in the table.

Serial No.	Service	Reference
1	Elastic Search	Deploying Elastic Search on local
2	MongoDB	Deploying MongoDB on local
3	RabbitMQ	Deploying RabbitMQ on local
4	Postgres	Deploying Postgres on local
5	Aries	Deploying Aries Service on local
6	Argo	Deploying Argo Service on local
7	Orion	Deploying Orion Service on local
8	Cortex	Deploying Cortex Service on local
9	Pegasus	Deploying Pegasus Service on local
10	Taurus	Deploying Taurus Service on local
11	UM-Service	Deploying UM-Service on local
12	Baile	Deploying Baile Service on local
13	Salsa	Deploying Salsa Service on local
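Because the services must come up in order, it can help to confirm that one service is accepting connections before deploying the next. The following is a minimal sketch of such a check using only the Python standard library. The helper names are hypothetical, and the host and port in the usage example (Elasticsearch on localhost:9200) are assumptions about a local setup, not values taken from the deployment guides.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll host:port until it accepts connections or the attempts run out."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last one
    return False


if __name__ == "__main__":
    # Assumed example: check Elasticsearch before deploying Aries.
    ready = wait_for_port("localhost", 9200, attempts=1)
    print("Elasticsearch reachable:", ready)
```

An open port only shows that something is listening; it does not prove the service has finished initializing, so treat this as a first-level check.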

How to create a Docker image?

Step 1: Whenever we change something in the code, we need to build a new Docker image.

Step 2: Run the command below to build the image from the Dockerfile.

If you are in the directory that contains the Dockerfile:

docker build -t <image_name>:<version> .

Example:

docker build -t python:1.0 .

If the Dockerfile is in a different directory, pass the path to that directory at the end of the command:

docker build -t <image_name>:<version> ./<PATH to Dockerfile directory>

Example:

docker build -t python:1.0 ./<PATH to Dockerfile directory>

Step 3: Before pushing, tag the image with your registry user name and repository name:

docker tag <image_name>:<version> <user_name>/<repo_name>:<version>

Example:

docker tag python:1.0 username/python:1.0

Step 4: Push the image to Docker Hub or another container registry:

docker push username/python:1.0

Step 5: Update the image name in the code, or wherever else this image is used.
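Steps 2 to 4 above are easy to script so that the image name, version, and user name stay consistent across the three commands. The sketch below only assembles the same `docker build`, `docker tag`, and `docker push` invocations shown above; the function names are hypothetical, and it assumes the Dockerfile is in the current directory and that the repository name equals the image name.

```python
import subprocess


def image_commands(image: str, version: str, user: str) -> list:
    """Return the docker build, tag, and push commands as argument lists."""
    local_ref = f"{image}:{version}"
    remote_ref = f"{user}/{image}:{version}"  # assumes repo name == image name
    return [
        ["docker", "build", "-t", local_ref, "."],
        ["docker", "tag", local_ref, remote_ref],
        ["docker", "push", remote_ref],
    ]


def run_all(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(image_commands("python", "1.0", "username"))
```

By default the script is a dry run that prints the commands; pass `dry_run=False` to execute them, which requires Docker to be installed and a prior `docker login` for the push.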

How to deploy Studio9 using Docker-Compose?

We’ll be deploying Studio9 on local using a docker-compose file.

Prerequisites

OS: Ubuntu 16.04 LTS – 4vCPUs and 16GB memory.
Mesos-marathon Cluster
AWS account
AWS IAM
AWS S3 buckets
AWS S3 buckets accessible to AWS IAM
Docker installed on your local system. If you don’t have Docker installed, kindly refer to this link.
After successfully installing Docker, clone the repository.

Run the Docker Compose file with one of the commands below, depending on whether you use the standalone docker-compose binary or the Compose plugin of the docker CLI:

docker-compose up -d

docker compose up -d

If you want to see the logs, run the services in the foreground with the command below:

docker-compose up

To stop the services, use the command below:

docker compose down

NOTE: Use the above commands in the directory where the docker-compose file exists.
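The commands above appear in two spellings, `docker-compose` and `docker compose`. A small helper can pick whichever is available on the machine. This is a sketch under assumptions: the function names are hypothetical, and finding `docker` on the PATH does not guarantee that its Compose plugin is installed.

```python
import shutil


def compose_command(which=shutil.which) -> list:
    """Pick a Compose invocation based on what is found on the PATH."""
    if which("docker-compose"):
        return ["docker-compose"]      # standalone binary
    if which("docker"):
        return ["docker", "compose"]   # assumes the Compose plugin is present
    raise RuntimeError("neither docker-compose nor docker was found on PATH")


def up(detached: bool = True, which=shutil.which) -> list:
    """Build the 'up' command; without -d the logs stream to the terminal."""
    argv = compose_command(which) + ["up"]
    if detached:
        argv.append("-d")
    return argv


def down(which=shutil.which) -> list:
    """Build the 'down' command that stops the services."""
    return compose_command(which) + ["down"]
```

The returned list can be passed to `subprocess.run(argv, check=True, cwd=...)`, with `cwd` set to the directory that contains the docker-compose file. The `which` parameter exists only so the lookup can be replaced in tests.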

Explanation of Docker Compose

To run Studio-9 locally, we use Docker Compose.

All Studio-9 services run on a single network, ‘studio9’.
17 services are deployed on the local machine to run Studio-9.
Studio-9 uses four volumes: three for Elasticsearch and one for MongoDB.
The Elasticsearch master node is accessible at port 9200.
The Kibana service starts after the Elasticsearch nodes are up and is accessible at port 5601.
The Mongo Express service depends on Mongo and is accessible at port 8081.
Zookeeper uses the same ‘studio9’ network and is accessible at port 2181.
RabbitMQ is accessible at ports 5672 and 15672.
The Aries service depends on the Elasticsearch nodes and is accessible at port 9000.
The Cortex service depends on Aries and RabbitMQ and is accessible at port 9000.
The Argo service also depends on the Elasticsearch nodes and is accessible at port 9000.
The Gemini service depends on Zookeeper and SQL-Server and is accessible at port 9000.
The Taurus service depends on RabbitMQ, Cortex, Baile, Argo, and Aries and is accessible at port 9000.
The Orion service depends on Cortex, Zookeeper, and RabbitMQ and is accessible at port 9000.
The Pegasus service depends on Taurus, RabbitMQ, and Postgres and is accessible at port 9000.
The UM service depends on Mongo and is accessible at port 9000.
The Baile service depends on Mongo, the UM service, Aries, Cortex, SQL-Server, and Zookeeper and is accessible at port 9000.
SQL-Server depends on the UM service and is accessible at port 9000.
The Salsa service provides the UI of Studio-9; it depends on Baile and is accessible at port 80.
The Postgres service depends on postgres-db and is accessible at port 8080.
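The dependencies listed above determine a valid start order. As a rough illustration, they can be written as a graph and sorted topologically with Python's standard `graphlib` module. The service labels below are shortened names chosen for this sketch, not the exact keys of the compose file, and the entries for Cortex and Pegasus assume that the lists above mean "Aries and RabbitMQ" and "Taurus, RabbitMQ, and Postgres".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# service -> services it depends on (labels are assumptions, see lead-in)
DEPENDS_ON = {
    "kibana": {"elasticsearch"},
    "mongo-express": {"mongo"},
    "aries": {"elasticsearch"},
    "cortex": {"aries", "rabbitmq"},
    "argo": {"elasticsearch"},
    "gemini": {"zookeeper", "sql-server"},
    "taurus": {"rabbitmq", "cortex", "baile", "argo", "aries"},
    "orion": {"cortex", "zookeeper", "rabbitmq"},
    "pegasus": {"taurus", "rabbitmq", "postgres"},
    "um-service": {"mongo"},
    "baile": {"mongo", "um-service", "aries", "cortex", "sql-server", "zookeeper"},
    "sql-server": {"um-service"},
    "salsa": {"baile"},
    "postgres": {"postgres-db"},
}


def start_order(depends_on: dict) -> list:
    """Return one valid start order: each service follows its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for position, service in enumerate(start_order(DEPENDS_ON), start=1):
        print(position, service)
```

Several orders can satisfy the same dependencies, so the output is one valid sequence rather than the only one. Docker Compose resolves this ordering itself from `depends_on`; the sketch is only meant to make the dependency structure easier to inspect.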