How to get the technical details for Studio9


Studio9 TechStack Details

Studio9 Main Modules

Studio9 Technology Used

Studio9 Problem Statement

Studio9 Milestones

Studio9 Result

Studio9

TechStack Details

Main Modules

  1. Library: An Asset Management Module where users can manage all their assets.
  2. Compose: A Pipeline Composition Module where users can construct their ETL, Data Engineering, and ML Pipelines. 
  3. Lab: A Pipeline Execution and Experimentation Module where users can run their Pipelines.
  4. Code: An IDE environment where users can write their own custom code, e.g. create and publish custom operators for use in Studio9 Pipelines.
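
To make the relationship between Compose, Lab, and Code a little more concrete, here is a minimal sketch of how custom operators might be chained into a pipeline. The names `PipelineSketch`, `Operator`, `dropBlanks`, and `toLengths` are hypothetical and chosen only for illustration; this is not the Studio9 API, and it uses plain Scala rather than Akka.

```scala
// Hypothetical sketch: none of these names come from the Studio9 codebase.
object PipelineSketch {

  // An operator is a named transformation from an input to an output.
  trait Operator[A, B] { self =>
    def name: String
    def run(input: A): B

    // Chain this operator with the next one to form a longer pipeline.
    def andThen[C](next: Operator[B, C]): Operator[A, C] = new Operator[A, C] {
      val name: String = self.name + " -> " + next.name
      def run(input: A): C = next.run(self.run(input))
    }
  }

  object Operator {
    // Helper for defining a custom operator from a plain function.
    def apply[A, B](opName: String)(f: A => B): Operator[A, B] = new Operator[A, B] {
      val name: String = opName
      def run(input: A): B = f(input)
    }
  }

  // Two custom operators, as a user might write them in the Code module.
  val dropBlanks: Operator[Seq[String], Seq[String]] =
    Operator[Seq[String], Seq[String]]("dropBlanks")(_.filter(_.trim.nonEmpty))
  val toLengths: Operator[Seq[String], Seq[Int]] =
    Operator[Seq[String], Seq[Int]]("toLengths")(_.map(_.length))

  // A pipeline, as it might be assembled in Compose and executed in Lab.
  val pipeline: Operator[Seq[String], Seq[Int]] = dropBlanks.andThen(toLengths)
}
```

Calling `PipelineSketch.pipeline.run(Seq("ab", " ", "cde"))` drops the blank entry and maps the rest to their lengths. Keeping each operator a small typed function is one way to let the compiler check that adjacent pipeline steps fit together.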

Technology Used for Studio9

Development Language: Scala 2.11.X

Frameworks:

  • Akka 2.5.X
  • Akka HTTP 10.0.X
  • Angular 7.3.X
  • Akka Persistence 2.2.X

Supporting Stack Libraries: Akka Streams, Akka Actors, Akka HTTP, spray-json, akka-slf4j, akka-http-cors, JavaMail (javax.mail), Akka Cluster

Database: MongoDB, SQL, Redshift

Build Tool: sbt 0.13.X

Code Quality Tools:

ScalaStyle: "org.scalastyle" %% "scalastyle-sbt-plugin" % "1.0.0"

sbt Coverage (scoverage): "org.scoverage" % "sbt-scoverage" % "1.5.1"

Native Packager: "com.typesafe.sbt" % "sbt-native-packager" % "1.3.21"
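
For readers who want to reproduce this tooling, the coordinates above would typically be declared as sbt plugins. The fragment below is a sketch that assumes the conventional `project/plugins.sbt` location; the original post does not show the actual build files, so treat the file name and layout as assumptions.

```scala
// project/plugins.sbt (assumed location; the usual place for sbt plugin declarations)

// Style checks
addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "1.0.0")

// Test coverage
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.5.1")

// Packaging
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.21")
```

With these in place, `sbt scalastyle` runs the style checks and `sbt clean coverage test coverageReport` produces a coverage report.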

Problem Statement Or Objective for Studio9

“Description of technology problems and what we are trying to solve, e.g., scalability and security”

  • To solve the scalability, accessibility, and rapid-obsolescence problems of AI, which result from runaway costs when AI must be performed at enterprise scale and from each AI model losing efficacy due to the blistering pace of algorithm innovation.
  • Studio9 gives the flexibility to create AI and data engineering pipelines wherever the data is.
  • Studio9 provides a large inventory of building blocks from which we can stitch together the custom AI and Data Engineering pipelines required.
  • Since Studio9 is an open platform, the newer cutting-edge AI building blocks that emerge every day are put right at your fingertips.

Milestones

“Description of the milestones: how we attempted to resolve it, and what is the outcome at the end of the fiscal year?”

Phase 1: Development of Basic Version of Studio9

  • Development of the basic components of Studio9, as listed below:

1. ORION – A service consisting of three components: Job Dispatcher, Job Supervisor, and Job Resource Cleaner. The Job Dispatcher forwards messages from RabbitMQ to the proper Job Supervisor, instantiating one for each new job request.

The Job Supervisor is responsible for instantiating a job master for each new job, so every new job gets its own supervisor setup. The Job Resource Cleaner consumes messages from RabbitMQ and spins up a new JobResourcesCleanerWorker for each message, which then executes the resource-cleanup tasks.

2. ARIES – A microservice that allows read/write access to ElasticSearch. It stores Job Metadata, Heartbeats, and Job Results in ElasticSearch as documents.

3. TAURUS – This service works as a message dispatcher using SQS/SNS.

4. BAILE – It receives messages from the UI service, Salsa, and forwards them to Cortex when they are not Online Prediction requests. For Online Prediction, Salsa sends messages to Taurus, which then sends them to Cortex.

5. ARGO – A service designed to capture all configuration parameters for all job types or services. These parameters are saved by Argo in ElasticSearch. 

6. PEGASUS – A prediction storage service that receives messages from Taurus via Orion to upload data to Redshift. Each message contains metadata for the online prediction job and a CSV file with the prediction results.
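
Two of the behaviours described in the list above, the per-job supervisor created by Orion's Job Dispatcher and the two routes a request can take to Cortex, can be sketched in plain Scala. This is an illustration under stated assumptions only: the real services are built on Akka and message queues, and every type and method name below (`JobMessage`, `dispatch`, `route`, and so on) is hypothetical.

```scala
import scala.collection.mutable

// Hypothetical sketch: in-memory stand-ins, not the actual Orion implementation.
object OrionSketch {
  // A job request as it might arrive from the message queue (assumed shape).
  final case class JobMessage(jobId: String, payload: String)

  // Stand-in for a Job Supervisor: one instance per job, recording what it receives.
  final class JobSupervisor(val jobId: String) {
    private val received = mutable.ArrayBuffer.empty[String]
    def handle(msg: JobMessage): Unit = { received += msg.payload; () }
    def history: List[String] = received.toList
  }

  // Stand-in for the Job Dispatcher: forwards each message to the supervisor
  // for its job, creating that supervisor the first time the job is seen.
  final class JobDispatcher {
    private val supervisors = mutable.Map.empty[String, JobSupervisor]

    def dispatch(msg: JobMessage): JobSupervisor = {
      val supervisor = supervisors.getOrElseUpdate(msg.jobId, new JobSupervisor(msg.jobId))
      supervisor.handle(msg)
      supervisor
    }

    def supervisorCount: Int = supervisors.size
  }
}

// Hypothetical sketch of the two request paths from the UI to Cortex.
object RoutingSketch {
  sealed trait Service
  case object Salsa extends Service
  case object Baile extends Service
  case object Taurus extends Service
  case object Cortex extends Service

  // Online prediction goes through Taurus; everything else goes through Baile.
  def route(onlinePrediction: Boolean): List[Service] =
    if (onlinePrediction) List(Salsa, Taurus, Cortex)
    else List(Salsa, Baile, Cortex)
}
```

Dispatching two messages with the same `jobId` reuses one supervisor, while a new `jobId` creates another, which mirrors the "one supervisor per new job request" behaviour described for Orion.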

  • Development of the different assets of Studio9, such as:
  1. Data Assets: Data Assets in Studio9 are stored as one of three asset types: Albums, Tables, and Binary Datasets.
  2. Model Assets: Models created within or uploaded to Studio9 are stored as one of two asset types: Models and CV Models.
  3. Experimentation Assets
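
The asset categories listed above map naturally onto a sealed type hierarchy. The sketch below is only one possible modelling; all class names are hypothetical, and because the post does not name concrete experimentation asset types, `Experiment` is a single placeholder.

```scala
// Hypothetical sketch: an illustrative model of the asset categories, not Studio9 code.
object AssetSketch {
  sealed trait Asset { def name: String }

  // Data Assets: Albums, Tables, and Binary Datasets.
  sealed trait DataAsset extends Asset
  final case class Album(name: String) extends DataAsset
  final case class Table(name: String) extends DataAsset
  final case class BinaryDataset(name: String) extends DataAsset

  // Model Assets: Models and CV Models.
  sealed trait ModelAsset extends Asset
  final case class Model(name: String) extends ModelAsset
  final case class CVModel(name: String) extends ModelAsset

  // Experimentation Assets: placeholder, since the source lists no subtypes.
  final case class Experiment(name: String) extends Asset

  // Because the hierarchy is sealed, the compiler can check this match is exhaustive.
  def category(asset: Asset): String = asset match {
    case _: DataAsset  => "data"
    case _: ModelAsset => "model"
    case _: Experiment => "experimentation"
  }
}
```

For example, `AssetSketch.category(AssetSketch.Table("sales"))` classifies a table as a data asset.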

Phase 2: Deployment of Studio9 on Cloud

  • Studio9 was first deployed on a DC/OS environment.
  • We then migrated Studio9 to an AWS EKS cluster and made it available for public access.
  • The migration gave us high availability, a more secure environment, and lower cost.
  • We have also worked on deploying Studio9 in a local environment with minimal setup and hardware requirements.

Phase 3: Enhancement of Studio9

  • We have made Studio9 customizable based on user requirements.
  • Provided a customizable IDE.
  • Enhanced the pipeline feature so that users can customize pipelines as they want.
  • Several other features can also be customized depending on user requirements.

Challenges And/Or Uncertainty

  • Deployment
    • Studio9 was previously deployed on a DC/OS environment, but this infrastructure proved non-viable for Studio9 in terms of both high availability and cost.
    • We looked into possible solutions and finally experimented with an AWS cloud deployment on Kubernetes infrastructure.
    • Through this migration we not only made Studio9 highly available for public access, but also got it running more smoothly at half the cost of the previous deployment.
  • Proprietorship
    • The code published on a public website (i.e., available for download) is still maintained by Knoldus, but we are not sure whether we can hold or obtain proprietorship of it. This still needs to be looked into.

Result 

An open-source platform for collaborative Data Management and AI/ML anywhere we want.

Studio9 – http://salsa.s9.devopscloud.link/

Written by 

Rahul Miglani is Vice President at Knoldus and heads the DevOps Practice. He is a DevOps evangelist with a keen focus to build deep relationships with senior technical individuals as well as pre-sales from customers all over the globe to enable them to be DevOps and cloud advocates and help them achieve their automation journey. He also acts as a technical liaison between customers, service engineering teams, and the DevOps community as a whole. Rahul works with customers with the goal of making them solid references on the Cloud container services platforms and also participates as a thought leader in the docker, Kubernetes, container, cloud, and DevOps community. His proficiency includes rich experience in highly optimized, highly available architectural decision-making with an inclination towards logging, monitoring, security, governance, and visualization.
