Studio9 TechStack Details
Studio9 Main Modules
- Library: An Asset Management Module where users can manage all their assets.
- Compose: A Pipeline Composition Module where users can construct their ETL, Data Engineering, and ML Pipelines.
- Lab: A Pipeline Execution and Experimentation Module where users can run their Pipelines.
- Code: An IDE environment where users can write their own custom code, e.g. create and publish custom operators for use in Studio9 Pipelines.
Technology Used for Studio9
Development Language – Scala
Akka HTTP 10.0.X
Akka Persistence 2.2.X
Supporting Stack Libraries –
Akka Streams, Akka Actors, Akka HTTP, spray-json, akka-slf4j, akka-http-cors, JavaMail (javax.mail), Akka Cluster
Database: MongoDB, SQL, Redshift
Build Tool: Scala SBT 0.13.X
Code Quality Tools:
"org.scalastyle" %% "scalastyle-sbt-plugin" % "1.0.0"
sbt Coverage –
"org.scoverage" % "sbt-scoverage" % "1.5.1"
Native Packager –
"com.typesafe.sbt" % "sbt-native-packager" % "1.3.21"
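As a minimal sketch, the three plugin coordinates listed above would typically be registered with sbt's `addSbtPlugin`. The file name `project/plugins.sbt` is the usual sbt convention and is an assumption here, since the actual Studio9 build files are not shown in this document.

```scala
// project/plugins.sbt (assumed location; standard sbt convention)

// Static style checks for Scala sources
addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "1.0.0")

// Statement and branch coverage for the test suite
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.5.1")

// Packaging of the services into distributable artifacts
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.21")
```

With these plugins loaded, `sbt scalastyle` runs the style checks and `sbt coverage test coverageReport` produces a coverage report. How Studio9 wires these tasks into its own build is not described here.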
Problem Statement Or Objective for Studio9
“Description of technology problems and what we are trying to solve, e.g., scalability and security”
- Studio9 addresses the scalability, accessibility, and rapid-obsolescence problems of AI: costs run away when AI must be performed at enterprise scale, and each AI model loses efficacy because of the fast pace of algorithm innovation.
- Studio9 gives the flexibility to create AI and data engineering pipelines wherever the data is.
- Studio9 provides a large inventory of building blocks from which the required custom AI and data engineering pipelines can be stitched together.
- Because Studio9 is an open platform, the cutting-edge AI building blocks that emerge every day are put right at your fingertips.
“Description of the milestones: how we attempted to resolve it and what the outcome is at the end of the fiscal year”
Phase 1: Development of the Basic Version of Studio9
- Development of the basic components of Studio9, as listed below:
1. ORION – A service consisting of three components: Job Dispatcher, Job Supervisor, and Job Resource Cleaner. Job Dispatcher forwards messages from RabbitMQ to the proper Job Supervisor, instantiating one for each new job request.
Job Supervisor is responsible for instantiating a Job Master for each new job; every new job gets its own Job Supervisor. Job Resource Cleaner consumes messages from RabbitMQ and spins up a new JobResourcesCleanerWorker for each message, which then executes the resource-cleanup tasks.
2. ARIES – A microservice that allows read/write access to Elasticsearch. It stores Job Metadata, Heartbeats, and Job Results in Elasticsearch as documents.
3. TAURUS – This service works as a message dispatcher using SQS/SNS.
4. BAILE – Receives messages from the UI service, Salsa, and forwards them to Cortex unless the job is an Online Prediction. For Online Prediction, Salsa sends the messages to Taurus, which then forwards them to Cortex.
5. ARGO – A service designed to capture all configuration parameters for all job types and services. Argo saves these parameters in Elasticsearch.
6. PEGASUS – A prediction storage service that receives messages from Taurus via Orion and uploads the data to Redshift. The messages contain the metadata for an online prediction job and a CSV file with the prediction results.
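The message flow in items 1 and 4 can be illustrated with a small, dependency-free Scala sketch. It is not the Studio9 implementation: `JobMessage`, its fields, the `"OnlinePrediction"` label, and every class and method name below are hypothetical. The real services are built on Akka and RabbitMQ, which are replaced here by plain objects and an in-memory map.

```scala
import scala.collection.mutable

// Hypothetical message shape; the real Studio9 schema is not given in this document.
final case class JobMessage(jobId: String, jobType: String, payload: String)

// Item 4: Online Prediction traffic goes through Taurus, everything else directly to Cortex.
sealed trait Route
case object ViaTaurus extends Route
case object DirectToCortex extends Route

object Routing {
  // "OnlinePrediction" is an assumed label for the job type.
  def routeFor(msg: JobMessage): Route =
    if (msg.jobType == "OnlinePrediction") ViaTaurus else DirectToCortex
}

// Item 1: stand-in for a Job Supervisor; it only records the messages it receives.
final class JobSupervisor(val jobId: String) {
  private val received = mutable.ArrayBuffer.empty[JobMessage]
  def handle(msg: JobMessage): Unit = received += msg
  def handledCount: Int = received.size
}

// Item 1: stand-in for the Job Dispatcher; one supervisor is created per new job id.
final class JobDispatcher {
  private val supervisors = mutable.Map.empty[String, JobSupervisor]

  def dispatch(msg: JobMessage): JobSupervisor = {
    // Reuse the supervisor for a known job, or instantiate one for a new job request.
    val supervisor = supervisors.getOrElseUpdate(msg.jobId, new JobSupervisor(msg.jobId))
    supervisor.handle(msg)
    supervisor
  }

  def activeJobs: Int = supervisors.size
}
```

Dispatching two messages with the same `jobId` reaches the same supervisor, while a new `jobId` creates a new one. In an actor-based version the map lookup would presumably be replaced by looking up or spawning a child actor.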
- Development of the different asset types of Studio9, such as:
- Data Assets: Data Assets in Studio9 are stored as one of three asset types: Albums, Tables, and Binary Datasets.
- Model Assets: Models created within or uploaded to Studio9 are stored as one of two asset types: Models and CV Models.
- Experimentation Assets
Phase 2: Deployment of Studio9 on the Cloud
- Studio9 was first deployed on a DC/OS environment.
- Studio9 was then migrated to an AWS EKS cluster and made available for public access.
- The migration gave Studio9 high availability, a more secure environment, and lower cost.
- We also worked on deploying Studio9 in a local environment with minimal setup and hardware requirements.
Phase 3: Enhancement of Studio9
- We made Studio9 customizable to the user's requirements.
- Provided a customizable IDE.
- Enhanced the pipeline feature with options to customize pipelines as the user wants.
- Several other features can also be customized to the user's requirements.
Challenges And/Or Uncertainty
- Studio9 was previously deployed on a DC/OS environment, but this infrastructure proved non-viable for Studio9 in terms of both high availability and cost.
- We looked into possible solutions and finally experimented with an AWS cloud deployment on Kubernetes infrastructure.
- Through this migration we not only made Studio9 highly available for public access but also got it running more smoothly at half the cost of the previous deployment.
- The code published on a public website, i.e., available for download, is still maintained by Knoldus, but we are not yet sure whether we can obtain proprietorship of it. This still needs to be looked into.
An open source platform for doing collaborative Data Management & AI/ML anywhere we want.
Studio 9 – http://salsa.s9.devopscloud.link/