DC/OS: The Architecture #2


In my previous post, we went through the types of nodes, tasks, and distributed process management. Now we’re going under the hood to explore the components that make DC/OS what it is.

You’ll find the basic working of each component here. Let me show you around…


First, let’s check out the map we’re going to follow to get where we want to go. Although we’re only taking a stroll here, one should still be aware of where one is.

The following are the floors of this huge establishment. There are 3 floors, or layers:

  • Software Layer – The place where all the services are managed. Installation of services through packages is done here. Users may also install their own apps here. Examples of services: databases, log aggregators, CI tools, SCM, etc.
  • Platform Layer – Who makes the puppets dance? The puppeteer. This layer gives services the basic support they need, much as an OS does on your machine. All the components in this layer serve that purpose, and this is the layer we will discuss at length in this post.
  • Infrastructure Layer – This is your cluster: the collection of machines where DC/OS runs. It can be in the cloud or on physical machines.

You put your application on the Software Layer, the Platform Layer provides basic support to your application and gives you stats to monitor the app’s performance, and the Infrastructure Layer is where DC/OS itself runs.

To use DC/OS, users have the GUI or the CLI as the conduit through which DC/OS can be accessed.

Let’s focus on the platform layer and see how it behaves as if it’s an OS.

Platform Layer

The platform layer is different on each type of node, i.e. Master, Private, & Public nodes. There are some common components, as we can see from the following diagram…

Let’s check out every room on floor #2.

Just like an operating system, this layer provides the essential support an application needs to run in an environment. However, it must be noted that DC/OS is not an operating system. Hence it cannot be installed on a device and operated on its own; it requires an OS like Linux underneath.

Cluster Management

This component allows the various machines (physical or virtual) to be treated together as a single cluster. It hides the backstage work done by Mesos by providing high-level abstractions, interfaces & tools. Cluster management is the core of the platform layer.

Cluster management is done with the help of several running services. The Installer is one such service; it helps the user install DC/OS itself on the nodes. The GUI & CLI are the conduits through which a user can manage their clusters. To provide backup & restoration of DC/OS components, we have Backup. Apache Mesos manages resources and tasks & provides operator interfaces to ease cluster management. ZooKeeper provides consistent key-value storage for cluster configuration, and Exhibitor supervises ZooKeeper to keep it configured and running.

Cluster Orchestration

Everything that runs in DC/OS is a containerized task; therefore, these tasks and the resources they use are managed here. Orchestration of the most frequently used container-based jobs is provided too.

The orchestration is split by job duration. From that perspective, we have Marathon, which orchestrates long-lived containerized services. It will start your application, monitor it for failures, and heal them automatically. Metronome orchestrates short-lived containerized jobs, whether scheduled or immediate. These jobs are defined in JSON.
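To make the long-lived versus short-lived split a little more concrete, here is a minimal sketch of what the two kinds of JSON definitions might look like, built as Python dictionaries. The field names follow my understanding of the Marathon app and Metronome job formats; the IDs, commands, and resource numbers are made up for illustration, so check the official docs before using any of it.

```python
import json

# Long-lived service for Marathon (hypothetical ID, command, and sizes).
marathon_app = {
    "id": "/hello-service",
    "cmd": "python3 -m http.server 8080",
    "cpus": 0.1,        # fraction of a CPU core
    "mem": 64,          # MiB
    "instances": 2,     # Marathon keeps this many copies running
}

# Short-lived, scheduled job for Metronome (hypothetical ID and schedule).
metronome_job = {
    "id": "nightly-report",
    "run": {"cmd": "echo report", "cpus": 0.1, "mem": 32, "disk": 0},
    "schedules": [{"id": "nightly", "cron": "0 2 * * *", "enabled": True}],
}


def to_json(definition):
    """Serialize a definition the way it would be written to a .json file."""
    return json.dumps(definition, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(marathon_app))
    print(to_json(metronome_job))
```

The thing to notice is the shape: the service says how many instances should stay up, while the job says what to run once and, optionally, when.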

Container Runtimes

Since most jobs in DC/OS are containerized, we need a runtime for them. For example, Docker Engine is a runtime for running containers. These runtimes provide an isolated, operating-system-level environment.

Universal Container Runtime (the Mesos Containerizer) containerizes tasks in a configurable isolated environment. UCR supports multiple image formats, including Docker images, without using Docker Engine. Docker Engine is still included, along with orchestration for Docker images. Docker Containerizer is also provided; it delegates containerization of Mesos tasks to Docker Engine. Docker GC is the garbage collector that runs periodically to clean up Docker containers and images.
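As a rough illustration of choosing between the two runtimes, the sketch below builds a Marathon-style app definition for the same Docker image under either one. It assumes, as far as I know, that Marathon selects UCR with a container type of "MESOS" and Docker Engine with "DOCKER"; the helper name, app ID, and image are hypothetical.

```python
# Assumed mapping from a friendly runtime name to Marathon's container type.
CONTAINER_TYPES = {"ucr": "MESOS", "docker": "DOCKER"}


def app_with_runtime(app_id, image, runtime):
    """Build an app definition that runs `image` on the chosen runtime."""
    if runtime not in CONTAINER_TYPES:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return {
        "id": app_id,
        "cpus": 0.1,
        "mem": 64,
        "instances": 1,
        "container": {
            "type": CONTAINER_TYPES[runtime],
            "docker": {"image": image},
        },
    }


if __name__ == "__main__":
    # Same image, two runtimes: only the container type changes.
    print(app_with_runtime("/web", "nginx:stable", "ucr")["container"])
    print(app_with_runtime("/web", "nginx:stable", "docker")["container"])
```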

Logging & Metrics

User applications need to be monitored, especially when they are running for the first time. So we have logs, which give us ample information to help with debugging.

Network Metrics provides network-related information. Diagnostics reports component health status. Log exposes node, component, & task logs. Logrotate compresses and deletes historical log files. Metrics provides metrics about nodes, tasks & components. DC/OS also collects usage data and analytics to improve itself, which is done by Signal. Usage history and a service cache are maintained by History.
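The "compress and delete" idea behind log rotation can be sketched in a few lines. This is only a toy version of the technique, not how DC/OS or the logrotate tool is implemented; the function name and the numbered `.N.gz` naming scheme are my own assumptions.

```python
import gzip
import os
import shutil


def rotate(log_path, keep=3):
    """Compress the current log to <log>.1.gz and keep at most `keep` archives."""
    # Delete the oldest archive so the shift below never exceeds `keep` files.
    oldest = f"{log_path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift the remaining archives up by one: .2.gz -> .3.gz, .1.gz -> .2.gz.
    for n in range(keep - 1, 0, -1):
        src = f"{log_path}.{n}.gz"
        if os.path.exists(src):
            os.replace(src, f"{log_path}.{n + 1}.gz")
    # Compress the live log into the newest archive slot.
    with open(log_path, "rb") as plain, gzip.open(f"{log_path}.1.gz", "wb") as packed:
        shutil.copyfileobj(plain, packed)
    # Truncate the live log so the application starts with an empty file.
    open(log_path, "wb").close()
```

Calling `rotate("app.log", keep=2)` repeatedly leaves at most `app.log.1.gz` and `app.log.2.gz` on disk, with the newest content always in `.1.gz`.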


Networking

Since we are dealing with clusters made up of remote machines, and machines are identified by IP addresses, it is simpler to give them names. For that we need a DNS-like service for addressing these nodes. We also need proxying, load balancing, virtual IPs, etc.

Admin Router gives a unified control panel for components and services, covering load balancing, node-specific health, logging, etc. Mesos DNS provides domain-name-based service discovery within the cluster. DC/OS Net is an Erlang VM that hosts 3 networking services: dcos-dns provides DNS-based service discovery, dcos-overlay is a software-defined networking (SDN) overlay for UCR and Docker containers, and dcos-l4lb is a distributed layer-4 load balancer. Generate resolv.conf configures network name resolution by updating /etc/resolv.conf.
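To show what name-based discovery buys you, here is a small sketch that composes the kind of hostnames these services hand out. The patterns are assumptions taken from my understanding of the defaults (`<task>.<framework>.mesos` for Mesos DNS and `<vip>.<framework>.l4lb.thisdcos.directory` for the layer-4 load balancer); the function names are hypothetical, and nested service groups are deliberately left out.

```python
def mesos_dns_name(task, framework="marathon", domain="mesos"):
    """Hostname Mesos DNS would be expected to serve for a flat task name."""
    return f"{task}.{framework}.{domain}"


def l4lb_name(vip_label, framework="marathon"):
    """Hostname expected to resolve to a named virtual IP (VIP)."""
    return f"{vip_label}.{framework}.l4lb.thisdcos.directory"


if __name__ == "__main__":
    print(mesos_dns_name("web"))   # name for reaching a task directly
    print(l4lb_name("web"))        # name for reaching it through the load balancer
```

Either way, clients talk to a stable name instead of chasing the IP address of whichever node the task happened to land on.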

Package Management

This component manages the installation, upgrade, configuration & removal of applications and services. There are two types of packages: machine-level for components & cluster-level for user services.

Cosmos (the package manager) installs & manages DC/OS packages from package repositories such as Mesosphere Universe. For machine-level packages, we have Pkgpanda, which installs and manages DC/OS components.

Sockets & Timers

Several components can be configured to start on demand rather than running continuously and eating up your resources. These components are configured to use systemd sockets. Periodic execution, or a restart after a particular interval, requires a timer to trigger the action. Periodic restarts may be required to pick up the latest dependencies. For these, we rely on systemd timers.
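On-demand start works because systemd opens the listening socket itself and only launches the service when a connection arrives, handing over the already-open socket. A minimal sketch of the receiving side is below. It follows the socket-activation convention as I understand it (the `LISTEN_PID` and `LISTEN_FDS` environment variables, with descriptors starting at 3); the function name is hypothetical, and this is not code from any DC/OS component.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor under systemd's convention


def inherited_fds(environ=None, pid=None):
    """Return the file descriptors systemd passed to this process, if any."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # The variables only apply to the process systemd actually started.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


if __name__ == "__main__":
    fds = inherited_fds()
    print("socket-activated" if fds else "started normally", fds)
```

A service written this way can wrap each returned descriptor in a socket object and start accepting connections immediately, without ever binding a port itself.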


Storage

DC/OS offers multiple ways to provision and allocate disk space. One of those methods, external volumes, is managed by its own component.

REX-Ray orchestrates the provisioning, attachment, & mounting of external persistent volumes.
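For a feel of how an app asks for an external volume, here is a sketch of a Marathon-style definition with a volume section. The structure (an `external` block with the `dvdi` provider and the `rexray` driver option) reflects my recollection of the documented format and should be treated as an assumption; the helper name, volume name, and size are hypothetical.

```python
def app_with_external_volume(app_id, volume_name, container_path, size_gib):
    """Build an app definition that mounts an external persistent volume."""
    return {
        "id": app_id,
        "cmd": "tail -f /dev/null",   # placeholder command that keeps the task alive
        "cpus": 0.1,
        "mem": 32,
        "instances": 1,               # one task per external volume in this sketch
        "container": {
            "type": "MESOS",
            "volumes": [
                {
                    "containerPath": container_path,
                    "mode": "RW",
                    "external": {
                        "name": volume_name,
                        "size": size_gib,
                        "provider": "dvdi",
                        "options": {"dvdi/driver": "rexray"},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    app = app_with_external_volume("/hello", "my-test-vol", "data", 100)
    print(app["container"]["volumes"][0]["external"])
```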

IAM & Security [Enterprise Feature]

Identity and access are managed with the help of an internal database of users, user groups, and permissions. External identity providers can also be connected to it.

DC/OS IAM (Bouncer) controls access to DC/OS components and services by managing users, user groups, service accounts, permissions, & identity providers. Examples of external identity providers are LDAP, SAML, and OpenID Connect. CockroachDB is the database on which Bouncer relies. CockroachDB is a distributed SQL database built on transactional & strongly consistent key-value storage. The Certificate Authority issues signed digital certificates for secure communication. Vault is a tool for securely managing secrets, which are meant to be kept under tight access control. Vault provides a unified interface to any secret. DC/OS Secrets is the API that makes storing and retrieving secrets in Vault easy for the user.
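The users, groups, and permissions model can be pictured with a toy access check like the one below. It is purely illustrative: the class, method names, and resource strings are invented for this sketch and are not Bouncer's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIAM:
    # group name -> set of user names in that group
    group_members: dict = field(default_factory=dict)
    # principal (user or group) -> set of (resource, action) grants
    grants: dict = field(default_factory=dict)

    def grant(self, principal, resource, action):
        self.grants.setdefault(principal, set()).add((resource, action))

    def add_to_group(self, user, group):
        self.group_members.setdefault(group, set()).add(user)

    def allowed(self, user, resource, action):
        """A user is allowed if they, or any group they belong to, hold the grant."""
        principals = {user} | {
            g for g, members in self.group_members.items() if user in members
        }
        return any((resource, action) in self.grants.get(p, set()) for p in principals)


if __name__ == "__main__":
    iam = ToyIAM()
    iam.add_to_group("alice", "ops")
    iam.grant("ops", "service:marathon", "read")   # hypothetical resource name
    print(iam.allowed("alice", "service:marathon", "read"))
    print(iam.allowed("bob", "service:marathon", "read"))
```

Granting to a group rather than to each user is what keeps this manageable once the user list grows.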

Here ends our journey, and we got to know how DC/OS serves as the OS for the datacenter. I hope you find it useful.


Written by 

Software Consultant