I recently got the task of installing DC/OS on our local cluster. Previously I had installed DC/OS on AWS, but this time I had to do everything myself, from installing CentOS to creating the DC/OS cluster. In this post, I am going to share what I ran into while installing DC/OS. It took almost three days, but in the end I installed it successfully. It is now up and running, and I am using it as a testing environment, but getting it to this point involved many difficulties. So let’s start from the beginning.
I chose CentOS minimal because I wanted to get the maximum out of CentOS. I am familiar with Ubuntu’s environment and had never worked on CentOS. That is not a big issue, but I still want to mention it because some commands are different. I got seven machines in total: six machines with four cores and 8 GB of RAM each, and one server with four cores and 16 GB of RAM. Every machine is connected through the LAN with
Before starting the DC/OS installation, I configured all the prerequisites and updated the packages on all nodes. To keep it simple, I configured each node with a single pem file and disabled password login (habit).
Now let’s move to the part that matters the most: the installation of DC/OS.
DC/OS requires 1 node with 2 cores, 16 GB RAM and 60 GB HDD for the bootstrap node; 1, 3 or 5 nodes with 4 cores, 32 GB RAM and 120 GB HDD for the master nodes; and at least 1 node with 2 cores, 16 GB RAM and 60 GB HDD for the agent nodes. So I chose the server as the master (1 master), 1 machine as the bootstrap node and the remaining machines as agent nodes.
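To make the sizing figures above easier to check against real hardware, here is a small sketch of a comparison helper. It is not part of DC/OS; the function names (`meets_minimum`, `local_cores`, `local_ram_gb`) are made up for illustration, and the minimums are simply the numbers quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the DC/OS docs): compare a node's cores and
# RAM with the minimums quoted in this post for each role.

# meets_minimum ROLE CORES RAM_GB -> exit status 0 if the node is big enough
meets_minimum() {
  local role=$1 cores=$2 ram_gb=$3 min_cores min_ram
  case "$role" in
    bootstrap|agent) min_cores=2; min_ram=16 ;;
    master)          min_cores=4; min_ram=32 ;;
    *) echo "unknown role: $role" >&2; return 2 ;;
  esac
  [ "$cores" -ge "$min_cores" ] && [ "$ram_gb" -ge "$min_ram" ]
}

# Values of the local Linux machine. MemTotal is reported in kB and is a bit
# below the nominal size, so it is rounded to the nearest GB.
local_cores()  { nproc; }
local_ram_gb() { awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' /proc/meminfo; }

# Example call on a node that is meant to become an agent:
#   meets_minimum agent "$(local_cores)" "$(local_ram_gb)" \
#     && echo "ok" || echo "below the documented minimum"
```

With the machines described earlier, `meets_minimum agent 4 8` fails, so a cluster like mine sits below the documented figures.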
We need to stop and disable the firewall. It is a known Docker issue that firewalld interacts poorly with Docker. For details, see GitHub issue https://github.com/docker/docker/issues/16137 .
To disable the firewall, the DC/OS documentation provides this command:
|sudo systemctl stop firewalld && sudo systemctl disable firewalld|
Then I followed the remaining instructions, making sure that:
/opt/mesosphere must be in the same mountpoint as /
Do not remotely mount /var/lib/mesos
Do not mount /tmp with noexec.
Secure Shell (SSH) must be enabled on all nodes.
Internet Control Message Protocol (ICMP) must be enabled on all nodes.
Each node is network accessible from the bootstrap node.
Each node has unfettered IP-to-IP connectivity from itself to all nodes in the DC/OS cluster.
UDP must be open for ingress to port 53 on the masters. To attach to a cluster, the Mesos agent node service (dcos-mesos-slave) uses this port to find leader.mesos.
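A few of the checks in the list above can be scripted. The following is only a sketch, assuming a Linux node with `findmnt` (util-linux) and `ping` (iputils) installed; the function names are hypothetical and the node addresses in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of some prerequisite checks. Function names are made up.

# has_noexec OPTIONS -> status 0 if the comma-separated mount options
# contain "noexec" as a whole word.
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Print the mount options of the filesystem that holds /tmp.
tmp_mount_options() { findmnt -n -o OPTIONS --target /tmp; }

# reachable HOST -> status 0 if one ICMP echo is answered within 2 seconds.
reachable() { ping -c 1 -W 2 "$1" >/dev/null 2>&1; }

# Example usage on the bootstrap node (addresses are placeholders):
#   has_noexec "$(tmp_mount_options)" && echo "/tmp is mounted noexec, fix this"
#   for node in 192.0.2.11 192.0.2.12; do
#     reachable "$node" || echo "$node does not answer ICMP"
#   done
```

Matching on `,noexec,` rather than on the bare substring avoids false positives from option names that merely contain the word.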
After all that, I installed Docker on every node. The supported versions of Docker are 1.13.x, 1.12.x, and 1.11.x, so I installed the ____ version of Docker. According to the requirements, we need to disable the password prompt for sudo, but I was using the root user, so this was not an issue for me.
We also need to enable NTP, which I made sure to enable when installing CentOS. Now that all of the machines were ready, I moved on to the GUI installation (GUI for ease :p). To install DC/OS using the GUI installer, we need to follow the steps provided in the documentation.
First, we need to download the shell script on the bootstrap node using the following command. It took a few minutes.
|curl -O https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh|
After the download, execute the shell script with the --web flag. The installer then starts on the bootstrap node on port 9000. To begin the installation, open the installer in the browser at http://<bootstrap-node-ip>:9000 . The installation has three stages: PreFlight, Deploy and PostFlight. It starts with entering the Deployment Settings: Master Private IP List, Agent Private IP List, Agent Public IP List, Master Public IP, SSH Username, SSH Listening Port, and Private SSH Key. We also need to add the DC/OS Environment Settings: Upstream DNS Server and IP Detect Script. The IP detect script prints the unique IPv4 address of a node to STDOUT each time DC/OS is started on the node. The problem with my cluster was that the network interface name was different on every machine. So I had to write my own script, which is much the same as the given example script; the only difference is that it can retrieve the IP from every node of my cluster.
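One way to get an address without hard-coding an interface name is to ask the kernel which source address it would use for a given destination. The sketch below shows that idea; it is not the script I actually used, the function names are hypothetical, and the default target `8.8.8.8` is an assumption (the IP of a master node would work just as well, since `ip route get` only consults the routing table and sends no traffic).

```shell
#!/usr/bin/env bash
# ip-detect sketch: print this node's IPv4 address without naming an interface.

# extract_src reads `ip route get` output on stdin and prints the word that
# follows "src", which is the source address the kernel would pick.
extract_src() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# detect_ip [TARGET]: look up the route to TARGET and print the source address.
detect_ip() {
  ip -4 route get "${1:-8.8.8.8}" | extract_src
}

# In a real genconf/ip-detect file the last line would simply be:
#   detect_ip
```

Because the lookup goes through the routing table, the same file works on nodes whose interface names differ, which was exactly the situation on my cluster.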
After all of that was done, I began the installation. PreFlight and Deploy worked fine, but PostFlight gave me the error “Error executing DC/OS components health check: exit status 127”, which made me wonder whether I had skipped something or done something wrong. So I tried again, but it failed again with the same error. I searched for a solution online but got nowhere. Then I decided to ask on the DC/OS Slack channel. They told me to use the advanced installer instead of the GUI, so I repeated all the steps from the beginning.
With the advanced installer we need to do a few more tasks: pull the nginx Docker image; install tar, xz, unzip, curl and ipset on every node; disable SELinux or set it to permissive mode on each cluster node; add the nogroup group on each Mesos master and agent; and reboot the cluster for the changes to take effect.
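The extra preparation steps above translate into a handful of commands. This is a sketch assuming CentOS 7 with `yum` and a user that can run `sudo`; the `run`, `prepare_node` and `prepare_bootstrap` names are hypothetical. It defaults to a dry run that only prints the commands, so nothing changes until `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch of the additional preparation for the advanced installer.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps for every master and agent node.
prepare_node() {
  run sudo yum install -y tar xz unzip curl ipset
  # Switch SELinux from enforcing to permissive in the persistent config.
  run sudo sed -i s/SELINUX=enforcing/SELINUX=permissive/g /etc/selinux/config
  run sudo groupadd nogroup
  # The reboot makes the SELinux change take effect.
  run sudo reboot
}

# Step for the bootstrap node, which later serves the install packages.
prepare_bootstrap() {
  run sudo docker pull nginx
}
```

Calling `prepare_node` with the default settings lists the commands with a leading `+`, which is a convenient way to review them before running them for real.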
Also, with the advanced installer, we need to create the config.yaml file ourselves, which the GUI installer previously created for us. I used the DC/OS documentation to create config.yaml inside the genconf directory, which also contains the ip-detect file. Once I was done with all the required setup and files, I started installing DC/OS again, this time with the advanced installer. To begin the installation, we need to run the shell script that we downloaded previously. After this step, my directory looked like this:
genconf
├── config.yaml
└── ip-detect
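For reference, a minimal single-master config.yaml can be generated with a short function. This is a sketch, not my actual file: the `write_config` name, the cluster name `local-test`, the resolver addresses and the example URL and IP in the usage comment are all assumptions, and the keys shown are the basic ones from the DC/OS advanced installation documentation.

```shell
#!/usr/bin/env bash
# write_config DIR BOOTSTRAP_URL MASTER_IP
# Writes a minimal config.yaml for one master into DIR (hypothetical helper).
write_config() {
  local dir=$1 bootstrap_url=$2 master_ip=$3
  mkdir -p "$dir"
  cat > "$dir/config.yaml" <<EOF
---
bootstrap_url: $bootstrap_url
cluster_name: 'local-test'
exhibitor_storage_backend: static
master_discovery: static
master_list:
- $master_ip
resolvers:
- 8.8.8.8
- 8.8.4.4
EOF
}

# Example usage on the bootstrap node (URL and IP are placeholders):
#   write_config genconf http://192.0.2.10:8080 192.0.2.11
```

The ip-detect script sits next to this file in genconf, so both are in place before the downloaded shell script is run.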
After that, I followed every step in the documentation, from hosting the DC/OS install packages through an NGINX Docker container to installing DC/OS on every node.
It took a few minutes, but I was able to install DC/OS successfully.
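Those documented steps boil down to serving genconf/serve from the bootstrap node and then running dcos_install.sh on each node with a role argument. The sketch below prints the per-node commands; `install_arg` and `node_commands` are hypothetical helper names, and port 8080 and the URL in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# On the bootstrap node, per the DC/OS docs (port 8080 is an assumption):
#   sudo bash dcos_generate_config.sh
#   sudo docker run -d -p 8080:80 \
#     -v "$PWD/genconf/serve:/usr/share/nginx/html:ro" nginx

# install_arg ROLE: map a role to the argument dcos_install.sh expects.
install_arg() {
  case "$1" in
    master)       echo master ;;
    agent)        echo slave ;;
    public-agent) echo slave_public ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# node_commands BOOTSTRAP_URL ROLE: print the commands to run on one node.
node_commands() {
  local url=$1 arg
  arg=$(install_arg "$2") || return 1
  cat <<EOF
mkdir -p /tmp/dcos && cd /tmp/dcos
curl -O $url/dcos_install.sh
sudo bash dcos_install.sh $arg
EOF
}

# Example: node_commands http://192.0.2.10:8080 agent
```

Printing the commands rather than running them keeps the helper safe to try out, and the output can be pasted into an SSH session on each node.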