Apache Mesos is an open-source project for managing computer clusters, originally developed at the University of California, Berkeley. It sits between the application layer and the operating system and makes it easier to run applications efficiently in large-scale distributed environments.
In this blog, we will see how to set up a Mesos master and its slaves on EC2 from scratch, with HDFS for storage and Spark running on top of Mesos.
Step 1: Launch EC2 instances with the configuration below:
AMI: Ubuntu Server (ami-41e0b93b)
Instance Type: t2.medium at minimum for Mesos.
Network: the default VPC
Subnet: any availability zone in us-east
Number of Instances: depends on your requirements (one master plus one or more slaves).
Security Group: make sure all instances use the same security group; this makes configuration easier.
Monitoring: depends on usage.
Before launching the instances, download a new key pair (.pem file) or choose an existing key pair for accessing them.
Now, set up passwordless SSH from the master machine to all slave machines:
# copy the pem file from your local machine to the master
scp -i pem-file.pem pem-file.pem ubuntu@<public ip of master>:/home/ubuntu
# change the permission of the pem file to 400 (ssh refuses private keys that others can read)
ssh -i pem-file.pem ubuntu@<public ip of master> chmod 400 /home/ubuntu/pem-file.pem
# generate keys on the master (and likewise on every slave) from your local machine
ssh -i pem-file.pem ubuntu@<public ip of master> 'ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa'
# on the master: copy its public key to every slave
cat .ssh/id_rsa.pub | ssh -i pem-file.pem ubuntu@<private ip of slave1> "cat >> .ssh/authorized_keys"
cat .ssh/id_rsa.pub | ssh -i pem-file.pem ubuntu@<private ip of slave2> "cat >> .ssh/authorized_keys"
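With more than a couple of slaves, the last step is easier to run as a loop. The following is a minimal sketch to run on the master; the default key locations in PEM_FILE and PUB_KEY, the DRY_RUN switch and the function name push_key are assumptions for illustration, and the slaves' private IPs are passed as arguments.

```shell
#!/bin/sh
# Sketch: append the master's public key to authorized_keys on each slave.
# Run on the master. PEM_FILE, PUB_KEY, DRY_RUN and push_key are illustrative names.
set -eu

PEM_FILE="${PEM_FILE:-$HOME/pem-file.pem}"     # key pair copied to the master above
PUB_KEY="${PUB_KEY:-$HOME/.ssh/id_rsa.pub}"    # key generated by ssh-keygen

push_key() {
  target=$1
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # only report what would be done
    echo "append $PUB_KEY to ubuntu@$target:.ssh/authorized_keys"
    return 0
  fi
  # same effect as: cat .ssh/id_rsa.pub | ssh -i pem-file.pem ubuntu@<slave> "cat >> .ssh/authorized_keys"
  ssh -i "$PEM_FILE" "ubuntu@$target" 'cat >> .ssh/authorized_keys' < "$PUB_KEY"
}

# arguments: the slaves' private IPs
for slave in "$@"; do
  push_key "$slave"
done
```

Saved as a script (the name is up to you), running it with DRY_RUN=1 first prints the planned actions without touching the slaves.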
Step 2: Install & Configure Hadoop
Download Hadoop on all machines. You can SSH into each machine in turn or, if you are familiar with a configuration management tool such as Ansible, use that instead.
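# download the Hadoop 2.7.3 binary archive (not the -src one) on every machine
ssh -i pem-file.pem ubuntu@<hostname> wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
# unpack it to /usr/local/hadoop (the HADOOP_HOME set below)
ssh -i pem-file.pem ubuntu@<hostname> 'tar -xzf hadoop-2.7.3.tar.gz && sudo mv hadoop-2.7.3 /usr/local/hadoop'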
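Instead of logging in to each machine by hand, a small loop can send the same command to every host. This is a sketch, not a replacement for a tool like Ansible; PEM_FILE, the DRY_RUN switch and the function name run_on_hosts are hypothetical, and the hosts are the machines' public IPs.

```shell
#!/bin/sh
# Sketch: run one command on several machines over SSH.
# PEM_FILE, DRY_RUN and run_on_hosts are illustrative names.
set -eu

PEM_FILE="${PEM_FILE:-pem-file.pem}"

run_on_hosts() {
  remote_cmd=$1
  shift
  for h in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh -i $PEM_FILE ubuntu@$h $remote_cmd"
    else
      # -n keeps ssh from reading this script's stdin
      ssh -n -i "$PEM_FILE" "ubuntu@$h" "$remote_cmd"
    fi
  done
}

# Usage from your local machine (one public IP per machine):
#   run_on_hosts 'wget -q https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz' <ip1> <ip2> <ip3>
```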
Now configure the master and slaves. First, map each private IP address to a hostname in /etc/hosts on every machine:
<private ip of master> hadoop-master
<private ip of slave1> hadoop-slave-1
<private ip of slave2> hadoop-slave-2
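Because the hosts file must be identical on every machine, generating the lines from the IPs avoids typos. This sketch assumes the naming scheme above (hadoop-master, hadoop-slave-1, ...); the function name hosts_entries is hypothetical.

```shell
#!/bin/sh
# Sketch: print /etc/hosts lines for the naming scheme used in this setup.
# First argument: the master's private IP; the rest: the slaves' private IPs, in order.
set -eu

hosts_entries() {
  echo "$1 hadoop-master"
  shift
  n=1
  for ip in "$@"; do
    echo "$ip hadoop-slave-$n"
    n=$((n + 1))
  done
}

# Usage on every machine (check the output before appending it):
#   hosts_entries <master-ip> <slave1-ip> <slave2-ip> | sudo tee -a /etc/hosts
```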
Step 3: Configure the slaves file
On the master node, list the slave hostnames in the $HADOOP_HOME/etc/hadoop/slaves file, one per line, and remove the default localhost entry from it:
hadoop-slave-1
hadoop-slave-2
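Overwriting the file in one go both lists the slaves and drops localhost. A minimal sketch; write_slaves_file is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: replace the Hadoop slaves file with one hostname per line,
# which also removes the default "localhost" entry.
set -eu

write_slaves_file() {
  file=$1
  shift
  printf '%s\n' "$@" > "$file"   # '>' truncates the old content
}

# Usage on the master:
#   write_slaves_file "$HADOOP_HOME/etc/hadoop/slaves" hadoop-slave-1 hadoop-slave-2
```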
Step 4: Configure core-site.xml on the master
Set HADOOP_HOME in ~/.bashrc:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
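Appending to ~/.bashrc blindly duplicates the lines every time the setup is repeated. Here is a sketch of an idempotent append, assuming ~/.bashrc is the file your shell reads; add_line_once is a hypothetical name. The single quotes in the usage keep $PATH and $HADOOP_HOME unexpanded, so they are resolved when the shell starts.

```shell
#!/bin/sh
# Sketch: append a line to a file only if that exact line is not already present.
set -eu

add_line_once() {
  file=$1
  line=$2
  touch "$file"
  # -x: match the whole line, -F: treat it as a literal string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   add_line_once ~/.bashrc 'export HADOOP_HOME=/usr/local/hadoop'
#   add_line_once ~/.bashrc 'export PATH=$PATH:$HADOOP_HOME/bin'
```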
Create a directory named hadoop_data in /opt.
Note: make sure that /opt/hadoop_data is owned by hduser and that its permissions are 777.
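The directory, its owner and its mode can be set in one helper. This is a sketch that follows the note above (owner hduser, mode 777); prepare_data_dir and the SUDO variable are hypothetical.

```shell
#!/bin/sh
# Sketch: create the Hadoop data directory with the owner and mode from the note.
# Set SUDO= (empty) when already running as root.
set -eu

SUDO=${SUDO-sudo}

prepare_data_dir() {
  dir=$1     # e.g. /opt/hadoop_data
  owner=$2   # e.g. hduser
  $SUDO mkdir -p "$dir"
  $SUDO chown "$owner" "$dir"
  $SUDO chmod 777 "$dir"
}

# Usage on the master:
#   prepare_data_dir /opt/hadoop_data hduser
```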
Put these properties inside $HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop_data</value>
    <description>directory for hadoop data</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:54311</value>
    <description>data to be put on this URI</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:54311</value>
    <description>Use HDFS as file storage engine</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Step 5: Configure Spark
Install Spark and set SPARK_HOME:
export SPARK_HOME=<path where Spark is installed>
Copy your core-site.xml file from $HADOOP_HOME/etc/hadoop into the $SPARK_HOME/conf folder.
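If HADOOP_HOME or SPARK_HOME is unset, the copy fails with a confusing path error, so a guarded copy gives a clearer message. A minimal sketch; copy_core_site is a hypothetical name.

```shell
#!/bin/sh
# Sketch: copy core-site.xml into Spark's conf folder, with a clear error
# when HADOOP_HOME or SPARK_HOME is missing.
set -eu

copy_core_site() {
  if [ -z "${HADOOP_HOME:-}" ] || [ -z "${SPARK_HOME:-}" ]; then
    echo "error: set HADOOP_HOME and SPARK_HOME first" >&2
    return 1
  fi
  cp "$HADOOP_HOME/etc/hadoop/core-site.xml" "$SPARK_HOME/conf/"
}
```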
To start HDFS, go to $HADOOP_HOME on the master, format the NameNode and start the HDFS daemons:
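# format the NameNode (first run only: formatting wipes existing HDFS metadata)
hdfs namenode -format
# start the NameNode, the SecondaryNameNode and a DataNode on each host in the slaves file
$HADOOP_HOME/sbin/start-dfs.sh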
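Running the format command again later erases the HDFS metadata, so a guard that formats only once helps when the setup is scripted. This sketch assumes dfs.namenode.name.dir keeps its Hadoop 2.x default, ${hadoop.tmp.dir}/dfs/name (that is, /opt/hadoop_data/dfs/name with the core-site.xml above), and that a formatted directory contains current/VERSION; format_once, NAME_DIR and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch: format the NameNode only if it has not been formatted yet.
set -eu

NAME_DIR="${NAME_DIR:-/opt/hadoop_data/dfs/name}"   # assumed default location

format_once() {
  if [ -f "$NAME_DIR/current/VERSION" ]; then
    echo "NameNode already formatted, skipping"
  elif [ "${DRY_RUN:-0}" = 1 ]; then
    echo "would run: hdfs namenode -format"
  else
    hdfs namenode -format
  fi
}

# Usage on the master:
#   format_once && "$HADOOP_HOME/sbin/start-dfs.sh"
```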
Step 6: Configure Mesos
Download and build Mesos on all machines.
# on every machine: log in, then download and build Mesos 1.4.1
ssh -i pem-file.pem ubuntu@<hostname>
wget https://archive.apache.org/dist/mesos/1.4.1/mesos-1.4.1.tar.gz
# build dependencies (openjdk-8-jdk provides the JDK that the Mesos build and Hadoop need)
sudo apt-get -y install build-essential openjdk-8-jdk python-dev python-six python-virtualenv libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev zlib1g-dev
tar -xvzf mesos-1.4.1.tar.gz
cd mesos-1.4.1
mkdir build
cd build
../configure
make
sudo make install
These commands install the libmesos.so library into the /usr/local/lib folder.
Before starting the Mesos master and slaves, make sure the security group allows incoming traffic on all the required ports between the master and the slaves.
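The ports used by the cluster (for example 5050 for the Mesos master, 5051 for the Mesos slaves and 54311 for HDFS) should be accessible from anywhere, and the master's IP needs access to all ports on its slaves, because Spark executors choose random ports.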
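Since all instances share one security group, a single self-referencing rule (the source is the group itself) lets them reach each other on every TCP port, which covers Spark's random ports. This is a sketch using the AWS CLI, assuming it is installed and configured; the group ID is a placeholder and allow_intra_group_tcp is a hypothetical name. Ports that must be reachable from outside the group still need their own rules.

```shell
#!/bin/sh
# Sketch: allow all TCP traffic between members of one security group.
set -eu

allow_intra_group_tcp() {
  sg=$1   # placeholder: the ID of the group shared by all instances
  # the source is the same group, so only instances in it gain access
  set -- aws ec2 authorize-security-group-ingress \
    --group-id "$sg" --protocol tcp --port 0-65535 --source-group "$sg"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: allow_intra_group_tcp <security group id>
```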
Step 7: Run the Mesos Master and Slaves
- To start the master, run this command on the master machine from inside the mesos-1.4.1 directory:
./build/bin/mesos-master.sh --ip=<private ip of master> --work_dir=/tmp/mesos
- To start the slaves, run this command on every slave, also from inside the mesos-1.4.1 directory:
sudo ./build/bin/mesos-slave.sh --hadoop_home=$HADOOP_HOME --master=hadoop-master:5050 --ip=<private ip of the slave> --work_dir=/tmp/mesos
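With passwordless SSH from Step 1 and the hostnames in /etc/hosts, the slaves can be started from the master. This sketch assumes Mesos was built in ~/mesos-1.4.1 on each slave and Hadoop lives in /usr/local/hadoop; start_agent, MESOS_DIR, DRY_RUN and the log file name mesos-slave.log are hypothetical. nohup and the redirections let the agent keep running after the SSH session ends.

```shell
#!/bin/sh
# Sketch: start the Mesos agent on one slave from the master over SSH.
set -eu

MESOS_DIR="${MESOS_DIR:-mesos-1.4.1}"   # relative to the remote home directory

start_agent() {
  host=$1   # hostname from /etc/hosts, e.g. hadoop-slave-1
  ip=$2     # that slave's private IP, passed to --ip
  # only the mesos-slave command is backgrounded, with all its streams redirected
  cmd="cd $MESOS_DIR || exit 1; sudo nohup ./build/bin/mesos-slave.sh --hadoop_home=/usr/local/hadoop --master=hadoop-master:5050 --ip=$ip --work_dir=/tmp/mesos > mesos-slave.log 2>&1 < /dev/null &"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "ssh $host '$cmd'"
  else
    ssh "$host" "$cmd"
  fi
}

# Usage on the master:
#   start_agent hadoop-slave-1 <private ip of slave1>
#   start_agent hadoop-slave-2 <private ip of slave2>
```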
Step 8: Start Spark with a Mesos master URL
Go to the $SPARK_HOME/bin folder and start the Spark shell with the following command:
./spark-shell --master mesos://hadoop-master:5050
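Spark's Mesos support loads the native library through the MESOS_NATIVE_JAVA_LIBRARY environment variable, which should point at the libmesos.so installed into /usr/local/lib in Step 6. Here is a sketch of a launcher that checks for the library first; start_spark_shell and MESOS_LIB are hypothetical names.

```shell
#!/bin/sh
# Sketch: launch spark-shell on Mesos after checking for the native library.
set -eu

MESOS_LIB="${MESOS_LIB:-/usr/local/lib/libmesos.so}"   # installed by make install

start_spark_shell() {
  if [ ! -f "$MESOS_LIB" ]; then
    echo "error: $MESOS_LIB not found; build and install Mesos first" >&2
    return 1
  fi
  # Spark locates libmesos through this variable
  MESOS_NATIVE_JAVA_LIBRARY=$MESOS_LIB
  export MESOS_NATIVE_JAVA_LIBRARY
  "${SPARK_HOME:?set SPARK_HOME first}/bin/spark-shell" --master "mesos://${1:-hadoop-master:5050}"
}
```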
After completing the above steps, run a simple job in spark-shell to verify that everything is working.
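A quick check is to count a small distributed collection. The sketch feeds one Scala expression to spark-shell on stdin and looks for the result; it assumes SPARK_HOME and MESOS_NATIVE_JAVA_LIBRARY are already set and that the REPL prints the value as `res0: Long = 1000`. smoke_test is a hypothetical name.

```shell
#!/bin/sh
# Sketch: run one small job through spark-shell non-interactively and check the result.
set -eu

smoke_test() {
  master=${1:-mesos://hadoop-master:5050}
  # count 1..1000 on the cluster; spark-shell exits when its stdin ends
  if echo 'sc.parallelize(1 to 1000).count()' \
      | "${SPARK_HOME:?set SPARK_HOME first}/bin/spark-shell" --master "$master" 2>/dev/null \
      | grep -q 'Long = 1000'; then
    echo "cluster OK"
  else
    echo "smoke test failed" >&2
    return 1
  fi
}
```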