A Step-by-Step Guide to Setting Up a Multi-Node Mesos Cluster with Spark and HDFS on EC2

Apache Mesos is an open-source project for managing computer clusters, originally developed at the University of California, Berkeley. It sits between the application layer and the operating system and makes it possible to run applications efficiently across a large-scale distributed environment.

In this blog, we will see how to set up a Mesos master and slaves on EC2 from scratch.

Step 1: Launch EC2 instances with the configuration below:

AMI: Ubuntu Server (ami-41e0b93b)
Instance Type: The minimum requirement for Mesos is t2.medium.
Network: Default VPC
Subnet: Choose any availability zone in us-east
Number of Instances: Depends on requirement.
Security Group: Make sure all the instances use the same security group to make configuration easier.
Monitoring: Depends on usage.

Before launching the instances, a .pem key file needs to be downloaded, or you may choose your own key pair for accessing them.
Now, let's set up passwordless SSH from the master machine to all the slave machines:

# copy the pem file from the local machine to the master
scp -i pem-file.pem pem-file.pem ubuntu@<master-public-ip>:/home/ubuntu
# log in to the master and change the permission of the pem file to 400
ssh -i pem-file.pem ubuntu@<master-public-ip>
chmod 400 /home/ubuntu/pem-file.pem
# generate keys on the master (and on all slaves) with ssh-keygen
ssh-keygen -t rsa
# copy the master's public key to all slaves
cat ~/.ssh/id_rsa.pub | ssh -i pem-file.pem ubuntu@<slave1-private-ip> "cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh -i pem-file.pem ubuntu@<slave2-private-ip> "cat >> ~/.ssh/authorized_keys"
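
Before moving on, it is worth confirming that the key copy worked; an SSH from the master to a slave should now succeed without a password (the IP placeholder below is illustrative):

# from the master: should print the slave's hostname without prompting for a password
ssh ubuntu@<slave1-private-ip> hostname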

Step 2: Install & Configure Hadoop
Download Hadoop on all machines. To do this, first SSH into each machine; if you are familiar with a configuration management tool like Ansible, you may use that instead.

# download and unpack the Hadoop 2.7.3 binary release on all machines
ssh -i pem-file.pem ubuntu@<hostname>
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -xzf hadoop-2.7.3.tar.gz
sudo mv hadoop-2.7.3 /usr/local/hadoop

Now we have to configure the master and the slaves. First, map each private IP address to a hostname in the hosts file located in the /etc folder on every machine:

<private ip of master> hadoop-master
<private ip of slave1> hadoop-slave-1
<private ip of slave2> hadoop-slave-2
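
One way to do this (the tee commands and placeholder IPs below are just an illustration) is to append the entries on every machine:

# add the cluster hostnames to /etc/hosts; replace the placeholders with the real private IPs
echo "<private ip of master> hadoop-master"  | sudo tee -a /etc/hosts
echo "<private ip of slave1> hadoop-slave-1" | sudo tee -a /etc/hosts
echo "<private ip of slave2> hadoop-slave-2" | sudo tee -a /etc/hosts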

Step 3: Set the slave hostnames

Next, on the master node, set the slave hostnames in the $HADOOP_HOME/etc/hadoop/slaves file, and remember to remove the localhost entry from that file:

hadoop-slave-1
hadoop-slave-2
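
For instance, the file can be written on the master like this (a sketch; adjust the hostnames to match your /etc/hosts entries):

# overwrite the slaves file with the two slave hostnames
cat > /usr/local/hadoop/etc/hadoop/slaves <<EOF
hadoop-slave-1
hadoop-slave-2
EOF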

Step 4: Configure core-site.xml on the master
Set $HADOOP_HOME in .bashrc as:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

Create a directory named hadoop_data in the /opt folder.

Note: Make sure that the /opt/hadoop_data folder is owned by hduser and its permissions are set to 777, as in the sketch below.
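
A minimal sketch of those two steps (assuming the hduser account mentioned above already exists):

# create the Hadoop data directory, hand it to hduser, and open up the permissions
sudo mkdir -p /opt/hadoop_data
sudo chown hduser:hduser /opt/hadoop_data
sudo chmod 777 /opt/hadoop_data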

Inside core-site.xml, put these properties:

 
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop_data</value>
    <description>directory for hadoop data</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:54311</value>
    <description>data to be put on this URI</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:54311</value>
    <description>Use HDFS as file storage engine</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
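
Hadoop expects the same configuration on every node, so one option (not spelled out in the original steps) is to push the edited file from the master to the slaves:

# copy core-site.xml to both slaves over the passwordless SSH set up earlier
scp /usr/local/hadoop/etc/hadoop/core-site.xml ubuntu@hadoop-slave-1:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/core-site.xml ubuntu@hadoop-slave-2:/usr/local/hadoop/etc/hadoop/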

Step 5: Configure Spark

Install Spark and set SPARK_HOME:

export SPARK_HOME=<path where Spark is installed>

Copy your core-site.xml file from $HADOOP_HOME/etc/hadoop into the $SPARK_HOME/conf folder.
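
If Spark is not installed yet, a pre-built-for-Hadoop-2.7 package works with this setup; the version used below (2.2.1) and the install path are only assumptions for illustration:

# download a Spark binary built against Hadoop 2.7 and point SPARK_HOME at it
wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
tar -xzf spark-2.2.1-bin-hadoop2.7.tgz
export SPARK_HOME=/home/ubuntu/spark-2.2.1-bin-hadoop2.7
# share the Hadoop configuration with Spark
cp /usr/local/hadoop/etc/hadoop/core-site.xml $SPARK_HOME/conf/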

To run Hadoop, go to $HADOOP_HOME on the master and run:

# format the namenode (first run only)
hadoop namenode -format
# next, start HDFS by executing:
$HADOOP_HOME/sbin/start-dfs.sh
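
To confirm that HDFS came up, jps should list a NameNode on the master and a DataNode on each slave:

# list the running Java processes on each machine
jps
# optional: print a cluster-wide report of the registered datanodes
hdfs dfsadmin -report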

Step 6: Configure Mesos

Download and build Mesos on all machines.

ssh -i pem-file.pem ubuntu@<hostname>
wget http://www-us.apache.org/dist/mesos/1.4.1/mesos-1.4.1.tar.gz
sudo apt-get -y install build-essential python-dev python-six python-virtualenv libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev
sudo apt-get -y install zlib1g-dev
tar -xvzf mesos-1.4.1.tar.gz
cd mesos-1.4.1
mkdir build
cd build
../configure
make
sudo make install

The above commands will create the libmesos.so file in the /usr/local/lib folder.
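
Spark needs to know where this native library lives before it can talk to Mesos; a minimal sketch, based on the standard Spark-on-Mesos setup and assuming the default install path from the build above:

# point Spark at the Mesos native library
echo "export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so" >> $SPARK_HOME/conf/spark-env.sh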

Before running the Mesos master and slaves, make sure that all the ports required for traffic between the master and the slaves are open in the security group.

The ports shown in the screenshot below should be accessible from anywhere, and the slaves should allow all ports from the master's IP, since the Spark workers choose random ports to communicate.

[Screenshot: security group inbound rules]
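
As an illustration (the security group ID placeholder and the use of the AWS CLI are assumptions, not part of the original post), the rules could be added like this:

# open the Mesos master and agent ports to the outside
aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 5050-5051 --cidr 0.0.0.0/0
# allow every TCP port between instances in the same security group, since Spark picks random ports
aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 0-65535 --source-group <sg-id>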

Step 7: Run the Mesos Master and Slaves

  • To run the master, type this command on the master machine inside the mesos directory:
    ./build/bin/mesos-master.sh --ip=<master-private-ip> --work_dir=/tmp/mesos
  • To run the slaves, type this command on all the slave machines:
    sudo ./build/bin/mesos-slave.sh --hadoop_home=$HADOOP_HOME --master=hadoop-master:5050 --ip=<slave-private-ip> --work_dir=/tmp/mesos
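
Once both are up, you can check that the slaves have registered with the master, either in the web UI at http://hadoop-master:5050 or from the command line (a rough check, assuming curl is installed):

# count the agents currently registered with the master
curl -s http://hadoop-master:5050/master/slaves | grep -o '"hostname"' | wc -l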

Step 8: Start spark-shell and give the master URL as mesos://hadoop-master:5050

Go to the $SPARK_HOME/bin folder and start spark-shell using the following command:

./spark-shell --master mesos://hadoop-master:5050

After completing the above steps, you can execute a command in spark-shell to verify that everything is working fine.
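
Alternatively, you can submit one of the bundled examples against the same master (a sketch; the examples jar path below assumes a standard Spark 2.x binary layout):

# run the SparkPi example on the Mesos cluster, from the $SPARK_HOME/bin folder
./spark-submit --master mesos://hadoop-master:5050 \
  --class org.apache.spark.examples.SparkPi \
  ../examples/jars/spark-examples_*.jar 10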
