If we want to run a Spark cluster on a single standalone machine, we need to set up some configuration.
We will be using the launch scripts provided by Spark, but first there are a couple of configuration settings we need to make.
First, set up the Spark environment: open the following file, or create it from the template file spark-env.sh.template if it does not already exist (see the sketch below):
conf/spark-env.sh
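Assuming you are working from the Spark installation directory, one simple way to create the file from the bundled template is:

cp conf/spark-env.sh.template conf/spark-env.sh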
Then add configuration for the workers, for example:
export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=2
export SPARK_WORKER_DIR=/home/knoldus/work/sparkdata
Here SPARK_WORKER_MEMORY specifies the amount of memory a worker node may allocate to executors; if this value is not set, the default is the machine's total memory minus 1 GB. Since we are running everything on our local machine, we don't want the workers to use up all of our memory.
SPARK_EXECUTOR_MEMORY sets the default memory allocated to each executor (512 MB here).
SPARK_WORKER_INSTANCES specifies the number of worker instances to run on this machine; it is set to 2 because we want two slave nodes.
SPARK_WORKER_CORES specifies the number of cores each worker is allowed to use.
SPARK_WORKER_DIR is the directory in which applications are run, and it holds both logs and scratch space.
With the above configuration we get a cluster of 2 workers, each with 1 GB of memory and a maximum of 2 cores, i.e. up to 4 cores and 2 GB in total.
After setting up the environment, add the hostnames (or IP addresses) of the slaves to the following conf file:
conf/slaves
When using the launch scripts, this file identifies the hosts on which the slave nodes will run. Since we have a single standalone machine, we simply set localhost in this file.
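For this single-machine setup, conf/slaves contains just one entry:

localhost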
Now start the master with the following command:
sbin/start-master.sh
The master runs at spark://<system_name>:7077, for example spark://knoldus-dell:7077, and you can monitor it from the web UI at localhost:8080.
Now start the workers for this master with the following command:
sbin/start-slaves.sh
Now your standalone cluster is ready. To use it with the Spark shell, open the shell with the following flag:
spark-shell --master spark://knoldus-Vostro-3560:7077
You can also pass additional Spark configuration, such as driver memory, number of cores, etc.
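For example, something like the following (the flag values are only illustrative; adjust them to your machine):

spark-shell --master spark://knoldus-Vostro-3560:7077 --driver-memory 1g --total-executor-cores 2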
Now run the following commands in the Spark shell:
val file = sc.textFile("READ.md")
file.count()
file.take(3)
Now you can see which worker ran each task and which workers have completed their tasks in the master UI (localhost:8080).
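As a further illustration (not part of the original steps), here is a minimal word-count sketch you could run in the same shell session to see work spread across both workers; it reuses the file RDD created above:

val words = file.flatMap(line => line.split(" "))
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
counts.take(5)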