Set up an Apache Spark cluster on a single standalone machine


If we want to make a cluster on a single standalone machine, we need to set up some configuration.

We will be using the launch scripts that are provided by Spark, but first there are a couple of configurations we need to set.

First of all, set up the Spark environment: open the following file, or create it from the template file spark-env.sh.template if it is not available:

conf/spark-env.sh
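If you need to create it, a quick way is to copy the bundled template (assuming you run this from your Spark installation directory):

cp conf/spark-env.sh.template conf/spark-env.sh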

Then add some configuration for the workers, like:


export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=2
export SPARK_WORKER_DIR=/home/knoldus/work/sparkdata

Here SPARK_WORKER_MEMORY specifies the amount of memory you want to allocate to a worker node; if this value is not given, the default is the total available memory minus 1 GB. Since we are running everything on our local machine, we wouldn't want the slaves to use up all our memory.

SPARK_WORKER_INSTANCES specifies the number of worker instances; here it is set to 2 since we will create only 2 slave nodes.

SPARK_WORKER_DIR is the directory in which applications will run, and it will include both logs and scratch space.

With the above configuration we make a cluster of 2 workers; the memory and core settings apply per worker, so each worker gets 1 GB of memory and uses at most 2 cores, meaning the whole cluster can use up to 2 GB of memory and 4 cores.

SPARK_WORKER_CORES specifies the number of cores that each worker may use.

After setting up the environment, you should add the host names (or IP addresses) of the slaves to the following conf file:

conf/slaves

When using the launch scripts, this file is used to identify the host names of the machines on which the slave nodes will run. Since we have a single standalone machine, we set localhost in the slaves file.
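If conf/slaves does not exist, you can create it from the bundled conf/slaves.template; for this single-machine setup it only needs the one line below:

localhost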

Now start the master with the following command:

sbin/start-master.sh

The master is now running at spark://system_name:7077 (for example spark://knoldus-dell:7077), and you can monitor it at localhost:8080.

(Screenshot: Spark master web UI at localhost:8080)

Now start the workers for the master with the following command:

sbin/start-slaves.sh
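You can quickly verify that everything came up with jps (assuming a JDK's jps is on your PATH); with the configuration above it should list one Master process and two Worker processes:

jps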

(Screenshot: master UI showing the two registered workers)
Now your standalone cluster is ready. Use it with the Spark shell by opening the shell with the following flag:

spark-shell --master spark://knoldus-Vostro-3560:7077

You can also add some Spark configuration, like driver memory, number of cores, etc.
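For instance, a possible variant of the above command that also caps the driver memory and the total cores used by the shell (these are standard spark-shell flags; adjust the values and host name for your machine):

spark-shell --master spark://knoldus-Vostro-3560:7077 --driver-memory 1g --total-executor-cores 2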

Now run the following commands in the Spark shell:

val file = sc.textFile("README.md")
file.count()
file.take(3)

Now you can see which worker did the work and which worker completed the task in the master UI (localhost:8080).
(Screenshot: master UI at localhost:8080 after running the job)
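When you are done, you can shut the cluster down with the matching stop scripts in the same sbin directory:

sbin/stop-slaves.sh
sbin/stop-master.sh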
