Set up an Apache Spark standalone cluster on a single machine


To make a cluster on a single standalone machine, we need to set up some configuration.

We will use the launch scripts provided by Spark, but first there are a couple of configurations we need to set.

First of all, set up the Spark environment: open the following file, or create it from the template file spark-env.sh.template if it is not available:

conf/spark-env.sh
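If the file does not exist yet, it can be created from the template shipped with Spark (this assumes your current directory is the Spark installation directory):

```shell
# create conf/spark-env.sh from the template shipped with Spark
cp conf/spark-env.sh.template conf/spark-env.sh
```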

and add some configuration for the workers, for example:


export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=2
export SPARK_WORKER_DIR=/home/knoldus/work/sparkdata

Here SPARK_WORKER_MEMORY specifies the amount of memory you want to allocate to a worker node; if this value is not given, the default is the total memory available minus 1 GB. Since we are running everything on our local machine, we wouldn't want the slaves to use up all our memory.

SPARK_WORKER_INSTANCES specifies the number of worker instances; here it is set to 2 since we will create only 2 slave nodes.

SPARK_WORKER_DIR is the directory in which applications will run; it will contain both logs and scratch space.

SPARK_WORKER_CORES specifies the maximum number of cores each worker may use.

With the above configuration we get a cluster of 2 workers, each with 1 GB of worker memory and a maximum of 2 cores; the cluster as a whole can therefore use up to 4 cores and 2 GB of memory.
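As a quick sanity check, the totals implied by the settings above can be worked out directly (the values below simply mirror the spark-env.sh entries):

```shell
# 2 worker instances, each with 2 cores and 1 GB of memory
WORKERS=2
CORES_PER_WORKER=2
MEM_PER_WORKER_GB=1

echo "total cores:  $((WORKERS * CORES_PER_WORKER))"     # prints 4
echo "total memory: $((WORKERS * MEM_PER_WORKER_GB)) GB" # prints 2 GB
```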

After setting up the environment, you should add the host names of the slaves to the following conf file:

conf/slaves

When using the launch scripts, this file identifies the host names of the machines on which the slave nodes will run. Since we have a single standalone machine, we set localhost in the slaves file.
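For a single-machine cluster the file needs only one line (it can also be created from conf/slaves.template). With SPARK_WORKER_INSTANCES=2 set above, the launch script will start two workers on this one host:

```
# conf/slaves -- one host name per line; here, only the local machine
localhost
```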

Now start the master with the following command:

sbin/start-master.sh

The master is now running at spark://system_name:7077 (for example, spark://knoldus-dell:7077), and you can monitor it at localhost:8080.


Now start the workers for the master with the following command:

sbin/start-slaves.sh
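To confirm that the master and both workers are actually up, you can list the running JVM processes with the JDK's jps tool (the exact PIDs will differ; the names Master and Worker come from Spark's daemon classes):

```shell
jps
# e.g.
# 12345 Master
# 12346 Worker
# 12347 Worker
```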

Now your standalone cluster is ready. Use it with the Spark shell, opening it with the following flag:

spark-shell --master spark://knoldus-Vostro-3560:7077

You can also pass some Spark configuration, such as driver memory, number of cores, etc.
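For example, a sketch of passing such options (these are standard spark-shell flags; --total-executor-cores applies in standalone mode, and the host name is the example from above):

```shell
spark-shell --master spark://knoldus-Vostro-3560:7077 \
  --driver-memory 1g \
  --executor-memory 512m \
  --total-executor-cores 2
```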

Now run the following commands in the Spark shell:

val file = sc.textFile("README.md")
file.count()
file.take(3)

Now you can see which worker is working and which worker has completed the task in the master UI (localhost:8080).
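When you are done, the cluster can be shut down with the matching stop scripts that Spark ships alongside the start scripts:

```shell
sbin/stop-slaves.sh   # stop all worker instances listed in conf/slaves
sbin/stop-master.sh   # stop the master
```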

About Sandeep

I am working as a software consultant at Knoldus Software LLP. I work on Scala, Play, Spark, Hive, HDFS, Hadoop, and many other big data technologies.

