To run a Spark cluster on a single standalone machine, we need to set up some configuration. We will be using the launch scripts that are provided by Spark, but first there are a couple of configuration settings we need to make.
First, set up the Spark environment: open the file conf/spark-env.sh (or create it from the template file spark-env.sh.template if it is not already present)
and add some configuration for the workers, like the following.
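A minimal conf/spark-env.sh matching the settings described below might look like this (the worker directory here is only an illustrative path; adjust it to your machine):

```shell
# conf/spark-env.sh -- worker settings for a local standalone cluster
export SPARK_WORKER_MEMORY=1g               # memory to allocate per worker
export SPARK_WORKER_INSTANCES=2             # number of worker instances on this machine
export SPARK_WORKER_DIR=/tmp/spark-worker   # illustrative path: logs and scratch space
export SPARK_WORKER_CORES=2                 # cores each worker may use
```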
Here SPARK_WORKER_MEMORY specifies the amount of memory you want to allocate to a worker node; if this value is not given, the default is the total available memory minus 1 GB. Since we are running everything on our local machine, we wouldn't want the workers to use up all our memory.
SPARK_WORKER_INSTANCES specifies the number of worker instances to run on this machine; here it is set to 2 since we will create only 2 slave nodes.
SPARK_WORKER_DIR is the directory in which applications will run, and it holds both logs and scratch space.
SPARK_WORKER_CORES specifies the number of cores to be used by each worker.
With the above configuration we create a cluster of 2 workers, each with 1 GB of memory and a maximum of 2 cores.
After setting up the environment, you should add the hostnames (or IP addresses) of the slaves to the conf/slaves file.
When using the launch scripts, this file identifies the hostnames of the machines on which the slave nodes will run. Here we have a single standalone machine, so we set localhost in the slaves file.
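For a single-machine setup, conf/slaves therefore needs only a single entry (one hostname per line):

```
localhost
```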
Now start the master with the following command:
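Assuming a standard Spark distribution layout, the master is started with the launch script in the sbin directory (run from the Spark installation directory; this requires Spark to be installed locally):

```shell
# start the standalone master on this machine
./sbin/start-master.sh
```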
The master is now running at spark://system_name:7077, for example spark://knoldus-dell:7077, and you can monitor it through the web UI at localhost:8080.
Now start the workers for this master with the following commands:
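The workers listed in conf/slaves can be launched with the corresponding launch script (again run from the Spark installation directory, with the master already running):

```shell
# start one worker per entry in conf/slaves, attached to the local master
./sbin/start-slaves.sh
```

Each worker also exposes its own web UI, by default starting at port 8081.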
spark-shell --master spark://knoldus-Vostro-3560:7077
You can also pass additional Spark configuration, such as the driver memory, the number of cores, etc.
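For example, the driver memory and the total number of executor cores can be passed as flags when launching the shell (the values here are illustrative):

```shell
spark-shell --master spark://knoldus-Vostro-3560:7077 \
  --driver-memory 1g \
  --total-executor-cores 2
```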
Now run the following commands in the Spark shell:
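As a minimal smoke test (a sketch, not necessarily the commands from the original post), you can distribute a small collection across the workers and sum it; sc is the SparkContext that the shell provides:

```scala
// distribute the numbers 1..100 across the cluster and sum them in parallel
val data = sc.parallelize(1 to 100)
println(data.sum())  // 5050.0
```

If this prints the expected sum, the shell is successfully submitting work to the standalone cluster, and the running application will also appear in the master's web UI at localhost:8080.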