In this blog we will focus on two major things:
1). Steps required to create a two-node Elasticsearch (v5.2, released on 31.Jan.2017) cluster on Linux instances (with CentOS as the default OS).
2). Attaching an additional volume to the instances and changing the Elasticsearch configuration so that all Elasticsearch-related data is stored on the mounted volumes, since the default storage associated with a Linux instance is quite small (8 GB for a t2.micro instance).
Prerequisites ->
1). Creating instances ->
We will require two EC2 instances, each with an additional EBS volume attached and mounted. All Elasticsearch-related data will be stored on the mounted volume.
To achieve this, follow the steps of my previous blog here and create 2 EC2 instances. For this blog I have taken two t2.micro instances with a 10 GB EBS volume mounted to each. You may choose the instance type and EBS volume size according to your use case.
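If you skipped the previous blog, here is a minimal sketch of formatting and mounting the extra volume, assuming it shows up as /dev/xvdf and is mounted at /mnt (the device name and mount point may differ on your instance):
lsblk                          # confirm the device name of the extra EBS volume
sudo mkfs -t ext4 /dev/xvdf    # format it (destroys any existing data on the volume)
sudo mount /dev/xvdf /mnt      # mount it at /mnt
To survive reboots you would also add an entry for the volume in /etc/fstab.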
2). Making the cluster nodes visible to each other ->
From the AWS console, create a security group demo_elasticsearch_node. Add to it an inbound rule that allows traffic on port 9300 from the demo_elasticsearch_node security group itself, as in the following pic:
Now assign the demo_elasticsearch_node security group to both instances.
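If you prefer the AWS CLI over the console, the equivalent would look roughly like this (a sketch, assuming the CLI is configured for your account and region and you are in the default VPC):
aws ec2 create-security-group --group-name demo_elasticsearch_node --description "Elasticsearch cluster nodes"
aws ec2 authorize-security-group-ingress --group-name demo_elasticsearch_node --protocol tcp --port 9300 --source-group demo_elasticsearch_node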
3). Installing Java 8 on the instances ->
Elasticsearch 5.2 requires Java 8. By default the EC2 instances come with Java 7. You may check and upgrade the Java version to 8 on your instance by following the steps here.
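For reference, on a CentOS/Amazon Linux style instance the upgrade is typically along these lines (a sketch; package names can vary by distribution):
sudo yum install -y java-1.8.0-openjdk
sudo alternatives --config java    # select the Java 8 binary
java -version                      # confirm it now reports 1.8.x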
After creating the instances, mounting the volumes and updating them to Java 8, follow the steps below on both instances in sequence.
Step 1 -> Installing Elasticsearch
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.2.0.rpm
sudo yum install -y elasticsearch-5.2.0.rpm
The commands above download and install the latest Elasticsearch (5.2) on your instance.
Step 2 -> Creating the required directory structure
sudo mkdir /mnt/elasticsearch
sudo chown elasticsearch /mnt/elasticsearch
sudo chgrp elasticsearch /mnt/elasticsearch
This creates the directory on the mounted volume where the Elasticsearch-related data will be stored, and changes the owner and group of the ‘elasticsearch’ directory to elasticsearch.
Note -> ‘elasticsearch’ is the default user and group for Elasticsearch; if required, they may be updated in the /etc/sysconfig/elasticsearch configuration file.
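You can confirm that the user and group were created by the RPM install:
id elasticsearch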
Step 3 -> Updating the configuration to store ES data on the mounted volume
export RAM_50_PERCENT=500m    # replace with 50% of your instance's RAM
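If you would rather not hard-code the value, half of the instance's RAM can be derived from /proc/meminfo (a sketch, rounding down to whole megabytes):
export RAM_50_PERCENT="$(awk '/MemTotal/ {printf "%dm", $2/2/1024}' /proc/meminfo)"
echo $RAM_50_PERCENT    # on a 1 GB instance this prints a value close to 500m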
Since we are working with a t2.micro instance, which comes with 1 GB of RAM by default, we will assign a heap size of 50% of the RAM, i.e. 500 MB of memory. For a better understanding of why allocating 50% of RAM as heap size is optimal, refer to the Elasticsearch documentation here. Now, the following commands will update where the data and logs related to Elasticsearch must be stored:
sudo sed -i "s|#DATA_DIR=/var/lib/elasticsearch|DATA_DIR=/mnt/elasticsearch/lib/elasticsearch|" /etc/sysconfig/elasticsearch
sudo sed -i "s|#LOG_DIR=/var/log/elasticsearch|LOG_DIR=/mnt/elasticsearch/log/elasticsearch|" /etc/sysconfig/elasticsearch
sudo sed -i "s|#ES_JAVA_OPTS=|ES_JAVA_OPTS=\"-Xms$RAM_50_PERCENT -Xmx$RAM_50_PERCENT\"|" /etc/sysconfig/elasticsearch
Instead of editing the configuration file manually, I have done it via the sed command for a swift setup.
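You can verify that the three settings were updated as expected:
grep -E "^(DATA_DIR|LOG_DIR|ES_JAVA_OPTS)" /etc/sysconfig/elasticsearch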
Step 4 -> Setting up ES cluster
Note -> Steps 1 to 3 (above) are identical for both instances; in this step, be careful about setting the LOCAL_PRIVATE_IP and NODE_PRIVATE_IP environment variables.
export CLUSTER_NAME="elasticsearch"       # replace with the cluster name of your choice
export LOCAL_PRIVATE_IP="172.31.37.48"    # replace with the private IP address of this instance
export NODE_PRIVATE_IP="172.31.14.92"     # replace with the private IP address of the second instance
In the export commands above we specify the name of the cluster, i.e. ‘elasticsearch’, the private IP of the instance on which we are going to start Elasticsearch (LOCAL_PRIVATE_IP) and the private IP of the second node of the cluster (NODE_PRIVATE_IP).
The values of the last two environment variables will be swapped on the second instance, as follows:
export LOCAL_PRIVATE_IP="172.31.14.92"
export NODE_PRIVATE_IP="172.31.37.48"
Now we will make changes in elasticsearch.yml to specify the cluster name and the private IPs of the ES cluster nodes:
sudo sed -i "s|#cluster.name: my-application|cluster.name: $CLUSTER_NAME|" /etc/elasticsearch/elasticsearch.yml
sudo sed -i "s|#network.host: 192.168.0.1|network.host: $LOCAL_PRIVATE_IP|" /etc/elasticsearch/elasticsearch.yml
sudo sed -i "s|#discovery.zen.ping.unicast.hosts: \[\"host1\", \"host2\"\]|discovery.zen.ping.unicast.hosts: \[\"$LOCAL_PRIVATE_IP\", \"$NODE_PRIVATE_IP\"\]|" /etc/elasticsearch/elasticsearch.yml
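Again, a quick check that the substitutions landed:
grep -E "^(cluster.name|network.host|discovery.zen.ping.unicast.hosts)" /etc/elasticsearch/elasticsearch.yml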
Step 5 -> Starting elasticsearch
Start Elasticsearch on both instances using the following command:
sudo service elasticsearch start
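To confirm the service actually came up, check its status and, if needed, the logs, which now live under the mounted volume (the log file is named after the cluster, so elasticsearch.log here):
sudo service elasticsearch status
sudo tail -n 50 /mnt/elasticsearch/log/elasticsearch/elasticsearch.log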
In case Elasticsearch does not start and gives an error message as follows:
(This may happen as a t2.micro instance comes with quite little RAM; it is unlikely to happen with bigger instances.)
Try emptying the buffer caches using the following commands and then start Elasticsearch again:
sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'    # free the page cache
sudo sh -c 'echo 2 >/proc/sys/vm/drop_caches'    # free dentries and inodes
sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'    # free page cache, dentries and inodes
Step 6 -> Verifying elasticsearch cluster with 2 nodes
Run the following curl command to verify your cluster's health:
curl -XGET "http://$LOCAL_PRIVATE_IP:9200/_cluster/health?pretty"
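The response should look roughly like the following (values here are illustrative and trimmed to the fields relevant for us):
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  ...
}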
The field “number_of_nodes” : 2 shows that an ES cluster with 2 nodes has been created.
Finally, alter the security group of the cluster as needed so that it is reachable by whichever clients need access to it.
Hope the blog helps; please leave your questions in the comments below.
You need three nodes, not two, for a proper cluster. The third can just be a non-data node (arbiter). Without this you can’t get quorum and the health of your cluster will always be yellow.
Hey,
In this case of a 2 node cluster, the ‘discovery.zen.minimum_master_nodes’ property in ‘elasticsearch.yml’ is left at its default value of 1, due to which the cluster is prone to the ‘split brain’ problem, a condition where two masters exist in a single cluster. This happens when the nodes of the cluster can’t communicate with each other, maybe due to a temporary network issue or a brief failure of one node out of the two, and it causes data consistency problems (due to the creation of multiple masters), thus affecting the health of the cluster. But for the cluster health to be yellow, the split brain problem must be encountered at least once, I guess, and until then the state remains green, as you can check in the screenshot where I hit a curl to check the status of the cluster and it is green.
To make a 2 node cluster less prone to ‘split brain’, we could set the ‘discovery.zen.minimum_master_nodes’ property to 2 (as per the quorum formula N/2 + 1, which for 2 nodes is 2/2 + 1 = 2). By doing this, whenever one node of the cluster goes down, the Elasticsearch cluster as a whole will go down, since 2 master-eligible nodes must be available for the cluster to sustain itself, thus resolving the problem of inconsistency but introducing the problem of unavailability.
Talking about quorum, it says that for a cluster to be consistent and healthy, the minimum number of master-eligible nodes must be 2/2 + 1, which is 2, but since we have minimum_master_nodes at 1 (the default), our cluster is prone to split brain.
Hence I agree that a 3 node cluster with ‘discovery.zen.minimum_master_nodes’ set to 2 would be more optimal, as it will reduce the chance of ‘split brain’, but it would not be correct to say that a 2 node cluster will always have yellow health.