In this blog, we will set up Elasticsearch on a minikube cluster, but before that, let's take a quick look at Elasticsearch itself.
Elasticsearch is a distributed, scalable, real-time search engine that supports full-text search, structured search, and analytics. It is most commonly used to index and search vast amounts of log data, but it can also be used to search many other kinds of documents.
Use Cases of Elasticsearch:
- Application search
- Website search
- Enterprise search
- Logging and log analytics
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Geospatial data analysis and visualization
- Business analytics
How does Elasticsearch work?
Elasticsearch receives raw data from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process of parsing, normalizing, and enriching raw data before indexing it in Elasticsearch. Once data is indexed in Elasticsearch, users can run complex queries against it and use aggregations to generate rich summaries. Users can then use Kibana to build data visualizations, share dashboards, and manage the Elastic Stack.
Why use Elasticsearch?
- Elasticsearch is fast: It excels at full-text search because it is built on top of Lucene. It is also a near real-time search platform, which means that the time between indexing a document and making it searchable is very short — typically one second.
- Elasticsearch is distributed by nature: Its documents are distributed across different containers known as shards, which are duplicated to provide redundant copies of the data in the event of hardware failure.
- Elasticsearch comes with a wide set of features: It has a number of powerful built-in features that make storing and searching data even more efficient, such as data rollups and index lifecycle management, in addition to its speed, scalability, and resiliency.
- The Elastic Stack simplifies data ingest, visualization, and reporting: Integration with Beats and Logstash makes it easy to process data before indexing into Elasticsearch.
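To see why full-text lookups are fast, here is a toy Python sketch of the inverted-index idea that Lucene (and therefore Elasticsearch) builds on. This is an illustration only, not Elasticsearch code; the documents and IDs are made up:

```python
from collections import defaultdict

# Toy inverted index: each term maps to the set of document IDs that
# contain it, so looking up a term is a dictionary access rather than
# a scan over every document.
def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    # Return the matching document IDs in sorted order.
    return sorted(index.get(term.lower(), set()))

docs = {
    1: "error connecting to database",
    2: "user login successful",
    3: "database connection restored",
}
index = build_index(docs)
print(search(index, "database"))  # [1, 3]
```

Real Elasticsearch adds analysis (tokenization, stemming), relevance scoring, and distribution across shards on top of this basic structure.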
Now that we are familiar with Elasticsearch, let's get started with the demo.
Step 1: Creating a Namespace
So before we roll out an Elasticsearch cluster, we’ll first create a Namespace.
```shell
kubectl create namespace elasticsearchdemo
kubectl get namespaces
```
Step 2: Creating the Headless Service
Now we'll create elasticsearch, a headless Kubernetes Service that defines a DNS domain for the Pods. A headless Service performs no load balancing and has no static cluster IP.
```yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: elasticsearchdemo
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
```
In the elasticsearchdemo Namespace, we've defined a Service called elasticsearch and given it the app: elasticsearch label. We then set .spec.selector to app: elasticsearch, so the Service selects only Pods carrying the app: elasticsearch label. When we associate our Elasticsearch StatefulSet with this Service, the Service will return DNS A records that point to the Elasticsearch Pods with that label.

The Service is made headless by setting clusterIP: None. Finally, we define ports 9200 and 9300 for the REST API and inter-node communication, respectively.

Now create the Service and verify it:

```shell
kubectl create -f elasticsearch_svc.yml
kubectl get svc --namespace=elasticsearchdemo
```
Step 3: Creating the Elasticsearch StatefulSet
Now we will create a StatefulSet, as it allows us to assign a stable identity to Pods and grant them stable, persistent storage. Elasticsearch requires stable storage to retain data across Pod rescheduling and restarts.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: elasticsearchdemo
spec:
  serviceName: elasticsearch
  replicas: 2
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
          ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: k8s-logs
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # only two nodes are listed, matching replicas: 2 above
            - name: discovery.seed_hosts
              value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "es-cluster-0,es-cluster-1"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        - name: increase-fd-ulimit
          image: busybox
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: do-block-storage # on minikube, use the default "standard" StorageClass
        resources:
          requests:
            storage: 3Gi # a small size suited to a local minikube cluster
```
Here, we have defined a StatefulSet called es-cluster in the elasticsearchdemo Namespace. We then use the serviceName parameter to link it to our previously created elasticsearch Service. This guarantees that each Pod in the StatefulSet is reachable at es-cluster-[0,1].elasticsearch.elasticsearchdemo.svc.cluster.local, where [0,1] corresponds to the Pod's assigned integer ordinal.
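The stable per-Pod DNS names can be sketched with a small helper. This is a hypothetical function for illustration (pod_dns_names is not a kubectl or client-library API); the names mirror the manifests in this post:

```python
# Build the stable DNS name each StatefulSet Pod gets via the headless
# Service: <statefulset>-<ordinal>.<service>.<namespace>.svc.<domain>
def pod_dns_names(statefulset, service, namespace, replicas,
                  cluster_domain="cluster.local"):
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]

for name in pod_dns_names("es-cluster", "elasticsearch", "elasticsearchdemo", 2):
    print(name)
# es-cluster-0.elasticsearch.elasticsearchdemo.svc.cluster.local
# es-cluster-1.elasticsearch.elasticsearchdemo.svc.cluster.local
```

These are exactly the hostnames used (in short form) in discovery.seed_hosts, since Pods in the same namespace can omit the .svc.cluster.local suffix.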
We set the matchLabels selector to app: elasticsearch and specify two replicas (Pods), mirroring that label in the .spec.template.metadata section: the .spec.selector.matchLabels and .spec.template.metadata.labels fields must be identical.
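This agreement between selector and template labels can be illustrated with a hypothetical check that mimics what the Kubernetes API server enforces (selector_matches_template is an illustrative name, not a real Kubernetes API):

```python
# A StatefulSet is rejected unless every key/value in
# .spec.selector.matchLabels also appears in .spec.template.metadata.labels.
def selector_matches_template(match_labels, template_labels):
    return all(template_labels.get(k) == v for k, v in match_labels.items())

template_labels = {"app": "elasticsearch"}
print(selector_matches_template({"app": "elasticsearch"}, template_labels))  # True
print(selector_matches_template({"app": "es"}, template_labels))             # False
```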
Next, we define the Pods in the StatefulSet. We name the container elasticsearch and use the Docker image docker.elastic.co/elasticsearch/elasticsearch:7.2.0 (this can be changed to a different version). The resources parameter specifies that the container requests at least 0.1 vCPU and can burst up to 1 vCPU. We open and name ports 9200 and 9300 for the REST API and inter-node communication. The volumeMounts entry mounts the PersistentVolume named data into the container at /usr/share/elasticsearch/data.
In the container, we set the following environment variables:
- cluster.name: The Elasticsearch cluster's name, which is k8s-logs.
- node.name: Set via valueFrom to the Pod's metadata.name, which resolves to es-cluster-[0,1] depending on the node's assigned ordinal.
- discovery.seed_hosts: A list of master-eligible nodes in the cluster that seeds the node discovery process.
- cluster.initial_master_nodes: A list of master-eligible nodes that will participate in the initial master election.
- ES_JAVA_OPTS: Set to -Xms512m -Xmx512m, which directs the JVM to use a heap size of 512 MB for both the minimum and maximum heap.
We also define Init Containers that run before the main elasticsearch app container, in the order they are declared:
- fix-permissions runs a chown command to change the owner and group of the Elasticsearch data directory to 1000:1000, the Elasticsearch user's UID. Kubernetes mounts the data directory as root by default, making it inaccessible to Elasticsearch.
- increase-vm-max-map runs a command to raise the operating system's mmap count limit, which is too low by default and can cause out-of-memory errors.
- increase-fd-ulimit runs the ulimit command to increase the maximum number of open file descriptors.
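The three fixes can be summarized as a hypothetical pre-flight check. The thresholds mirror the manifest above; needed_fixes is an illustrative helper, not a real Kubernetes or Elasticsearch API:

```python
# Settings Elasticsearch needs, matching the init containers above.
REQUIRED = {"data_dir_uid": 1000, "vm.max_map_count": 262144, "nofile": 65536}

def needed_fixes(current):
    # Return the init containers that would still have work to do,
    # given the current node/volume settings.
    fixes = []
    if current.get("data_dir_uid") != REQUIRED["data_dir_uid"]:
        fixes.append("fix-permissions")
    if current.get("vm.max_map_count", 0) < REQUIRED["vm.max_map_count"]:
        fixes.append("increase-vm-max-map")
    if current.get("nofile", 0) < REQUIRED["nofile"]:
        fixes.append("increase-fd-ulimit")
    return fixes

# Typical defaults on a fresh node: all three fixes are needed.
print(needed_fixes({"data_dir_uid": 0, "vm.max_map_count": 65530, "nofile": 1024}))
# ['fix-permissions', 'increase-vm-max-map', 'increase-fd-ulimit']
```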
Finally, we define the StatefulSet's volumeClaimTemplates, which Kubernetes uses to create PersistentVolumes for the Pods. We name it data (the name used in the volumeMounts definition) and give it the same app: elasticsearch label as the StatefulSet. We set the ReadWriteOnce access mode, meaning the volume can be mounted read-write by a single node only, and use the do-block-storage storage class (on minikube, the default standard StorageClass can be used instead). Each PersistentVolume is requested to be 3 GiB in size.
Now, deploy the StatefulSet:

```shell
kubectl apply -f elasticsearch_sts.yaml
```

We can monitor the StatefulSet as it rolls out using kubectl rollout status, and then list the Pods:

```shell
kubectl rollout status sts/es-cluster --namespace=elasticsearchdemo
kubectl get po --namespace=elasticsearchdemo
```
After all of the Pods have been deployed, we can use the REST API to verify that our Elasticsearch cluster is up and running. To do so, use kubectl port-forward to forward local port 9200 to port 9200 on one of the Elasticsearch Pods (es-cluster-0):

```shell
kubectl port-forward es-cluster-0 9200:9200 --namespace=elasticsearchdemo
```
Then, in a separate terminal, make a curl request to the REST API to check the health of the Elasticsearch cluster:

```shell
curl http://localhost:9200/_cluster/health?pretty
```

The output will display the status of the Elasticsearch cluster, which should be green once all shards are allocated.
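For scripting, the cluster-health response can be parsed as JSON. The snippet below uses a hypothetical sample payload for illustration; the real values come from the curl output against your cluster:

```python
import json

# Hypothetical sample of a /_cluster/health response for the two-node
# cluster built in this post; only the fields we check are included.
sample = json.loads("""
{
  "cluster_name": "k8s-logs",
  "status": "green",
  "number_of_nodes": 2,
  "number_of_data_nodes": 2
}
""")

# "green" means all primary and replica shards are allocated;
# "yellow" means some replicas are unassigned; "red" means data is missing.
assert sample["status"] in ("green", "yellow", "red")
print(sample["cluster_name"], sample["status"])  # k8s-logs green
```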