In this blog, we will set up Elasticsearch on a minikube cluster, but before that, let's take a quick look at Elasticsearch itself.
Elasticsearch is a distributed, scalable, real-time search engine that supports full-text search, structured search, and analytics. It is most commonly used to index and search vast amounts of log data, but it can also be used to search many other kinds of documents.
Use Cases of Elasticsearch:
- Application search
- Website search
- Enterprise search
- Logging and log analytics
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Geospatial data analysis and visualization
- Business analytics
How does Elasticsearch work?
Elasticsearch receives raw data from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process of parsing, normalizing, and enriching raw data before indexing it in Elasticsearch. Once data is indexed in Elasticsearch, users can run complex queries against it and use aggregations to generate rich summaries. Users can then use Kibana to build data visualizations, share dashboards, and manage the Elastic Stack.
Why use Elasticsearch?
- Elasticsearch is fast: It excels at full-text search because it is built on top of Lucene. It is also a near real-time search platform, which means that the time between indexing a document and making it searchable is very short — typically one second.
- Elasticsearch is distributed by nature: Its documents are distributed across different containers known as shards, which are duplicated to provide redundant copies of the data in the event of hardware failure.
- Elasticsearch comes with a wide set of features: It has a number of powerful built-in features that make storing and searching data even more efficient, such as data rollups and index lifecycle management, in addition to its speed, scalability, and resiliency.
- The Elastic Stack simplifies data ingest, visualization, and reporting: Integration with Beats and Logstash makes it easy to process data before indexing into Elasticsearch.
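To see why full-text lookups are fast, here is a toy Python sketch of the inverted-index idea that Lucene (and therefore Elasticsearch) builds on. This is an illustration only, not Elasticsearch code; the documents and IDs are made up:

```python
from collections import defaultdict

# Toy inverted index: each term maps to the set of document IDs that
# contain it, so looking up a term is a dictionary access rather than
# a scan over every document.
def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    # Return the matching document IDs in sorted order.
    return sorted(index.get(term.lower(), set()))

docs = {
    1: "error connecting to database",
    2: "user login successful",
    3: "database connection restored",
}
index = build_index(docs)
print(search(index, "database"))  # [1, 3]
```

Real Elasticsearch adds analysis (tokenization, stemming), relevance scoring, and distribution across shards on top of this basic structure.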
Now that we are familiar with Elasticsearch, let's get started with the demo.
Step 1: Creating a Namespace
So before we roll out an Elasticsearch cluster, we’ll first create a Namespace.
```shell
kubectl create namespace elasticsearchdemo
kubectl get namespaces
```
Step 2: Creating the Headless Service
Now we'll create elasticsearch, a headless Kubernetes Service that defines a DNS domain for the Pods. A headless Service performs no load balancing and has no static cluster IP.
```yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: elasticsearchdemo
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
```
In the elasticsearchdemo Namespace, we've defined a Service called elasticsearch and given it the app: elasticsearch label. We then set .spec.selector to app: elasticsearch, so the Service selects only Pods carrying the app: elasticsearch label. When we associate our Elasticsearch StatefulSet with this Service, the Service will return DNS A records that point to the Elasticsearch Pods with that label.

The Service is made headless by setting clusterIP: None. Finally, we define ports 9200 and 9300 for the REST API and inter-node communication, respectively.

Now create the Service and verify it:

```shell
kubectl create -f elasticsearch_svc.yml
kubectl get svc --namespace=elasticsearchdemo
```
Step 3: Creating the Elasticsearch StatefulSet
Now we will create a StatefulSet, as it allows us to assign a stable identity to Pods and grant them stable, persistent storage. Elasticsearch requires stable storage to retain data across Pod rescheduling and restarts.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: elasticsearchdemo
spec:
  serviceName: elasticsearch
  replicas: 2
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
          ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: k8s-logs
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # only two nodes are listed, matching replicas: 2 above
            - name: discovery.seed_hosts
              value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "es-cluster-0,es-cluster-1"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        - name: increase-fd-ulimit
          image: busybox
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: do-block-storage # on minikube, use the default "standard" StorageClass
        resources:
          requests:
            storage: 3Gi # a small size suited to a local minikube cluster
```
Here, we have defined a StatefulSet called es-cluster in the elasticsearchdemo Namespace. We then use the serviceName parameter to link it to our previously created elasticsearch Service. This guarantees that each Pod in the StatefulSet is reachable at es-cluster-[0,1].elasticsearch.elasticsearchdemo.svc.cluster.local, where [0,1] corresponds to the Pod's assigned integer ordinal.
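The stable per-Pod DNS names can be sketched with a small helper. This is a hypothetical function for illustration (pod_dns_names is not a kubectl or client-library API); the names mirror the manifests in this post:

```python
# Build the stable DNS name each StatefulSet Pod gets via the headless
# Service: <statefulset>-<ordinal>.<service>.<namespace>.svc.<domain>
def pod_dns_names(statefulset, service, namespace, replicas,
                  cluster_domain="cluster.local"):
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]

for name in pod_dns_names("es-cluster", "elasticsearch", "elasticsearchdemo", 2):
    print(name)
# es-cluster-0.elasticsearch.elasticsearchdemo.svc.cluster.local
# es-cluster-1.elasticsearch.elasticsearchdemo.svc.cluster.local
```

These are exactly the hostnames used (in short form) in discovery.seed_hosts, since Pods in the same namespace can omit the .svc.cluster.local suffix.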
We set the matchLabels selector to app: elasticsearch and specify two replicas (Pods), mirroring that label in the .spec.template.metadata section: the .spec.selector.matchLabels and .spec.template.metadata.labels fields must be identical.
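This agreement between selector and template labels can be illustrated with a hypothetical check that mimics what the Kubernetes API server enforces (selector_matches_template is an illustrative name, not a real Kubernetes API):

```python
# A StatefulSet is rejected unless every key/value in
# .spec.selector.matchLabels also appears in .spec.template.metadata.labels.
def selector_matches_template(match_labels, template_labels):
    return all(template_labels.get(k) == v for k, v in match_labels.items())

template_labels = {"app": "elasticsearch"}
print(selector_matches_template({"app": "elasticsearch"}, template_labels))  # True
print(selector_matches_template({"app": "es"}, template_labels))             # False
```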
Next, we define the Pods in the StatefulSet. We name the container elasticsearch and use the Docker image docker.elastic.co/elasticsearch/elasticsearch:7.2.0 (this can be changed to a different version). The resources parameter specifies that the container requests at least 0.1 vCPU and can burst up to 1 vCPU. We open and name ports 9200 and 9300 for the REST API and inter-node communication. The volumeMounts entry mounts the PersistentVolume named data into the container at /usr/share/elasticsearch/data.
In the container, we set the following environment variables:
- cluster.name: The Elasticsearch cluster's name, which is k8s-logs.
- node.name: Set via valueFrom to the Pod's metadata.name, which resolves to es-cluster-[0,1] depending on the node's assigned ordinal.
- discovery.seed_hosts: A list of master-eligible nodes in the cluster that seeds the node discovery process.
- cluster.initial_master_nodes: A list of master-eligible nodes that will participate in the initial master election.
- ES_JAVA_OPTS: Set to -Xms512m -Xmx512m, which directs the JVM to use a heap size of 512 MB for both the minimum and maximum heap.
We also define Init Containers that run before the main elasticsearch app container, in the order they are declared:
- fix-permissions runs a chown command to change the owner and group of the Elasticsearch data directory to 1000:1000, the Elasticsearch user's UID. Kubernetes mounts the data directory as root by default, making it inaccessible to Elasticsearch.
- increase-vm-max-map runs a command to raise the operating system's mmap count limit, which is too low by default and can cause out-of-memory errors.
- increase-fd-ulimit runs the ulimit command to increase the maximum number of open file descriptors.
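The three fixes can be summarized as a hypothetical pre-flight check. The thresholds mirror the manifest above; needed_fixes is an illustrative helper, not a real Kubernetes or Elasticsearch API:

```python
# Settings Elasticsearch needs, matching the init containers above.
REQUIRED = {"data_dir_uid": 1000, "vm.max_map_count": 262144, "nofile": 65536}

def needed_fixes(current):
    # Return the init containers that would still have work to do,
    # given the current node/volume settings.
    fixes = []
    if current.get("data_dir_uid") != REQUIRED["data_dir_uid"]:
        fixes.append("fix-permissions")
    if current.get("vm.max_map_count", 0) < REQUIRED["vm.max_map_count"]:
        fixes.append("increase-vm-max-map")
    if current.get("nofile", 0) < REQUIRED["nofile"]:
        fixes.append("increase-fd-ulimit")
    return fixes

# Typical defaults on a fresh node: all three fixes are needed.
print(needed_fixes({"data_dir_uid": 0, "vm.max_map_count": 65530, "nofile": 1024}))
# ['fix-permissions', 'increase-vm-max-map', 'increase-fd-ulimit']
```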
Finally, we define the StatefulSet's volumeClaimTemplates, which Kubernetes uses to create PersistentVolumes for the Pods. We name it data (the name used in the volumeMounts definition) and give it the same app: elasticsearch label as the StatefulSet. We set the ReadWriteOnce access mode, meaning the volume can be mounted read-write by a single node only, and use the do-block-storage storage class (on minikube, the default standard StorageClass can be used instead). Each PersistentVolume is requested to be 3 GiB in size.
Now, deploy the StatefulSet:

```shell
kubectl apply -f elasticsearch_sts.yaml
```

We can monitor the StatefulSet as it rolls out using kubectl rollout status, and then list the Pods:

```shell
kubectl rollout status sts/es-cluster --namespace=elasticsearchdemo
kubectl get po --namespace=elasticsearchdemo
```
After all of the Pods have been deployed, we can use the REST API to verify that our Elasticsearch cluster is up and running. To do so, use kubectl port-forward to forward local port 9200 to port 9200 on one of the Elasticsearch Pods (es-cluster-0):

```shell
kubectl port-forward es-cluster-0 9200:9200 --namespace=elasticsearchdemo
```
Then, in a separate terminal, make a curl request to the REST API to check the health of the Elasticsearch cluster:

```shell
curl http://localhost:9200/_cluster/health?pretty
```

The output will display the status of the Elasticsearch cluster, which should be green once all shards are allocated.
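For scripting, the cluster-health response can be parsed as JSON. The snippet below uses a hypothetical sample payload for illustration; the real values come from the curl output against your cluster:

```python
import json

# Hypothetical sample of a /_cluster/health response for the two-node
# cluster built in this post; only the fields we check are included.
sample = json.loads("""
{
  "cluster_name": "k8s-logs",
  "status": "green",
  "number_of_nodes": 2,
  "number_of_data_nodes": 2
}
""")

# "green" means all primary and replica shards are allocated;
# "yellow" means some replicas are unassigned; "red" means data is missing.
assert sample["status"] in ("green", "yellow", "red")
print(sample["cluster_name"], sample["status"])  # k8s-logs green
```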