An Overview of Elasticsearch

Introduction

Elasticsearch is a distributed, open-source full-text search and analytics engine that stores data as schema-free JSON documents. It is built on the Apache Lucene library and is a core part of the ELK stack. Data can be stored, searched, and analyzed in near real time, and results are typically retrieved in milliseconds. Data is stored as documents rather than as rows in tables. Elasticsearch also comes with a rich set of APIs for performing different operations. Search responses are fast because queries run against an index instead of scanning the raw text. It can be viewed as a server that takes JSON requests as input and returns its output in the same format. Elasticsearch works with many types of data, such as textual, numerical, structured, and unstructured data.
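
To illustrate why searching an index is faster than scanning text, here is a minimal sketch of an inverted index in Python. It is a toy model only: the whitespace tokenizer, the function names, and the sample documents are assumptions made for this example, and real Lucene indexes are far more sophisticated.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Kibana visualizes data",
}

index = build_inverted_index(documents)
print(sorted(search(index, "search engine")))
print(sorted(search(index, "search")))
```

A lookup touches only the entries for the query terms, so the cost depends on how many documents match rather than on the total amount of stored text.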

Components

  1. Cluster :- A cluster is a group of one or more servers (nodes) that together provide indexing and search capabilities. Cluster size can range from a single node to thousands of nodes, depending on the use case.
  2. Node :- A node is a single server that is part of the cluster and holds all or part of the data. It also provides computing power for indexing and searching your data. An Elasticsearch node can be configured in different ways:
    1. Master Node :- The master node is responsible for the administrative tasks of the cluster. It tracks the availability and failure of the other nodes and plays a crucial role in cluster-wide operations such as creating or deleting an index and adding or removing nodes. A cluster with only one master-eligible node has a single point of failure, so Elasticsearch supports multiple master-eligible nodes, which take part in an election to choose the master node.
    2. Data Node :- A data node stores data in the form of shards and participates in CRUD, search, and aggregation operations.
    3. Coordinating Node :- These nodes act as load balancers. End users typically interact with these nodes, which forward cluster-related requests to the master node and data-related requests to the data nodes.
  3. Index :- An index is a collection of documents that have similar characteristics. It is the highest-level entity that you can query against in Elasticsearch and is comparable to a database in a relational system. The index name is required for any operation on a document, such as adding, updating, or deleting it.
  4. Shards :- Elasticsearch can divide an index into multiple pieces known as shards. A shard is a fully functional and independent index that can be hosted on any node in the cluster. Shards are important because they allow the data volume to be split horizontally: the documents in an index are distributed across multiple shards. By distributing shards and their replica copies across multiple nodes, Elasticsearch can provide redundancy, which protects against hardware failures.
  5. Type :- A type represents a class of similar documents inside an index; it is a logical grouping of documents. Lucene has no concept of document types, so Elasticsearch stored the type name of each document in a metadata field called _type. Queries could filter on this field to restrict a search to documents of one type. Note that mapping types were deprecated in Elasticsearch 6.x and removed in later major versions.
  6. Documents :- A document is the basic unit of information and is represented in JSON format. It can be thought of as a row in a relational database. Each document has a unique ID and a type, which shows what kind of entity the document is. You can store as many documents as you want in an index.
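
The way documents in an index are spread across shards can be sketched as follows. This is a simplified illustration, not Elasticsearch's actual routing code: the hash function (zlib.crc32), the function names, and the document IDs are assumptions chosen for the example. The idea it mirrors is that a hash of a routing value (by default the document ID), taken modulo the number of primary shards, decides where a document lives.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a shard by hashing the document ID modulo the shard count.

    zlib.crc32 is a stand-in hash chosen for this sketch; it is not
    the hash function Elasticsearch itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one is routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_primary_shards)].append(doc_id)
    return shards


# Hypothetical document IDs for an index with three primary shards.
doc_ids = ["order-1", "order-2", "order-3", "order-4", "order-5", "order-6"]
layout = distribute(doc_ids, 3)

for shard, ids in layout.items():
    print(f"shard {shard}: {ids}")
```

Because the same ID always hashes to the same shard, a lookup by ID can go straight to the right shard. It also suggests why the number of primary shards is normally fixed when the index is created: changing it would change where existing documents are expected to be.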

Applications

  1. Logging :- This is one of the key applications of Elasticsearch. Companies use it to ingest and analyze logs from different applications in near real time. This gives teams important insights so that they can take appropriate action if something goes wrong.
  2. Security analysis :- Another crucial analytics application of Elasticsearch is security analysis. Logs related to the security of systems can be analyzed with the ELK stack. Many fraud detection projects across the globe use it.
  3. Full Text Search :- Full-text search is the core capability of Elasticsearch. Because of its fast search, it is used by many e-commerce websites to give their users a great experience.
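
As a rough illustration of the full-text search use case, the sketch below builds a JSON search request body and reads documents out of a JSON response, using only Python's standard library. The match query and the hits.hits[]._source response layout follow Elasticsearch's Query DSL and search response format, but the helper names, the "title" field, and the sample response are hypothetical and written by hand for this example; nothing here contacts a real cluster.

```python
import json


def build_match_query(field, text, size=10):
    """Build a request body for a full-text match query on one field."""
    return {"size": size, "query": {"match": {field: text}}}


def extract_sources(response):
    """Return the original documents (_source) from a search response."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


# Hypothetical product search on an assumed "title" field.
body = build_match_query("title", "wireless headphones")
request_json = json.dumps(body)
print(request_json)

# Hand-written sample response, trimmed to the parts used here.
sample_response = json.loads("""
{
  "hits": {
    "hits": [
      {"_id": "1", "_source": {"title": "Wireless Headphones"}}
    ]
  }
}
""")
print(extract_sources(sample_response))
```

In a real deployment, the request body would be sent to the search endpoint of an index, and the response would contain additional metadata such as scores and timing.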

Thanks for reading

Written by 

Amarjeet is a Software Consultant at Knoldus Software LLP. He has overall experience of 2 years and 10 months. He completed his Bachelor of Technology in Computer Science at the National Institute of Technology, Hamirpur, Himachal Pradesh. He likes reading books, travelling, and trekking.