Working with Nested Aggregations in Elasticsearch

First of all, we need to understand aggregations in Elasticsearch. An aggregation can be seen as a unit of work that builds analytic information over a set of documents. It is a powerful tool for building complex summaries of the data.

There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into three main families.

Bucket aggregation: A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, every bucket criterion is evaluated on every document in the context, and when a criterion matches, the document is considered to “fall in” the relevant bucket. By the end of the aggregation process, we end up with a list of buckets, each one with a set of documents that “belong” to it. Examples: filter aggregation, terms aggregation, nested aggregation, etc.

Metrics aggregation: A family of aggregations that keep track of and compute metrics over a set of documents. Examples: avg aggregation, max aggregation, min aggregation, etc.

Pipeline aggregation: A family of aggregations that aggregate the output of other aggregations and their associated metrics. Examples: cumulative sum aggregation, avg bucket aggregation, etc.
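To make the three families concrete, here is a minimal in-memory sketch in plain Java. It does not call Elasticsearch at all; it only mimics what a bucket, a metric and a pipeline aggregation compute. The Book class and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory illustration only: nothing here talks to Elasticsearch.
class AggregationFamilies {

    // Hypothetical document with a genre and a price.
    static final class Book {
        final String genre;
        final double price;

        Book(String genre, double price) {
            this.genre = genre;
            this.price = price;
        }
    }

    // Bucket + metric: one bucket per genre, average price inside each bucket.
    static Map<String, Double> avgPriceByGenre(List<Book> books) {
        return books.stream().collect(Collectors.groupingBy(
                b -> b.genre, TreeMap::new, Collectors.averagingDouble(b -> b.price)));
    }

    // Pipeline: a metric computed over the output of the buckets above.
    static double avgOfBucketAverages(Map<String, Double> bucketAverages) {
        return bucketAverages.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("fiction", 10.0), new Book("fiction", 20.0), new Book("science", 30.0));
        Map<String, Double> buckets = avgPriceByGenre(books);
        System.out.println(buckets);                      // {fiction=15.0, science=30.0}
        System.out.println(avgOfBucketAverages(buckets)); // 22.5
    }
}
```

Grouping by genre plays the role of a terms aggregation, the per-group average plays the role of an avg aggregation, and the average over the group results plays the role of an avg bucket aggregation.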

In this blog, we will learn about the nested aggregation and its implementation using the Java API.

Nested aggregation: A special single-bucket aggregation that enables aggregating nested documents (a document that contains another set of documents is called a nested document).

First, we need to create an index with a nested mapping.
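As a rough sketch of what such a mapping could look like, the helper below builds the mapping JSON as a Java string. The field name “books” comes from this post; the sub-fields “genre” and “price” and their types are assumptions based on the aggregation described later, and the class and method names are hypothetical. The “keyword” type exists from Elasticsearch 5.0 onwards; older versions would use a not_analyzed “string” field instead. How the mapping is sent to the server (index name, mapping type) depends on your Elasticsearch version and is not shown.

```java
// Hypothetical helper: only builds the mapping JSON, it does not contact Elasticsearch.
class NestedMapping {

    // "books" is declared as a nested field so that each book is indexed
    // as its own hidden document and can be aggregated independently.
    static String booksMapping() {
        // Single quotes keep the literal readable; they are swapped for
        // double quotes at the end to produce valid JSON.
        String json =
              "{ 'properties': {"
            + "    'books': {"
            + "      'type': 'nested',"
            + "      'properties': {"
            + "        'genre': { 'type': 'keyword' },"
            + "        'price': { 'type': 'double' }"
            + "      }"
            + "    }"
            + "} }";
        return json.replace('\'', '"');
    }
}
```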

Nested Aggregation

We need to create a TCP (transport) client using the Java API, which will interact with the Elasticsearch server running on our machine. We have a method getClient() which returns this client.

Now, we need to create the nested aggregation using the AggregationBuilders class.

The builder chain uses two methods: nested(), which creates a nested aggregation named “nested_doc”, and path(), which sets the nested field, here “books”. The terms bucket aggregation named “genre_count” groups the books by genre, and its avg sub-aggregation named “avg_price” calculates the average price of the books in each genre.

The getAvgAggregateByPrice() method returns the average price of the books in each genre.
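For reference, the aggregation described above corresponds roughly to the following search request body in the Elasticsearch query DSL. The sketch only builds that JSON as a Java string and does not use the Elasticsearch client; the field paths “books.genre” and “books.price” are assumptions derived from the description, and the class name is hypothetical.

```java
// Hypothetical helper: builds the request body equivalent to the builder chain.
class NestedAggRequest {

    static String body() {
        // Single quotes are swapped for double quotes at the end.
        String json =
              "{ 'size': 0,"                                    // aggregations only, no hits
            + "  'aggs': {"
            + "    'nested_doc': {"
            + "      'nested': { 'path': 'books' },"            // step into the nested field
            + "      'aggs': {"
            + "        'genre_count': {"
            + "          'terms': { 'field': 'books.genre' },"  // one bucket per genre
            + "          'aggs': {"
            + "            'avg_price': { 'avg': { 'field': 'books.price' } }"
            + "          }"
            + "        }"
            + "      }"
            + "    }"
            + "  }"
            + "}";
        return json.replace('\'', '"');
    }
}
```

The nesting of the JSON mirrors the nesting of the builders: the terms aggregation is a sub-aggregation of the nested aggregation, and the avg aggregation is a sub-aggregation of the terms aggregation.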

Here you can see how to extract the nested aggregation and its sub-aggregations from the Elasticsearch search response.

You can clone this project from here: ES-Nested-Agg
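The shape of the walk can be sketched without the Elasticsearch client by treating the “aggregations” part of the JSON response as nested maps. The sample data below is invented, and the class and helper names are hypothetical. With the real Java API, the same three steps would typically use the typed Nested, Terms and Avg result objects instead of raw maps.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration on plain maps shaped like the "aggregations" section of a response.
class NestedAggResponse {

    @SuppressWarnings("unchecked")
    static Map<String, Double> avgPriceByGenre(Map<String, Object> aggregations) {
        // Step 1: the single bucket produced by the nested aggregation.
        Map<String, Object> nested = (Map<String, Object>) aggregations.get("nested_doc");
        // Step 2: the terms aggregation inside it, with one bucket per genre.
        Map<String, Object> terms = (Map<String, Object>) nested.get("genre_count");
        List<Map<String, Object>> buckets = (List<Map<String, Object>>) terms.get("buckets");
        // Step 3: the avg metric inside each genre bucket.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map<String, Object> bucket : buckets) {
            Map<String, Object> avg = (Map<String, Object>) bucket.get("avg_price");
            result.put((String) bucket.get("key"), ((Number) avg.get("value")).doubleValue());
        }
        return result;
    }

    // Small helper to build a map from alternating keys and values.
    static Map<String, Object> map(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Invented sample data in the shape Elasticsearch returns for this aggregation.
    static Map<String, Object> sampleAggregations() {
        return map("nested_doc", map(
                "doc_count", 3,
                "genre_count", map("buckets", Arrays.asList(
                        map("key", "fiction", "doc_count", 2, "avg_price", map("value", 15.0)),
                        map("key", "science", "doc_count", 1, "avg_price", map("value", 30.0))))));
    }

    public static void main(String[] args) {
        System.out.println(avgPriceByGenre(sampleAggregations())); // {fiction=15.0, science=30.0}
    }
}
```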

Written by 

Narayan Kumar is a Sr. Software Consultant with more than 3.5 years of experience. He is passionate about Scala development and has worked across the complete range of the Scala ecosystem. He is a quick learner and curious to learn new technologies. He is responsible and a good team player. He has a good understanding of building streaming applications on Apache Spark, Kafka and Cassandra.
