Working with Nested Aggregations in Elasticsearch

First of all, we need to understand aggregations in Elasticsearch. An aggregation can be seen as a unit of work that builds analytic information over a set of documents, which makes aggregations a powerful tool for building complex summaries of data.

There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into three main families.

Bucket aggregation: A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, every bucket's criterion is evaluated on every document in the context, and when a criterion matches, the document is considered to “fall in” the relevant bucket. By the end of the aggregation process, we end up with a list of buckets, each one with a set of documents that “belong” to it. Examples: filter, terms, and nested aggregations.

Metrics aggregation: A family of aggregations that keep track of and compute metrics over a set of documents. Examples: avg, max, and min aggregations.

Pipeline aggregation: A family of aggregations that aggregate the output of other aggregations and their associated metrics. Examples: cumulative sum and avg bucket aggregations.
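The three families can be illustrated without Elasticsearch at all. The following is only a plain-Scala analogy, not Elasticsearch code: the Book type and the sample data are hypothetical, groupBy stands in for a bucket aggregation, an average per bucket for a metrics aggregation, and an average of those per-bucket averages (roughly what an avg bucket aggregation computes) for a pipeline aggregation.

```scala
// Plain-Scala analogy of the three aggregation families; no Elasticsearch involved.
// The Book type and the sample data below are hypothetical.
object AggregationFamilies {
  final case class Book(title: String, genre: String, price: Double)

  val books: List[Book] = List(
    Book("Dune", "scifi", 10.0),
    Book("Foundation", "scifi", 20.0),
    Book("Emma", "classic", 6.0)
  )

  // Bucket aggregation: each key (genre) collects the documents whose criterion matches it.
  val buckets: Map[String, List[Book]] = books.groupBy(_.genre)

  // Metrics aggregation: a number computed over the documents in each bucket.
  val avgPriceByGenre: Map[String, Double] =
    buckets.map { case (genre, bs) => genre -> bs.map(_.price).sum / bs.size }

  // Pipeline aggregation: works on the output of another aggregation
  // (here, the average of the per-bucket averages).
  val avgOfAvgs: Double = avgPriceByGenre.values.sum / avgPriceByGenre.size

  def main(args: Array[String]): Unit = {
    println(buckets.map { case (genre, bs) => genre -> bs.size })
    println(avgPriceByGenre)
    println(avgOfAvgs)
  }
}
```

With this data, the scifi bucket averages 15.0, the classic bucket 6.0, and the pipeline step averages those two values to 10.5.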

In this blog, we will look at the nested aggregation and how to implement it with the Elasticsearch Java API, called from Scala.

Nested aggregation: A special single-bucket aggregation that enables aggregating nested documents. Nested documents are the objects stored in a field mapped with the nested type; Elasticsearch indexes each of them as a separate hidden document alongside its parent document.
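To make “single bucket” concrete, here is a small in-memory model. It is not the Elasticsearch API: the Author and Book types and the data are hypothetical. The nested aggregation steps into the books path and produces one bucket holding every nested book, so its document count counts books rather than authors.

```scala
// In-memory model of what a nested aggregation does; types and data are hypothetical.
object NestedBucketModel {
  final case class Book(genre: String, price: Double)
  final case class Author(name: String, books: List[Book])

  val authors: List[Author] = List(
    Author("Asimov", List(Book("scifi", 20.0), Book("scifi", 12.0))),
    Author("Austen", List(Book("classic", 6.0)))
  )

  // The nested aggregation "steps into" the books path and returns ONE bucket
  // containing every nested book document of the matching parent documents.
  def nestedBucket(parents: List[Author]): List[Book] = parents.flatMap(_.books)

  def main(args: Array[String]): Unit = {
    println(s"top-level documents: ${authors.size}")                    // 2 authors
    println(s"nested bucket doc count: ${nestedBucket(authors).size}")  // 3 books
  }
}
```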

First, we need to create an index whose mapping declares a nested field.
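The mapping itself is not shown here. A minimal sketch of what it might look like follows, assuming the Elasticsearch 1.x mapping syntax (which matches the ImmutableSettings and TransportClient code used in this post) and a hypothetical document type named author with hypothetical name and title fields; only books, books.genre, and books.price come from the queries in this post. It is kept as a Scala string so it could be sent as the body of a PUT /authors request. Marking genre as not_analyzed keeps each genre as a single term for the terms aggregation; newer Elasticsearch versions use different syntax (for example a keyword field instead of a not_analyzed string).

```scala
// Hypothetical index definition for "authors", assuming Elasticsearch 1.x mapping syntax.
// The type name "author" and the "name"/"title" fields are illustrative assumptions.
object AuthorsMapping {
  val mapping: String =
    """{
      |  "mappings": {
      |    "author": {
      |      "properties": {
      |        "name": { "type": "string" },
      |        "books": {
      |          "type": "nested",
      |          "properties": {
      |            "title": { "type": "string" },
      |            "genre": { "type": "string", "index": "not_analyzed" },
      |            "price": { "type": "double" }
      |          }
      |        }
      |      }
      |    }
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = println(mapping)
}
```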

Next, we need a TCP (transport) client, created with the Java API, that talks to the Elasticsearch server running on our machine. The getClient() method returns this client.

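```scala
import org.elasticsearch.client.Client
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.settings.ImmutableSettings
import org.elasticsearch.common.transport.InetSocketTransportAddress

// Elasticsearch 1.x transport client connecting to a local node on port 9300
def getClient(): Client = {
  val settings = ImmutableSettings.settingsBuilder()
    .put("client.transport.sniff", true)         // discover the other nodes of the cluster
    .put("client.transport.ping_timeout", "6s")
    .put("cluster.name", "elasticsearch")        // must match the server's cluster name
    .build()
  val client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress("localhost", 9300))
  client
}
```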

Now we create the nested aggregation using the AggregationBuilders class.

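```scala
import org.elasticsearch.search.aggregations.AggregationBuilders

// nested(path "books") -> terms(books.genre) -> avg(books.price)
val avgAggregateByPrice = AggregationBuilders.nested("nested_doc").path("books")
  .subAggregation(
    AggregationBuilders.terms("genre_count").field("books.genre")
      .subAggregation(
        AggregationBuilders.avg("avg_price").field("books.price")))
```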

Here, nested() creates a nested aggregation named “nested_doc”, and path() specifies the nested field to aggregate over, in this case “books”. Inside it, a terms bucket aggregation named “genre_count” groups the books by genre, and its avg sub-aggregation, “avg_price”, computes the average price of the books in each genre.
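For comparison, the same aggregation tree can be written in the JSON query DSL, which makes the nesting of the three levels easy to see. This sketch keeps it as a Scala string (the object name is hypothetical). On Elasticsearch 1.x it could be sent to the authors index's _search endpoint with search_type=count, mirroring setSearchType("count") in the Java code; later versions drop the count search type in favor of "size": 0.

```scala
// The REST (JSON) form of the nested -> terms -> avg aggregation, kept in a Scala string.
object NestedAggRequest {
  val body: String =
    """{
      |  "aggs": {
      |    "nested_doc": {
      |      "nested": { "path": "books" },
      |      "aggs": {
      |        "genre_count": {
      |          "terms": { "field": "books.genre" },
      |          "aggs": {
      |            "avg_price": { "avg": { "field": "books.price" } }
      |          }
      |        }
      |      }
      |    }
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = println(body)
}
```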

The getAvgAggregateByPrice() method runs this aggregation against the “authors” index and returns, for each genre, the number of books and their average price.

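```scala
import org.elasticsearch.client.Client
import org.elasticsearch.search.aggregations.bucket.nested.Nested
import org.elasticsearch.search.aggregations.bucket.terms.Terms
import org.elasticsearch.search.aggregations.metrics.avg.Avg
import scala.collection.JavaConverters._

def getAvgAggregateByPrice(client: Client) = {
  // search type "count" returns only the aggregations, no hits
  val searchResponse = client.prepareSearch().setSearchType("count").setIndices("authors")
    .addAggregation(avgAggregateByPrice).execute().actionGet()
  val agg = searchResponse.getAggregations.get[Nested]("nested_doc")
  // one terms bucket per genre, each carrying its own avg sub-aggregation
  val listValues = agg.getAggregations.get[Terms]("genre_count").getBuckets.asScala.map { groupBucket =>
    val avgPrice = groupBucket.getAggregations.get[Avg]("avg_price").getValueAsString
    ("genre" -> groupBucket.getKey.toString, "Number of documents" -> groupBucket.getDocCount,
      "Average price" -> avgPrice)
  }.toList
  listValues
}
```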

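This shows how to extract a nested aggregation and its sub-aggregations from an Elasticsearch SearchResponse: get the Nested result by its name, then the Terms result inside it, and finally the Avg result inside each bucket.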
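To sanity-check what getAvgAggregateByPrice() should return, the same nested, terms, and avg steps can be computed in memory. This is a sketch on hypothetical data, not a call to Elasticsearch; GenreStats is a hypothetical typed alternative to the tuple of pairs built in the method above, and the ordering mimics the terms aggregation's default of descending document count.

```scala
// In-memory equivalent of nested -> terms(genre) -> avg(price); all names and data hypothetical.
object ExpectedGenreStats {
  final case class Book(genre: String, price: Double)
  final case class Author(name: String, books: List[Book])
  // Typed alternative to the ("genre" -> ..., "Number of documents" -> ..., ...) tuple
  final case class GenreStats(genre: String, docCount: Long, avgPrice: Double)

  val authors: List[Author] = List(
    Author("Asimov", List(Book("scifi", 20.0), Book("scifi", 12.0))),
    Author("Herbert", List(Book("scifi", 10.0), Book("classic", 8.0))),
    Author("Austen", List(Book("classic", 6.0)))
  )

  def stats(parents: List[Author]): List[GenreStats] =
    parents
      .flatMap(_.books)           // nested: one bucket holding every book
      .groupBy(_.genre)           // terms: one bucket per genre
      .map { case (genre, bs) =>  // avg: one metric per genre bucket
        GenreStats(genre, bs.size.toLong, bs.map(_.price).sum / bs.size)
      }
      .toList
      .sortBy(s => (-s.docCount, s.genre)) // descending doc count, ties broken by genre

  def main(args: Array[String]): Unit =
    stats(authors).foreach(println)
}
```

Here the scifi bucket reports 3 documents even though only two authors wrote science fiction, because inside a nested aggregation the document counts refer to the nested book documents, not to their parent authors.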

You can clone this project from here: ES-Nested-Agg