Elasticsearch: How to paginate over selected data in Elasticsearch with Scala using the Scroll API

Elasticsearch is a real-time, distributed, full-text search and analytics engine. It is built on top of Apache Lucene™. You can read more about it on their website.

It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine.

In this post we will learn how to paginate over selected data using the Scroll API of Elasticsearch. The scenario is as follows: we take JSON data from an input file and insert it into an ES index, then we issue a search query through the Scroll API and fetch the scrolled data in batches of a specified size.

Let's start by adding the Elasticsearch dependency to the project. Here is the snippet from the build.sbt file.

name := "ES-Scroll-API"

scalaVersion := "2.11.4"

libraryDependencies ++= {
  Seq(
    "org.elasticsearch" % "elasticsearch" % "1.5.2",
    "ch.qos.logback" % "logback-classic" % "1.0.13"
  )
}

First of all, we need to create a node in Scala using the Java API, which will interact with the Elasticsearch server running on our machine. I have created a getClient() method which returns a local node client.
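A minimal sketch of what such a getClient() method might look like, assuming the Elasticsearch 1.5.x Java API from the build.sbt above. The object name EsClientSketch is hypothetical, and the choice of local(true) (a node that is only discoverable inside the same JVM) is an assumption based on the phrase "local node client"; the original implementation may be configured differently.

```scala
import org.elasticsearch.client.Client
import org.elasticsearch.node.NodeBuilder.nodeBuilder

// Hypothetical wrapper object, used here only for illustration.
object EsClientSketch {

  // Builds and starts an embedded node that is local to this JVM,
  // then returns a client bound to that node.
  def getClient(): Client = {
    val node = nodeBuilder().local(true).node()
    node.client()
  }
}
```

To join an already running cluster as a client-only node instead, the 1.x NodeBuilder also offers client(true) in place of local(true).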

Elasticsearch is schemaless, so we can index any JSON document into it. We have an inputJson file in which each line is a JSON document. In our implementation, the application reads the file line by line and inserts each JSON document into the Elasticsearch index. For this I have created an insertBulkDoc() method, which uses the Bulk API to insert a set of documents into the Elasticsearch index.

Here is the complete insertBulkDoc() method.
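The following is a minimal sketch of such a bulk insert, assuming the Elasticsearch 1.5.x Java API; it is not necessarily the author's implementation. The object name BulkInsertSketch, the parameter names, and the Boolean return value are assumptions made for illustration. The explicit refresh is also an addition, so that the documents are searchable right after the call returns.

```scala
import scala.io.Source
import org.elasticsearch.client.Client

// Hypothetical wrapper object, used here only for illustration.
object BulkInsertSketch {

  // Reads `path` line by line (one JSON document per line) and indexes
  // all non-empty lines with a single bulk request.
  // Returns true when no item in the bulk request failed.
  def insertBulkDoc(client: Client, index: String, docType: String, path: String): Boolean = {
    val source = Source.fromFile(path, "UTF-8")
    try {
      val bulk = client.prepareBulk()
      source.getLines().filter(_.trim.nonEmpty).foreach { json =>
        bulk.add(client.prepareIndex(index, docType).setSource(json))
      }
      if (bulk.numberOfActions() == 0) true // nothing to index
      else {
        val response = bulk.execute().actionGet()
        // Assumption: refresh so that a search issued right away sees the documents.
        client.admin().indices().prepareRefresh(index).execute().actionGet()
        !response.hasFailures
      }
    } finally source.close()
  }
}
```

For very large input files it would be safer to send the bulk request in fixed-size chunks rather than all at once; the sketch keeps a single request for brevity.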

After this we fetch chunks of data through the Scroll API, which provides an efficient way to paginate over selected data. Each call to the Scroll API returns the next batch of results until there are no more results left to return, i.e. the hits array is empty. The initial search request and each subsequent scroll request return a new _scroll_id; only the most recent _scroll_id should be used to retrieve the next page of data. For this I have created a scrollFetch() method. For more information about the Scroll API, see the Elasticsearch Scroll API documentation.

Here is the complete scrollFetch() method.
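As a rough sketch of the scroll loop described above, assuming the Elasticsearch 1.5.x Java API: the object name ScrollSketch, the match-all query, the one-minute keep-alive, and the choice to collect every page into one Vector are all assumptions for illustration and may differ from the original method.

```scala
import scala.annotation.tailrec
import org.elasticsearch.action.search.SearchResponse
import org.elasticsearch.client.Client
import org.elasticsearch.common.unit.TimeValue
import org.elasticsearch.index.query.QueryBuilders

// Hypothetical wrapper object, used here only for illustration.
object ScrollSketch {

  // Pages through all documents of `index`, `pageSize` hits at a time,
  // and returns the JSON source of every hit.
  def scrollFetch(client: Client, index: String, pageSize: Int): Vector[String] = {
    val keepAlive = TimeValue.timeValueMinutes(1) // how long ES keeps the scroll context open

    // The initial search opens the scroll and already carries the first page.
    val first = client.prepareSearch(index)
      .setQuery(QueryBuilders.matchAllQuery())
      .setScroll(keepAlive)
      .setSize(pageSize)
      .execute().actionGet()

    @tailrec
    def loop(response: SearchResponse, acc: Vector[String]): Vector[String] = {
      val hits = response.getHits.getHits
      if (hits.isEmpty) acc // an empty hits array means there is nothing left
      else {
        // Always continue with the most recent scroll id.
        val next = client.prepareSearchScroll(response.getScrollId)
          .setScroll(keepAlive)
          .execute().actionGet()
        loop(next, acc ++ hits.map(_.getSourceAsString))
      }
    }

    loop(first, Vector.empty)
  }
}
```

Collecting everything in memory is only reasonable for small result sets; for larger ones each page could be handed to a callback (for example a file writer) inside the loop instead.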

Now, for validation, we store the scrolled records in a local file. For this we have a private writeDataOnLocalFile() method, which writes the fetched records to the local file system.

Here is the complete writeDataOnLocalFile() method.
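A small sketch of how such a writer might look, using only the Java standard library. The object name FileWriterSketch and the parameter names are hypothetical, and the method is left public here (rather than private, as in the post) so that it can be called on its own.

```scala
import java.io.{File, PrintWriter}

// Hypothetical wrapper object, used here only for illustration.
object FileWriterSketch {

  // Writes each record on its own line to the file at `path`,
  // overwriting the file if it already exists.
  def writeDataOnLocalFile(records: Seq[String], path: String): Unit = {
    val writer = new PrintWriter(new File(path), "UTF-8")
    try records.foreach(record => writer.println(record))
    finally writer.close()
  }
}
```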

After this we can delete the index from our node. For this I have created a deleteIndex() method, which takes the client and the index name as arguments.

Here is the complete deleteIndex() method.
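A minimal sketch of such a method, assuming the Elasticsearch 1.5.x Java API. The object name DeleteIndexSketch and the Boolean return value are assumptions for illustration; the original method may return something else.

```scala
import org.elasticsearch.client.Client

// Hypothetical wrapper object, used here only for illustration.
object DeleteIndexSketch {

  // Deletes `index` through the indices admin client and reports
  // whether the cluster acknowledged the request.
  def deleteIndex(client: Client, index: String): Boolean =
    client.admin().indices().prepareDelete(index).execute().actionGet().isAcknowledged
}
```

Note that deleting an index that does not exist raises an exception in this API version, so a caller may want to check for existence first.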

Here is the complete application.

We can run this application by extending the ESScrollApi trait in our main object.

Here is the main object.

After this, open a terminal in the project directory and run 'sbt run'. We will get the expected output on the console as well as in the local file.

Download the source code to check the functionality. GitHub

Written by 

Narayan Kumar is a Sr. Software Consultant with more than 3.5 years of experience. He is passionate about Scala development and has worked across the complete range of the Scala ecosystem. He is a quick learner and curious to learn new technologies. He is responsible and a good team player. He has a good understanding of building streaming applications on Apache Spark, Kafka and Cassandra.
