Elasticsearch: CRUD Operations and Sorting Documents by Time Stamp with Scala Using the Elasticsearch Java API

Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. You can read more about it on their website.

Elasticsearch is also written in Java and uses Lucene internally for all of its indexing and searching, but it aims to make full-text search easy by hiding the complexities of Lucene behind a simple, coherent, RESTful API.

In this post, we will learn to use the Elasticsearch Java API from Scala. We will perform CRUD operations on Elasticsearch, then search and sort documents by time stamp, retrieve a specified number of documents from the index, and validate the result.

We will start by adding the Elasticsearch dependency to the project. At the time of writing, 1.5.2 is the latest version. Here is the snippet from the build.sbt file.

name := "crudOnEs"

scalaVersion := "2.11.4"

libraryDependencies ++= {
  Seq(
    "org.elasticsearch" % "elasticsearch" % "1.5.2"
  )
}

First of all, we need to create a node in Scala using the Java API, which will interact with the Elasticsearch server running on our machine. I have created a method getClient() which returns a local client.
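
As a rough illustration of the idea, a getClient() along these lines could be written with the 1.x Java API. This is only a sketch, not the original listing: it assumes an embedded node started in local mode, and the object name LocalClientSketch is hypothetical.

```scala
import org.elasticsearch.client.Client
import org.elasticsearch.node.NodeBuilder.nodeBuilder

object LocalClientSketch {
  // Assumption: an embedded node in local (in-JVM) mode. The post's own
  // getClient() is not shown, so this is only one plausible shape.
  def getClient(): Client = {
    val node = nodeBuilder().local(true).node() // builds and starts the node
    node.client()
  }
}
```

The returned client should be closed with client.close() once the application is done with it.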

After this, we will create an index with mappings and settings using the addMappingToIndex() method. We use XContentBuilder to create the mapping JSON object.

This mappingBuilder object is needed when we want to sort on a custom time stamp. To do that, we change the following attributes of the "_timestamp" field: "enabled" = true and "store" = true, which store the time stamp in the index, and "path" = "post_date", which tells Elasticsearch to take the value of "_timestamp" from the document's "post_date" field.
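
The "_timestamp" attributes described above can be sketched with XContentBuilder roughly as follows. The names MappingSketch and timestampMapping are hypothetical, index settings are left out, and the original method may be structured differently.

```scala
import org.elasticsearch.client.Client
import org.elasticsearch.common.xcontent.XContentBuilder
import org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder

object MappingSketch {
  // Mapping for the given type that enables and stores _timestamp and
  // takes its value from the document's post_date field.
  def timestampMapping(typeName: String): XContentBuilder =
    jsonBuilder()
      .startObject()
        .startObject(typeName)
          .startObject("_timestamp")
            .field("enabled", true)
            .field("store", true)
            .field("path", "post_date")
          .endObject()
        .endObject()
      .endObject()

  // Creates the index with that mapping (no custom settings in this sketch);
  // returns true when the cluster acknowledges the request.
  def addMappingToIndex(client: Client, indexName: String, typeName: String): Boolean =
    client.admin().indices()
      .prepareCreate(indexName)
      .addMapping(typeName, timestampMapping(typeName))
      .execute().actionGet()
      .isAcknowledged
}
```

Keeping the mapping in its own function makes it easy to print the generated JSON with string() and inspect it before creating the index.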

Here is the complete addMappingToIndex() method.

Elasticsearch is schemaless, so we can index any JSON document into it. We have a bulk JSON file in which each line is a JSON document. In our implementation, the application reads the file line by line and indexes each JSON document into Elasticsearch. For this I have created the insertBulkDocument() method, which uses the bulk API to insert a set of documents into the Elasticsearch index.

Here is the bulk JSON file. Each line is a JSON document.
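
The read-and-index loop described above might look roughly like this. It is a sketch under assumptions: the file path is passed in by the caller, document ids are assigned as 1, 2, 3, ... in file order, and the names BulkInsertSketch and readLines are hypothetical.

```scala
import org.elasticsearch.action.bulk.BulkResponse
import org.elasticsearch.client.Client
import scala.io.Source

object BulkInsertSketch {
  // Reads the non-empty lines of the file; each line is one JSON document.
  def readLines(path: String): List[String] = {
    val source = Source.fromFile(path)
    try source.getLines().map(_.trim).filter(_.nonEmpty).toList
    finally source.close()
  }

  // Adds one index request per line to a single bulk request.
  // Assumption: ids are assigned as 1, 2, 3, ... in file order.
  def insertBulkDocument(client: Client, indexName: String,
                         typeName: String, path: String): BulkResponse = {
    val bulkRequest = client.prepareBulk()
    readLines(path).zipWithIndex.foreach { case (json, i) =>
      bulkRequest.add(
        client.prepareIndex(indexName, typeName, (i + 1).toString).setSource(json))
    }
    bulkRequest.execute().actionGet()
  }
}
```

After the call, hasFailures on the returned BulkResponse tells whether any of the documents could not be indexed.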

Here is the complete insertBulkDocument() method.

After this, we can search and sort by time stamp; for this I have created the method sortByTimeStamp(). In Elasticsearch, sorting on a field such as the time stamp is ascending by default. In this method we use the QueryBuilders and FilterBuilders APIs to create a filteredQuery.
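
A search of this kind could be sketched as below. The post does not say which filter it uses, so the exists filter on post_date is an assumption, as are the descending order and the helper names (SortSketch, timestampQuery, postDates, isDescending).

```scala
import org.elasticsearch.action.search.SearchResponse
import org.elasticsearch.client.Client
import org.elasticsearch.index.query.{FilterBuilders, QueryBuilder, QueryBuilders}
import org.elasticsearch.search.sort.SortOrder

object SortSketch {
  // match_all wrapped in a filtered query. Assumption: the filter keeps
  // only documents that have a post_date field.
  def timestampQuery(): QueryBuilder =
    QueryBuilders.filteredQuery(
      QueryBuilders.matchAllQuery(),
      FilterBuilders.existsFilter("post_date"))

  // Returns at most `size` documents, newest time stamp first.
  def sortByTimeStamp(client: Client, indexName: String, size: Int): SearchResponse =
    client.prepareSearch(indexName)
      .setQuery(timestampQuery())
      .addSort("_timestamp", SortOrder.DESC)
      .setSize(size)
      .execute().actionGet()

  // Extracts the post_date value of every hit, in result order.
  def postDates(response: SearchResponse): Seq[String] =
    response.getHits.getHits.map(hit => String.valueOf(hit.getSource.get("post_date"))).toSeq

  // yyyy-MM-dd strings sort lexicographically, so a plain string sort
  // is enough to validate the order.
  def isDescending(dates: Seq[String]): Boolean =
    dates == dates.sorted.reverse
}
```

To validate a result, one could check that the number of hits does not exceed the requested size and that isDescending(postDates(response)) holds.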

Here is the complete sortByTimeStamp() method.

We can update a document by adding one or more fields. We use the update API to create an UpdateRequest, to which we pass three parameters: index name, type name, and id. If the update request executes successfully, the document version is incremented: if the version was 1, it will be 2 after this method is called.
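
The update described above could be sketched as follows, assuming a partial-document update that adds a single string field. The field name and value are illustrative parameters, and the names UpdateSketch and buildUpdate are hypothetical.

```scala
import org.elasticsearch.action.update.{UpdateRequest, UpdateResponse}
import org.elasticsearch.client.Client
import org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder

object UpdateSketch {
  // Builds an UpdateRequest from index name, type name and id, with a
  // partial document that carries the field to add.
  def buildUpdate(indexName: String, typeName: String, id: String,
                  field: String, value: String): UpdateRequest =
    new UpdateRequest(indexName, typeName, id)
      .doc(jsonBuilder().startObject().field(field, value).endObject())

  // Executes the request and waits for the response.
  def updateIndex(client: Client, indexName: String, typeName: String, id: String,
                  field: String, value: String): UpdateResponse =
    client.update(buildUpdate(indexName, typeName, id, field, value)).actionGet()
}
```

The new document version can then be read from getVersion on the returned UpdateResponse.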

Here is the complete updateIndex() method.

We can delete a document by id. For this I have created the method deleteDocumentById(), which takes three parameters (index name, type name, and id) and returns a delete response.
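
A minimal sketch of such a method is shown here. It assumes the client is passed in as an extra argument, and the object name DeleteDocumentSketch is hypothetical.

```scala
import org.elasticsearch.action.delete.DeleteResponse
import org.elasticsearch.client.Client

object DeleteDocumentSketch {
  // Deletes one document, addressed by index name, type name and id.
  // Assumption: the client is supplied by the caller.
  def deleteDocumentById(client: Client, indexName: String,
                         typeName: String, id: String): DeleteResponse =
    client.prepareDelete(indexName, typeName, id).execute().actionGet()
}
```

Calling isFound on the response tells whether a document with that id actually existed.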

Here is the complete deleteDocumentById() method.

We can delete an index from our node. For this I have created the method deleteIndex(), which takes the client and the index name as arguments and returns the acknowledgement as a boolean.
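
With the indices admin client, a deleteIndex() of this shape could be sketched as follows. The object name DeleteIndexSketch is hypothetical, and the original listing may differ.

```scala
import org.elasticsearch.client.Client

object DeleteIndexSketch {
  // Deletes the whole index; returns true when the cluster
  // acknowledges the request.
  def deleteIndex(client: Client, indexName: String): Boolean =
    client.admin().indices()
      .prepareDelete(indexName)
      .execute().actionGet()
      .isAcknowledged
}
```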

Here is the complete deleteIndex() method.

Here is the complete application.

We can run this application by extending the ESOperation trait in our main object.

Here is the main object.

After this, go to the terminal and type sbt run; we will get the expected output on the console.

Alternatively, we can check the data using a curl request. Before issuing this request, we need to comment out the deleteIndex() call; otherwise deleteIndex() deletes the index from the current node and we get an index missing exception.

curl -XGET http://localhost:9200/twitter/tweet/_search?pretty

And here is the output of the curl request.

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 9,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "4",
      "_score" : 1.0,
      "_source":{ "id": 4, "source": "twitter", "data": "tweet 4", "post_date": "2015-05-15"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "9",
      "_score" : 1.0,
      "_source":{ "id": 9, "source": "twitter", "data": "tweet 9", "post_date": "2015-05-20"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "5",
      "_score" : 1.0,
      "_source":{ "id": 5, "source": "twitter", "data": "tweet 5", "post_date": "2015-05-16"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "6",
      "_score" : 1.0,
      "_source":{ "id": 6, "source": "twitter", "data": "tweet 6", "post_date": "2015-05-17"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "2",
      "_score" : 1.0,
      "_source":{ "id": 2, "source": "twitter", "data": "tweet 2", "post_date": "2015-05-13"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "7",
      "_score" : 1.0,
      "_source":{ "id": 7, "source": "twitter", "data": "tweet 7", "post_date": "2015-05-18"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "3",
      "_score" : 1.0,
      "_source":{ "id": 3, "source": "twitter", "data": "tweet 3", "post_date": "2015-05-14"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "8",
      "_score" : 1.0,
      "_source":{ "id": 8, "source": "twitter", "data": "tweet 8", "post_date": "2015-05-19"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "10",
      "_score" : 1.0,
      "_source":{ "id": 10, "source": "twitter", "data": "tweet 10", "post_date": "2015-05-21"}
    } ]
  }
}

You can explore more of the API here.

Download the source code to check the functionality. GitHub

Written by 

Narayan Kumar is a Sr. Software Consultant with more than 3.5 years of experience. He is passionate about Scala development and has worked on the complete range of the Scala ecosystem. He is a quick learner and curious to learn new technologies. He is responsible and a good team player. He has a good understanding of building streaming applications on Apache Spark, Kafka and Cassandra.
