Introduction to Elasticsearch in Scala


Elasticsearch is a real-time distributed search and analytics engine built on top of Apache Lucene. It is used for full-text search, structured search and analytics.

Lucene itself is just a library: to use it you have to work in Java, and integrating it directly into your application is a complex task.

Elasticsearch uses the indexing and searching capabilities of Lucene but hides the complexities behind a simple RESTful API.

In this post we will learn to perform basic CRUD operations using the Elasticsearch transport client in Scala, with sbt as our build tool.

Let us start by downloading Elasticsearch from here and unzipping it.

Execute the following commands to run Elasticsearch in the foreground:

cd elasticsearch-<version>
./bin/elasticsearch

Test it out by opening another terminal window and running the following:

curl 'http://localhost:9200/?pretty'

You should see a response like this:

{
  "name" : "Don Fortunato",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.2",
    "build_hash" : "b9e4a6acad4008027e4038f6abed7f7dba346f94",
    "build_timestamp" : "2016-04-21T16:03:47Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
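The same check can be done from Scala. The following is only a rough sketch using the standard library: the object name NodeCheck and both method names are made up for illustration, and the regular expression is a shortcut that assumes the response shape shown above (a real project would use a JSON library).

```scala
import scala.io.Source

object NodeCheck {
  // Matches the "number" field of the root response, e.g. "number" : "2.3.2"
  private val VersionNumber = """"number"\s*:\s*"([^"]+)"""".r

  // Pulls the version number out of a response body, if present.
  def extractVersion(body: String): Option[String] =
    VersionNumber.findFirstMatchIn(body).map(_.group(1))

  // Fetches the root endpoint over HTTP; assumes a node is listening on baseUrl.
  def fetchRoot(baseUrl: String = "http://localhost:9200"): String = {
    val source = Source.fromURL(baseUrl + "/?pretty", "UTF-8")
    try source.mkString finally source.close()
  }
}
```

With a node running locally, NodeCheck.extractVersion(NodeCheck.fetchRoot()) should yield the version the server reports.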

To start with the coding part, create a new sbt project and add the following dependency to the build.sbt file:

libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.3.2"

Next, we need to create a client that will talk to the Elasticsearch server.

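import java.net.InetAddress
import org.elasticsearch.client.Client
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.settings.Settings
import org.elasticsearch.common.transport.InetSocketTransportAddress

// 9300 is the default transport port (9200 is the HTTP port)
private val port = 9300
private val nodes = List("localhost")
private val addresses = nodes.map { host => new InetSocketTransportAddress(InetAddress.getByName(host), port) }

// cluster.name must match the cluster_name reported by the server
private lazy val settings = Settings.settingsBuilder().put("cluster.name", "elasticsearch").build()

val client: Client = TransportClient.builder()
  .settings(settings).build().addTransportAddresses(addresses: _*)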

Once the client is created we can query the Elasticsearch server.

The following example inserts a JSON document into an index called library, under a type called books.

An index is like a database and a type is like a table in Elasticsearch.

Let's create our first JSON document.

val jsonString =
  """
    |{
    |  "title": "Elastic",
    |  "price": 2000,
    |  "author": {
    |    "first": "Zachary",
    |    "last": "Tong"
    |  }
    |}
  """.stripMargin

To index this JSON document in Elasticsearch, add the following code to your project:

client.prepareIndex("library","books","1").setSource(jsonString).get()

The prepareIndex method takes three arguments: the index name, the type and the id. The id argument is optional. If you do not specify an id, Elasticsearch will automatically generate one for the document.
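As a small sketch of the auto-generated id case, the two-argument prepareIndex can be used and the id read back from the response. The object and method names here are hypothetical, and the client is assumed to be a transport client connected as shown earlier.

```scala
import org.elasticsearch.client.Client

object AutoId {
  // Indexes `json` into library/books without an explicit id
  // and returns the id that Elasticsearch generated for it.
  def indexWithGeneratedId(client: Client, json: String): String = {
    val response = client.prepareIndex("library", "books").setSource(json).get()
    response.getId
  }
}
```

Keeping the returned id around is what lets you update or delete that document later.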

Note that the title of the book is Elastic and not Elasticsearch. Let's correct this by executing an update on the document:

client.prepareUpdate("library","books","1").setDoc("title", "Elasticsearch").get()

Let's search for our document to see whether it was updated. Execute the following code to search for a document:

import org.elasticsearch.index.query.QueryBuilders

client.prepareSearch("library").setTypes("books")
  .setQuery(QueryBuilders.termQuery("_id", "1")).get()

The id that we specified while adding the document is stored as "_id".
If you omit the setQuery call, Elasticsearch matches every document of type books and returns the first 10 hits by default.
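The search call returns a SearchResponse, and the matching documents can be read from its hits. The helper below is an illustrative sketch (the object and method names are not from the original code); it turns a response into pairs of document id and source JSON.

```scala
import org.elasticsearch.action.search.SearchResponse

object SearchResults {
  // Returns (id, source JSON) for every hit contained in the response.
  def hits(response: SearchResponse): Seq[(String, String)] =
    response.getHits.getHits.toSeq.map(hit => (hit.getId, hit.getSourceAsString))
}
```

Passing the result of the search above to SearchResults.hits and printing each pair should show the document with its corrected title.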

Finally, to delete a document, execute the following code:

client.prepareDelete("library", "books", "1").get()

Elasticsearch also provides a bulk API, which inserts multiple documents into the Elasticsearch server in a single API call.
To use the bulk API, create a file with one JSON document per line, in the following format:

{ "title": "Java", "price": 1000, "author": { "first": "Chris", "last": "Adamson" } }
{ "title": "Scala", "price": 2000, "author": { "first": "Martin", "last": "Odersky" } }
{ "title": "C", "price": 3000, "author": { "first": "Dennis", "last": "Ritchie" } }

Now let's build a bulk request. First read the JSON file you just created into a list of strings named fileData, one document per line, then add each document to the request:

import org.elasticsearch.action.bulk.BulkRequestBuilder

val bulkRequest: BulkRequestBuilder = client.prepareBulk()
fileData.foreach { json =>
  bulkRequest.add(client.prepareIndex("library", "books").setSource(json))
}
bulkRequest.get()
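The fileData list used above is not shown in the post. One possible way to build it is sketched below; it uses scala.io.Source instead of a raw InputStream, and the object name, method name and file name are placeholders.

```scala
import scala.io.Source

object BulkFile {
  // Reads one JSON document per line, skipping blank lines.
  def readDocuments(path: String): List[String] = {
    val source = Source.fromFile(path, "UTF-8")
    // toList forces the lazy line iterator before the file is closed
    try source.getLines().map(_.trim).filter(_.nonEmpty).toList
    finally source.close()
  }
}
```

For example, val fileData = BulkFile.readDocuments("books.json") would give the list consumed by the bulk request, assuming the file was saved under that name.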

We are done with the CRUD operations. You can read more from the Elasticsearch docs.
Get the source code from here.

Happy Searching!!!!
