Autocomplete using Elasticsearch


You have probably seen this on a movie database like IMDb: as soon as a user types ‘g’, the search bar suggests Gone Girl, or other movies that have ‘g’ in their title.

This is autocomplete, or word completion, and it has become an essential part of many applications.

Autocomplete speeds up human-computer interaction by predicting the word the user intends from just a few characters.

In this post I’ll discuss result-suggest autocomplete using Elasticsearch, in which the predictions are based on the data that already exists in the data store.

There is another type, search-suggest autocomplete, which works on previously searched phrases, but we won’t cover it in this post.

Analyzers

Whenever we insert data into Elasticsearch, it analyzes the data so that an appropriate inverted index can be created.
An analyzer consists of a tokenizer and zero or more token filters (optionally preceded by character filters), which transform the data to suit the business need.
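
To make the idea concrete, here is a minimal sketch in plain Python of an analyzer as "tokenizer, then token filters" feeding an inverted index. It is only an illustration, not how Elasticsearch is implemented; the function names and the sample titles are made up for this example.

```python
from collections import defaultdict

def whitespace_tokenizer(text):
    # Stand-in for a whitespace tokenizer: split on runs of whitespace.
    return text.split()

def lowercase_filter(tokens):
    # Stand-in for a lowercase token filter.
    return [t.lower() for t in tokens]

def analyze(text, tokenizer, token_filters):
    # Run the tokenizer once, then each token filter in order.
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

def build_inverted_index(docs, tokenizer, token_filters):
    # Map every analyzed token to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text, tokenizer, token_filters):
            index[token].add(doc_id)
    return index

# Hypothetical sample data.
docs = {1: "Gone Girl", 2: "Girl Next Door"}
index = build_inverted_index(docs, whitespace_tokenizer, [lowercase_filter])
print(sorted(index["girl"]))  # [1, 2]
```

Looking up a term is then a dictionary access, which is why the choice of analyzer decides what can be found at search time.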

For this post we use a custom analyzer built around the nGram token filter.

An n-gram is a contiguous sequence of n items from a given sequence of text. Here the items are characters, so each word of the stored text is broken into its contiguous character substrings.
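
As a quick illustration, the character n-grams of a word for one fixed n can be computed with a sliding window. The helper name char_ngrams is made up for this sketch.

```python
def char_ngrams(text, n):
    """Return every contiguous substring of length n, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("gone", 1))  # ['g', 'o', 'n', 'e']
print(char_ngrams("gone", 2))  # ['go', 'on', 'ne']
print(char_ngrams("gone", 3))  # ['gon', 'one']
```

Note that the order of characters is always preserved, so ‘og’ is never a gram of ‘gone’.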

 

[Figure: ngram-analyzer]

Mapping And Settings

{
  "settings": {
    "analysis": {
      "filter": {
        "gramFilter": {
          "type": "nGram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "analyzer": {
        "gramAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "gramFilter"
          ]
        },
        "whitespaceAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "movies": {
      "properties": {
        "Title": {
          "type": "string",
          "analyzer": "gramAnalyzer",
          "search_analyzer": "whitespaceAnalyzer"
        },
        .
        .
        .
      }
    }
  }
}

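Notice that we have defined gramFilter, a token filter of type nGram. min_gram and max_gram are the minimum and maximum number of characters in each generated token, and token_chars lists the character classes that may appear in a gram. (The Elasticsearch reference documents token_chars for the nGram tokenizer rather than the token filter, so it may have no effect here.)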
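
To get a feel for what gramAnalyzer puts into the index, here is a rough Python imitation of the pipeline: split on whitespace, lowercase, then emit every gram between min_gram and max_gram characters. It is a sketch under the settings above, not Elasticsearch's own code, and the order in which it emits grams is not meant to match Elasticsearch's.

```python
def gram_filter(token, min_gram=1, max_gram=20):
    # Emit every contiguous substring whose length is within [min_gram, max_gram].
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams

def gram_analyzer(text):
    # Whitespace tokenizer, then lowercase, then the gram filter.
    tokens = [t.lower() for t in text.split()]
    return [gram for token in tokens for gram in gram_filter(token)]

print(gram_filter("girl"))
# ['g', 'gi', 'gir', 'girl', 'i', 'ir', 'irl', 'r', 'rl', 'l']
print(len(gram_analyzer("Gone Girl")))  # 20
```

A word of k characters produces k(k+1)/2 grams when max_gram is at least k, so a min_gram of 1 makes the index noticeably larger than the original text.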

We have also defined two analyzers in the settings and used both in the mapping:

  • gramAnalyzer
  • whitespaceAnalyzer

The obvious question is: why do we need two analyzers?

Because we want to analyze the stored data and the search query differently:

  • The search text is lowercased and split on whitespace.
  • The stored data is split on whitespace, lowercased, and then passed through gramFilter.
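
The effect of keeping the two analyzers different can be simulated in a few lines of Python. This is a toy model: the titles are sample data, the function names are invented, and search treats a document as a hit if it contains any of the analyzed query terms, which mirrors the default OR behaviour of a match query.

```python
from collections import defaultdict

def whitespace_analyzer(text):
    # Search-time analysis: split on whitespace and lowercase.
    return [t.lower() for t in text.split()]

def gram_analyzer(text, min_gram=1, max_gram=20):
    # Index-time analysis: same as above, then every gram of each token.
    grams = []
    for token in whitespace_analyzer(text):
        for start in range(len(token)):
            last_end = min(start + max_gram, len(token))
            for end in range(start + min_gram, last_end + 1):
                grams.append(token[start:end])
    return grams

def build_index(titles):
    # Inverted index from gram to the ids of the titles containing it.
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for gram in gram_analyzer(title):
            index[gram].add(doc_id)
    return index

def search(index, text, search_analyzer):
    # A title is a hit if it contains any analyzed query term.
    hits = set()
    for term in search_analyzer(text):
        hits |= index.get(term, set())
    return hits

titles = {1: "Gone Girl", 2: "Godzilla", 3: "Frozen"}  # sample data
index = build_index(titles)

print(sorted(search(index, "go", whitespace_analyzer)))  # [1, 2]
print(sorted(search(index, "go", gram_analyzer)))        # [1, 2, 3]
```

With the gram analyzer applied to the query as well, ‘go’ is broken into ‘g’, ‘go’ and ‘o’, so Frozen matches just because it contains an ‘o’. Keeping the query whole avoids that noise.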

Once the analyzers are defined, we apply them to the field we want suggestions for (in our example, Title): gramAnalyzer as its analyzer and whitespaceAnalyzer as its search_analyzer.

Searching

We can run a match query on the “Title” field to use the autocomplete functionality.

The query looks like this:

{
  "query": {
    "match": {
      "Title": "go"
    }
  }
}

This query returns every movie in the Elasticsearch index that has ‘go’, in any letter case, somewhere in a word of its Title.
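
In an application the search text comes from the user, so the request body is usually built in code. Here is a small Python sketch that produces the same body; the helper name build_autocomplete_query is invented for this example. The index name and HTTP client are not part of this post, so the sketch stops at building the JSON, which would then be sent to the index's _search endpoint.

```python
import json

def build_autocomplete_query(field, text):
    # Same shape as the match query shown above, with the field and
    # the user's input filled in.
    return {"query": {"match": {field: text}}}

body = build_autocomplete_query("Title", "go")
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with json.dumps keeps quotes and special characters in the user's input properly escaped.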

An activator template implementing this feature can be found here.



