Autocomplete using Elasticsearch

You may have seen this on a movie database such as IMDb: as soon as a user types ‘g’, the search bar suggests Gone Girl or other movies that have ‘g’ in their titles.

This is autocomplete, or word completion, and it has become an essential part of many applications.

Autocomplete speeds up human-computer interaction by predicting the word the user intends from only a few typed characters.

In this blog I’ll discuss result-suggest autocomplete using Elasticsearch, which means that the predictions are based on the existing data in the data store.

There is another type of autocomplete, search-suggest autocomplete, which works on previously searched phrases, but we won’t cover it in this blog.

Analyzers

Whenever we insert data into Elasticsearch, it analyzes the data so that an appropriate inverted index can be created.
An analyzer consists of a tokenizer and zero or more token filters, which transform the data so that the business needs are met.

For this post we use a custom analyzer built on the nGram token filter.

An n-gram is a contiguous sequence of n items from a given sequence of text. Here the items are characters, so each word of the stored text is broken into its contiguous substrings rather than into arbitrary permutations: ‘gone’ with n = 2 yields ‘go’, ‘on’ and ‘ne’.
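
To make the definition concrete, here is a minimal Python sketch that lists the contiguous character n-grams of a word for a fixed n. It is not Elasticsearch code, and the function name char_ngrams is only illustrative.

```python
def char_ngrams(text, n):
    """Return every contiguous run of n characters in text, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("gone", 1))  # ['g', 'o', 'n', 'e']
    print(char_ngrams("gone", 2))  # ['go', 'on', 'ne']
    print(char_ngrams("gone", 3))  # ['gon', 'one']
```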

 

[Figure: ngram-analyzer]


Mapping And Settings

{
  "settings": {
    "analysis": {
      "filter": {
        "gramFilter": {
          "type": "nGram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "analyzer": {
        "gramAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "gramFilter"
          ]
        },
        "whitespaceAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "movies": {
      "properties": {
        "Title": {
          "type": "string",
          "analyzer": "gramAnalyzer",
          "search_analyzer": "whitespaceAnalyzer"
        },
        .
        .
        .
      }
    }
  }
}

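Notice that we have defined a filter named gramFilter of type nGram. min_gram and max_gram are the minimum and maximum number of characters allowed in each generated token, and token_chars lists the character classes that may appear in a gram. Note that token_chars is documented as an option of the nGram tokenizer rather than of the nGram token filter; with the whitespace tokenizer used here, the grams are built from each whitespace-separated token.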
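
As a rough illustration of what min_gram and max_gram control, the Python sketch below mimics the filter on a single token. It is a simplification for intuition, not Elasticsearch's implementation: ngram_filter is a hypothetical helper, and the order of the emitted grams is not meant to match what Elasticsearch produces.

```python
def ngram_filter(token, min_gram=1, max_gram=20):
    """Emit every substring of token whose length is between min_gram and max_gram."""
    grams = []
    longest = min(max_gram, len(token))
    for size in range(min_gram, longest + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    # With min_gram=1 and max_gram=20, a four-letter token yields 4 + 3 + 2 + 1 = 10 grams.
    print(len(ngram_filter("gone")))  # 10
    # A narrower range produces fewer grams per token.
    print(ngram_filter("gone", min_gram=2, max_gram=3))  # ['go', 'on', 'ne', 'gon', 'one']
```

With the settings from the mapping (1 to 20), every substring of a typical title word is indexed, which is what lets a one-letter query such as ‘g’ find a match.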

The mapping also uses two analyzers:

  • gramAnalyzer
  • whitespaceAnalyzer

Why do we need two analyzers?

Because we want to analyze the stored data and the search query differently:

  • The search text is split on whitespace and lowercased.
  • The stored data is split on whitespace, lowercased, and then passed through gramFilter.
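
The effect of analyzing the two sides differently can be sketched in plain Python. The functions below only imitate gramAnalyzer and whitespaceAnalyzer for intuition; the names gram_analyzer, whitespace_analyzer and matches are assumptions made for this sketch and are not part of Elasticsearch. The stored title is expanded into grams, while the query is kept as whole lowercased words, so a short query can be looked up directly among the stored grams.

```python
def whitespace_analyzer(text):
    """Search side: lowercase the text and split it on whitespace."""
    return text.lower().split()


def gram_analyzer(text, min_gram=1, max_gram=20):
    """Index side: lowercase, split on whitespace, then expand each token into n-grams."""
    grams = set()
    for token in text.lower().split():
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for start in range(len(token) - size + 1):
                grams.add(token[start:start + size])
    return grams


def matches(title, query):
    """True if any analyzed query term appears among the title's indexed grams."""
    indexed = gram_analyzer(title)
    return any(term in indexed for term in whitespace_analyzer(query))


if __name__ == "__main__":
    print(matches("Gone Girl", "go"))   # True
    print(matches("Gone Girl", "GIR"))  # True
    print(matches("Gone Girl", "gos"))  # False
```

If the query were passed through the gram analyzer as well, ‘gos’ would be broken into ‘g’, ‘o’, ‘s’, ‘go’, ‘os’ and ‘gos’, and the single letters alone would match almost every title. Keeping the query terms whole avoids that.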

Once the analyzers are defined, we apply them to the field we want suggestions for (Title in our example): analyzer sets the analyzer used when indexing, and search_analyzer sets the one used on the query text.

Searching

We can run a match query on the Title field to use the autocomplete functionality.

The query looks like this:

{
  "query": {
    "match": {
      "Title": "go"
    }
  }
}

This query returns all the movies in the Elasticsearch index whose Title contains ‘go’, because ‘go’ is one of the grams stored for titles such as Gone Girl.
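
For completeness, here is one way the query could be sent from Python using only the standard library. This is a sketch under assumptions: the node address http://localhost:9200 and the index name movies are placeholders that do not come from this post, and error handling is omitted.

```python
import json
import urllib.request


def build_match_query(field, text):
    """Build the body of a match query on a single field."""
    return {"query": {"match": {field: text}}}


def search(base_url, index, body):
    """POST the query to the index's _search endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_match_query("Title", "go")
    print(json.dumps(body, indent=2))
    # The lines below need a running Elasticsearch node; the address and
    # index name are assumed, so adjust them before uncommenting.
    # result = search("http://localhost:9200", "movies", body)
    # for hit in result["hits"]["hits"]:
    #     print(hit["_source"]["Title"])
```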

An activator template implementing this feature can be found here.

References:
1. https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
2. https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html



Written by 

Rachel Jones is a Solutions Lead at Knoldus Inc. with more than 22 years of experience. Rachel likes to delve into AI (artificial intelligence) and deep learning. She loves challenges and motivating people, and enjoys reading novels by Dan Brown. Rachel has problem-solving, management and leadership skills, and is familiar with programming languages such as Java, Scala, C++ and HTML.
