In this blog, we will learn about the indexes used in Couchbase. And get to know about the good and bad side of the primary index.
Wait!! ✋ you must know some BASICS before reading this blog.
Just check it out here ¯\_(ツ)_/¯ Couchbase – Lets Get Started
So, an index is a data structure that provides quick and efficient means to access data. We have two types of Indexes – Primary and Secondary.
A Primary index of N1QL will allow you to query any field of your JSON. As such, it takes up more space and is less efficient. It is the basic requirement to be able to use the query language on your data.
A secondary index of N1QL is targeted at the specific field(s) and will be more performant for queries involving these fields.
So, it’s clear from the above definition that if you really want a good performance, then prefer a secondary index. Even it’s advisable not to use Primary Index in production.
Then don’t we really need a primary index? So, let’s look at some of the scenarios with examples that illustrate good and bad side of the primary index.
Please have a look at the samples that I will be using.
Sample JSON document
Sample document key
The query for creating an index:
CREATE PRIMARY INDEX `employeeIndex` ON `employee`
Do we really need Primary Index?
1) Couchbase does not scan all the documents from the start to end. But, this can be achieved with the Primary Index. Primary index provides us FULL TABLE SCAN. So, primary key is basically a sorted list of the document key considering different document types.
2) Consider the following query
select * from employee WHERE name= “Bhawna” and department = “IT”
Here, with only primary index , the query execution increases quite much. Because first of all the primary index will fetch all the documents and then perform filtering on the mentioned attribute. So, here secondary index will come as a rescue and improved the execution time.
3) Consider a scenario, where you have just started with Couchbase. Everything is new, so at that time will you really bother about performance? With one primary index, you can easily run most of the queries. As your primary concern is understanding data not execution time!
4) If you want to fetch all your employees having empid 1 to 10 (make sure empId should be the part of the document key) or in simple words when we know the range of document key that we want to scan. So, this can be achieved without much compromising on performance. Remember, here we are filtering on the basis on document Id, not on attributes.
select * from employee WHERE META().id between “emp::1” and “emp::10”
Another scenario when we know the pattern of our Meta().id.
select * from employee WHERE META().id like “emp::%”
So, in this blog, we discuss the indexes- Primary and Secondary. And some bad and good side of primary index.
Hope this is helpful. Please feel free to provide your suggestions. 🙂 Please comment if you have some queries and want to add some points.