Indexing in MarkLogic: a quick data finder.



MarkLogic is a document-oriented, multi-model database for storing and managing documents such as JSON and XML. It has been used in applications ranging from enterprise search to e-commerce and financial services. This article explains how indexing works in MarkLogic Server, covering the Universal Index as well as the other types of indexes available within the product. We’ll also cover reindexing, which is needed when your index settings change or when existing content no longer matches the configured indexes.

Overview of MarkLogic Server

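MarkLogic Server is a multi-model database that stores JSON, XML, text, and binary documents, as well as RDF triples. It is a commercial product rather than open source. You access the stored data and run queries against it through its APIs, which include XQuery, server-side JavaScript, a REST API, and client APIs for Java and Node.js.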

The Universal Index is built automatically as documents are loaded. It indexes the words in each document together with its structure, such as element and JSON property names and the values they contain, regardless of data type or schema. This allows users to search across text and structure without declaring the fields up front. Wildcard searches such as “Jo*” are also supported, and they resolve faster when the optional wildcard indexes are enabled on the database.

The Universal Index

The universal index is a single, database-wide index that covers every document in the database. It can be thought of as an inverted index, where each entry is a term (a word, an element or property name, or a value) that points to the list of documents containing it, with no duplicates within a list. For example, if you have two customer documents with the first names “John” and “Fred”, each first name is stored as a value of the “FirstName” property, so the term for “John” points only to the first document and the term for “Fred” points only to the second. However, since both documents have a property named “LastName”, both appear in the list for that structural term. And if they share a value (for example, the same last name), they also share the term list associated with that value.
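To make the inverted-index idea concrete, here is a minimal Python sketch. It is not how MarkLogic implements the Universal Index internally; the document URIs, the shared last name “Smith”, and the term shapes are all assumptions made for illustration.

```python
from collections import defaultdict

# Two hypothetical customer documents, keyed by document URI.
documents = {
    "/customers/1.json": {"FirstName": "John", "LastName": "Smith"},
    "/customers/2.json": {"FirstName": "Fred", "LastName": "Smith"},
}


def build_inverted_index(docs):
    """Map each term to the set of document URIs that contain it."""
    index = defaultdict(set)
    for uri, doc in docs.items():
        for prop, value in doc.items():
            # Structural term: the property exists in this document.
            index[("property", prop)].add(uri)
            for word in str(value).lower().split():
                # Word term: the word appears anywhere in the document.
                index[("word", word)].add(uri)
                # Combined term: the word appears inside this property.
                index[("property-word", prop, word)].add(uri)
    return index


index = build_inverted_index(documents)


def lookup(term):
    """Return the sorted list of URIs for a term (empty if unknown)."""
    return sorted(index.get(term, set()))


print(lookup(("word", "john")))
print(lookup(("property", "LastName")))
```

A query is answered by looking up its terms and intersecting the resulting lists, which is why lookups stay fast even as the number of documents grows.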

Other Types of Indexes

The Universal Index is the main index in MarkLogic, and it is maintained automatically for every document. For queries it cannot answer efficiently on its own, such as comparisons and sorting on typed values like dates or numbers, you can add range indexes on specific elements, JSON properties, paths, or fields.
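As a rough sketch, an index on a simple property could be described with a JSON body like the one built here. The property names follow what I understand the REST Management API to accept for database properties, and the helper name `range_index_payload` and the `birth_date` property are hypothetical, so check the official documentation before relying on any of it. The code only builds and prints the payload; it does not contact a server.

```python
import json


def range_index_payload(localname, scalar_type="date"):
    """Build a JSON body describing one element range index.

    The key names are assumptions based on the Management API's
    database properties; verify them against the official docs.
    """
    index = {
        "scalar-type": scalar_type,
        "namespace-uri": "",            # JSON properties have no namespace
        "localname": localname,
        "collation": "",                # only meaningful for string indexes
        "range-value-positions": False,
        "invalid-values": "reject",
    }
    return {"range-element-index": [index]}


payload = range_index_payload("birth_date")
body = json.dumps(payload, indent=2)
print(body)
```

In a real setup, a body like this would be sent to the database properties endpoint of the Management API with an authenticated HTTP client.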

Index Size

For a document-based index, the size of your index grows roughly in proportion to how many documents are in your database and how many fields each one contains.

For example, if you have 100 documents and each one has 10 fields, the index has to cover 1,000 field values (100 * 10). If you have 3 million documents and each one has 10 fields as well, it has to cover 30 million field values (3 million * 10). The actual storage depends on the size and variety of the values, but it can get expensive very quickly if there are a lot of documents in your database!
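The arithmetic can be sketched in a few lines of Python. The figure of 16 bytes per index entry is purely hypothetical and is there only to show how the total scales; real index sizes depend on the data and the enabled index options.

```python
def indexed_values(num_documents, fields_per_document):
    """Number of field values the index has to cover."""
    return num_documents * fields_per_document


def estimated_index_bytes(num_documents, fields_per_document, bytes_per_entry):
    """Very rough size estimate under a fixed per-entry cost."""
    return indexed_values(num_documents, fields_per_document) * bytes_per_entry


BYTES_PER_ENTRY = 16  # hypothetical cost per entry, not a MarkLogic figure

small = indexed_values(100, 10)          # 1,000 values
large = indexed_values(3_000_000, 10)    # 30,000,000 values

print(small, estimated_index_bytes(100, 10, BYTES_PER_ENTRY))
print(large, estimated_index_bytes(3_000_000, 10, BYTES_PER_ENTRY))
```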


Fields

In MarkLogic, a field is a named way of grouping one or more elements or JSON properties so that they can be indexed and queried together. You can use fields to build a simple index or a more advanced one, and they are also used in queries and searches. For example, if you want to find users who were born within a certain range of dates, you would create a range index on the birth date property of documents like this one:

{ "id": 1, "name": "Jane Doe", "birth_date": "2008-04-15" }

In this case, the document has three properties (id, name, and birth_date), but real documents can be much more complex, with nested objects and arrays.
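Conceptually, an index on a date property keeps the values in sorted order so that a range of dates can be found without scanning every document. The sketch below imitates that idea with Python's `bisect` module. Only the Jane Doe record comes from the example above; the other users and the function name `ids_born_between` are made up for illustration, and this is not MarkLogic's actual storage format.

```python
from bisect import bisect_left, bisect_right
from datetime import date

users = [
    {"id": 1, "name": "Jane Doe", "birth_date": "2008-04-15"},
    {"id": 2, "name": "John Roe", "birth_date": "1990-11-02"},  # hypothetical
    {"id": 3, "name": "Ann Poe", "birth_date": "2015-01-30"},   # hypothetical
]

# The "index": (value, document id) pairs kept in sorted order.
range_index = sorted(
    (date.fromisoformat(u["birth_date"]), u["id"]) for u in users
)
keys = [value for value, _ in range_index]


def ids_born_between(start, end):
    """Return ids of users born between start and end, inclusive."""
    lo = bisect_left(keys, start)
    hi = bisect_right(keys, end)
    return [doc_id for _, doc_id in range_index[lo:hi]]


print(ids_born_between(date(2000, 1, 1), date(2010, 12, 31)))
```

Because the values are sorted, each lookup is a binary search rather than a pass over all documents.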


Reindexing

Reindexing is the process of rebuilding indexes. In general, reindexing can be done automatically or manually. When the reindexer is enabled on a database, MarkLogic detects changes to the index settings and reindexes the existing documents in the background so that the indexes match the new configuration. You do not need to do anything; MarkLogic will take care of this task for you.

If you wish to trigger reindexing yourself, there are two ways of doing so. Firstly, from the Admin Interface, where the database configuration page lets you enable the reindexer and start a reindex. Secondly, programmatically, by changing the database’s reindexer settings through the Admin API or the REST Management API.
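Why reindexing is needed at all can be shown with a toy model. In this sketch, which is a simplification and not MarkLogic code, the index only covers the properties listed in a configuration. After the configuration changes, queries on the new property find nothing until the index is rebuilt. All names and sample documents are hypothetical.

```python
def build_index(docs, indexed_properties):
    """Index only the configured properties: (property, value) -> URIs."""
    index = {}
    for uri, doc in docs.items():
        for prop in indexed_properties:
            if prop in doc:
                index.setdefault((prop, doc[prop]), set()).add(uri)
    return index


docs = {
    "/users/1.json": {"name": "Jane Doe", "city": "London"},
    "/users/2.json": {"name": "John Roe", "city": "Paris"},
}

# Initial configuration: only "name" is indexed.
config = ["name"]
index = build_index(docs, config)
stale = index.get(("city", "London"), set())   # nothing indexed for "city" yet

# The configuration changes, so the existing documents must be reindexed.
config = ["name", "city"]
index = build_index(docs, config)
fresh = index.get(("city", "London"), set())

print(stale, fresh)
```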


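Relevance

Relevance is a measure of how well a document matches a query. By default, MarkLogic scores results with a method based on term frequency (how often a query term appears in a document) and inverse document frequency (how rare that term is across the database), rather than by simply counting the documents that contain exact matches.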

Relevance is different from precision, which measures what fraction of the returned results are actually relevant. For example, if you search for “dog”, a document that mentions dogs many times gets a high relevance score, especially if the word is rare in the rest of the database. A document that is mostly about cats and mentions dogs only once scores lower for “dog”, and a document that never contains the word does not appear in the results at all.

Relevance is not determined only by the words and phrases found within each document individually. It also depends on context across the whole database: a term that appears in almost every document contributes little to the score, while a rare term contributes much more.
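The interplay between term frequency and rarity can be sketched with a small log-based TF-IDF scorer. This is a textbook-style approximation chosen for illustration, not MarkLogic's exact formula, and the three sample documents are invented.

```python
import math

corpus = {
    "doc-dogs": "dog dog dog walks in the park",
    "doc-mixed": "a cat and a dog share the park",
    "doc-cats": "cat naps and cat toys",
}


def score(term, text, all_texts):
    """Log-scaled term frequency times inverse document frequency."""
    tf = text.lower().split().count(term)
    if tf == 0:
        return 0.0
    containing = sum(1 for t in all_texts if term in t.lower().split())
    idf = math.log(len(all_texts) / containing)
    return math.log(1 + tf) * idf


def rank(term):
    """Return (uri, score) pairs for matching documents, best first."""
    texts = list(corpus.values())
    scored = [(uri, score(term, text, texts)) for uri, text in corpus.items()]
    matches = [(uri, s) for uri, s in scored if s > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


for uri, s in rank("dog"):
    print(uri, round(s, 3))
```

One consequence of this particular formula is that a term present in every document gets an inverse document frequency of zero, so it adds nothing to the ranking.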

Indexing Document Metadata

Document metadata is information about a document, such as its properties and collections, and MarkLogic can index it alongside the document content. For example, if you have a property called “Title” that contains text from your document, you could index that property so that the document appears in your search results when someone searches for the text “The Hobbit: An Unexpected Journey”.

You can also combine this kind of metadata with the content of the documents themselves in a query. If a query says something like “Title equals ‘The Hobbit: An Unexpected Journey’”, then only documents whose “Title” property has exactly that value will match. A word or phrase query for “The Hobbit”, on the other hand, matches every document whose “Title” contains that phrase.
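The difference between an exact value match and a phrase match on a “Title” property can be illustrated as follows. This sketch scans the documents directly and uses a plain substring check, which is a simplification of real tokenized word queries. The second and third titles and both function names are assumptions added for the example.

```python
documents = {
    "/films/1.json": {"Title": "The Hobbit: An Unexpected Journey"},
    "/films/2.json": {"Title": "The Hobbit: The Desolation of Smaug"},  # hypothetical
    "/films/3.json": {"Title": "Unexpected Guests"},                    # hypothetical
}


def value_match(docs, prop, value):
    """Documents whose property equals the value exactly."""
    return sorted(uri for uri, doc in docs.items() if doc.get(prop) == value)


def phrase_match(docs, prop, phrase):
    """Documents whose property contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return sorted(
        uri for uri, doc in docs.items()
        if needle in str(doc.get(prop, "")).lower()
    )


print(value_match(documents, "Title", "The Hobbit: An Unexpected Journey"))
print(phrase_match(documents, "Title", "The Hobbit"))
```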

Fragmentation of XML Documents

In MarkLogic, a fragment is the unit of storage, indexing, and retrieval. By default, each document is a single fragment, but large XML documents can be split into smaller fragments by configuring fragment roots or fragment parents on the database. Fragmentation can cause performance issues when it is chosen badly. The indexes resolve a query to candidate fragments rather than to individual elements or attributes, so if fragments are very large, MarkLogic has to load and filter a lot of content for each match.

The solution? Fragment only when necessary! For most applications, keeping documents at a reasonable size and leaving each document as one fragment works well, and searches over large amounts of data stay fast because the indexes narrow the work down to a few fragments. If you do split documents, bear in mind that very small fragments with little content of their own increase the number of fragments the server has to track, and a query whose terms are spread across separate fragments may no longer match a single fragment.
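As a loose illustration of what a fragment root means, the sketch below splits one XML document into separate units, one per `book` element, using Python's standard library. The sample catalog and the function name are hypothetical, and MarkLogic performs fragmentation internally according to the database configuration, not in client code like this.

```python
import xml.etree.ElementTree as ET

xml_text = """
<catalog>
  <book><title>First</title></book>
  <book><title>Second</title></book>
  <book><title>Third</title></book>
</catalog>
"""


def split_into_fragments(xml_string, fragment_root):
    """Return each element named fragment_root as its own XML string."""
    root = ET.fromstring(xml_string)
    return [
        # strip() drops the whitespace that followed the element in the source
        ET.tostring(element, encoding="unicode").strip()
        for element in root.iter(fragment_root)
    ]


fragments = split_into_fragments(xml_text, "book")
for fragment in fragments:
    print(fragment)
```

Each resulting unit can then be indexed and retrieved on its own, which is the trade-off described above: smaller units per match, but more units overall.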


Conclusion

We’ve covered a lot of ground here, but hopefully, you feel more confident about the basics of indexing and how it can help your site. If you have any questions about this topic, please let us know in the comments below!

For more information, you can refer to our blogs and the official site:
1) Basic Concepts of MarkLogic and CRUD operation.
2) MarkLogic Official Guide.