MarkLogic Server Architecture

Introduction

Data is the new oil, so managing it well is of utmost importance for any enterprise. With the huge volumes of data generated today and the need for high performance when working with them, NoSQL databases have been widely adopted across the tech industry. Among the numerous NoSQL databases on the market, MarkLogic is one that is catching the attention of many engineers and businesses, and it looks to have a promising future.

What is MarkLogic Server?

In a single sentence, MarkLogic is an enterprise NoSQL multi-model database management system. Let’s break that sentence down to get a clearer picture.

  • Enterprise – MarkLogic provides enterprise features like security, ACID transactions, and real-time full-text search.
  • NoSQL – MarkLogic is a NoSQL database at its core, so we can expect the flexibility and scalability of a NoSQL database.
  • Multi-model – We can store data no matter what shape or form it is in.
  • Database – MarkLogic stores the data.
  • Management System – MarkLogic doesn’t just dump the data; it also helps to govern it.

MarkLogic Server Architecture

MarkLogic is a clustered database made up of multiple nodes. The following is the layered structure inside a single node.

Let’s look at the different layers in detail, working from the bottom up.

Data Layer

At the bottom is the data layer, and at the bottom of that is the storage system. Because MarkLogic is multi-model, there are different kinds of storage. It can store compressed text such as JSON and XML, and the structure of those documents is understood at this level. There is binary storage for images and videos, and semantic storage for triples that describe semantic relationships.

Next is an extensive set of indexes, the main one being the full-text index. There are also specialized indexes such as geospatial, scalar, semantic, and relational indexes. The security index also lives at this level. All data access in MarkLogic is mediated through the security index, so security is enforced at the most fundamental level of data access.
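To make the idea of a full-text index mediated by a security index more concrete, here is a minimal Python sketch. It is purely illustrative and is not how MarkLogic implements its indexes; the class `ToyIndex`, the document URIs, and the role names are all made up for this example.

```python
from collections import defaultdict


class ToyIndex:
    """Illustrative only: a term -> URIs index plus a per-document role check."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document URIs
        self.read_roles = {}              # URI -> roles allowed to read it

    def add(self, uri, text, roles):
        self.read_roles[uri] = set(roles)
        for term in text.lower().split():
            self.postings[term].add(uri)

    def search(self, term, user_roles):
        # Every lookup is filtered by the security information, so a user
        # never sees a document that none of their roles can read.
        hits = self.postings.get(term.lower(), set())
        allowed = set(user_roles)
        return sorted(uri for uri in hits if self.read_roles[uri] & allowed)


index = ToyIndex()
index.add("/docs/a.json", "quarterly sales report", roles=["analyst"])
index.add("/docs/b.json", "confidential sales forecast", roles=["executive"])

print(index.search("sales", ["analyst"]))  # prints ['/docs/a.json']
```

The point of the sketch is that the permission check sits inside the lookup itself rather than in the application code that calls it.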

Caches – Provide efficient access to the data held in storage on disk.
Journal – The data in MarkLogic, i.e. the compressed data and the indexes, is written to disk in batches. So a journal entry is made first, and the batch is committed to disk afterwards. If a disaster strikes before a batch has been written, the committed journal record is still there, so on startup the server can get back to a known good, consistent state.
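The journaling idea can be sketched in a few lines of Python. This is a generic illustration of "journal first, apply later, replay on recovery", not MarkLogic's actual on-disk format; the class `ToyJournaledStore` and the file name `journal.log` are assumptions made for the example.

```python
import json
import os
import tempfile


class ToyJournaledStore:
    """Illustrative only: every write is journaled before it is applied."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}  # stands in for the batched on-disk data and indexes

    def write(self, uri, content):
        # 1. Record the change in the journal and force it to disk.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"uri": uri, "content": content}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        # 2. Only then apply the change to the main store.
        self.data[uri] = content

    def recover(self):
        # After a crash, replay the journal to rebuild a consistent state.
        self.data = {}
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as journal:
                for line in journal:
                    entry = json.loads(line)
                    self.data[entry["uri"]] = entry["content"]
        return self.data


journal_file = os.path.join(tempfile.mkdtemp(), "journal.log")
store = ToyJournaledStore(journal_file)
store.write("/docs/a.json", {"title": "hello"})

# Simulate a crash: a fresh instance has lost everything held in memory,
# but it can rebuild its state from the journal.
restarted = ToyJournaledStore(journal_file)
print(restarted.recover())  # prints {'/docs/a.json': {'title': 'hello'}}
```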

Transaction Controller – It sits above all of these, mediating transactions across the cluster. It follows the ACID properties, so even a very complicated transaction is applied on all the nodes in the cluster together or not at all.
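The "all nodes or none" behaviour can be illustrated with a toy prepare-then-commit scheme. This is a generic sketch of the idea and should not be read as a description of MarkLogic's internal protocol; `ToyNode`, `run_transaction`, and the `healthy` flag are hypothetical names used only here.

```python
class ToyNode:
    """Illustrative only: a node that stages changes before committing them."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.committed = {}
        self.staged = {}

    def prepare(self, changes):
        # Phase 1: stage the changes and vote on whether they can be committed.
        if not self.healthy:
            return False
        self.staged = dict(changes)
        return True

    def commit(self):
        # Phase 2: make the staged changes visible.
        self.committed.update(self.staged)
        self.staged = {}

    def rollback(self):
        # Phase 2 (failure path): throw the staged changes away.
        self.staged = {}


def run_transaction(nodes, changes_by_node):
    """Apply each node's changes on every node, or on none of them."""
    votes = [node.prepare(changes_by_node.get(node.name, {})) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.rollback()
    return False


changes = {"node1": {"/docs/a.json": "v2"}, "node2": {"/docs/b.json": "v2"}}

healthy_cluster = [ToyNode("node1"), ToyNode("node2")]
print(run_transaction(healthy_cluster, changes))  # prints True

broken_cluster = [ToyNode("node1"), ToyNode("node2", healthy=False)]
print(run_transaction(broken_cluster, changes))   # prints False
print(broken_cluster[0].committed)                # prints {}
```

In the second run, node1 was able to stage its change, but because node2 could not, nothing was committed anywhere.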

Query Layer

Broadcaster – At the base of the query layer is the broadcaster, which distributes queries across the cluster and across multiple threads within this node.
Aggregator – Consolidates the partial results from those queries into a complete result set.
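The broadcaster and aggregator together form what is often called a scatter-gather pattern. Below is a small Python sketch of that pattern using threads; the partitions, the documents, and the function names are invented for illustration and say nothing about how MarkLogic actually distributes work.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative only: three partitions standing in for data spread over a cluster.
PARTITIONS = [
    ["sales report 2021", "marketing plan"],
    ["sales forecast", "hiring plan"],
    ["travel policy"],
]


def search_partition(docs, term):
    # Each partition is searched independently and returns a partial result.
    return [doc for doc in docs if term in doc]


def broadcast(term):
    # Scatter: run the same query against every partition in parallel.
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        return list(pool.map(lambda docs: search_partition(docs, term), PARTITIONS))


def aggregate(partials):
    # Gather: merge the partial results into one ordered result set.
    return sorted(doc for partial in partials for doc in partial)


print(aggregate(broadcast("sales")))  # prints ['sales forecast', 'sales report 2021']
```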

Caches – Used to cache the queries that are executed frequently.

Evaluator – There are multiple evaluators in MarkLogic, the two main ones being JavaScript (for JSON) and XQuery (for XML), along with other specialized evaluators for more specific data formats, such as SQL for relational data. Supporting all these evaluators is an extensive library of functions that makes them even more capable.

Interface

The interface to these layers is a set of HTTP REST endpoints. There is an extensive collection of endpoints to facilitate document search, CRUD operations, administration, and so on. New endpoints can also be defined as per business requirements for the required data services.

Client

If we are working with Java or Node.js, there are client APIs that provide access to the same set of services. We can also take our own endpoint specifications and compile them so that developers can call them in an idiomatic way. For other languages, such as Python or shell scripts, we can simply call the REST endpoints over HTTP in the usual way.
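As a rough idea of what calling REST from Python might look like, here is a sketch that uses only the standard library against the `/v1/search` endpoint of the MarkLogic REST API. The host, port (`localhost:8000`), the use of digest authentication, and the credentials passed to `run_search` are assumptions about a local setup, so adjust them to your own server configuration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a REST app server reachable on localhost:8000.
BASE_URL = "http://localhost:8000"


def search_url(query, page_length=10):
    """Build the URL for a string search that returns JSON."""
    params = urllib.parse.urlencode(
        {"q": query, "format": "json", "pageLength": page_length}
    )
    return f"{BASE_URL}/v1/search?{params}"


def run_search(query, user, password):
    """Send the search request; assumes the server uses digest authentication."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, BASE_URL, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    with opener.open(search_url(query)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no server, so it is safe to run anywhere.
print(search_url("sales report"))
# prints http://localhost:8000/v1/search?q=sales+report&format=json&pageLength=10
```

With a server running, something like `run_search("sales report", "my-user", "my-password")` would return the parsed JSON response. The user name and password here are placeholders.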

Conclusion

How does it fit into the bigger picture? MarkLogic is a distributed database with multiple nodes in a cluster, and the layered structure described above is just one node. Each node can act as just a data layer, as a query layer and interface, or as a combination. The cluster can be deployed on-premises or in the cloud, and we can also use one of the cloud services. For example, if we use the Query Service, we get an elastic pool of nodes focused just on the query layer that scales to our workload. If we use the Data Hub Service, we get a full-stack application dedicated to helping us integrate data. That’s MarkLogic Server, and this is how it fits into our world.