Playing with Semantic Data and MarkLogic

Reading Time: 4 minutes

In this blog, I will discuss the Semantic Data Model and its support in MarkLogic, an enterprise NoSQL database, with an example.

What is a Data Model?

Data models always come into the picture when we talk about any database and the kind of storage and query utility it provides. A data model defines the structure of data elements as well as the relationships between them. The most popular data model, which everyone must have heard of, is the relational data model, where data is kept in the form of tables, or relations.

What is a Semantic Data Model?

A semantic data model is about storing semantics (the real-world meaning of the relationships between entities) in a specified format that expresses the relation between two entities. Consider the statements below as the semantics of the entities Author, Employee, Book, etc.

  • Clean Code is a Book.
  • Uncle Bob is the Author of “Clean Code”.
  • Uncle Bob is an Employee of XCompany.

If we represent these statements diagrammatically, it will look something like the below:

The magic of the semantic model is that when we capture the semantics, or relations, among the data, we are able to infer facts which are not stored directly as information but are nonetheless true. For example: an Employee can write a Book, or an Employee has written a Book.
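As a rough illustration in plain Python (not MarkLogic), the statements above can be stored as (subject, predicate, object) triples, and a new, unstored fact can be inferred by chaining the stored ones:

```python
# Facts stored as (subject, predicate, object) triples.
facts = [
    ("Clean Code", "is_a", "Book"),
    ("Uncle Bob", "author_of", "Clean Code"),
    ("Uncle Bob", "employee_of", "XCompany"),
]

# Inference: if X is the author of Y, Y is a Book, and X is an
# employee of some company, then an Employee has written a Book.
inferred = [
    (x, "employee_who_wrote", y)
    for (x, p1, y) in facts if p1 == "author_of"
    for (s, p2, o) in facts if p2 == "is_a" and s == y and o == "Book"
    for (e, p3, c) in facts if p3 == "employee_of" and e == x
]
print(inferred)  # [('Uncle Bob', 'employee_who_wrote', 'Clean Code')]
```

The triple ("Uncle Bob", "employee_who_wrote", "Clean Code") was never stored, yet it falls out of the relations that were.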

What has MarkLogic to do with the Semantic Model?

MarkLogic is one of the database offerings that provides support for storing and querying semantic data. If you have not heard of or installed this database before, I found the MarkLogic documentation very detailed; it will give you a good introductory glimpse of the database. Likewise, if you want to install MarkLogic on Linux or Ubuntu, you will find this blog very helpful.

Starting with the Experiment

I assume you have installed MarkLogic and the server is running on your system. I am running a local cluster, which you can reach at localhost:8001/, and you should see an interface like the one below:

MarkLogic Server Running on Local Cluster

Importing Semantic Data into MarkLogic

The data format I will be using for explanation is RDF triples, a standard format for the Semantic Data Model. Each triple takes the form:

RDF Data Format: subject, predicate, object.

For example: “Uncle Bob” (subject) has authored (predicate) “Clean Code” (object).

For this blog, I will be using the same data set that is provided as part of MarkLogic's very detailed documentation on getting started with semantic data. To check what is required before you can start hands-on, jump to its Pre-Requisites section.

To insert the triples into the database, follow the simple steps below.

  • Open localhost:8000 (the Query Console) in your browser.
  • Copy the text below into the Query Console and change the query type to XQuery.
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";
(: the IRIs below are illustrative; the predicate IRI just has to match the one used in the SPARQL query later :)
sem:rdf-insert((
  sem:triple(sem:iri("http://example.org/Uncle_Bob"), sem:iri("http://example.org/livesOn"), "Earth"),
  sem:triple(sem:iri("http://example.org/Uncle_Bob"), sem:iri("http://example.org/authored"), "Clean Code")))

I have borrowed the structure of the above snippet from the documentation I mentioned before. Once it is in place, hit the “Run” button and you will see a response there.

Inserting RDF Triples

Once the data is inserted, we can verify it using the triple count, by running a count query in another tab.
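One way to count the triples is a sketch like the following, using MarkLogic's built-in cts:triples function (the original screenshot may have used a different query); run it in another Query Console tab:

```xquery
(: passing empty sequences for subject, predicate, and object
   matches every triple in the database :)
fn:count(cts:triples((), (), ()))
```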

We have inserted 2 triples, so you should see a count of 2. (In my case it was three, as my database already contained another triple.)

Querying RDF Data

As part of the above RDF data, we have already established the facts below.

  • Uncle_Bob lives on Earth
  • Uncle_Bob authored Clean Code

Let us use the SPARQL query language to ask the same question: “Who lives on Earth?” SPARQL is the query language used to query semantic data. You can find more about the SPARQL query structure in the W3C documentation.

import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";
(: the predicate IRI must match the one used when inserting the triples :)
sem:sparql('
  SELECT ?person
  WHERE { ?person <http://example.org/livesOn> "Earth" }
')

You can see the result in the query console in the third tab. It returns “person” as “Uncle_Bob” at the bottom.
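Conceptually, what the SPARQL pattern does can be sketched in plain Python (an illustration of pattern matching over triples, not MarkLogic's actual engine):

```python
# The two triples we inserted, as plain tuples.
triples = [
    ("Uncle_Bob", "livesOn", "Earth"),
    ("Uncle_Bob", "authored", "Clean Code"),
]

# WHERE { ?person livesOn "Earth" }: bind ?person for every matching triple.
people = [s for (s, p, o) in triples if p == "livesOn" and o == "Earth"]
print(people)  # ['Uncle_Bob']
```

The variable ?person plays the role of the subject slot left open in the pattern; every triple whose predicate and object match contributes one binding.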

Query Result: Uncle_Bob

Where to go from here?

MarkLogic gives you tools to import or insert RDF into the database not only through the Query Console but also through the REST API and a bulk-load tool (mlcp, the MarkLogic Content Pump). Please look at the detailed documentation for your use case. I hope you liked this summary.


Written by 

Manish Mishra is a Lead Software Consultant with more than 7 years of experience. His primary development technology was Java, but he fell for the Scala language, finding it innovative, interesting, and fun to code with. He has also co-authored a journal paper titled “Economy Driven Real Time Deadline Based Scheduling”. His interests include learning cloud computing products and technologies, and algorithm design. He finds books and literature to be his favorite companions in solitude, with stories, spiritual fiction, and time-travel fiction among his favorites.