RDF – Basic Building Blocks of Semantic Web


In the first post, we talked about the general description of the Semantic Web and how it can be useful. In this post, we look at RDF, its basic building block. RDF stands for Resource Description Framework, which the W3C defined in 1999 as a standard for encoding metadata. The idea behind the standard is to make metadata readable by machines.

The standard is domain agnostic. It is fair to think of RDF as playing the same role for the Semantic Web that HTML plays for the Web. An RDF statement has three parts: a subject, a predicate and an object.

[Figure: the three parts of an RDF statement: subject, predicate, object]

A combination of these three parts is called a statement. The subject and the object are two things in the world, and the predicate connects them. Each statement represents a fact, and a collection of facts forms an RDF graph. If you recall from the earlier blog post, these statements combine to form a graph like the one below.

[Figure: an RDF graph formed by combining statements]
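To make the idea concrete, here is a minimal Python sketch (not taken from any RDF library) that models a statement as a (subject, predicate, object) tuple and a graph as a set of such tuples. The identifiers MovieID:Gladiator, directedBy, starring and the Person: names are illustrative assumptions in the post's own style, not real URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: the predicate links the subject to the object."""
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers in the style of this post (not real URIs).
graph = {
    Triple("MovieID:Gladiator", "directedBy", "Person:RidleyScott"),
    Triple("MovieID:Gladiator", "starring", "Person:RussellCrowe"),
    Triple("Person:RidleyScott", "directed", "MovieID:BladeRunner"),
}


def outgoing(graph, node):
    """Return the (predicate, object) edges that leave `node`, sorted."""
    return sorted((t.predicate, t.object) for t in graph if t.subject == node)


print(outgoing(graph, "MovieID:Gladiator"))
# [('directedBy', 'Person:RidleyScott'), ('starring', 'Person:RussellCrowe')]
```

Because the graph is a set, adding the same fact twice leaves it unchanged, and a node such as Person:RidleyScott that appears in several statements is what links those statements into one graph.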

The subject and the object can be concrete things such as people or cities, or abstract things like “resourcefulness”. A subject or object is called a resource, and each statement describes a resource. Hence the name Resource Description Framework (RDF).

Having a unique global name is important

If you notice, the subject and object are names of resources. These names can cause problems if they are not universal. Assume that we denote the movie Gladiator with “MovieID:Gladiator” while someone else calls it “mid:GladiatorTheMovie”. In Semantic Web terms, these are two different subjects. Another problem arises if someone uses “MovieID:Gladiator” for something totally unrelated to the movie Gladiator; we might then end up merging graphs that are unrelated. Hence, to remove this ambiguity, the name of a resource should be global and identified by a Uniform Resource Identifier (URI).
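The merging problem can be sketched the same way. Assuming each graph is a set of (subject, predicate, object) tuples and ignoring blank nodes, merging is a plain set union, so facts attach to whatever node has an identical name. The publishers, the predicates and the unrelated BoardGame resource below are hypothetical.

```python
# Each graph is a set of (subject, predicate, object) tuples (no blank nodes).
# Publisher A and publisher B describe the same film under different names.
graph_a = {("MovieID:Gladiator", "releasedIn", "2000")}
graph_b = {("mid:GladiatorTheMovie", "directedBy", "Person:RidleyScott")}
# Publisher C (hypothetical) reuses "MovieID:Gladiator" for a board game.
graph_c = {("MovieID:Gladiator", "type", "BoardGame")}


def merge(*graphs):
    """Merge graphs by set union: nodes with identical names fuse together."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def facts_about(graph, name):
    """Return the sorted (predicate, object) pairs recorded for `name`."""
    return sorted((p, o) for s, p, o in graph if s == name)


merged = merge(graph_a, graph_b, graph_c)
print(facts_about(merged, "MovieID:Gladiator"))
# [('releasedIn', '2000'), ('type', 'BoardGame')]
```

The director fact stays stranded under mid:GladiatorTheMovie, while the unrelated BoardGame fact is silently attached to the movie. Agreeing on one global URI per resource addresses both failures.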

Usually, these URIs are either hash URIs or slash URIs. For example,
http://www.knoldus.com/about/team/erik is a slash URI and
http://www.knoldus.com/about/team#erik is a hash URI. Originally, slash URIs were expected to return a resource from the web while hash URIs were not, but this distinction is blurring now.
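One concrete difference remains: an HTTP client never sends the fragment (the part after '#') to the server. The Python standard library's urllib.parse.urldefrag shows what would actually be requested for each form; the sketch only parses the two example URIs and does not fetch anything.

```python
from urllib.parse import urldefrag

slash_uri = "http://www.knoldus.com/about/team/erik"
hash_uri = "http://www.knoldus.com/about/team#erik"

for uri in (slash_uri, hash_uri):
    # urldefrag splits off the fragment; only the document part goes to the server.
    document, fragment = urldefrag(uri)
    print(uri)
    print("  requested from server:", document)
    print("  fragment kept by client:", repr(fragment))
```

So every hash URI under http://www.knoldus.com/about/team resolves to the same document, while each slash URI is a separate request.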

The idea is to reuse URIs that already exist and create new ones only when we have to. URIs can be long, so it is usually convenient to write a URI as an XML Qualified Name (QName). For example, we can define the mapping as

prefix        namespace
------        ---------
knoldus       http://www.knoldus.com/about/team#

and hence http://www.knoldus.com/about/team#erik can be written as knoldus:erik
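A QName is just shorthand, so expanding and compacting it is string manipulation against a prefix map. Here is a minimal sketch assuming a single prefix. Note that for knoldus:erik to expand to the hash URI by plain concatenation, the namespace has to end in '#'. The expand and compact functions are hypothetical helpers, not part of any RDF library.

```python
# Prefix map; the namespace ends with '#' so concatenation yields the hash URI.
PREFIXES = {"knoldus": "http://www.knoldus.com/about/team#"}


def expand(qname, prefixes=PREFIXES):
    """Turn 'prefix:local' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {qname!r}")
    return prefixes[prefix] + local


def compact(uri, prefixes=PREFIXES):
    """Turn a full URI back into 'prefix:local' when a namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no matching namespace: keep the full URI


print(expand("knoldus:erik"))  # http://www.knoldus.com/about/team#erik
print(compact("http://www.knoldus.com/about/team#erik"))  # knoldus:erik
```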

Apart from the subject and the object, the predicate must also be a URI, and again it should only be created if one does not already exist. This allows shared vocabularies to emerge on the web and lets us use predicates as subjects or objects when the situation demands.

Thus, in the above example, if the predicate were also represented as a URI, the statement (a.k.a. fact, a.k.a. triple) would look like this

Literals and Blank Nodes

The object can be a URI or a literal. A literal is a string, optionally with a language tag or a datatype, so that the machine reading it knows how to interpret it. Examples would be

"knoldus"
"knoldus"@en
"knoldus"^^

In the above examples, the first two literals are untyped, i.e. they do not have a specific type assigned, whereas the last one has an associated datatype and is therefore a typed literal.
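The three kinds of literal can be modelled as below. This is a hypothetical sketch that renders literals in N-Triples syntax, where a datatype URI is written in angle brackets after ^^; the XML Schema string datatype is used purely as an example. It follows this post's terminology; in RDF 1.1 the first two literals are treated as implicitly typed (xsd:string and rdf:langString respectively).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """An RDF literal: a string plus an optional language tag or datatype."""
    lexical: str
    lang: Optional[str] = None      # e.g. "en"
    datatype: Optional[str] = None  # a full datatype URI

    def __post_init__(self):
        if self.lang is not None and self.datatype is not None:
            raise ValueError("use a language tag or a datatype, not both")

    @property
    def is_typed(self) -> bool:
        return self.datatype is not None

    def to_ntriples(self) -> str:
        """Render in N-Triples style, escaping backslashes, quotes and line breaks."""
        escaped = (self.lexical.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        text = f'"{escaped}"'
        if self.lang is not None:
            return f"{text}@{self.lang}"
        if self.datatype is not None:
            return f"{text}^^<{self.datatype}>"
        return text


# Example datatype only: the XML Schema string type.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

for lit in (Literal("knoldus"),
            Literal("knoldus", lang="en"),
            Literal("knoldus", datatype=XSD_STRING)):
    print(lit.to_ntriples(), "typed" if lit.is_typed else "untyped")
```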

Sometimes the subject or the object does not have a unique URI. Such a node is called a blank node. A blank node might have further predicates and objects associated with it, but by itself it cannot be identified. For example, let's add a reviewedBy statement to our movie RDF graph and represent it like this

[Figure: MovieID:Gladiator reviewedBy a blank node whose name is “Vikas Hazrati”]

In this scenario, the movie represented by “MovieID:Gladiator” is reviewedBy someone whose URI is unknown or does not exist. However, we do know that this blank node has a predicate called name with the literal “Vikas Hazrati” associated with it. In practice, it is common to give such a blank node a local identifier and work with that. Hence, in our case, the statements with the local identifier could be


MovieID:Gladiator      reviewedBy      local:_980_

local:_980_      name      "Vikas Hazrati"
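The local-identifier approach can be sketched as follows. The counter-based new_blank_node helper and the local:_N_ format are assumptions that mirror the example above; in N-Triples or Turtle a blank node label would be written like _:b980 instead. Such labels only have meaning inside one graph, which is why they cannot serve as global names.

```python
from itertools import count

# Hypothetical ID source; starting at 980 only mirrors the example above.
_next_id = count(980)


def new_blank_node() -> str:
    """Mint a local identifier for a node that has no global URI."""
    return f"local:_{next(_next_id)}_"


graph = set()
reviewer = new_blank_node()  # the reviewer has no known URI
graph.add(("MovieID:Gladiator", "reviewedBy", reviewer))
graph.add((reviewer, "name", "Vikas Hazrati"))

# Follow reviewedBy from the movie, then name from the blank node.
names = [
    name
    for subj, pred, node in graph
    if subj == "MovieID:Gladiator" and pred == "reviewedBy"
    for subj2, pred2, name in graph
    if subj2 == node and pred2 == "name"
]
print(reviewer, names)  # local:_980_ ['Vikas Hazrati']
```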

Stay tuned.



About Vikas Hazrati

Vikas is the Founding Partner @ Knoldus which is a group of software industry veterans who have joined hands to add value to the art of software development. Knoldus does niche Reactive and Big Data product development on Scala, Spark and Functional Java. Knoldus has a strong focus on software craftsmanship which ensures high-quality software development. It partners with the best in the industry like Lightbend (Scala Ecosystem), Databricks (Spark Ecosystem), Confluent (Kafka) and Datastax (Cassandra). To know more, send a mail to hello@knoldus.com or visit www.knoldus.com
This entry was posted in Best Practices, big data, Scala.
