In the first post, we talked about the general description of Semantic Web and how it can be useful. In this post, we would try to look at RDF which is the basic building block. RDF is Resource Description Framework which was defined as standard for encoding metadata by W3C in 1999. The idea for this standard is to make metadata readable by machines.
The standard is domain agnostic. It is fair to consider RDF for Semantic Web in the same way as we have HTML for the Web. The format of the RDF is that it has 3 parts.
A combination of this triplet is called a statement. The subject and object are two things in the world and the predicate connects them. Each statement represents a fact and a collection of facts forms a RDF graph. The graph is a If you recall from the earlier blog post, each of these statements combine together to form a graph like the one below
The Subject and the Object can be proper nouns like things, cities etc or abstract things like “resourcefulness”. The subject or object is called a Resource and we are defining the resource. Hence the Resource Definition Framework (RDF).
Having a unique global name is important
If you notice, the subject and object are names of resources. This name can create issues if it is not universal. Assume that we denote the Movie Gladiator with “MovieID:Gladiator” however someone else could have called it “mid:GladiatorTheMovie”. In this case in the sematic web terminology, the 2 subjects are quite different. Another problem is if someone used “MovieID:Gladiator” to represent something totally unrelated to the movie Gladiator. If this is the case then we might end up merging graphs which are unrelated. Hence, to remove this ambiguity, the name of the resource should be global and should be identified by Uniform Resource Identifier (URI)
Usually, these URIs are either a hash URI or a slash URI. For example,
http://www.knoldus.com/about/team/erik is a slash URI and
http://www.knoldus.com/about/team#erik is a hash URI. Earlier the slash URIs were expected to return a resource from the web and the hash URIs were not but this difference is blurring now.
The idea is to re-use the URIs that already exist and create new ones only if we have to. The URIs can be long names so it is usually best to represent a URI with its XML Qualified Name (QName). For example we can define the mapping as
and hence http://www.knoldus.com/about/team#erik can be written as knoldus:erik
Apart from the subject and the object, the predicate name must also be a URI and should only be created if one does not exist already. This allows in creating shared vocabularies on the web and allows us to use predicates as subject or object when the situation thus demands.
Thus, in the above example, if the predicate was represented as a URI as well then the statement a.k.a fact a.k.a triple would look like this
Literals and Blank Nodes
The object can be a URI or it can be a literal. Literals can be represented as String optionally with a language tag so that the machine reading the literal knows how to decipher it. Examples would be
In the above example, the first two literals are untyped, i.e. they do not have a specific type assigned whereas the last statement has a type associated which is and hence the last literal is a typed literal.
Sometimes we have a situation where in the subject or the object might not have a unique URI. In such cases it is called a Blank node. The blank node might have further predicates and objects associated with it but by itself it is unrecognizable. For example, in our movie RDF graph lets add a statement for reviewedBy and represent it like this
In this scenario, the movie represented by “MovieID:Gladiator” is reviewedBy someone whom we do not know as his URI is unknown or does not exist. However, we do know that this blank node has a predicate called name which has a literal “Vikas Hazrati” associated to it. In RDF graphs however it is common to give this Blank node a local URI and work with that. Hence in our case, the statements with the local URI could be