Posted on Sep 28, 2022 • Edited on Jan 7, 2023

Classification

#rdf #owl #sparql #rules

My previous post showed how to use a reasoner like Pellet to make some interesting inferences about stated data. In that case, it demonstrated how a compound class could be determined with insufficient data to identify which of its components was satisfied.

Interesting though it was, this was a contrived question made against data that was specifically designed for it, and required a complex reasoner that is limited in its ability to scale. Less contrived scenarios are more interesting, precisely because they can be applied to real-world data at scale.

This leads to the question of what sorts of inferences are valid to make in OWL?

Specification

The OWL 2 specification describes the vocabulary and how this maps into Description Logic. Most of it is rather abstruse due to the need to provide a precise mathematical definition.

Fortunately, a number of the concepts are relatively straightforward to follow. Indeed, many of the concepts can be codified into rules to be applied mechanically.

Entailment

The basic structure of most rules is a description of conditions that must be met by existing data, and new data that may be asserted. Mathematical logic describes this using the Horn Clause, and this has become the standard tool for logic programming systems as well.

As an example, we can make a statement saying that all people are mortal, so if an individual is a person, they must be mortal:

Mortal(X) :- Person(X).

Here, I have described Person and Mortal as classes, and X as an individual.

To read a Horn clause, the outcome is described first, given the conditions that come after the :- symbol. So the above says, "X is Mortal, if X is a Person".

The outcome of the clause is referred to as the Head of the clause, while the conditions are referred to as the Body.

Multiple necessary conditions may be described in the body by separating them with commas. For instance, if a person has a child, then they are a parent:

Parent(X) :- Person(X), hasChild(X,Y).

Note that the body now includes an entity Y who is not referred to in the head.
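To make the head and body reading concrete, here is a minimal Python sketch that applies both clauses to a handful of facts. It is not tied to any real logic-programming library: the tuple encoding, the individuals mary and jane, and the function names mortal_rule and parent_rule are all assumptions made for illustration.

```python
# Facts as tuples: (Class, individual) or (role, subject, object).
# The individuals here are hypothetical sample data.
facts = {
    ("Person", "mary"),
    ("Person", "jane"),
    ("hasChild", "mary", "jane"),
}

def mortal_rule(facts):
    """Mortal(X) :- Person(X)."""
    return {("Mortal", f[1]) for f in facts if f[0] == "Person"}

def parent_rule(facts):
    """Parent(X) :- Person(X), hasChild(X,Y)."""
    people = {f[1] for f in facts if f[0] == "Person"}
    # Y is only needed to satisfy the body; it never reaches the head.
    return {("Parent", f[1]) for f in facts
            if f[0] == "hasChild" and f[1] in people}

derived = mortal_rule(facts) | parent_rule(facts)
print(sorted(derived))
```

Each function reads its body from the existing facts and returns only the head, so the original facts are never modified.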

A more mathematical approach for this statement might be:

∀x: (x ∊ Person ∧ ∃y: (x,y) ∊ hasChild) → x ∊ Parent

i.e. for all x, if x is a person and there exists a y where the pair (x,y) is a member of the role-relationship hasChild, then x is a parent. Or, less strictly, any person x with some x.hasChild.y is a parent.

To be honest, a lot of these notations are merely a very precise and tedious way to express concepts that don't seem too difficult. But the precision allows mathematicians to eventually get around to saying more interesting things, while proving that they got it right.

RDF and SPARQL

Data

The Resource Description Framework (RDF) and RDF Schema (RDFS) were designed to describe classes and data in a form that can be managed by a computer.

To declare the above classes we can say that they are each instances of the schema type called Class, using Turtle syntax:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://quoll.gearon.org/data/demo.ttl#> .
:Person a rdfs:Class .
:Mortal a rdfs:Class .
:Parent a rdfs:Class .

The prefixes here will be used throughout this post. Strictly speaking, they should appear with each block of Turtle and each SPARQL query, but I will elide them for brevity.

We can then use these classes to declare instances of data, say :mary and :jane:

:mary a :Person .
:jane a :Person .

We can also declare the role :hasChild, and connect Mary and Jane with it:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
:hasChild a rdf:Property .
:mary :hasChild :jane .

There is a whole technology stack that can start working with this already, but for now, I'm going to stick to the Horn Clauses we've already discussed.

Querying

Mortality

Let's consider the first clause:

Mortal(X) :- Person(X).

This means we are looking for all values of X that have a type of Person. Once we have those values for X, we want to declare that each X has a type of Mortal. We can create a graph based on this query:

construct { ?x a :Mortal } where { ?x a :Person }

This would result in:

:mary a :Mortal .
:jane a :Mortal .

Alternatively, we could assert all this back in with the original data by saying insert rather than construct:

insert { ?x a :Mortal } where { ?x a :Person }

With either expression, the where clause describes the conditions for the body of the Horn Clause, and the insert or construct describes the head of the clause.
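The difference between the two forms can be sketched in a few lines of Python. This is only an illustration of the idea, not how any SPARQL engine is implemented: the graph is assumed to be a plain set of (subject, predicate, object) tuples, and construct_mortal and insert_mortal are hypothetical names.

```python
RDF_TYPE = "rdf:type"  # the predicate that Turtle and SPARQL abbreviate as `a`

graph = {
    (":mary", RDF_TYPE, ":Person"),
    (":jane", RDF_TYPE, ":Person"),
}

def construct_mortal(g):
    """construct { ?x a :Mortal } where { ?x a :Person }"""
    # Body: find every ?x typed as :Person. Head: emit a new triple for it.
    return {(s, RDF_TYPE, ":Mortal") for (s, p, o) in g
            if p == RDF_TYPE and o == ":Person"}

def insert_mortal(g):
    """insert { ?x a :Mortal } where { ?x a :Person }"""
    # Same body and head, but the results are added to the graph itself.
    g |= construct_mortal(g)

new_graph = construct_mortal(graph)  # a separate graph; `graph` is untouched
insert_mortal(graph)                 # `graph` now also holds the new triples
print(len(new_graph), len(graph))
```

Keeping the construct form side-effect free makes it easy to inspect what a rule would add before committing it.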

Parenthood

Similarly, we can look at the second clause again in order to convert it:

Parent(X) :- Person(X), hasChild(X,Y).

The two expressions in the body will result in a pair of patterns in the where clause of the query:

construct { ?x a :Parent } where { ?x a :Person . ?x :hasChild ?y }

This would result in:

:mary a :Parent .
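A pair of patterns sharing the variable ?x is effectively a join. The following is a minimal sketch of that idea as a nested-loop join over a set of tuples; real query engines are far more sophisticated, and the helpers match and solve are hypothetical names introduced here.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def solve(patterns, graph):
    """Return every binding that satisfies all of the patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

graph = {
    (":mary", "a", ":Person"),
    (":jane", "a", ":Person"),
    (":mary", ":hasChild", ":jane"),
}

# where { ?x a :Person . ?x :hasChild ?y }
where = [("?x", "a", ":Person"), ("?x", ":hasChild", "?y")]
# construct { ?x a :Parent }
parents = {(b["?x"], "a", ":Parent") for b in solve(where, graph)}
print(parents)
```

The second pattern only succeeds for bindings where ?x already matched the first, which is why :jane is filtered out.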

RDFS Entailment

The above rules are fine for a specific application, but they aren't generally useful. Instead, we'd like to be able to describe data with greater expressivity, and identify rules that can be applied more generally.

In the early days of the Semantic Web, the W3C had planned on releasing RDF as a data model, along with an expressive language to describe that data. There was a lot of progress with prior systems like DAML, OIL and DAML+OIL, but getting the semantics of these systems fully defined and correct is a very difficult task, leading to the creation of each of these systems, which all attempted to correct the issues of their predecessors. These efforts culminated in the Web Ontology Language (OWL) which was formally released along with the first formal release of RDF in 2004.

Prior to the formal releases of these standards, there was already a lot of RDF in use, which meant that there was a need to describe that data. While the complicated details of ontologies were being worked out, the working groups published a minimal vocabulary that could describe a taxonomy of classes and roles. This was first published as a draft for RDF Schema, or RDFS.

Alongside RDFS came a set of "semantics" (a word that means "meaning") which described what the vocabulary meant when it said things. A blog post isn't the place to get into what I mean when I say, "Class" or "Role" (also called a "property"), but I expect that if you're reading this then you may already have an idea. Instead, I'm going to look at a few other elements of RDFS.

Subclass Descriptions

The first concept I'm going to consider is the "subclass". If individuals can be classified into a class, then a subclass is a refinement on that classification, resulting in a smaller group.

Considering the classes we had earlier, we can see that all instances of :Person must be in the class :Mortal, but there is nothing that says that all :Mortals must be in the class of :Person. So :Person is at most equal to :Mortal, and most likely a smaller group. One way to write this mathematically is:

Person ⊑ Mortal

This is essentially the same notation as "subset" ⊆, and it means the same sort of thing, just in relation to classes.

RDFS expresses the same thing with the rdfs:subClassOf relationship:

:Person rdfs:subClassOf :Mortal .

In a similar way, all Parents are Persons, but not all people are parents:

Parent ⊑ Person

:Parent rdfs:subClassOf :Person .

Domain and Range Descriptions

Whether we use the term "role", "property", "predicate" or "attribute", we are referring to some kind of relationship between things. Once we introduce relationships it starts being useful to declare what types of things can be related to each other. For instance, hasChild is an interesting relationship to have between people, but is not typically associated with something like an item of furniture.

RDFS uses the term "domain" to describe the class of things that a property can be applied to. For instance, the hasChild property can be applied to the class of Parent. This is declared with:

:hasChild rdfs:domain :Parent .

In the same way, the things that a property can refer to are referred to as the "range":

:hasChild rdfs:range :Person .

Note that only parents have children, but any person can be someone's child.

Entailment

We now have enough to start looking at some of the entailments defined by RDFS.

Given that RDF uses an Open World Assumption (OWA), RDFS never considers applying a property to something whose type does not match to be an error. One reason for this is that the data model can always have new data added, and it is presumed that this data will be consistent with whatever exists.

One effect of this is that we can sometimes derive what that new data must be. For instance, say we introduce a new person:

:jane :hasChild :kathy .

Until now, :jane has only been a :Person, but because :hasChild has a domain of :Parent we now know that Jane is also a parent. Similarly, we haven't seen Kathy before, but her appearance in this statement tells us that she is a person.

We can codify these with rules:

Parent(X) :- hasChild(X,Y).
Person(Y) :- hasChild(X,Y).

But this is where having a vocabulary for describing the system becomes useful. Instead of describing rules for each property, we can base the rules on the vocabulary and create a more general form:

C(X) :- rdfs:domain(P,C), P(X,Y).
D(Y) :- rdfs:range(P,D), P(X,Y).

Having variable predicates like this is often considered 2nd order logic, but in this case it's not really. It just looks like it is because of the syntax we use. Instead, I'll use the term my supervisor used for it: 1.5th order logic.

To explain why this is not 2nd order logic, consider it this way: We may represent (subject,predicate,object) as predicate(subject,object) but a true Predicate-Logic representation would be with ternary predicates like: statement(subject,predicate,object). This is not as useful for our purposes, and is aesthetically less inviting, but nevertheless more correct.

Domain and Range Entailment

The above rules can also be converted into SPARQL:

construct { ?x a ?c } where { ?p rdfs:domain ?c . ?x ?p ?y }

construct { ?y a ?d } where { ?p rdfs:range ?d . ?x ?p ?y }

The statements created from these queries are considered valid entailments from RDFS:

:mary a :Parent .
:jane a :Parent .
:jane a :Person .
:kathy a :Person .

Note that some of the entailed statements already exist. This is fine, as graphs are considered to be a set of edges, so duplicates are ignored.
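The two generic rules can be checked with a short Python sketch. It assumes the graph is a set of (subject, predicate, object) tuples, so the "variable predicate" is simply the middle element of each tuple, and it uses hypothetical function names domain_rule and range_rule.

```python
A = "rdf:type"

graph = {
    (":mary", A, ":Person"),
    (":jane", A, ":Person"),
    (":mary", ":hasChild", ":jane"),
    (":jane", ":hasChild", ":kathy"),
    (":hasChild", "rdfs:domain", ":Parent"),
    (":hasChild", "rdfs:range", ":Person"),
}

def domain_rule(g):
    """C(X) :- rdfs:domain(P,C), P(X,Y)."""
    return {(x, A, c)
            for (p, d, c) in g if d == "rdfs:domain"
            for (x, p2, y) in g if p2 == p}

def range_rule(g):
    """D(Y) :- rdfs:range(P,D), P(X,Y)."""
    return {(y, A, c)
            for (p, r, c) in g if r == "rdfs:range"
            for (x, p2, y) in g if p2 == p}

entailed = domain_rule(graph) | range_rule(graph)
new_only = entailed - graph  # a set, so already-known triples drop out

for triple in sorted(new_only):
    print(*triple)
```

Because the graph is a set, re-deriving a statement that already exists has no effect, which mirrors the way duplicate edges are ignored.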

Subclass Entailment

Subclass entailment can also be described using queries. In this case, there are two entailments to consider.

We know that:

:Person is a subclass of :Mortal, therefore all instances of :Person will be an instance of :Mortal.
:Parent is a subclass of :Person, therefore all instances of :Parent will be an instance of :Person.

Putting this together:

Instances of :Parent must be instances of :Person.
Instances of :Person must be instances of :Mortal.
Therefore: instances of :Parent must be instances of :Mortal.

The formal term for this is that rdfs:subClassOf is a transitive property. This is something that OWL describes explicitly, but I'll leave that for later. More succinctly:

A ⊑ B ⊑ C  ⊢  A ⊑ C

Or as a Horn Clause:

rdfs:subClassOf(A,C) :- rdfs:subClassOf(A,B), rdfs:subClassOf(B,C).

This can be evaluated in SPARQL with:

construct { ?a rdfs:subClassOf ?c } where { ?a rdfs:subClassOf ?b . ?b rdfs:subClassOf ?c }

The next part is considering the instances. We've already stated that if an entity is an instance of a class, then it will also be an instance of whatever that class is a subclass of (also called the superclass, just like in software development):

construct { ?x a ?d } where { ?x a ?c . ?c rdfs:subClassOf ?d }

With all of these rules in place, suddenly we have:

:Parent rdfs:subClassOf :Mortal .
:mary a :Parent .
:mary a :Mortal .
:jane a :Parent .
:jane a :Mortal .
:kathy a :Person .
:kathy a :Mortal .
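A single pass of each rule is not always enough, since one rule's output can feed another. As a rough sketch, the two subclass rules can be repeated until nothing new appears. The tuple-based graph and the names subclass_transitivity, instance_propagation and closure are assumptions for this example, and the starting graph is the stated data plus the domain and range entailments.

```python
A, SUB = "rdf:type", "rdfs:subClassOf"

graph = {
    (":Parent", SUB, ":Person"),
    (":Person", SUB, ":Mortal"),
    (":mary", A, ":Person"),
    (":mary", A, ":Parent"),
    (":jane", A, ":Person"),
    (":jane", A, ":Parent"),
    (":kathy", A, ":Person"),
}

def subclass_transitivity(g):
    """rdfs:subClassOf(A,C) :- rdfs:subClassOf(A,B), rdfs:subClassOf(B,C)."""
    return {(a, SUB, c)
            for (a, p, b) in g if p == SUB
            for (b2, p2, c) in g if p2 == SUB and b2 == b}

def instance_propagation(g):
    """An instance of a class is also an instance of its superclass."""
    return {(x, A, d)
            for (x, p, c) in g if p == A
            for (c2, p2, d) in g if p2 == SUB and c2 == c}

def closure(g):
    """Apply both rules repeatedly until no new triples are produced."""
    g = set(g)
    while True:
        new = (subclass_transitivity(g) | instance_propagation(g)) - g
        if not new:
            return g
        g |= new

inferred = closure(graph) - graph
for triple in sorted(inferred):
    print(*triple)
```

The loop always terminates here because the rules only combine terms that are already in the graph, so there is a finite number of triples they can ever produce.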

Incidentally, both rdfs:domain and rdfs:range have a domain of rdf:Property and a range of rdfs:Class. So a full entailment regime means that we don't need to declare that :hasChild is a property, and we don't need to explicitly declare that any of our classes is an rdfs:Class. I will sometimes include these declarations for the sake of clear documentation, but they're not needed.

More Entailment

The patterns of RDFS entailment are all described in the RDF Semantics specification, and collected in section 9.2 of that document.

The rules described above correspond to several of the RDFS Entailment patterns:

# rdfs2
construct { ?x a ?c } where { ?p rdfs:domain ?c . ?x ?p ?y }
# rdfs3
construct { ?y a ?d } where { ?p rdfs:range ?d . ?x ?p ?y }
# rdfs11
construct { ?a rdfs:subClassOf ?c } where { ?a rdfs:subClassOf ?b . ?b rdfs:subClassOf ?c }
# rdfs9
construct { ?x a ?d } where { ?x a ?c . ?c rdfs:subClassOf ?d }

Using the patterns above, the process for converting all of the entailment patterns to executable rules takes very few steps.

# rdfs1
construct { ?aaa a rdfs:Datatype } where { ?xxx ?yyy ?ddd FILTER isLiteral(?ddd) BIND(datatype(?ddd) AS ?aaa) }
# rdfs2
construct { ?yyy a ?xxx } where { ?aaa rdfs:domain ?xxx . ?yyy ?aaa ?zzz }
# rdfs3
construct { ?zzz a ?xxx } where { ?aaa rdfs:range ?xxx . ?yyy ?aaa ?zzz }
# rdfs4a
construct { ?xxx a rdfs:Resource } where { ?xxx ?aaa ?yyy }
# rdfs4b
construct { ?yyy a rdfs:Resource } where { ?xxx ?aaa ?yyy FILTER (!isLiteral(?yyy)) }
# rdfs5
construct { ?xxx rdfs:subPropertyOf ?zzz } where { ?xxx rdfs:subPropertyOf ?yyy . ?yyy rdfs:subPropertyOf ?zzz }
# rdfs6
construct { ?xxx rdfs:subPropertyOf ?xxx } where { ?xxx a rdf:Property }
# rdfs7
construct { ?xxx ?bbb ?yyy } where { ?aaa rdfs:subPropertyOf ?bbb . ?xxx ?aaa ?yyy }
# rdfs8
construct { ?xxx rdfs:subClassOf rdfs:Resource } where { ?xxx a rdfs:Class }
# rdfs9
construct { ?zzz a ?yyy } where { ?xxx rdfs:subClassOf ?yyy . ?zzz a ?xxx }
# rdfs10
construct { ?xxx rdfs:subClassOf ?xxx } where { ?xxx a rdfs:Class }
# rdfs11
construct { ?xxx rdfs:subClassOf ?zzz } where { ?xxx rdfs:subClassOf ?yyy . ?yyy rdfs:subClassOf ?zzz }
# rdfs12
construct { ?xxx rdfs:subPropertyOf rdfs:member } where { ?xxx a rdfs:ContainerMembershipProperty }
# rdfs13
construct { ?xxx rdfs:subClassOf rdfs:Literal } where { ?xxx a rdfs:Datatype }

Executing them all together is a task for a system that can coordinate the rules, and is typically tied to a specific database implementation, though this is not necessary (e.g. Naga, though I have yet to port this to SPARQL).
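As a rough idea of what such coordination involves, here is a naive Python sketch that treats each rule as data (a head template plus body patterns) and keeps applying every rule until the graph stops growing. It is not how Naga or any particular database works. The helpers match, solve and run are hypothetical, only four of the patterns are included, and the rules that need FILTER or BIND are left out.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def solve(patterns, graph):
    """Return every binding that satisfies all of the body patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

A, SUB = "rdf:type", "rdfs:subClassOf"
# (name, head template, body patterns)
RULES = [
    ("rdfs2", ("?y", A, "?c"), [("?p", "rdfs:domain", "?c"), ("?y", "?p", "?z")]),
    ("rdfs3", ("?z", A, "?c"), [("?p", "rdfs:range", "?c"), ("?y", "?p", "?z")]),
    ("rdfs9", ("?z", A, "?y"), [("?x", SUB, "?y"), ("?z", A, "?x")]),
    ("rdfs11", ("?x", SUB, "?z"), [("?x", SUB, "?y"), ("?y", SUB, "?z")]),
]

def run(rules, graph):
    """Naive fixpoint: re-run every rule until no rule adds a triple."""
    graph = set(graph)
    changed = True
    while changed:
        changed = False
        for _name, head, body in rules:
            for b in solve(body, graph):
                triple = tuple(b.get(term, term) for term in head)
                if triple not in graph:
                    graph.add(triple)
                    changed = True
    return graph

graph = {
    (":mary", ":hasChild", ":jane"),
    (":jane", ":hasChild", ":kathy"),
    (":hasChild", "rdfs:domain", ":Parent"),
    (":hasChild", "rdfs:range", ":Person"),
    (":Parent", SUB, ":Person"),
    (":Person", SUB, ":Mortal"),
}
closed = run(RULES, graph)
print(len(closed - graph), "triples inferred")
```

Re-running every rule on every pass is wasteful; a real rule engine would track which rules can be triggered by which new statements, but the end result is the same fixpoint.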

Was This Needed?

Well… probably not. If you're reading this then maybe you already knew everything I had to say here. However, I also have more useful things to say, and I'm about to go on to say it. This further discussion will build on what I've discussed so far, and I wanted to establish a base of understanding before moving on.

I continue this discussion in my next post...