
Myprevious post showed how to use a reasoner likePellet to make some interesting inferences about stated data. In that case, it demonstrated how a compound class could be determined with insufficient data to identify which of its components was satisfied.
Interesting though it was, this was a contrived question made against data that was specifically designed for it, and required a complex reasoner that is limited in its ability to scale. Less contrived scenarios are more interesting, precisely because they can be applied to real-world data at scale.
This leads to the question of what sorts of inferences are valid to make in OWL?
Specification
TheOWL 2 specification describes the vocabulary and how this maps intoDescription Logic. Most of it is rather obtuse due to the need to provide a precise mathematical definition.
Fortunately, a number of the concepts are relatively straightforward to follow. Indeed, many of the concepts can be codified into rules to be applied mechanically.
Entailment
The basic structure of most rules is a description of conditions that must be met by existing data, and new data that may be asserted.Mathematical logic describes this using theHorn Clause, and this has become the standard tool for logic programming systems as well.
As an example, we can make a statement saying that all people are mortal, so if an individual is a person, they must be mortal:
Mortal(X) :- Person(X).
Here, I have describedPerson
andMortal
as classes, andX
as an individual.
To read a Horn clause, the outcome is described first, given the conditions that come after the:-
symbol. So the above says, "X is Mortal, if X is a Person".
The outcome of the clause is referred to as theHead of the clause, while the conditions are referred to as theBody.
Multiple necessary conditions may be described in the body by separating them with commas. For instance, if a person has a child, then they are a parent:
Parent(X) :- Person(X), hasChild(X,Y).
Note that the body now includes an entityY
who is not referred to in the head.
A more mathematical approach for this statement might be:
∀x x∊Person: ∃y (x,y)∊hasChild
i.e. For allx
wherex
is a person, there exists ay
where relationship ofx,y
is a member of the role-relationshiphasChild
. Or, less strictly, forx
wherex
is a person, there exists ay
wherex.hasChild.y
To be honest, a lot of these notations are merely very precise and tedious way to express concepts that don't seem too difficult. But the precision allow mathematicians to eventually getting around to saying more interesting things, while proving that they got it right.
RDF and SPARQL
Data
The Resource Description Framework (RDF) and RDF Schema (RDFS) were designed to describe classes and data in a form that can be managed by a computer.
To declare the above classes we can say that the they are each instances of the schema type calledClass
, usingTurtle syntax:
@prefixrdfs:<http://www.w3.org/2000/01/rdf-schema#>.@prefix:<http://quoll.gearon.org/data/demo.ttl#>.:Personardfs:Class.:Mortalardfs:Class.:Parentardfs:Class.
The prefixes here will be used throughout this post. Strictly speaking, they should appear with each block of Turtle and each SPARQL query, but I will elide them for brevity.
We can then use these classes to declare instances of data, say:mary
and:jane
:
:marya:Person.:janea:Person.
We can also declare the role:hasChild
, and connect Mary and Jane with it:
:hasChildarfds:Property.:mary:hasChild:jane.
There is awhole technology stack that can start working with this already, but for now, I'm going to stick to the Horn Clauses we've already discussed.
Querying
Mortality
Let's consider the first clause:
Mortal(X) :- Person(X).
This means we are looking for all values ofX
that have a type ofPerson
. Once we have those values forX
, we want to declare that eachX
has a type ofMortal
. We can create a graph based on this query:
construct{?xa:Mortal}where{?xa:Person}
This would result in:
:marya:Mortal.
Alternatively, we could assert all this back in with the original data by sayinginsert
rather thanconstruct
:
insert{?xa:Mortal}where{?xa:Person}
With either expression, thewhere
clause describes the conditions for the body of the Horn Clause, and theinsert
orconstruct
describes the head of the clause.
Parenthood
Similarly, we can look at the second clause again in order to convert it:
Parent(X) :- Person(X), hasChild(X,Y).
The two expressions in body will result in a pair of patterns in thewhere
clause of the query:
construct{?xa:Parent}where{?xa:Person.?x:hasChild?y}
This would result in:
:marya:Parent.
RDFS Entailment
The above rules are fine for a specific application, but they aren't generally useful. Instead, we'd like to be able to describe data with greater expressivity, and identify rules that can be applied more generally.
In the early days of the Semantic Web, theW3C had planned on releasing RDF as a data model, along with an expressive language to describe that data. There was a lot of progress with prior systems likeDAML,OIL andDAML+OIL, but getting the semantics of these systems fully defined and correct is a very difficult task, leading to the creation of each of these systems, which all attempted to correct the issues of their predecessors. These efforts culminated in theWeb Ontology Language (OWL) which was formally released along with the first formal release of RDF in 2004.
Prior to the formal releases of these standards, there was already a lot of RDF in use, which meant that there was a need to describe that data. While the complicated details of ontologies were being worked out, the working groups published a minimal vocabulary that could describe a taxonomy of classes and roles. This was first published as a draft forRDF Schema, or RDFS.
Alongside RDFS came a set of "semantics" (a word that means "meaning") which described what the vocabulary meant when it said things. A blog post isn't the place to get into what I mean when I say, "Class" or "Role" (also called a "property"), but I expect that if you're reading this then you may already have an idea. Instead, I'm going to look at a few other elements of RDFS.
Subclass Descriptions
The first concept I'm going to consider is the "subclass". If individuals can be classified into a class, then a subclass is a refinement on that classification, resulting in a smaller group.
Considering the classes we had earlier, we can see that all instances of:Person
must be in the class:Mortal
, but there is nothing that says that all:Mortal
s must in the the class of:Person
. So:Person
is at most equal to:Mortal
, and most likely a smaller group. One way to write this mathematically is:
Person ⊑ Mortal
This is essentially the same notation as "subset" ⊆, and it means the same sort of thing, just in relation to classes.
RDFS expresses the same thing with therdfs:subClassOf
relationship:
:Personrdfs:subClassOf:Mortal.
In a similar way, all Parents are Persons, but not all people are parents:
Parent ⊑ Person
:Parentrdfs:subClassOf:Person.
Domain and Range Descriptions
Whether we use the term "role", "property", "predicate" or "attribute", we are referring to some kind of relationship between things. Once we introduce relationships it starts being useful to declare what a type of things can be related to each other. For instance,hasChild
is an interesting relationship to have between people, but is not typically associated with something like an item of furniture.
RDFS uses the term "domain" to describe the class of things that a property can be applied to. For instance, thehasChild
property can be applied to the class ofParent
. This is declared with:
:hasChildrdfs:domain:Parent.
In the same way, the things that a property can refer to are referred to as the "range":
:hasChildrdfs:range:Person.
Note that only parents have children, but any person can be someone's child.
Entailment
We now have enough to start looking at some of the entailments defined by RDFS.
Given that RDF uses anOpen World Assumption (OWA), RDFS never considers applying a property to something whose type does not match to be an error. One reason for this is because the data model can always have new data added, and it is presumed that this data will be consistent with whatever exists.
One effect of this is that we can sometimes derive what that new data must be. For instance, say we introduce a new person:
:jane:hasChild:kathy.
Until now,:jane
has only been a:Person
, but because:hasChild
has a domain of:Parent
we now know that Jane is also a parent. Similarly, we haven't seen Kathy before, but the appearance in this statement tells use that she is a person.
We can codify these with rules:
Parent(X) :- hasChild(X,Y).Person(Y) :- hasChild(X,Y).
But this is where having a vocabulary for describing the system becomes useful. Instead of describing rules for each property, we can base the rules on the vocabulary and create a more general form:
C(X) :- rdfs:domain(P,C), P(X,Y).D(Y) :- rdfs:range(P,D), P(X,Y).
Having variable predicates like this is often considered 2nd order logic, but in this case it's not really. It just looks like it is because of the syntax we use. Instead, I'll use the term that my supervisor called it: 1.5th order logic.
To explain why this is not 2nd order logic, consider it this way: We may represent
(subject,predicate,object)
aspredicate(subject,object)
but a true Predicate-Logic representation would be with ternary predicates like:statement(subject,predicate,object)
. This is not as useful for our purposes, and is aesthetically less inviting, but nevertheless more correct.
Domain and Range Entailment
The above rules can also be converted into SPARQL:
construct{?xa?c}where{?prdfs:domain?c.?x?p?y}
construct{?ya?d}where{?prdfs:range?d.?x?p?y}
The statements created from these queries are considered valid entailments from RDFS:
:marya:Parent.:janea:Parent.:janea:Person.:kathya:Person.
Note that some of the entailed statements already exist. This is fine, as graphs are considered to be a set of edges, so duplicates are ignored.
Subclass Entailment
Subclass entailment can also be described using queries. In this case, there are two entailments to consider.
We know that:
:Person
is a subclass of:Mortal
, therefore all instances of:Person
will be an instance of:Mortal
.:Parent
is a subclass of:Person
, therefore all instances of:Parent
will be an instance of:Person
.
Putting this together:
- Instances of
:Parent
must be instances of:Person
. - Instances of
:Person
must be instances of:Mortal
. - Therefore: instances of
:Parent
must be instances of:Mortal
.
The formal term for this is thatrdfs:subClassOf
is atransitive property. This is something that OWL describes explicitly, but I'll leave that for later. More succinctly:
A ⊑ B ⊑ C ⊢ A ⊑ C
Or as a Horn Clause:
rdfs:subClassOf(A,C) :- rdfs:subClassOf(A,B), rdfs:subClassOf(B,C).
This can be evaluated in SPARQL with:
construct{?ardfs:subClassOf?c}where{?ardfs:subClassOf?b.?brdfs:subClassOf?c}
The next part is considering the instances. We've already stated that if an entity is an instance of a class, then it will also be an instance of whatever that class is a subclass of (also called thesuperclass, just like in software development):
construct{?xa?d}where{?xa?c.?crdfs:subClassOf?d}
With all of these rules in place, suddenly we have:
:Parentrdfs:subClassOf:Mortal.:marya:Parent.:marya:Mortal.:janea:Parent.:janea:Mortal.:kathya:Person.:kathya:Parent.:kathya:Mortal.
Incidentally, bothrdfs:domain
andrdfs:range
have a domain ofrdf:Property
and a range ofrdfs:Class
. So a full entailment regime means that we don't need to declare that:hasChild
is a property, and we don't need to explicitly declare that any of our classes ardfs:Class
. I will sometimes include these declaration for the sake of clear documentation, but they're not needed.
More Entailment
The patterns of RDFS entailment are all described in theRDF Semantics specification, and collected in thesection 9.2 of that document.
The rules described above correspond to several of the RDFS Entailment patterns:
# rdfs2construct{?xa?c}where{?prdfs:domain?c.?x?p?y}# rdfs3construct{?ya?d}where{?prdfs:range?d.?x?p?y}# rdfs11construct{?ardfs:subClassOf?c}where{?ardfs:subClassOf?b.?brdfs:subClassOf?c}# rdfs9construct{?xrdfs:subClassOf?d}where{?xa?c.?crdfs:subClassOf?d}
Using the patterns above, the process for converting all of the entailment patterns to executable rules takes very few steps.
# rdfs1construct{?aaaardfs:Datatype}where{?xxx?yyy?dddFILTERisLiteral(?ddd)BIND(datatype(?ddd)AS?aaa)}# rdfs2construct{?yyya?xxx}where{?aaardfs:domain?xxx.?yyy?aaa?zzz}# rdfs3construct{?zzza?xxx}where{?aaardfs:range?xxx.?yyy?aaa?zzz}# rdfs4aconstruct{?xxxardfs:Resource}where{?xxx?aaa?yyy}# rdfs4bconstruct{?yyyardfs:Resource}where{?xxx?aaa?yyyFILTER!isLiteral(?yyy)}# rdfs5construct{?xxxrdfs:subPropertyOf?zzz}where{?xxxrdfs:subPropertyOf?yyy.?yyyrdfs:subPropertyOf?zzz}# rdfs6construct{?xxxrdfs:subPropertyOf?xxx}where{?xxxardf:Property}# rdfs7construct{?xxx?bbb?yyy}where{?aaardfs:subPropertyOf?bbb.?xxx?aaa?yyy}# rdfs8construct{?xxxrdfs:subClassOfrdfs:Resource}where{?xxxardfs:Class}# rdfs9construct{?zzza?yyy}where{?xxxrdfs:subClassOf?yyy.?zzza?xxx}#rdfs10construct{?xxxrdfs:subClassOf?xxx}where{?xxxardfs:Class}# rdfs11construct{?xxxrdfs:subClassOf?zzz}where{?xxxrdfs:subClassOf?yyy.?yyyrdfs:subClassOf?zzz}# rdfs12construct{?xxxrdfs:subPropertyOfrdfs:member}where{?xxxardfs:ContainerMembershipProperty}# rdfs13construct{?xxxrdfs:subClassOfrdfs:Literal}where{?xxxardfs:Datatype}
Executing them all together is a task for system that can coordinate the rules, and is typically tied to aspecific database implementation though this is not necessary (e.g.Naga, though I have yet to port this to SPARQL).
Was This Needed?
Well… probably not. If you're reading this then maybe you already knew everything I had to say here. However, I also have more useful things to say, and I'm about to go on to say it. This further discussion will build on what I've discussed so far, and I wanted to establish a base of understanding before moving on.
Next
I continue this discussion inmy next post...
Top comments(0)
For further actions, you may consider blocking this person and/orreporting abuse