DEV Community

Cover image for Stay Classy in OWL
Paula Gearon
Paula Gearon

Posted on

Stay Classy in OWL

In an effort to publish this quickly, I am posting without proofreading. Errata is welcome.

In the last post I discussed using rules to generate RDF statements that are entailed by RDFS. This is useful stuff, but is very limited due to the lack of expressivity of RDFS.

This is to be expected, since RDFS was a limited vocabulary that was released well before the full Web Ontology Language (OWL) was released. But if we adopt OWL, then what entailments will be valid and useful?

Description Logics

As shown in my first post in this series, OWL provides a vocabulary for a Description Logic. In particular, OWL2 conforms closely to a family of logics known as ๐’ฎโ„›๐’ชโ„๐’ฌ. This name indicates some of the elements of the vocabulary:

  • ๐’ฎ: An Abbreviation for ๐’œโ„’๐’ž
  • ๐’œโ„’๐’ž: Contains concepts including:
    • Classes C, and roles r
    • โŠค (top, or everything)
    • โŠฅ (bottom, or nothing)
    • โŠ“ (conjunctions, or intersections)
    • โŠ” (disjunctions, or unions)
    • ยฌ (negation, or inverse)
    • โˆƒr.C (existential role restrictions)
    • โˆ€r.C (universal role restrictions)
  • โ„‹: Class hierarchies (classes, with subclasses)
  • โ„›: Complex role inclusion. This is both a role hierarchy (indicated by โ„‹) and role composition.
  • โ„: Inverse roles.
  • ๐’ช: Nominals.
  • ๐’ฉ: Cardinality restrictions on roles.
  • ๐’ฌ: Qualified cardinality. This includes cardinality restrictions (indicated by ๐’ฉ), and can also qualify them to classes.

To explain each of the above:

  • Classes are a way to classify things. Entities can be classified multiple ways, in which case we say the entity has a type of the class, or that the entity is an instance of the class. e.g. Person can be a class and the entity we name "Alice" may be an instance of Person.
  • Roles describe relationships between entities. e.g. an entity named "Alice" may have a hasChild relationship to an entity named "Susan".
  • Top is a universal class that every entity is an instance of.
  • Bottom is an empty class that no entity is an instance of.
  • Conjunctions are the combination of multiple classes where every class must apply. e.g. A wooden chair is an instance of the conjunction formed from the classes Wooden and Furniture.
  • Disjunctions are the combination of multiple classes where one or more classes must apply. e.g. OfficeEquipment could be a disjunction of OfficeFurniture, ComputerEquipment, Stationery, and KitchenSupplies.
  • Negation is used to create a class of everything that is not the negated class. e.g. The negation of the class of Visible things is everything that cannot be seen. That includes both physical objects, like air, but also arbitrary concepts like "imagination" or "price". It is the entire universe of things that are not in the Visible class.
  • Existential role restrictions means that a given relationship must exist in order to be a member of a class. e.g. To be a Parent an entity must have a hasChild relationship to another entity.
  • Universal role restrictions means that all use of a role has to be with a specific class. e.g. hasChild can be defined to always refer to instances of the class Child.
  • Role hierarchy indicates more general or specific relationships between roles. This creates sub-property and super-property relationships and is roughly analogous to sub-classes and super-classes. e.g. hasDaughter is a more specific role than hasChild, while hasDescendent is a more general role. So hasDaughter is a sub-property for hasChild, and hasChild is a sub-property for hasDescendent.
  • Inverse roles refers to the relationship that goes in the opposite direction between entities. e.g. hasParent is the inverse role to hasChild.
  • Nominals describes a class of items that is defined by its membership in the class. e.g. PresidentOfTheUnitedStates can be defined as a class of the 46 people who have had that position (as of this writing).
  • Number restrictions (or cardinality restrictions) refers to a minimum or maximum number of relationships. e.g. To be a member of FullTimeStudent a university might require that a student have a minimum of 4 enrolledIn relationships.
  • Qualified cardinality is a more specific type of nominal, where the class of the relationship must apply. e.g. For a student to be a MathMajor they might require a minimum of 10 passed relationships to instances of MathSubject.

OWL

These constructs are all supported by OWL, and each of those constructs has a mapping to RDF. This means that for each Description Logic expression there is a way to express that expression precisely in RDF. It is data like this that was read and processed by Pellet in the Oedipus example in my initial post.

The problem with systems like Pellet is that they rely on memory to explore all possibilities for the data. Consequently, they can struggle with large bodies of data, and are unable to handle large ontologies such as SNOMED CT. We have a much better chance of scaling OWL processing if we can use operations that databases are designed to provide.

The principal database operation is the Query, and so the various approaches to scaling try to build on this operation.

Query Rewriting

One approach to identifying entailments in OWL is by using it to rewrite queries.

The first to consider is a general approach to rewriting queries, where the query can be expanded to ask for parts of the ontology. For instance, consider asking for the classes a resource is an instance of:

select ?class
where { my:resource a ?class }
Enter fullscreen mode Exit fullscreen mode

This can be rewritten to consider superclasses as well:

select ?class
where { my:resource a/rdfs:subClassOf* ?class }
Enter fullscreen mode Exit fullscreen mode

Where the * modifier describes the transitive closure of the rdfs:subClassOf relationship, starting with the 0 step, meaning that it includes the step where the class is the immediate type of the resource.

A more specific case is using the ontology to rewrite the query. For instance, if the property my:prop is transitive, then querying for it can always be expanded to use the + modifier. So the query:

select ?related
where { my:resource my:prop ?related }
Enter fullscreen mode Exit fullscreen mode

Would be modified to:

select ?related
where { my:resource my:prop+ ?related }
Enter fullscreen mode Exit fullscreen mode

These are some trivial examples, but some great work was done on this by โ€ชHรฉctor Pรฉrez-Urbina in his PhD dissertation.

Rules

Another approach to scalable entailment is using rules. The mechanism for this is using certain existing data structures to generate new data structures that get inserted alongside the original data. I described this in the last post, where I used construct queries to obtain the data generated by each rule. The other option is to send this generated statements back into the graph by changing construct to insert.

For instance, the transitive closure of the property my:prop above could be created with an update operation:

insert { ?a my:prop ?c }
where { ?a my:prop ?b . ?b my:prop ?c }
Enter fullscreen mode Exit fullscreen mode

Note that this is a single update, and not actually a full "rule". Running this will only extend all of the my:prop relations by 1 step, doubling the length to 2. But running this iteratively will extend the maximum length of the closure by doubling, so it will rapidly cover the entire closure.

What we can learn from this is that rule systems can be built from query/update operations like this, but they need a mechanism for scheduling the rules to be run over and over when needed, and to stop when nothing new is being generated. The basic algorithm for doing this is called RETE, and I discussed this and an implementation at Clojure/conj 2016.

Because rules are built on a querying mechanism that is foundational to the database, they are often very fast. They also rely on the main database storage, so they can scale with the database. They allow computational complexity to be pre-calculated, with the results stored. This allows subsequent operations to rely on space complexity instead. These are very important characteristics for working with large quantities of data, and for this reason I will be focusing on this approach to entailment.

Rule Justification

Many OWL operations describe entailments that can be expressed as a rule. For instance, the OWL 2 Structural Specification and Functional-Style Syntax document describes many constructs with examples of what they entail. There are many examples of this, but a simple one is the Symmetric Object Property which describes that a:friend is symmetric. Consequently, if "Brian is a friend of Peter" then this entails "Peter is a friend of Brian".

The general rule for symmetry can then be given as:

insert {?b ?prop ?a}
where {?a ?prop ?b . ?prop a owl:SymmetricProperty}
Enter fullscreen mode Exit fullscreen mode

This can seem to be a straightforward operation, but interesting insights come about when we consider exactly why these new statements are allowed to be asserted.

Validity and Consistency

Logic systems can be described using a pair of properties: validity and consistency.

  • A system is valid if every interpretation of the systems leads to conclusions that are true.
  • A system is invalid if there exists an interpretation where the conclusions are not true.
  • A system is consistent if there exists an interpretation where the conclusions are true.
  • A system is inconsistent if there are no interpretations where the conclusions are true.

The interpretation of a system is a selection of values that conform to the system.

To explain some of this, let's use the mathematical domain. In this case, the interpretation will usually be a selection of numbers to associate with values.

Valid

Since every possible interpretation of a valid system is true, these are also referred to as a tautology. At first glance, this does not seem that useful, however it is a very important concept.

An example of a valid math expression is:
| x | โ‰ฅ x

The various interpretations of this system are the values that x can take in the domain of real numbers โ„. In this case, it doesn't matter what value x takes, since the equation will always be true.

This is still logic, so we can introduce an or operation, with another valid equation:
x > 1 โˆจ x < 2

Again, this is a tautology, as it will be true for every interpretation of x in the domain.

Invalid

This applies to any system which is not valid. That simply means that there exists an interpretation where truth does not hold. For instance:
x > 2

This is true when x is 3 or 4, but it is not true when x is 1 or 2. Lots of systems are Invalid, since tautologies (i.e. valid systems) don't have a lot to say.

Consistent

A system is consistent when there exists an interpretation that leads to truth. The example invalid statement also happens to be consistent:
x > 2

As already mentioned, when x is 3 then the statement is true, so this system is consistent.

Inconsistent

This indicates that a system is not consistent, meaning that there are no possible interpretations which can be true. For instance:
x < 3 โˆง x > 4

There are no numbers that meet both of these conditions, and therefore there are no interpretations where this is true.

Relations to Each Other

When considered in relation to one another we see the following states for systems:

  • Always true: Valid and Consistent
  • Sometimes true, Sometimes false: Invalid and Consistent
  • Always false: Invalid and Inconsistent

Entailment

Entailment is the operation of finding new statements that are true in every possible interpretation of a system, given its semantics. This is possible when using the Open World Assumption (OWA), since new statements can always be added, which is a concept that is sometimes expressed as, "Anyone can say Anything about Anything" (this phrase appears in an early draft of the RDF Concepts and Abstract Data Model, and also in the book "Semantic Web for the Working Ontologist", by Allemang, Hendler, and now in the second edition, Gandon. I will refer to this book as SWWO).

This cuts both ways though: not only does it mean that new statement may be created, but it also limits which statements may be created. The OWA says that there are possibly a lot of other statements that the system does not describe which could lead to a statement not being possible in every possible interpretation. When it comes to identifying new statements that can be entailed, it is a useful exercise to consider all the possible constructs that could lead to the statement leading to a falsehood, even if it requires a convoluted set of statements to get there.

Tableaux

One approach in using validity and consistency is to determine if a given system entails a statement using the Tableaux Algorithm. In this case, for a given system ๐‘ฎ an entailment ๐‘จ is described as:
๐‘ฎ โŠจ ๐‘จ
This entailment can only be true if:
๐‘ฎ โ‹ƒ {ยฌ๐‘จ} is inconsistent

This is a useful test, because the algorithm need only discover a single false case to prove inconsistency. Pellet is an implementation of this algorithm, and while it does not scale to very large ontologies, it is nevertheless very powerful.

Rules

Another approach to entailment is to use rules. As mentioned above, this can be done when we know that a statement is legal in every possible interpretation of the system.

There is a defined subset of possible reasoning in OWL2 which can lead to entailments via rules. This subset is called the OWL 2 RL Profile, and the rules can be found in tables 4 through to table 9 in the rules section of the OWL 2 Profiles document. It is some of these rules that I want to explore here and in other posts.

Intersections

A practical application of all of this can be seen in Intersections.

As described in SWWO, an intersection of classes :A and :B can be described using a subclass relationship:
:IntersectionAB as subclass of :A and :B
This is described in Turtle as:

:IntersectionAB rdfs:subClassOf :A, :B .
Enter fullscreen mode Exit fullscreen mode

Let's consider what can be inferred from this.

If we have an instance of :IntersectionAB called :x, then this is represented as:
:x as instance of :IntersectionAB

:x a :IntersectionAB .
Enter fullscreen mode Exit fullscreen mode

The rule rdfs9 is:

insert { ?zzz a ?yyy } where { ?xxx rdfs:subClassOf ?yyy . ?zzz a ?xxx }
Enter fullscreen mode Exit fullscreen mode

Applying this will result in :x being an instance of both :A and :B:
:x as instance of :A and :B

:x a :IntersectionAB .
:x a :A .
:x a :B .
Enter fullscreen mode Exit fullscreen mode

These inferences are valid, because the definition of the rdfs:subClassOf relationship states and instances of a subclass will also be instances of the superclass. There are no interpretations where this does not hold.

Class Membership

Another possibility is when :y is a member of both :A and :B.
:y is an instance of both :A and :B

:y a :A .
:y a :B .
Enter fullscreen mode Exit fullscreen mode

It would seem reasonable to infer that :y is therefore an instance of :IntersectionAB. However, inferences are only valid if they apply in every possible system, and the Open World Assumption (OWA) must allow for any new consistent statement. There are statements that can be introduced that are both consistent with the existing statements, and inconsistent with inferring :y as a member of :IntersectionAB.

To see an example of this, we can introduce a two new classes, called :C and :D, are the complements of each other. This means that anything that is a member of :C is not a member of :D, and vice versa. We can also make :IntersectionAB a subclass of :C:
:C as a complement of :D and part of the intersection

:IntersectionAB rdfs:subClassOf :A, :B, :C .
:C owl:complementOf :D .
Enter fullscreen mode Exit fullscreen mode

If :y becomes an instance of :D then if cannot be a part of the intersection, since it cannot be an instance of :C.
:y as an instance of :A, :B, and :D

:IntersectionAB rdfs:subClassOf :A, :B, :C .
:C owl:complementOf :D .
:y a :A, :B, :D .
Enter fullscreen mode Exit fullscreen mode

Regardless of how contrived the example may be, the fact that any such example exists indicates that the inference may not be made.

OWL Intersections

The problem with inferring membership in an intersection above is that the intersection is open, meaning that new classes can be added to the intersection, and those classes can preclude an instance of the other classes from becoming a member.

OWL addresses this by defining an closed intersection using an RDF Collection. This uses a linked list structure that does not allow for extra members.

Redefining :IntersectionAB we can express this in TTL as:

:IntersectionAB owl:intersectionOf (:A :B) .
Enter fullscreen mode Exit fullscreen mode

This looks short and simple, but expands into a longer set of triples:

:IntersectionAB owl:intersectionOf _:b1 .
_:b1 rdf:first :A .
_:b1 rdf:rest _:b2 .
_:b2 rdf:first :B .
_:b2 rdf:rest rdf:nil .
Enter fullscreen mode Exit fullscreen mode

Closed Intersection of :A and :B
RDF defines collections to have a specific structure with each node in the list having a single rdf:first and rdf:rest connection, terminating in the rdf:nil node. This means that it is not a valid construct to include any more elements in the collection.

As a consequence, if there is an element :y which is an instance of both :A and :B, then it is not possible to add in triples that make :y a member of something that is excluded from the intersection. Therefore, it is valid to infer that :y is in the intersection:

:y a :IntersectionAB .
Enter fullscreen mode Exit fullscreen mode

This is described in the OWL semantics, and demonstrated in the OWL 2 RL profile in the rule cls-int1 found in table 6. This rule states:

IF

T(?c, owl:intersectionOf, ?x)
LIST[?x, ?c1, ..., ?cn]
T(?y, rdf:type, ?c1)
T(?y, rdf:type, ?c2)
...
T(?y, rdf:type, ?cn)
Enter fullscreen mode Exit fullscreen mode

THEN

T(?y, rdf:type, ?c)
Enter fullscreen mode Exit fullscreen mode

In English it says:
If ?c is an intersection described by ?x
and ?x is a list containing ?c1 through to ?cn
and ?y is an instance of every element in that list
Then ?y is an instance of the intersection ?c

In Practice

Unfortunately, this is tricky to describe in SPARQL. It is easy to check if a value is an instance of one or more members of a list, but how can you check if it is a member of every member of the list?

Let's start with some example data, and try to perform the entailment on it.

First of all, we can define the intersection of 3 classes: :A, :B and :C. Then we'll describe 3 objects: :m, :n, and :o.

  • The :m object will be an instance of all 3 classes
  • The :n object will be an instance of 2 of the 3 classes
  • The :o object will not be an instance of any of the classes

We should be able to find that :m is a member of the intersection, while :n and :o are not.

:IntABC owl:intersectionOf (:A :B :C).
:m a :A, :B, :C.
:n a :A, :C.
:o a :D.
Enter fullscreen mode Exit fullscreen mode

Let's start with finding a value ?y which is a member of the intersection class ?c:

select distinct ?y
where {
  ?c owl:intersectionOf ?x .
  ?x rdf:rest*/rdf:first ?cn .
  ?y a ?cn }
Enter fullscreen mode Exit fullscreen mode

This returns :m and :n, since they are both instances of classes in the intersection.

Now we need to remove values of ?x which don't match every class in the intersection. To do that, start with finding those that don't match everything in the intersection. This can be found by considering each element in the collection (call it ?d) and pairing it with every other element in the collection (call these ?d2). We can then look for the instances of ?d:

select distinct ?y ?d ?2
where {
  ?c owl:intersectionOf ?x .
  ?x rdf:rest*/rdf:first ?d .
  ?x rdf:rest*/rdf:first ?d2 .
  FILTER (?d != ?d2) 
  ?y a ?d }
Enter fullscreen mode Exit fullscreen mode

This returns every class paired with every other class, but only for instances of the first class:

?y ?d ?d2
:m :C :A
:m :C :B
:m :A :C
:m :A :B
:m :B :A
:m :B :C
:n :C :A
:n :C :B
:n :A :C
:n :A :B

Note how :n does not include a ?d equal to :B, but it does have ?d2 set to each value. If we remove cases where ?y is set to the second value, then everything will be removed for ?y=:m, since:

  • when ?y is :m, and :m is an instance of :A, then ?y is also an instance of :B and :C
  • when ?y is :m, and :m is an instance of :B, then ?y is also an instance of :A and :C.
  • when ?y is :m, and :m is an instance of :C, then ?y is also an instance of :A and :B

However, when ?y is :n not everything is cancelled:

  • when ?y is :n, and :n is an instance of :A, then ?y is and instance of :C, and that gets removed, but :n is not an instance of :B.
  • when ?y is :n, and :n is an instance of :C, then ?y is an instance of :A, and that gets removed, but :n is not an instance of :B.

The query to express this is:

select distinct ?y ?d ?d2
where {
  ?c owl:intersectionOf ?x .
  ?x rdf:rest*/rdf:first ?d .
  ?x rdf:rest*/rdf:first ?d2 
  FILTER (?d != ?d2)
  ?y a ?d MINUS {?y a ?d2}}
Enter fullscreen mode Exit fullscreen mode
?y ?d ?d2
:n :C :B
:n :A :B

Now that we've found the values of ?y that we don't want, they can be removed from the original query:

select distinct ?y
where {
  ?c owl:intersectionOf ?x .
  ?x rdf:rest*/rdf:first ?cn .
  ?y a ?cn
  MINUS {
    ?x rdf:rest*/rdf:first ?d .
    ?x rdf:rest*/rdf:first ?d2 .
    FILTER ( ?d != ?d2 )
    ?y a ?d MINUS { ?y a ?d2 }
  } }
Enter fullscreen mode Exit fullscreen mode

This gives the single solution of :m

?y
:m

So the final rule is:

insert { ?y a ?c }
where {
  ?c owl:intersectionOf ?x .
  ?x rdf:rest*/rdf:first ?cn .
  ?y a ?cn
  MINUS {
    ?x rdf:rest*/rdf:first ?d .
    ?x rdf:rest*/rdf:first ?d2 .
    FILTER ( ?d != ?d2 )
    ?y a ?d MINUS { ?y a ?d2 }
  } }
Enter fullscreen mode Exit fullscreen mode

NOTE: This query is for demonstration only. These operations are implemented doubly nested loops, which will not scale at all. It works for small ontologies, but if you try it on something like SNOMED then you will discover that the database will process for over a week. Non-standard SPARQL operations can do this much more efficiently.

Final Comment

This post was to introduce people to some of the more detailed elements of OWL's representation of Description Logic, explained Valid models are ones in which all interpretations will be true, and how entailment can be made for Consistent statements that lead to correct models for every possible interpretation.

The examples at the end demonstrated how entailment can be limited in the open structures of RDFS, but is more capable for the closed structures described in OWL, always remembering that the model itself always follows the Open World Assumption.

The SPARQL rule for owl:intersectionOf was me being clever, even if it's useless in the real world due to the scalability issues. ๐Ÿ˜Š I've been doing this in the real world with code that is outside of SPARQL, but I ought to be able to do it with SPARQL extensions like stardog:list:member (this could remove one level of loop in the above query, but I think it's possible to do even better).

All of this is to provide background for my next blog post, which I ought to be able to start now that I've finished writing this!

Top comments (0)