loading...

On the subtleties of OOP

martinhaeusler profile image Martin Häusler ・5 min read

It is not a overstatement to say that I am an enthusiast when it comes to object orientation. A well crafted object oriented system is a piece of art. However, following all the principles of good object-oriented design can be challenging. This article should show why we can never let our guards down when programming object oriented systems. It turned out to be a lot longer than I initially intended, but please bear with me on this one - it will be worth your time.

The Code

In one of my projects, I needed to do some mathematical coding, in particular I needed to work with pairs. Even though org.apache.commons.lang3 has excellent classes for tuples (and I would strongly recommend to use them!) I needed some extra bells and whistles and I ended up writing them on my own.

Here's what my pair class roughly looked like (without the additional stuff that I needed):

public class Pair<A,B> {

  private final A first;
  private final B second;

  public Pair(A first, B second){
    Objects.requireNonNull(first); 
    Objects.requireNonNull(second);
    this.first = first;
    this.second = second;
  }

  // getters for first and second

  public int hashCode() {
    return this.first.hashCode()*31 + this.second.hashCode();
  }

  public boolean equals(Obj other){
    if(other == null) { return false; }
    if(other == this) { return true; }
    if(other instanceof Pair == false) { return false; }
    Pair<?,?> p = (Pair<?,?>)other;
    return this.first.equals(p.getFirst()) 
        && this.second.equals(p.getSecond());
  }

}

The above class works fine in its own right. Later down the road in my project I also needed to work with triples. Following the "Don't Repeat Yourself" (DRY) principle, I implemented it as follows:

public class Triple<A,B,C> extends Pair<A,B> {

  private final C third;

  public Triple(A first, B second, C third) {
    super(first, second);
    Objects.requireNonNull(third);
    this.third = third;
  }

  // getter for third

  public int hashCode() {
    return super.hashCode()*47 + this.third.hashCode();
  }

  public boolean equals(Obj other){
    if(other == null) { return false; }
    if(other == this) { return true; }
    if(other instanceof Triple == false) { return false; }
    Triple<?,?,?> t = (Triple<?,?,?>)other;
    return this.first.equals(t.getFirst()) 
        && this.second.equals(t.getSecond()) 
        && this.third.equals(t.getThird());
  }

}

I managed to reuse two fields and a bunch of other stuff, nice! ... or so I thought. Spoiler alert: the code above has a massive flaw. Hats off to you if you can spot it outright. I did not.

Here's the deal: The code above breaks an essential contract of Java programming.

The Issue

A long time after writing these classes, I was searching for an error in a code base of approximately 100.000 lines of code, some of which was quite complex. The code was behaving erratically in certain cases, my JUnit tests were complaining. But I couldn't find the issue.

Eventually, after hours of debugging, I arrived at the following statement:


  // complex logic here ...

  Set<Pair<String, Long>> pairs = calculate();
  Pair<String, Long>> pairToSearchFor = ...;

  if(pairs.contains(pairToSearchFor)){
    // ... path A
  } else {
    // ... path B
  }

Intense debugging revealed that in this method, path A was never invoked. And I was puzzled why that was the case, clearly the pair I was looking for in my debug scenario should have been in the set. I checked the entries of the set in the debugger, and then it suddenly dawned upon me.

The set contained only Triples.

A Triple can be used in place of a Pair in our implementation, the Java compiler will not complain. But here's the issue. Let's assume we have two tuples:

Pair<String, Long> pair = new Pair<>("Hello", 123L);
Triple<String, Long, Long> triple = new Triple<>("Hello", 123L, 144L);

pair.equals(triple); // returns true
triple.equals(pair); // returns false

We have broken the contract of Object#equals(...). This is why the entry was not found in the hash set. Object#equals(...) (like any other equality relation) has to be symmetric, reflexive and transitive. In our example, it's not symmetric. We syntactically fulfilled the contract (i.e. we implemented the method in a way that made javac happy), but we semantically violated it. It might seem entirely obvious in the isolation of this blog post, but good luck detecting this error in a large code base.

Where did I go wrong?

My first urge was to run for a fix and adjust the equals method of Pair. Clearly a triple could never be equal to a pair. But Triple is a subclass of Pair, so instanceof Pair would not help to distinguish these two. other.getClass().equals(Pair.class) did not feel right for me either. I realized that I had no chance to make this equals(...) method work without making it aware of its subclasses. And according to OOP design rules, a class should never be aware of its subclasses (unless it is abstract).

So what the heck? Did we not follow each and every best practice there is? Did we find a flaw in the very principles of Object Orientation? Did we break the matrix?

The solution

When I applied the DRY principle to share the first and second fields, I focused exclusively on syntax. I followed this principle because it was easy. Not repeating those fields seemed like a good idea. However, I forgot the one principle of object orientation that overrules all else when it comes to inheritance. Only use inheritance if the sub-class is in fact a semantic specialization of the base class. Inheritance can be a tool to achieve a DRY design. But it's not the only one, and it is not always applicable.

Do Triple and Pair share two fields? Well yes, they do! But that's just syntax!

Is every Triple also a Pair in each and every case? Certainly not!

The second conclusion needs to be true in order to properly employ inheritance. However, it requires far more mental effort to check than the first one. It is about the semantics of your class. javac can't help you here. The essential flaw in my thinking was that I focused on writing as few lines of code as possible, but I ended up violating not only the contract of Object#equals(...) but also one of the core rules of object orientation - without even noticing at first. The rule which is violated here is known as the Liskov Substitution Principle.

Message in a bottle

The take-away messages from this article are:

  • Syntax is a means to an end in OOP. Focus on the semantics in your decisions.
  • Rules, principles and best practices can be contradicting each other at times. Make sure that you are aware of which one is more important than the other.
  • Never use inheritance simply for the sake of syntactically sharing common fields or methods.
  • Never just implement an interface blindly. Read the f***ing manual.
  • Never take the implementation of Object#hasCode() or Object#equals(...) lightly. It will come back to haunt you.

Thanks for reading, I hope this post will one day save somebody a lot of hours wasted in the debugger.

Posted on Nov 9 '17 by:

martinhaeusler profile

Martin Häusler

@martinhaeusler

Primarily a Software Engineer who feels at home on the JVM. OOP enthusiast, testing addict and Software Architect.

Discussion

markdown guide
 

A painful lesson that every Java developer learns is that subclassing and equality are a dangerous mix.

One defensive measure is not to use 'instanceof' in non-final equals method, because any use of instanceof will be asymmetric if subclassed. You can enforce symmetry by doing an exact class comparison —this.getClass() == other.getClass() — instead, which is what most tool-generated equals methods end up doing. This code also acts as a clear warning to developers: "equality does not survive subclassing".

However, this still leads to the same problem that broke your program, because two things that are otherwise indistinguishable when treated as type A (and thus should be equal) remain not equal to each other, because one is secretly of type B.

The fault ultimately lies in Java's concept of universal equality; the idea that there is a coherent concept of equality that transcends the types of the things being compared. The problem is that there is no such thing as coherent universal equality (for the reasons we've discovered above), but it's often really convenient to pretend there is. It's one of those places in Java where simplicity of interface was favoured over correctness of implementation, leaving behind a lot of pitfalls for developers.

Outside of the enforced requirement of universal equality, there's no real reason that Pair<A, B> shouldn't be a supertype of Triple<A, B, C>. Every sensible operation you can perform on a pair can be unambiguously projected onto a triple, so the latter should be substitutable for the former.

A fix for this would be to introduce a concept of equality in which the meaning of equality depends on the type of the comparison (similar to how you can supply a comparator when sorting). You could then express concepts like "A equals B as pairs", "A equals B as triples" or "A equals B as object references" distinctly and unambiguously.

 

First of all, thanks for the elaborate comment!

I totally agree that universal equality is difficult (if not impossible) to truly achieve. In my opinion, the Java designers did the best they could without totally going overboard with it. It is a huge improvement over what other languages do, Javascript for example has no real notion of equality other than reference equality (plus, interestingly, specific equalities for built-in types such as strings). But that is another matter entirely.

I think that, strictly speaking, a mathematician would disagree with the statement that a triple "is" a pair. However, a triple can be projected to a pair.

The best solution in this case is the one you outlined. Have Pair and Triple as independent classes (no inheritance) and let Triple have a method asPair that projects the triple to its first two entries, producing a new Pair. Also, Pair could have a toTriple(C third) method. This is also the mathematically correct way of doing it. Doing an "implicit projection" of the triple to the first two entries is questionable practice at best and dangerous at worst (as my example above has shown).

Oh, and Object#equals(...) is only getting started here. Things can get really hairy really quickly if we also add JDK Dynamic Proxies into the mix. The example I presented is really just the tip of the iceberg. My intention was to highlight how important it is to read the JavaDoc before implementing a method, and to focus on actual semantics rather than mere syntax.