loading...
Cover image for A sorting bug

A sorting bug

nfrankel profile image Nicolas Frankel Originally published at blog.frankel.ch ・3 min read

Lately, I succumbed to nostalgia, and agreed to do some consulting for a customer. The job was to audit the internal quality of an application, and finally to make recommandations to improve the code base and reimburse the technical debt. While parsing the source code, I couldn't help but notice a bug in the implementation of a Comparator.

This post is to understand how sorting works in Java, what is a Comparator, and how to prevent fellow developers to fall into the same trap. Even if it's obvious to experienced developers, I do believe it's a good refresher nonetheless.

Context

Most languages offer an out-of-the-box implementation of a (or more) sorting algorithm.

Providing shared utilities as part of the language stack (or a library) has two main benefits:

  1. Using an API is much more cost-effective than every developer implementing it over and over again
  2. A significant portion of developers - including myself - would probably have bugs in their first iteration. Sharing code means it's battle-tested by a lot of other developers.

Java's sorting API

Yet, even though the algorithm is provided, it relies on some properties of the underlying to-be-sorted elements. In Java, and I believe in every strongly statically typed language, this is enforced by the API through types.

Java Sorting API

Note that in recent Java versions, the sorting algorithm has been moved from Collections.sort() to the List.sort() method. The latter is a default method. For more information on this move, please check my previous post on this specific subject.

The List.sort() method accepts a Comparator argument. If it's null, the algorithm will sort according to the natural order of elements, which is the contract of Comparable. If it's not, it will sort according to the Comparator argument. Fundamentally, the contract of Comparator.compare() is the following:

Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.

-- JavaDoc

To sum it up, it returns o1 minus o2: it's up to the developer to define the implementation of the minus operation in the context of type T. With that, Timsort is able to compare elements in pairs and work its magic.

The bug

Now, the implementation I stumbled upon was the following:

(foo1, foo2) -> {
  if (foo1 == null || foo2 == null) {       // 1
    return 0;
  } else {
    return foo1.compareTo(foo2);            // 2
  }
}
  1. Take care of null values
  2. Compare using a specific method. I'm using compareTo() as a simple illustration

Can you spot the issue?

It works as expected until null values are part of the List to be sorted. During sorting, the null value will be compared to other Foo values: since it returns 0 in that case, it will be considered equal to the other value, even when the latter is not null! In short, it means null values won't be re-ordered, and will keep their index in the collection.

The fix

I believe the fix is straightforward:

(foo1, foo2) -> {
  if (foo1 == null && foo2 == null) return 0;  // 1
  if (foo1 == null)                 return -1; // 1
  if (foo2 == null)                 return 1;  // 1
  return foo1.compareTo(foo2);
}
  1. The fix

By returning -1 unless both values are null, null values are always treated as being less than any other value. Similarly, one could decide to return 1 to move null values at the end of the sorted list.

All in all, the result of the sorting process needs to be the same regardless of the initial order of the elements. To achieve that, it's necessary to handle null values in a consistent way.

Originally published at A Java Geek on July 26th, 2020

Posted on by:

nfrankel profile

Nicolas Frankel

@nfrankel

Dev Advocate | Former developer | Former architect | Former teacher | Still learning and blogging.

Discussion

pic
Editor guide
 

There is a small issue in your fix - it would return -1 if both of them are null.

But I actually wanted to bring up a more interesting point - I felt that the original implementation was not necessarily a bug, if it was written for some specific business logic. It may have been intentional to treat null as equal to any value. As you pointed out, it effectively keeps nulls where they are. That may very well have been the original requirement.

And it's still consistent, despite how it goes against our instinct that null ought to be less than any non-null value. Conversely, it also could be consistent and sensible, in some other contexts, to consider null greater than any non-null value.

 

Thanks for your comment.

I agree, that might have been a requirement. However, this brings two questions:

  1. Why do no comment this strange behavior?
  2. Why would you sort everything, but keep null values in their original location?

EDITED: I updated the code accordingly

 

Well, only whoever was familiar with the original circumstances would be able to answer those questions. :-) It could very well have been a bug. I just would have given the original author some benefit of doubt, and assumed that they did what they did for a reason until proven otherwise.

I definitely agree that if this was intended, some comment should have been put in place.