edA‑qa mort‑ora‑y

Posted on Jun 13, 2017 • Originally published at mortoray.com

Only 2 or 3 error types are needed

#programming #compiler #language #errors

Error handling is hard, and it's made harder by a rich error hierarchy. Programs that successfully handle errors tend to have only a couple of generic handlers. Code that catches specific types of errors, in numerous locations, and tries to differentiate its handling, ends up going wrong. How can we provide both rich error information and simplified handling?

How many error types do we need? It looks like just 2, or maybe 3.

Error information vs. severity

When an error occurs, we rightfully want to collect as much information as we can about it. Was an argument out of range, could the file not be loaded, was the network down, or is there anything else worth noting? This information is valuable both for debugging and for giving meaningful error messages to a user. But this information is one of the reasons why we've ended up with bloated error hierarchies.

But this extended information is orthogonal to the type of the error. The handler cares only about the severity, not the details.

try {
    risky_stuff();
} catch( recoverable_error & re ) {
    restore();
    log( re );
}

What does the error require of the calling code? Can we do a simple cleanup or do we need to abandon further processing? By mixing the extended information in with the error type, we've made this decision hard. Not only do we have way too many errors to choose from, we also have to deal with various wrapping classes that hide the underlying error.

In an environment with a rich exception hierarchy, like Java, C#, and even most C++, the only useful handling is to catch all exceptions!

The error mechanism is not relevant to this problem. Whether a language uses exceptions, return values, or monads, it is still subject to a proliferation of error types.

Tagging errors

There is a solution to the information problem. I first saw it in the Boost exception library for C++. Instead of creating an endless number of exception types, it uses a tagging mechanism. We can add arbitrary details to any exception without changing its type.

if( !valid_input(data) ) {
    BOOST_THROW_EXCEPTION( recoverable_error() <<
        tag_reason( "invalid-input" ) <<
        tag_input_data( data ) );
}

This code uses a rather generic recoverable_error. It adds a tag_reason, saying what went wrong, and a tag_input_data, referencing the source data. We've created a detailed error without modifying the type of the exception.
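To show the shape of the idea without depending on Boost, here is a minimal standard-library sketch. It is not how Boost implements it (Boost uses typed `error_info` tags); the `tag` struct, the string-keyed map, and the helper `check` are assumptions made up for this illustration.

```cpp
#include <map>
#include <string>

// Hypothetical tag: a name plus a string value. Boost uses typed tags instead.
struct tag {
    std::string name;
    std::string value;
};

tag tag_reason(const std::string& v) { return {"reason", v}; }
tag tag_input_data(const std::string& v) { return {"input_data", v}; }

// One generic error type; the details live in the tags, not in the type.
struct recoverable_error {
    std::map<std::string, std::string> tags;

    // Attach a tag and return the same object so that calls can be chained.
    recoverable_error& operator<<(const tag& t) {
        tags[t.name] = t.value;
        return *this;
    }
};

// Hypothetical validation rule, for illustration only.
bool valid_input(const std::string& data) { return !data.empty(); }

void check(const std::string& data) {
    if (!valid_input(data)) {
        // The thrown object is a copy that carries both tags.
        throw recoverable_error() << tag_reason("invalid-input")
                                  << tag_input_data(data);
    }
}
```

A handler that catches `recoverable_error` can read `tags` if it wants the details, but it never has to switch on a more specific type.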

There's no need to do wrapping in this approach either. Handlers can add more information directly.

try {
    auto data = load_file( name );
    return parse( data );
} catch( recoverable_error & re ) {
    re << tag_filename( name );
    throw;
}

We keep the same recoverable_error type and have added a tag_filename to it. This error now has a tag_reason, tag_input_data and tag_filename on it. We'll print all of them to the log.
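The propagation step can be sketched with the standard library alone. Everything here is hypothetical (the string-keyed `tags` map, `parse`, `parse_named`, and `log_line`); the point is only that catching by reference, adding a detail, and rethrowing with a bare `throw;` keeps the same error object and all of its earlier tags.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical tagged error: a single type carrying string-keyed details.
struct recoverable_error {
    std::map<std::string, std::string> tags;
};

// Hypothetical parser that rejects empty input.
int parse(const std::string& data) {
    if (data.empty()) {
        recoverable_error err;
        err.tags["reason"] = "invalid-input";
        throw err;
    }
    return static_cast<int>(data.size());
}

// Intermediate layer: adds what it knows, then rethrows the same object.
int parse_named(const std::string& name, const std::string& data) {
    try {
        return parse(data);
    } catch (recoverable_error& re) {
        re.tags["filename"] = name;
        throw;  // rethrows the in-flight exception, including the new tag
    }
}

// Top-level handler: formats every tag for the log, sorted by key.
std::string log_line(const std::string& name, const std::string& data) {
    try {
        parse_named(name, data);
        return "ok";
    } catch (const recoverable_error& re) {
        std::ostringstream out;
        for (const auto& kv : re.tags) out << kv.first << '=' << kv.second << ';';
        return out.str();
    }
}
```

No wrapper type is introduced at any layer, so the top-level handler sees every detail that was attached on the way up.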

I consider tagged errors a robust solution but have not seen them outside of using Boost exceptions in C++.

Severity levels

If we don't need new types to carry details, what error types do we need?

I've argued this at length with a few colleagues. We're still in disagreement on whether it is two or three types. Yes, those are both shockingly low numbers!

One of those error types is easy to support: the critical error. These are situations that can't be handled correctly, where a severe failure has essentially broken the system, like detecting memory corruption, the inability to allocate a small object, or a VM security violation. Many C libraries call abort on such errors. We can propagate these normally, but only to get more information for debugging. They are not recoverable!
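A rough sketch of how a critical error might pass through ordinary handlers, assuming two hypothetical exception types and made-up functions `process`, `run_job`, and `run_all`. The inner handler only names `recoverable_error`, so a `critical_error` propagates to the outermost boundary, which does nothing but report it.

```cpp
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical severity types. A handler written as catch (std::exception&)
// would swallow both, which is exactly what the inner handler must avoid.
struct recoverable_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct critical_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical worker: negative jobs stand in for a broken system.
void process(int job) {
    if (job < 0) throw critical_error("corrupt job table");
    if (job == 0) throw recoverable_error("empty job");
}

// Inner handler: deals with recoverable errors only.
bool run_job(int job) {
    try {
        process(job);
        return true;
    } catch (const recoverable_error& e) {
        std::cerr << "recovered: " << e.what() << '\n';
        return false;
    }
}

// Outermost boundary: log for debugging and give up on the remaining jobs.
// A real program might call std::abort() here instead of returning.
int run_all(const std::vector<int>& jobs) {
    try {
        for (int job : jobs) run_job(job);
        return EXIT_SUCCESS;
    } catch (const critical_error& e) {
        std::cerr << "critical: " << e.what() << '\n';
        return EXIT_FAILURE;
    }
}
```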

There's no question that a critical error type exists, so the interesting question is whether we have two additional error types or just one.

The two types: stateless vs. stateful

I'm in favour of two types:

Stateless/unwindable errors: These are things like argument checks and pre-validation. They happen before any state in the system has been modified. The caller's state will be exactly as it was before the failed function call.
Stateful/recoverable errors: These happen after something has been modified already. The caller has to assume that the objects they are using, involved in the call, are not in the same state and must be cleaned up.

These give the caller clear direction on what options they have for handling the error. An unwindable error can be caught at any point and execution can proceed as though it didn't happen. A recoverable error requires the caller to clean up before continuing.
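The difference for the caller can be sketched as follows. The type names `unwindable_error` and `recoverable_error` and the functions `append_all` and `try_append` are assumptions for illustration; the first error is raised before anything is modified, the second after a partial modification.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical error types for the two severities.
struct unwindable_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct recoverable_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Appends src to dest. The size check happens before any modification;
// the negative-value check can fail after some items were already appended.
void append_all(std::vector<int>& dest, const std::vector<int>& src,
                std::size_t limit) {
    if (src.size() > limit) throw unwindable_error("too many items");
    for (int x : src) {
        if (x < 0) throw recoverable_error("negative item");
        dest.push_back(x);
    }
}

// Caller: the error type says whether cleanup is needed.
bool try_append(std::vector<int>& dest, const std::vector<int>& src,
                std::size_t limit) {
    const std::size_t old_size = dest.size();
    try {
        append_all(dest, src, limit);
        return true;
    } catch (const unwindable_error&) {
        // Nothing was modified, so we can carry on as if the call never happened.
        return false;
    } catch (const recoverable_error&) {
        // dest may hold a partial result; restore it before continuing.
        dest.resize(old_size);
        return false;
    }
}
```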

Functional programming offers another view on these two errors. If we use only pure functions, we can't have the stateful class of errors. Unfortunately, a program can't be built from pure functions alone, but it's a useful view to keep in mind.

The argument against the two-type approach goes roughly as follows: programmers are just going to mess it up.

It sounds trivial, but it's a compelling argument. Handling these error types is not the problem. The problem is raising the errors. Do I expect programmers to know when it's appropriate?

More than likely a programmer will be cautious and only raise the "stateful / recoverable" type since they aren't confident they haven't modified anything. It's always safe to raise a stateful error, but incorrectly raising a stateless error can be disastrous.

Worse, a stateless error in one context may become a stateful error in another. It depends on what the previous call path has already done. Not only would we need to get the source call sites correct, but we'd also have to worry about escalating at the right times.
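A small sketch of what such an escalation might look like when done by hand. The `account` struct and the functions are made up for illustration. `deposit` raises a stateless error from its own point of view, but by then `transfer` has already modified `from`, so the same failure has to leave `transfer` as a stateful error.

```cpp
#include <stdexcept>

// Hypothetical error types for the two severities.
struct unwindable_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct recoverable_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct account {
    int balance;
    int limit;  // maximum balance this account may hold
};

// Both checks run before the account is touched: stateless errors.
void withdraw(account& a, int amount) {
    if (amount > a.balance) throw unwindable_error("insufficient funds");
    a.balance -= amount;
}

void deposit(account& a, int amount) {
    if (a.balance + amount > a.limit) throw unwindable_error("limit exceeded");
    a.balance += amount;
}

void transfer(account& from, account& to, int amount) {
    // If this throws, nothing has changed yet, so the error stays stateless.
    withdraw(from, amount);
    try {
        deposit(to, amount);
    } catch (const unwindable_error& e) {
        // from was already modified, so for our caller this is now stateful.
        throw recoverable_error(e.what());
    }
}
```

Forgetting the `try` block in `transfer` would still compile and would report a stateless error for a half-completed transfer, which is the kind of mistake the argument above is worried about.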

There's no way this approach could work unless the compiler did most of the work.

In Leaf

But I'm writing a compiler, so maybe the idea can work. Can the compiler decide what type of error to raise, and do the required escalations? It seems like it should be possible. The compiler knows about all the memory involved, and all the assignments performed. Surely it can detect whether an error is stateless or stateful.

In practice, this is a problem. In a language that supports global memory, shared values, closures, and mutable caches, it's not easy to determine whether a function call has observable side-effects. It's not impossible though.

I think I've just eroded support for my own argument. Providing stateful and stateless errors would be onerous. Unless I want to spend the next half year on this problem alone I can't realistically implement my idea. However, since the compiler has to do it all, it's something I could introduce later and be completely backwards compatible.

Top comments (5)

Nat Ersoz • Jun 13 '17 • Edited

This is a very refreshing article and a must read.

I work in embedded products. It is remarkable how there are long enumerations of error codes, nested together, munged, one enum chaining after another. Complexity.

You think "wow, this must be important stuff". Only to find that all errors are either ignored or wrapped in some (god-dammed) macro that resets the device when an error occurs. Like this:

#define CHECK_RESULT(error_code) \
do { assert((error_code) == SUCCESS); } while(0)

CHECK_RESULT( send_packet(packet) );
CHECK_RESULT( recv_packet(&packet) );

Having engineers read and take notice of this article is worth the time and thought process in order to avoid brain-dead nonsense as seen above and in so many products.

Happily and timely, we have design discussions for the new products starting this week. A perfect means for promoting thoughtful, consistent and useful errors and how they might be handled.

edA‑qa mort‑ora‑y • Jun 14 '17

Yes, at first the multiple errors seem like a good idea, but then something screws them up. Somebody misuses a code, the code is lost in propagation, or it replaces and hides another failure.

Most likely the error code distinguished the errors at some point. But then a defect was discovered. More error codes were checked. Repeat. Eventually somebody just decided that it'd be easier to just handle any error the same way. It may not be optimal, but it had the best chance of working consistently.

It's not the type of error that decides how to handle it, it's where and when it occurs.

Paolo Milani • Jun 13 '17

I find this interesting but fundamentally misguided.

It's true that what ultimately matters when handling an error is "what am I going to do now?". But the options for "what to do" depend on the context of the application.

As an example, in a piece of code that processes messages from some queues, I need to divide errors into 3 groups, to handle them in 3 different ways:

bad message: throw away the message and continue with the next one in the queue
fatal environment error: cannot do any processing as some needed resource is unavailable. Stop processing and retry later
unknown: I do not know if the error falls in the first or second class. Put the message in a retry queue to retry later (or eventually discard it if it keeps failing), and move on to the next message in the meantime. This class is necessary because of errors that have never been seen or considered before, as well as for ambiguous errors from external systems my processor is talking to (I get an HTTP 5xx from an API: is it down or did my request hit a corner-case bug?)

This is just one example, there can be many other situations where errors need to be handled differently, and other distinctions need to be made. When I'm writing a piece of library code that can be used in many different contexts, I have no way of knowing what different handling may be needed for the different errors I raise. In fact, even the same exact error in the same exact application can need completely different handling depending on where/when it happens.

So when writing a library I make sure to:

have one or a few root exception classes that caller can handle if she just wants to deal with any error
provide more specific subclasses, possibly divided into groups through intermediate classes, for cases where specific errors need to be handled

I like the idea of tagging errors with extra metadata to avoid having a proliferation of error classes, but providing more error information is not the only reason we have error hierarchies.

edA‑qa mort‑ora‑y • Jun 14 '17 • Edited

The location of error handling, and the structure of the calling code, is more valuable than classes of errors. If you need to distinguish parsing, or preparing message processing, from the actual processing, then split the code along those two lines.

If the state of processing needs to be communicated to higher layers, then it should be communicated as a proper state and not through the error channels.

In virtually all cases where I've needed to distinguish the type of error handling based on what went wrong it's actually indicated some kind of architectural issue in the code.

Error types that carry extra meaning are local to the code that raises that error. If they escape that context they lose any special meaning. This implies that these error types are not optional to handle, but mandatory to handle when calling a particular API.

They become part of the API contract, which starts sounding like checked exceptions. Checked exceptions have been a failure in all languages that have tried them. The idea is nice, but it just doesn't work out. It's too complex to get this working correctly and cleanly, and all it takes is one bad intermediate function to ruin the whole thing.

But even if you go down the error proliferation route, which I don't recommend, tagged exceptions are still the better alternative. If you truly wish to change the error handling based on an HTTP error, it makes more sense to encode that into a tag on the error. Then you have a guarantee that this information won't be lost during propagation. There's no wrapping, rewriting, or serialization issues to worry about.

Paolo Milani • Jun 14 '17

Thanks for the well-thought-out response.

If you need to distinguish parsing, or preparing message
processing, from the actual processing, then split the code along
those two lines.

Definitely, that's a first line of defence. If I'm in the parsing routine I can catch the top level exception class and decide that I'm parsing a bad message if that is raised for whatever reason.

But whenever I am making a request of something outside of my process (a file system? network? database? API server?..) a single request can fail in many ways that may need to be handled in different ways.

My point is that the person who writes e.g. the db library or api client library cannot know what classes of error handling will be needed in the context of a given application, so they are not able to meaningfully raise exceptions of the two types listed in the article.

Instead, they presumably divide the errors that can happen into broad classes, which map to exception classes, and annotate them with the specific e.g. client or server error code and message.

If the state of processing needs to be communicated to higher
layers, then it should be communicated as a proper state and not
through the error channels.

Here I disagree, an exception seems an effective language construct to communicate an error state up the stack.

Error types that carry extra meaning are local to the code that
raises that error. If they escape that context they lose any
special meaning. This implies that these error types are not
optional to handle, but mandatory to handle when calling a
particular API.

More or less... you can have a default error handling behaviour for "all other errors".

But to do better than that, yes, at some point I need to examine the specific error cases raised and decide how to map them to my available error handling behaviours. This probably needs to happen not too far from the call site (as you say, where the semantics of the errors is known), while the actual error handling might happen far up the stack, in which case I will probably wrap the error into an exception class based on the type of error handling I need to do.

All of this could be done with a single exception class with tags, but since the language provides syntax to catch exceptions by class and not by tag, using multiple classes seems the more natural option.

Checked exceptions have been a failure in all languages that have
tried them.

That seems a strong statement. I'm not an expert on this, but "declare or raise" seems to be a good model for a statically typed language like Java, which hasn't exactly been a failure. It's not an option in my current Python work though, where not knowing what errors a library call might raise is a common problem we face.
