Keep calm - let it crash

Martin Penckert ・2 min read

A Story about Erlang's Error Handling

The way I think about errors and handle exceptions in my software changed significantly a few weeks after I started picking up Erlang. If you are coming from a language like Java or C, you are used to thinking about all the possible ways the software might fail and handling each failure in advance to prevent crashes. The following text is for you.

Writing software in Erlang, I do not concern myself with preventing failure. What I think about instead is the aftermath of crashes and failures. This post is meant to be an introduction to this way of thinking.

Writing software defensively means thinking hard about what could go wrong and what should happen instead. Defensive code tends to be littered with argument and type checks, try-catch-finally blocks and log messages.

And this is only for software running in one thread. It multiplies the second you add concurrency to your architecture. In most languages working with multiple processes is painful and error-prone, which leads developers to shy away from code that handles more than one process. If that single process dies from an unhandled error, though, the whole program crashes and leaves the user out in the rain.

So a good developer will test and check and prove that the software works in all possible cases (that one can think of). The result is business logic tangled up with error-checking code.

In Erlang I just let it crash. Simple as that. It is more or less the opposite of defensive programming. Since processes are cheap, and Erlang processes are often used the way objects are in other languages, I simply let the process crash and die. The software that solves the problem only cares about problem-solving. When writing that part, the developer assumes that all input is faultless and that failure will not happen.
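
A minimal sketch of what such happy-path code might look like. The module name `shapes`, the shape tuples and the message format are made up for illustration. The point is only that there are no guards, no try/catch and no catch-all clause: unexpected input makes the process die instead of being handled.

```erlang
-module(shapes).
-export([start/0, area/1]).

%% Happy path only. Any other shape raises function_clause,
%% and non-numeric sides raise badarith.
area({rectangle, W, H}) -> W * H;
area({square, Side})    -> Side * Side.

%% Spawn a worker process that answers area requests.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {From, Shape} ->
            %% If area/1 crashes, this whole process dies. That is intended.
            From ! {self(), area(Shape)},
            loop()
    end.
```

Sending `{self(), {rectangle, 3, 4}}` to the worker gets an answer back; sending `{self(), {triangle, 1, 2, 3}}` kills the worker, and nothing in this module tries to prevent that.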

"Death does not concern us, because as long as we exist, death is not here. And when it does come, we no longer exist." - Epicurus

So on the one hand I have code that is completely free of error-checking or failure-preventing logic. This code consists of a lot of processes that don't concern themselves with (or even know about) the health of their surroundings.

I will then set up monitoring processes that will not contain business logic but will only monitor the health of other processes. If a process hangs, crashes or dies the monitor observes that and will know what to do with the situation; e.g. it might start another process replacing the crashed one or clean up some data.
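
As a rough sketch, such a monitoring process can be built on the built-in `spawn_monitor/1`, which delivers a `'DOWN'` message when the watched process exits. The module name `watchdog` and its restart policy (restart forever on any abnormal exit) are assumptions made for this example, not a recommendation.

```erlang
-module(watchdog).
-export([start/1]).

%% Run Fun in a worker process and restart it whenever it exits abnormally.
%% The watchdog itself contains no business logic.
start(Fun) ->
    spawn(fun() -> watch(Fun) end).

watch(Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            ok;   % the worker finished on its own, nothing to repair
        {'DOWN', Ref, process, Pid, Reason} ->
            io:format("worker ~p died: ~p, restarting~n", [Pid, Reason]),
            watch(Fun)
    end.
```

A worker that crashes immediately on every start would be restarted in a tight loop here; a real system would limit the restart rate.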

This monitoring works across machine boundaries, since a fault-tolerant system cannot be built on a single machine.

That is the second part of an Erlang system, next to the problem-solving business logic. The failure-handling and error-correcting code is often generic, so it can be reused in future applications.

In my opinion, this is a nice separation of concerns: the code that solves the problem is kept separate from the code that fixes failures.
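
One place where this generic code already exists is OTP's `supervisor` behaviour, which ships with Erlang. The sketch below is only an illustration: the module name `demo_sup`, the toy worker and the restart limits (5 restarts in 10 seconds) are assumptions, and a production worker would normally be a `gen_server` rather than a bare `spawn_link`.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Callback: describes what to supervise and how to react to crashes.
init([]) ->
    SupFlags = #{strategy => one_for_one,  % restart only the child that died
                 intensity => 5,           % give up after 5 restarts...
                 period => 10},            % ...within 10 seconds
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Toy worker: linked to the supervisor, crashes when told to.
start_worker() ->
    Pid = spawn_link(fun worker_loop/0),
    {ok, Pid}.

worker_loop() ->
    receive
        crash -> exit(boom);
        _     -> worker_loop()
    end.
```

The restart logic lives entirely in the library; only `init/1` is specific to the application.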

Martin Penckert


Software Architect, Developer, Mathematician. My repositories may contain Lambdas.