Andrew Buntine
Conjurer of computational spells and trickery.
When we think of esoteric programming languages, most of us quickly gravitate towards one in particular: the infamous Brainfuck - a super minimal programming language that achieves Turing Completeness in just six simple instructions (not including the two for I/O).
The incessantly lovely Brainfuck was created in 1993 by Urban Müller in an attempt to construct a usable - albeit barely - programming language with a compiler under 1024 bytes. Why 1024 bytes? Well, he had been inspired by Wouter van Oortmerssen's FALSE, a stack-based language with a compiler of exactly 1024 bytes. So I guess you could say it was a competition of sorts.
In this article, purely for the joy of it, we will see that Brainfuck is actually an informal dialect of a programming language invented in Italy more than 50 years ago: Corrado Böhm's P″ (pronounced "P Prime Prime" or "P Double-Prime).
I do expect that the reader has basic familiarity with Brainfuck - although, if you've never heard of it, I definitely recommend you take a look. Oh, and don't worry; despite the scary name, it's really very easy to learn (note, I didn't say "easy to use"!).
Let's talk a bit about Corrado Böhm's place in the history of Computer Science before we get into the nuts and bolts of his decidedly strange creation.
You can skip this section if you're only interested in the details and not the motivation.
When we talk about "programming" today, we are almost universally referring to what is known as "structured programming". That is, writing programs that employ the use of block structures, functions and loops. Pretty simple, huh? In fact, this all seems so obvious nowadays that we don't feel the need to qualify so specifically what we mean - so we use more general terms like "imperative" and "declarative" to describe our favorite paradigms of programming. The whole "structured" part is kind of a given.
But it wasn't always this way...
Many of the earliest programming languages, including heavy-hitters like BASIC, FORTRAN and COBOL^{1}, were unstructured: Statements were sequentially ordered, generally one per-line. And those lines were given labels or numbered in such a way that a program could perform an unconditional jump from one part of the program to the other.
And when I say "jump", I really mean jump! Execution was not returned back to the calling context as is the case with a function call (unless, of course, it was physically asked to with another jump). One could, at least theoretically, jump straight into the middle of a SUBROUTINE
or an IF
^{2}. Following the execution of a program meant tracing all of the jumps. Such a messy form of execution eventually gave rise to a phrase we still hear a lot of today (but never about our own code, of course): "Spaghetti code"^{3}!
In case you haven't worked it out already; yes, I am talking about the notorious GO TO
.
The early '60s were dominated by the debate over structured vs. non-structured programming. This was further ignited when, in 1966, a landmark paper titled "Flow diagrams, Turing Machines and Languages with only Two Formation Rules" was published by Corrado Böhm and Giuseppe Jacopini. This paper^{4} provided the theoretical foundation for structured programming (and therefore the abolishment of GO TO
) by proving that one can compute any computable function with only three simple control structures:
To demonstrate this in relation to Turing machines, they, of course, used a programming language: P″
P″ is a very simple. In fact, it's so simple that the whole thing can get a bit confusing. Kinda' like those movies that are so bad they loop back around to being good again. The language consists of only four characters: R
, λ
, (
and )
that act upon a Turing machine with infinite tape (main memory) and a finite alphabet (the set of things that can be written in a memory cell - such as the digits 0..9
).
Böhm defined the syntax rules^{5} as follows:
λ, R ∈ P″
q₁, q₂ ∈ P″
implies q₁q₂ ∈ P″
q ∈ P″
implies (q) ∈ P″
Right. Let's try that without the set notation:
λ
and R
are syntactically valid programs in P″.q₁
and q₂
are programs in P″, then so is their composition: q₁q₂
.q
is a program in P″, then so is (q)
. This will make sense when we get to the semantics.Böhm then goes on to explain the language semantics, which I've summarized and simplified here:
N
, where N > 1
. 0
is considered a special, blank symbol. So each memory cell can contain any value from 0
to N
. Because we are dabbling in the theoretical world of Turing machines, we can say that the exact value of N
is precisely what it needs to be for the computation at hand.R
is the operation of shifting the tapehead (a.k.a data pointer) forward to the right one cell (if there is one).λ
is the operation of replacing the symbol, c
, at the tapehead with (c+1) mod (N+1)
and then shifting the tapehead to the left one cell. Note the modulus operation - so if N = 5
then trying to increment 5
will result in 0
(the blank symbol) because 6 mod 6 = 0
.(q)
, where q
is any valid program, is interpreted as a while loop that iterates while the current cell does not contain the blank symbol (0
). Endless loops are possible, of course.So, we can think of P″ as being a near-literal implementation of a Turing machine in a similar sense that we can see the original Lisp as being deeply related to Church' Lambda Calculus. The former is just far less useful to the practicing programmer.
In a paper published in 1964 for the International Computation Center in Rome^{6}, Böhm proved that P″ was Turing-complete, which makes it the first structured programming language that did not contain a GO TO
instruction but instead relied upon iteration. Djikstra would go on to cite Böhm and Jacopini and in his now-famous paper, Go To Statement Considered Harmful, which would help to solidify their place in computer science folklore.
Here is a simple program to add two numbers together for a Turing machine where N = 3
and main memory looks something like this [2, 1]
:
(
λRλRλR # Decrement current cell (remember that N+1 wraps back around to 0, so decrementing is equivalent to N increments).
R # Move tapehead right.
λR # Increment current cell.
λRλRλRλ # Move tapehead left by decrementing the current cell and then executing one final λ.
) # Iterate if the current cell is not 0. This means we iterate twice in this example.
At the end of execution main memory will look like this: [0, 3]
.
Ok, cool, I guess. Maybe you can see some minor similarities to Brainfuck, but we are definitely missing some things. And even in esoteric programming language terms, it's all very clunky!
But, luckily for us, Böhm provided several small abstractions that allow us to write much simpler programs. And in doing so we will see that Brainfuck and P″ are equivalent.
Let's take a look.
There are a few instructions that Brainfuck gives us that we don't get out-of-the-box with P″. Specifically, we still need to fill in the following gaps:
Let's define those as follows:
I = λR
D = I^N
N
I
's in a row. So for a Turing machine where N = 5
it's literally IIIII
. The best way to think of D
is as a macro that expands to N
I
's rather than a runtime construct like a function.L = Dλ
To recap, we have now now defined the following instructions:
R
L
I
D
(
)
Now it's starting to look familiar! And so, finally, let's write the previous program again with our new abstractions:
(D R I L)
And how does the same program look in Brainfuck?
[- > + <]
Interesting! Although we've decided upon slightly different syntax, these two programs are now semantically identical.
I might just take a moment here to mention that there is something very fulfilling about defining a programming language in which DRILL
is a valid program (although, I must admit, Böhm used different names). I think I'll call my dialect "Braindriller".
Let's try another program^{7} that moves a value from cell 0
two places to the right (cell 2
):
R R ( D ) L L ( D R R I L L )
And Brainfuck:
> > [ - ] < < [ - > > + < < ]
Well, I don't know about you, but I'm sold. I implore you to keep trying other Brainfuck programs but, considering the two models of computation are now the same, it's my hypothesis that they will all be instruction-for-instruction identical.
One final note - you'll notice that we've skipped on the I/O instructions ,
and .
that exist in most dialects of Brainfuck. This is because the theoretical nature of P″ means that practical things like input and output would serve no purpose - we are only interested in the effect of the instructions upon the machine state (memory cells). But if you were to build a P″ interpreter - and I encourage you to do so - then you could implement I/O exactly as Brainfuck does.
If you've made it this far - well done! And I'm sure the mental energy you've just exerted will all pay off in the long run. Think about it: Next time you're talking to someone who brings up Brainfuck, you can tell them all about the wonders of P″ and the genius of Corrado Böhm.
As always, corrections and feedback are always more than welcome. :)
<program> ::= R|λ|<program><program>|(<program>)
.
Love it! Absolutely love it! :D I'd never heard of P'' before now. Thanks for sharing!