Perhaps to no one’s surprise, I’m writing yet another article on content that I’m studying for my qualifying exam. Apparently, I figure that I can point to this when I inevitably fail my exam, but I digress. Let’s talk about the Lisp programming language and how to design an interpreter for it.

## Lisp Overview

Honestly, I hadn’t touched any functional programming until I started playing around with languages like Haskell and Scheme in my Sample Programs in repo. Of course, eventually I learned about Lisp as a part of a programming languages course at The Ohio State University. Naturally, I have an exam on this very subject in just over a week, so let’s dig in.

First off, Lisp is one of the oldest programming languages—second to FORTRAN, I believe. Of course, Lisp differs from FORTRAN quite a bit in that it’s a functional programming language. In other words, there’s no notion of state or side-effects. Instead, computations are expressed through functions and expressions rather than statements.

In this article, we’ll talk about a fairly limited form of Lisp which only has a handful of builtin functions. As a result, don’t expect to be able to pass any of the code we discuss directly into a Scheme interpreter or anything like that.

## Lisp S-Expressions

In order to grasp Lisp, it’s important to understand its design. Specifically, everything in Lisp is a symbolic expression or **s-expression** which is a text-based way of expressing a binary tree. In other words, instead of drawing out a binary tree, we can write one using parentheses and dots:

Of course, the smallest possible s-expression is called an atom. As the name implies, an **atom** is an indivisible programming element like a number or a function name. In this case, atoms are what get stored in the leaves of the binary tree.

In Lisp, these s-expressions are all we need to understand to be able to grasp the language. After all, everything else is built on top of them.

## Lisp Syntax

Given what we know about s-expressions, let’s take a look at a simple Lisp grammar as provided in the slides for the CSE 6431 course at OSU:

```
<atom> ::= <numeric atom>
| <literal atom>
<numeric atom> ::= <numeral>
| -<numeral>
| +<numeral>
<numeral> ::= <digit>
| <numeral><digit>
<literal atom> ::= <letter>
| <literal atom><letter>
| <literal atom><digit>
<letter> ::= a
| A
| b
| B
| …
| z
| Z
<digit> ::= 0
| 1
| 2
| …
| 9
<S-exp> ::= atom
| (<S-exp> . <S-exp>)
```

In its simple form, Lisp is composed of s-expressions which can either be standalone atoms or nested s-expressions of the form `(<s-exp> . <s-exp>)`

.

From there, we define two productions for atoms. On one hand, they can be integers of arbitrary size or literals which are essentially strings.

All that said, most forms of Lisp include yet another s-expression production called the list:

```
list ::= (<items>)
| ()
items ::= <S-exp>
| <S-exp> <items>
```

In other words, a list would look like a space-separated collection of s-expressions. Of course, as a binary tree, a list looks as follows:

## Lisp Semantics

At this point, we’ve looked quite a bit at what Lisp looks like, but we haven’t talked a lot about how Lisp actually works. For example, the following code snippet is perfectly valid Lisp syntax, but what does it do?

```
(cons (cdr ′(11 . 6)) (car ′(4 . 5)))
```

In this case, we’ll generate a new s-expression that looks like `(6 . 4)`

, but how would we know that? As it turns out, Lisp has a ton of builtin functionality.

### CONS

First, we should talk about how we can build up our binary trees using a function called `cons`

. `cons`

, short for construct, creates an s-expression from two s-expressions:

```
(cons 4 7)
```

In this example, `cons`

is supplied two atoms: 4 and 7. As a result, `cons`

returns an s-expression of the form `(4 . 7)`

.

Naturally, we can nest `cons`

calls to create more interesting s-expressions:

```
(cons (cons 1 2) 3)
```

In this case, the inner `cons`

would construct `(1 . 2)`

which would then be combined with 3 to form `((1 . 2) . 3)`

.

### CAR and CDR

First, we should talk about the two s-expression traversal functions: `car`

and `cdr`

. I call them traversal functions because they allow us to access nodes in our binary trees. For example, if we had the s-expression `(3 . 7)`

, we could return the left s-expression, 3, using `car`

and the right s-expression, 7, using `cdr`

.

Interestingly enough, `car`

and `cdr`

are often extend to provide even deeper nesting functionality. For example, `cadr`

would grab the right s-expression using `cdr`

then grab the left s-expression of the child using `car`

.

In situations where we have atoms, `car`

and `cdr`

and their extended forms have undefined behavior. In other words, they should return an error.

### ATOM, INT, and NULL

In addition to the traversal functions, there are a handful of unary functions which can be used to test s-expressions for various properties.

One of those unary functions is called `atom`

, and like the other two unary functions, it accepts one s-expression as input (i.e. `(atom 5)`

) If the s-expression is an atom, the function returns `T`

. Otherwise, it returns `NIL`

.

Naturally, both `int`

and `null`

perform similarly. In particular, `int`

returns `T`

if the s-expression is an integer and `NIL`

otherwise. Likewise, `null`

returns `T`

if the s-expression is `NIL`

and `NIL`

otherwise.

As you can probably imagine, these three functions are useful when writing custom functions. For example, if we want to write a function which can find the smallest element in a list, a null check would be a good place to start.

### PLUS, MINUS, TIMES, QUOTIENT, and REMAINDER

Like most programming languages, there has to be some way to perform arithmetic. Of course, in a purely functional language, operators are replaced by functions. For instance, addition might look as follows:

```
(plus 2 7)
```

In total, there are five arithmetic functions in Lisp—`plus`

, `minus`

, `times`

, `quotient`

, and `remainder`

—and they correspond to addition, subtraction, multiplication, division, and remainder, respectively.

### LESS, GREATER, and EQ

In addition to the arithmetic functions, there are also relational functions similar to less than, greater than, and equals in other languages. For example, the check if two values are equal, we would use the `EQ`

function:

```
(eq 7 2)
```

In this case, the expression evaluates to false because 7 is not equal to 2. Naturally, all three of these functions work similarly.

### QUOTE

One of the interesting features of Lisp is how everything gets evaluated in the interpreter. Each time the interpreter encounters an expression, it attempts to evaluate it. As a result, there are expressions which seem valid to us but are not to the interpreter. For example, the following might seem fine:

```
(atom (2 3 5))
```

Unfortunately, this will crash because `(2 3 5)`

has to be evaluated. Naturally, it has no semantic meaning, so the interpreter tries to treat it like a function. Since 2 isn’t a function name, the interpreter crashes with an error:

2 is not a function name; try using a symbol instead

Instead, we can use the `quote`

function to say to the interpreter: “return this value without evaluating it.”

```
(atom (quote (2 3 5))
```

As you can imagine, `quote`

can really blow up a code base, so most versions of Lisp have some form of syntactic sugar as follows:

```
(atom '(2 3 5))
```

Now, that’s slick! And of course, this expression now returns `NIL`

.

### COND

Another interesting feature of Lisp is the ability to build up control flow similar to a switch statement:

```
(cond
(b1 e1)
(b2 e2)
...
(bn en)
)
```

Here, each boolean expression is evaluated from b1 to bn until one of the expressions returns `T`

. The passing expression then executes its corresponding expression. For example, if b2 evaluates to `T`

then e2 is evaluated.

Naturally, we can use this syntax for more interesting logic which we’ll get to in the next section.

### DEFUN

Finally, there’s a syntax for creating our own functions called `defun`

. In general, the syntax is as follows:

```
(defun f (x y) z)
```

Here, the function name is denoted by `f`

, and the function parameters are denoted by the list containing `x`

and `y`

. Ultimately, the function body is denoted by `z`

.

Based on all the functions we’ve covered up to this point, Lisp functions must be recursive, right? After all, there are no loops in Lisp, so any sort of repetition must be accomplished exclusively through function calls. For example, if we wanted to write a function to search a list for a value, it might look as follows:

```
(defun find (val list)
(cond
((null list) nil )
(T (cond
((eq val (car list)) T )
(T (find val (cdr list)))))))
```

Honestly, I don’t know what the best indentation would be here, but this function gets the job done. It works by first checking if the input list is null. If it is, then we’re done. Otherwise, we check to see if the first value in the list is what we want. If it is, we return true. Otherwise, we run a recursive call on the remainder of the list.

If we want to use our new `find`

function, we call it as follows:

```
(find 4 `(3 7 2 4))
```

Since 4 is in this list, we return `T`

as expected.

At this point, we’ve covered just about everything about the syntax and semantics of Lisp. Now, let’s get into the interpreter design.

## Lisp Interpreter Design

As it turns out, we can actually use Lisp to define our Lisp interpreter—aka bootstrapping. In particular, we’ll be looking at the interpeter from the point of view of the top-level `eval`

function.

### EVAL

As the name states on the tin, `eval`

evaluates a Lisp s-expression. However, in addition to the s-expression, a `eval`

function requires a bit of context in the form of two lists: the association list (a-list) and the function definition list (d-list).

Specifically, the a-list stores all parameter bindings within the current scope in the following form:

```
((parameter1 . binding1) (parameter2 . binding2) ...)
```

Meanwhile, the d-list stores all function definitions in the following form:

```
((function_name1 (param_list1 . body1)) (function_name2 (param_list2 . body2) ...)
```

Unlike like the a-list, the d-list is global and consequently one of the few parts of Lisp that introduces side-effects.

At any rate, these two lists are then passed to `eval`

which has the following body:

```
eval(exp, a_list, d_list) =
[atom[exp] ->
[eq[exp, T] -> T |
eq[exp, NIL] -> NIL |
int[exp] -> exp |
bound[exp, a_list] -> getval[exp, a_list] |
T -> ERROR!]
T ->
[eq[car[exp], quote] -> cadr[exp] |
eq[car[exp], cond] -> evcon[cdr[exp], a_list, d_list] |
eq[car[exp], defun] -> add to d_list |
T -> apply[car[exp], evlist[cdr[exp], a_list, d_list], a_list, d_list]]]
```

Now, that’s surprisingly simply. In fact, this alone is the interpreter. Obviously, there are a few undefined functions here and even some pseudocode, but this is incredibly elegant.

In the first of `eval`

, we deal with atoms only. In particular, we handle the three main atoms—`T`

, `NIL`

, and integers—as well as any symbol bindings such as variables. If we find a symbol binding with `bound`

, we return that binding using `getval`

.

In the second half of `eval`

, we deal with everything else. First, we check if the left element of our s-expression is `quote`

, `cond`

, or `defun`

, and then we apply whatever makes sense for those three special functions. Otherwise, we attempt to apply whatever symbol we see.

In the next few sections, we break down some of these undefined functions.

### EVCON

Whenever we have a set of conditions, we use a special function called `evcon`

to evaluate them:

```
evcon[cond_list, a_list, d_list] =
[null[cond_list] -> ERROR! |
eval[caar[cond_list], a_list, d_list] -> eval[cadar[cond_list], a_list, d_list] |
T -> evcon[cdr[cond_list], a_list, d_list]
```

As expected, evaluating a set of conditions is pretty simply. If there are no conditions, we throw an error. If the current condition evaluates to `T`

, we evaluate its expression. Otherwise, we grab the next condition, and recursively call `evcon`

on it.

### EVLIST

Like `evcon`

, `evlist`

is also a pretty simple helper function. It’s main role is to evaluate each expression in a list, and return a new list with all of the expressions evaluated:

```
evlist[arg_list, a, d] =
[null[arg_list] -> NIL |
T -> cons[eval[car[arg_list], a_list, d_list], evlist[cdr[arg_list], a_list, d_list]]]
```

With `evlist`

, we iterate over the list while evaluating each expression. Each result is then combined with the result of a recursive call on the remainder of the list. The result should be a list of evaluated s-expressions.

### APPLY

Finally, we bring the interpreter home with an implementation of the `apply`

function:

```
apply[fname, arg_list, a_list, d_list] =
[atom[fname] ->
[eq[fname, car] -> caar[list] |
eq[fname, cdr] -> cdar[list] |
eq[fname, cons] -> cons[car[list], cadr[list]] |
eq[fname, atom] -> atom[car[list]] |
eq[fname, eq] -> eq[car[list], cadr[list]] |
int, null, plus, minus, less, etc. |
T -> eval[cdr[getval[fname, d\_list]],
addpairs[car[getval[fname, d\_list]], list, a\_list],
d\_list] ] |
T -> ERROR! ]
```

Naturally, `apply`

is a bit bigger than all the other functions because we have to perform manual checks for each of our builtin functions: `car`

, `cons`

, `plus`

, `null`

, etc.

Of course, the most interesting part of the `apply`

function is the last case which covers user-defined functions. In that case, we bind the formal parameters to the function arguments using `addpairs`

and place the results in the a-list. Then, we evaluate the function body with the new bindings in the a-list.

And, that’s it! That’s the entire Lisp interpreter written in mathematical notation for Lisp. Overall, the interpreter looks like its about 30 or 40 lines of code, but my buggy Java implementation is much larger.

## Want to Learn More?

Naturally, there’s a lot more I can cover in terms of Lisp, but I think that’s enough for today. After all, I have a lot more studying to do.

If you liked this article and you’d love to get content like this sent directly to your inbox, subscribe to The Renegade Coder newsletter. As always, you’re welcome support the site directly through Patreon as many people already do. Your support goes a long way to keeping educational content like this free to the public.

As always, you can stick around. Here are few functional programming articles:

At any rate, thanks again for your support. If this is your first time reading my work, let me know what you think of it in the comments!

The post The Lisp Programming Language: Interpreter Design appeared first on The Renegade Coder.

## Top comments (0)