galzmarc

Posted on Jun 30

Building a Lisp Interpreter in Rust

#rust #programming #coding

I’ve been playing around with Rust for some time now; it's a pretty cool systems language with a very nice set of features: it's statically typed, and it enforces memory safety without a garbage collector (it uses a borrow checker instead) by statically determining when a memory object is no longer in use. These characteristics make it a “different” language that I was interested in exploring, plus the community is absolutely amazing.

As for the question "why building a Lisp interpreter?", it just happens to be one of John Crickett's Coding Challenges and I decided to give it it a try. Also, Scheme's minimal syntax offers a clear pathway for building core language features, thus making it an ideal language for my purpose.

Inspiration and acknowledgements

I have two sources to cite here:

Peter Norvig's Lispy: Many years ago, Peter Norving wrote this beautiful article about creating a Lisp interpreter in Python. I used his article as the main guide for my Rust implementation.
Stepan Parunashvili's Risp: Norvig's article inspired Stepan to write Risp, a Lisp interpreter in Rust. While I took some different decisions in my own code, this was a great source.

As Norvig does, we are going to actually create a Scheme interpreter (so not exactly Lisp). I will also likely depart from both articles at a certain point, but I'll do my best to note when I do that.

Step 1: A Basic Calculator

I will not even try to match Norvig's ability to offer definitions and explanations, so please refer to his article above for any of that. In here, I'll only try to document my thought process and provide some guidance to anyone willing to try this feat.

The only thing we need to remember at this point is the flow of an interpreter:

our program ⟶ parse ⟶ abstract syntax tree ⟶ eval ⟶ result

In short, our goal is to get the following:

> parse(program)
['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]
> eval(parse(program))
314.1592653589793

Type Definitions

For now, our interpreter is going to have three kinds of values: a Scheme expression is either an Atom or List, with the former being a string or a float.

#[derive(Debug, Clone, PartialEq)]
pub enum Atom {
    Symbol(String),
    Number(f64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Exp {
    Atom(Atom),
    List(Vec<Exp>),
}

We are also going to need a Scheme environment, which is essentially a key-value mapping of {variable: value}; we are going to use a HashMap for this, although we wrap our own struct around it:

#[derive(Debug, Clone, PartialEq)]
pub struct Env {
    data: HashMap<String, Exp>,
}

Parsing

The first step in our parsing process is lexical analysis, which means we take the input character string and break it up into a sequence of tokens (at this point, parentheses, symbols, and numbers):

tokenize("(+ 1 2)") //=> ["(", "+", "1", "2", ")"]

And we do that as follows:

pub fn tokenize(exp: String) -> Vec<String> {
    exp.replace("(", " ( ")
        .replace(")", " ) ")
        .split_whitespace()
        .map(|x| x.to_string())
        .collect()
}

Then we proceed with syntactic analysis, in which the tokens are assembled into an abstract syntax tree.

Essentially, we take the first token; if it’s the beginning of a list “(“, we start reading and parsing the tokens that follow, until we hit a closing parenthesis:

fn read_from_tokens(tokens: &mut Vec<String>) -> Result<Exp, String> {
    if tokens.is_empty() {
        return Err("No input provided. Please provide a Scheme expression.".to_string());
    }
    let token = tokens.remove(0);
    if token == "(" {
        let mut list: Vec<Exp> = Vec::new();
        while tokens[0] != ")" {
            list.push(read_from_tokens(tokens)?);
        }
        tokens.remove(0); // pop off ')'
        Ok(Exp::List(list))
    } else if token == ")" {
        return Err(format!("Unexpected ')'."));
    } else {
        Ok(atom(token))
    }
}

Otherwise, it can only be an atom, so we parse that:

fn atom(token: String) -> Exp {
    match token.parse::<f64>() {
        Ok(num) => Exp::Atom(Atom::Number(num)),
        Err(_) => Exp::Atom(Atom::Symbol(token)),
    }
}

Finally, we put everything together in our parse() function:

pub fn parse(input: String) -> Result<Exp, String> {
    // Read a Scheme expression from a string
    read_from_tokens(&mut tokenize(input))
}

Environment

Environments are where we store variable definitions and built-in functions, so we need to create a default one for our interpreter.

To start implementing our basic built-in operations (+, -), we need a way to save Rust functions references. To do that, we need to update our Exp enum to allow an additional type, Func(fn(&[Exp]):

#[derive(Debug, Clone, PartialEq)]
pub enum Exp {
    Bool(bool),
    Atom(Atom),
    List(Vec<Exp>),
    Func(fn(&[Exp]) -> Exp),
}

Then, we can implement some methods and functions for our Env struct:

impl Env {
    fn new() -> Self {
        Env {
            data: HashMap::new(),
        }
    }
    pub fn get(&self, k: &str) -> Option<&Exp> {
        self.data.get(k)
    }
}

pub fn standard_env() -> Env {
    // An environment with some Scheme standard procedures
    let mut env = Env::new();
    // Adding basic arithmetic operations
    env.insert("+".to_string(), Exp::Func(|args: &[Exp]| add(args)));
    env.insert("-".to_string(), Exp::Func(|args: &[Exp]| subtract(args)));

    env
}

And finally, our add() and subtract() functions:

pub fn add(args: &[Exp]) -> Exp {
    let sum = args.iter().fold(0.0, |acc, arg| {
        if let Exp::Atom(Atom::Number(num)) = arg {
            acc + num
        } else {
            panic!("Expected a number");
        }
    });
    Exp::Atom(Atom::Number(sum))
}

pub fn subtract(args: &[Exp]) -> Exp {
    let first = if let Some(Exp::Atom(Atom::Number(n))) = args.iter().next() {
        *n
    } else {
        panic!("Expected a number");
    };
    let result = args.iter().skip(1).fold(first, |acc, arg| {
        if let Exp::Atom(Atom::Number(num)) = arg {
            acc - num
        } else {
            panic!("Expected a number");
        }
    });
    Exp::Atom(Atom::Number(result))
}

Evaluation:

We are now ready to write our eval() function. As a refresher, right now our Exp can assume 4 values: an Atom (either a Symbol or a Number), a List, or a Func. This is the implementation:

pub fn eval(exp: Exp, env: &Env) -> Result<Exp, String> {
     match exp {
         Exp::Atom(Atom::Symbol(s)) => {
             env.get(&s).cloned().ok_or_else(|| panic!("Undefined symbol: {}", s))
         },
         Exp::Atom(Atom::Number(_)) => Ok(exp),
         Exp::List(list) => {
             let first = &list[0];
             if let Exp::Atom(Atom::Symbol(ref s)) = first {
                 if let Some(Exp::Func(f)) = env.get(s) {
                     let args = list[1..].iter()
                         .map(|x| eval(x.clone(), env))
                         .collect::<Result<Vec<_>, _>>()?;
                     return Ok(f(&args))
                 } else {
                     panic!("Undefined function: {}", s);
                 }
             } else {
                 panic!("Expected a symbol");
             }
         },
         Exp::Func(_) => Ok(exp),
     }
 }

Let's make it prettier

We now have functioning code, but an hypothetical input of "(+ 1 1)" will give us the not so pretty stdout of Atom(Number(2.0)). Not great.

To make it look better, we need to work with the Display trait of our Atom struct:

impl fmt::Display for Atom {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Atom::Symbol(s) => write!(f, "{}", s),
            Atom::Number(n) => write!(f, "{}", n),
        }
    }
}

Next, we add fmt::Display for Exp as well, so that it uses the Display implementation of Atom:

impl fmt::Display for Exp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Exp::Atom(atom) => write!(f, "{}", atom),
            Exp::List(list) => {
                let formatted_list: Vec<String> =
                    list.iter().map(|exp| format!("{}", exp)).collect();
                write!(f, "({})", formatted_list.join(" "))
            }
            Exp::Func(_) => write!(f, "<function>"),
        }
    }
}

REPL

One of Lisp's best features is the notion of an interactive read-eval-print loop: a way for a programmer to enter an expression, and see it immediately read, evaluated, and printed, without having to go through a lengthy build/compile/run cycle.

We can do that by using a loop in our main() function:

use std::env;

use rustylisp::{eval, parse, standard_env};

fn main() {
     let args: Vec<String> = env::args().skip(1).collect();

     if args.is_empty() {
         eprintln!("Error: No input provided. Please provide a Lisp expression.");
         std::process::exit(1);
     }

     let env = standard_env();
     let input = args.join(" ");
     let parsed_exp = match parse(input) {
         Ok(exp) => exp,
         Err(e) => {
             eprintln!("Error during parsing: {}", e);
             std::process::exit(1);
         }
     };
     let result = match eval(parsed_exp, &env) {
         Ok(res) => res,
         Err(e) => {
             eprintln!("Error during evaluation: {}", e);
             std::process::exit(1);
         },
     };
     println!("{:?}", result);
 }

And voilà, the first implementation of our interpreter is done! We now have a REPL that allows us to perform addition and subtraction.

Step 2: A Better Calculator

Let's add support for multiplication, division, and comparison operators.

Multiplication and division

To add multiplication and division, we need to create two new helper functions:

pub fn multiply(args: &[Exp]) -> Exp {
    let product = args.iter().fold(1.0, |acc, arg| {
        if let Exp::Atom(Atom::Number(num)) = arg {
            acc * num
        } else {
            panic!("Expected a number");
        }
    });
    Exp::Atom(Atom::Number(product))
}

pub fn divide(args: &[Exp]) -> Exp {
    let first = if let Some(Exp::Atom(Atom::Number(n))) = args.iter().next() {
        *n
    } else {
        panic!("Expected a number");
    };
    let quotient = args.iter().skip(1).fold(first, |acc, arg| {
        if let Exp::Atom(Atom::Number(num)) = arg {
            if *num == 0.0 {
                panic!("Cannot divide by zero")
            }
            acc / num
        } else {
            panic!("Expected a number");
        }
    });
    Exp::Atom(Atom::Number(quotient))
}

Next, we add them to our Env (since we are at it, let's also add the Pi constant):

use std::f64::consts::PI;
...
pub fn standard_env() -> Env {
    // An environment with some Scheme standard procedures
    let mut env = Env::new();
    // Adding basic arithmetic operations
    env.insert("+".to_string(), Exp::Func(|args: &[Exp]| add(args)));
    env.insert("-".to_string(), Exp::Func(|args: &[Exp]| subtract(args)));
    env.insert("*".to_string(), Exp::Func(|args: &[Exp]| multiply(args)));
    env.insert("/".to_string(), Exp::Func(|args: &[Exp]| divide(args)));
    // Adding pi
    env.insert("pi".to_string(), Exp::Atom(Atom::Number(PI)));

    env
}

Booleans

To add support for comparison operators, we first need to introduce booleans (otherwise, what's "(> 1 0)" going to return?); let's do that.

First we edit our Exp enum

pub enum Exp {
    Bool(bool),
    Atom(Atom),
    List(Vec<Exp>),
    Func(fn(&[Exp]) -> Exp),
}

at which point Rust will tell us to update Display

impl fmt::Display for Exp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Exp::Bool(a) => write!(f, "{}", a),
            ...
        }
    }
}

Then we'll change eval() to consider booleans

pub fn eval(exp: Exp, env: &Env) -> Result<Exp, String> {
    match exp {
        Exp::Bool(_) => Ok(exp),
        Exp::Atom(Atom::Symbol(s)) => env
        ...
    }
}

Last but not least, we update our atom() function

fn atom(token: String) -> Exp {
    match token.as_str() {
        "true" => Exp::Bool(true),
        "false" => Exp::Bool(false),
        // Numbers become numbers; every other token is a symbol
        _ => match token.parse::<f64>() {
            Ok(num) => Exp::Atom(Atom::Number(num)),
            Err(_) => Exp::Atom(Atom::Symbol(token)),
        },
    }
}

Comparison Operators

We are now ready to add comparison operators to our Env:

pub fn standard_env() -> Env {
    ...
    // Adding comparison operators
    env.insert(
        "=".to_string(),
        Exp::Func(|args: &[Exp]| compare(args, "=")),
    );
    env.insert(
        ">".to_string(),
        Exp::Func(|args: &[Exp]| compare(args, ">")),
    );
    env.insert(
        "<".to_string(),
        Exp::Func(|args: &[Exp]| compare(args, "<")),
    );
    env.insert(
        ">=".to_string(),
        Exp::Func(|args: &[Exp]| compare(args, ">=")),
    );
    env.insert(
        "<=".to_string(),
        Exp::Func(|args: &[Exp]| compare(args, "<=")),
    );

    env
}

And our helper function compare():

pub fn compare(args: &[Exp], op: &str) -> Exp {
    if args.len() != 2 {
        panic!("Comparison operators require exactly two arguments");
    }

    let a = if let Exp::Atom(Atom::Number(n)) = args[0] {
        n
    } else {
        panic!("Expected a number");
    };

    let b = if let Exp::Atom(Atom::Number(n)) = args[1] {
        n
    } else {
        panic!("Expected a number");
    };

    let result = match op {
        "=" => a == b,
        ">" => a > b,
        "<" => a < b,
        ">=" => a >= b,
        "<=" => a <= b,
        _ => panic!("Unknown operator"),
    };

    Exp::Bool(result)
}

We now have a pretty decent calculator, supporting basic arithmetic operations and comparison operators!

Note: Stepan's implementation uses an approach taken from Clojure, where comparison operators can take more than 2 args, and return true if they are in a monotonic order that satisfies the operator. I have instead decided to limit my interpreter to two args only.

Step 3: Almost a Language

At this point, we can build on booleans to add if statements, and we are going to introduce the define keyword to allow us to create our own variables and functions within the Env.

Define

We need to update our eval() function so that it recognizes the 'define' keyword; when it does, it creates a new variable in our Env, using the first arg as the key, and the second as the value:

fn eval(exp: Exp, env: &mut Env) -> Result<Exp, String> {
     match exp {
         ...
         Exp::List(list) => {
             let first = &list[0];
             if let Exp::Atom(Atom::Symbol(ref s)) = first {
                 if s == "define" {
                     if list.len() != 3 {
                         return Err("define requires exactly two arguments".into());
                     }

                     let var_name = match &list[1] {
                         Exp::Atom(Atom::Symbol(name)) => name.clone(),
                         _ => return Err("The first argument to define must be a symbol".into()),
                     };

                     let value = eval(list[2].clone(), env)?;
                     env.insert(var_name, value.clone());
                     Ok(value)
                 } else if let Some(Exp::Func(f)) = env.get(s) {
                     // Clone the function to avoid borrowing `env` later
                     let function = f.clone();
                     let args: Result<Vec<Exp>, String> = list[1..]
                         .iter()
                         .map(|x| eval(x.clone(), env))
                         .collect();
                     Ok(function(&args?))
                 } else {
                     Err(format!("Undefined function: {}", s))
                 }
             } else {
                 Err("Expected a symbol".into())
             }
         }
         Exp::Func(_) => Ok(exp),
    }
}

We now have some nice built-in functionality, allowing us to do this:

> (define r 10)
10
> (+ r 5)
15

We can also further improve on this feature in the next step, so that we will be able to 'define' functions and turn this into a full-on language.

Improved 'define'

To extend the define functionality to support the definition of new functions, we'll need to update the parser, evaluator, and the Env structure to handle function definitions and closures. Specifically, this involves recognizing when a function is being defined, parsing its arguments and body, and storing this in the environment.

First we update the Exp

pub enum Exp {
    Bool(bool),
    ...
    FuncDef {
         params: Vec<Exp>,
         body: Vec<Exp>,
         env: Env,
     },
 }

and the related Display trait

impl fmt::Display for Exp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ...
            Exp::FuncDef{..} => write!(f, "<function>"),
         }
     }
 }

and finally eval()

fn eval(exp: Exp, env: &mut Env) -> Result<Exp, String> {
    match exp {
        ...
        Exp::List(list) => {
             let first = &list[0];
             if let Exp::Atom(Atom::Symbol(ref s)) = first {
                if s == "define" {
                    if list.len() < 3 {
                         return Err("'define' requires at least two arguments".into());
                     }
                     // Define a new function
                     if let Exp::List(ref func) = list[1] {
                         if let Exp::Atom(Atom::Symbol(ref func_name)) = func[0] {
                             let params = func[1..].to_vec();
                             let body = list[2..].to_vec();
                             let lambda = Exp::FuncDef {
                                 params,
                                 body,
                                 env: env.clone(),
                             };
                             env.insert(func_name.clone(), lambda);
                             return Ok(Exp::Atom(Atom::Symbol(func_name.clone())));
                         }
                     // Define a new variable
                     } else if let Exp::Atom(Atom::Symbol(ref var_name)) = list[1] {
                         let value = eval(list[2].clone(), env)?;
                         env.insert(var_name.clone(), value.clone());
                         return Ok(value);
                     } else {
                         return Err("Invalid define syntax".into());
                     }
                 } else if let Some(exp) = env.get(s) {
                     if let Exp::Func(f) = exp {
                         // Clone the function to avoid borrowing `env` later
                         let function = f.clone();
                         let args: Result<Vec<Exp>, String> = list[1..]
                             .iter()
                             .map(|x| eval(x.clone(), env))
                             .collect();
                         return Ok(function(&args?));
                     } else if let Exp::FuncDef { params, body, env: closure_env } = exp {
                         // Clone `env` to avoid borrowing later
                         let env_clone = &mut env.clone();
                         let args: Result<Vec<Exp>, String> = list[1..]
                             .iter()
                             .map(|x| eval(x.clone(), env_clone))
                             .collect();
                         let mut local_env = closure_env.clone();
                         for (param, arg) in params.iter().zip(args?) {
                             if let Exp::Atom(Atom::Symbol(param_name)) = param {
                                 local_env.insert(param_name.clone(), arg);
                             } else {
                                 return Err("Invalid parameter name".into());
                             }
                         }
                         let mut result = Exp::Bool(false);
                         for exp in body {
                             result = eval(exp.clone(), &mut local_env)?;
                         }
                         return Ok(result);
                     }
                 }
                 return Err(format!("Undefined function: {}", s));
             } else {
                 Err("Expected a symbol".into())
             }
         }
         Exp::Func(_) => Ok(exp),
         Exp::FuncDef { .. } => {
             Err("Unexpected function definition".into())
         }
     }
 }

With the changes above, we now have the possibility to do something like this, which I think is very cool:

> (define (square x) (* x x))
square
> (square 5)
25

If statements (and some refactoring)

As mentioned above, we can build on our booleans to implement if statements. This is the function:

fn eval_if(list: &[Exp], env: &mut Env) -> Result<Exp, String> {
    if list.len() < 4 {
        return Err("'if' requires at least three arguments".into());
    }
    let condition = eval(list[1].clone(), env)?;
    match condition {
        Exp::Bool(true) => eval(list[2].clone(), env),
        Exp::Bool(false) => eval(list[3].clone(), env),
        _ => Err("Invalid condition in if expression".into()),
    }
}

Since the eval() function is getting quite long, we handle 'define' and 'if' separately

fn eval_define(list: &[Exp], env: &mut Env) -> Result<Exp, String> {
     if list.len() < 3 {
         return Err("'define' requires at least two arguments".into());
     }
     // Define a new function
     if let Exp::List(ref func) = list[1] {
         if let Exp::Atom(Atom::Symbol(ref func_name)) = func[0] {
             let params = func[1..].to_vec();
             let body = list[2..].to_vec();
             let lambda = Exp::FuncDef {
                 params,
                 body,
                 env: env.clone(),
             };
             env.insert(func_name.clone(), lambda);
             return Ok(Exp::Atom(Atom::Symbol(func_name.clone())));
         } else {
             return Err("Invalid define syntax".into());
         }
     // Define a new variable
     } else if let Exp::Atom(Atom::Symbol(ref var_name)) = list[1] {
         let value = eval(list[2].clone(), env)?;
         env.insert(var_name.clone(), value.clone());
         return Ok(value);
     } else {
         return Err("Invalid define syntax".into());
     }
}

fn eval(exp: Exp, env: &mut Env) -> Result<Exp, String> {
    match exp {
        ...
        Exp::List(list) => {
             let first = &list[0];
             if let Exp::Atom(Atom::Symbol(ref s)) = first {
                 match s.as_str() {
                     "define" => eval_define(&list, env),
                     "if" => eval_if(&list, env),
                     _ => {
                 // Truncated
                 }
             }
         }
         Exp::Func(_) => Ok(exp),
         Exp::FuncDef { .. } => Err("Unexpected function definition".into()),
     }
}

We have added some new functionality, and can now type something like this:

> (define r 10)
10
> (if (< r 8) true false)
false

Step 4: A Full Language

Note: This is where I depart the most from Stepan's implementation. While he introduced a Lambda type for his Exp enum, I have decided to build upon FuncDef. My reasoning was that I did not really need lambda functions at all.
In the end, typing (define (circle-area r) (* pi (* r r))) was perfectly valid syntax with my implementation. What I really needed was a scoped environment for new functions, which would also allow for recursion because the key was for the function to access itself, and if this is not magic, I don't know what is.

What's even better is that all of the above could be achieved [drumroll] with one single line of code: local_env.insert(s.clone(), exp.clone());
That's it. One line.

The full eval() function is below:

fn eval(exp: Exp, env: &mut Env) -> Result<Exp, String> {
    match exp {
        Exp::Bool(_) => Ok(exp),
        Exp::Atom(Atom::Symbol(s)) => env
            .get(&s)
            .cloned()
            .ok_or_else(|| format!("Undefined symbol: {}", s)),
        Exp::Atom(Atom::Number(_)) => Ok(exp),
        Exp::List(list) => {
            let first = &list[0];
            if let Exp::Atom(Atom::Symbol(ref s)) = first {
                match s.as_str() {
                    "define" => eval_define(&list, env),
                    "if" => eval_if(&list, env),
                    _ => {
                        if let Some(exp) = env.get(s) {
                            match exp {
                                Exp::Func(f) => {
                                    // Clone the function to avoid borrowing `env` later
                                    let function = f.clone();
                                    let args: Result<Vec<Exp>, String> =
                                        list[1..].iter().map(|x| eval(x.clone(), env)).collect();
                                    Ok(function(&args?))
                                }
                                Exp::FuncDef {
                                    params,
                                    body,
                                    env: closure_env,
                                } => {
                                    // Clone `env` to avoid borrowing later
                                    let env_clone = &mut env.clone();
                                    let args: Result<Vec<Exp>, String> = list[1..]
                                        .iter()
                                        .map(|x| eval(x.clone(), env_clone))
                                        .collect();
                                    let mut local_env = closure_env.clone();
                                    local_env.insert(s.clone(), exp.clone());
                                    for (param, arg) in params.iter().zip(args?) {
                                        if let Exp::Atom(Atom::Symbol(param_name)) = param {
                                            local_env.insert(param_name.clone(), arg);
                                        } else {
                                            return Err("Invalid parameter name".into());
                                        }
                                    }
                                    let mut result = Exp::Bool(false);
                                    for exp in body {
                                        result = eval(exp.clone(), &mut local_env)?;
                                    }
                                    Ok(result)
                                }
                                _ => Err(format!("Undefined function: {}", s)),
                            }
                        } else {
                            Err(format!("Undefined function: {}", s))
                        }
                    }
                }
            } else {
                Err("Expected a symbol".into())
            }
        }
        Exp::Func(_) => Ok(exp),
        Exp::FuncDef { .. } => Err("Unexpected function definition".into()),
    }
}

With the support for lambdas, we can now harness the power of recursion, and write something like this:

> (define fact (n) (if (<= n 1) 1 (* n (fact (- n 1)))))
fact
> (fact 5)
120

Fin

Rest here weary traveler. You've reached the end of this post.

My implementation is far from complete, and even further from elegant. I have willingly omitted some parts of my code in order to focus on what I deemed most important, and I have since done some refactoring as well.

You can find my full project here.

Contributions are more than welcome, so if you get to it, send me your thoughts 🙂.

DEV Community