Introduction
I previously read Writing An Interpreter In Go and implemented the interpreter in Rust as a learning exercise. In this article I'll summarize what I learned.
The Output
The completed interpreter can be found here. Since it's Rust source code mimicking Go, I named the repository imitation_interpreter.
Since it's for learning purposes, it has some minor gaps, such as a lack of newline support, but it can handle a minimum viable set of programming features, including variable assignment, if statements, and function calls.
For the runtime environment: if you have cargo*2 installed, you can run it locally. If you don't but still want to try it, you can build and run the Dockerfile with the command described in the README.
About the Book
This book explains, through Go source code, how to implement an interpreter for a fictional C-like programming language called Monkey. By reading through the book, you can create an interpreter that evaluates programs with the following features:
- C-style syntax
- Variable binding
- Integers and booleans
- Arithmetic expressions
- Built-in functions
- First-class and higher-order functions
- Closures
- String data type
- Array data type
- Hash data type
The implementation divides the interpreter into components, implementing them step by step.

In the following sections, I'll explain the implementation of these components using actual source code snippets and diagrams.
Lexer (Lexical Analyzer)
The first thing we develop is the lexer. Lexical analysis means converting source code into a format that's easier to interpret - splitting the program into words and assigning a type to each word.
For example, if we have a program like let x = 5;, this program is split into words with types: let, x, =, 5, ;, and passed to the parser. These words are called tokens.
In the implementation, we define the types of words that Monkey outputs as tokens, and the lexer analyzes them through pattern matching.
pub enum TokenKind {
    ILLEGAL, // a token we don't recognize
    EOF,     // end of input
    // identifiers and literals
    IDENT, // x, five, ...
    INT,   // 123...
    // operators
    ASSIGN, // =
    PLUS,   // +
    LPAREN, // (
    RPAREN, // )
    // keywords
    FUNCTION, // fn
    LET,      // let
    // ... more variants (SEMICOLON, ASTERISK, SLASH, LT, GT, ...)
}

pub struct Token {
    pub token_type: TokenKind,
    pub literal: String,
}
pub struct Lexer<'a> {
    input: &'a str,
    position: usize,      // current position in the input
    read_position: usize, // next position to read
    ch: u8,               // the byte currently being examined
}
impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        let mut l = Lexer {
            input,
            position: 0,
            read_position: 0,
            ch: 0,
        };
        l.read_char();
        l
    }

    pub fn new_token(token_type: TokenKind, ch: u8) -> Token {
        Token {
            token_type,
            literal: String::from_utf8(vec![ch]).unwrap(),
        }
    }

    pub fn next_token(&mut self) -> Token {
        // skip whitespace, since it has no meaning in Monkey
        self.skip_whitespace();
        // pattern match on the current byte to build a token
        let token = match self.ch {
            b'*' => Self::new_token(TokenKind::ASTERISK, self.ch),
            b'/' => Self::new_token(TokenKind::SLASH, self.ch),
            b'<' => Self::new_token(TokenKind::LT, self.ch),
            b'>' => Self::new_token(TokenKind::GT, self.ch),
            // ... more patterns
        };
        self.read_char();
        token
    }
}
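The read_char() method called above isn't shown in the snippet; a minimal sketch consistent with the struct's fields might look like this (the actual implementation may differ):
fn read_char(&mut self) {
    // 0 is used as a sentinel byte meaning "end of input", matching the ch: u8 field
    if self.read_position >= self.input.len() {
        self.ch = 0;
    } else {
        self.ch = self.input.as_bytes()[self.read_position];
    }
    self.position = self.read_position;
    self.read_position += 1;
}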
The new() method receives the source code, and the next_token() method pattern matches words and returns token structures. Since Monkey doesn't assign meaning to spaces, we skip them in this method.
The result for let five = 5; looks like this:
Token { token_type: LET, literal: "let" }
Token { token_type: IDENT, literal: "five" }
Token { token_type: ASSIGN, literal: "=" }
Token { token_type: INT, literal: "5" }
Token { token_type: SEMICOLON, literal: ";" }
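To reproduce this output, the lexer can be driven in a loop until it reports EOF. A small usage sketch (assuming Token and TokenKind derive Debug and PartialEq):
let mut lexer = Lexer::new("let five = 5;");
loop {
    let token = lexer.next_token();
    println!("{:?}", token);
    if token.token_type == TokenKind::EOF {
        break;
    }
}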
Parser (Syntax Analyzer)
Next, we develop the parser. By implementing this parser, we can correctly calculate expressions like 1 + 2 * 3 with the proper precedence.
In this parser implementation, we build a tree structure that directly represents the precedence within the program. Taking 1 + 2 * 3 as an example, we get the following tree:
    +
   / \
  1   *
     / \
    2   3
For more complex calculations, we nest the tree deeper. For example, 4 + 5 * (6 - 7) + 8 * 9 becomes:
        +
       / \
      /   \
     +     *
    / \   / \
   4   * 8   9
      / \
     5   -
        / \
       6   7
This tree is called an Abstract Syntax Tree (AST), and we evaluate it from the deepest nodes upward, left to right. It's called "abstract" because the tree contains only the minimum necessary elements. For example, parentheses are necessary for parsing, but they're redundant once the tree is built, so they don't appear in the AST.
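Walking the second tree this way, (6 - 7) evaluates to -1, then 5 * (-1) to -5 and 4 + (-5) to -1; on the right branch, 8 * 9 gives 72, and the root finally computes -1 + 72 = 71.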
I used arithmetic to explain the AST, but programs in general are also interpreted by building this kind of AST.
In the implementation, we parse the program with Program as the root, decomposing it into Statements and Expressions. A statement is a language element with no return value that's complete by itself, while an expression has a return value and becomes part of a statement.
For example, when parsing let x = 1 * (2 + 3);, we get the AST shown at the end of this section. In this case, the statement is the whole of let x = 1 * (2 + 3);, and the expression is 1 * (2 + 3): it forms part of the let statement and, when evaluated, returns 5. This may be a little hard to grasp at first, but writing expressions out as ASTs on paper makes it much easier to understand.
In the implementation, we create Statement and Expression enums, exhaustively define patterns, and the parser analyzes them through pattern matching.
pub struct Program {
    pub statements: Vec<Statement>,
}

pub enum Statement {
    LetStatement {
        identifier: Expression,
        value: Expression,
    },
    Return(Expression),
    ExpressionStatement(Expression),
    Block(Vec<Statement>),
}

pub enum Expression {
    Identifier(String),
    String(String),
    Integer(i32),
    // ... more variants
}
pub struct Parser<'a> {
    lexer: lexer::Lexer<'a>,
    current_token: Token,
    next_token: Token,
}
impl<'a> Parser<'a> {
    pub fn new(l: lexer::Lexer<'a>) -> Self {
        let mut p = Parser {
            lexer: l,
            current_token: Token { token_type: TokenKind::DEFAULT, literal: "default".to_string() },
            next_token: Token { token_type: TokenKind::DEFAULT, literal: "default".to_string() },
        };
        // advance twice so that both current_token and next_token are populated
        p.next_token();
        p.next_token();
        p
    }
    pub fn parse_program(&mut self) -> Result<Program, Errors> {
        let mut statements: Vec<Statement> = vec![];
        // read tokens until the end of the input
        while !self.is_current_token(TokenKind::EOF) {
            let statement = self.parse_statement()?;
            statements.push(statement);
            self.next_token();
        }
        Ok(Program { statements })
    }
    fn parse_statement(&mut self) -> Result<Statement, Errors> {
        match self.current_token.token_type {
            TokenKind::LET => self.parse_let_statement(),
            TokenKind::RETURN => self.parse_return_statement(),
            _ => self.parse_expression_statement(),
        }
    }
    fn parse_expression(&mut self, precedence: Precedence) -> Result<Expression, Errors> {
        let mut exp = match self.current_token.token_type {
            TokenKind::IDENT => Expression::Identifier(self.parse_identifier()?),
            TokenKind::STRING => Expression::String(self.parse_string()?),
            TokenKind::INT => Expression::Integer(self.parse_integer()?),
            TokenKind::TRUE => Expression::Bool(true),
            _ => return Err(Errors::TokenInvalid(self.current_token.clone())),
        };
        // ... more logic: consume infix operators while `precedence` allows it
        Ok(exp)
    }
}
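The Precedence value passed to parse_expression() is what makes * bind more tightly than +. Its definition isn't shown above, but it can be sketched as an ordered enum mirroring the book's precedence table; deriving PartialOrd lets the parser compare two levels directly:
#[derive(PartialEq, PartialOrd)]
pub enum Precedence {
    Lowest,
    Equals,      // ==
    LessGreater, // > or <
    Sum,         // +
    Product,     // *
    Prefix,      // -x or !x
    Call,        // my_function(x)
}
The parser keeps consuming infix operators as long as the next token's precedence is higher than the current level, which is exactly what produces the trees shown earlier.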
The new() method receives the lexer, and the parse_program() method recursively constructs the AST from its token stream.
The result for let x = 1 * (2 + 3) looks like this:
LetStatement {
    identifier: Identifier("x"),
    value: InfixExpression {
        left_expression: Integer(1),
        operator: "*",
        right_expression: InfixExpression {
            left_expression: Integer(2),
            operator: "+",
            right_expression: Integer(3)
        }
    }
}
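Wiring the lexer and parser together looks roughly like this (a sketch; unwrap() and the Debug output are just for illustration, assuming the AST types derive Debug):
let lexer = Lexer::new("let x = 1 * (2 + 3);");
let mut parser = Parser::new(lexer);
let program = parser.parse_program().unwrap();
println!("{:?}", program.statements); // prints the LetStatement shown above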
Evaluation
Finally, we develop the evaluator. In this book, we execute the AST constructed by the parser directly, node by node, using the host language (Rust). This approach is called a tree-walking interpreter. As a side note, this method was actually used in Ruby 1.8 and earlier. (From 1.9 onward, to improve performance, Ruby switched to compiling the AST to bytecode and evaluating it on a virtual machine.)
By the way, from around this point I started to feel comfortable with Rust syntax and implementation became easier.
In the implementation, we define a type called Object to represent Monkey values in the host language, converting evaluated AST nodes into Objects.
pub enum Object {
    Identifier(String),
    String(String),
    Integer(i32),
    Boolean(bool),
    Return(Box<Object>),
    Let(Box<Object>),
    Array(Vec<Object>),
    // ... more variants
}
pub fn evaluate(&mut self, program: &ast::Program) -> Result<Object, Errors> {
    let mut result = Object::Default;
    // evaluate the program one statement at a time
    for statement in program.statements.iter() {
        result = self.evaluate_statement(statement)?;
        // if the statement contains 'return', stop and return the wrapped value
        if let Object::Return(value) = result {
            return Ok(*value)
        }
        // if evaluation produced an error, stop and return it
        if result == Object::Error(Errors::InvalidInfix) {
            return Ok(result)
        }
    }
    Ok(result)
}
fn evaluate_statement(&mut self, statement: &ast::Statement) -> Result<Object, Errors> {
    match statement {
        ast::Statement::ExpressionStatement(expression) => self.evaluate_expression(expression),
        ast::Statement::Block(statements) => self.evaluate_block_statements(statements),
        ast::Statement::Return(expression) => {
            let return_expression = self.evaluate_expression(expression)?;
            Ok(Object::Return(Box::new(return_expression)))
        },
        ast::Statement::LetStatement { identifier, value } => {
            if let Expression::Identifier(identifier) = identifier {
                // evaluate the right-hand side and register it under the identifier
                let evaluated_value = self.evaluate_expression(value)?;
                let value = self.set(identifier.to_owned(), evaluated_value);
                return Ok(value)
            }
            Ok(Object::Null)
        },
        _ => Err(Errors::NodeError),
    }
}
fn evaluate_expression(&mut self, expression: &ast::Expression) -> Result<Object, Errors> {
    match expression {
        ast::Expression::String(value) => Ok(Object::String(value.to_owned())),
        ast::Expression::Integer(value) => Ok(Object::Integer(*value)),
        ast::Expression::Bool(value) => Ok(Object::Boolean(*value)),
        ast::Expression::Array(value) => {
            let array = self.evaluate_arguments(value.to_vec())?;
            Ok(Object::Array(array))
        },
        // ... more patterns
    }
}
The evaluate_expression() function continuously converts received expressions to Objects.
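Infix expressions such as 1 * (2 + 3) are elided above; they eventually bottom out in operations on two Objects. A minimal sketch of what such a helper could look like, reusing the Errors::InvalidInfix variant from earlier (the real implementation covers more operators and types):
fn evaluate_infix(&mut self, operator: &str, left: Object, right: Object) -> Result<Object, Errors> {
    match (left, right) {
        // integer operands support arithmetic and comparison
        (Object::Integer(l), Object::Integer(r)) => match operator {
            "+" => Ok(Object::Integer(l + r)),
            "-" => Ok(Object::Integer(l - r)),
            "*" => Ok(Object::Integer(l * r)),
            "<" => Ok(Object::Boolean(l < r)),
            _ => Err(Errors::InvalidInfix),
        },
        // any other combination of operand types is invalid
        _ => Err(Errors::InvalidInfix),
    }
}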
When evaluating let x = 5; x;, the result Integer(5) is returned. Since we want the terminal to display just 5, I decided to use Rust's Formatter feature, which formats and displays values nicely.
use std::fmt;

pub enum Object {
    Integer(i32),
    // ... more variants
}

impl fmt::Display for Object {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            // only the value inside Integer is displayed
            Object::Integer(value) => write!(f, "{}", value),
            // ... more patterns
        }
    }
}
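With this in place, evaluation results can be printed directly:
let result = Object::Integer(5);
println!("{}", result); // prints: 5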
What I Did to Become Proficient in Rust
Rust is generally considered to have a higher learning cost than other languages. The main reason is that it's a feature-rich language, but beyond that, there's also less knowledge available on the web than for other languages.
While it's impossible to explain all of Rust's syntax in this blog, I hope to help improve the knowledge gap a bit, so I'll briefly explain the books and documents I referenced in chronological order.
Before Implementation
Before starting development, I worked through the official documentation. Since it's free and very carefully written, I think it was excellent as an initial document to read.
Knowledge up to about chapter 11 seemed essential, so I read that far in full, memorizing syntax while typing out the number-guessing game from the early chapters. For the later chapters, I only worked through the parts that seemed relevant, so there are areas I've barely touched, which I see as a remaining challenge.
During Implementation
During implementation, I mainly referenced "Programming Rust" alongside the official documentation. The book is written so that even readers without background in lower-level languages like C, on which many of Rust's concepts build, can follow it, making it very accessible.
Rust has a reputation for being harder than other languages when it comes to string handling and type conversion, but perhaps the author sympathized with this feeling, as many pages are devoted to explaining these topics carefully. It's also attractive that the book contains sample code useful in everyday development, such as database access and web API usage, so I can recommend it to anyone who wants to use Rust in production.
I also searched for GitHub repositories doing similar things in Rust and referred to commit logs. Some might avoid this as if looking at problem set answers, but you can learn about notation and libraries not covered in books, and by following commit logs you can see how others avoided the same pitfalls you encountered, which was very helpful.
Things You Should Do When Writing Rust
Build a development environment using RLS (Rust Language Server)
Rust is a restrictive language due to its type system and unique ownership system. You therefore run into many compile errors, and in this project too, I often felt that working code emerged from the process of fixing those errors (partly because I was inexperienced).
I usually use VSCode as my editor, and by installing the rls*3 extension I could check syntax errors and types right in the editor, which saved development time. In particular, although Rust's type constraints are strict, it's hard to see inferred types just by reading the program, so being able to check them in the editor was very helpful.

Also, when running tests, you can execute them at the function level, which was very convenient when debugging only newly developed functions in pattern matching.
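For example, running cargo test next_token executes only the tests whose names contain next_token (the test name here is just an illustration).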

Installation reference:
Rust Analyzer - Visual Studio Marketplace
Use the Result Type for Return Values
As an implementation strategy for the parser and evaluator, we pattern match on tokens and dispatch to functions that process them. We therefore implement a function for each token step by step, extending the match statement as we go, but for tokens that aren't implemented yet, there is nothing of the appropriate type to return.
This is where the Result type comes in: it can define return values for both successful termination, Ok(T), and abnormal termination, Err(E). In the code examples above, we use the Expression enum as the success type and the Errors type as the failure type. Wanting to return errors from within functions is common, such as in API implementations, and the Result type allows a lot of flexibility here.
I think it becomes easier to handle errors if you define an Errors type in detail in a dedicated errors.rs file using an enum.
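As a sketch, such an errors.rs might collect the variants that appeared in the snippets above (the module path is an assumption, and Token is assumed to derive the same traits; PartialEq and Clone are derived because the evaluator compares and the parser clones these values):
// errors.rs (a sketch)
use crate::token::Token; // assumed module path

#[derive(Debug, Clone, PartialEq)]
pub enum Errors {
    TokenInvalid(Token),
    InvalidInfix,
    NodeError,
    // ... more variants
}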
Conclusion
I was able to implement the interpreter in about 2,000 lines including test code, and it was an excellent introduction to both language processing and Rust. (The preface mentions aiming for something between a technical book and a blog post, which seemed accurate.)
In particular, when I implemented the parser logic and successfully parsed complex expressions, I felt I finally understood why calculation formulas are evaluated correctly. As its reputation suggests, Rust has a high learning cost, and I had quite a difficult time at the beginning, but learning Rust-specific concepts like the ownership system means learning about programming at a lower layer, so as someone who usually develops in scripting languages, I now think it was a language well worth learning.
I only learned while writing this post that this book has a sequel, "Writing A Compiler In Go," so to keep the Rust knowledge I've acquired from fading, I'm thinking of trying to write a compiler next.
Footnotes
*2: A tool that can build, run, test, and manage packages of Rust code - essential for Rust development. Cargo Documentation
*3: Abbreviation for Rust Language Server, a backend server that provides various completions for Rust programs in IDEs and editors. RLS GitHub

