Shritesh Bhattarai

Posted on Dec 2, 2017 • Edited on Dec 4, 2017 • Originally published at shr.ite.sh

Writing a Brainfuck Interpreter in Rust (and WebAssembly)

#rust #webassembly #brainfuck

Rust recently added support for WebAssembly and I wanted to try it out. So, I wrote a brainfuck interpreter in Rust that compiles to WebAssembly. You can play with it here. Follow the rest of this post to read how I did it (and how you can too.)

Brainfuck

This is a hello world program in brainfuck
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

Brainfuck is an esoteric programming language and can be considered a turing tarpit. The language only has 8 instructions, each mapped to a specific character and all other characters are ignored. The execution environment generally has 30,000 "memory cells" that wrap around. Computation is performed by manipulating a pointer to these memory cells. You can read more about brainfuck in its esolangs entry

Setup

WebAssembly support in Rust requires the nightly toolchain and a few other components. The instructions from Hello, Rust! helped me get started.

With the toolchain installed let's create a new Rust project using cargo.

cargo new brainfuck
cd brainfuck

We are going to create a run function that takes in two strings: the source code and program input and returns the output. This will make it easy to interface with JavaScript and also run tests.

We will be using three programs to test our implementation: hello world, a unix cat-like program and a quine.

// src/lib.rs
pub fn run(source: &str, input: &str) -> String {
    unimplemented!()
}

#[cfg(test)]
mod test {
    use super::run;

    #[test]
    fn hello_world() {
        assert_eq!(
            run("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.", ""), 
            "Hello World!\n");
    }

    #[test]
    fn cat() {
        assert_eq!(run(",[.,]", "hello"), "hello");
    }

    #[test]
    fn quine() {
        // Written by Erik Bosman
        // https://copy.sh/brainfuck/prog/quine505.b
        let program = r">+++++>+++>+++>+++++>+++>+++>+++++>++++++>+>++>+++>++++>++++>+++>+++>+++++>+>+>++++>+++++++>+>+++++>+>+>+++++>++++++>+++>+++>++>+>+>++++>++++++>++++>++++>+++>+++++>+++>+++>++++>++>+>+>+>+>++>++>++>+>+>++>+>+>++++++>++++++>+>+>++++++>++++++>+>+>+>+++++>++++++>+>+++++>+++>+++>++++>++>+>+>++>+>+>++>++>+>+>++>++>+>+>+>+>++>+>+>+>++++>++>++>+>+++++>++++++>+++>+++>+++>+++>+++>+++>++>+>+>+>+>++>+>+>++++>+++>+++>+++>+++++>+>+++++>++++++>+>+>+>++>+++>+++>+++++++>+++>++++>+>++>+>+++++++>++++++>+>+++++>++++++>+++>+++>++>++>++>++>++>++>+>++>++>++>++>++>++>++>++>++>+>++++>++>++>++>++>++>++>++>+++++>++++++>++++>+++>+++++>++++++>++++>+++>+++>++++>+>+>+>+>+++++>+++>+++++>++++++>+++>+++>+++>++>+>+>+>++++>++++[[>>>+<<<-]<]>>>>[<<[-]<[-]+++++++[>+++++++++>++++++<<-]>-.>+>[<.<<+>>>-]>]<<<[>>+>>>>+<<<<<<-]>++[>>>+>>>>++>>++>>+>>+[<<]>-]>>>-->>-->>+>>+++>>>>+[<<]<[[-[>>+<<-]>>]>.[>>]<<[[<+>-]<<]<<]";
        assert_eq!(run(program, ""), program);
    }
}

Running cargo test fails now, but it will pass after our implementation is complete. We will deal with WebAssembly after that.

Parsing Source

These are the eight brainfuck instructions:

>: Move pointer to right
<: Move pointer to left
+: Increment value in current cell
-: Decrement value in current cell
.: Output the character from the current cell
,: Input a character into the current cell
[: Jump next to the matching ] if current value is 0
]: Jump back next to the matching [ if current value is not 0

All the instructions are fairly simple except the last two. To make things easy later, we will figure out the the location of the matching bracket (represented by the wrapped usize below) at parse time. Let's represent the instructions in an Enum.

enum Instruction {
    MoveRight,
    MoveLeft,
    Increment,
    Decrement,
    Output,
    Input,
    Open(usize),
    Close(usize),
}

We parse the strings into a Vec of Instruction by iterating over all the characters and only keeping the instruction characters.

fn parse(source: &str) -> Vec<Instruction> {
    let ops: Vec<char> = source
        .chars()
        .filter(|c| match *c {
            '>' | '<' | '+' | '-' | '.' | ',' | '[' | ']' => true,
            _ => false,
        })
        .collect();

    let mut instructions = vec![];
    // TODO: Fill `instructions`
    instructions
}

We still need a way to find the matching brackets. We could use some complicated recursive algorithm or maintain a stack but I'm aiming for simplicity here. We will instead scan the characters in either direction and count the brackets to figure out the index of the pair. We will add 1 to the counter when we find an opening bracket and subtract 1 when we find a closing bracket. Once the counter reaches 0, we have found our index. Let's write a closure for this.

    let find_matching_paren = |open, close, start_index, stop_index| {
        let mut index = start_index;
        let mut open_parens = 1;

        loop {
            if ops[index] == open {
                open_parens += 1;
            } else if ops[index] == close {
                open_parens -= 1;
            }

            if open_parens == 0 {
                return index;
            }

            if index == stop_index {
                panic!("Unmatched parens");
            } else if start_index < stop_index {
                index += 1;
            } else {
                index -= 1;
            }
        }
    };

That was more complicated than I'd like but it gets the job done.

Now we can match the characters and add to instructions. We loop over and match the characters with their respective Instruction and push it to instructions.

    for i in 0..ops.len() {
        match ops[i] {
            '>' => instructions.push(Instruction::MoveRight),
            '<' => instructions.push(Instruction::MoveLeft),
            '+' => instructions.push(Instruction::Increment),
            '-' => instructions.push(Instruction::Decrement),
            '.' => instructions.push(Instruction::Output),
            ',' => instructions.push(Instruction::Input),
            '[' => instructions.push(Instruction::Open(
                find_matching_paren('[', ']', i + 1, ops.len()),
            )),
            ']' => instructions.push(Instruction::Close(
                find_matching_paren(']', '[', i - 1, 0)
            )),
            _ => {}
        };
    }

Running the program

Now that our parser is complete, we can start executing those instructions.

We represent the memory of the program as an array of 30000 bytes. We also create an iterator from the input string so it'll be easier later to read from it.

We keep two counters: instruction_counter and memory_counter as a pointer into instructions and memory. An output string is created that will be filled and returned by the function.

We repeatedly match the instruction and increment the instruction_counter by 1 after every instruction is executed.

use std::char;

const MEMORY_SIZE: usize = 30_000;

pub fn run(source: &str, input: &str) -> String {
    let instructions = parse(source);
    let mut input_iter = input.chars();

    let mut memory = [0u8; MEMORY_SIZE];

    let mut instruction_counter: usize = 0;
    let mut memory_counter: usize = 0;

    let mut output = String::new();

    while let Some(instruction) = instructions.get(instruction_counter) {
        // TODO: execute the instructions

        instruction_counter += 1;
    }

    output
}

Now, we can pattern match over the instructions and execute them.

MoveRight and MoveLeft change the memory_counter and check for wraparound while Increment and Decrement simply add or subtract one from the pointed memory.

Output appends the current value to the output string. Similary, Input reads from input_iter into the current cell or a 0 if it is the end of string.

Open and Close check whether the current value is 0 or not and modify the instruction_counter to their corresponding pair indices.

        match *instruction {
            Instruction::MoveRight => if memory_counter + 1 == MEMORY_SIZE {
                memory_counter = 0;
            } else {
                memory_counter += 1;
            },
            Instruction::MoveLeft => if memory_counter == 0 {
                memory_counter = MEMORY_SIZE - 1;
            } else {
                memory_counter -= 1;
            },
            Instruction::Increment => memory[memory_counter] += 1,
            Instruction::Decrement => memory[memory_counter] -= 1,
            Instruction::Output => {
                output.push(char::from_u32(memory[memory_counter] as u32).unwrap())
            }
            Instruction::Input => memory[memory_counter] = input_iter.next().unwrap_or('\0') as u8,
            Instruction::Open(index) => if memory[memory_counter] == 0 {
                instruction_counter = index;
            },

            Instruction::Close(index) => if memory[memory_counter] != 0 {
                instruction_counter = index;
            },
        }

That was easier than I thought it would be. And our tests pass now. We have successfully written a working interpreter.

WebAssembly

The tooling around WebAssembly in Rust is still in infancy, but the community is improving that every day. There is a lot of (unsafe) boilerplate that needs to be written. I have used the code from Hello Rust! for this.

We need to use the no_mangle compiler directive to prevent name mangling. We also need to manually call our allocators and deallocators from JavaScript so we expose them. Our functions will receive the pointer to the source string and input string so, we need to convert them to Rust strings.

use std::mem;
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_void};

#[no_mangle]
pub extern "C" fn alloc(size: usize) -> *mut c_void {
    let mut buf = Vec::with_capacity(size);
    let ptr = buf.as_mut_ptr();
    mem::forget(buf);
    return ptr as *mut c_void;
}

#[no_mangle]
pub extern "C" fn dealloc(ptr: *mut c_void, cap: usize) {
    unsafe {
        let _buf = Vec::from_raw_parts(ptr, 0, cap);
    }
}

#[no_mangle]
pub extern "C" fn dealloc_str(ptr: *mut c_char) {
    unsafe {
        let _ = CString::from_raw(ptr);
    }
}

#[no_mangle]
pub extern "C" fn brainfuck(source: *mut c_char, input: *mut c_char) -> *mut c_char {
    let source_str = unsafe { CStr::from_ptr(source).to_str().unwrap() };
    let input_str = unsafe { CStr::from_ptr(input).to_str().unwrap() };

    let output = run(source_str, input_str);

    CString::new(output).unwrap().into_raw()
}

We can now compile the WebAssembly. We also use wasm-gc to remove unneded symbols as the linker is not optimized yet.

rustc +nightly --target wasm32-unknown-unknown --crate-type=cdylib  -O src/lib.rs
wasm-gc lib.wasm brainfuck.wasm

Now we can create the HTML page and use it. We will be using the bundle.js from Hello, Rust! demo to make our lives easier. There is a lot of boilerplate but they are straightforward.

  <script src="bundle.js"></script>
  <script>
    window.Module = {}
    var Brainfuck = {
      run: function (source, input) {
        let src_buf = newString(Module, source);
        let input_buf = newString(Module, input);
        let outptr = Module.brainfuck(src_buf, input_buf);
        let result = copyCStr(Module, outptr);
        Module.dealloc(src_buf);
        Module.dealloc(input_buf);
        Module.dealloc(outptr);
        return result;
      }
    }
    fetchAndInstantiate("./brainfuck.wasm", {})
      .then(mod => {
        Module.alloc = mod.exports.alloc;
        Module.dealloc = mod.exports.dealloc;
        Module.dealloc_str = mod.exports.dealloc_str;
        Module.brainfuck = mod.exports.brainfuck;
        Module.memory = mod.exports.memory;

        alert(Brainfuck.run(",[.,]", "Hello World!"));
      });  </script>

newString allocates memory and copies the string to be used by the WebAssembly environment. Similarly, copyCStr copies the result back to javascript. All of them need to be later manually deallocated using Module.dealloc. This is all wrapped cleanly inside Brainfuck.run in a format that bundle.js expects after using fetchAndInstantiate to load the WebAssembly file.

We can execute the program and get the resulting string using Brainfuck.run(source, input).

That's it. You can see the final Rust source code here and the HTML source here. The hosted version is here.

Conclusion

I had a lot of fun writing the interpreter. Rust's Enums, pattern matching and safety make the language very powerful, yet accessible. I was confident about the program as soon as it compiled and the tests passed. The compiler errors are extremely helpful.

WebAssembly is definitely the next big thing in Web development and I'm happy that Rust is at the forefront of supporting it. Currently, there is a lot of boilerplate required to get anything done and the documentation is sparse, but I'm sure this will improve in the future.

I hope you enjoyed reading this. You can follow me on Twitter (@shritesh) or email me at shr@ite.sh