DEV Community

Void
Void

Posted on

Moving From Tree-Walk to Bytecode: Announcing My Ultimate C Programming Challenge

For the past few months, I’ve been building a custom, dynamically typed programming language from scratch in pure C. Up until now, it has lived comfortably as a classic, textbook AST Tree-Walk Interpreter.
If you’ve ever built one, you know the drill: your parser spits out a nested Abstract Syntax Tree, and you evaluate operations on the fly by recursively executing visit() functions down the nodes. It’s elegant, intuitive, and highly rewarding to write.
But recursive pointer-chasing and host-stack overhead eventually hit a performance wall.
So, I've decided to issue myself a massive technical challenge: I am going to completely tear down my comfortable tree-walk interpreter and rebuild it as a linear Bytecode Compiler and Virtual Machine.
I am giving myself a strict timeline to get a minimal, stack-based VM running. I haven't written a single byte of the new engine yet, this is a raw, ground-zero start. Here is why I'm taking on this architectural migration, the design hurdles I'm preparing to face, and how the mental models are about to collide.

The Core Problem: Why the Tree-Walker Has to Go

In my current interpreter, the execution pipeline is completely unified. A binary operator node evaluates its subtrees down to raw host pointers immediately, runs the C math logic on the spot, and passes the result up:

// MY CURRENT APPROACH: Direct recursive evaluation
Object* visitBinOpNode(ASTNode* node, Interpretator* ctx) {
  BinOpNode* binOper = (BinOpNode*)node;

  Object *srcObj = binOper->leftNode->visit(binOper->leftNode, ctx);
  Object *destObj = binOper->rightNode->visit(binOper->rightNode, ctx);

  // Runtime evaluation happens RIGHT HERE
  return (Object*)addNumber((Number*)destObj, (Number*)srcObj);
}

Enter fullscreen mode Exit fullscreen mode

The challenge of moving to a bytecode VM is that I have to split this execution into two completely distinct phases. I will no longer be evaluating expressions on the fly; I have to linearize them.
My new compiler pass will have to traverse the tree and flatten it into a one-dimensional array of instructions (uint8_t), which a completely separate Virtual Machine runner will execute later via a tight switch loop.

Challenge 1: Migrating the Memory System

One design constraint I'm setting for myself is to reuse my existing, heavily optimized type system. I already have robust, pooled constructor functions like initInt(), initFloat(), and initString().
My plan is to have the VM's computational evaluation stack literally hold arrays of these exact same host data structure pointers (Object* stack[256]).
The real test will be handling memory ownership. Right now, my AST visitor functions have clear rules for when to call freeObject(). In a VM, values will be flying off the evaluation stack constantly. My VM execution loop is going to have to take full, manual ownership of freeing intermediate, short-lived objects without destroying my persistent memory pools or causing catastrophic memory leaks.

Challenge 2: Control Flow Without a Safety Net

This is the part of the project I am dreading the most. In a tree-walk interpreter, implementing a while loop or an if statement feels like cheating because you lean entirely on the host language. You literally just write a native C while loop inside your visitor function and recursively execute the body node.
In a bytecode engine, there are no native loops. There are only directional jumps (OP_JUMP, OP_JUMP_IF_FALSE).
To make a loop work, I have to implement a backpatching mechanism. When compiling an if condition, the compiler has to emit a jump instruction before it even knows how many bytes the interior body block will take. I'll have to emit dummy placeholder bytes for the jump target offset, compile the interior statements, calculate the exact byte distance, and then surgically go back into my bytecode array to rewrite those placeholder bytes with the correct offset.
Getting those offsets right down to the exact byte without causing the instruction pointer to derail into illegal memory is going to be an absolute nightmare to debug.

The Roadmap Ahead

My goal for the next few days is simple:

  1. Define a core set of fundamental Opcodes (OP_CONSTANT, OP_ADD, OP_RETURN).
  2. Implement a dynamic array structure to hold raw bytes (the Chunk) alongside a Constant Pool for my objects.
  3. Build a simple stack-based VM loop that can execute hardcoded math arrays before hooking it up to the parser. Stripping away the abstraction of recursive AST interpretation is going to force me to reckon with how hardware actually processes instructions. I’m fully expecting a week of segmentation faults, index-out-of-bounds errors, and broken logic.

Wish me luck. I’ll be posting updates on my progress, design mistakes, and whatever breakthroughs happen along the way.

Top comments (0)