Natan Guzinski

Posted on Jun 28

I Built a Statically Typed Bytecode VM Language in C — Here's What I Learned

#c #computerscience #programming #showdev

A couple months ago I finished reading Crafting Interpreters and got the itch to go further. Instead of just following along with clox, I decided to design my own language from scratch. The result is Oli-Nat — a statically typed, bytecode-compiled language with its own GC, type checker, class system, and standard library, all written in C.

This post isn't a tutorial. It's about three specific design problems I ran into and how I solved them, because I think they're interesting and I haven't seen them written up clearly anywhere.

The Pipeline

Before diving in, here's the full pipeline so the rest makes sense:

source → scanner → Pratt parser → AST → type checker → two-pass bytecode compiler → stack-based VM

A quick taste of what the language looks like:

class Player {
    make int health = 100;
    make string name = "hero";

    make empty takeDamage(int amount) {
        this.health -= amount;
    }
    make int getHealth() {
        return this.health;
    }
}

make Player p = Player();
p.takeDamage(30);
println(p.getHealth()); // 70

Variables and functions are declared with make. Stdlib is imported with #pullf. Nothing groundbreaking syntactically, but the internals were a different story.

Problem 1: The Circular Dependency Between the Hashmap and Object System

My object system lives in object.h. My hashmap lives in hashmap.h. The problem: ObjClass needs a Hashmap to store its methods, and Hashmap needs ObjString* as a key type. Classic circular dependency.

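The naive solution, having each header include the other, doesn't work. Without include guards the includes recurse until the compiler gives up. With include guards the nested re-include is skipped, so one header is always compiled before the types it needs from the other have been defined.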

My solution: a types-only header.

I created hashmap_types.h that contains just the struct definitions for Bucket and Hashmap, with a forward declaration of ObjString:

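// hashmap_types.h
#ifndef HASHMAP_TYPES_H
#define HASHMAP_TYPES_H

#include "value.h" // assumed name: whichever header defines Value

struct ObjString; // forward declaration only; a pointer doesn't need the full type

typedef struct {
    struct ObjString* key;
    Value value;
} Bucket;

typedef struct {
    Bucket* buckets;
    int count;
    int capacity;
} Hashmap;

#endif // HASHMAP_TYPES_H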

Then:

object.h includes hashmap_types.h, so it gets the Hashmap struct without pulling in the hashmap function declarations
hashmap.h includes both hashmap_types.h and object.h, so it gets everything it needs

No cycles. The key insight is separating struct layout from function declarations. You only need the layout to embed a Hashmap inside ObjClass; you only need the functions when you're actually calling them.
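To make the layering concrete, here is a minimal single-file sketch in which each commented section stands in for one header. The Value definition, the ObjString fields, and initHashmap are hypothetical stand-ins for illustration, not Oli-Nat's actual definitions.

```c
/* Single-file sketch: each section stands in for one header. */
#include <assert.h>
#include <stddef.h>

/* ---- stand-in for the header that defines Value (assumed shape) ---- */
typedef struct { double number; } Value;

/* ---- hashmap_types.h: struct layout only, no functions ---- */
struct ObjString;                 /* incomplete type is enough for a pointer */

typedef struct {
    struct ObjString* key;
    Value value;
} Bucket;

typedef struct {
    Bucket* buckets;
    int count;
    int capacity;
} Hashmap;

/* ---- object.h: only needs the layout from hashmap_types.h ---- */
typedef struct ObjString {        /* hypothetical fields */
    int length;
    const char* chars;
} ObjString;

typedef struct {
    ObjString* name;
    Hashmap methods;              /* embedded by value, so the full layout is required */
} ObjClass;

/* ---- hashmap.h: function declarations, after both sets of types ---- */
void initHashmap(Hashmap* map);   /* hypothetical function name */

/* ---- hashmap.c ---- */
void initHashmap(Hashmap* map) {
    map->buckets = NULL;
    map->count = 0;
    map->capacity = 0;
}
```

Reading top to bottom, nothing refers to a type before it is complete enough for that use: Bucket only stores a pointer to ObjString, while ObjClass embeds a whole Hashmap and therefore has to come after its definition.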

Problem 2: Two-Pass Compilation for Forward References

In most simple interpreters, you can't call a function before you declare it. I wanted Oli-Nat to support forward references — call a function defined later in the file, or have two classes reference each other — without requiring explicit forward declarations from the user.

The solution is a two-pass compiler.

First pass (declareFunction): scan the entire source for function, class, field, and method declarations. Don't parse function or method bodies. Just populate the type checker's symbol table with signatures:

// First pass sees this:
make int add(int a, int b) { ... }

// And records: "add" → returns int, takes (int, int)
// Body is skipped entirely

Second pass: full parse, type check, and bytecode emission. By this point the type checker already knows about every function and class in the file, so forward calls type-check correctly and the compiler can emit the right call instructions.
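The two passes can be sketched on a toy model. Instead of real source text, this assumes declarations arrive as an array of FuncDecl records, each with at most one call in its body. All names here (Signature, FuncDecl, SymbolTable, declareAll, checkAll) are hypothetical and only illustrate the idea, not Oli-Nat's actual compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 16

typedef enum { TYPE_INT, TYPE_STRING, TYPE_EMPTY } Type;

typedef struct {
    const char* name;
    Type returnType;
    int arity;
} Signature;

/* One declaration as a parser might hand it over. `calls` names a function
   invoked inside the body (NULL if the body calls nothing). */
typedef struct {
    Signature sig;
    const char* calls;
    int callArgCount;
} FuncDecl;

typedef struct {
    Signature entries[MAX_FUNCS];
    int count;
} SymbolTable;

const Signature* lookup(const SymbolTable* table, const char* name) {
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].name, name) == 0) return &table->entries[i];
    }
    return NULL;
}

/* Pass 1: record every signature and ignore the bodies. */
void declareAll(SymbolTable* table, const FuncDecl* decls, int n) {
    table->count = 0;
    for (int i = 0; i < n && table->count < MAX_FUNCS; i++) {
        table->entries[table->count++] = decls[i].sig;
    }
}

/* Pass 2: check each body's call against the completed table.
   Returns the number of calls that fail to resolve or have the wrong arity. */
int checkAll(const SymbolTable* table, const FuncDecl* decls, int n) {
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (decls[i].calls == NULL) continue;
        const Signature* callee = lookup(table, decls[i].calls);
        if (callee == NULL || callee->arity != decls[i].callArgCount) errors++;
    }
    return errors;
}
```

Because declareAll finishes before checkAll starts, a function that calls add type-checks even when add is declared further down the array.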

The tricky part is class declarations. The first pass needs to record not just the class name but all its field types and method signatures, so that the type checker can validate instance.field and instance.method(...) expressions in the second pass. That means the first pass has to parse class bodies partially — just enough to extract the metadata, without emitting any bytecode.
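As a rough picture of what that metadata might look like, the sketch below records field types and method signatures per class and answers the two questions the second pass asks. ClassInfo, addField, addMethod, fieldType, and methodCallType are hypothetical names, and the method check only compares argument counts, not argument types.

```c
#include <assert.h>
#include <string.h>

#define MAX_MEMBERS 8

typedef enum { TYPE_INT, TYPE_STRING, TYPE_EMPTY, TYPE_ERROR } Type;

typedef struct { const char* name; Type type; } FieldInfo;
typedef struct { const char* name; Type returnType; int arity; } MethodInfo;

/* What the first pass keeps per class: names and types only, no bytecode. */
typedef struct {
    const char* name;
    FieldInfo fields[MAX_MEMBERS];
    int fieldCount;
    MethodInfo methods[MAX_MEMBERS];
    int methodCount;
} ClassInfo;

void addField(ClassInfo* cls, const char* name, Type type) {
    assert(cls->fieldCount < MAX_MEMBERS);
    cls->fields[cls->fieldCount++] = (FieldInfo){ name, type };
}

void addMethod(ClassInfo* cls, const char* name, Type returnType, int arity) {
    assert(cls->methodCount < MAX_MEMBERS);
    cls->methods[cls->methodCount++] = (MethodInfo){ name, returnType, arity };
}

/* Second pass: type of `instance.field`, or TYPE_ERROR if there is no such field. */
Type fieldType(const ClassInfo* cls, const char* name) {
    for (int i = 0; i < cls->fieldCount; i++) {
        if (strcmp(cls->fields[i].name, name) == 0) return cls->fields[i].type;
    }
    return TYPE_ERROR;
}

/* Second pass: type of `instance.method(...)` called with argCount arguments,
   or TYPE_ERROR if the method is unknown or the count does not match. */
Type methodCallType(const ClassInfo* cls, const char* name, int argCount) {
    for (int i = 0; i < cls->methodCount; i++) {
        const MethodInfo* m = &cls->methods[i];
        if (strcmp(m->name, name) == 0) {
            return m->arity == argCount ? m->returnType : TYPE_ERROR;
        }
    }
    return TYPE_ERROR;
}
```

For the Player class from earlier, the first pass would register health, name, takeDamage, and getHealth, after which p.getHealth() resolves to int and p.takeDamage() with no argument is rejected.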

The payoff is that the user never thinks about declaration order. Functions and classes can be used anywhere in the file.

Problem 3: GC Safepointing Hazards

This one bit me multiple times and is easy to miss if you're not thinking about it.

The GC runs during allocation. If an allocation triggers a collection, any object you hold only in a local C variable and haven't yet anchored to the VM's roots (stack, globals, etc.) looks unreachable to the collector and gets swept, even though you're actively using it. The C variable is left as a dangling pointer.

The classic bad pattern:

ObjString* key = copyString(vm, "name", 4);   // allocation #1
Value val = makeInstance(vm, someClass);       // allocation #2 — might trigger GC
                                               // key is now potentially freed!
mapSet(&table, key, val);

The fix is to push temporaries onto the VM's value stack before any allocation that might trigger GC, then pop after:

ObjString* key = copyString(vm, "name", 4);
push(vm, OBJECT_VAL(key));    // anchor it as a GC root

Value val = makeInstance(vm, someClass);  // GC safe now: key is on the stack
push(vm, val);                // anchor val too, in case mapSet allocates when the table grows

mapSet(&table, key, val);
pop(vm);                      // val
pop(vm);                      // key

The mental model I use now: any call that allocates is a potential safepoint. Before crossing a safepoint, anchor everything you care about.

This shows up in subtle places — string interning during compilation, building the methods hashmap while the class object is being constructed, initializing field default values. Every time I added a new feature that involved multiple allocations in sequence, I had to audit the ordering.
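One way to make these hazards reproducible is to collect on every allocation, similar in spirit to the DEBUG_STRESS_GC flag in clox. The toy collector below is a sketch under heavy assumptions: objects hold no references to each other, the value stack is the only root set, and every name (VM, Obj, allocate, collect) is hypothetical rather than Oli-Nat's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define STACK_MAX 64

typedef struct Obj {
    bool marked;
    struct Obj* next;        /* intrusive list of every allocation */
    int payload;
} Obj;

typedef struct {
    Obj* objects;            /* head of the allocation list */
    Obj* stack[STACK_MAX];   /* the only roots in this sketch */
    int stackCount;
    int liveCount;
} VM;

void initVM(VM* vm) {
    vm->objects = NULL;
    vm->stackCount = 0;
    vm->liveCount = 0;
}

void push(VM* vm, Obj* obj) {
    assert(vm->stackCount < STACK_MAX);
    vm->stack[vm->stackCount++] = obj;
}

Obj* pop(VM* vm) {
    assert(vm->stackCount > 0);
    return vm->stack[--vm->stackCount];
}

/* Mark everything on the stack, then free whatever stayed unmarked. */
void collect(VM* vm) {
    for (int i = 0; i < vm->stackCount; i++) vm->stack[i]->marked = true;

    Obj** link = &vm->objects;
    while (*link != NULL) {
        Obj* obj = *link;
        if (obj->marked) {
            obj->marked = false;      /* reset for the next cycle */
            link = &obj->next;
        } else {
            *link = obj->next;        /* unlink and free the unreachable object */
            free(obj);
            vm->liveCount--;
        }
    }
}

/* Stress mode: collect before every allocation, so each call is a safepoint. */
Obj* allocate(VM* vm, int payload) {
    collect(vm);
    Obj* obj = malloc(sizeof(Obj));
    if (obj == NULL) abort();
    obj->marked = false;
    obj->payload = payload;
    obj->next = vm->objects;
    vm->objects = obj;
    vm->liveCount++;
    return obj;
}
```

With this setup, an object that is pushed before the next allocate call survives it, while one held only in a C local is freed immediately, which turns an occasional crash into a deterministic one that is much easier to track down.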

What's Next

The language is working and I'm happy with where it is. The next big things on the roadmap are:

Constructors with parameters — right now Player() creates an instance with default field values; I want Player("Alice", 50) to work
Inheritance — ObjClass already has a superclass pointer reserved for this
A 2D game library — the whole reason I wanted a class system in the first place

The full source is on GitHub: https://github.com/NateTheGrappler/OliNat-Programming-Language

Happy to answer questions about any of the implementation details in the comments.

Top comments (1)

algorhymer • Jun 29

Thank you for sharing! I gave it a like, it was a great read. :)

Also, I'd like to applaud you for pushing through with the C part of the book!
The Java part is okay, but the C part is where the real meat is.
I also think it's awesome that you went off the beaten path.
The C part sometimes can cause endless headaches and debugging (at least to me it was a bit 'crazy magic bug' sometimes), but you managed it incredibly well.

The most important thing in my opinion is that by doing this,
you actually lived it and learned it through trial n' error,
and that’s where the truly valuable experience comes from.
I think Bob Nystrom was aiming at exactly this kind of target audience when writing the book:
people who want to have fun with it!
Not just make it, but truly have that "aha!" moment.

All in all, great job! I gave you a star on GitHub too, because honestly: you are downplaying it, but there's a metric ton of work in this.

So, that's kind of what I was able to say.

Oh, oh my, just one thing!

In the book there's a part that I really like:

Enough history, let’s jazz up our language.

I'm fully onboard with that sentence.

What was done is done, I say, no point thinking about it now.

German, French, Spanish.
ja-ja, oui-oui, sí-sí...
It's nonsense!
Everyone picks the outermost redex anyway, and if they don't, they ought to!

So no homework tonight!

But I want you to watch a lot of Hask... Rust newtypes, don't neglect your Wadl... JS prettier, and I'll see you in the morning.
Should we say... 10, 10:30, Glasgow time?
No point evaluating it too eagerly, is there?

P.S.: In case one day you become interested in what this Scottish Korriban gibberish was, you can message me, Jedi. We might even talk about the time when a type check failed for me because I am a dumbo and made a mistake in the proof code. TLDR: Languages are one of the deepest rabbit holes. For me though Haskell and Agda kind of undirtied the concepts a tiny bit, because... normal language courses always went too deep into lexing/parsing or they presented call-by-value as 'the way'. Until I tried the other side, I did not understand why certain things must be the way they are. Well, I'm still dumb, but at least now I've seen other ways. And that outside perspective was a - really tiny bit - useful, at least for me.