DEV Community

TheRemyyy

I Built a Programming Language with an LLVM Backend at 15. Here's How It Actually Works

I wanted something with Java's clean syntax, Rust's memory safety, and C's raw speed. Nothing out there gave me all three, so I just built it myself.

That's Apex. A statically typed language that compiles to native machine code through LLVM. Not an interpreter, not a transpiler. A real compiler, written in Rust, finished in about a month.

Here's how the whole thing works and what actually hurt building it.


The Pipeline

Every compiler is just a chain of transformations. You keep transforming source code until you get something the machine can run.

Source (.apex) → Lexer → Parser → AST → Type Checker → Borrow Checker → LLVM Codegen → Native Binary

Each stage does one thing and hands off to the next.
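In Rust (the language the compiler is written in), that hand-off shape can be sketched like this. Every name and signature here is hypothetical, not the real compiler's; only the stage order matches the pipeline above.

```rust
// Hypothetical stage signatures. Each stage consumes only the
// previous stage's output, which keeps stages independently testable.
struct Tokens;
struct Ast;
struct TypedAst;

fn lex(_source: &str) -> Result<Tokens, String> { Ok(Tokens) }
fn parse(_tokens: Tokens) -> Result<Ast, String> { Ok(Ast) }
fn type_check(_ast: Ast) -> Result<TypedAst, String> { Ok(TypedAst) }
fn borrow_check(_typed: &TypedAst) -> Result<(), String> { Ok(()) }
// In the real pipeline this emits LLVM IR and gets back a native object.
fn codegen(_typed: TypedAst) -> Result<Vec<u8>, String> { Ok(Vec::new()) }

fn compile(source: &str) -> Result<Vec<u8>, String> {
    let tokens = lex(source)?;
    let ast = parse(tokens)?;
    let typed = type_check(ast)?;
    borrow_check(&typed)?;
    codegen(typed)
}
```

An error at any stage short-circuits the whole chain, which is exactly what you want: later stages never see input the earlier stages didn't accept.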

Lexer

Reads raw text, turns it into tokens. KEYWORD_IF, INTEGER_LITERAL, LBRACE. Nothing clever, just recognition. Smallest stage, but if you get it wrong everything downstream breaks in confusing ways.
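A toy version of that recognition step, reusing a few of the token kinds named above. The rest of this is illustrative, not the real lexer:

```rust
// Minimal sketch of token recognition. The real lexer covers far
// more token kinds, but the loop has the same shape.
#[derive(Debug, PartialEq)]
enum Token {
    KeywordIf,
    IntegerLiteral(i64),
    LBrace,
    RBrace,
    Ident(String),
}

fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            '{' => { tokens.push(Token::LBrace); chars.next(); }
            '}' => { tokens.push(Token::RBrace); chars.next(); }
            c if c.is_ascii_digit() => {
                let mut n = String::new();
                while let Some(&d) = chars.peek() {
                    if d.is_ascii_digit() { n.push(d); chars.next(); } else { break; }
                }
                tokens.push(Token::IntegerLiteral(n.parse().unwrap()));
            }
            c if c.is_alphabetic() => {
                let mut w = String::new();
                while let Some(&a) = chars.peek() {
                    if a.is_alphanumeric() { w.push(a); chars.next(); } else { break; }
                }
                tokens.push(if w == "if" { Token::KeywordIf } else { Token::Ident(w) });
            }
            // Skip whitespace and anything unrecognized; a real lexer
            // would report an error with a source position here.
            _ => { chars.next(); }
        }
    }
    tokens
}
```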

Parser and AST

Takes the tokens and builds an Abstract Syntax Tree. A tree representing the actual structure of the program. A function call becomes a node with children for each argument. An if statement becomes a node with a condition and branches.

I wrote it as a hand-rolled recursive descent parser instead of using a generator. More work but full control over error messages, which matters when someone actually tries to use the language.

function add(a: Integer, b: Integer): Integer {
    return a + b;
}

That becomes a function node with the name, typed parameters, return type, and a body with a return statement wrapping a binary add expression.
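Sketched as hypothetical Rust types (not the compiler's actual definitions), that node could look like:

```rust
// Illustrative AST types; field and variant names are mine.
enum Type { Integer }

enum Expr {
    Var(String),
    BinaryAdd(Box<Expr>, Box<Expr>),
}

enum Stmt { Return(Expr) }

struct Function {
    name: String,
    params: Vec<(String, Type)>,
    return_type: Type,
    body: Vec<Stmt>,
}

// What `function add(a: Integer, b: Integer): Integer { return a + b; }`
// parses into.
fn add_node() -> Function {
    Function {
        name: "add".into(),
        params: vec![("a".into(), Type::Integer), ("b".into(), Type::Integer)],
        return_type: Type::Integer,
        body: vec![Stmt::Return(Expr::BinaryAdd(
            Box::new(Expr::Var("a".into())),
            Box::new(Expr::Var("b".into())),
        ))],
    }
}
```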

Type Checker

The biggest file in the compiler, around 2200 lines. Walks the AST and verifies every expression has a valid type. Infers types where it can, errors where things don't line up. Generics and interface checking both live here.
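The core of that walk can be shown with a toy checker over a two-type expression language. This is a sketch of the idea, not the real 2200-line implementation:

```rust
// Toy AST walk: infer a type for each expression, error when operand
// types don't line up. The real checker also handles generics and
// interfaces, which is where most of the 2200 lines go.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty { Integer, Float }

enum Expr {
    Int(i64),
    Float(f64),
    Add(Box<Expr>, Box<Expr>),
}

fn infer(e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::Int(_) => Ok(Ty::Integer),
        Expr::Float(_) => Ok(Ty::Float),
        Expr::Add(l, r) => {
            let (lt, rt) = (infer(l)?, infer(r)?);
            if lt == rt {
                Ok(lt)
            } else {
                Err(format!("type mismatch: {:?} + {:?}", lt, rt))
            }
        }
    }
}
```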

Borrow Checker

This was honestly the hardest part for me. Apex has Rust-inspired ownership and borrowing, and getting the memory safety rules right took more brainpower than anything else: making sure everything allocates and deallocates correctly, and tracking what's borrowed where.

The syntax ended up cleaner than Rust though:

function update(borrow mut player: Player): None {
    player.score = player.score + 1;
    return None;
}

All enforced at compile time, no runtime cost.
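A heavily simplified sketch of what compile-time borrow tracking can look like. The OwnershipState variant names echo the real checker's enum described in the comments on this post; everything around them is illustrative:

```rust
use std::collections::HashMap;

// Variants mirror the checker's OwnershipState enum; the Span payloads
// of the real Moved/MutBorrowed variants are dropped here for brevity.
#[derive(Debug, Clone, PartialEq)]
enum OwnershipState {
    Owned,
    Moved,
    Borrowed(usize), // count of active shared borrows
    MutBorrowed,
}

struct BorrowChecker {
    vars: HashMap<String, OwnershipState>,
}

impl BorrowChecker {
    fn new() -> Self { Self { vars: HashMap::new() } }

    fn declare(&mut self, name: &str) {
        self.vars.insert(name.to_string(), OwnershipState::Owned);
    }

    // `borrow mut x` is only legal while x is fully owned.
    fn borrow_mut(&mut self, name: &str) -> Result<(), String> {
        let state = self.vars.get(name).cloned();
        match state {
            Some(OwnershipState::Owned) => {
                self.vars.insert(name.to_string(), OwnershipState::MutBorrowed);
                Ok(())
            }
            Some(s) => Err(format!("cannot mutably borrow `{}` while {:?}", name, s)),
            None => Err(format!("unknown variable `{}`", name)),
        }
    }

    // Shared borrows stack; they only conflict with moves and mut borrows.
    fn borrow_shared(&mut self, name: &str) -> Result<(), String> {
        let state = self.vars.get(name).cloned();
        match state {
            Some(OwnershipState::Owned) => {
                self.vars.insert(name.to_string(), OwnershipState::Borrowed(1));
                Ok(())
            }
            Some(OwnershipState::Borrowed(n)) => {
                self.vars.insert(name.to_string(), OwnershipState::Borrowed(n + 1));
                Ok(())
            }
            Some(s) => Err(format!("cannot borrow `{}` while {:?}", name, s)),
            None => Err(format!("unknown variable `{}`", name)),
        }
    }
}
```

The real checker also tracks scope depth so borrows end when their lexical scope does; that part is omitted here.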

LLVM Codegen

LLVM is the compiler backend used by Clang, Rust, Swift, and others. You emit its intermediate representation (IR), and it handles optimization and compilation down to native code.

The codegen is around 7500 lines. Every language feature needs its own translation logic. Classes become structs with vtables for dynamic dispatch. Generics use monomorphization. Async/await transforms into a state machine. String interpolation like "Hello {name}" lowers into runtime format calls.
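As one concrete example, here is a toy lowering of an interpolated string literal into literal text pieces and expression holes, roughly the input a codegen pass would then turn into runtime format calls. All names here are hypothetical:

```rust
// Split an interpolated literal like "Hello {name}" into the pieces
// codegen needs: raw text, and expressions to evaluate and format at
// runtime. Error handling (unclosed braces, escapes) is omitted.
#[derive(Debug, PartialEq)]
enum Piece {
    Text(String),
    Hole(String), // expression source inside the braces
}

fn lower_interpolation(lit: &str) -> Vec<Piece> {
    let mut pieces = Vec::new();
    let mut text = String::new();
    let mut chars = lit.chars();
    while let Some(c) = chars.next() {
        if c == '{' {
            if !text.is_empty() {
                pieces.push(Piece::Text(std::mem::take(&mut text)));
            }
            let mut expr = String::new();
            for d in chars.by_ref() {
                if d == '}' { break; }
                expr.push(d);
            }
            pieces.push(Piece::Hole(expr));
        } else {
            text.push(c);
        }
    }
    if !text.is_empty() {
        pieces.push(Piece::Text(text));
    }
    pieces
}
```

Codegen would then emit one runtime format call per `Hole` and concatenate the results with the `Text` pieces.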

The namespace and class system gave me real problems here. The way I had globals set up, import resolution kept grabbing the wrong symbols. I had to rewrite and test that part several times before it worked correctly. Not fun, but eventually it did.


The Moment I Almost Gave Up

At some point I ran a benchmark and Apex was doing arithmetic about 2 seconds slower than Rust and C. I thought the performance was just fundamentally broken.

Turned out I was benchmarking a debug build. No optimizations, no inlining, nothing. Once I added the proper optimization flags the performance ended up very close to Rust and C. That was a relief.


What Apex Looks Like

import std.io.*;

class Calculator {
    mut lastResult: Float;

    constructor() {
        this.lastResult = 0.0;
    }

    function add(a: Float, b: Float): Float {
        result: Float = a + b;
        this.lastResult = result;
        return result;
    }
}

function main(): None {
    calc: Calculator = Calculator();
    result: Float = calc.add(3.0, 4.0);
    println("Result: {result}");
    return None;
}

Classes, generics, interfaces, pattern matching, async/await, string interpolation, file I/O. All compiling to a native binary with no runtime overhead.


What I Actually Learned

You cannot fake understanding when building a compiler. Every shortcut in the type checker causes wrong behavior three stages later. Every ambiguity in your language design eventually shows up as a bug.

The thing that helped most was keeping each stage focused on exactly one job. That pattern shows up everywhere once you start seeing it.

If you want to really understand how programming languages work, build one. Even something tiny. The concepts transfer to everything else.


Source on GitHub at github.com/TheRemyyy/apex-compiler. Questions about any specific part, drop them in the comments.

Top comments (2)

Kai Alder

Hand-rolling a recursive descent parser instead of using a generator was a smart call, especially for error messages. That's the kind of decision that separates "I built a toy" from "I built something someone could actually use." Most people reach for yacc/ANTLR and then spend forever fighting the generated error output.

The debug build benchmarking story made me laugh — I've done the exact same thing and panicked before realizing I forgot -O2. Classic rite of passage.

Curious about the borrow checker implementation — did you go with a graph-based approach for tracking lifetimes, or something closer to how rustc does it with NLL (non-lexical lifetimes)? That's usually where the complexity explodes. 2200 lines for the type checker sounds about right for generics + interfaces, but I'd expect the borrow checker to rival it if you're handling edge cases like borrows across branches.

Solid write-up. The pipeline breakdown is clear enough that someone could use this as a roadmap to build their own.

TheRemyyy

Not graph-based, not NLL. It's scope-based. Borrows are BorrowInfo structs with a scope_depth field, and lifetimes are tied directly to lexical scopes. The core is an OwnershipState enum: Owned, Moved(Span), Borrowed(usize), MutBorrowed(Span).

NLL needs a full CFG with dataflow analysis across basic blocks. I didn't go that far. The checker is an AST walk, not a CFG pass, so borrows across branches and early returns in loops are handled more conservatively than rustc handles them. Valid code occasionally gets rejected in those edge cases.

973 lines total, so smaller than the type checker. The gap closes fast once you add proper CFG-based lifetime inference. That's the obvious next step if I want to match what rustc actually does.