Verbose is a small experimental language I'm building. Its compiler proves properties about your code — like termination — and emits tiny, readable x86-64 machine code: no runtime, no GC, no libc. This post stands on its own (you don't need the rest of the series). What it's about: I'm now writing a Verbose compiler in Verbose itself, and this is the foundation brick — how you represent a program as data so a compiler can work on it.
(English version of an article from my French series, originally on arcker.org.)
After cryptography, we take on something more vertiginous: a Verbose compiler written in Verbose. The language starting to describe itself.
Let's be honest up front — it matters. This is not (yet) verbosec compiling the entirety of its own source. What exists today is a complete front end — tokenizer, parser, analyses, interpreter, type checker — written in Verbose, for a toy subset of the language. The whole thing compiled by verbosec to native machine code. Not an interpreted demo: a ~60 KB ELF binary that reads your program and tells you what's wrong with it. That's examples/vexprparse.verbose: 102 concepts, 219 rules.
We'll walk through it brick by brick. This chapter lays the foundation without which nothing else exists: how to represent a program.
Why write the compiler in Verbose?
The question deserves an answer, because it isn't just an exercise — it touches Verbose's whole thesis.
Today, the compiler (verbosec) is written in Rust. And some of the logic — certain primitives — is Rust that emits x86-64 directly, with no Verbose source. The concrete consequence: to audit a Verbose binary, to really understand what it does, at some point you have to read Rust. And trust that Rust — and whoever, or whatever, wrote it.
That's precisely what Verbose refuses. The whole series rests on four words: you don't trust, you verify. You read the source, declared and proven. If the path from source to binary runs through unverifiable Rust, trust leaks out there.
Writing the front end in Verbose moves that logic into the language itself: the tokenizer, the parser, the analyses become a .verbose file, verified under Verbose's proof regime, then compiled native. The auditor reads Verbose, not Rust. The remaining Rust shrinks to a small, stable, trusted-once base (the verifier). Per-binary trust moves from Rust to the proven source.
And it's the ultimate dogfooding: a compiler is the hardest thing to express. If Verbose can describe its own front end, under its own proof regime, then the language isn't a toy — it holds up on the most demanding task there is.
From text to a tree
A compiler can do nothing with flat text. x + y * 2, to a human, is a string of characters; to a compiler, it's a structure — a tree, where the multiplication nests under the addition (operator precedence):
The text "x + y * 2" is really a tree:
( + )
/ \
x ( * )
/ \
y 2
Everything starts there. Before evaluating, type-checking, or catching an undefined variable — you first have to turn the text into that tree. That's the AST (Abstract Syntax Tree). And to build it, you need a way to represent a tree as data.
A tree is a recursive sum type
This is where the earlier chapters pay off. A tree is declared in Verbose as a sum type — a type that can take several shapes — some of whose shapes reference themselves:
concept Ast
variants:
AstNum of (value : number)
AstVar of (start : number, len : number)
AstBin of (op : number, lhs : Ast, rhs : Ast)
AstNeg of (inner : Ast)
AstIf of (cond : Ast, thn : Ast, els : Ast)
AstCall of (callee_start : number, callee_len : number, args : ArgList)
...
Read AstBin: a binary operation holds an operator, a left subtree Ast, and a right subtree Ast. The type contains itself. That's the recursion of a tree: an addition whose two sides are, themselves, expressions. AstIf holds three (condition, then branch, else branch). AstNum and AstVar are leaves — they hold no other Ast.
Our example then becomes, exactly:
AstBin( + ,
AstVar(x),
AstBin( * , AstVar(y), AstNum(2)) )
The tree drawn above, written as a value. And a.b.c? AstField(AstField(AstVar(a), b), c) — the nesting follows the structure.
No pointers: an index arena
One problem remains. Verbose has no heap and no pointers — one of the reasons its binaries are so small and so verifiable. So how do you build a tree of arbitrary size?
The answer: an arena. All nodes live in a single bounded space, and a node points to its children by their index, not by a pointer.
concept_group VExpr [max_depth: 4096, max_nodes: 65535]
arena: [0] AstVar(x)
[1] AstVar(y)
[2] AstNum(2)
[3] AstBin( * , lhs=1, rhs=2) ← references indices 1 and 2
[4] AstBin( + , lhs=0, rhs=3) ← the root
The tree is built bottom-up: leaves first, then the nodes that link them. max_depth: 4096, max_nodes: 65535 aren't decorative — they're the static bounds the verifier needs to prove everything stays finite. No dynamic allocation, no possible overflow, and yet a tree of any shape.
In the same group live the tokens, the environments, and the diagnostics — all variants of VExpr, all linked by index. One arena for the whole front end.
Why this brick first
Because everything else plugs into it. The tokenizer will produce Tokens in this arena. The parser will consume them to build Asts. The analyses will walk the tree to find your mistakes. The interpreter will descend it to compute a result. Without a way to represent the tree — recursive, bounded, verifiable — there's no compiler at all.
And it's the direct payoff of what we built before: the recursion of chapter 1, the termination proofs of chapter 3. An AST is the recursive structure par excellence — and Verbose represents it under the same guarantees as everything else: bounded, pointerless, proven finite.
The program has become data. The next chapter builds it from raw text: the tokenizer.
Originally published on arcker.org, where the full series lives.
Top comments (0)