DEV Community

arcker
arcker

Posted on • Originally published at arcker.org

A compiler written in its own language just emitted a standalone Linux executable

Verbose is a small experimental language I'm building. Its compiler proves properties about your code — like termination — and emits tiny, readable x86-64 machine code: no runtime, no GC, no libc. This post stands on its own (you don't need the rest of the series). What it's about: a compiler written in Verbose itself reached the point where a single rule emits a standalone Linux ELF that runs and prints its answer.

Honesty first: this lives on a working branch — not merged, not tagged. It's verified (green test suite, real in-memory execution, a measured benchmark), but it isn't in a released version, and it covers a sub-language (scalar arithmetic), not all of Verbose. More on that at the end.

Two halves of a compiler

A compiler has a half that reads (text → tokens → tree) and a half that writes (tree → machine code). The reading half is the usual story. The new part is the writing half — and the thing that makes it interesting here is that the code generator is itself a Verbose rule. The Rust compiler underneath (verbosec) only runs it; the machine code is emitted by rules written in the language.

Follow factorial all the way down

@proof termination decreasing : n
rule fact (n : number [0, 20]) -> number {
  if n == 0 then 1 else n * fact(n - 1)
}
Enter fullscreen mode Exit fullscreen mode

Note the guardrails that survive all the way to the bytes: @proof termination decreasing : n (the compiler requires proof the recursion ends) and [0, 20] (the declared domain of n). The generator walks this tree and emits, per node:

  n == 0              cmp ; sete            → boolean in a register
  if … then … else …  jz over_then ; <then> ; jmp end ; over_then: <else> ; end:
  n * fact(n - 1)     <compute fact(n-1)> ; imul
  fact(n - 1)         call <offset>         → it calls itself
Enter fullscreen mode Exit fullscreen mode

The jumps carry computed offsets: you only know a jump's distance once both branches are emitted, so the generator runs twice — measure, then write — the way you number a book's pages only after laying it out.

The milestone is that call/ret. When fact calls itself, it's a stack of plates you build up, then unwind while multiplying:

  build up                   unwind (ret)
  fact(3) n=3                3 × 2 = 6   ◄── final
   └ fact(2) n=2             2 × 1 = 2
      └ fact(1) n=1          1 × 1 = 1
         └ fact(0) n=0 ─► 1  (base case)
Enter fullscreen mode Exit fullscreen mode

This is the first time a rule written in Verbose emits machine code that references itself.

From a byte blob to a real executable

Bytes in a buffer don't run on their own — you have to wrap them so Linux will launch them. That wrapper is the ELF format, and the rule now generates the whole thing, like a ready-to-ship parcel:

  ELF64 header         ← "this is a Linux executable"
  LOAD program header  ← "map this into memory, r+x"
  _start               ← call main ; rax → itoa ; sys_write ; sys_exit
  code for main, fact  ← the computation itself
Enter fullscreen mode Exit fullscreen mode

No sections, no symbols, no linked library. A few hundred bytes. The last brick added itoa (int → decimal ASCII) in machine code — without it the binary computed 120 but couldn't show it. Now:

$ ./verbosec examples/factorial.verbose --run lower factorial 5 > a.out
$ chmod +x a.out
$ ./a.out
120
Enter fullscreen mode Exit fullscreen mode

Standalone — no verbosec needed to run the result.

The honest benchmark

The long-standing question — "what does a binary built this way actually buy versus C/Rust/Go?" — finally has a measured answer (Ubuntu 24.04, gcc 13.3, rustc 1.90, go 1.25.4):

  trivial (print a constant, exit)
              size        wall(ms)  RSS(kB)  syscalls
  verbose        512       0.19      352        2
  gcc        706 584       0.24      664       17
  rustc    3 871 984       0.57    2 112       62
  go       2 254 335       1.28    2 132      228

  fib(40) = 102334155  (deep recursion — raw compute)
              size        wall(ms)  RSS(kB)  syscalls
  verbose        635      ~713       352        2
  gcc        706 584      ~135       664       17
  rustc    3 872 832      ~224     2 216       62
  go       2 254 624      ~380     2 232      406
Enter fullscreen mode Exit fullscreen mode

Read it both ways. Wins, by construction: ~1100–1380× smaller than gcc -static, ~6000× vs rustc; two syscalls vs 17–228; RSS down 2–6×. Because there's no runtime — nothing to initialize, no global allocator, no dynamic library. These are audit-defensible: there's almost nothing to audit. Loss, by design: on fib(40), Verbose is ~5× gcc's wall time. The emitted blob has seen no optimizer — no register allocation, no inlining, no constant folding past the AST. That was a decision (a direct, line-by-line auditable emitter, not an LLVM backend), and the compute cost is the price of that legibility — published, not hidden.

What isn't there yet

  • It's a sub-language: scalar arithmetic (+ − × ÷ %, comparisons, and/or, if/else, let, recursive calls). No strings, no I/O, no services, no TLS.
  • It's branch-only, not merged, not tagged.
  • No optimization — deliberately deferred to keep every byte auditable.

The point isn't "Verbose beats gcc." It's "here's exactly what Verbose buys, and what it costs, measured." The read half and the write half of a self-hosting compiler just met one notch closer.

Full walk-through (in French) on arcker.org.

Top comments (0)