Compilation for LLMs: Why a Language for Models Needs Native Code

Cranelift JIT, 2.8--5.9x Faster Than Python, and Why It Matters for AI Agents


Who this is for. If you're building AI agents that generate and execute code, or want to understand why compiled LLM output isn't science fiction but working technology -- read on. All terms explained inline and in the glossary.


In previous articles, we showed how to cut tokens by 46% and guarantee syntactic correctness. But there's a third problem: generated code must not only be short and correct -- it must be fast.

Context: LLM Agents Write and Run Code

Claude Code, Cursor, Devin, OpenAI Codex -- these tools don't just generate code. They execute it: run tests, process data, call APIs. The cycle "generate -> run -> analyze result -> repeat" is the foundation of agentic workflows.

Agentic workflows -- an approach where an LLM acts as an autonomous "agent": receives a task, breaks it into steps, writes code, runs it, analyzes the result, and adjusts.

The problem: almost all agents generate Python. And Python is interpreted.

Interpreted language -- a language whose code is executed "line by line" by an interpreter, without prior compilation to machine code. Interpreted languages are simpler but 10--100x slower than compiled ones.

This means: every run pays the cost of the CPython interpreter (slow, and effectively single-threaded under the GIL), the runtime cannot optimize ahead of time because types are unknown until execution (duck typing), and serious computation has to be delegated to C-based libraries (NumPy, pandas).

Duck typing -- Python's principle: "if it walks like a duck and quacks like a duck, it's a duck." Type errors are discovered only at runtime.
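A minimal Python sketch of what that definition means in practice: the function below declares no types, works for anything with the right attributes, and fails only when actually called with the wrong thing.

```python
def area(shape):
    # No declared types: works for any object with .width and .height
    return shape.width * shape.height

class Rect:
    def __init__(self, w, h):
        self.width, self.height = w, h

print(area(Rect(3, 4)))  # 12 -- "walks like a duck, quacks like a duck"

try:
    area("not a shape")  # the type error surfaces only here, at runtime
except AttributeError as e:
    print("runtime error:", e)
```

A compiler with static types would reject the second call before the program ever ran; the interpreter only discovers it mid-execution.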

The Solution: JIT Compilation

What if LLM-generated code compiles to native machine code in milliseconds and runs at C speed?

JIT (Just-In-Time) compilation -- compiling code to machine instructions immediately before execution, "on the fly." No separate build step. LLM generates code -> JIT compiles in milliseconds -> native execution speed.

LLM generates code (.sno)
    |
Parser -> AST -> Type Check -> Core IR
    |
Cranelift JIT -> native x86-64 machine code
    |
Execution at C/Rust speed (no interpreter)

The entire cycle -- from text to native code -- takes < 100 ms.

Why Cranelift, Not LLVM

LLVM -- the industry standard, used by Clang (C/C++), Rust, Swift, and Julia. It generates very fast code but compiles slowly, is written in C++, and pulls in gigabytes of dependencies.

Cranelift -- written in pure Rust. It compiles roughly 10x faster than LLVM and generates code at about 86% of LLVM's quality. That trade-off makes it ideal for JIT.

| Criterion | LLVM | Cranelift |
| --- | --- | --- |
| Language | C++ | Rust |
| Compilation speed | 1x (baseline) | ~10x faster |
| Generated code quality | 100% | ~86% |
| Dependencies | gigabytes | a plain `cargo build` |
| Ideal for | AOT compilation | JIT compilation |

Benchmarks: Synoema JIT vs Python vs C++

Methodology: median of 5 measured runs, after 3 discarded warm-up runs. All times include process startup; Synoema times include JIT compilation.
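The stated methodology can be sketched as a small Python harness (warm-ups discarded, median reported). Note one difference: the article's figures include process startup, which an in-process timer like this cannot capture.

```python
import statistics
import time

def bench(fn, warmup=3, runs=5):
    """Discard `warmup` runs, then return the median of `runs` timings in ms."""
    for _ in range(warmup):
        fn()  # warm caches, JITs, allocators -- result thrown away
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000)  # milliseconds
    return statistics.median(times)

print(f"{bench(lambda: sorted(range(10_000, 0, -1))):.2f} ms")
```

The median is used instead of the mean so a single outlier run (a GC pause, an OS scheduler hiccup) does not skew the result.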

Full Suite (10 tasks)

| Task | C++ | Synoema JIT | Python | Synoema vs Python |
| --- | --- | --- | --- | --- |
| quicksort | 1.4 ms | 6.0 ms | 16.7 ms | 2.8x |
| mergesort | 2.1 ms | 6.6 ms | 17.4 ms | 2.6x |
| binary_search | 2.1 ms | 7.4 ms | 16.7 ms | 2.3x |
| tree_traverse | 1.5 ms | 6.5 ms | 17.0 ms | 2.6x |
| filter_map | 2.3 ms | 5.2 ms | 16.6 ms | 3.2x |
| collatz | 2.3 ms | 5.7 ms | 16.4 ms | 2.9x |
| gcd | 2.4 ms | 5.6 ms | 16.8 ms | 3.0x |
| fizzbuzz | 1.7 ms | 5.7 ms | 16.8 ms | 3.0x |
| matrix_mult | 1.5 ms | 8.4 ms | 17.6 ms | 2.1x |
| string_ops | 2.0 ms | 5.1 ms | 16.3 ms | 3.2x |
| **Average** | **1.9 ms** | **6.2 ms** | **16.8 ms** | **2.8x** |

Compute-Heavy Tasks

| Task | Python | Synoema JIT | Speedup |
| --- | --- | --- | --- |
| fib(30) | 277 ms | 47 ms | 5.9x |
| collatz (10K) | 505 ms | 90 ms | 5.6x |
| gcd (100K) | 143 ms | 83 ms | 1.7x |
| **Average** | | | **4.4x** |
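For context on the Python side of the fib(30) row, here is the naive doubly-recursive Fibonacci conventionally used in this kind of benchmark (an assumption: the suite's exact source isn't shown). Absolute timings will vary by machine, but the exponential call tree is what makes it a pure interpreter-overhead stress test.

```python
import time

def fib(n):
    # Naive double recursion: ~1.6^n calls, almost all interpreter dispatch
    return n if n < 2 else fib(n - 1) + fib(n - 2)

t0 = time.perf_counter()
result = fib(30)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(result)             # 832040
print(f"{elapsed_ms:.0f} ms")
```

Every one of those ~2.7 million calls pays Python's per-call bytecode-dispatch cost, which is exactly the overhead a JIT compiling to native code eliminates.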

What the Numbers Mean

Micro-benchmarks: 2.1--3.2x faster. Startup overhead dominates.

Compute-heavy tasks: up to 5.9x faster. JIT-compiled native code pulls ahead as startup cost amortizes.

C++ context: C++ runs ~3x faster than Synoema JIT on average. That is expected: Cranelift targets roughly 86% of LLVM's code quality, and Synoema's times also include JIT compilation. The trade-off: Synoema goes from source text to native code in < 100 ms, with no separate build step.
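The amortization argument can be made concrete with a toy cost model: total time is a fixed overhead plus a per-iteration cost. All numbers below are illustrative placeholders, not measurements from the benchmark suite.

```python
# Toy model: total(n) = fixed overhead + per-iteration cost * n
# (numbers are invented for illustration, not measured)
py_startup_ms, py_per_iter_us = 15.0, 25.0    # interpreter start, interpreted loop body
jit_startup_ms, jit_per_iter_us = 55.0, 2.5   # startup + JIT compile, native loop body

def total_ms(startup_ms, per_iter_us, n):
    return startup_ms + per_iter_us * n / 1000

for n in (1_000, 100_000, 10_000_000):
    py, jit = total_ms(py_startup_ms, py_per_iter_us, n), total_ms(jit_startup_ms, jit_per_iter_us, n)
    print(f"n={n:>10}: python {py:>9.1f} ms  jit {jit:>8.1f} ms")
```

At small n the JIT's higher fixed cost dominates (micro-benchmarks, modest speedups); as n grows the lower per-iteration cost takes over, matching the larger speedups on compute-heavy tasks.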

Architecture Pipeline

Source code (.sno)
  |
  +-- Lexer (735 lines, 82 tests)
  +-- Parser (1,672 lines, 43 tests) -- Pratt parser -> AST
  +-- Type Checker (1,908 lines, 61 tests) -- Hindley-Milner
  +-- Core IR (1,536 lines, 44 tests) -- System F
  +-- Diagnostics -- structured errors, LLM hints
  +-- Backend:
      +-- Interpreter (1,894 lines, 119 tests)
      +-- Cranelift JIT (3,044 lines, 126 tests)

8 crates, ~12,000 lines of Rust, 890+ tests, 0 errors.

What This Means for AI Agents

With Python: the LLM generates a script (200 tokens, 1.5 s) -> Python processes the data (12 s) -> total ~15 seconds.

With Synoema: the LLM generates .sno code (108 tokens, 0.8 s) -> JIT compiles (50 ms) -> native execution (3 s) -> total ~4 seconds.

Savings: 73% time, 46% tokens, zero dependencies.
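Those percentages follow directly from the totals above and can be sanity-checked in a few lines:

```python
python_total_s, synoema_total_s = 15.0, 4.0   # end-to-end times from the scenario above
python_tokens, synoema_tokens = 200, 108      # generated-code token counts

time_saved = 1 - synoema_total_s / python_total_s
tokens_saved = 1 - synoema_tokens / python_tokens
print(f"time saved:   {time_saved:.0%}")    # 73%
print(f"tokens saved: {tokens_saved:.0%}")  # 46%
```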

What's Changed Since We Started

  • 890+ tests (from 264), all passing, 0 warnings
  • JIT supports: closures, records, ADTs, pattern matching, modules, TCO, string stdlib, float arithmetic, type class dispatch
  • Prelude: Result type with combinators (map_ok, unwrap, and_then)
  • MCP server: npx synoema-mcp integrates into LLM toolchains
  • Region inference: memory management without GC
  • Diagnostics: structured errors with LLM-friendly hints

Try It

cargo build --release -p synoema-repl
cargo run -p synoema-repl -- jit examples/quicksort.sno
cargo run -p synoema-repl -- eval "6 * 7"

Source: github.com/Delimitter/synoema

What's Next

Next: Hindley-Milner -- 100% type safety with zero annotations. This is what makes type-guided constrained decoding possible.


Part of Token Economics of Code series by @andbubnov.
