Compilation for LLMs: Why a Language for Models Needs Native Code

Cranelift JIT, 2.8--5.9x Faster Than Python, and Why It Matters for AI Agents


Who this is for. If you're building AI agents that generate and execute code, or want to understand why compiled LLM output isn't science fiction but working technology -- read on. All terms explained inline and in the glossary.


In previous articles, we showed how to cut tokens by 46% and guarantee syntactic correctness. But there's a third problem: generated code must not only be short and correct -- it must be fast.

Context: LLM Agents Write and Run Code

Claude Code, Cursor, Devin, OpenAI Codex -- these tools don't just generate code. They execute it: run tests, process data, call APIs. The cycle "generate -> run -> analyze result -> repeat" is the foundation of agentic workflows.

Agentic workflows -- an approach where an LLM acts as an autonomous "agent": receives a task, breaks it into steps, writes code, runs it, analyzes the result, and adjusts.

The problem: almost all agents generate Python. And Python is interpreted.

Interpreted language -- a language whose code is executed "line by line" by an interpreter, without prior compilation to machine code. Interpreted languages are simpler but 10--100x slower than compiled ones.

This means: every run pays the cost of the CPython interpreter (slow, and effectively single-threaded under the GIL), the runtime cannot optimize ahead of time because types are unknown until execution (duck typing), and serious computation has to be delegated to C-based libraries (NumPy, pandas).

Duck typing -- Python's principle: "if it walks like a duck and quacks like a duck, it's a duck." Type errors are discovered only at runtime.
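A minimal Python sketch of what that definition means in practice: the function below declares no types, works for anything with the right attributes, and fails only when actually called with the wrong thing.

```python
def area(shape):
    # No declared types: works for any object with .width and .height
    return shape.width * shape.height

class Rect:
    def __init__(self, w, h):
        self.width, self.height = w, h

print(area(Rect(3, 4)))  # 12 -- "walks like a duck, quacks like a duck"

try:
    area("not a shape")  # the type error surfaces only here, at runtime
except AttributeError as e:
    print("runtime error:", e)
```

A compiler with static types would reject the second call before the program ever ran; the interpreter only discovers it mid-execution.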

The Solution: JIT Compilation

What if LLM-generated code compiles to native machine code in milliseconds and runs at C speed?

JIT (Just-In-Time) compilation -- compiling code to machine instructions immediately before execution, "on the fly." No separate build step. LLM generates code -> JIT compiles in milliseconds -> native execution speed.

LLM generates code (.sno)
    |
Parser -> AST -> Type Check -> Core IR
    |
Cranelift JIT -> native x86-64 machine code
    |
Execution at C/Rust speed (no interpreter)

The entire cycle -- from text to native code -- takes < 100 ms.

Why Cranelift, Not LLVM

LLVM -- the industry standard, used by Clang (C/C++), Rust, Swift, and Julia. It generates very fast code but compiles slowly, is written in C++, and pulls in gigabytes of dependencies.

Cranelift -- written in pure Rust. It compiles roughly 10x faster than LLVM and generates code at about 86% of LLVM's quality. That trade-off makes it ideal for JIT.

| Criterion | LLVM | Cranelift |
| --- | --- | --- |
| Language | C++ | Rust |
| Compilation speed | 1x (baseline) | ~10x faster |
| Generated code quality | 100% | ~86% |
| Dependencies | gigabytes | a plain `cargo build` |
| Ideal for | AOT compilation | JIT compilation |

Benchmarks: Synoema JIT vs Python vs C++

Methodology: median of 5 measured runs, after 3 discarded warm-up runs. All times include process startup; Synoema times include JIT compilation.
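The stated methodology can be sketched as a small Python harness (warm-ups discarded, median reported). Note one difference: the article's figures include process startup, which an in-process timer like this cannot capture.

```python
import statistics
import time

def bench(fn, warmup=3, runs=5):
    """Discard `warmup` runs, then return the median of `runs` timings in ms."""
    for _ in range(warmup):
        fn()  # warm caches, JITs, allocators -- result thrown away
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000)  # milliseconds
    return statistics.median(times)

print(f"{bench(lambda: sorted(range(10_000, 0, -1))):.2f} ms")
```

The median is used instead of the mean so a single outlier run (a GC pause, an OS scheduler hiccup) does not skew the result.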

Full Suite (10 tasks)

| Task | C++ | Synoema JIT | Python | Synoema vs Python |
| --- | --- | --- | --- | --- |
| quicksort | 1.4 ms | 6.0 ms | 16.7 ms | 2.8x |
| mergesort | 2.1 ms | 6.6 ms | 17.4 ms | 2.6x |
| binary_search | 2.1 ms | 7.4 ms | 16.7 ms | 2.3x |
| tree_traverse | 1.5 ms | 6.5 ms | 17.0 ms | 2.6x |
| filter_map | 2.3 ms | 5.2 ms | 16.6 ms | 3.2x |
| collatz | 2.3 ms | 5.7 ms | 16.4 ms | 2.9x |
| gcd | 2.4 ms | 5.6 ms | 16.8 ms | 3.0x |
| fizzbuzz | 1.7 ms | 5.7 ms | 16.8 ms | 3.0x |
| matrix_mult | 1.5 ms | 8.4 ms | 17.6 ms | 2.1x |
| string_ops | 2.0 ms | 5.1 ms | 16.3 ms | 3.2x |
| **Average** | **1.9 ms** | **6.2 ms** | **16.8 ms** | **2.8x** |

Compute-Heavy Tasks

| Task | Python | Synoema JIT | Speedup |
| --- | --- | --- | --- |
| fib(30) | 277 ms | 47 ms | 5.9x |
| collatz (10K) | 505 ms | 90 ms | 5.6x |
| gcd (100K) | 143 ms | 83 ms | 1.7x |
| **Average** | | | **4.4x** |
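For context on the Python side of the fib(30) row, here is the naive doubly-recursive Fibonacci conventionally used in this kind of benchmark (an assumption: the suite's exact source isn't shown). Absolute timings will vary by machine, but the exponential call tree is what makes it a pure interpreter-overhead stress test.

```python
import time

def fib(n):
    # Naive double recursion: ~1.6^n calls, almost all interpreter dispatch
    return n if n < 2 else fib(n - 1) + fib(n - 2)

t0 = time.perf_counter()
result = fib(30)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(result)             # 832040
print(f"{elapsed_ms:.0f} ms")
```

Every one of those ~2.7 million calls pays Python's per-call bytecode-dispatch cost, which is exactly the overhead a JIT compiling to native code eliminates.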

What the Numbers Mean

Micro-benchmarks: 2.1--3.2x faster. Startup overhead dominates.

Compute-heavy tasks: up to 5.9x faster. JIT-compiled native code pulls ahead as startup cost amortizes.

C++ context: C++ runs ~3x faster than Synoema JIT on average. That is expected: Cranelift targets roughly 86% of LLVM's code quality, and Synoema's times also include JIT compilation. The trade-off: Synoema goes from source text to native code in < 100 ms, with no separate build step.
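The amortization argument can be made concrete with a toy cost model: total time is a fixed overhead plus a per-iteration cost. All numbers below are illustrative placeholders, not measurements from the benchmark suite.

```python
# Toy model: total(n) = fixed overhead + per-iteration cost * n
# (numbers are invented for illustration, not measured)
py_startup_ms, py_per_iter_us = 15.0, 25.0    # interpreter start, interpreted loop body
jit_startup_ms, jit_per_iter_us = 55.0, 2.5   # startup + JIT compile, native loop body

def total_ms(startup_ms, per_iter_us, n):
    return startup_ms + per_iter_us * n / 1000

for n in (1_000, 100_000, 10_000_000):
    py, jit = total_ms(py_startup_ms, py_per_iter_us, n), total_ms(jit_startup_ms, jit_per_iter_us, n)
    print(f"n={n:>10}: python {py:>9.1f} ms  jit {jit:>8.1f} ms")
```

At small n the JIT's higher fixed cost dominates (micro-benchmarks, modest speedups); as n grows the lower per-iteration cost takes over, matching the larger speedups on compute-heavy tasks.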

Architecture Pipeline

Source code (.sno)
  |
  +-- Lexer (735 lines, 82 tests)
  +-- Parser (1,672 lines, 43 tests) -- Pratt parser -> AST
  +-- Type Checker (1,908 lines, 61 tests) -- Hindley-Milner
  +-- Core IR (1,536 lines, 44 tests) -- System F
  +-- Diagnostics -- structured errors, LLM hints
  +-- Backend:
      +-- Interpreter (1,894 lines, 119 tests)
      +-- Cranelift JIT (3,044 lines, 126 tests)

8 crates, ~12,000 lines of Rust, 890+ tests, 0 errors.

What This Means for AI Agents

With Python: the LLM generates a script (200 tokens, 1.5 s) -> Python processes the data (12 s) -> total ~15 seconds.

With Synoema: the LLM generates .sno code (108 tokens, 0.8 s) -> JIT compiles (50 ms) -> native execution (3 s) -> total ~4 seconds.

Savings: 73% time, 46% tokens, zero dependencies.
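Those percentages follow directly from the totals above and can be sanity-checked in a few lines:

```python
python_total_s, synoema_total_s = 15.0, 4.0   # end-to-end times from the scenario above
python_tokens, synoema_tokens = 200, 108      # generated-code token counts

time_saved = 1 - synoema_total_s / python_total_s
tokens_saved = 1 - synoema_tokens / python_tokens
print(f"time saved:   {time_saved:.0%}")    # 73%
print(f"tokens saved: {tokens_saved:.0%}")  # 46%
```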

What's Changed Since We Started

  • 890+ tests (from 264), all passing, 0 warnings
  • JIT supports: closures, records, ADTs, pattern matching, modules, TCO, string stdlib, float arithmetic, type class dispatch
  • Prelude: Result type with combinators (map_ok, unwrap, and_then)
  • MCP server: npx synoema-mcp integrates into LLM toolchains
  • Region inference: memory management without GC
  • Diagnostics: structured errors with LLM-friendly hints

Try It

cargo build --release -p synoema-repl
cargo run -p synoema-repl -- jit examples/quicksort.sno
cargo run -p synoema-repl -- eval "6 * 7"

Source: github.com/Delimitter/synoema

What's Next

Next: Hindley-Milner -- 100% type safety with zero annotations. This is what makes type-guided constrained decoding possible.


Part of Token Economics of Code series by @andbubnov.
