Cranelift JIT, 2.8--5.9x Faster Than Python, and Why It Matters for AI Agents
Who this is for. If you're building AI agents that generate and execute code, or want to understand why compiled LLM output isn't science fiction but working technology -- read on. All terms explained inline and in the glossary.
In previous articles, we showed how to cut tokens by 46% and guarantee syntactic correctness. But there's a third problem: generated code must not only be short and correct -- it must be fast.
Context: LLM Agents Write and Run Code
Claude Code, Cursor, Devin, OpenAI Codex -- these tools don't just generate code. They execute it: run tests, process data, call APIs. The cycle "generate -> run -> analyze result -> repeat" is the foundation of agentic workflows.
Agentic workflows -- an approach where an LLM acts as an autonomous "agent": receives a task, breaks it into steps, writes code, runs it, analyzes the result, and adjusts.
The problem: almost all agents generate Python. And Python is interpreted.
Interpreted language -- a language whose code is executed "line by line" by an interpreter, without prior compilation to machine code. Interpreted languages are simpler but 10--100x slower than compiled ones.
This means: every run goes through the CPython interpreter (slow, single-threaded), no code optimization (Python doesn't know types until runtime via duck typing), and serious computation requires C-based libraries (NumPy, pandas).
Duck typing -- Python's principle: "if it walks like a duck and quacks like a duck, it's a duck." Type errors are discovered only at runtime.
The Solution: JIT Compilation
What if LLM-generated code compiles to native machine code in milliseconds and runs at C speed?
JIT (Just-In-Time) compilation -- compiling code to machine instructions immediately before execution, "on the fly." No separate build step. LLM generates code -> JIT compiles in milliseconds -> native execution speed.
```
LLM generates code (.sno)
        |
Parser -> AST -> Type Check -> Core IR
        |
Cranelift JIT -> native x86-64 machine code
        |
Execution at C/Rust speed (no interpreter)
```
The entire cycle -- from text to native code -- takes < 100 ms.
Why Cranelift, Not LLVM
LLVM -- the industry standard. Used in Clang (C/C++), Rust, Swift, Julia. Generates very fast code but compiles slowly. Written in C++, pulls gigabytes of dependencies.
Cranelift -- written in pure Rust. Compiles roughly 10x faster than LLVM and generates code at about 86% of LLVM's quality. Ideal for JIT.
| Criterion | LLVM | Cranelift |
|---|---|---|
| Language | C++ | Rust |
| Compilation speed | 1x | 10x |
| Code quality | 100% | ~86% |
| Dependencies | Gigabytes | cargo build |
| Ideal for | AOT compilation | JIT compilation |
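To make this concrete, here is a sketch of what driving Cranelift as a JIT looks like in Rust. Treat it as pseudocode rather than a drop-in program: it requires the `cranelift`, `cranelift-jit`, and `cranelift-module` crates, and the exact API shifts between crate versions. The shape is the important part: build IR for a function, hand it to the JIT module, and get back a callable native pointer.

```rust
// Sketch only: JIT-compile `fn answer() -> i64 { 42 }` and call it.
// API details vary by cranelift-jit version.
use cranelift::prelude::*;
use cranelift_jit::{JITBuilder, JITModule};
use cranelift_module::{Linkage, Module};

fn main() {
    let mut module = JITModule::new(
        JITBuilder::new(cranelift_module::default_libcall_names()).unwrap(),
    );
    let mut ctx = module.make_context();
    ctx.func.signature.returns.push(AbiParam::new(types::I64));

    // Emit the function body as Cranelift IR.
    let mut fb_ctx = FunctionBuilderContext::new();
    let mut fb = FunctionBuilder::new(&mut ctx.func, &mut fb_ctx);
    let block = fb.create_block();
    fb.switch_to_block(block);
    let forty_two = fb.ins().iconst(types::I64, 42);
    fb.ins().return_(&[forty_two]);
    fb.seal_all_blocks();
    fb.finalize();

    // Compile to native machine code and call through a function pointer.
    let id = module
        .declare_function("answer", Linkage::Export, &ctx.func.signature)
        .unwrap();
    module.define_function(id, &mut ctx).unwrap();
    module.finalize_definitions().unwrap();
    let code = module.get_finalized_function(id);
    let answer: fn() -> i64 = unsafe { std::mem::transmute(code) };
    assert_eq!(answer(), 42);
}
```

Note that there is no separate build step anywhere in this flow: IR construction, compilation, and execution all happen inside one process, which is what keeps the text-to-native cycle under 100 ms.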
Benchmarks: Synoema JIT vs Python vs C++
Methodology: median of 5 runs after 3 discarded warm-up runs. All times include process startup; Synoema times also include JIT compilation.
Full Suite (10 tasks)
| Task | C++ | Synoema JIT | Python | Synoema vs Python |
|---|---|---|---|---|
| quicksort | 1.4 ms | 6.0 ms | 16.7 ms | 2.8x |
| mergesort | 2.1 ms | 6.6 ms | 17.4 ms | 2.6x |
| binary_search | 2.1 ms | 7.4 ms | 16.7 ms | 2.3x |
| tree_traverse | 1.5 ms | 6.5 ms | 17.0 ms | 2.6x |
| filter_map | 2.3 ms | 5.2 ms | 16.6 ms | 3.2x |
| collatz | 2.3 ms | 5.7 ms | 16.4 ms | 2.9x |
| gcd | 2.4 ms | 5.6 ms | 16.8 ms | 3.0x |
| fizzbuzz | 1.7 ms | 5.7 ms | 16.8 ms | 3.0x |
| matrix_mult | 1.5 ms | 8.4 ms | 17.6 ms | 2.1x |
| string_ops | 2.0 ms | 5.1 ms | 16.3 ms | 3.2x |
| Average | 1.9 ms | 6.2 ms | 16.8 ms | 2.8x |
Compute-Heavy Tasks
| Task | Python | Synoema JIT | Speedup |
|---|---|---|---|
| fib(30) | 277 ms | 47 ms | 5.9x |
| collatz (10K) | 505 ms | 90 ms | 5.6x |
| gcd (100K) | 143 ms | 83 ms | 1.7x |
| Average | | | 4.4x |
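The fib(30) row is the classic doubly recursive Fibonacci, the standard compute-heavy micro-benchmark. The benchmarked .sno source isn't shown here, but its workload shape is the following (a minimal Rust rendering, assumed representative):

```rust
// Naive doubly recursive Fibonacci: exponentially many calls, no allocation.
// This call pattern is what the 277 ms vs 47 ms row measures.
fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    // fib(30) triggers roughly 2.7 million recursive calls.
    println!("{}", fib(30)); // 832040
}
```

Pure function-call and integer work like this is exactly where an interpreter's per-operation overhead is largest, which is why the speedup here exceeds the full-suite average.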
What the Numbers Mean
Micro-benchmarks: 2.1--3.2x faster. Startup overhead dominates.
Compute-heavy tasks: up to 5.9x faster. JIT-compiled native code pulls ahead as startup cost amortizes.
C++ context: C++ runs about 3x faster than Synoema JIT on average -- expected, since Cranelift generates code at ~86% of LLVM's quality. The trade-off: Synoema compiles in < 100 ms, with no build step.
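The amortization argument can be made concrete with a toy model. The constants below are illustrative assumptions, not measurements: roughly 16 ms of Python process startup, roughly 5 ms of Synoema startup plus JIT, and native code running 10x faster than interpreted Python on the pure compute portion:

```rust
// Toy model: observed speedup as a function of how much pure compute the
// task contains. Constants are illustrative assumptions (see lead-in).
fn speedup(compute_py_ms: f64) -> f64 {
    let py = 16.0 + compute_py_ms;          // startup + interpreted compute
    let sno = 5.0 + compute_py_ms / 10.0;   // startup + JIT + native compute
    py / sno
}

fn main() {
    for &work in &[1.0, 100.0, 1000.0] {
        println!("{} ms of Python compute -> {:.1}x", work, speedup(work));
    }
}
```

With tiny workloads the ratio is dominated by the fixed startup costs; as compute grows, the observed speedup climbs toward the pure-compute ratio. That is the shape the two benchmark tables show.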
Architecture Pipeline
```
Source code (.sno)
  |
  +-- Lexer (735 lines, 82 tests)
  +-- Parser (1,672 lines, 43 tests) -- Pratt parser -> AST
  +-- Type Checker (1,908 lines, 61 tests) -- Hindley-Milner
  +-- Core IR (1,536 lines, 44 tests) -- System F
  +-- Diagnostics -- structured errors, LLM hints
  +-- Backend:
        +-- Interpreter (1,894 lines, 119 tests)
        +-- Cranelift JIT (3,044 lines, 126 tests)
```
8 crates, ~12,000 lines of Rust, 890+ tests, 0 errors.
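As a toy illustration of the front half of that pipeline (not Synoema's actual code), here is a lexer, a Pratt parser, and an AST evaluator for integer expressions with `+` and `*`. A Pratt parser assigns each operator a binding power, which is how `*` binds tighter than `+` without a grammar rule per precedence level:

```rust
// Toy Lexer -> Pratt parser -> AST -> evaluation pipeline. Synoema's real
// pipeline adds a type checker, Core IR, and the Cranelift backend.
#[derive(Debug)]
enum Tok { Num(i64), Plus, Star }

fn lex(src: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            ' ' => { chars.next(); }
            '+' => { chars.next(); toks.push(Tok::Plus); }
            '*' => { chars.next(); toks.push(Tok::Star); }
            '0'..='9' => {
                let mut n = 0i64;
                while let Some(&d) = chars.peek() {
                    match d.to_digit(10) {
                        Some(v) => { n = n * 10 + v as i64; chars.next(); }
                        None => break,
                    }
                }
                toks.push(Tok::Num(n));
            }
            _ => panic!("unexpected character: {c}"),
        }
    }
    toks
}

#[derive(Debug)]
enum Ast { Lit(i64), Add(Box<Ast>, Box<Ast>), Mul(Box<Ast>, Box<Ast>) }

// Pratt parsing: loop while the next operator binds at least as tightly
// as `min_bp`; recurse with a higher minimum for the right-hand side.
fn parse(toks: &[Tok], pos: &mut usize, min_bp: u8) -> Ast {
    let mut lhs = match &toks[*pos] {
        Tok::Num(n) => { *pos += 1; Ast::Lit(*n) }
        t => panic!("expected number, got {t:?}"),
    };
    while *pos < toks.len() {
        let (bp, is_mul) = match toks[*pos] {
            Tok::Plus => (1, false),
            Tok::Star => (2, true),
            _ => break,
        };
        if bp < min_bp { break; }
        *pos += 1;
        let rhs = parse(toks, pos, bp + 1);
        lhs = if is_mul {
            Ast::Mul(Box::new(lhs), Box::new(rhs))
        } else {
            Ast::Add(Box::new(lhs), Box::new(rhs))
        };
    }
    lhs
}

fn eval(ast: &Ast) -> i64 {
    match ast {
        Ast::Lit(n) => *n,
        Ast::Add(a, b) => eval(a) + eval(b),
        Ast::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    let toks = lex("2 + 6 * 7");
    let ast = parse(&toks, &mut 0, 0);
    println!("{}", eval(&ast)); // 44
}
```

Where this toy evaluates the AST directly, Synoema's backends take over at that point: the interpreter walks the Core IR, while the Cranelift backend lowers it to native code first.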
What This Means for AI Agents
With Python: LLM generates script (200 tokens, 1.5s) -> Python processes (12s) -> total ~15 seconds.
With Synoema: LLM generates .sno code (108 tokens, 0.8s) -> JIT (50ms) -> native execution (3s) -> total ~4 seconds.
Savings: 73% time, 46% tokens, zero dependencies.
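The time figure follows directly from the two rounded totals above:

```rust
// Reproduces the "73% time saved" figure from the two agent-loop timelines.
fn main() {
    let python_total = 15.0;  // seconds, Python loop total (rounded)
    let synoema_total = 4.0;  // seconds, Synoema loop total (rounded)
    let saved = 100.0 * (python_total - synoema_total) / python_total;
    println!("{:.0}% time saved", saved);
}
```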
What's Changed Since We Started
- 890+ tests (from 264), all passing, 0 warnings
- JIT supports: closures, records, ADTs, pattern matching, modules, TCO, string stdlib, float arithmetic, type class dispatch
- Prelude: Result type with combinators (map_ok, unwrap, and_then)
- MCP server: `npx synoema-mcp` integrates into LLM toolchains
- Region inference: memory management without GC
- Diagnostics: structured errors with LLM-friendly hints
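Rust's standard `Result` demonstrates the same combinator pattern the Synoema prelude exposes. The correspondence (`map_ok` ~ `map`, plus `and_then` and `unwrap`) is my reading of the names listed above, not a documented mapping:

```rust
use std::num::ParseIntError;

// Two fallible steps chained with and_then, then combined with map:
// the same shape as the prelude's map_ok / and_then / unwrap.
fn parse(s: &str) -> Result<i64, ParseIntError> {
    s.parse()
}

fn main() {
    let v = parse("6")
        .and_then(|a| parse("7").map(|b| a * b)) // combine two fallible results
        .unwrap();                               // extract, panicking on error
    println!("{v}"); // 42
}
```

The value of combinators in agent-generated code is that the error path is explicit in the types, so a failed step surfaces as a structured value rather than a runtime exception buried in output.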
Try It
```shell
cargo build --release -p synoema-repl
cargo run -p synoema-repl -- jit examples/quicksort.sno
cargo run -p synoema-repl -- eval "6 * 7"
```
Source: github.com/Delimitter/synoema
What's Next
Next: Hindley-Milner -- 100% type safety with zero annotations. This is what makes type-guided constrained decoding possible.
Part of Token Economics of Code series by @andbubnov.