# Your AI Agent Writes Python. What If It Compiled to Native?
**Who this is for.** If you're building agentic workflows where LLMs generate and execute code, the execution speed of that code directly affects your agent's throughput. This article measures it.
Token efficiency is half the story. The other half: how fast does the generated code actually run? We benchmarked Synoema's Cranelift JIT against Python, Node.js, TypeScript (tsx), and C++ (-O2) across 12 algorithmic tasks.
*Part of the Token Economics of Code series.*
## Methodology
- **Hardware:** Apple Silicon (macOS Darwin 25.3.0)
- **Runtimes:** Synoema JIT (Cranelift, `--release`), CPython 3.12, Node.js (V8), TypeScript via tsx, C++ (`g++ -O2`)
- **Measurement:** 3 warm-up runs discarded, 5 measured runs; the median is reported with p5/p95 percentiles
- **Fairness:** identical algorithms across all languages, with no language-specific optimizations
```sh
cargo run --manifest-path benchmarks/runner/Cargo.toml -- run --phases runtime -v
```
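The measurement protocol above can be sketched in a few lines of Python. This is an illustrative stand-in for the actual Rust runner, timing an arbitrary callable rather than spawning each runtime:

```python
import statistics
import time

def measure(fn, warmups=3, runs=5):
    """Benchmark protocol sketch: discard warm-up runs, then report the
    median of the measured samples with a p5/p95 spread."""
    for _ in range(warmups):
        fn()  # warm-up runs, results discarded
    samples_ms = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - t0) * 1000)
    samples_ms.sort()
    # With only 5 samples, the 5th/95th percentiles collapse to min/max.
    return {"median": statistics.median(samples_ms),
            "p5": samples_ms[0],
            "p95": samples_ms[-1]}
```

Reporting the median rather than the mean keeps a single slow outlier (GC pause, OS scheduling) from distorting the result.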
## Results: Overview
| Language | Avg median (ms) | vs Synoema |
|---|---|---|
| C++ (-O2) | 2.0 | 2.5x faster |
| Synoema JIT | 5.2 | baseline |
| Python 3.12 | 27.6 | 5.3x slower |
## Results: Per-Task (12 tasks)
| Task | C++ (ms) | Synoema (ms) | Python (ms) | Synoema vs Python |
|---|---|---|---|---|
| binary_search | 2.1 | 7.4 | 16.7 | 2.3x faster |
| collatz | 2.3 | 5.7 | 16.4 | 2.9x faster |
| factorial | 1.4 | JIT fail | 17.2 | -- |
| fibonacci | 3.7 | JIT fail | 145.6 | -- |
| filter_map | 2.3 | 5.2 | 16.6 | 3.2x faster |
| fizzbuzz | 1.7 | 5.7 | 16.8 | 3.0x faster |
| gcd | 2.4 | 5.6 | 16.8 | 3.0x faster |
| matrix_mult | 1.5 | 8.4 | 17.6 | 2.1x faster |
| mergesort | 2.1 | 6.6 | 17.4 | 2.6x faster |
| quicksort | 1.4 | 6.0 | 16.7 | 2.8x faster |
| string_ops | 2.0 | 5.1 | 16.3 | 3.2x faster |
| tree_traverse | 1.5 | 6.5 | 17.0 | 2.6x faster |
*factorial and fibonacci currently fail in JIT mode, a known limitation that is being addressed.*
## Analysis

### JIT Compilation Overhead
Synoema's times include Cranelift JIT compilation, a one-time cost of 10-50 ms. For short tasks this overhead is clearly visible; for longer computations it's negligible.
**Key insight:** JIT overhead is constant, while interpreter overhead grows in proportion to the work.
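That insight implies a break-even point. Here is a toy cost model with illustrative numbers (a 30 ms compile cost and a 3x native speedup are assumptions for the sketch, not measured values):

```python
def total_ms(work_ms, jit_overhead_ms=30.0, native_speedup=3.0):
    """Toy cost model: the interpreter's cost scales with the work;
    the JIT pays a fixed compile cost, then runs the same work
    `native_speedup` times faster. Numbers are illustrative."""
    interpreted = work_ms
    jitted = jit_overhead_ms + work_ms / native_speedup
    return interpreted, jitted

# Break-even: work = overhead * speedup / (speedup - 1) = 30 * 3 / 2 = 45 ms.
# Below 45 ms of interpreted work, the JIT's fixed cost dominates;
# above it, the JIT pulls ahead and keeps widening the gap.
```

This is why the short benchmark tasks above understate the JIT's advantage relative to long-running agent workloads.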
### Where Synoema Wins
- Recursive algorithms: no interpreter loop overhead
- Tight numeric loops (collatz, gcd): native integer operations
- Pattern matching: compiled to jump tables
### Where Synoema Loses
- String-heavy operations: Python's C-implemented string library is highly optimized
- Very short programs: JIT overhead dominates when computation < 10ms
- Against C++, always: Cranelift trades code quality for compilation speed, generating code at roughly 86% of the quality of LLVM/GCC output
## Honest Comparison

The comparison that matters for AI agents:

- **Synoema:** JIT-compiled, type-safe, fewer tokens for functional code
- **Python:** interpreted, duck-typed, dominant in LLM code generation
## Implications for AI Agents
```
Python:  generate (1.5 s)                -> interpret (N ms)
Synoema: generate (0.8 s, fewer tokens)  -> JIT (50 ms) -> native (N/4 ms)
```
The real question: what's the total cost of the generate -> execute -> analyze cycle? Token efficiency + compilation speed + type guarantees create compound savings.
## Try It
```sh
git clone https://github.com/synoema/synoema
cd synoema
cargo run --manifest-path benchmarks/runner/Cargo.toml -- run --phases runtime -v
```
## What's Next
Next up: we sent the same prompts to 10 LLMs and measured which of them generate correct Synoema code.
*Part of the Token Economics of Code series by @andbubnov.*