How I cut my multi-turn LLM API costs by 90% (O(N²) → O(N))
If you build multi-turn AI agents, you know the pain: API costs don't grow linearly, they grow quadratically.
Every turn in a standard agent loop replays the full conversation history. Token cost on turn N is proportional to N, so total cost across N turns is Θ(N²). I hit a wall where a single heavy day of coding consumed 97% of my weekly Anthropic quota.
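To make the quadratic growth concrete, here is a toy accounting sketch. The per-turn message size is an illustrative assumption, not a measurement from my agent:

```python
# Toy token accounting for the quadratic claim above. Per-turn sizes are
# made-up illustration numbers, not real measurements.

def naive_input_tokens(turns, tokens_per_turn=500):
    # Turn i replays the full history: i messages of ~tokens_per_turn each,
    # so the total over N turns is tokens_per_turn * N(N+1)/2 -> Theta(N^2).
    return sum(i * tokens_per_turn for i in range(1, turns + 1))

print(naive_input_tokens(10))   # 27500
print(naive_input_tokens(100))  # 2525000 -- 10x the turns, ~92x the tokens
```

That last line is the whole problem: linear growth in session length, quadratic growth in billed input tokens.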
So I built Burnless — an open protocol and orchestration layer that flips the cost curve from O(N²) to O(N).
The result: in real-world use it cut my API consumption by ~16x, and in a reproducible benchmark it comes out 90.3% cheaper than a naive Claude Opus loop.
The code is at Rudekwydra/burnless on GitHub (MIT licensed, provider-agnostic).
How it works
The math relies on two specific mechanisms working together:
- Shared Prefix Cache: The persistent system prompt (which can be 20k+ tokens) is cached using Anthropic's prompt caching (`ttl: 1h`). Switching models from the same provider mid-session does not invalidate this cache if the prefix is byte-identical.
- Capsule History: Instead of keeping raw transcripts in the agent's memory, the "Maestro" model only holds ~80-character compressed "capsules" of prior turns.
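For the first mechanism, here is a sketch of what a cacheable request body looks like. Field names follow Anthropic's prompt-caching API; the 1h TTL is an extended-TTL option that may require a beta header on your account, so treat this as a shape illustration rather than a drop-in snippet:

```python
# Sketch of a Messages API request body with a cacheable system prefix.
# Field names follow Anthropic's prompt-caching docs; the 1h TTL is an
# extended-TTL option and may need a beta header enabled.
def build_request(system_prompt: str, user_msg: str,
                  model: str = "claude-sonnet-4-6") -> dict:
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,  # must stay byte-identical across calls
                "cache_control": {"type": "ephemeral", "ttl": "1h"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }
```

The byte-identical constraint is why Burnless keeps the system prompt as a frozen prefix and appends everything volatile after it.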
The result is that your quadratic history term collapses into a tiny linear one, while the massive system prompt is billed at cache-read prices (roughly 10x cheaper than fresh input on Anthropic).
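A back-of-envelope version of that per-turn bill, using the ~10x cache-read discount. The dollar figures are placeholders for illustration, not any provider's actual rate card:

```python
# Back-of-envelope per-turn input cost with a cached prefix.
# Prices are placeholders, not a real rate card.
FRESH_PER_MTOK = 15.0   # $ per million fresh input tokens (placeholder)
CACHED_PER_MTOK = 1.5   # ~10x cheaper on cache reads

def turn_input_cost(system_tokens=20_000, capsule_tokens=800, new_tokens=500):
    cached = system_tokens * CACHED_PER_MTOK / 1e6           # big prefix, cheap
    fresh = (capsule_tokens + new_tokens) * FRESH_PER_MTOK / 1e6
    return cached + fresh

print(round(turn_input_cost(), 4))  # 0.0495
```

Note where the money goes: the 20k-token prefix costs less than the 1.3k tokens of fresh input, which is exactly why keeping the history tiny matters more than shrinking the system prompt.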
The Benchmark (No Mocks)
If you want the formal derivation, I published a reproducible benchmark that uses the Anthropic SDK directly and reads raw response.usage.
Against Claude 3 Opus on a 10-turn session:
- Standalone (no cache): $4.66
- Standalone (+ cache): $0.65
- Burnless Maestro: $0.45 (-90.3%)
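For the skeptical, the headline percentage falls straight out of those three numbers:

```python
# Recomputing the headline figures from the benchmark table above.
no_cache, with_cache, burnless = 4.66, 0.65, 0.45

saving_vs_naive = (no_cache - burnless) / no_cache * 100
saving_vs_cached = (with_cache - burnless) / with_cache * 100

print(round(saving_vs_naive, 1))   # 90.3
print(round(saving_vs_cached, 1))  # 30.8
```

So even against a standalone loop that already uses prompt caching, the capsule history buys another ~31%.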
And this math applies to any provider that exposes prompt caching and charges per input token.
Vendor Agnostic Orchestration
Burnless isn't a wrapper for a single API. Tiers are quality/cost bands, not vendors. You can mix and match any CLI you already have installed:
```yaml
agents:
  gold:   { command: "claude --model claude-sonnet-4-6 -p" }   # The Brain
  silver: { command: "codex exec --sandbox workspace-write" }  # Execution
  bronze: { command: "ollama run qwen2.5-coder" }              # Local, zero marginal cost
```
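Under the hood, a tier is just a command line to hand work to. Here is a hypothetical sketch of that dispatch, purely for illustration, not Burnless's actual internals:

```python
import shlex
import subprocess

# Hypothetical tier -> CLI mapping, mirroring the config.yaml above.
# Illustration only; not Burnless's real dispatcher.
AGENTS = {
    "gold": "claude --model claude-sonnet-4-6 -p",
    "silver": "codex exec --sandbox workspace-write",
    "bronze": "ollama run qwen2.5-coder",
}

def run_tier(tier: str, prompt: str) -> str:
    # Append the prompt as the final argument and capture stdout.
    cmd = shlex.split(AGENTS[tier]) + [prompt]
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```

Because each tier is an opaque shell command, swapping a vendor is a one-line config change and the orchestration logic never notices.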
The CLI works today. If you're building agents and burning through tokens, give it a try:
```shell
pip install burnless
burnless setup
```
Would love to hear your thoughts on the architecture, especially if you're working on local encoding/decoding for privacy!