Why Your AI Coding Agent Gets Quadratically More Expensive (and What to Do About It)

#ai #programming #llm #productivity

If you're using Claude Code, Cursor, or any LLM-based coding agent, there's a cost pattern you should know about: your sessions get quadratically more expensive as they grow.

A detailed analysis from exe.dev breaks it down.

The problem

Every time the agent makes an API call, it re-reads the entire conversation history from the cache. The cost of each cache read is proportional to the current context length, and the context gets longer with every call, so the total grows with both the context length AND the number of calls. That's not linear growth. It's quadratic.

The numbers

At 27,500 tokens: cache reads = 50% of total cost
At 100,000 tokens: cache reads = 87% of total cost
A single "ho-hum" feature implementation cost $12.93

The formula is roughly: total_cost = output_price * output_tokens_per_call * num_calls + cache_read_price * avg_context_length * num_calls. That second term grows quadratically because the average context length itself rises in step with the number of calls.
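To make the quadratic term concrete, here is a minimal Python sketch of that cost model. It assumes every call re-reads the whole history from cache and appends a fixed number of new tokens. The function name, its parameters, and the prices in the usage lines are illustrative assumptions, not figures from the exe.dev analysis or any provider's price list.

```python
def session_cost(num_calls, tokens_added_per_call, output_tokens_per_call,
                 cache_read_price, output_price):
    """Return (cache_read_cost, output_cost) in dollars for one session.

    Simplified, hypothetical model: prices are dollars per token, every call
    re-reads the full history from cache, and the context grows by a fixed
    number of tokens per call. Cache writes and uncached input are ignored.
    """
    context = 0              # tokens already in the conversation
    cache_read_cost = 0.0
    output_cost = 0.0
    for _ in range(num_calls):
        cache_read_cost += context * cache_read_price  # re-read the history
        output_cost += output_tokens_per_call * output_price
        context += tokens_added_per_call               # history gets longer
    return cache_read_cost, output_cost


# Assumed example prices: $0.30 per million cache-read tokens,
# $15 per million output tokens.
READ, OUT = 0.30 / 1_000_000, 15 / 1_000_000
short_run = session_cost(100, 1_000, 200, READ, OUT)
long_run = session_cost(200, 1_000, 200, READ, OUT)
print(f"cache reads grew {long_run[0] / short_run[0]:.2f}x, "
      f"output grew {long_run[1] / short_run[1]:.2f}x")
```

In this model, doubling the number of calls roughly quadruples the cache-read cost while the output cost only doubles.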

What actually helps

1. Start fresh conversations more often

It feels wasteful to lose context, but re-establishing context is almost always cheaper than the growing cache read tax. A fresh session with a clear prompt costs a fraction of continuing a bloated conversation.
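A rough way to check this for your own workload is to compare one long session against several fresh ones that each pay to restate some context. The sketch below is a simplified model with assumed numbers: the function name, the token counts, and the prices are all hypothetical. Cache-write costs for newly added tokens are left out because they are the same in both scenarios.

```python
def fresh_session_cost(num_calls, start_context, growth_per_call,
                       cache_read_price, input_price):
    """Dollar cost of context handling for one session (hypothetical model).

    The starting context is sent once as ordinary input on the first call.
    On every later call the whole history so far is read from cache.
    Prices are dollars per token.
    """
    setup = start_context * input_price
    read_tokens = sum(start_context + i * growth_per_call
                      for i in range(1, num_calls))
    return setup + read_tokens * cache_read_price


# Assumed example prices: $0.30 per million cache-read tokens,
# $3.00 per million uncached input tokens.
READ, INPUT = 0.30 / 1_000_000, 3.00 / 1_000_000

# One 200-call session starting from a 5,000-token prompt...
one_long = fresh_session_cost(200, 5_000, 500, READ, INPUT)
# ...versus four 50-call sessions, each restating 3,000 extra tokens of context.
four_short = 4 * fresh_session_cost(50, 8_000, 500, READ, INPUT)

print(f"one long session: ${one_long:.2f}, four fresh sessions: ${four_short:.2f}")
```

Under these assumed numbers the four fresh sessions come out at well under half the cost of the single long one, even though each of them starts with a larger prompt. The gap narrows as the restated context gets bigger, so it is worth plugging in your own figures.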

2. Use spec-bounded sessions

This is exactly why I built SpecWeave: each task has a clear spec with acceptance criteria, and the AI works within that boundary. Short, focused sessions instead of open-ended marathons.

When each task has a defined scope, you naturally keep conversations short. The AI knows when it's done because the spec tells it.

3. Delegate to sub-agents

Work done in a separate context window doesn't add to your main conversation's context, so it isn't re-read on every later call. If your agent framework supports sub-agents (Claude Code does), use them. The overhead of spawning a new context is almost always less than the cost of an ever-growing main context.
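The saving comes from two places: the subtask's own calls no longer re-read the main history, and the main session afterwards carries only a short summary instead of the whole subtask transcript. Here is a small sketch that counts cache-read tokens for both options. The function names and all token counts are hypothetical, and the model ignores the single extra call needed to spawn the sub-agent.

```python
def inline_read_tokens(main_context, sub_calls, growth, remaining_calls):
    """Cache-read tokens when the subtask runs inside the main session."""
    during = sum(main_context + i * growth for i in range(sub_calls))
    # Every later main-session call drags the full subtask transcript along.
    after = remaining_calls * (main_context + sub_calls * growth)
    return during + after


def delegated_read_tokens(main_context, brief, sub_calls, growth,
                          summary, remaining_calls):
    """Cache-read tokens when a sub-agent does the subtask in its own context."""
    during = sum(brief + i * growth for i in range(sub_calls))
    # Only the sub-agent's summary is added to the main context.
    after = remaining_calls * (main_context + summary)
    return during + after


# Assumed scenario: 60,000-token main context, a 20-call subtask adding
# 1,000 tokens per call, and 30 more main-session calls afterwards.
inline = inline_read_tokens(60_000, 20, 1_000, 30)
delegated = delegated_read_tokens(60_000, 2_000, 20, 1_000, 500, 30)
print(f"inline: {inline:,} tokens, delegated: {delegated:,} tokens")
```

The remaining main-session calls are treated as not growing the context further, which affects both options equally and keeps the comparison simple.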

4. Let tools return large outputs in one call

Splitting a file read into five smaller reads is actually MORE expensive because each one adds another cache read of the full history. Batch your tool calls when possible.
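The arithmetic behind this is easy to check. The sketch below counts the cache-read tokens incurred while reading one file, either in a single tool call or split into equal chunks. The function name and the token counts are illustrative assumptions.

```python
def read_tokens_for_file(context_tokens, file_tokens, num_chunks):
    """Cache-read tokens spent while reading a file in num_chunks tool calls.

    Hypothetical model: each tool call re-reads the full history, which
    includes every chunk returned by the earlier calls.
    """
    chunk = file_tokens // num_chunks
    return sum(context_tokens + i * chunk for i in range(num_chunks))


# Assumed scenario: 50,000 tokens of history, a 10,000-token file.
one_call = read_tokens_for_file(50_000, 10_000, 1)
five_calls = read_tokens_for_file(50_000, 10_000, 5)
print(f"one call: {one_call:,} tokens, five calls: {five_calls:,} tokens")
```

In this scenario the split read costs 270,000 cache-read tokens against 50,000 for the single call, and the file ends up in the context either way.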

The meta-lesson

Context management, cost management, and agent orchestration are all the same problem. The developers building workflows that respect these constraints will ship faster and cheaper than those who let agents run unbounded.

The teams that figure this out early have a real advantage. Not because they're smarter, but because they're spending a third as much per feature while shipping at the same velocity.

Full analysis: Why AI Agents Are Expensively Quadratic

What cost patterns have you noticed in your AI coding workflows?

Top comments (1)

Harjot Singh • May 30

The exponential part is the insight most cost posts miss: it's the growing context window. Every turn the agent re-sends the whole conversation + file context, so token cost per step climbs as the session goes on, even though each individual ask feels small. The bill isn't linear in your effort, it's quadratic-ish in session length.

Two levers fall out of that: (1) keep the prefix stable and prune aggressively so you ride the cache instead of re-paying for context, and (2) don't run the whole bloated context through a frontier model when a cheap model would handle the step. Route by difficulty and reset context often, and the curve flattens hard. Really good explainer - the "what to do about it" section is the part people actually need.
