Originally published at claudeguide.io/how-to-limit-claude-agent-costs
How to Limit Claude Agent Costs: 7 Concrete Strategies
The fastest way to control Claude agent costs: route simple tasks to Haiku ($1.00/MTok input), cache repeated system prompts (up to 90% off cached tokens), and set a hard token budget per session that triggers a graceful shutdown. Combined, these three strategies alone can cut a typical agentic workload bill by 70–85% before touching anything else. The remaining five strategies below push further. For the full pricing table and model-by-model breakdown, see Claude API pricing 2026.
Why Agent Costs Can Spiral
A single-turn Claude API call is easy to cost-estimate: tokens in × price + tokens out × price. Agents are different.
In an agentic loop, every tool call appends more tokens to the conversation history, and the whole history is resent as input on the next call. A 10-step agent run on Opus that grows to a 50K-token context doesn't cost 10× a single call. If each step adds a similar number of tokens, the tokens added at step 1 are billed as input on all 10 calls, those from step 2 on 9 calls, and so on: 10 + 9 + 8 + ... + 1 = 55 times the input cost of one step's worth of tokens. Add tool outputs that include long JSON responses or file contents, and a single agent run can consume millions of tokens.
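The 55× figure can be checked with a short sketch. It assumes, for simplicity, that every step appends one equally sized chunk of tokens and that the full history is resent on each call. The function names and the 5K-tokens-per-step figure are illustrative, not part of any SDK.

```python
def cumulative_input_chunks(steps: int) -> int:
    """Total chunk-sized input charges when the full history is resent each step.

    Step k sends k chunks (everything accumulated so far), so the total is
    1 + 2 + ... + steps.
    """
    return sum(range(1, steps + 1))


def input_cost_usd(steps: int, chunk_tokens: int, price_per_mtok: float) -> float:
    """Input-side cost of one run, ignoring output tokens and caching."""
    total_tokens = cumulative_input_chunks(steps) * chunk_tokens
    return total_tokens / 1_000_000 * price_per_mtok


if __name__ == "__main__":
    # Assumed figures: 10 steps, 5K tokens added per step (50K by the last
    # step), Opus input priced at $5.00/MTok.
    print(cumulative_input_chunks(10))
    print(round(input_cost_usd(10, 5_000, 5.00), 3))
```

Under these assumptions the run is billed for 275K input tokens, even though the context never exceeds 50K at any single step.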
The compounding factors:
- Recursive tool calls: Each tool result is appended to context before the next call
- Large context windows: Passing full conversation history on every turn multiplies input costs
- Expensive model for all steps: Using Opus for a task that only needs Haiku
- Unstructured output: Claude asks clarifying questions instead of following a schema, adding extra round-trips
- No stopping criteria: An agent with no budget cap will keep calling tools until it runs out of ideas
The strategies below address each of these root causes directly.
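As a preview of the stopping-criteria problem, here is a minimal sketch of a per-session token budget that ends the loop gracefully. `TokenBudget` and `run_agent` are hypothetical names, and the per-step token counts are simulated rather than read from real API responses.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks tokens spent in one agent session against a hard cap."""

    max_tokens: int
    used_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the usage reported for one model call."""
        self.used_tokens += input_tokens + output_tokens

    @property
    def exhausted(self) -> bool:
        return self.used_tokens >= self.max_tokens


def run_agent(step_usages: list[tuple[int, int]], budget: TokenBudget) -> str:
    """Walk through simulated steps and stop once the budget is spent.

    `step_usages` stands in for the (input, output) token counts a real
    model call would report; no API is called here.
    """
    for step, (input_tokens, output_tokens) in enumerate(step_usages, start=1):
        if budget.exhausted:
            # Graceful shutdown: report progress instead of raising mid-task.
            return f"stopped before step {step}: budget exhausted"
        budget.record(input_tokens, output_tokens)
    return "completed"
```

Checking the budget before each call, rather than after, means a session can overshoot the cap by at most one step. A stricter variant would estimate the next call's size first.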
Strategy 1: Model Routing
Savings potential: up to 80% on per-token model cost (Opus to Haiku, at the prices below)
Claude's three production tiers as of April 2026:
| Model | Input price | Output price | Best for |
|---|---|---|---|
| claude-haiku-4-5 | $1.00/MTok | $5.00/MTok | Classification, extraction, simple Q&A |
| claude-sonnet-4-5 | $3.00/MTok | $15.00/MTok | Code generation, analysis, multi-step reasoning |
| claude-opus-4-5 | $5.00/MTok | $25.00/MTok | Complex judgment, novel problems, creative work |
Most production agents do 80% of their work at a Haiku level. Routing correctly means using Haiku by default and escalating only when needed. For a detailed comparison of when each model earns its cost, see Haiku vs Sonnet vs Opus: which model to use.
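To see what routing is worth, the sketch below computes a blended input price from the table above. It assumes the 80% figure applies to input tokens rather than task count, and that the remaining 20% goes to Sonnet; both the split and the function names are illustrative.

```python
# Input prices in USD per million tokens, copied from the table above.
INPUT_PRICE_PER_MTOK = {
    "claude-haiku-4-5": 1.00,
    "claude-sonnet-4-5": 3.00,
    "claude-opus-4-5": 5.00,
}


def blended_input_price(token_share: dict[str, float]) -> float:
    """Average input price per MTok for a given split of tokens across models."""
    if abs(sum(token_share.values()) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1.0")
    return sum(INPUT_PRICE_PER_MTOK[m] * share for m, share in token_share.items())


def savings_vs(baseline_model: str, token_share: dict[str, float]) -> float:
    """Fractional input-cost saving of the routed mix versus one baseline model."""
    baseline = INPUT_PRICE_PER_MTOK[baseline_model]
    return 1.0 - blended_input_price(token_share) / baseline


if __name__ == "__main__":
    mix = {"claude-haiku-4-5": 0.8, "claude-sonnet-4-5": 0.2}
    print(round(blended_input_price(mix), 2))
    print(round(savings_vs("claude-sonnet-4-5", mix), 3))
    print(round(savings_vs("claude-opus-4-5", mix), 3))
```

With that assumed mix, the blended input price is $1.40/MTok: about 53% below an all-Sonnet baseline and 72% below an all-Opus one.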
Routing heuristics:
```python
def select_model(task_type: str, complexity_score: int) -> str:
    ...  # the body of this function is not included in this excerpt
```
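One way to express routing heuristics of this kind is sketched below. The task categories, the 1 to 10 complexity scale, the thresholds, and the `route_model` name are all assumptions made for illustration; tune them against your own workload.

```python
# Hypothetical task categories, loosely following the "Best for" column above.
REASONING_TASKS = {"code_generation", "analysis", "multi_step_reasoning"}
JUDGMENT_TASKS = {"complex_judgment", "novel_problem", "creative_work"}


def route_model(task_type: str, complexity_score: int) -> str:
    """Return a model ID, defaulting to Haiku and escalating only when needed.

    `complexity_score` is assumed to be an integer from 1 (trivial) to 10 (hard).
    """
    if not 1 <= complexity_score <= 10:
        raise ValueError("complexity_score must be between 1 and 10")
    # Escalate to Opus only for the hardest work.
    if task_type in JUDGMENT_TASKS or complexity_score >= 8:
        return "claude-opus-4-5"
    # Mid-tier work goes to Sonnet.
    if task_type in REASONING_TASKS or complexity_score >= 4:
        return "claude-sonnet-4-5"
    # Everything else, including unknown task types, stays on the cheapest tier.
    return "claude-haiku-4-5"
```

Sending unknown task types to Haiku keeps the default cheap. Whether that is safe depends on how reliably the complexity score flags hard requests.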