AI agents make a lot of API calls. Most of them are cheap tasks disguised as expensive ones.
Here's the breakdown and the fix.
What an agent session actually costs
A typical agent loop for "add error handling to this function":
- Read system prompt + context → $0.04
- Parse the task → $0.02
- Read the target file → $0.01
- Plan the changes → $0.04
- Write the edit → $0.08
- Verify the output → $0.04
- Report back → $0.01
Total: ~$0.24 at Opus pricing for one small task.
Do this 20 times in a day: $4.80. Scale to a team of 5: $24/day, $720/month, just for one agent on small tasks.
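The arithmetic above is easy to rerun with your own numbers. A minimal sketch, using the per-step costs from the list; the 30-day month is an assumption implied by the $720 figure, and the variable names are illustrative:

```python
# Per-step costs from the breakdown above (USD, Opus pricing)
STEP_COSTS = {
    "read system prompt + context": 0.04,
    "parse the task": 0.02,
    "read the target file": 0.01,
    "plan the changes": 0.04,
    "write the edit": 0.08,
    "verify the output": 0.04,
    "report back": 0.01,
}

TASKS_PER_DAY = 20
TEAM_SIZE = 5
DAYS_PER_MONTH = 30  # assumption: matches the $720/month figure

cost_per_task = sum(STEP_COSTS.values())
cost_per_dev_day = cost_per_task * TASKS_PER_DAY
cost_per_team_day = cost_per_dev_day * TEAM_SIZE
cost_per_team_month = cost_per_team_day * DAYS_PER_MONTH

print(f"per task:        ${cost_per_task:.2f}")
print(f"per dev per day: ${cost_per_dev_day:.2f}")
print(f"team per day:    ${cost_per_team_day:.2f}")
print(f"team per month:  ${cost_per_team_month:.2f}")
```

Changing `TASKS_PER_DAY` or `TEAM_SIZE` shows how quickly small tasks add up.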
The math gets worse for agents with tool use, multi-step reasoning, and retrieval loops. GPT-5.2 or Opus on every step of a 10-step agent workflow = $2-5 per workflow execution.
The problem: most agent calls aren't complex
Look at what an agent actually does step-by-step:
| Step | What it requires | Cheapest capable model |
|---|---|---|
| Parse user intent | Basic NLP | Gemini Flash ($0.0001/call) |
| Read a file | No reasoning needed | Gemini Flash ($0.0001/call) |
| Check if task is done | Simple comparison | Gemini Flash ($0.0001/call) |
| Write a function | Code generation | Sonnet-class ($0.01/call) |
| Debug complex logic | Deep reasoning | Opus-class ($0.08/call) |
| Plan a multi-step refactor | Architecture thinking | Opus-class ($0.08/call) |
| Confirm with user | Conversation | Gemini Flash ($0.0001/call) |
In the table above, the Opus-class model is the right choice for only 2 of the 7 steps. Gemini Flash handles 4 of them fine, and a Sonnet-class model covers the remaining one. But agents typically use one model for everything.
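To put a number on that, here is a small sketch that prices the seven steps in the table twice: once with an Opus-class model on every call, and once with the cheapest capable model per step. The per-call prices are the table's; the step keys are illustrative names:

```python
# Per-call prices from the table above (USD)
PRICE = {"flash": 0.0001, "sonnet": 0.01, "opus": 0.08}

# Cheapest capable model tier for each step in the table
CHEAPEST_TIER = {
    "parse_intent": "flash",
    "read_file": "flash",
    "check_done": "flash",
    "write_function": "sonnet",
    "debug_logic": "opus",
    "plan_refactor": "opus",
    "confirm_with_user": "flash",
}

all_opus = PRICE["opus"] * len(CHEAPEST_TIER)
routed = sum(PRICE[tier] for tier in CHEAPEST_TIER.values())
savings = 1 - routed / all_opus

print(f"Opus on every step: ${all_opus:.4f}")
print(f"Routed per step:    ${routed:.4f}")
print(f"Savings:            {savings:.0%}")
```

With these prices, one pass through all seven steps drops from $0.56 to about $0.17, roughly 70% less.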
The fix: route by step type
Two approaches depending on how your agent is built.
Approach 1: Manual routing in your agent code
```python
from openai import OpenAI

# Note: the provider-prefixed model IDs below are not served by the default
# OpenAI endpoint. Set base_url to an OpenAI-compatible gateway that serves them.
client = OpenAI(api_key="your-key")


def run_agent_step(step_type: str, messages: list):
    """Route agent steps to appropriate models."""
    # Simple steps: fast, cheap
    simple_steps = {"parse", "check_done", "read_file", "confirm"}
    # Complex steps: need capability
    complex_steps = {"plan", "debug", "architect", "refactor"}

    if step_type in simple_steps:
        model = "google/gemini-3-flash-preview"
    elif step_type in complex_steps:
        model = "anthropic/claude-opus-4-6"
    else:
        model = "google/gemini-3-pro-preview"  # default

    return client.chat.completions.create(
        model=model,
        messages=messages
    )
```
Cost reduction: ~60-70% depending on your step distribution.
Downside: You maintain the routing logic. Every new step type needs a decision.
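One way to keep that maintenance burden small is to make the routing data rather than control flow, so a new step type becomes a one-line table entry. A sketch under the same assumptions as the function above (same step names and model IDs); `MODEL_BY_STEP`, `DEFAULT_MODEL`, and `pick_model` are hypothetical names, not part of any library:

```python
# Model IDs copied from the routing function above
CHEAP = "google/gemini-3-flash-preview"
DEFAULT_MODEL = "google/gemini-3-pro-preview"
PREMIUM = "anthropic/claude-opus-4-6"

# One entry per known step type; unknown steps fall back to DEFAULT_MODEL
MODEL_BY_STEP = {
    "parse": CHEAP,
    "check_done": CHEAP,
    "read_file": CHEAP,
    "confirm": CHEAP,
    "plan": PREMIUM,
    "debug": PREMIUM,
    "architect": PREMIUM,
    "refactor": PREMIUM,
}


def pick_model(step_type: str) -> str:
    """Return the model for a step, falling back to the default tier."""
    return MODEL_BY_STEP.get(step_type, DEFAULT_MODEL)


print(pick_model("parse"))
print(pick_model("debug"))
print(pick_model("some_new_step"))
```

The agent would then pass `model=pick_model(step_type)` to `client.chat.completions.create`. The table can also live in a config file if non-developers need to tune it.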
Approach 2: Let the routing layer decide
Point your agent at an auto-routing endpoint. The router classifies each call and picks the model:
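```python
from openai import OpenAI

client = OpenAI(
    base_url="https://www.komilion.com/api/v1",
    api_key="ck_your_key"
)

# Every step uses the same model string
# The router reads the prompt and picks the right model
response = client.chat.completions.create(
    model="neo-mode/balanced",  # classifier picks frugal/balanced/premium
    messages=messages
)

# See what was actually used. In the raw JSON this is komilion -> neo -> brainModel.
# The v1 Python SDK returns an object, not a dict, so extra fields are
# typically read through model_extra:
# response.model_extra["komilion"]["neo"]["brainModel"]
```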
Cost reduction: Similar to manual routing (60-80%), but automatic.
Downside: You don't control which exact model runs. You see it in the response metadata but can't predict it in advance.
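Since the model choice is only visible after the fact, it can help to log it per step. A rough sketch, assuming the raw JSON body carries the `komilion` → `neo` → `brainModel` path shown in the comment above (that shape is taken from the snippet, not verified here). `routed_model` and `tally_models` are hypothetical helpers that operate on plain dicts:

```python
from collections import Counter
from typing import Optional


def routed_model(raw: dict) -> Optional[str]:
    """Extract the model the router picked, or None if the metadata is missing."""
    neo = (raw.get("komilion") or {}).get("neo") or {}
    return neo.get("brainModel")


def tally_models(raw_responses: list) -> Counter:
    """Count how often each model was chosen across a run."""
    return Counter(routed_model(r) or "unknown" for r in raw_responses)


# Example with hand-written payloads (model names are placeholders)
sample = [
    {"komilion": {"neo": {"brainModel": "model-a"}}},
    {"komilion": {"neo": {"brainModel": "model-a"}}},
    {"komilion": {"neo": {"brainModel": "model-b"}}},
    {"choices": []},  # routing metadata missing
]
print(tally_models(sample))
```

With the v1 OpenAI Python SDK, `response.model_dump()` is one way to get such a dict. A tally like this makes it easier to spot when the router sends more traffic to the premium tier than expected.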
Override for critical steps
For steps where quality absolutely matters — final output, user-facing decisions — override to premium:
```python
# Pick the model before calling, so the final step is billed once, not twice
if step_type == "final_output":
    # Critical final output: pin to Opus
    model = "neo-mode/premium"  # always Opus 4.6
else:
    # Most steps: auto-route
    model = "neo-mode/balanced"

response = client.chat.completions.create(
    model=model,
    messages=messages
)
```
This hybrid approach uses cheap models for scaffolding and reserves Opus for output that users actually see.
Real numbers for a 10-step agent
Agent workflow: parse → plan → read files (×3) → implement (×2) → test → review → output
| Approach | Cost/run | 100 runs/month |
|---|---|---|
| Opus everything | $2.40 | $240 |
| Manual routing | $0.72 | $72 |
| Auto-routing (balanced) | $0.58 | $58 |
| Flash everything | $0.03 | $3 (quality degrades) |
The 80x difference between Opus-everything and Flash-everything is real, but quality degrades on Flash for the hard steps. The sweet spot is routing: $58-72/month vs $240, with Opus still handling the complex steps.
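The monthly figures, and the savings relative to the Opus-only baseline, follow directly from the per-run costs. A quick sketch to recompute them or to plug in your own run volume; the per-run costs and the 100 runs/month are the table's, and the function names are illustrative:

```python
RUNS_PER_MONTH = 100

# Cost per run from the table above (USD)
COST_PER_RUN = {
    "Opus everything": 2.40,
    "Manual routing": 0.72,
    "Auto-routing (balanced)": 0.58,
    "Flash everything": 0.03,
}

baseline = COST_PER_RUN["Opus everything"]


def monthly_cost(approach: str) -> float:
    """Monthly spend for an approach at RUNS_PER_MONTH."""
    return COST_PER_RUN[approach] * RUNS_PER_MONTH


def savings_vs_baseline(approach: str) -> float:
    """Fraction saved relative to running Opus on every step."""
    return 1 - COST_PER_RUN[approach] / baseline


for name in COST_PER_RUN:
    print(f"{name:<26} ${monthly_cost(name):>7.2f}/month  "
          f"{savings_vs_baseline(name):>5.0%} saved")
```

On these numbers, manual routing saves 70% and auto-routing about 76% against Opus on every step.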
What to watch out for
Context window bleed. Agents often append conversation history to every call. A 10-step agent where each step adds 1K tokens to the context sends 10K input tokens on the final step and 55K input tokens across the whole run (1K + 2K + ... + 10K). Your routing decision about step complexity matters less than your context management.
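A quick way to see the effect is to sum the input size of every call as the history grows, then compare it with a run that caps the history. This is a sketch of the arithmetic only; the 3K-token cap is an arbitrary illustrative value, and `input_tokens_per_step` is a hypothetical helper:

```python
from typing import List, Optional


def input_tokens_per_step(steps: int, tokens_added_per_step: int,
                          max_context: Optional[int] = None) -> List[int]:
    """Input tokens sent on each call when history accumulates.

    If max_context is set, the history is assumed to be trimmed to that size.
    """
    sizes = []
    context = 0
    for _ in range(steps):
        context += tokens_added_per_step
        if max_context is not None:
            context = min(context, max_context)
        sizes.append(context)
    return sizes


unbounded = input_tokens_per_step(10, 1_000)
capped = input_tokens_per_step(10, 1_000, max_context=3_000)

print("final call, unbounded:", unbounded[-1])
print("whole run, unbounded: ", sum(unbounded))
print("whole run, capped:    ", sum(capped))
```

For the 10-step example, the unbounded run sends 55,000 input tokens in total (10,000 of them on the last call), while capping history at 3,000 tokens brings the total down to 27,000.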
Tool call overhead. Every tool call is a round-trip API call. An agent that calls 5 tools per step at Opus pricing = 5× the cost per step. Use cheap models for tool parsing, expensive models for reasoning.
Retry loops. If an agent retries a failed step 3 times, you've paid 3× for one step. Add exponential backoff and downgrade the model on retry: if a step failed once, trying a different model is more useful than paying for the same expensive model again.
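A minimal sketch of that retry policy, kept independent of any SDK: the actual API call is passed in as a function. The model list, the delays, and the name `call_with_fallback` are illustrative assumptions, not part of any framework:

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(
    call: Callable[[str], T],
    models: Sequence[str],
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each model in order, backing off exponentially between attempts.

    `call` takes a model name and either returns a result or raises.
    """
    last_error = None
    for attempt, model in enumerate(models):
        try:
            return call(model)
        except Exception as err:  # narrow this to your client's error types
            last_error = err
            if attempt < len(models) - 1:
                sleep(base_delay * 2 ** attempt)  # base, 2x base, 4x base, ...
    raise RuntimeError("all models failed") from last_error


# Example: first model fails, second succeeds (no real API call, no real sleep)
def fake_call(model: str) -> str:
    if model == "expensive-model":
        raise TimeoutError("simulated failure")
    return f"ok from {model}"


delays = []
result = call_with_fallback(
    fake_call, ["expensive-model", "cheaper-model"], sleep=delays.append
)
print(result, delays)
```

In an agent, `call` would wrap `client.chat.completions.create(model=model, messages=messages)`. Ordering `models` from most to least expensive gives the downgrade-on-retry behavior described above.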
The two-line change
If you're using any OpenAI-compatible agent framework (LangChain, AutoGen, CrewAI, custom), the change is:
```python
# Before:
client = OpenAI(api_key="sk-...")

# After:
client = OpenAI(
    base_url="https://www.komilion.com/api/v1",
    api_key="ck_your_komilion_key"
)
```
Same code. Different cost profile. The routing happens transparently.
$5 free at komilion.com — no card, start testing immediately.