Most developers wildly underestimate what they spend on AI APIs until they see the month-end bill.
Here's the actual cost math for common workflows, and where the money is quietly going.
## The baseline numbers (Feb 2026)
| Model | Input | Output | Effective per 1K tokens (50/50 in/out mix) |
|---|---|---|---|
| Claude Opus 4.6 | $15/M | $75/M | ~$0.045 |
| Claude Sonnet 4.6 | $3/M | $15/M | ~$0.009 |
| GPT-4o | $2.5/M | $10/M | ~$0.006 |
| Gemini 3 Pro | $3.5/M | $10.5/M | ~$0.007 |
| Gemini 3 Flash | $0.075/M | $0.30/M | ~$0.00019 |
| DeepSeek V3.2 | $0.27/M | $1.10/M | ~$0.00069 |
These are list prices. The effective cost per useful task is what actually matters.
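The "effective per 1K tokens" column is just a blend of the two list prices. A minimal sketch, assuming the table's 50/50 input/output split (a simplifying assumption, not a billing rule):

```python
# Reproduce the "effective per 1K tokens" column from the table above.
# Assumes a 50/50 input/output token split; list prices are $ per million tokens.
def blended_per_1k(input_per_m: float, output_per_m: float) -> float:
    return (input_per_m + output_per_m) / 2 / 1000

print(blended_per_1k(15, 75))        # Claude Opus 4.6   -> 0.045
print(blended_per_1k(3, 15))         # Claude Sonnet 4.6 -> 0.009
print(blended_per_1k(0.075, 0.30))   # Gemini 3 Flash    -> 0.0001875
```

Note the split assumption matters: a workload that is mostly input (long context, short answers) costs less per token than the blended number suggests, and a chatty, output-heavy workload costs more.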
## Real cost by workflow

### Coding assistant (Cline, Cursor, Aider, Claude Code)
A typical coding session: 30 API calls, mix of file reads, edits, confirmations, debug loops.
- Average tokens per call: ~3,000 (2,500 input + 500 output)
- 30 calls = 75K input tokens + 15K output tokens per session
- At Opus 4.6: 75K × $15/M + 15K × $75/M = $2.25 per session
- At Sonnet 4.6: 75K × $3/M + 15K × $15/M = $0.45 per session
- At Gemini Flash: 75K × $0.075/M + 15K × $0.30/M ≈ $0.01 per session

(Coding calls are input-heavy, so the blended per-1K rate from the table overstates them; these figures price input and output tokens separately.)

10 sessions/month on Opus = $22.50. On Sonnet = $4.50.
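The session arithmetic can be sketched as a small helper that prices input and output tokens at their separate list rates, which gives lower figures than a flat blended rate for input-heavy sessions:

```python
# Cost of one session, pricing input and output tokens separately.
# Prices are $ per million tokens; the session shape matches the text above.
def session_cost(calls: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    total_in = calls * in_tokens     # total input tokens for the session
    total_out = calls * out_tokens   # total output tokens for the session
    return total_in * in_price / 1e6 + total_out * out_price / 1e6

print(session_cost(30, 2500, 500, 15, 75))       # Opus 4.6   -> 2.25
print(session_cost(30, 2500, 500, 3, 15))        # Sonnet 4.6 -> 0.45
print(session_cost(30, 2500, 500, 0.075, 0.30))  # Flash      -> ~0.0101
```

The same function covers the chat and summarization workflows below; only the call count and token shape change.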
The issue: most of those 30 calls don't need Opus. File reads, "what does this do?" questions, formatting — these work fine on cheap models. Only the hard debugging, architecture work, and complex edits actually need the top model.
### Chat / Question-answering
- Average tokens per call: ~800 (600 input + 200 output)
- 100 calls/day = 60K input + 20K output tokens per day
- At Opus 4.6: $0.90 + $1.50 = $2.40/day → $72/month
- At Gemini 3 Pro: $0.21 + $0.21 = $0.42/day → $12.60/month
- At Gemini Flash: ~$0.01/day → ~$0.32/month
For general Q&A, Opus costs roughly 230× as much as Flash, and most Q&A does not need anywhere near a 230× quality premium.
### Long-context summarization (PDFs, docs, codebase analysis)
This is where costs spike:
- Large document: 100K input tokens + 2K output
- At Opus 4.6: $1.50 + $0.15 = $1.65 per document
- At Gemini 3 Pro: $0.35 + $0.021 = $0.371 per document
- At Gemini Flash: $0.0075 + $0.0006 = $0.0081 per document
Summarization is often context-heavy but output-simple. A 100K-token document summary doesn't usually need Opus.
## Where the money quietly goes
**Agentic loops.** When an AI tool makes 15 API calls to complete one task, 12 of them are probably tool confirmations, status checks, or retries. All at full model price.
**Repo-map reads.** Aider, Cline, and similar tools send a repo map on many calls to provide context. If your repo map is 50K tokens and you're billed per call, 20 calls = 1 million input tokens just for context, regardless of the actual task complexity.
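A back-of-envelope sketch of that overhead, using the 50K-token repo map and Opus input rate from this article (the numbers are illustrative, not a measurement of any particular tool):

```python
# Context overhead from resending a repo map on every call.
repo_map_tokens = 50_000        # tokens resent per call (example from text)
calls = 20
opus_input_per_m = 15.0         # Claude Opus 4.6 input price, $/M tokens

overhead_tokens = repo_map_tokens * calls
overhead_cost = overhead_tokens * opus_input_per_m / 1e6

print(overhead_tokens)  # 1000000 tokens of pure context
print(overhead_cost)    # $15.00 before any actual work happens
```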
**Failed attempts.** Every error loop, retry, and "I need more context" follow-up costs the same as a successful call.
**Tab completions.** If you use Continue.dev or another tool with active completions, those fire constantly. 300 completions/day, each resending a few thousand tokens of file context, adds up to roughly a million input tokens: about $15/day at Opus input rates, and far more if whole files ride along, just for keystrokes.
## The fix: match the model to the task
The model that's right for "what does this function return?" is not the same model that's right for "redesign this authentication architecture."
Two approaches:
**Manual switching.** Change your config when task complexity changes. Works, but people don't actually do this consistently.
**Automatic routing.** Use an API layer that classifies each request and selects the model. The request goes in, the right model handles it, the response comes out. No config changes.
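A toy sketch of the routing idea (the heuristic, keywords, and model names here are purely illustrative; real routers use trained classifiers, not keyword matching):

```python
# Toy request router: pick a model tier from crude request features.
# Everything here is an illustrative assumption, not any vendor's classifier.
def route(prompt: str) -> str:
    hard_markers = ("architecture", "redesign", "race condition", "prove")
    if any(m in prompt.lower() for m in hard_markers):
        return "premium-model"    # hard reasoning or design work
    if len(prompt.split()) < 30:
        return "cheap-model"      # short, simple asks
    return "mid-tier-model"       # everything else

print(route("What does this function return?"))            # cheap-model
print(route("Redesign this authentication architecture"))  # premium-model
```

Even this crude heuristic captures the core trade-off: route the cheap, common cases away from the expensive model and let misclassified hard cases escalate.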
For automatic routing: Komilion does this with a tier system — neo-mode/frugal for simple, balanced for standard, premium for complex. The classifier reads each request and routes it. Mixed workloads typically see 60-85% cost reduction vs Opus-on-everything.
For explicit control: OpenRouter lets you specify exactly which model per call, with 300+ options.
## Quick self-audit
Check your API dashboard for last month's spend. Then ask:
- What % of my calls were "simple" (< 100 tokens, common question patterns)?
- Am I using the same model for tab completions as for architecture work?
- Do I have any agentic tools running that I'm not actively watching?
If you can't answer these, you're probably overpaying.
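If your provider lets you export usage as CSV, the first audit question reduces to a few lines. A minimal sketch, assuming columns named `input_tokens` and `output_tokens` (rename to match your export format):

```python
import csv

# Count how many calls in a usage export were "simple" (< 100 total tokens).
# Column names are assumptions; adjust them to your provider's export.
def audit(path: str) -> tuple[int, int]:
    simple = total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            if int(row["input_tokens"]) + int(row["output_tokens"]) < 100:
                simple += 1
    return simple, total

# simple, total = audit("usage_export.csv")
# print(f"{simple}/{total} calls were under 100 tokens")
```

A high simple-call fraction on a premium model is the clearest sign you'd benefit from routing or switching.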
The Komilion compare page shows the same prompts through all three tiers side-by-side — if you want to see the cost difference before committing.