Three months ago I was spending about $240/month on Claude API calls through OpenClaw. Every task — from reading files to designing system architecture — went through Sonnet.
Then I realized something embarrassing: most of what I do every day does not actually need a $15/million-token model.
What Changed
I started routing tasks by complexity:
- File reads, grep, simple refactors, test generation → DeepSeek-V3 (1/8th the cost)
- Summarization → Gemini Flash (fastest and cheapest for this)
- Code review → GPT-4o (catches different issues than Claude)
- Multi-file architecture, complex debugging → Claude Sonnet (nothing else matches)
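The routing logic above boils down to a lookup table from task type to the cheapest model that handles it well. Here's a minimal sketch of that idea; the task labels, model names, and the `pick_model` helper are my own illustration, not TeamoRouter's actual config:

```python
# Hypothetical task -> model table mirroring the routing rules above.
# Names are illustrative, not any gateway's real identifiers.
TASK_ROUTES = {
    "file_read": "deepseek-v3",
    "grep": "deepseek-v3",
    "simple_refactor": "deepseek-v3",
    "test_generation": "deepseek-v3",
    "summarize": "gemini-flash",
    "code_review": "gpt-4o",
    "architecture": "claude-sonnet",
    "complex_debug": "claude-sonnet",
}

def pick_model(task_type: str) -> str:
    """Return the cheapest model known to handle this task type well."""
    # Unknown task types fall back to the strongest model rather than
    # risking a cheap model on something that might be hard.
    return TASK_ROUTES.get(task_type, "claude-sonnet")
```

The one design choice worth copying even if you roll your own: default to the expensive model on unknown tasks, so misclassification costs money instead of quality.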
The Result
Monthly bill: $240 → $140. Same output quality on the tasks that matter.
The routine work that makes up 60% of my day runs perfectly fine on cheaper models. I was paying a premium for zero quality improvement on simple tasks.
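A back-of-envelope check shows the savings are about what you'd predict from those two numbers alone, assuming the routine 60% moves to a model at roughly 1/8th the cost (token mix and overhead account for the gap to my actual $140):

```python
baseline = 240.0        # monthly spend with everything on Sonnet
routine_share = 0.60    # fraction of work that is routine
cheap_factor = 1 / 8    # DeepSeek-V3 cost relative to Sonnet

routed = baseline * (1 - routine_share) + baseline * routine_share * cheap_factor
print(round(routed))  # → 114
```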
How I Set It Up
Manually switching models was annoying, so I use a routing gateway called TeamoRouter. One API key, and it auto-picks the cheapest model that handles each task well; it installs into OpenClaw in seconds.
Routing modes:
- teamo-best: always highest quality
- teamo-balanced: auto-picks best value per task
- teamo-eco: always cheapest
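The three modes can be thought of as three objective functions over the same candidate pool. A sketch of that logic, assuming each candidate carries a rough quality score and a per-million-token cost (the scoring scheme is my guess at the idea, not TeamoRouter's actual algorithm):

```python
def route(mode: str, candidates: list[dict]) -> str:
    """Pick a model per mode. Candidates: {"model", "quality", "cost"}."""
    if mode == "teamo-best":
        # Highest quality, cost ignored.
        return max(candidates, key=lambda c: c["quality"])["model"]
    if mode == "teamo-eco":
        # Cheapest, quality ignored.
        return min(candidates, key=lambda c: c["cost"])["model"]
    # teamo-balanced: best quality per dollar (one plausible value metric).
    return max(candidates, key=lambda c: c["quality"] / c["cost"])["model"]

pool = [
    {"model": "claude-sonnet", "quality": 1.0, "cost": 15.0},
    {"model": "deepseek-v3", "quality": 0.8, "cost": 1.9},
]
print(route("teamo-best", pool))      # → claude-sonnet
print(route("teamo-balanced", pool))  # → deepseek-v3
```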
What Surprised Me
- DeepSeek-V3 is genuinely good for routine coding tasks. 80% as good as Sonnet at 1/8th the price.
- Gemini Flash is absurdly fast for summarization. Nothing else comes close for that specific use case.
- Rate limits disappeared — spreading requests across providers means no single one sees enough traffic to throttle you.
If you want to try a similar setup, here are some resources:
- TeamoRouter — the routing gateway I use
- Discord — where we share routing configs and help each other with setup
Curious if anyone else has done similar cost breakdowns. What does your model usage look like?