Decided to actually log every API call for a month. The results were embarrassing.
Out of roughly 3000 calls:
- 1800 (60%) were simple: file reads, grep, reformatting, basic Q&A
- 750 (25%) were medium: code refactors, test generation, summarization
- 450 (15%) were complex: architecture decisions, multi-file debugging
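For anyone who wants to repeat the exercise: a lightweight way to get this breakdown is to tag every API call with a task category and tally at the end of the month. A minimal sketch (the `CallLogger` class is my own illustration, not part of any library; the counts just mirror my numbers):

```python
from collections import Counter

class CallLogger:
    """Tally API calls by self-assigned task category."""

    def __init__(self):
        self.counts = Counter()

    def log(self, category: str) -> None:
        self.counts[category] += 1

    def breakdown(self) -> dict:
        """Percentage of calls per category."""
        total = sum(self.counts.values())
        return {cat: round(100 * n / total, 1) for cat, n in self.counts.items()}

logger = CallLogger()
for _ in range(1800): logger.log("simple")
for _ in range(750):  logger.log("medium")
for _ in range(450):  logger.log("complex")
print(logger.breakdown())  # {'simple': 60.0, 'medium': 25.0, 'complex': 15.0}
```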
I was sending ALL of them to Claude Sonnet at $15/million tokens.
The simple 60% runs identically on DeepSeek-V3 at $1.80/million. The medium 25% works fine on GPT-4o at $5/million. Only the complex 15% actually needs Sonnet.
## Before and After
- Before routing: ~$240/month (everything on Sonnet)
- After routing by task type: ~$140/month
- Saved: ~$100/month, with no quality drop I could detect
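The arithmetic is easy to sanity-check yourself. A back-of-envelope calculator using the per-model prices from above (the 16M tokens/month volume and the per-tier token split are hypothetical placeholders; call share and token share usually differ, since complex tasks burn more tokens per call):

```python
# Per-million-token prices quoted in the post (USD)
PRICE_PER_M = {"deepseek-v3": 1.80, "gpt-4o": 5.00, "claude-sonnet": 15.00}

def monthly_cost(token_millions: dict) -> float:
    """Total monthly spend given millions of tokens routed to each model."""
    return sum(PRICE_PER_M[model] * m for model, m in token_millions.items())

# Before: everything on Sonnet (16M tokens/month is a hypothetical volume)
before = monthly_cost({"claude-sonnet": 16})

# After: the same 16M tokens split across tiers (hypothetical split)
after = monthly_cost({"deepseek-v3": 5, "gpt-4o": 3, "claude-sonnet": 8})

print(before, after, before - after)
```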
## The Surprising Part
I had no idea 60% of my daily work was basically file reads and simple edits until I actually logged it. We all think we are doing complex reasoning all day, but the data says otherwise.
## How I Automated It
Manually switching models was annoying, so I use TeamoRouter to auto-pick the cheapest suitable model per task. One API key, and it installs into OpenClaw in a couple of seconds.
Routing modes:
- `teamo-balanced` — auto-picks best value per task (my default)
- `teamo-best` — always highest quality
- `teamo-eco` — always cheapest
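I haven't dug into how TeamoRouter implements these modes internally, but conceptually they reduce to three selection strategies over (quality, cost) candidates. A hypothetical sketch of that idea (the model names, quality scores, and the `choose` function are my own illustration, not TeamoRouter's API):

```python
def choose(mode: str, candidates: list[tuple[str, float, float]]) -> str:
    """Pick a model from (name, quality_score, cost_per_M) tuples by routing mode."""
    if mode == "teamo-best":   # always highest quality
        return max(candidates, key=lambda c: c[1])[0]
    if mode == "teamo-eco":    # always cheapest
        return min(candidates, key=lambda c: c[2])[0]
    # teamo-balanced: best quality per dollar
    return max(candidates, key=lambda c: c[1] / c[2])[0]

models = [
    ("claude-sonnet", 0.95, 15.0),  # quality scores here are made up
    ("gpt-4o",        0.90,  5.0),
    ("deepseek-v3",   0.85,  1.8),
]
print(choose("teamo-best", models))  # claude-sonnet
print(choose("teamo-eco", models))   # deepseek-v3
```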
## My Routing Config
I shared my full task-to-model routing table (with exact per-task costs) in our Discord. Too detailed to format in a blog post, but the short version:
| Task Type | Model | Cost/1K tokens |
|---|---|---|
| File reads, grep | DeepSeek-V3 | $0.0014 |
| Simple refactors | DeepSeek-V3 | $0.0014 |
| Code review | GPT-4o | $0.005 |
| Summarization | Gemini Flash | $0.0005 |
| Architecture | Claude Sonnet | $0.015 |
| Complex debugging | Claude Sonnet | $0.015 |
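If you'd rather not adopt a router at all, the table above collapses into a plain lookup you can keep in your own tooling. A minimal sketch (the task keys and the fall-back-to-Sonnet default are my choices, not anything standard):

```python
# (model, cost per 1K tokens) per task type, mirroring the table above
ROUTES = {
    "file_read":       ("deepseek-v3",   0.0014),
    "grep":            ("deepseek-v3",   0.0014),
    "simple_refactor": ("deepseek-v3",   0.0014),
    "code_review":     ("gpt-4o",        0.005),
    "summarization":   ("gemini-flash",  0.0005),
    "architecture":    ("claude-sonnet", 0.015),
    "complex_debug":   ("claude-sonnet", 0.015),
}

def pick_model(task: str) -> str:
    """Return the routed model; default to the strongest model when unsure."""
    model, _cost = ROUTES.get(task, ("claude-sonnet", 0.015))
    return model

print(pick_model("grep"))          # deepseek-v3
print(pick_model("architecture"))  # claude-sonnet
```

Defaulting unknown tasks to the most capable model keeps misrouting failures cheap in quality terms, at the price of occasionally overpaying.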
Curious if anyone else has done this exercise. Does the 60/25/15 split hold for you, or is your workflow different?