After months of manually deciding which model to use for each task, I wrote down my actual decision process. Sharing because it might save you time.
## The Decision Tree
**Is this task reading or writing?**
- Reading (file reads, grep, search) → cheapest model (DeepSeek-V3 or free MiniMax M2.7)
**Is it modifying existing code?**
- Simple modification (rename, format, extract function) → cheap model (DeepSeek-V3)
- Complex modification (refactor across 3+ files) → expensive model (Claude Sonnet)
**Is it generating new code?**
- Boilerplate (tests, docs, type definitions) → cheap model
- Architecture or design → expensive model
**Is it debugging?**
- Simple error (typo, missing import, syntax) → cheap model
- Complex (race condition, state management, async flow) → expensive model
**Is it analysis or review?**
- Summarization → Gemini Flash (fastest for this)
- Code review → GPT-4o (catches different things than Claude)
- Security audit → Claude Sonnet (most thorough)
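The tree above is simple enough to express in a few lines of code. Here's a minimal sketch in Python — the `pick_model` function, the tier names, and the model identifiers are illustrative, not part of any real routing library:

```python
# Sketch of the decision tree above. Task types and model names are
# illustrative; adapt them to whatever API identifiers you actually use.
def pick_model(task_type: str, complexity: str = "simple") -> str:
    """Route a task to a model tier per the decision tree."""
    cheap, premium = "deepseek-v3", "claude-sonnet"

    if task_type == "read":        # file reads, grep, search
        return cheap
    if task_type in ("modify", "generate", "debug"):
        # simple edits/boilerplate/typos -> cheap; refactors,
        # architecture, race conditions -> premium
        return cheap if complexity == "simple" else premium
    if task_type == "summarize":
        return "gemini-flash"      # fastest for summarization
    if task_type == "review":
        return "gpt-4o"            # catches different things than Claude
    if task_type == "security":
        return premium             # most thorough
    return premium                 # unknown task: fail toward quality
```

Failing toward the expensive model on unknown task types is deliberate: misrouting a hard task to a cheap model costs you a retry, which usually erases the savings.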
## The Numbers
Applying this tree to my actual usage:
- 60% of tasks → cheap model ($0.0014/1K tokens)
- 25% of tasks → mid-tier ($0.005/1K tokens)
- 15% of tasks → premium ($0.015/1K tokens)
Monthly cost: $240 → $140. Same output quality on every task that matters.
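For a sanity check, you can compute the blended per-token price implied by that split. This assumes tokens are distributed the same way as tasks, which is a simplification — complex tasks usually burn more tokens than simple ones, so real savings will be smaller than the blended rate suggests:

```python
# Blended price per 1K tokens implied by the 60/25/15 task split,
# assuming token volume tracks task count (a rough simplification).
tiers = [
    ("cheap",   0.0014, 0.60),
    ("mid",     0.005,  0.25),
    ("premium", 0.015,  0.15),
]
blended = sum(price * share for _, price, share in tiers)
print(f"${blended:.5f} per 1K tokens")  # $0.00434 vs $0.01500 all-premium
```

That gap between the blended rate and the all-premium rate is where the monthly savings come from.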
## Automating the Decision
I lasted about a week applying this tree manually before giving up. Now I use TeamoRouter to automate it — its teamo-balanced mode does roughly what this decision tree describes.
It also has a free tier (teamo-free) with unlimited MiniMax M2.7 calls if you just want to try offloading simple tasks.
There's also a Discord where we compare routing strategies and share configs.