Everyone building agents hits the same wall eventually.
You start with the most capable model. It handles everything beautifully. But the costs add up fast. You switch to a faster, cheaper model. Now you are missing edge cases.
The answer is not choosing one. It is building infrastructure that routes intelligently between them.
The Pattern in Practice
Teams shipping agents at scale have converged on a three-tier approach:
Tier 1: Fast triage. A lightweight model handles the initial request. It classifies intent, extracts entities, and decides if escalation is needed.
Tier 2: Capable execution. Complex reasoning, code generation, and multi-step planning get routed to the heavyweight.
Tier 3: Human review. Anything that falls through the cracks surfaces for manual handling.
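The three tiers above can be sketched as a single dispatch function. This is a minimal illustration, not a reference implementation: the `Triage` and `Result` types, the `handle` function, and both stub models are hypothetical names invented for this example, and a real system would call actual model APIs where the stubs sit.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Triage:
    """What the fast model returns (hypothetical shape)."""
    intent: str
    needs_escalation: bool
    entities: dict = field(default_factory=dict)
    draft_answer: Optional[str] = None


@dataclass
class Result:
    tier: int
    answer: Optional[str]  # None means "queued for human review"


def handle(request: str,
           fast_model: Callable[[str], Triage],
           capable_model: Callable[[str, Triage], Optional[str]]) -> Result:
    # Tier 1: the lightweight model classifies and, if it can, answers.
    triage = fast_model(request)
    if not triage.needs_escalation and triage.draft_answer is not None:
        return Result(tier=1, answer=triage.draft_answer)

    # Tier 2: the heavyweight model takes over, reusing the triage output.
    answer = capable_model(request, triage)
    if answer is not None:
        return Result(tier=2, answer=answer)

    # Tier 3: nothing produced an answer, so surface it to a person.
    return Result(tier=3, answer=None)


# Stub models, for illustration only.
def stub_fast(request: str) -> Triage:
    simple = len(request) < 40
    return Triage(
        intent="faq" if simple else "complex",
        needs_escalation=not simple,
        draft_answer="Our hours are 9 to 5." if simple else None,
    )


def stub_capable(request: str, triage: Triage) -> Optional[str]:
    # Pretend the capable model declines anything about refunds.
    return None if "refund" in request.lower() else "Here is a detailed plan."
```

Passing the models in as callables keeps the routing logic testable without network calls, which matters once the escalation rules start to change.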
Why This Works
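The cost difference is dramatic. A fast model might cost $0.10 per million tokens. A capable model can run 10x or 20x that. If tier 1 can handle 80 percent of requests, only the remaining 20 percent pay the heavyweight price. Even after adding the triage cost that every request incurs, that works out to roughly a 70 to 75 percent cut in your inference bill, assuming similar token counts at both tiers.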
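A rough way to check the savings figure is to compute the blended price directly. The sketch below rests on two assumptions that are not in the original: every request passes through the fast model first, and requests use about the same number of tokens at either tier. The function names are made up for this example, and prices are per million tokens in whatever currency you bill in.

```python
def blended_price(fast_price: float, capable_price: float,
                  tier1_share: float) -> float:
    """Average price per million tokens under routing.

    Every request pays for triage on the fast model; the share that
    tier 1 cannot resolve additionally pays the capable model's price.
    """
    escalated_share = 1.0 - tier1_share
    return fast_price + escalated_share * capable_price


def savings(fast_price: float, capable_price: float,
            tier1_share: float) -> float:
    """Fractional saving versus sending everything to the capable model."""
    return 1.0 - blended_price(fast_price, capable_price, tier1_share) / capable_price


# Capable model at 10x and 20x the fast model's price, 80% handled by tier 1.
print(round(savings(0.10, 1.00, 0.80), 2))  # 0.7
print(round(savings(0.10, 2.00, 0.80), 2))  # 0.75
```

Under these assumptions the saving lands a little below the tier 1 share, because escalated requests are paid for twice.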
But cost is not the only factor. Speed matters. Users notice latency. A routing system lets you give instant responses for simple queries while reserving the slow model for work that actually requires it.
The Implementation Detail Most Miss
The routing logic itself needs to be cheap. If you burn half a second deciding which model to use, you have defeated the purpose.
The best routers I have seen use simple heuristics: request length, keyword matching, confidence thresholds from the fast model. Not another ML model. A few if statements that run in milliseconds.
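As a sketch of what "a few if statements" might look like, the function below combines the three heuristics mentioned: request length, keyword matching, and a confidence threshold. The keyword list and both thresholds are placeholder values chosen for illustration, not recommendations, and `fast_confidence` is assumed to be a score between 0 and 1 that your fast model (or its wrapper) already reports.

```python
# Placeholder values; tune these against your own traffic.
ESCALATION_KEYWORDS = ("refactor", "debug", "prove", "step by step")
MAX_SIMPLE_CHARS = 400
MIN_CONFIDENCE = 0.8


def should_escalate(request: str, fast_confidence: float) -> bool:
    """Return True when the request should go to the capable model."""
    # Long requests tend to carry more context than the fast model handles well.
    if len(request) > MAX_SIMPLE_CHARS:
        return True

    # Plain substring match: cheap, but it will also hit words that
    # merely contain a keyword.
    lowered = request.lower()
    if any(keyword in lowered for keyword in ESCALATION_KEYWORDS):
        return True

    # The fast model itself is unsure about its classification.
    if fast_confidence < MIN_CONFIDENCE:
        return True

    return False
```

Everything here is string length, substring search, and a float comparison, so the decision costs microseconds rather than another model call.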
Progressive summarization helps too. Instead of feeding the capable model your entire context, summarize down to what matters. The model does less work, responds faster, costs less.
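One possible shape for this, assuming a chat-style history: keep the most recent turns verbatim and fold everything older into a running summary. The `compact_context` function and its parameters are hypothetical, and `summarize` stands in for whatever cheap summarizer you have, most likely a call to the fast model.

```python
from typing import Callable, List


def compact_context(turns: List[str],
                    summarize: Callable[[str], str],
                    keep_recent: int = 3,
                    prior_summary: str = "") -> str:
    """Build a shorter prompt context for the capable model.

    Older turns are merged with any existing summary and re-summarized;
    the last `keep_recent` turns are passed through unchanged.
    """
    split = max(len(turns) - keep_recent, 0)
    older, recent = turns[:split], turns[split:]

    summary = prior_summary
    if older:
        material = ([prior_summary] if prior_summary else []) + older
        summary = summarize("\n".join(material))

    parts = []
    if summary:
        parts.append("Summary of earlier conversation: " + summary)
    parts.extend(recent)
    return "\n".join(parts)
```

Feeding the returned summary back in as `prior_summary` on the next call is what makes the summarization progressive: each round compresses the previous summary plus the newly aged-out turns.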
What Changes When You Think This Way
The routing mindset shifts everything:
- Model selection becomes a runtime decision, not a design choice
- You measure success by cost per successful outcome, not by raw capability
- The architecture separates concerns: triage, execution, review
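Cost per successful outcome, the metric in the second bullet, is simple to compute once each request is logged with its total spend and a success flag. The `Outcome` record below is a hypothetical shape for such a log entry, and what counts as "success" is left for you to define.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outcome:
    cost: float      # total inference spend for this request, all tiers
    success: bool    # whether the request ended in an acceptable result


def cost_per_success(outcomes: Iterable[Outcome]) -> float:
    """Total spend divided by the number of successful requests.

    Failed requests still count toward spend, which is the point:
    a cheap model that fails often can score worse than a pricier one.
    """
    outcomes = list(outcomes)
    total_cost = sum(o.cost for o in outcomes)
    successes = sum(1 for o in outcomes if o.success)
    if successes == 0:
        return float("inf")  # money spent, nothing to show for it
    return total_cost / successes
```

Tracking this number per tier shows whether a routing change actually paid off or only moved spend around.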
The teams winning with agents are not choosing between fast and smart. They are building systems that use both at the right time.