Originally published at claudeguide.io/claude-model-routing-automatic
Automatic Model Routing: Route Claude Requests to the Right Model at Runtime
Model routing sends each request to the cheapest model that can handle it correctly — instead of sending everything to Opus (expensive) or everything to Haiku (insufficient for complex tasks). A well-tuned router captures 70-80% of the cost savings available from model selection without manual per-task decisions.
This guide shows three routing patterns, from simple to sophisticated, with full code.
What a router does
A router is a fast, cheap classifier that runs before the main model call. It reads the request and assigns it to a tier:
User request
↓
[Router: classify complexity]
├── Simple → Haiku ($1 input / $5 output per 1M tokens)
├── Medium → Sonnet ($3 input / $15 output per 1M tokens)
└── Complex → Opus ($5 input / $25 output per 1M tokens)
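To make the cost difference between tiers concrete, here is a minimal sketch that prices a single request on each tier and compares a routed traffic mix against sending everything to Opus. The prices come from the diagram above; the traffic mix (60/30/10) and the request size (1,000 input tokens, 500 output tokens) are assumptions chosen for illustration, not measurements.

```python
# Prices in USD per 1M tokens as (input, output), taken from the tier diagram.
PRICES = {
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus": (5.00, 25.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on the given tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def blended_cost(mix: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """Average cost per request when traffic is split across tiers by `mix`."""
    return sum(
        share * request_cost(tier, input_tokens, output_tokens)
        for tier, share in mix.items()
    )


# Hypothetical traffic mix and request size, for illustration only.
mix = {"haiku": 0.6, "sonnet": 0.3, "opus": 0.1}
all_opus = request_cost("opus", 1_000, 500)
routed = blended_cost(mix, 1_000, 500)
savings = 1 - routed / all_opus
print(f"all-Opus: ${all_opus:.4f}  routed: ${routed:.4f}  savings: {savings:.0%}")
```

With this assumed mix the routed cost works out to roughly 40% of the all-Opus cost. The real figure depends entirely on how your traffic splits across tiers.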
The router must be:
- Fast: ideally adds less than 100 ms to p50 latency
- Cheap: cheaper than the savings it generates
- Accurate enough: occasional misclassifications are acceptable; systematic ones are not
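The latency requirement is easy to check empirically. The sketch below times a router callable over a set of sample prompts and reports the median (p50) overhead. The helper name `p50_overhead_ms` and the stand-in `toy_router` are hypothetical; substitute your own router and a representative prompt sample.

```python
import statistics
import time
from typing import Callable


def p50_overhead_ms(
    router: Callable[[str], str], prompts: list[str], repeats: int = 5
) -> float:
    """Median wall-clock time, in milliseconds, of a single router call."""
    samples = []
    for _ in range(repeats):
        for prompt in prompts:
            start = time.perf_counter()
            router(prompt)
            samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Hypothetical stand-in router that looks at prompt length only.
def toy_router(prompt: str) -> str:
    return "haiku" if len(prompt) < 200 else "sonnet"


overhead = p50_overhead_ms(toy_router, ["What is 2 + 2?", "Summarize this. " * 50])
print(f"router p50 overhead: {overhead:.3f} ms (target: < 100 ms)")
```

A rule-based router should land far below the 100 ms target. A router that makes its own API call needs to be measured against real network conditions instead.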
Pattern 1: Rule-based router (zero cost, instant)
The simplest router uses token count and keyword heuristics. No API call required.
python
import re
from enum import Enum
class Model(str, Enum):
    HAIKU = "claude-haiku-4-5"
    SONNET = "claude-sonnet-4-6"
    OPUS = "claude-opus-4-7"

# Token count estimator (rough: 4 chars per token)
def estimate_tokens(text: str) -> int:
    return len(text) // 4
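The original listing is cut off at this point, so what follows is not the author's code. It is a minimal sketch of how a router built on token count and keyword heuristics might look. It repeats the enum and the estimator so that it runs on its own, with model IDs copied from the listing above. The keyword patterns, the thresholds (500 and 4,000 estimated tokens), and the `route` function are all assumptions to be tuned against your own traffic.

```python
import re
from enum import Enum


class Model(str, Enum):
    HAIKU = "claude-haiku-4-5"
    SONNET = "claude-sonnet-4-6"
    OPUS = "claude-opus-4-7"


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token."""
    return len(text) // 4


# Hypothetical keyword lists; these are illustrative, not a vetted taxonomy.
COMPLEX_PATTERN = re.compile(
    r"\b(architect\w*|prove|refactor\w*|debug\w*|trade-?offs?)\b", re.IGNORECASE
)
MEDIUM_PATTERN = re.compile(
    r"\b(explain|compare|summari[sz]e|write|analy[sz]e)\b", re.IGNORECASE
)


def route(prompt: str) -> Model:
    """Pick a tier from estimated length and keyword hits (assumed thresholds)."""
    tokens = estimate_tokens(prompt)
    if tokens > 4000 or COMPLEX_PATTERN.search(prompt):
        return Model.OPUS
    if tokens > 500 or MEDIUM_PATTERN.search(prompt):
        return Model.SONNET
    return Model.HAIKU


print(route("What is the capital of France?").value)
print(route("Compare REST and gRPC for an internal API.").value)
print(route("Debug this race condition and propose a refactor.").value)
```

Checking the complex tier first means a short prompt with a hard keyword still reaches Opus. Whether that bias toward the expensive tier is right depends on how costly an under-routed answer is for your application.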