Automatic Claude Model Routing: Pick the Right Model at Runtime

#haiku #sonnet #opus

Originally published at claudeguide.io/claude-model-routing-automatic

Automatic Model Routing: Route Claude Requests to the Right Model at Runtime

Model routing sends each request to the cheapest model that can handle it correctly — instead of sending everything to Opus (expensive) or everything to Haiku (insufficient for complex tasks). A well-tuned router captures 70-80% of the cost savings available from model selection without manual per-task decisions.

This guide shows three routing patterns, from simple to sophisticated, with full code.

What a router does

A router is a fast, cheap classifier that runs before the main model call. It reads the request and assigns it to a tier:

User request
    ↓
[Router: classify complexity]
    ├── Simple → Haiku ($1/$5 per 1M)
    ├── Medium → Sonnet ($3/$15 per 1M)
    └── Complex → Opus ($5/$25 per 1M)

The router must be:

Fast: adds < 100ms to p50 latency ideally
Cheap: cheaper than the savings it generates
Accurate enough: occasional misclassifications are acceptable; systematic ones are not

Pattern 1: Rule-based router (zero cost, instant)

The simplest router uses token count and keyword heuristics. No API call required.


python
import re
from enum import Enum

class Model(str, Enum):
    HAIKU = "claude-haiku-4-5"
    SONNET = "claude-sonnet-4-6"
    OPUS = "claude-opus-4-7"

# Token count estimator (rough: 4 chars per token)
def estimate_tokens(text: str) -

PDF guide + Excel cost calculator.

[→ Get Cost Optimization Masterclass — $59](https://shoutfirst.gumroad.com/l/msjkda?utm_source=claudeguide&utm_medium=article&utm_campaign=claude-model-routing-automatic)

*30-day money-back guarantee. Instant download.*