
Patrick

Posted on • Originally published at askpatrick.co

The Multi-Model Routing Pattern: How to Cut AI Agent Costs by 78%

Not every task needs the same model. That's the core insight behind multi-model routing — and it's one of the highest-leverage config changes you can make.

Most teams pick one model and run everything through it. It's simple. It's also expensive and slow.

Here's the routing logic we use:

Three-tier routing:

  • Routine loops → haiku or flash (fast, cheap, good enough)
  • Analysis and synthesis → sonnet or similar mid-tier (balanced cost/quality)
  • High-stakes decisions → reserved for the big model, used sparingly

The key word is sparingly. If your expensive model is doing work that a cheaper one could handle, you're leaving money on the table.

What qualifies as "routine"?

Most agents spend 60–70% of their cycles on:

  • File reads and writes
  • Status checks
  • Formatting and parsing
  • Simple classification ("is this task done?")

None of this needs a frontier model. Route it to haiku. Save the heavyweight tokens for decisions that actually matter.
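One way to make this concrete is a simple task-type-to-tier lookup. This is an illustrative sketch, not the author's actual implementation — the task categories and model names are assumptions drawn from the lists above:

```python
# Hypothetical tier router: map a task type to a model tier.
# Categories and model names are illustrative, taken from the
# routine/analysis/high-stakes split described above.

ROUTES = {
    # Routine loop work -> cheapest tier
    "file_op": "haiku",
    "status_check": "haiku",
    "formatting": "haiku",
    "classification": "haiku",
    # Analysis and synthesis -> mid-tier
    "analysis": "sonnet",
    "summary": "sonnet",
    "drafting": "sonnet",
    # High-stakes decisions -> big model, used sparingly
    "high_stakes": "opus",
}

def route(task_type: str) -> str:
    """Return the model tier for a task, defaulting to mid-tier when unsure."""
    return ROUTES.get(task_type, "sonnet")
```

Defaulting unknown tasks to the mid-tier (rather than the cheapest) is a deliberate choice here: misrouting a hard task to haiku costs quality, while misrouting an easy task to sonnet only costs a few cents.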

How to implement it in SOUL.md

Add an explicit routing rule:

```
MODEL ROUTING:
- Use haiku for: file ops, formatting, status checks, simple classification
- Use sonnet for: analysis, summaries, drafting, multi-step reasoning
- Use [big model] for: high-stakes decisions only — max 2 calls/session
```

This works because it makes the routing decision explicit in config, not buried in code. The agent knows its own budget.

What this looks like in practice

One of our agents was running every task through sonnet by default. After adding routing rules:

  • 68% of calls rerouted to haiku
  • Cost dropped from $47/month to $11/month on that agent alone
  • Latency on routine checks dropped from ~3s to under 1s
  • Quality on the tasks that matter: unchanged

The expensive model still handles everything complex. It just doesn't do the busywork anymore.

The pattern in the Ask Patrick Library

Library item #20 covers multi-model routing in full — including the routing decision tree, how to handle edge cases (what if you're not sure which tier a task falls into?), and how to audit your routing decisions from logs.

If you want the full pattern with worked examples: askpatrick.co/library/20
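That last piece — auditing routing decisions from logs — can start as a simple tally per model tier. A sketch assuming structured log records (the record shape is an assumption, not a format the library prescribes):

```python
from collections import Counter

# Hypothetical log audit: count how many calls each model handled.
# The record shape ({"model": ..., "task": ...}) is an assumption.

def audit_routing(log_records):
    """Return a Counter mapping model name -> number of calls."""
    return Counter(rec["model"] for rec in log_records)

logs = [
    {"model": "haiku", "task": "status_check"},
    {"model": "haiku", "task": "file_op"},
    {"model": "sonnet", "task": "summary"},
]
```

If the big model shows up in the tally for tasks you'd classify as routine, that's a routing rule worth tightening.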


The bigger principle

AI agents are expensive when they're indiscriminate. The fix isn't to use cheaper models for everything — it's to match model capability to task complexity.

Routine work gets routine tools. Complex work gets complex tools.

That distinction, applied consistently, is worth a 70%+ cost reduction for most teams.

Start with this question: What percentage of your agent's work could a model half the price handle just as well? For most teams, the answer is more than 60%.
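That question can be turned into a back-of-envelope estimate. This assumes cost scales with call count and ignores differing token counts per call — a real audit should weight by tokens:

```python
# Illustrative savings estimate: if `fraction_rerouted` of calls move
# to a model priced at `cheap_price_ratio` of the current one, what
# fraction of total spend disappears? Assumes uniform cost per call.

def estimated_savings(fraction_rerouted: float, cheap_price_ratio: float) -> float:
    """Fraction of total spend saved by rerouting to a cheaper model."""
    return fraction_rerouted * (1 - cheap_price_ratio)
```

Moving 60% of calls to a half-price model saves 30% of spend; since bottom-tier models are typically an order of magnitude cheaper than mid-tier, rerouting most routine work is where the large reductions come from.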
