Three Ways to Handle AI Model Routing in 2026 (And the Trade-offs Nobody Talks About)
By Hossein Shahrokni | 2026-03-18
If you're building on top of AI models, you've probably hit the same wall: you have 400 models available and no principled way to decide which one handles which request. Defaulting to Opus on everything works, but it's expensive. Defaulting to Gemini Flash on everything is cheap but breaks on complex tasks.
The routing problem is real. Here are the three patterns I see in production, with honest trade-offs for each.
Approach 1: You route manually (OpenRouter / direct API)
The simplest setup: you pick the model per request, or per endpoint, or per environment. OpenRouter makes this easy — one API, 400+ models, you decide what goes where.
What it looks like in code:
```python
# Explicit model selection per request
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")
response = client.chat.completions.create(
    model="anthropic/claude-opus-4-6",  # You decide this
    messages=[...],
)
```
When this is right:
- You have a small, stable prompt library where you know exactly what each prompt needs
- Your team has strong opinions about specific models for specific use cases
- You want to experiment across models and control the comparison
The honest cost: Every time you add a new prompt type or the model landscape changes, someone has to review the routing rules. "Anthropic changed the pricing for Haiku" is a maintenance event. "GPT-5 is better for code" is another. Manual routing is a configuration that goes stale.
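To make the staleness concrete, here's a hypothetical routing table of the kind manual setups accumulate. The prompt types and model choices are illustrative stand-ins, not recommendations from this post:

```python
# Hypothetical manual routing table: each prompt type maps to a model
# string someone chose by hand. This is the configuration that goes
# stale as pricing and model quality shift underneath it.
ROUTING_RULES = {
    "summarize":   "anthropic/claude-haiku-4",    # cheap, good enough
    "code_review": "openai/gpt-5",                # team preference
    "default":     "anthropic/claude-opus-4-6",   # safe fallback
}

def pick_model(prompt_type: str) -> str:
    """Look up the hand-chosen model, falling back to the default."""
    return ROUTING_RULES.get(prompt_type, ROUTING_RULES["default"])
```

Every entry in that dict is a decision someone made on a particular day, under a particular pricing sheet. Nothing in the code tells you when it's out of date.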
Approach 2: You self-host a router
Open-source routing layers let you deploy automated model selection on your own infrastructure. You define the classification rules, the router applies them per request, and you pay no markup on model costs.
The value proposition:
- No markup on model costs — you pay provider rates directly
- Full control over the routing logic
- Your prompts never leave your infrastructure
When this is right:
- Enterprise or regulated environments where data residency matters
- Teams with the ops capacity to maintain a routing layer
- High volume where even a small markup compounds significantly
The honest cost: You own the operational overhead. When a model goes down, you handle the fallback. When the classification logic needs updating, that's engineering time. "Free" in dollars is not free in hours.
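As a rough sketch of what that classification layer involves, here's a toy classify-then-route pair in Python. The heuristic, the length threshold, and the model names are all hypothetical stand-ins; production routers typically use trained classifiers rather than keyword checks:

```python
# Hypothetical self-hosted routing sketch: classify each request,
# then forward it to the chosen provider at cost.
def classify(prompt: str) -> str:
    """Crude complexity heuristic; a real router would use a classifier."""
    code_markers = ("def ", "class ", "```", "traceback")
    if any(marker in prompt.lower() for marker in code_markers):
        return "code"
    return "complex" if len(prompt) > 2000 else "simple"

# Illustrative tier assignments, not recommendations.
ROUTES = {
    "simple":  "google/gemini-flash",        # cheap tier
    "code":    "openai/gpt-5",               # code-leaning tier
    "complex": "anthropic/claude-opus-4-6",  # premium tier
}

def route(prompt: str) -> str:
    """Pick the model string a request should be sent to."""
    return ROUTES[classify(prompt)]
```

Everything in this sketch is something you now own: the heuristic drifts, the route table goes stale, and the fallback path when a provider is down isn't even shown.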
Approach 3: You use a managed router (Komilion / Martian)
A managed routing layer handles classification automatically. You set a quality floor — frugal, balanced, premium — and the service picks the cheapest capable model for each request. One URL change from your current setup.
What it looks like in code:
```python
# Same SDK, different base_url and model string
from openai import OpenAI

client = OpenAI(base_url="https://www.komilion.com/api/v1", api_key="...")
response = client.chat.completions.create(
    model="neo-mode/balanced",  # Router decides the actual model
    messages=[...],
)
```
When this is right:
- You want cost optimization without maintaining routing logic
- Your prompt mix is diverse and hard to classify manually
- You'd rather pay a markup than own the infrastructure
The honest cost: You're paying a markup (~25% on model costs in Komilion's case) for the automation. And you're trusting someone else's classification. We publish our routing decisions and benchmark data at komilion.com/compare-v2 — every output, every judge score, JSON download — because "trust our router" is a weak argument and "here's what it actually picked and why" is a stronger one.
The question that matters
None of these approaches is universally right. The question is: what's the cost of a wrong routing decision in your stack?
If you're routing a customer-facing chatbot and a wrong tier degrades the response quality noticeably, manual routing with explicit model selection makes sense — the stakes are high enough to justify the maintenance.
If you're routing developer tooling (Cline sessions, internal code review, CI pipeline summaries), the wrong tier mostly means "slightly less thorough output on that one request." Managed routing's occasional miss is worth the cost savings.
If you process millions of requests, the markup compounds to real money and self-hosted is worth the ops cost. At 10K calls/month the markup is negligible next to the engineering time you'd spend; at 10M calls/month, it isn't.
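The break-even arithmetic can be sketched directly. The per-call cost and ops-cost figures below are assumptions for illustration, not numbers from this post:

```python
# Back-of-envelope: when does a 25% routing markup justify self-hosting?
# Both dollar figures are assumed for illustration.
avg_model_cost_per_call = 0.002   # dollars per call (assumed)
markup_rate = 0.25                # managed router's markup on model costs
self_host_ops_cost = 2_000        # dollars/month of engineering time (assumed)

def monthly_markup(calls_per_month: int) -> float:
    """Extra dollars paid to the managed router each month."""
    return calls_per_month * avg_model_cost_per_call * markup_rate

print(monthly_markup(10_000))      # 10K calls: about $5/month
print(monthly_markup(10_000_000))  # 10M calls: about $5,000/month
```

Under these assumptions the markup at 10K calls/month is pocket change next to the ops cost, while at 10M calls/month it exceeds it. Your own numbers will move the crossover point, but the shape of the comparison stays the same.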
What I actually use
For Komilion's own internal tooling (Cline sessions, benchmark scripts, documentation drafts), we use Balanced tier by default. File reads and summaries route to Frugal automatically.
The benchmark result that drove this split: Balanced beats Opus on 6 of 10 real developer tasks at $0.08/task vs Opus's $0.17. Frugal matches Opus on summarization and code explanation at ~57x lower cost (8.3/10 vs 8.6/10). Full outputs at komilion.com/compare-v2.
komilion.com — Sign up free, no card required. Drop a comment if you want test credits.