
Robin


Why I Built an AI Model Router (And Why You Are Probably Overpaying 100x Right Now)

It started with a $47 API bill.

Not for anything impressive. Not for training a model or processing a million documents. Just... a chatbot. A customer support bot for a side project that handled maybe 200 conversations a day.

$47. For a chatbot that mostly answered "what are your opening hours?" and "how do I reset my password?"

I stared at the Anthropic billing dashboard and had that feeling every developer knows — the one where you realize you've been doing something incredibly stupid for weeks and nobody told you.

The Ferrari Problem

Here's what I was doing: sending every single user query to Claude Opus. The best model. The most expensive model. For everything.

"What are your opening hours?" — $0.025
"How do I reset my password?" — $0.025
"Can you architect a distributed event-driven payment system with CQRS and saga patterns?" — $0.025

Same price. Same model. Every time.

That's like taking a Ferrari to buy milk. Sure, it gets you there. But a bicycle works just as well and costs nothing.

The Spreadsheet Moment

Being an engineer, I did what engineers do. I exported my API logs and categorized every single query from the past month.

209,000 API calls. I went through a representative sample of 2,000 and classified them by actual complexity.

The results made me physically uncomfortable:

  • 71% were simple tasks. Translations, summaries, Q&A, formatting. A $0.0002 Flash model handles these identically to Opus.
  • 19% were medium complexity. Code generation, analysis, content writing. A $0.01 Pro model handles these well.
  • 10% were genuinely complex. Multi-step reasoning, research, architecture. These actually needed a frontier model.

I did the math. I was overpaying by 87%. Not 10%. Not 30%. Eighty-seven percent.
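Here's a quick sanity check on that number. Using the flat illustrative per-query prices above, the blended cost lands in the same ballpark (~82%); the 87% came from my actual bill, where token counts vary per query. The arithmetic:

```python
# Back-of-the-envelope blended cost using the illustrative per-query
# prices above. Real savings depend on per-query token counts, so the
# exact percentage differs from this flat-price estimate.

tiers = [
    (0.71, 0.0002),  # simple tasks  -> Flash-class model
    (0.19, 0.01),    # medium tasks  -> Pro-class model
    (0.10, 0.025),   # complex tasks -> frontier model
]

all_opus = 0.025  # cost per query if everything goes to Opus
blended = sum(share * price for share, price in tiers)
savings = 1 - blended / all_opus

print(f"blended cost per query: ${blended:.6f}")  # $0.004542
print(f"savings vs. all-Opus:   {savings:.1%}")   # ~81.8%
```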

The Obvious Fix (That Nobody Builds)

The solution seemed obvious: route different queries to different models based on complexity. Simple queries go to cheap models. Complex queries go to expensive ones.

So why wasn't everyone doing this?

I asked around. Talked to 30+ developers building with AI APIs. The answers were always the same:

"It's not worth the engineering time." Fair. Building a routing layer means maintaining a classifier, tracking benchmark data, handling failover logic, managing model configs across providers. That's a side project on top of your side project.

"What if the cheap model messes up?" Also fair. Nobody wants to explain to their boss why the AI gave a wrong answer because they tried to save $0.02.

"I'll optimize later." The classic. Later never comes. You're busy building features, not optimizing API costs. Meanwhile, the bill keeps growing.

So I Built It

I spent three months building what I now call Komilion. The name comes from "chameleon" — because it adapts to whatever you throw at it.

The architecture evolved through three iterations:

v1: Regex only. I hard-coded patterns. "Translate X to Y" → cheap model. "Summarize" → cheap model. This caught about 40% of simple queries but missed everything that didn't match my patterns. Fragile and annoying to maintain.

v2: LLM classifier. I added a cheap LLM (Gemini Flash) to classify queries the regex couldn't catch. This bumped classification accuracy to ~85% but added 200-400ms of latency. For some use cases, that mattered.

v3: Hybrid with fast-path. The current version. A regex fast-path catches ~60% of requests with zero added latency (<5ms). The LLM classifier handles the remaining ambiguous cases. Deterministic model selection uses published benchmarks (LMArena Elo scores, Artificial Analysis quality/speed/price indices) rather than trained ML models.

Why benchmark-based and not ML-trained? Because I'm one person. Training and maintaining a routing model is a full-time ML engineering job. Benchmarks update automatically when new models launch. Good enough beats perfect-but-unmaintainable.
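To make the v3 design concrete, here's a minimal sketch of the hybrid idea. The patterns, tier names, and model names are illustrative, not Komilion's actual rules:

```python
import re

# Fast-path: cheap regex rules that catch unambiguous requests with
# near-zero latency. Patterns here are illustrative examples.
FAST_PATH = [
    (re.compile(r"^\s*translate\b", re.I), "simple"),
    (re.compile(r"^\s*summariz(e|ing)\b", re.I), "simple"),
    (re.compile(r"\b(architect|multi-step|distributed system)\b", re.I), "complex"),
]

# Deterministic selection table driven by published benchmark data
# (quality/speed/price indices), not a trained ML model.
MODEL_TABLE = {
    "simple": "gemini-flash",     # cheapest tier
    "medium": "pro-tier-model",   # mid tier (placeholder name)
    "complex": "frontier-model",  # most capable tier (placeholder name)
}

def classify_with_llm(prompt: str) -> str:
    """Slow path: in the real system this is a cheap LLM call that
    labels ambiguous prompts (200-400ms). Stubbed out here."""
    return "medium"

def route(prompt: str) -> str:
    for pattern, tier in FAST_PATH:  # fast path first, <5ms
        if pattern.search(prompt):
            return MODEL_TABLE[tier]
    return MODEL_TABLE[classify_with_llm(prompt)]  # LLM fallback

print(route("Translate 'hello' to French"))    # gemini-flash
print(route("Refactor this function for me"))  # pro-tier-model (via stub)
```

The ordering matters: the regex pass is essentially free, so the 200-400ms classifier cost is only paid on the minority of prompts the patterns can't decide.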

The Three Modes

Through testing with early users, three usage patterns emerged:

Neo Mode — "Just pick for me."
You send a prompt to neo-mode/balanced and Komilion picks the best model for that specific request. Most users start here. Three sub-tiers: frugal (prioritize cost), balanced (cost/quality), premium (prioritize quality).

Pinned Mode — "I want this specific model."
You lock a specific model for your application. When a newer version drops within the same provider family (e.g., Claude Sonnet 4.5 → 5.0), Komilion auto-upgrades. You get improvements without changing code.

Dashboard analytics — "Show me where my money went."
Every API call is logged with cost, model used, and tier. You can see exactly which task types are costing the most without setting anything up.
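On the wire, the modes differ only in the model string you send. The payloads below follow the standard OpenAI chat-completions shape; "neo-mode/balanced" is the real tier name from above, while the pinned model id is an illustrative placeholder:

```python
# Sketch of what the modes look like as request payloads. Only the
# "model" field changes; everything else is a normal chat completion.

def chat_payload(model: str, prompt: str) -> dict:
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

# Neo Mode: let the router pick. Sub-tier is chosen via the model string.
neo = chat_payload("neo-mode/balanced", "What are your opening hours?")

# Pinned Mode: lock one model; the router auto-upgrades within the
# provider family. Model id here is illustrative.
pinned = chat_payload("claude-sonnet-4-5", "Review this function")

print(neo["model"])  # neo-mode/balanced
```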

The Honest Numbers

I'm going to be transparent about what Komilion does and doesn't do, because developer trust is everything.

What it does:

  • Routes across 400+ models from all major providers through one API key
  • Analyzes each request and picks the right model for the job
  • Saves 60-90% on simple tasks (which are 70% of most apps' traffic)
  • OpenAI SDK compatible — change your base URL, not your code
  • Shows exact cost in every API response (komilion.cost field)
  • Handles provider failover automatically
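Because each response carries its own cost, per-task accounting is a simple aggregation on the client side. The exact response shape below is an assumption for illustration, not Komilion's documented schema:

```python
# Aggregate spend from the per-response cost field. The nested
# {"komilion": {"cost": ...}} shape is assumed for this sketch.

responses = [  # stand-ins for parsed JSON responses
    {"model": "gemini-flash", "komilion": {"cost": 0.0002}},
    {"model": "frontier-model", "komilion": {"cost": 0.025}},
    {"model": "gemini-flash", "komilion": {"cost": 0.0002}},
]

total = sum(r["komilion"]["cost"] for r in responses)

by_model: dict[str, float] = {}
for r in responses:
    by_model[r["model"]] = by_model.get(r["model"], 0.0) + r["komilion"]["cost"]

print(f"total: ${total:.4f}")  # $0.0254
print(by_model)
```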

What it doesn't do:

  • Guarantee the absolute cheapest model for every request
  • Replace your judgment on complex use cases
  • Work magic if 100% of your queries are complex

What I Learned Building This

1. Developers are more price-sensitive than they admit.
Every dev I talked to said "cost doesn't matter, quality matters." Then I showed them their monthly bill breakdown and they went quiet. Cost matters. It just matters differently — nobody wants to sacrifice quality, but everyone wants to stop overpaying on the 70% of queries that don't need quality.

2. Integration friction kills adoption.
My first version required a custom SDK. Zero adoption. The moment I made it OpenAI SDK compatible (literally change base_url), people actually tried it. Lesson: don't ask developers to learn your API. Speak their language.

3. Transparency beats perfection.
I show the exact provider cost and routing decision in every API response. Counter-intuitive, but it builds trust. Developers who see exactly what they're paying are more likely to stay than developers who feel tricked.

4. The best marketing is a curl command.
No landing page, no demo video, no sales call has ever converted a developer as effectively as:

curl https://www.komilion.com/api/v1/chat/completions \
  -H "Authorization: Bearer ck_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"neo-mode/balanced","messages":[{"role":"user","content":"What is the capital of France?"}]}'

Show them it works. Show them the cost. Let them do the math.

Where We Are Now

Komilion is live. 400+ models. Three routing modes. Full OpenAI SDK compatibility. Free credits to try.

I'm building this solo, bootstrapped. Total launch cost: ~$150 (domain, hosting, initial API credits). No VC. No team. Just an engineer who got tired of overpaying.

The roadmap:

  • Now: Core routing, three modes, dashboard with usage stats
  • Next: Streaming cost estimation, batch API support, team accounts
  • Later: Custom routing rules, webhook notifications, dedicated enterprise endpoints

Try It

If any of this resonated:

  1. Sign up at komilion.com — free credits, no credit card
  2. Get your API key
  3. Change one line of code
  4. Watch your costs drop

Or don't. Build your own router using the classifier code I published. The insight — that most AI queries are simple and don't need frontier models — is more important than any specific tool.

But if you'd rather just change a base URL and let someone else maintain the routing... I built that too.


Questions? Find me on Twitter @BannerRobi10895 or email hossein@komilion.com.
