Gerus Lab

Claude Model Routing for Cost-Conscious Teams: When to Use Opus, Sonnet, and Haiku

Here's a number that should make you uncomfortable: most teams running Claude burn 40-60% of their token budget on the wrong model for the task.

They route everything through Opus because "it's the best." Or they default to Sonnet for everything because someone set it up that way six months ago. Or worse — they're on Haiku for cost reasons, then wonder why their complex reasoning tasks return garbage.

Model routing isn't a nice-to-have optimization. It's the single biggest lever you have for controlling AI costs without sacrificing quality. And if you're running Claude through a proxy layer (which you should be), it's surprisingly easy to implement.

Let's break down exactly when to use each model, how to build a routing strategy, and why flat-rate pricing changes the math entirely.


The Three Claude Models (And What They're Actually Good At)

Before we talk routing, let's be honest about what each model does well — and where it falls flat.

Claude Opus: The Heavy Hitter

Best for:

  • Complex multi-step reasoning
  • Code architecture and system design
  • Nuanced analysis with ambiguity
  • Long-form content that requires coherent argumentation
  • Tasks where getting it wrong has real consequences

Overkill for:

  • Simple Q&A
  • Data extraction and formatting
  • Template-based content generation
  • Classification tasks
  • Summarization of straightforward documents

Token cost: ~15x Haiku, ~5x Sonnet (approximate, on Anthropic's API pricing; exact ratios depend on the model generation)

Opus is brilliant, but it's also expensive. Using it to extract email addresses from a CSV is like hiring a PhD mathematician to do your grocery list.

Claude Sonnet: The Workhorse

Best for:

  • Everyday coding tasks (write functions, debug, refactor)
  • Content generation (blog posts, emails, documentation)
  • Moderate reasoning and analysis
  • Conversational AI with context
  • Most business automation workflows

Falls short on:

  • Extremely complex multi-step logic puzzles
  • Tasks requiring exceptional nuance
  • Edge cases in code architecture decisions

Token cost: ~3x Haiku, ~1/5 Opus

Sonnet is where 70% of your traffic should probably live. It's the Honda Civic of AI models — reliable, efficient, gets the job done without drama.

Claude Haiku: The Speed Demon

Best for:

  • Classification and categorization
  • Simple data extraction
  • Routing and triage (deciding what to do with a request)
  • Quick summarization
  • High-volume, low-complexity tasks
  • Real-time applications where latency matters

Falls short on:

  • Complex reasoning
  • Nuanced content generation
  • Tasks requiring deep context understanding

Token cost: baseline (cheapest)

Haiku is fast, cheap, and surprisingly capable for structured tasks. If you're doing classification or extraction, it's criminal to use anything else.


The Real Cost of Bad Routing

Let's do some math. Say your team processes 1,000 tasks per day across different categories:

| Task Type | Count per day | Right Model (daily cost) | Wrong Model (daily cost) | Cost Diff |
|---|---|---|---|---|
| Email classification | 300 | Haiku ($0.30) | Sonnet ($0.90) | 3x overspend |
| Code generation | 250 | Sonnet ($7.50) | Opus ($37.50) | 5x overspend |
| Data extraction | 200 | Haiku ($0.20) | Sonnet ($0.60) | 3x overspend |
| Complex analysis | 150 | Opus ($22.50) | Sonnet ($4.50) | Quality loss |
| Content writing | 100 | Sonnet ($3.00) | Opus ($15.00) | 5x overspend |

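Everything on Sonnet: ~$16.50/day = ~$500/month. Cheapest on paper, but the 150 complex-analysis tasks take a quality hit.
Everything on Opus: ~$82.50/day = ~$2,475/month (pricing the two Haiku-class rows at ~15x their Haiku cost).
With smart routing: Right model per task = ~$33.50/day = ~$1,005/month

That's a 59% reduction against the all-Opus default, with no quality loss on the complex work. Against all-Sonnet, routing actually costs more, because it pays for Opus on the tasks where Sonnet falls short; there the payoff is quality, not savings. The all-Opus comparison is in line with the 50-60% savings that teams running heavy Opus usage see from proper routing.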
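Totals like these are easy to get wrong by hand, so it's worth summing the table's per-task figures directly. Here's a minimal Python sketch that does that. The Opus prices for the two Haiku-class rows are not in the table; they are an assumption derived from the approximate "Opus is ~15x Haiku" ratio quoted earlier. Costs are kept in integer cents to avoid floating-point noise.

```python
# Daily cost per task type in cents, copied from the table above.
# ASSUMPTION: the Opus figures for email_classification and data_extraction
# are not in the table; they are derived as ~15x the Haiku figure.
COSTS_CENTS = {
    "email_classification": {"haiku": 30, "sonnet": 90, "opus": 450},
    "code_generation": {"sonnet": 750, "opus": 3750},
    "data_extraction": {"haiku": 20, "sonnet": 60, "opus": 300},
    "complex_analysis": {"sonnet": 450, "opus": 2250},
    "content_writing": {"sonnet": 300, "opus": 1500},
}

# The "Right Model" column of the table.
RIGHT_MODEL = {
    "email_classification": "haiku",
    "code_generation": "sonnet",
    "data_extraction": "haiku",
    "complex_analysis": "opus",
    "content_writing": "sonnet",
}


def daily_cost_cents(assignment):
    """Sum the daily cost for a {task_type: model} assignment."""
    return sum(COSTS_CENTS[task][model] for task, model in assignment.items())


all_sonnet = daily_cost_cents({task: "sonnet" for task in COSTS_CENTS})
all_opus = daily_cost_cents({task: "opus" for task in COSTS_CENTS})
routed = daily_cost_cents(RIGHT_MODEL)

for label, cents in [("all Sonnet", all_sonnet), ("all Opus", all_opus), ("routed", routed)]:
    # 30 days per month, matching the article's rough monthly figures.
    print(f"{label:>10}: ${cents / 100:.2f}/day, ~${cents * 30 / 100:,.0f}/month")
```

Swap in your own task mix and current prices; the comparison that matters is the routed total against whatever single-model default you run today.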

But here's the catch: on pay-per-token pricing, you need to build and maintain this routing logic yourself. You need monitoring. You need to track which model is handling what. You need fallback logic when a model is slow or unavailable.


Building a Routing Strategy: The Practical Guide

Step 1: Audit Your Current Usage

Before routing, you need to know what you're routing. Catalog your Claude usage:

  1. List every task type your team sends to Claude
  2. Classify complexity: Low / Medium / High
  3. Measure current model assignment: What's going to what?
  4. Note quality requirements: Where does accuracy matter most?

Most teams discover they're sending 40-50% of their requests to a model that's overpowered for the task.
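If your request logs already carry a task type and a model name, the audit can be a few lines of Python. The sketch below is illustrative only: the record shape, the complexity labels, and the complexity-to-model mapping are assumptions, and the sample data is made up.

```python
from collections import Counter

# Capability rank of each model tier, lowest to highest.
MODEL_RANK = {"haiku": 0, "sonnet": 1, "opus": 2}

# ASSUMPTION: which tier is "enough" for each audited complexity level.
SUFFICIENT_MODEL = {"low": "haiku", "medium": "sonnet", "high": "opus"}


def audit(records):
    """Count requests whose model is stronger or weaker than the task needs.

    Each record is a (task_type, complexity, model) tuple.
    """
    verdicts = Counter()
    for _task_type, complexity, model in records:
        needed = MODEL_RANK[SUFFICIENT_MODEL[complexity]]
        used = MODEL_RANK[model]
        if used > needed:
            verdicts["overpowered"] += 1
        elif used < needed:
            verdicts["underpowered"] += 1
        else:
            verdicts["matched"] += 1
    return verdicts


# Illustrative sample, not real traffic.
sample = [
    ("email_classification", "low", "sonnet"),
    ("data_extraction", "low", "haiku"),
    ("code_generation", "medium", "opus"),
    ("complex_analysis", "high", "sonnet"),
    ("content_writing", "medium", "sonnet"),
]
result = audit(sample)
print({verdict: f"{count / len(sample):.0%}" for verdict, count in result.items()})
```

The "underpowered" bucket matters as much as the "overpowered" one: it is where quality is quietly leaking.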

Step 2: Define Your Routing Rules

Simple decision tree:

Is this a classification, extraction, or simple formatting task?
  → YES → Haiku

Does this require multi-step reasoning, complex code architecture,
or nuanced analysis where errors have significant consequences?
  → YES → Opus

Everything else?
  → Sonnet

That's it. Start here. Don't overcomplicate it.
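In code, the same decision tree is a few lines. This is a sketch, not a prescribed implementation: the task-type names are hypothetical labels you would replace with whatever your own audit produced.

```python
# ASSUMPTION: hypothetical task-type labels; use the ones from your own audit.
HAIKU_TASKS = {"classification", "extraction", "formatting"}
OPUS_TASKS = {"multi_step_reasoning", "code_architecture", "high_stakes_analysis"}


def choose_model(task_type: str) -> str:
    """Walk the decision tree top to bottom and return a model tier name."""
    if task_type in HAIKU_TASKS:
        return "haiku"
    if task_type in OPUS_TASKS:
        return "opus"
    # Everything else lands on the workhorse.
    return "sonnet"


print(choose_model("classification"))
print(choose_model("code_architecture"))
print(choose_model("blog_post"))
```

Unknown task types fall through to Sonnet, which mirrors the "everything else" branch of the tree.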

Step 3: Implement at the Proxy Layer

This is where your access layer matters. If you're hitting the Anthropic API directly, you need to build routing into your application code. Every. Single. Integration.

If you're running through a proxy (like ShadoClaw), routing happens at the infrastructure level:

  • Set default models per workspace or use case
  • Override per-request with headers
  • Automatic fallback if primary model is overloaded
  • Usage tracking per model per client

The proxy layer is the natural place for routing decisions because it sits between your applications and the API. One configuration change routes all your email classification to Haiku — no code changes in your apps.
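To make the precedence concrete, here is a rough sketch of how a proxy might resolve the model for one request: per-request header first, then the workspace default, then a global default. The header name, workspace names, and function are all hypothetical; this is not ShadoClaw's actual API or configuration format.

```python
# ASSUMPTION: hypothetical header name and workspace names, for illustration only.
OVERRIDE_HEADER = "x-model-override"
GLOBAL_DEFAULT = "sonnet"
WORKSPACE_DEFAULTS = {
    "email-triage": "haiku",
    "architecture-review": "opus",
}
ALLOWED_MODELS = {"haiku", "sonnet", "opus"}


def resolve_model(workspace: str, headers: dict[str, str]) -> str:
    """Pick a model: per-request header, then workspace default, then global."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    requested = normalised.get(OVERRIDE_HEADER, "").strip().lower()
    if requested in ALLOWED_MODELS:
        return requested
    # Missing or unrecognised override: fall back to configuration.
    return WORKSPACE_DEFAULTS.get(workspace, GLOBAL_DEFAULT)


print(resolve_model("email-triage", {}))
print(resolve_model("email-triage", {"X-Model-Override": "opus"}))
```

Ignoring unrecognised override values, rather than rejecting the request, is a design choice; a stricter proxy might return an error instead.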

Step 4: Monitor and Adjust

Routing isn't set-and-forget. Track:

  • Quality scores per model per task type: Is Haiku actually good enough for your classification?
  • Latency per model: Haiku's speed advantage matters for real-time apps
  • Cost per task type: Validate your routing saves what you projected
  • Failure rates: Are you hitting rate limits on specific models?

Review monthly. Adjust quarterly. New model releases change the math.
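A minimal in-memory sketch of that tracking might look like the following. The class and field names are assumptions for illustration; in practice you would likely push these numbers into whatever metrics system you already run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stats:
    requests: int = 0
    failures: int = 0
    total_latency_ms: float = 0.0
    total_cost_usd: float = 0.0
    quality_sum: float = 0.0
    quality_samples: int = 0


class RoutingMetrics:
    """Accumulates quality, latency, cost, and failures per (model, task_type)."""

    def __init__(self):
        self._stats = defaultdict(Stats)

    def record(self, model, task_type, latency_ms, cost_usd, ok=True, quality=None):
        s = self._stats[(model, task_type)]
        s.requests += 1
        s.failures += 0 if ok else 1
        s.total_latency_ms += latency_ms
        s.total_cost_usd += cost_usd
        if quality is not None:  # quality is only scored on a sample of requests
            s.quality_sum += quality
            s.quality_samples += 1

    def summary(self, model, task_type):
        s = self._stats.get((model, task_type))
        if s is None or s.requests == 0:
            return None
        return {
            "avg_latency_ms": s.total_latency_ms / s.requests,
            "cost_usd": s.total_cost_usd,
            "failure_rate": s.failures / s.requests,
            "avg_quality": s.quality_sum / s.quality_samples if s.quality_samples else None,
        }


metrics = RoutingMetrics()
metrics.record("haiku", "classification", latency_ms=100, cost_usd=0.001, quality=0.9)
metrics.record("haiku", "classification", latency_ms=300, cost_usd=0.001, ok=False)
print(metrics.summary("haiku", "classification"))
```

Keying on the (model, task type) pair is the point: an average across all Haiku traffic can hide one task type where quality is slipping.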


Common Routing Mistakes

Mistake 1: Defaulting to Opus "Just in Case"

The "better safe than sorry" approach costs 5x more than necessary for most tasks. Sonnet handles 70% of workloads with comparable quality. Test before you assume Opus is required.

Mistake 2: Using Haiku for Everything to Save Money

Penny-wise, pound-foolish. If Haiku's output quality drops below your threshold, you'll spend more time on retries, manual corrections, and angry users than you saved on tokens.

Mistake 3: Not Having Fallback Logic

Models have rate limits. Models have outages. If your routing sends everything to one model and it's throttled, your entire pipeline stalls.

Good routing includes fallbacks:

  • Haiku overloaded → temporarily route to Sonnet
  • Opus rate-limited → queue and retry, don't downgrade critical tasks
  • Sonnet down → split traffic: simple tasks to Haiku, complex to Opus
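Those three rules can be sketched in Python as follows. The `call_model` callable and the `ModelUnavailable` exception are hypothetical stand-ins for your real client and its throttling or outage errors, and the "queue and retry" rule is simplified here to an in-place retry with exponential backoff rather than a real queue.

```python
import time


class ModelUnavailable(Exception):
    """Raised by the (hypothetical) client when a model is throttled or down."""


def fallback_for(model: str, is_complex: bool):
    """Return the fallback model per the rules above, or None to retry in place."""
    if model == "haiku":
        return "sonnet"
    if model == "sonnet":
        return "opus" if is_complex else "haiku"
    return None  # opus: never downgrade critical tasks


def call_with_fallback(call_model, model, prompt, is_complex=False,
                       retries=3, backoff_s=1.0):
    """Call `call_model(model, prompt)`, applying the fallback rules on failure."""
    try:
        return call_model(model, prompt)
    except ModelUnavailable:
        alternate = fallback_for(model, is_complex)
        if alternate is not None:
            # One hop only; if the alternate also fails, the error propagates.
            return call_model(alternate, prompt)
    # No fallback (Opus): retry the same model with exponential backoff.
    for attempt in range(retries):
        time.sleep(backoff_s * 2 ** attempt)
        try:
            return call_model(model, prompt)
        except ModelUnavailable:
            continue
    raise ModelUnavailable(f"{model} still unavailable after {retries} retries")
```

Limiting fallback to a single hop keeps a bad day on one model from cascading load onto every other model.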

Mistake 4: Routing by Department Instead of Task Type

"Engineering gets Opus, marketing gets Sonnet" is lazy routing. An engineer writing a commit message needs Haiku. A marketer crafting a product launch strategy might need Opus. Route by task complexity, not org chart.

Mistake 5: Ignoring Context Length in Routing

A task that seems simple might carry 50K tokens of input context. Haiku can handle it, but at that size the input tokens dominate the bill, and the task is no longer cheap in absolute terms. Factor in input token count, not just task complexity.
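A quick back-of-the-envelope sketch shows why. It uses the approximate price ratios quoted earlier (Haiku = 1 unit, Sonnet = 3, Opus = 15) as relative input prices; these are not real prices, and the token counts are illustrative.

```python
# ASSUMPTION: relative input price per token from the article's rough ratios
# (Haiku = 1 unit). Not real prices; substitute current rates.
RELATIVE_PRICE = {"haiku": 1, "sonnet": 3, "opus": 15}


def relative_input_cost(model: str, input_tokens: int) -> int:
    """Input cost in 'Haiku-token units': tokens times the model's price ratio."""
    return input_tokens * RELATIVE_PRICE[model]


short_sonnet = relative_input_cost("sonnet", 2_000)  # a typical short request
long_haiku = relative_input_cost("haiku", 50_000)    # a "simple" task with big context

print(f"50K-token Haiku call vs 2K-token Sonnet call: {long_haiku / short_sonnet:.1f}x")
```

Under these assumed ratios, the long-context Haiku call costs several times more than a short Sonnet call, so input size belongs in your routing and budgeting alongside task complexity.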


The Flat-Rate Alternative

Here's where the math gets interesting.

With Anthropic's API, model routing directly affects your bill. Every token counts. Every routing decision is a cost decision. You need monitoring, alerting, and constant optimization.

With flat-rate pricing through ShadoClaw, the equation simplifies: you pay a fixed monthly rate regardless of which models you use or how much.

Solo plan ($29/mo): One account, unlimited model access. Route however you want — Opus for everything if that's what works. No penalty for "wrong" routing decisions.

Pro plan ($79/mo, 5 accounts): Multi-user routing without per-user cost anxiety. Let each team member use the model that fits their workflow.

Team plan ($179/mo, 20 accounts): Agency-scale routing with per-client isolation. Route different clients to different models without billing complexity.

Routing still matters on flat-rate pricing. You still want to route for quality and speed. Haiku is faster than Opus, so if speed matters, route to Haiku. Opus is better at complex reasoning, so if quality matters, route to Opus.

But you remove the cost anxiety from routing decisions. No more "I should probably use Sonnet even though Opus would be better for this" mental math.


A Practical Routing Config

Here's what a sane routing setup looks like for a 5-person team:

routing:
  default: sonnet

  rules:
    - match:
        task_type: [classification, extraction, formatting, triage]
      model: haiku
      fallback: sonnet

    - match:
        task_type: [architecture, complex_analysis, legal_review]
      model: opus
      fallback: sonnet
      retry: true

    - match:
        task_type: [coding, content, email, documentation]
      model: sonnet
      fallback: haiku

  monitoring:
    track_per_model: true
    alert_on_fallback_rate: 0.1
    quality_sample_rate: 0.05

Start with these three buckets. Refine from there.
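For a sense of how a router might consume a config like this, here is a hedged Python sketch. The rules are transcribed as plain Python data to avoid a YAML parser dependency, and the semantics are assumptions: first matching rule wins, and `alert_on_fallback_rate` is read as a fraction of requests.

```python
# The routing rules above, transcribed as plain Python data.
ROUTING = {
    "default": "sonnet",
    "rules": [
        {"task_types": {"classification", "extraction", "formatting", "triage"},
         "model": "haiku", "fallback": "sonnet"},
        {"task_types": {"architecture", "complex_analysis", "legal_review"},
         "model": "opus", "fallback": "sonnet", "retry": True},
        {"task_types": {"coding", "content", "email", "documentation"},
         "model": "sonnet", "fallback": "haiku"},
    ],
    "monitoring": {"alert_on_fallback_rate": 0.1},
}


def route(task_type: str) -> dict:
    """Return the first matching rule's decision, else the default model."""
    for rule in ROUTING["rules"]:
        if task_type in rule["task_types"]:
            return {"model": rule["model"],
                    "fallback": rule.get("fallback"),
                    "retry": rule.get("retry", False)}
    return {"model": ROUTING["default"], "fallback": None, "retry": False}


def should_alert(fallback_count: int, total_requests: int) -> bool:
    """True when the observed fallback rate exceeds the configured threshold."""
    if total_requests == 0:
        return False
    threshold = ROUTING["monitoring"]["alert_on_fallback_rate"]
    return fallback_count / total_requests > threshold


print(route("triage"))
print(route("legal_review"))
print(route("something_new"))
```

Keeping the rules as data rather than code means the three buckets can be refined without touching the router itself.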


The Bottom Line

Using the figures from the cost table above, model routing is the difference between a roughly $2,475/month all-Opus bill and a roughly $1,005/month routed one. Or between mediocre quality (everything on Haiku) and targeted excellence (right model for right task).

Three rules:

  1. Haiku for simple, high-volume tasks (classification, extraction, formatting)
  2. Sonnet for the 70% middle (coding, content, standard workflows)
  3. Opus for the 15% that really matters (complex reasoning, architecture, high-stakes analysis)

Or skip the cost optimization entirely: go flat-rate with ShadoClaw and route purely for quality and speed. Free 3-day trial — test all three models without watching your meter.

Built by Gerus-lab, the engineering studio behind ShadoClaw. We've been running Claude at scale since before it was cool (and definitely since before it was cheap).


Your Claude proxy should make model routing simple, not another thing to worry about. Try ShadoClaw free for 3 days and see what smart routing looks like without the billing anxiety.
