Arthur Palyan

Stop Sending Opus to Do Haiku's Job: Task-Aware Model Routing via MCP

Every Claude Max user has the same problem: $200 a month, and Opus tokens burned on tasks Haiku could handle in milliseconds.

You ask Claude to rename a variable. Opus handles it. You ask it to list files in a directory. Opus again. You ask it to architect a distributed system. Opus. Every task gets the same model, the same cost, the same latency.

No layer exists between you and the model that says: "this task is simple, route it to the cheap model."

Until now.

The Problem: One Model Fits All

Claude Code picks the model. You don't. And it has no concept of task weight. A typo fix and a system redesign both get Opus. That's like hiring a surgeon to put on a band-aid.

The cost difference is real:

  • Haiku: ~$0.001 per task
  • Sonnet: ~$0.01 per task
  • Opus: ~$0.10 per task

If 60% of your daily tasks are Haiku-grade, you're burning 100x more than you need to on those tasks.

The Solution: classify_task_complexity

We built an MCP tool that scores every task across 6 dimensions before it touches a model:

  1. Reasoning depth - Does this need multi-step logic?
  2. Context window - How much code/data is involved?
  3. Creativity - Is this generative or mechanical?
  4. Domain expertise - Does it need specialized knowledge?
  5. Accuracy stakes - What breaks if it's wrong?
  6. Multi-step planning - Does it need to coordinate actions?

Each dimension scores 1-5. The composite maps to three tiers:

| Tier | Score Range | Model | Example |
|------|-------------|-------|---------|
| simple | 1.0 - 2.0 | Haiku | "Rename this variable to camelCase" |
| moderate | 2.1 - 3.5 | Sonnet | "Add error handling to this API endpoint" |
| complex | 3.6 - 5.0 | Opus | "Design a rate limiter with sliding window" |
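The score-to-tier mapping above can be sketched as a simple lookup. This is a minimal illustration of the thresholds in the table, not the tool's internal scoring logic:

```python
def route(composite_score: float) -> tuple[str, str]:
    """Map a 1-5 composite complexity score to a (tier, model) pair,
    using the tier boundaries from the table above."""
    if composite_score <= 2.0:
        return ("simple", "haiku")
    elif composite_score <= 3.5:
        return ("moderate", "sonnet")
    return ("complex", "opus")

print(route(1.3))   # rename a variable
print(route(2.8))   # add error handling
print(route(4.2))   # design a rate limiter
```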

Try it right now

```bash
curl -X POST https://api.100levelup.com/mcp-ns/tools/classify_task_complexity \
  -H "Content-Type: application/json" \
  -d '{
    "task_description": "fix the typo in line 42",
    "context": { "file_count": 1, "language": "javascript" }
  }'
```

Response:

```json
{
  "tier": "simple",
  "composite_score": 1.3,
  "recommended_model": "haiku",
  "confidence": 0.94,
  "reasoning": "Single-line mechanical fix, no logic changes"
}
```

That one call just saved you ~$0.099.
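Where does ~$0.099 come from? It's the per-task price gap between the default model and the recommended one, using the ballpark costs listed earlier:

```python
# Approximate per-task costs from the list above (ballpark figures).
COST = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}

def savings(recommended: str, default: str = "opus") -> float:
    """Dollars saved per task by routing to the recommended model
    instead of the default."""
    return COST[default] - COST[recommended]

print(round(savings("haiku"), 3))
```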

The Companion: parse_user_intent

Routing is not enough if you don't understand what the person actually asked.

parse_user_intent breaks down any request into structured directives:

```bash
curl -X POST https://api.100levelup.com/mcp-ns/tools/parse_user_intent \
  -H "Content-Type: application/json" \
  -d '{
    "user_message": "Update the login page and make sure tests pass",
    "context": { "role": "developer" }
  }'
```

It returns:

  • Primary intent: code_change
  • Deliverables: ["updated login page", "passing tests"]
  • Urgency: medium
  • Expertise level: intermediate
  • Suggested format: step-by-step execution

The key insight: it catches deliverables the model would otherwise drop. "Update the login page and make sure tests pass" has TWO deliverables. Without parsing, models often do the first and forget the second.
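One way to put the parsed deliverables to work is as a checklist the caller verifies before declaring the task done. A sketch, where the field names follow the example output above and the matching is deliberately naive:

```python
def unmet_deliverables(parsed: dict, completed: list[str]) -> list[str]:
    """Return deliverables from a parse_user_intent response that the
    completed-work list does not yet cover (case-insensitive match)."""
    done = {item.lower() for item in completed}
    return [d for d in parsed["deliverables"] if d.lower() not in done]

parsed = {
    "primary_intent": "code_change",
    "deliverables": ["updated login page", "passing tests"],
}
# The model updated the page but never ran the tests:
print(unmet_deliverables(parsed, ["updated login page"]))
```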

The 80% Confidence Rule

Both tools return a confidence score. The rule is simple:

  • Above 80%: Execute. The classification is solid.
  • Below 80%: Clarify. Ask the user before proceeding.

This prevents the worst failure mode in AI: confidently doing the wrong thing.
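The rule is one comparison in code. A minimal sketch of the gate, assuming the `confidence` field from the responses shown earlier:

```python
CONFIDENCE_THRESHOLD = 0.80

def next_action(result: dict) -> str:
    """Apply the 80% rule: execute on a confident classification,
    otherwise ask the user a clarifying question first."""
    if result["confidence"] >= CONFIDENCE_THRESHOLD:
        return "execute"
    return "clarify"

print(next_action({"tier": "simple", "confidence": 0.94}))
print(next_action({"tier": "moderate", "confidence": 0.61}))
```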

Real Results

We ran both tools through a test suite:

  • 5/5 classifier tests passing - simple, moderate, and complex tasks all routed correctly
  • Intent parser caught deliverables that raw model prompting missed in 3 out of 5 multi-part requests
  • Cost projection: a typical 100-task day drops from ~$10 (all Opus) to ~$2.50 (properly routed)

That's 75% savings with zero quality loss on the tasks that matter.
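The projection is easy to sanity-check. The task mix below is my assumption, not measured data, but a mix of roughly 60 Haiku-grade, 18 Sonnet-grade, and 22 Opus-grade tasks lands near the article's routed figure:

```python
# Ballpark per-task costs from earlier in the post.
COST = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}

# Hypothetical 100-task daily mix -- illustrative only.
mix = {"haiku": 60, "sonnet": 18, "opus": 22}

all_opus = 100 * COST["opus"]
routed = sum(n * COST[model] for model, n in mix.items())

print(f"all Opus: ${all_opus:.2f}, routed: ${routed:.2f}")
```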

Why This Matters Now

Claude Code's auto mode launches March 12. Auto mode decides what Claude can do. But it doesn't govern how it behaves while doing it.

That's what the MCP Nervous System is for. classify_task_complexity and parse_user_intent are the pre-flight layer: which model handles this, and what exactly does the user need?

Auto mode + Nervous System = Claude that picks the right tool AND uses it correctly.

Get Started

Both tools are free. No API key needed for the first 10 calls per day.

Install via MCP:

```bash
npx mcp-nervous-system
```
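If your MCP client uses the common `mcpServers` registration format (Claude Desktop and Claude Code both do), the entry would look something like the following. The server key name here is my placeholder; check the project's README for the exact name it expects:

```json
{
  "mcpServers": {
    "nervous-system": {
      "command": "npx",
      "args": ["mcp-nervous-system"]
    }
  }
}
```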

GitHub: github.com/levelsofself/mcp-nervous-system

Hosted API: api.100levelup.com/mcp-ns/

The Nervous System currently has 14 tools covering governance, safety, communication, and now task routing. These two are just the latest - but they might save you the most money.

Stop sending Opus to do Haiku's job.
