Why Automatic Prompt Classification Beats Manual Routing Rules
Disclaimer: I'm the author of NadirClaw, the tool discussed below.
Most LLM cost optimization tools ask you to write routing rules by hand. Config files. If-then statements. "Route this to GPT-5, that to Haiku."
I tried that. It sucked.
Here's why automatic classification wins, and what I learned building NadirClaw after ditching the config-file approach.
The Config File Trap
The typical manual routing setup looks like this:
```yaml
routes:
  - pattern: "translate.*"
    model: "gpt-5-mini"
  - pattern: ".*code.*"
    model: "claude-sonnet-4"
  - pattern: ".*complex.*"
    model: "gpt-5"
default: "gpt-5-mini"
```
Seems clean. But three things kill it:
1. You can't predict prompts.
Your coding assistant might send: "Refactor this function to handle edge cases better"
Does that match `.*code.*`? Sure. But is it simple enough for a cheap model? Maybe. Maybe not. The regex has no idea.
2. Maintenance nightmare.
Every new use case needs a new rule. Your config file grows to 200 lines. Rules conflict. You're debugging YAML instead of writing code.
3. False confidence.
You think you're saving money. In reality, half your "simple" prompts hit expensive models because your patterns are too broad, and the other half fail because they're routed to models that can't handle them.
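To see the ambiguity concretely, here's a runnable Python sketch of the manual router (the table mirrors the YAML above; the example prompts are invented):

```python
import re

# A runnable version of the manual routing table (prompts are invented).
ROUTES = [
    (re.compile(r"translate.*"), "gpt-5-mini"),
    (re.compile(r".*code.*"), "claude-sonnet-4"),
    (re.compile(r".*complex.*"), "gpt-5"),
]
DEFAULT = "gpt-5-mini"

def route(prompt: str) -> str:
    for pattern, model in ROUTES:
        if pattern.search(prompt):
            return model
    return DEFAULT

# Both prompts match ".*code.*" and land on the same model, even though
# one is trivial and the other is not -- the regex can't tell them apart.
print(route("add a code comment to this line"))
print(route("rewrite this codebase to be lock-free and prove it correct"))
```

Both calls print `claude-sonnet-4`: the pattern fires on the word "code" either way, so you overpay for the first prompt and get no extra safety on the second.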
How Auto Classification Works
NadirClaw uses a lightweight classifier (DistilBERT, 200 lines of Python) that learns to predict model performance from prompt text.
Training data: thousands of real prompts, each labeled with which models succeeded and which failed.
At runtime:
```python
# User sends prompt to proxy
prompt = "Fix the bug in this sorting algorithm"

# Classifier scores it: predicted success probability per model
scores = classifier.predict(prompt)
# {'haiku': 0.82, 'sonnet': 0.91, 'opus': 0.95}

# Route to the cheapest model above the threshold
if scores['haiku'] > 0.8:
    route_to('claude-haiku-4')   # $0.25/M tokens
else:
    route_to('claude-sonnet-4')  # $3/M tokens
```
Classification overhead: about 10ms per request. No config files. No regex debugging.
Real-World Comparison
I ran both approaches on 1,000 coding assistant prompts (Claude Code + Cursor mix):
Manual rules (best I could write):
- 58% routed to cheap models
- 12% failures (cheap model couldn't handle it)
- 47% effective cost reduction
Auto classifier:
- 71% routed to cheap models
- 3% failures
- 64% effective cost reduction
The difference? The classifier learns patterns humans miss. Things like:
- Sentence structure (imperative vs exploratory)
- Token count (longer != harder)
- Domain vocabulary (ML terms → harder, CRUD terms → easier)
You can't encode that in YAML.
When Manual Rules Win
Two cases where config files make sense:
Known, static workflows. If you're running the same 10 prompts on a schedule, hardcode them. No point in ML overhead.
Compliance requirements. If certain data must hit certain models for legal reasons, don't leave it to a classifier.
For everything else, auto wins.
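The two modes also compose: check the hard rules first, fall through to the classifier for everything else. A sketch, where the tenant tag, model names, and threshold are all placeholders:

```python
# Hypothetical hybrid router: compliance rules first, classifier scores second.
COMPLIANCE_ROUTES = {"eu-healthcare": "eu-hosted-model"}  # invented tenant/model

def route(tenant: str, scores: dict[str, float], threshold: float = 0.8) -> str:
    # Hard rule: some tenants must hit a specific model, no exceptions.
    if tenant in COMPLIANCE_ROUTES:
        return COMPLIANCE_ROUTES[tenant]
    # Otherwise: cheapest model whose predicted success clears the threshold.
    if scores["haiku"] > threshold:
        return "claude-haiku-4"
    return "claude-sonnet-4"

print(route("eu-healthcare", {"haiku": 0.95}))  # eu-hosted-model
print(route("default", {"haiku": 0.82}))        # claude-haiku-4
print(route("default", {"haiku": 0.55}))        # claude-sonnet-4
```

The point of the ordering: a classifier score should never be able to override a legal requirement, so the compliance check runs before any ML gets a say.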
The Setup Tax
Manual routing: 30 minutes to write rules, forever to maintain.
Auto classification: 2 hours to train the classifier once, zero maintenance.
NadirClaw ships with a pre-trained model. Drop-in OpenAI proxy. Works with Claude Code, Cursor, aider, anything that speaks OpenAI API.
```shell
# Install
npm install -g nadirclaw

# Run (proxy on :8080)
nadirclaw start

# Point your tool at localhost:8080
export OPENAI_BASE_URL=http://localhost:8080/v1
```
No config files. The classifier figures it out.
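For the curious: "speaks OpenAI API" just means your tool POSTs a standard chat completion request at the proxy. A stdlib sketch of that request shape (built but not sent, since the port and path assume a running proxy):

```python
import json
import urllib.request

# The request any OpenAI-compatible tool would send to the proxy.
payload = {
    "model": "gpt-5-mini",  # passed through; the proxy picks the real model
    "messages": [
        {"role": "user", "content": "Fix the bug in this sorting algorithm"}
    ],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```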
What I'd Do Differently
If I were rebuilding this:
Online learning. Right now, the classifier is static. Ideally it learns from your failures in production.
Multi-model fallback. If Haiku fails, retry on Sonnet. Currently you have to handle that in your app.
Cost/accuracy sliders. Let users tune the threshold (0.8 = aggressive savings, 0.95 = safe).
All of these are doable. Just haven't shipped them yet.
Try It
NadirClaw is open source: github.com/doramirdor/NadirClaw
Currently ~240 stars. Goal is 1K. If you're burning money on Claude/GPT calls, give it a shot.
Or build your own. The core idea (use ML to classify prompts) beats config files every time.
Questions? Find me on X @amir_dor or open an issue on GitHub.