Why Automatic Prompt Classification Beats Manual Routing Rules
Disclaimer: I'm the author of NadirClaw, the tool discussed below.
Most LLM cost optimization tools ask you to write routing rules by hand. Config files. If-then statements. "Route this to GPT-5, that to Haiku."
I tried that. It sucked.
Here's why automatic classification wins, and what I learned building NadirClaw after ditching the config-file approach.
The Config File Trap
The typical manual routing setup looks like this:
```yaml
routes:
  - pattern: "translate.*"
    model: "gpt-5-mini"
  - pattern: ".*code.*"
    model: "claude-sonnet-4"
  - pattern: ".*complex.*"
    model: "gpt-5"
default: "gpt-5-mini"
```
Seems clean. But three things kill it:
1. You can't predict prompts.
Your coding assistant might send: "Refactor this function to handle edge cases better"
Does that match `.*code.*`? Sure. But is it simple enough for a cheap model? Maybe. Maybe not. The regex has no idea.
2. Maintenance nightmare.
Every new use case needs a new rule. Your config file grows to 200 lines. Rules conflict. You're debugging YAML instead of writing code.
3. False confidence.
You think you're saving money. In reality, half your "simple" prompts hit expensive models because your patterns are too broad, and the other half fail because they're routed to models that can't handle them.
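To see the ambiguity concretely, here's a runnable Python sketch of the manual router (the table mirrors the YAML above; the example prompts are invented):

```python
import re

# A runnable version of the manual routing table (prompts are invented).
ROUTES = [
    (re.compile(r"translate.*"), "gpt-5-mini"),
    (re.compile(r".*code.*"), "claude-sonnet-4"),
    (re.compile(r".*complex.*"), "gpt-5"),
]
DEFAULT = "gpt-5-mini"

def route(prompt: str) -> str:
    for pattern, model in ROUTES:
        if pattern.search(prompt):
            return model
    return DEFAULT

# Both prompts match ".*code.*" and land on the same model, even though
# one is trivial and the other is not -- the regex can't tell them apart.
print(route("add a code comment to this line"))
print(route("rewrite this codebase to be lock-free and prove it correct"))
```

Both calls print `claude-sonnet-4`: the pattern fires on the word "code" either way, so you overpay for the first prompt and get no extra safety on the second.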
How Auto Classification Works
NadirClaw uses a lightweight classifier (DistilBERT, 200 lines of Python) that learns to predict model performance from prompt text.
Training data: thousands of real prompts, each labeled with which models succeeded and which failed.
At runtime:
```python
# User sends prompt to proxy
prompt = "Fix the bug in this sorting algorithm"

# Classifier scores it: predicted success probability per model
scores = classifier.predict(prompt)
# {'haiku': 0.82, 'sonnet': 0.91, 'opus': 0.95}

# Route to the cheapest model above the threshold
if scores['haiku'] > 0.8:
    route_to('claude-haiku-4')   # $0.25/M tokens
else:
    route_to('claude-sonnet-4')  # $3/M tokens
```
Classification overhead: about 10ms per request. No config files. No regex debugging.
Real-World Comparison
I ran both approaches on 1,000 coding assistant prompts (Claude Code + Cursor mix):
Manual rules (best I could write):
- 58% routed to cheap models
- 12% failures (cheap model couldn't handle it)
- 47% effective cost reduction
Auto classifier:
- 71% routed to cheap models
- 3% failures
- 64% effective cost reduction
The difference? The classifier learns patterns humans miss. Things like:
- Sentence structure (imperative vs exploratory)
- Token count (longer != harder)
- Domain vocabulary (ML terms → harder, CRUD terms → easier)
You can't encode that in YAML.
When Manual Rules Win
Two cases where config files make sense:
Known, static workflows. If you're running the same 10 prompts on a schedule, hardcode them. No point in ML overhead.
Compliance requirements. If certain data must hit certain models for legal reasons, don't leave it to a classifier.
For everything else, auto wins.
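The two modes also compose: check the hard rules first, fall through to the classifier for everything else. A sketch, where the tenant tag, model names, and threshold are all placeholders:

```python
# Hypothetical hybrid router: compliance rules first, classifier scores second.
COMPLIANCE_ROUTES = {"eu-healthcare": "eu-hosted-model"}  # invented tenant/model

def route(tenant: str, scores: dict[str, float], threshold: float = 0.8) -> str:
    # Hard rule: some tenants must hit a specific model, no exceptions.
    if tenant in COMPLIANCE_ROUTES:
        return COMPLIANCE_ROUTES[tenant]
    # Otherwise: cheapest model whose predicted success clears the threshold.
    if scores["haiku"] > threshold:
        return "claude-haiku-4"
    return "claude-sonnet-4"

print(route("eu-healthcare", {"haiku": 0.95}))  # eu-hosted-model
print(route("default", {"haiku": 0.82}))        # claude-haiku-4
print(route("default", {"haiku": 0.55}))        # claude-sonnet-4
```

The point of the ordering: a classifier score should never be able to override a legal requirement, so the compliance check runs before any ML gets a say.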
The Setup Tax
Manual routing: 30 minutes to write rules, forever to maintain.
Auto classification: 2 hours to train the classifier once, zero maintenance.
NadirClaw ships with a pre-trained model. Drop-in OpenAI proxy. Works with Claude Code, Cursor, aider, anything that speaks OpenAI API.
```shell
# Install
npm install -g nadirclaw

# Run (proxy on :8080)
nadirclaw start

# Point your tool at localhost:8080
export OPENAI_BASE_URL=http://localhost:8080/v1
```
No config files. The classifier figures it out.
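For the curious: "speaks OpenAI API" just means your tool POSTs a standard chat completion request at the proxy. A stdlib sketch of that request shape (built but not sent, since the port and path assume a running proxy):

```python
import json
import urllib.request

# The request any OpenAI-compatible tool would send to the proxy.
payload = {
    "model": "gpt-5-mini",  # passed through; the proxy picks the real model
    "messages": [
        {"role": "user", "content": "Fix the bug in this sorting algorithm"}
    ],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```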
What I'd Do Differently
If I were rebuilding this:
Online learning. Right now, the classifier is static. Ideally it learns from your failures in production.
Multi-model fallback. If Haiku fails, retry on Sonnet. Currently you have to handle that in your app.
Cost/accuracy sliders. Let users tune the threshold (0.8 = aggressive savings, 0.95 = safe).
All of these are doable. Just haven't shipped them yet.
Try It
NadirClaw is open source: github.com/doramirdor/NadirClaw
Currently ~240 stars. Goal is 1K. If you're burning money on Claude/GPT calls, give it a shot.
Or build your own. The core idea (use ML to classify prompts) beats config files every time.
Questions? Find me on X @amir_dor or open an issue on GitHub.