Every week there's a new "Top 5 AI Gateways" roundup. Bifrost, Cloudflare, Vercel, LiteLLM, Kong. They all do roughly the same thing: load balance, failover, cache, rate limit. Important stuff, but they're solving the wrong problem.
The biggest cost lever isn't caching or failover. It's sending the right prompt to the right model.
## The math
A dev.to article this week showed a 600x cost spread between the cheapest and most expensive LLM APIs. Even among production-grade models, you're looking at 20x differences.
If 60% of your prompts are simple (formatting, classification, extraction, short Q&A), and you route those to a model that costs 10x less, you just cut your bill by 54%. No caching magic. No complex infrastructure. Just not using a $5/M-token model to answer "what's 2+2."
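The 54% figure falls straight out of the blended cost. A back-of-envelope sketch, normalizing the premium model's per-prompt cost to 1.0 (the 10x ratio and 60% split are the assumptions from above):

```python
# Back-of-envelope: 60% of prompts go to a model that costs 10x less.
premium_cost = 1.0                 # normalized per-prompt cost of the premium model
cheap_cost = premium_cost / 10     # the 10x-cheaper model

simple_share = 0.60                # fraction of prompts that are simple
blended = (1 - simple_share) * premium_cost + simple_share * cheap_cost
savings = 1 - blended              # fraction saved vs. sending everything premium

print(f"blended cost: {blended:.2f}  savings: {savings:.0%}")
# → blended cost: 0.46  savings: 54%
```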
## What gateways actually do
| Feature | Traditional gateway | Smart router |
|---|---|---|
| Load balancing | Yes | Yes |
| Failover | Yes | Yes |
| Caching | Yes | Optional |
| Cost tracking | Yes | Yes |
| Model selection per prompt | No | Yes |
| Complexity classification | No | Yes |
| Automatic downgrade for simple tasks | No | Yes |
Gateways are plumbing. Routing is intelligence.
## How NadirClaw works
NadirClaw sits between your app and your LLM providers as an OpenAI-compatible proxy. Every incoming prompt gets classified in ~10ms:
- Simple prompt? Route to your cheapest model (local Ollama, GPT-5-mini, whatever)
- Complex prompt? Send to your premium model (Claude Opus, GPT-5, o3)
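NadirClaw's actual classifier isn't shown here, but the routing idea can be sketched with a naive heuristic (prompt length plus a few keyword checks). This is purely illustrative, not NadirClaw's algorithm, and the model names are placeholders:

```python
CHEAP_MODEL = "gpt-5-mini"      # placeholder names; substitute your own
PREMIUM_MODEL = "claude-opus"

# Crude signal words suggesting a task needs deeper reasoning.
COMPLEX_HINTS = ("refactor", "debug", "prove", "architecture", "why")

def pick_model(prompt: str) -> str:
    """Naive stand-in for a per-prompt complexity classifier."""
    if len(prompt) > 500 or any(h in prompt.lower() for h in COMPLEX_HINTS):
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(pick_model("Convert this date to ISO 8601: 3 Jan 2024"))
# → gpt-5-mini
print(pick_model("Why does this refactor break the event loop?"))
# → claude-opus
```

A real classifier replaces the keyword check with something learned, but the control flow (classify, then dispatch) is the whole idea.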
No code changes. Point your OPENAI_BASE_URL at NadirClaw and you're done. Works with Claude Code, Cursor, aider, any OpenAI-compatible client.
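Concretely, that's one environment variable. The host and port below are assumptions (check your `config.yaml` for the actual listen address):

```shell
# Assumes NadirClaw serves its OpenAI-compatible API on localhost:8000;
# use whatever host/port your config.yaml specifies.
export OPENAI_BASE_URL="http://localhost:8000/v1"
```

Any OpenAI-compatible client started in this environment now sends its requests through the router instead of straight to the provider.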
## Real savings
In testing across mixed workloads (coding assistance, chat, data extraction):
- 40-70% cost reduction vs sending everything to a premium model
- <10ms classification overhead
- Zero quality degradation on complex tasks (they still go to the best model)
The "trick" is that most prompts don't need the best model. They need a good-enough model, fast.
## When to use a gateway vs a router
Use a gateway (LiteLLM, Bifrost) when:
- You need multi-provider failover
- Caching is your main cost lever
- You want centralized API key management
Use NadirClaw when:
- Model cost is your main lever
- You have a mix of simple and complex prompts
- You want automatic optimization without changing code
Or use both. NadirClaw can sit in front of LiteLLM.
## Try it
```shell
pip install nadirclaw
nadirclaw serve --config config.yaml
```
GitHub: https://github.com/doramirdor/NadirClaw
NadirClaw is open source (MIT). I built it because I was spending $400/month on Claude API calls and realized half of them didn't need Claude.