NadirClaw is an open-source LLM router that cuts your AI API costs by 40-70%. It routes simple prompts to cheap models and complex ones to premium models, automatically. Zero code changes.
This guide gets you running in under 5 minutes.
What You'll Need
- Python 3.10+
- A Gemini API key (free tier works, 20 requests/day: https://aistudio.google.com/apikey)
That's it. No Docker, no database, no extra services.
Install
pip install nadirclaw
Or with the install script:
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh
Configure
Set your Gemini API key:
nadirclaw auth add --provider google --key AIza...
Or export it:
export GEMINI_API_KEY=AIza...
Start the Router
nadirclaw serve --verbose
NadirClaw starts on http://localhost:8856 with sensible defaults:
- Simple prompts → Gemini 2.5 Flash (cheap, fast)
- Complex prompts → Gemini 2.5 Pro (powerful)
You'll see logs like this:
[NadirClaw] Starting on http://localhost:8856
[NadirClaw] Simple model: gemini-2.5-flash
[NadirClaw] Complex model: gemini-2.5-pro
[NadirClaw] Ready.
Test It
Send a request:
curl http://localhost:8856/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}'
NadirClaw classifies the prompt in ~10ms and routes it to the right model. Simple question? Gemini Flash. Complex refactoring? Gemini Pro.
Use It with Your Tools
NadirClaw is OpenAI-compatible. Point any tool at it:
Claude Code:
export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local
claude
Cursor:
In Cursor settings, add a custom model:
- Base URL: http://localhost:8856/v1
- Model: auto
OpenClaw:
nadirclaw openclaw onboard
Python:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8856/v1",
api_key="local",
)
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "Explain async/await"}],
)
print(response.choices[0].message.content)
Check Your Savings
After using NadirClaw for a bit, run:
nadirclaw report
You'll see:
- Total requests
- Tier distribution (how many were simple vs. complex)
- Cost breakdown
- Token usage
Then:
nadirclaw savings
This shows exactly how much money you saved compared to routing everything to the expensive model.
What Just Happened?
Every request goes through a lightweight classifier:
- Prompt comes in
- NadirClaw computes a sentence embedding (~10ms)
- Routes to the right model based on complexity
- Forwards the request and returns the response
Simple prompts (reading files, quick questions, small edits) hit the cheap model. Complex prompts (refactoring, architecture, multi-step changes) hit the premium model. You get the savings without compromising quality.
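To make the routing step concrete, here is a toy sketch of embedding-based tier selection. It is not NadirClaw's actual classifier: the "embedding" is just a bag-of-words vector and the prototype prompts are made up, but the shape of the idea — embed the prompt, compare it to examples of each tier, route to the closer one — is the same.

```python
import math
from collections import Counter

# Toy stand-in for a sentence embedding: a bag-of-words count vector.
# NadirClaw uses a real sentence-embedding model; this only illustrates the idea.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical prototype prompts for each tier.
SIMPLE_PROTOS = [embed("what is 2+2"), embed("read this file and summarize it")]
COMPLEX_PROTOS = [embed("refactor this module and redesign the architecture")]

def route(prompt: str) -> str:
    e = embed(prompt)
    simple_score = max(cosine(e, p) for p in SIMPLE_PROTOS)
    complex_score = max(cosine(e, p) for p in COMPLEX_PROTOS)
    return "gemini-2.5-flash" if simple_score >= complex_score else "gemini-2.5-pro"

print(route("what is 2+2?"))
print(route("refactor this module into a cleaner architecture"))
```

A real classifier replaces the word counts with a learned embedding model, which is why the first request pays a one-time model-load cost (see Troubleshooting below).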
Next Steps
- Add more providers: nadirclaw auth add --provider anthropic --key sk-ant-...
- Use better models: Set NADIRCLAW_COMPLEX_MODEL=claude-sonnet-4-5 in ~/.nadirclaw/.env
- Go fully local: Install Ollama, then NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b nadirclaw serve
- Monitor in real time: Run nadirclaw dashboard for a live terminal dashboard
Troubleshooting
"Rate limit exceeded" on Gemini free tier?
You hit the 20 requests/day limit. Either:
- Wait a day
- Add another provider: nadirclaw auth add --provider openai --key sk-...
- Use Ollama (local, free): NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b nadirclaw serve
Classifier taking too long on first request?
The first request downloads the embedding model (~80 MB) and loads it into memory. Takes 2-3 seconds once. After that, classification is ~10ms per request.
Want to force a specific model for a request?
Set model in your request:
- model: "premium" → always use the complex model
- model: "eco" → always use the simple model
- model: "sonnet" → use Claude Sonnet (model alias)
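In the OpenAI-compatible payload, that's just the model field — the rest of the request is unchanged. A minimal sketch (the chat_payload helper is illustrative, not part of NadirClaw):

```python
import json

# Build a chat-completions request body; only "model" changes the routing.
def chat_payload(model: str, prompt: str) -> str:
    return json.dumps({
        "model": model,  # "auto", "premium", "eco", or an alias like "sonnet"
        "messages": [{"role": "user", "content": prompt}],
    })

print(chat_payload("premium", "Refactor this module"))
```

The same field works from curl or the OpenAI Python client shown earlier — pass model="premium" instead of model="auto".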
Why NadirClaw?
Most LLM usage doesn't need a $15/M-token model. 60-70% of prompts in typical coding sessions are simple enough for a $0.50/M-token model. But without classification, everything hits the expensive default.
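The headline range follows from simple arithmetic. Assuming an illustrative 65/35 simple/complex split at the per-token prices above:

```python
# Blended cost per million tokens with routing, vs. sending everything premium.
# 65% simple at $0.50/M, 35% complex at $15/M — illustrative numbers from the text.
simple_share, complex_share = 0.65, 0.35
simple_price, premium_price = 0.50, 15.00

blended = simple_share * simple_price + complex_share * premium_price
savings = 1 - blended / premium_price

print(f"blended: ${blended:.2f}/M tokens, savings: {savings:.0%}")
```

With this split the blended rate is about $5.58 per million tokens, roughly a 63% saving over all-premium — squarely inside the 40-70% range; the exact figure depends on your traffic mix and prices.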
NadirClaw fixes that. It's a local proxy, not a middleman service. Your API keys never leave your machine. No third-party tokens, no subsidized pricing that disappears in six months, no platform risk.
You keep control. You cut costs.
Full disclosure: I'm the author. NadirClaw is open source (MIT) and lives at https://github.com/doramirdor/NadirClaw. If you find it useful, give it a star.