DEV Community

Dor Amir

NadirClaw: Getting Started in 5 Minutes

NadirClaw is an open-source LLM router that cuts your AI API costs by 40-70%. It routes simple prompts to cheap models and complex ones to premium models, automatically. Zero code changes.

This guide gets you running in under 5 minutes.

What You'll Need

  • Python 3 with pip
  • A Gemini API key (the free tier works)

That's it. No Docker, no database, no extra services.

Install

pip install nadirclaw

Or via the install script:

curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh

Configure

Set your Gemini API key:

nadirclaw auth add --provider google --key AIza...

Or export it:

export GEMINI_API_KEY=AIza...

Start the Router

nadirclaw serve --verbose

NadirClaw starts on http://localhost:8856 with sensible defaults:

  • Simple prompts → Gemini 2.5 Flash (cheap, fast)
  • Complex prompts → Gemini 2.5 Pro (powerful)

You'll see logs like this:

[NadirClaw] Starting on http://localhost:8856
[NadirClaw] Simple model: gemini-2.5-flash
[NadirClaw] Complex model: gemini-2.5-pro
[NadirClaw] Ready.

Test It

Send a request:

curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}'

NadirClaw classifies the prompt in ~10ms and routes it to the right model. Simple question? Gemini Flash. Complex refactoring? Gemini Pro.

Use It with Your Tools

NadirClaw is OpenAI-compatible. Point any tool at it:

Claude Code:

export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local
claude

Cursor:

In Cursor settings, add a custom model:

  • Base URL: http://localhost:8856/v1
  • Model: auto

OpenClaw:

nadirclaw openclaw onboard

Python:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8856/v1",
    api_key="local",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain async/await"}],
)
print(response.choices[0].message.content)

Check Your Savings

After using NadirClaw for a bit, run:

nadirclaw report

You'll see:

  • Total requests
  • Tier distribution (how many were simple vs. complex)
  • Cost breakdown
  • Token usage
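
Under the hood, a report like this is just an aggregation over logged requests. Here's a minimal sketch of that kind of rollup — the log format and field names are hypothetical illustrations, not NadirClaw's actual storage schema:

```python
from collections import Counter

# Hypothetical request log; NadirClaw's real format may differ.
log = [
    {"tier": "simple", "tokens": 1200, "cost": 0.0006},
    {"tier": "simple", "tokens": 800, "cost": 0.0004},
    {"tier": "complex", "tokens": 3000, "cost": 0.0450},
]

tiers = Counter(entry["tier"] for entry in log)
total_tokens = sum(entry["tokens"] for entry in log)
total_cost = sum(entry["cost"] for entry in log)

print(f"Total requests: {len(log)}")
print(f"Tier distribution: {dict(tiers)}")
print(f"Tokens: {total_tokens}, cost: ${total_cost:.4f}")
```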

Then:

nadirclaw savings

This shows exactly how much money you saved compared to routing everything to the expensive model.

What Just Happened?

Every request goes through a lightweight classifier:

  1. Prompt comes in
  2. NadirClaw computes a sentence embedding (~10ms)
  3. Routes to the right model based on complexity
  4. Forwards the request and returns the response

Simple prompts (reading files, quick questions, small edits) hit the cheap model. Complex prompts (refactoring, architecture, multi-step changes) hit the premium model. You get the savings without compromising quality.
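
The real classifier scores a sentence embedding; as a rough illustration of the routing decision itself, here is a toy stand-in. The keyword list and length threshold are made up for the example and are not NadirClaw's actual logic:

```python
# Toy stand-in for NadirClaw's embedding-based classifier.
# The real router scores a ~10ms sentence embedding; this just
# checks prompt length and a few "complex work" keywords.
COMPLEX_HINTS = {"refactor", "architecture", "design", "migrate", "debug"}

def route(prompt: str) -> str:
    words = prompt.lower().split()
    if len(words) > 40 or COMPLEX_HINTS & set(words):
        return "complex"  # -> premium model (e.g. gemini-2.5-pro)
    return "simple"       # -> cheap model (e.g. gemini-2.5-flash)

print(route("What is 2+2?"))                        # simple
print(route("Refactor this module into services"))  # complex
```

The point is the shape of the decision, not the heuristic: one cheap classification step in front of every request, then a forward to whichever upstream model the tier maps to.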

Next Steps

  • Add more providers: nadirclaw auth add --provider anthropic --key sk-ant-...
  • Use better models: Set NADIRCLAW_COMPLEX_MODEL=claude-sonnet-4-5 in ~/.nadirclaw/.env
  • Go fully local: Install Ollama, then NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b nadirclaw serve
  • Monitor in real time: Run nadirclaw dashboard for a live terminal dashboard

Troubleshooting

"Rate limit exceeded" on Gemini free tier?

You hit the 20 requests/day limit. Either:

  • Wait a day
  • Add another provider: nadirclaw auth add --provider openai --key sk-...
  • Use Ollama (local, free): NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b nadirclaw serve

Classifier taking too long on first request?

The first request downloads the embedding model (~80 MB) and loads it into memory. Takes 2-3 seconds once. After that, classification is ~10ms per request.

Want to force a specific model for a request?

Set model in your request:

  • model: "premium" → always use the complex model
  • model: "eco" → always use the simple model
  • model: "sonnet" → use Claude Sonnet (model alias)
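
In practice that means the request body is identical to the earlier curl example except for the model field. A sketch of the pinned-tier payload you'd send (the tier names come from the list above; everything else is the standard OpenAI-style body):

```python
import json

# Same /v1/chat/completions body as before, but with the tier
# pinned via "model" instead of letting "auto" classify.
payload = {
    "model": "premium",  # or "eco", or an alias like "sonnet"
    "messages": [{"role": "user", "content": "Design a caching layer"}],
}
body = json.dumps(payload)
print(body)
```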

Why NadirClaw?

Most LLM usage doesn't need a $15/M-token model. 60-70% of prompts in typical coding sessions are simple enough for a $0.50/M-token model. But without classification, everything hits the expensive default.
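
The arithmetic behind that claim, using the figures above (65% is an assumed midpoint of the 60-70% range):

```python
# Blended cost per million tokens when 65% of prompts go to a
# $0.50/M model and 35% go to a $15/M model.
simple_share, simple_price = 0.65, 0.50
complex_share, complex_price = 0.35, 15.00

blended = simple_share * simple_price + complex_share * complex_price
savings = 1 - blended / complex_price

print(f"Blended: ${blended:.2f}/M tokens")
print(f"Savings vs. all-premium: {savings:.0%}")  # 63%
```

That 63% sits inside the 40-70% range quoted at the top; the exact number depends on your own simple/complex split and the price gap between your two models.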

NadirClaw fixes that. It's a local proxy, not a middleman service. Your API keys never leave your machine. No third-party tokens, no subsidized pricing that disappears in six months, no platform risk.

You keep control. You cut costs.


Full disclosure: I'm the author. NadirClaw is open source (MIT) and lives at https://github.com/doramirdor/NadirClaw. If you find it useful, give it a star.
