How I Cut My LLM Costs 60% With a Local Router (Open Source)

#ai #llm #devops #opensource

I run AI coding agents all day. Claude Code, Codex, various tools through OpenClaw. My API bill was climbing fast.

The problem: every prompt, no matter how simple, was hitting expensive models. "List the files in this directory" costs the same as "refactor this authentication system" when you're sending everything to Claude Opus.

So I built NadirClaw. It's an open source LLM router that classifies prompts in ~10ms and sends them to the right model. Simple stuff goes to cheap or free models. Complex stuff goes to premium.

How it works

NadirClaw runs as a local proxy on port 8856. Any tool that speaks the OpenAI API can use it. You just change your base URL.

pip install nadirclaw
nadirclaw serve

Under the hood, a sentence-embedding classifier scores prompt complexity in about 10ms. No LLM call for the classification itself.

What we just shipped in v0.6.0

Fallback chains - When a model fails (rate limit, server error, timeout), NadirClaw cascades through a list of fallback models automatically. Your agent never sees the failure.

NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5,gemini-2.5-flash

Budget alerts - Set daily and monthly spend limits. Get warnings at 80%. Every single request logs its cost.

NADIRCLAW_DAILY_BUDGET=5.00
NADIRCLAW_MONTHLY_BUDGET=50.00
nadirclaw budget  # See spend by model

Prompt caching - LRU cache with configurable TTL. Identical prompts skip the API entirely. Big win for CI/testing where you repeat prompts constantly.

Web dashboard - Visit localhost:8856/dashboard for real-time routing stats. Dark theme, auto-refreshes, zero dependencies.

Docker - One command to get NadirClaw + Ollama running locally:

docker compose up

Fully local, zero API costs.

Real numbers

On a typical coding session with mixed prompts:

~40% classified as simple (routed to Gemini Flash or Ollama)
~50% classified as complex (routed to Claude/GPT)
~10% agentic detection override (tool-calling requests forced to complex)

The 40% simple routing is where the savings come from. If you're paying $3/M tokens for everything and 40% of your traffic could use a $0.15/M model, that's a big difference over a month.

Who it's for

Anyone running AI tools locally. Works with:

OpenClaw
Codex CLI
Claude Code
Continue
Cursor
Any OpenAI-compatible client

If you're paying for OpenRouter or a similar routing service, NadirClaw does the same thing but runs on your machine with your own keys. No platform dependency.

Try it

pip install nadirclaw
nadirclaw setup  # Interactive wizard
nadirclaw serve

Or Docker:

git clone https://github.com/doramirdor/NadirClaw
cd NadirClaw
docker compose up

GitHub: github.com/doramirdor/NadirClaw

If this saves you money, a star on GitHub helps others find it too.

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.