I run a lot of AI coding tools. Claude Code, Cursor, Continue. They all burn through API credits fast. Simple tasks like "read this file" or "what does this function do?" hit the same expensive models as complex refactoring requests.
NadirClaw is an LLM router I built to fix this. It classifies prompts and routes simple ones to cheap (or free) models, complex ones to premium models. The result: 40-70% lower API bills.
This tutorial shows you how to run NadirClaw with Ollama in Docker for completely free local routing. No API keys, no costs, no external dependencies.
What You'll Build
By the end of this guide, you'll have:
- NadirClaw running in Docker as an OpenAI-compatible proxy
- Ollama running locally with free models (Llama, Qwen, DeepSeek)
- A setup that routes simple prompts to local models and complex prompts to your choice of cloud provider (or keep it fully local)
Total cost: $0/month for simple requests. Pay only for the complex prompts that need premium models.
Prerequisites
- Docker and Docker Compose installed
- 16GB RAM minimum (32GB recommended for larger models)
- Basic terminal familiarity
That's it. No API keys required for the fully local setup.
Quick Start: Fully Local Setup
Clone the NadirClaw repo and start both services:
git clone https://github.com/doramirdor/NadirClaw.git
cd NadirClaw
docker compose up
This starts:
- Ollama on port 11434
- NadirClaw on port 8856
Once running, pull a model:
docker compose exec ollama ollama pull llama3.1:8b
Now point your AI tool at http://localhost:8856/v1 and you're routing. Zero cost.
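If you'd rather script against the router than point a tool at it, here's a minimal Python sketch using only the standard library. It assumes the stack above is running on localhost:8856 and that the endpoint accepts a standard OpenAI-style chat payload with `model` set to `auto` so the router picks the real model:

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8856/v1/chat/completions"

def build_request(prompt: str, model: str = "auto") -> dict:
    """Standard OpenAI-style chat payload; 'auto' lets the router choose the model."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(prompt: str) -> str:
    """Send a prompt to the local NadirClaw endpoint (requires the stack to be up)."""
    req = urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("What is 2+2?"))
```

Because the endpoint is OpenAI-compatible, any OpenAI client library should work the same way by overriding its base URL.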
The docker-compose.yml Breakdown
Here's what's running under the hood:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: nadirclaw-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    restart: unless-stopped

  nadirclaw:
    build: .
    container_name: nadirclaw-router
    ports:
      - "8856:8856"
    environment:
      - NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b
      - NADIRCLAW_COMPLEX_MODEL=ollama/qwen3:32b
      - OLLAMA_API_BASE=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_models:
What this does:
- Spins up Ollama with persistent model storage
- Runs NadirClaw configured to use Ollama for both tiers
- Sets up internal Docker networking so NadirClaw can talk to Ollama
Recommended Models for Routing
Not all models are created equal. Here's what works well for routing:
Simple tier (fast, good enough):
- llama3.1:8b (4.7 GB) - Fast, handles most simple tasks
- gemma2:9b (5.4 GB) - Good for quick questions

Complex tier (local, capable):
- qwen3:32b (19 GB) - Strong reasoning, good for refactoring
- deepseek-r1:14b (9 GB) - Reasoning-optimized, slower but thorough
Pull what you need:
docker compose exec ollama ollama pull llama3.1:8b
docker compose exec ollama ollama pull qwen3:32b
Hybrid Setup: Local Simple, Cloud Complex
The sweet spot for most developers: route simple prompts to free local models, complex prompts to premium cloud models.
Create a .env file in the NadirClaw directory:
# .env
NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b
NADIRCLAW_COMPLEX_MODEL=claude-sonnet-4-5-20250929
ANTHROPIC_API_KEY=sk-ant-your-key-here
OLLAMA_API_BASE=http://ollama:11434
Update docker-compose.yml to load the env file:
services:
  nadirclaw:
    build: .
    env_file: .env
    ports:
      - "8856:8856"
    depends_on:
      - ollama
Restart:
docker compose down
docker compose up -d
Now:
- Simple prompts (60-70% of requests) hit Ollama (free)
- Complex prompts (30-40% of requests) hit Claude (paid)
You're paying only for the prompts that actually need a premium model.
Using It with Claude Code
Point Claude Code at NadirClaw:
export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local
claude
Or make a shell alias:
# Add to ~/.zshrc or ~/.bashrc
alias claude-routed='ANTHROPIC_BASE_URL=http://localhost:8856/v1 ANTHROPIC_API_KEY=local claude'
Now claude-routed gives you cost-optimized routing automatically.
Using It with Cursor or Continue
In your AI tool's settings:
- Base URL: http://localhost:8856/v1
- Model: auto
- API Key: local
That's it. NadirClaw handles routing behind the scenes.
Verify It's Working
Make a test request:
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
You should see a response. Check the logs to see which model was used:
docker compose logs nadirclaw | tail -20
Look for lines like:
[2026-03-02 11:15:22] classify | tier=simple confidence=0.2848 model=ollama/llama3.1:8b
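If you want to tally routing decisions programmatically, the classify lines are easy to parse. This sketch assumes the log format matches the sample line shown above:

```python
import re

# Matches NadirClaw's classify log lines (format taken from the sample above)
LOG_RE = re.compile(
    r"classify \| tier=(?P<tier>\w+) confidence=(?P<confidence>[\d.]+) model=(?P<model>\S+)"
)

def parse_classify(line: str):
    """Return tier/confidence/model from a classify log line, or None."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None

sample = "[2026-03-02 11:15:22] classify | tier=simple confidence=0.2848 model=ollama/llama3.1:8b"
print(parse_classify(sample))
```

Pipe `docker compose logs nadirclaw` through a loop of this and you have a quick ad-hoc tier histogram.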
How Much Does This Save?
Real numbers from my coding sessions:
Without routing (all requests to Claude Sonnet):
- 147 requests in 8 hours
- Total cost: $24.18
With NadirClaw (simple to Ollama, complex to Claude):
- Simple tier (62%): $0.00 (local)
- Complex tier (31%): $7.32 (Claude)
- Direct (7%): $1.12
- Total cost: $8.44
Savings: $15.74 (65% reduction)
Your mileage will vary, but 40-70% savings is typical once you route simple prompts to local models.
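The arithmetic behind those numbers, if you want to plug in your own session costs:

```python
# Reproduce the savings math above: baseline (everything to Claude Sonnet)
# vs. routed cost (simple tier local, complex and direct tiers paid)
baseline = 24.18              # all 147 requests to Claude Sonnet
routed = 0.00 + 7.32 + 1.12   # simple (local) + complex (Claude) + direct

savings = baseline - routed
pct = savings / baseline * 100
print(f"Saved ${savings:.2f} ({pct:.0f}% reduction)")
# Saved $15.74 (65% reduction)
```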
Customizing the Routing
NadirClaw classifies prompts automatically using sentence embeddings. Classification takes ~10ms.
Simple prompts:
- "What does this function do?"
- "Read the file at src/main.py"
- "Add a docstring"
Complex prompts:
- "Refactor this module to use dependency injection"
- "Design a caching layer for this API"
- "Explain the tradeoffs between these architectures"
The router also detects:
- Agentic requests (tool use, multi-step loops) → forces complex model
- Reasoning tasks (2+ markers like "step by step") → uses reasoning model
- Long context → swaps to a model with a larger context window
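To make the tiering idea concrete, here's a toy classifier. This is illustrative only: NadirClaw's real classifier uses sentence embeddings, not keyword matching, and the hint lists below are invented for the example:

```python
# Toy heuristic, NOT NadirClaw's actual classifier (which uses embeddings).
# Hint lists are illustrative examples drawn from the prompts above.
COMPLEX_HINTS = ("refactor", "design", "architecture", "tradeoff", "dependency injection")
REASONING_MARKERS = ("step by step", "think through", "reason about", "chain of thought")

def classify(prompt: str) -> str:
    p = prompt.lower()
    # 2+ reasoning markers -> use a reasoning model
    if sum(m in p for m in REASONING_MARKERS) >= 2:
        return "reasoning"
    # any complexity hint -> complex tier
    if any(h in p for h in COMPLEX_HINTS):
        return "complex"
    return "simple"

print(classify("What does this function do?"))                        # simple
print(classify("Refactor this module to use dependency injection"))   # complex
```

The real embedding-based classifier generalizes beyond exact keywords, which is why it handles paraphrases this toy version would miss.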
Monitoring and Reports
Check your routing stats:
docker compose exec nadirclaw nadirclaw report
Example output:
NadirClaw Report
==================================================
Total requests: 147
Tier Distribution
------------------------------
simple 83 (62.9%)
complex 41 (31.1%)
direct 8 (6.1%)
Model Usage
------------------------------------------------------------
Model Reqs Tokens
ollama/llama3.1:8b 83 48210
claude-sonnet-4-5-20250929 41 127840
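The percentages in the tier table are just each tier's count over the combined tier total; a quick sketch reproducing them:

```python
# Tier counts from the sample report above
counts = {"simple": 83, "complex": 41, "direct": 8}
total = sum(counts.values())

for tier, n in counts.items():
    print(f"{tier:<8} {n:>3} ({n / total * 100:.1f}%)")
```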
See how much you've saved:
docker compose exec nadirclaw nadirclaw savings
Troubleshooting
NadirClaw can't reach Ollama:
Make sure you're using http://ollama:11434 as the OLLAMA_API_BASE (Docker service name, not localhost).
Models are slow:
Larger models need more RAM. Start with llama3.1:8b and upgrade once you confirm it's working.
Routing isn't working:
Check the logs:
docker compose logs nadirclaw
Look for classification decisions. If prompts are all going to one tier, you might need to adjust NADIRCLAW_CONFIDENCE_THRESHOLD.
Production Tweaks
For a more robust setup, add health checks and resource limits:
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 16G

  nadirclaw:
    build: .
    ports:
      - "8856:8856"
    env_file: .env
    depends_on:
      ollama:
        condition: service_healthy
    restart: unless-stopped
What's Next
This setup gives you free local routing for simple prompts. From here you can:
- Add more Ollama models for specific tasks (coding, reasoning, etc.)
- Set up budget alerts with NADIRCLAW_DAILY_BUDGET and NADIRCLAW_MONTHLY_BUDGET
- Enable cost tracking dashboards with nadirclaw dashboard
- Add fallback chains for automatic model failover
The full setup guide is in the NadirClaw repo.
Full disclosure: I'm the author of NadirClaw. I built it because I was burning through Claude credits on prompts that didn't need a premium model. The Docker + Ollama setup makes it easy to route most requests locally and pay only for the complex stuff.
If you're hitting quota limits or want to cut your AI API bills, this is the setup. Zero cost for simple prompts, automatic routing, no code changes.
GitHub: doramirdor/NadirClaw