DEV Community

Dor Amir

How to Set Up NadirClaw with Docker + Ollama for Zero-Cost Local LLM Routing

I run a lot of AI coding tools. Claude Code, Cursor, Continue. They all burn through API credits fast. Simple tasks like "read this file" or "what does this function do?" hit the same expensive models as complex refactoring requests.

NadirClaw is an LLM router I built to fix this. It classifies each prompt and routes simple ones to cheap (or free) models and complex ones to premium models. The result: 40-70% lower API bills.

This tutorial shows you how to run NadirClaw with Ollama in Docker for completely free local routing. No API keys, no costs, no external dependencies.

What You'll Build

By the end of this guide, you'll have:

  • NadirClaw running in Docker as an OpenAI-compatible proxy
  • Ollama running locally with free models (Llama, Qwen, DeepSeek)
  • A setup that routes simple prompts to local models and complex prompts to your choice of cloud provider (or keep it fully local)

Total cost: $0/month for simple requests. Pay only for the complex prompts that need premium models.

Prerequisites

  • Docker and Docker Compose installed
  • 16GB RAM minimum (32GB recommended for larger models)
  • Basic terminal familiarity

That's it. No API keys required for the fully local setup.

Quick Start: Fully Local Setup

Clone the NadirClaw repo and start both services:

git clone https://github.com/doramirdor/NadirClaw.git
cd NadirClaw
docker compose up

This starts:

  • Ollama on port 11434
  • NadirClaw on port 8856

Once running, pull a model:

docker compose exec ollama ollama pull llama3.1:8b

Now point your AI tool at http://localhost:8856/v1 and you're routing. Zero cost.
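If you want to hit the proxy from a script instead of a tool, the request is plain OpenAI-style chat completions. A minimal stdlib sketch (the "auto" model name and "local" key match the tool settings used later in this post; the actual send is commented out so it only runs once the stack is up):

```python
import json
import urllib.request

# OpenAI-compatible chat completion request aimed at the local router.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "What does this function do?"}],
}
req = urllib.request.Request(
    "http://localhost:8856/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer local"},
)

# With docker compose up running, uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```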

The docker-compose.yml Breakdown

Here's what's running under the hood:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: nadirclaw-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    restart: unless-stopped

  nadirclaw:
    build: .
    container_name: nadirclaw-router
    ports:
      - "8856:8856"
    environment:
      - NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b
      - NADIRCLAW_COMPLEX_MODEL=ollama/qwen3:32b
      - OLLAMA_API_BASE=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_models:

What this does:

  1. Spins up Ollama with persistent model storage
  2. Runs NadirClaw configured to use Ollama for both tiers
  3. Sets up internal Docker networking so NadirClaw can talk to Ollama

Recommended Models for Routing

Not all models are created equal. Here's what works well for routing:

Simple tier (fast, good enough):

  • llama3.1:8b (4.7 GB) - Fast, handles most simple tasks
  • gemma2:9b (5.4 GB) - Good for quick questions

Complex tier (local, capable):

  • qwen3:32b (19 GB) - Strong reasoning, good for refactoring
  • deepseek-r1:14b (9 GB) - Reasoning-optimized, slower but thorough

Pull what you need:

docker compose exec ollama ollama pull llama3.1:8b
docker compose exec ollama ollama pull qwen3:32b

Hybrid Setup: Local Simple, Cloud Complex

The sweet spot for most developers: route simple prompts to free local models, complex prompts to premium cloud models.

Create a .env file in the NadirClaw directory:

# .env
NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b
NADIRCLAW_COMPLEX_MODEL=claude-sonnet-4-5-20250929
ANTHROPIC_API_KEY=sk-ant-your-key-here
OLLAMA_API_BASE=http://ollama:11434

Update docker-compose.yml to load the env file:

services:
  nadirclaw:
    build: .
    env_file: .env
    ports:
      - "8856:8856"
    depends_on:
      - ollama

Restart:

docker compose down
docker compose up -d

Now:

  • Simple prompts (60-70% of requests) hit Ollama (free)
  • Complex prompts (30-40% of requests) hit Claude (paid)

You're paying only for the prompts that actually need a premium model.

Using It with Claude Code

Point Claude Code at NadirClaw:

export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local
claude

Or make a shell alias:

# Add to ~/.zshrc or ~/.bashrc
alias claude-routed='ANTHROPIC_BASE_URL=http://localhost:8856/v1 ANTHROPIC_API_KEY=local claude'

Now claude-routed gives you cost-optimized routing automatically.

Using It with Cursor or Continue

In your AI tool's settings:

Base URL:

http://localhost:8856/v1

Model:

auto

API Key:

local

That's it. NadirClaw handles routing behind the scenes.
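For Continue specifically, the same three values go into its model config. A sketch in Continue's classic config.json shape (field names vary by Continue version, and newer releases use config.yaml, so treat this as illustrative):

```json
{
  "models": [
    {
      "title": "NadirClaw (auto)",
      "provider": "openai",
      "model": "auto",
      "apiBase": "http://localhost:8856/v1",
      "apiKey": "local"
    }
  ]
}
```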

Verify It's Working

Make a test request:

curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

You should see a response. Check the logs to see which model was used:

docker compose logs nadirclaw | tail -20

Look for lines like:

[2026-03-02 11:15:22] classify | tier=simple confidence=0.2848 model=ollama/llama3.1:8b

How Much Does This Save?

Real numbers from my coding sessions:

Without routing (all requests to Claude Sonnet):

  • 147 requests in 8 hours
  • Total cost: $24.18

With NadirClaw (simple to Ollama, complex to Claude):

  • Simple tier (62%): $0.00 (local)
  • Complex tier (31%): $7.32 (Claude)
  • Direct (7%): $1.12
  • Total cost: $8.44

Savings: $15.74 (65% reduction)

Your mileage will vary, but 40-70% savings is typical once you route simple prompts to local models.
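The arithmetic behind those numbers is easy to check yourself:

```python
# Sanity-check the savings math from the session above.
all_premium = 24.18                 # every request sent to Claude Sonnet
with_routing = 0.00 + 7.32 + 1.12   # simple (local) + complex + direct

saved = all_premium - with_routing
pct = saved / all_premium * 100

print(f"Saved ${saved:.2f} ({pct:.0f}% reduction)")
# → Saved $15.74 (65% reduction)
```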

Customizing the Routing

NadirClaw classifies prompts automatically using sentence embeddings. Classification takes ~10ms.

Simple prompts:

  • "What does this function do?"
  • "Read the file at src/main.py"
  • "Add a docstring"

Complex prompts:

  • "Refactor this module to use dependency injection"
  • "Design a caching layer for this API"
  • "Explain the tradeoffs between these architectures"

The router also detects:

  • Agentic requests (tool use, multi-step loops) → forces complex model
  • Reasoning tasks (2+ markers like "step by step") → uses reasoning model
  • Long context → swaps to a model with a larger context window

Monitoring and Reports

Check your routing stats:

docker compose exec nadirclaw nadirclaw report

Example output:

NadirClaw Report
==================================================
Total requests: 147

Tier Distribution
------------------------------
simple      83 (62.9%)
complex     41 (31.1%)
direct       8 (6.1%)

Model Usage
------------------------------------------------------------
Model                          Reqs    Tokens
ollama/llama3.1:8b              83     48210
claude-sonnet-4-5-20250929      41    127840

See how much you've saved:

docker compose exec nadirclaw nadirclaw savings

Troubleshooting

NadirClaw can't reach Ollama:

Make sure you're using http://ollama:11434 as the OLLAMA_API_BASE (Docker service name, not localhost).

Models are slow:

Larger models need more RAM. Start with llama3.1:8b and upgrade once you confirm it's working.

Routing isn't working:

Check the logs:

docker compose logs nadirclaw

Look for classification decisions. If prompts are all going to one tier, you might need to adjust NADIRCLAW_CONFIDENCE_THRESHOLD.
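The threshold goes in the same .env file as the model settings. The value below is an example only, not a recommended default; tune it against your own logs:

```
# .env — example value; raise it to push more prompts to the complex tier
NADIRCLAW_CONFIDENCE_THRESHOLD=0.5
```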

Production Tweaks

For a more robust setup, add health checks and resource limits:

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 16G

  nadirclaw:
    build: .
    ports:
      - "8856:8856"
    env_file: .env
    depends_on:
      ollama:
        condition: service_healthy
    restart: unless-stopped

What's Next

This setup gives you free local routing for simple prompts. From here you can:

  • Add more Ollama models for specific tasks (coding, reasoning, etc.)
  • Set up budget alerts with NADIRCLAW_DAILY_BUDGET and NADIRCLAW_MONTHLY_BUDGET
  • Enable cost tracking dashboards with nadirclaw dashboard
  • Add fallback chains for automatic model failover

The full setup guide is in the NadirClaw repo.


Full disclosure: I'm the author of NadirClaw. I built it because I was burning through Claude credits on prompts that didn't need a premium model. The Docker + Ollama setup makes it easy to route most requests locally and pay only for the complex stuff.

If you're hitting quota limits or want to cut your AI API bills, this is the setup. Zero cost for simple prompts, automatic routing, no code changes.

GitHub: doramirdor/NadirClaw
