Dor Amir
Real-World Cost Savings: 3 Months with NadirClaw

Author disclosure: I'm Dor Amir, creator of NadirClaw.

Three months ago, I deployed NadirClaw to route API calls across three tools: Claude Code, Cursor, and a few internal scripts. Here's what actually happened to my API bills.

Before NadirClaw: $287/month

Here's where the money went in December 2025:

  • Claude Code: $178 (mostly Opus 4.6 + Sonnet 4.5, a few Haiku calls)
  • Cursor: $93 (Sonnet 4.5 for autocomplete, GPT-5.2 for chat)
  • Scripts: $16 (various)

Total: $287/month

The problem wasn't waste. These are great tools. But every prompt hit the same handful of expensive models, regardless of complexity.

After NadirClaw: $97/month

Same workload, same tools. Different routing.

February 2026 breakdown:

  • Claude Code: $51 (41% Haiku 4.5, 39% Sonnet 4.5, 20% Opus 4.6)
  • Cursor: $34 (73% Haiku 4.5, 18% Sonnet 4.5, 9% GPT-5.2)
  • Scripts: $12 (mostly Haiku)

Total: $97/month

That's a 66% reduction. $190 saved per month, $2,280/year.
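The savings figures check out from the per-tool breakdowns above; here's the back-of-the-envelope math:

```python
# Monthly totals from the December 2025 and February 2026 breakdowns.
before = 178 + 93 + 16   # Claude Code + Cursor + scripts, December 2025
after = 51 + 34 + 12     # same tools, February 2026

monthly_savings = before - after
reduction_pct = monthly_savings / before * 100

print(f"${before}/mo -> ${after}/mo")
print(f"Saved ${monthly_savings}/mo (${monthly_savings * 12}/yr), {reduction_pct:.0f}% reduction")
```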

What Changed

NadirClaw sits between these tools and the API. It reads each prompt and routes to the cheapest model that can handle it.

Three categories:

  1. Simple (50-60% of prompts): "fix this typo", "add a comment", "what does this function do"

    → Haiku 4.5 ($1/M input, $5/M output)

  2. Medium (30-35%): "refactor this module", "debug this error", "write unit tests"

    → Sonnet 4.5 ($3/M input, $15/M output)

  3. Hard (10-15%): "architect this feature", "optimize this algorithm", "explain this codebase"

    → Opus 4.6 or GPT-5.2 ($5/M input, $25/M output)

The classifier runs locally (10ms overhead) and routes in real time. No manual rules, no config files.
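To make the tiering concrete, here's a minimal sketch of the routing idea. The real NadirClaw classifier is learned from your prompts, not keyword-based; the hint lists, model names, and structure below are my simplification for illustration only.

```python
# Sketch of three-tier prompt routing. Prices are $/M tokens (input, output)
# as quoted above. Keyword matching stands in for the actual local classifier.
MODELS = {
    "simple": ("haiku-4.5", (1, 5)),
    "medium": ("sonnet-4.5", (3, 15)),
    "hard":   ("opus-4.6", (5, 25)),
}

HARD_HINTS = ("architect", "optimize", "explain this codebase")
MEDIUM_HINTS = ("refactor", "debug", "unit tests")

def classify(prompt: str) -> str:
    p = prompt.lower()
    if any(h in p for h in HARD_HINTS):
        return "hard"
    if any(h in p for h in MEDIUM_HINTS):
        return "medium"
    return "simple"

def route(prompt: str) -> str:
    # Pick the cheapest model whose tier covers the prompt.
    model, _prices = MODELS[classify(prompt)]
    return model

print(route("fix this typo"))          # haiku-4.5
print(route("refactor this module"))   # sonnet-4.5
print(route("architect this feature")) # opus-4.6
```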

The Real Distribution

Looking at 12,847 prompts from February:

  • Haiku 4.5: 6,731 prompts (52%)
  • Sonnet 4.5: 4,339 prompts (34%)
  • Opus 4.6: 1,304 prompts (10%)
  • GPT-5.2: 473 prompts (4%)

Before NadirClaw, 90% of these went to Opus or Sonnet by default. The tools don't know which prompts are simple.

Accuracy

Classifier mistakes happen. In spot checks:

  • Under-routed (sent to a cheap model when an expensive one was needed): 3-4% of prompts. You retry, and the retry escalates to Opus.
  • Over-routed (sent to an expensive model when a cheap one would have worked): ~2%. Costs a bit extra but works fine.

Net accuracy: 94-95%. Good enough to save $190/month.
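The net figure is just the complement of the two error rates; using the midpoints of the ranges above:

```python
# Back-of-envelope net accuracy from the spot-check error rates (midpoints).
under_routed = 0.035   # 3-4%: cheap model failed, retried on Opus
over_routed = 0.02     # ~2%: expensive model used where cheap sufficed

net_accuracy = 1 - under_routed - over_routed
print(f"Net accuracy: {net_accuracy:.1%}")  # ~94.5%, matching the 94-95% spot checks
```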

What Didn't Change

Same code quality. Same tools. Same workflow.

I'm still using Claude Code for refactoring, Cursor for autocomplete, and the same scripts. NadirClaw doesn't change how you work. It just routes differently.

How to Replicate This

If you're spending $200+/month on LLM APIs:

  1. Install NadirClaw: npm install -g nadirclaw
  2. Start the proxy: nadirclaw start --port 8000
  3. Point your tools at http://localhost:8000/v1 instead of the OpenAI/Anthropic API
  4. Add your API keys to ~/.nadirclaw/config.json
  5. Let it run for a week and check the dashboard: nadirclaw stats
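For step 4, the config file layout isn't shown in this post; the sketch below is a hypothetical shape for ~/.nadirclaw/config.json (key names are my guess), so check the project README for the actual schema:

```json
{
  "anthropic_api_key": "sk-ant-...",
  "openai_api_key": "sk-...",
  "port": 8000
}
```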

The classifier trains on your actual prompts. The longer it runs, the better it gets.
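For step 3, pointing a custom script at the proxy is just a base-URL swap. This sketch assumes NadirClaw exposes an OpenAI-compatible /v1 endpoint, as step 3 implies; it only builds the request so you can see the shape, it doesn't send it:

```python
# Sketch: routing a script through the local NadirClaw proxy instead of the
# vendor API. Assumes an OpenAI-compatible /v1 protocol; the model named in
# the payload may be downgraded per-prompt by the router.
import json
import urllib.request

PROXY_BASE = "http://localhost:8000/v1"  # instead of https://api.openai.com/v1

def build_chat_request(prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": "gpt-5.2",  # NadirClaw may substitute a cheaper model
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{PROXY_BASE}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("fix this typo")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```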

When NadirClaw Doesn't Help

Three cases where routing won't save you money:

  1. Every prompt is hard. If you're doing complex reasoning all day, you need Opus/GPT-5.2 anyway.
  2. You're using free tiers. If you're under Anthropic's free quota, routing just adds latency.
  3. You already use cheap models. If you're already on Haiku everywhere, there's nothing to optimize.

But if you're like me, hitting Opus by default for every autocomplete suggestion, NadirClaw will cut your bill in half.

Open Source

NadirClaw is MIT-licensed and runs locally. No data leaves your machine unless you're calling the actual API.

If you try it, let me know how much you save. I'm curious if the 60-70% range holds across different workloads.


Tags: #ai #llm #opensource #productivity
