There are at least six tools right now that will show you exactly how much money your LLM calls cost. Helicone gives you dashboards. Arize gives you traces. SigNoz plugs into OpenTelemetry. They're all good at the same thing: showing you the bill.
None of them make it smaller.
The Observability Trap
Here's the pattern I keep seeing. Team ships an AI feature. Costs creep up. Someone sets up an observability layer. Now you have gorgeous charts showing your Claude Sonnet spend going up and to the right. Everyone nods seriously in the meeting. Nothing changes.
Observation without action is just a nicer way to watch money leave.
The problem isn't visibility. You already know LLM calls are expensive. The problem is that every single prompt hits the same model, regardless of complexity. Your "what's the weather in Tokyo" query runs on the same $15/million-token model as your "analyze this contract for liability risks" query.
That's not an observability problem. That's an architecture problem.
What If the Proxy Just Fixed It?
NadirClaw is an open-source LLM proxy. It sits between your application and the LLM API. OpenAI-compatible, drop-in replacement. One line to install:
pip install nadirclaw && nadirclaw serve
Here's what it does differently from every observability tool out there: it classifies each prompt's complexity before it hits the API. Simple prompts (lookups, formatting, extraction) route to cheap models like Gemini Flash or Claude Haiku. Complex prompts (reasoning, analysis, generation) stay on Claude Sonnet or Opus.
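The actual classifier is internal to NadirClaw, but the routing idea can be sketched with a toy heuristic. Everything below — the keyword list, the length threshold, the model names — is an illustrative assumption, not NadirClaw's implementation:

```python
# Toy sketch of complexity-based model routing.
# Heuristic, threshold, and model names are illustrative only;
# NadirClaw's real classifier is more sophisticated than this.

REASONING_HINTS = ("analyze", "explain why", "compare", "prove", "design", "debug")

def classify(prompt: str) -> str:
    """Crude complexity guess: long prompts or reasoning verbs => complex."""
    text = prompt.lower()
    if len(text) > 500 or any(hint in text for hint in REASONING_HINTS):
        return "complex"
    return "simple"

def route(prompt: str) -> str:
    """Map the complexity class to a model tier."""
    return {
        "simple": "claude-haiku",    # cheap tier: lookups, formatting, extraction
        "complex": "claude-sonnet",  # expensive tier: reasoning, analysis
    }[classify(prompt)]

print(route("What's the weather in Tokyo?"))
print(route("Analyze this contract for liability risks."))
```

The point of the sketch is the shape of the decision, not the heuristic: classification happens per prompt, before the API call, so the expensive model only sees work that needs it.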
You don't configure rules. You don't write routing logic. The classifier handles it.
The result: 40-70% cost reduction on real workloads. Not theoretical. Measured.
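The arithmetic behind that range is simple to check yourself. With hypothetical round numbers (the prices and traffic split below are illustrative, not measured NadirClaw data):

```python
# Back-of-the-envelope blended cost with routing.
# All numbers here are hypothetical, chosen only to show the shape of the math.

EXPENSIVE = 15.0    # $/million tokens, flagship model
CHEAP = 1.0         # $/million tokens, small model
SIMPLE_SHARE = 0.6  # fraction of prompts the classifier deems simple

baseline = EXPENSIVE  # everything on the flagship model
blended = SIMPLE_SHARE * CHEAP + (1 - SIMPLE_SHARE) * EXPENSIVE
savings = 1 - blended / baseline

print(f"blended cost: ${blended:.2f}/Mtok, savings: {savings:.0%}")
```

With these numbers the blended rate is $6.60 per million tokens, a 56% reduction — and the savings scale directly with how much of your traffic is simple.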
"But I Need Visibility Too"
You get it. NadirClaw ships with a built-in dashboard that shows every routing decision in real time. Which prompts went to which model, why, and what it cost. You see the savings as they happen, not after the invoice arrives.
The difference is that visibility is a byproduct of the thing actually saving you money. Not the other way around.
Compare that to the observability-first approach:
- Install observability tool
- See that costs are high
- Manually figure out which calls could use cheaper models
- Write routing logic yourself
- Maintain it as models and pricing change
- Set up the observability tool again to monitor your routing logic
Or:
- Install NadirClaw
- Done
Who This Is For
If you're running any AI application that makes more than a handful of LLM calls per day, you're overpaying. Agents are the worst offenders because they run loops of tool calls, memory lookups, and planning steps. Most of those intermediate calls are simple. They don't need your most expensive model.
But it's not just agents. RAG pipelines, chatbots, content generation workflows, code assistants. Anything with volume.
If you're already using an observability tool, NadirClaw doesn't replace it. It just makes the numbers on your dashboards less painful to look at.
The Part Where I Get Opinionated
The LLM observability space is crowded because it's easy to build. Wrap the API, log the calls, render some charts. Useful, sure. But it's a vitamin, not a painkiller.
Cost reduction is the painkiller. And right now, NadirClaw is the only open-source proxy that does it automatically.
I'd rather have a tool that saves me $500/month with a basic dashboard than a tool that shows me a beautiful breakdown of the $500 I just spent.
Try It
pip install nadirclaw
nadirclaw setup # pick your models
nadirclaw serve # localhost:8000, OpenAI-compatible
Point your app at localhost:8000 instead of the OpenAI or Anthropic API. Everything else stays the same.
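Because the proxy speaks the OpenAI-compatible chat-completions format, the switch is only a base-URL change. A minimal stdlib sketch (the model name is a placeholder for whatever your app already sends):

```python
import json
import urllib.request

# Same request your app already makes -- only the host changes.
BASE_URL = "http://localhost:8000/v1"  # was: https://api.openai.com/v1

payload = {
    "model": "gpt-4o",  # placeholder; the proxy may route this to a cheaper model
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with the proxy running
```

If you use the official OpenAI or Anthropic SDK instead of raw HTTP, the same idea applies: override the client's base URL and leave the rest of your code alone.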
Open source. No account needed. No vendor lock-in.