I built a CLI to audit Anthropic API spend. Here's what it detects.

Most teams I've talked to who run on the Anthropic API have a shared key, some kind of dashboard nobody really looks at, and a Slack ping when the monthly bill crosses a round number. That's not FinOps. That's hope.

I wanted a tool that does something different. Not "show me my spend" — more like "tell me the three things I'd change tomorrow morning to cut the bill, with a dollar number next to each one." That tool didn't exist, so over the past few weeks I built MigrateCore.

It's a CLI. Read-only. You give it your Anthropic Admin API key, it pulls 30 days of usage, and prints a ranked list:

MigrateCore — last 30 days
─────────────────────────────────────────────
  $4,217 / mo    addressable waste
   3 high-confidence migrations ready

  → cache    $2,140  /mo   (high)
  → model    $1,420  /mo   (medium)
  → tag        $657  /mo   (high)

This post is about the choices behind that output.

Why aggregates, not requests

There were two ways to build this.

The first is to sit in front of the user's Anthropic client as a proxy and capture every request. You see prompts, you see responses, you can be precise about everything.

The second is to read what the Admin API already has — the same numbers Anthropic shows in the billing dashboard, but structured. Per day, per API key, per model: input tokens, output tokens, cache activity, metadata. No prompt content.

I picked the second.

The proxy approach adds latency to every production call. Even a couple of milliseconds is a tradeoff customers shouldn't have to make for an analysis tool. It also turns me into a trust surface in their request path — if MigrateCore goes down, what happens to their Claude calls? That's not a question I want users asking.

And honestly, that lane is taken. Helicone and Langfuse have years of head start on runtime LLM observability. Building a worse version of a category that already has good tools wasn't the wedge.

The Admin API gives less data — you can't see prompts. But it gives enough to detect three specific kinds of waste with confidence, which is what v0.1 ships.
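
For context, pulling those aggregates is a single authenticated GET per report. The sketch below is mine, not MigrateCore's code; the endpoint path, parameter names, and response shape are my reading of the Usage & Cost Admin API and should be treated as assumptions to check against Anthropic's reference.

# Hedged sketch: fetch 30 days of daily usage buckets, grouped by API key and model.
# Endpoint path, query parameters, and response fields are assumptions based on the
# public Usage & Cost Admin API docs; verify against Anthropic's reference.
import os
from datetime import datetime, timedelta, timezone

import requests

ADMIN_KEY = os.environ["ANTHROPIC_ADMIN_KEY"]
URL = "https://api.anthropic.com/v1/organizations/usage_report/messages"

starting_at = (datetime.now(timezone.utc) - timedelta(days=30)).strftime("%Y-%m-%dT%H:%M:%SZ")
resp = requests.get(
    URL,
    headers={"x-api-key": ADMIN_KEY, "anthropic-version": "2023-06-01"},
    params={"starting_at": starting_at, "bucket_width": "1d", "group_by[]": ["api_key_id", "model"]},
)
resp.raise_for_status()
buckets = resp.json()["data"]  # one bucket per day; token counts per key and model, no prompt content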

The three heuristics

Cache

The cleanest signal in the dataset. Anthropic's prompt caching cuts cached-input cost to roughly 10% of the standard rate. For workloads with stable prefixes — system prompts, tool definitions, retrieved context — the math is enormous. 70 to 90% off cached input is not unusual.

I can't see prompts, so I can't prove a key has cacheable patterns. But I can prove a key isn't trying. The ratio of cache reads + writes to total input tokens is right there in the data. Below 1%, against millions of input tokens, means caching has not been turned on.

The savings estimate is intentionally pessimistic:

CACHE_ASSUMED_REUSE_RATE = 0.50  # cacheable fraction
CACHE_AVG_HIT_RATE = 0.70        # expected hit rate

Real workloads with serious reuse push hit rates well above 70%. Defaulting below that is on purpose. The first time someone runs the tool, the dollar number either lands or it doesn't. I'd rather you say "this turned out to be bigger than predicted" than the opposite.
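
Put together, the check and the estimate fit in a few lines. This is a simplified sketch rather than the shipped code: token counts come in as 30-day aggregates per key, and the per-million-token input price is a placeholder you'd look up for the key's dominant model.

# Simplified sketch of the cache heuristic, not the shipped implementation.
# Token counts are 30-day aggregates for one API key; input_price_per_mtok is a
# placeholder per-million-token input price for the key's dominant model.
CACHE_ASSUMED_REUSE_RATE = 0.50  # cacheable fraction
CACHE_AVG_HIT_RATE = 0.70        # expected hit rate
CACHE_READ_DISCOUNT = 0.90       # cache reads cost roughly 10% of the standard input rate

def cache_finding(input_tokens, cache_read_tokens, cache_write_tokens, input_price_per_mtok):
    if input_tokens < 1_000_000:
        return None  # not enough volume to call it with confidence
    cache_ratio = (cache_read_tokens + cache_write_tokens) / input_tokens
    if cache_ratio >= 0.01:
        return None  # caching is already in use on this key
    # Deliberately pessimistic: assume only half the input is cacheable, and 70% of that hits.
    cacheable_tokens = input_tokens * CACHE_ASSUMED_REUSE_RATE * CACHE_AVG_HIT_RATE
    savings = cacheable_tokens / 1_000_000 * input_price_per_mtok * CACHE_READ_DISCOUNT
    return {"type": "cache", "confidence": "high", "monthly_savings": savings}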

Model

This is the messy one.

A key spending $5K a month on Sonnet might be doing agentic reasoning that genuinely needs Sonnet, or it might be running classification prompts that Haiku would handle for one-tenth the cost. From the aggregates alone, I cannot tell.

What I can do is estimate the upper bound: if 30% of the high-tier traffic on a given key is suitable for Haiku, the savings are X. The CLI flags it as medium confidence — explicitly not a directive. The next-step text spells it out: sample some requests, build an eval set, run them through Haiku, migrate the categories that pass.
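
The arithmetic behind that upper bound is deliberately simple; the restraint is in refusing to call it more than an upper bound. A sketch, with placeholder prices rather than current Anthropic list prices:

# Sketch of the upper-bound estimate for a model finding. Prices are illustrative
# placeholders (relative, per-token), not pinned to current Anthropic pricing.
MODEL_MOVABLE_FRACTION = 0.30  # assumed share of high-tier traffic a smaller model could handle

def model_finding(high_tier_monthly_spend, high_tier_price, low_tier_price):
    moved = high_tier_monthly_spend * MODEL_MOVABLE_FRACTION
    savings = moved * (1 - low_tier_price / high_tier_price)
    return {"type": "model", "confidence": "medium", "monthly_savings": savings}

# A $5,000/mo Sonnet key, with Haiku at roughly a tenth of the price:
# model_finding(5000, 1.0, 0.1) -> about $1,350/mo as an upper bound, not a promise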

The medium label matters. It says: this finding needs your judgment, not just your checkbook. I'd rather flag five real model-migration opportunities and miss two than cause one quality regression in production.

Tag

If more than half your spend has no metadata field, MigrateCore flags it.

This isn't really a savings finding. Adding metadata.user_id to your API calls doesn't reduce tomorrow's bill by a dollar. But untagged spend is structurally invisible — you can't tell which feature, which customer, or which experiment is consuming it. Things that can't be measured tend to grow.
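
If you've never set the field, it's a one-line change per call site. Here's a sketch using the official Python SDK; the model name and tag value are placeholders, and nothing about it is specific to MigrateCore:

# Tagging a call so the spend becomes attributable. The metadata.user_id value is a
# placeholder convention (a per-feature tag); use whatever scheme fits your org.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

client.messages.create(
    model="claude-3-5-haiku-latest",  # placeholder model name
    max_tokens=1024,
    metadata={"user_id": "feature:invoice-summarizer"},
    messages=[{"role": "user", "content": "Summarize this invoice ..."}],
)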

The tool puts a directional 5% number next to it, anchored to the rough observation that orgs which adopt per-feature attribution usually find at least that much waste in places they weren't watching. The real value is enabling everything else. You can't run a sensible model migration plan if you don't know which feature owns the high-tier calls.
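
For completeness, the flag itself is the simplest of the three heuristics. A sketch under the thresholds described above:

# Sketch of the tag check: flag when untagged spend dominates, and attach only a
# directional number, explicitly not a promised saving.
TAG_UNTAGGED_THRESHOLD = 0.50  # flag when more than half of spend has no metadata
TAG_DIRECTIONAL_RATE = 0.05    # directional 5% figure

def tag_finding(total_monthly_spend, untagged_monthly_spend):
    if untagged_monthly_spend <= total_monthly_spend * TAG_UNTAGGED_THRESHOLD:
        return None
    return {"type": "tag", "confidence": "high",
            "monthly_savings": untagged_monthly_spend * TAG_DIRECTIONAL_RATE}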

What I'm not going to build

Every product is partly defined by what it refuses to do.

Multi-provider support is out. No OpenAI, no Bedrock, no Gemini. MigrateCore is Claude-native by design.

Runtime proxy mode is also out, and not just for v0.1. Staying out of the API request path is the architectural promise of the project, not a missing feature.

The CLI doesn't auto-apply migrations either. It prints recommendations and humans approve every change. Automating the migration itself is somebody else's tool.

It also doesn't run evals. I tell you when a model migration looks worth investigating; I don't run the tests that prove it's safe. That belongs to your quality bar, not mine.

On conservative thresholds

Look at the numbers again. Cache: 50% reuse, 70% hit rate. Model: 30% movable. Tag: 5% directional.

All of those are below what real workloads typically achieve. That's deliberate.

Calibration is harder than writing the heuristics. The first time a tool gives you a number, the number either earns trust or burns it. I've seen too many "save $X" tools where the real number turned out to be 40% of X, and I've stopped trusting them. So I picked thresholds that underestimate rather than overstate. v0.1 is a starting point — I'll tune from real usage data after launch.

Try it

pipx install migratecore
mc analyze --fixture sample

The fixture is synthetic data shaped like the real Admin API response. It demonstrates all three migration types in about 30 seconds, no Anthropic account required.

For real usage:

export ANTHROPIC_ADMIN_KEY=sk-ant-admin01-...
mc analyze

Admin keys live under Anthropic console → Settings → Admin Keys. Organization accounts only — Individual accounts can't create them, which is fine because the migrations MigrateCore detects are team-scale problems anyway.

Repo and issues: github.com/paolomonasterolo/migratecore. Apache 2.0.

If you have ideas for heuristics I'm missing — especially ones that work without seeing prompts — open an issue. Confidence calibration matters more than coverage.
