Mukunda Rao Katta

Posted on May 25

claude-cost: Cache-Aware Claude Cost Calculator for Rust (Anthropic API + Bedrock)

#hermeschallenge #ai #rust #agents

The cache math was wrong

Turned on prompt caching. Expected the bill to drop 90% on cached calls. It dropped 40%.

Spent an hour staring at the Anthropic dashboard. The cached token count looked right. The total looked wrong. Then I looked at the cost formula in the code.

The code was treating cache_read_input_tokens the same as fresh input_tokens. The Anthropic API uses a different price for cache reads, roughly 10x cheaper than fresh input. The dashboard was reporting the token counts correctly, but the internal accounting was computing cost at the wrong rate.

I needed a small utility that did the math correctly in one place. That became claude-cost.

Shape of the fix

[dependencies]
claude-cost = "0.1"

Call it with the token counts from the API response:

use claude_cost::{compute_cost, ModelId};

// Tokens come from response.usage in the Anthropic API response.
let cost = compute_cost(
    ModelId::ClaudeSonnet47,
    response.usage.input_tokens,
    response.usage.output_tokens,
    response.usage.cache_creation_input_tokens.unwrap_or(0),
    response.usage.cache_read_input_tokens.unwrap_or(0),
)?;

println!("fresh input:      ${:.6}", cost.fresh_input_usd);
println!("cache creation:   ${:.6}", cost.cache_creation_usd);
println!("cache read:       ${:.6}", cost.cache_read_usd);
println!("output:           ${:.6}", cost.output_usd);
println!("total:            ${:.6}", cost.total_usd);

For Bedrock cross-region inference profile IDs:

use claude_cost::{compute_cost_by_id};

// Same model, Bedrock cross-region profile ID format.
let cost = compute_cost_by_id(
    "us.anthropic.claude-sonnet-4-7-v1:0",
    input_tokens,
    output_tokens,
    cache_creation_tokens,
    cache_read_tokens,
)?;

The CostBreakdown struct:

pub struct CostBreakdown {
    pub fresh_input_usd: f64,
    pub cache_creation_usd: f64,
    pub cache_read_usd: f64,
    pub output_usd: f64,
    pub total_usd: f64,
}

What it does NOT do

No HTTP calls. No Anthropic SDK dependency. Pass in the token counts, get back a cost. That is it.
No live price fetching. The price table is baked in at publish time. If Anthropic changes pricing, you update the crate version.
No usage aggregation. This is a per-call calculator. For run-level aggregation, compose with agenttrace-rs.
No OpenAI, Gemini, or generic provider support. This crate is Claude-only. For AWS Bedrock cross-vendor pricing (Llama, Mistral, Cohere, etc.), use bedrock-cost.

Inside the lib

The crate supports two ID formats for the same Claude models.

Anthropic direct API uses IDs like claude-sonnet-4-7. AWS Bedrock uses cross-region inference profile IDs like us.anthropic.claude-sonnet-4-7-v1:0. They refer to the same model. The pricing is the same. The identifier format is completely different.

Without normalization, you would need two separate lookup tables or two separate code paths. The crate normalizes both into the same internal pricing record. The normalization rules are:

Strip the region prefix (us., eu., ap.) if present.
Strip the version suffix (-v1:0, -v2:0, etc.) if present.
Map the result to the canonical model ID using an alias table.
Look up the canonical ID in the pricing table.

This means you can take a Bedrock response and feed the model ID directly from the response metadata without any transformation in your application code.

The cache token pricing:

Fresh input tokens: billed at the standard input rate.
Cache creation tokens: billed at 1.25x the input rate (Anthropic charges a premium to write the cache entry).
Cache read tokens: billed at 0.1x the input rate (roughly 10x cheaper than fresh input).
Output tokens: billed at the output rate regardless of caching.

If you track only input_tokens and output_tokens from the API response, you miss the distinction between fresh input and cache creation. The cache_creation_input_tokens field in the usage object is where the 1.25x cost lands. Ignoring it makes your cost estimate low by up to 25% of the input cost on cache-write turns.

When useful

Building a cost tracker for Claude API calls where you need the full breakdown, not just total.
Composing with agenttrace-rs to get per-run cost visibility.
Validating that prompt caching is actually saving money before trusting the dashboard aggregate.
Applications that route to Claude via both direct API and Bedrock and need a single cost calculation path.

When NOT

If you only want a rough total and do not use prompt caching, input_tokens * input_price + output_tokens * output_price is a one-liner. You do not need this crate.
If you are using a provider other than Claude (Anthropic or Bedrock-hosted), use a provider-specific crate.
If you need live pricing that auto-updates without a code change, build a service that fetches from the Anthropic pricing page and exposes an API. This crate does not do that.

Install

[dependencies]
claude-cost = "0.1"

Crates.io: claude-cost
GitHub: MukundaKatta/claude-cost

Siblings

Lib	Boundary	Repo
bedrock-cost	Cross-vendor Bedrock pricing (Anthropic/Llama/Mistral/Cohere/Titan/AI21)	MukundaKatta/bedrock-cost
agenttrace-rs	Aggregate call records into named runs with cost and latency reports	MukundaKatta/agenttrace-rs
cachebench	Measure prompt cache hit ratios across providers	MukundaKatta/cachebench
token-budget-pool	Concurrent shared token/USD budget cap	MukundaKatta/token-budget-pool
llm-budget-window	Time-windowed (per minute/hour/day) token/USD budget	MukundaKatta/llm-budget-window

What is next

The price table in v0.1 reflects Anthropic pricing as of May 2026. The main operational risk for this crate is price table drift. A checksum or last-verified date embedded in the crate would let consumers detect when the table is old.

Haiku 3.5 and Opus 4 pricing rows would round out the model coverage. Both models were added to the Anthropic API after the initial price table snapshot.

Part of the Hermes Agent Challenge sprint. All crates shipped on crates.io.

DEV Community