wartzar-bee

Posted on May 31 • Originally published at tokenscope.pages.dev

The Claude Code cost formula: why the same session can cost 10x more tomorrow

#ai #devtools #programming #claude

Here's a question most Claude Code users can't answer: what will the next turn in your current session cost?

Not the total session — just the next single turn. The answer is closer to calculable than you think, and once you understand the formula, the sessions that run 10× more expensive than you expected stop being mysterious.

The formula (it fits on one line)

turn_cost ≈ (context_tokens × 0.000003 × 0.1)  +  (output_tokens × 0.000015)
             └─ cache-read (re-sent context) ─┘    └─ output ─┘

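That's it: two terms, covering the re-sent context and the output. The approximation leaves out the one-time charges for tokens entering the context for the first time (cache writes and uncached input). Let me unpack what each term means and why the first one is almost always the bigger number in a long session.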

Pricing note: The numbers above use Sonnet 4's rates at time of writing ($3/M input, $15/M output; cache-read ~10% of input). Rates change — check Anthropic's pricing page for the current sheet. The structure of the formula stays the same regardless.
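As a minimal sketch, the formula can be written as a small Python function. The names turn_cost, INPUT_PRICE, OUTPUT_PRICE, and CACHE_READ_FACTOR are illustrative, and the rates are the Sonnet 4 figures quoted in the pricing note, so swap in the current price sheet before trusting the output.

```python
# Rates from the pricing note (USD per token). Assumed values: check the current sheet.
INPUT_PRICE = 3 / 1_000_000      # $3 per million input tokens
OUTPUT_PRICE = 15 / 1_000_000    # $15 per million output tokens
CACHE_READ_FACTOR = 0.1          # cache reads billed at roughly 10% of input


def turn_cost(context_tokens: int, output_tokens: int) -> float:
    """Approximate cost of one turn: cached context re-send plus output."""
    resend = context_tokens * INPUT_PRICE * CACHE_READ_FACTOR
    output = output_tokens * OUTPUT_PRICE
    return resend + output


if __name__ == "__main__":
    # One turn at 100k tokens of context that writes 300 tokens of output.
    print(f"${turn_cost(100_000, 300):.4f}")
```

Like the formula itself, the sketch ignores cache-write and uncached-input charges, so it approximates a turn deep in a session rather than reproducing a bill.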

Term 1: the re-sent context

Claude is stateless. It has no memory between turns — so every single turn, the client sends the entire accumulated conversation back to the model: every message, every file you've opened, every tool result, every prior response. This isn't a Claude quirk; it's how stateless LLMs work.

Prompt caching softens the cost: if the server has recently seen that exact context prefix, the re-send is billed at the cache-read rate, roughly 10% of the normal input price. So re-sending 100,000 tokens costs about the same as freshly inputting 10,000 tokens.

But here's the trap: cheap per token, paid every turn, on the whole context.

In the formula: context_tokens × $0.000003 × 0.1

At 50,000 tokens of context, that's $0.015 per turn. At 500,000 tokens, it's $0.15 per turn — and if you're taking 500 more turns at that context size, that's $75 in re-sends alone, from one bloated context staying in-session.

This is why the median session (p50 context ~45,000 tokens, ~29 turns) looks so different from a p90 session: in the p90 session the context is bigger, and more turns compound on top of it.

Term 2: the output

This is what most people think they're paying for: the model writing code, giving explanations, thinking. Output tokens are priced higher per-token than input (~$15/M vs $3/M), but the model writes far fewer tokens per turn than the context it re-reads.

In a typical turn, the model might write 200–500 tokens of response. At $0.000015/token, that's $0.003–$0.0075 per turn. Meaningful, but a minority of a long session's bill.
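One way to see when the first term overtakes the second is to solve the formula for the context size at which the cached re-send costs as much as the output. The sketch below does that under the rates quoted in the pricing note; break_even_context is a hypothetical helper name, not something from Claude Code or tokenscope.

```python
# Assumed rates (USD per token), taken from the article's pricing note.
INPUT_PRICE = 3 / 1_000_000
OUTPUT_PRICE = 15 / 1_000_000
CACHE_READ_FACTOR = 0.1


def break_even_context(output_tokens: int) -> float:
    """Context size (tokens) at which the cached re-send costs as much as the output."""
    return output_tokens * OUTPUT_PRICE / (INPUT_PRICE * CACHE_READ_FACTOR)


if __name__ == "__main__":
    for out in (200, 300, 500):
        print(out, round(break_even_context(out)))
```

At these rates the break-even context is 50 times the output length: roughly 10,000 to 25,000 tokens for a 200 to 500 token response. Past that point, the re-send is the larger term on every turn.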

From the 66-session benchmark: output was 15% of total pooled spend across 4,339 turns. Re-sent context was 60%.

Why the formula explains the 10x sessions

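Let's work through it concretely. Suppose you have two otherwise similar sessions: same project, same kinds of tasks. The only difference is context management. To keep the arithmetic simple, each session is priced as if its context sat at its peak on every turn, so both totals are upper bounds: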

Session A: compact at ~50k tokens

Peak context: 50,000 tokens
Turns: 60
Re-send cost: 50,000 × $0.000003 × 0.1 × 60 turns = $0.90
Output cost (assume 300 tokens/turn): 300 × $0.000015 × 60 turns = $0.27
Total ≈ $1.17

Session B: let it run to 500k tokens

Peak context: 500,000 tokens (big files read, lots of tool output, long history)
Turns: 400 (more turns because you kept going)
Re-send cost: 500,000 × $0.000003 × 0.1 × 400 turns = $60
Output cost: 300 × $0.000015 × 400 turns = $1.80
Total ≈ $62

Same type of work. 53× cost difference. Output cost grew about 7×, in step with the turn count; re-send cost grew about 67×, because the context and the turn count grew together.
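The two worked examples can be reproduced with a short sketch. It uses the same simplification as the arithmetic above (context held at its peak for every turn) and the rates from the pricing note; session_cost is an illustrative name.

```python
# Assumed rates (USD per token), taken from the article's pricing note.
INPUT_PRICE = 3 / 1_000_000
OUTPUT_PRICE = 15 / 1_000_000
CACHE_READ_FACTOR = 0.1


def session_cost(context_tokens: int, turns: int, output_per_turn: int = 300):
    """Return (re-send cost, output cost) with context held at one size for all turns."""
    resend = context_tokens * INPUT_PRICE * CACHE_READ_FACTOR * turns
    output = output_per_turn * OUTPUT_PRICE * turns
    return resend, output


if __name__ == "__main__":
    a_resend, a_output = session_cost(50_000, 60)     # Session A
    b_resend, b_output = session_cost(500_000, 400)   # Session B
    print(f"A: re-send ${a_resend:.2f}, output ${a_output:.2f}")
    print(f"B: re-send ${b_resend:.2f}, output ${b_output:.2f}")
    print(f"ratio: {(b_resend + b_output) / (a_resend + a_output):.1f}x")
```

Because real context grows toward its peak rather than starting there, a measured session with the same peak would come in below these totals.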

This is exactly the structural pattern in the real data. The one-session data study measured a real session at ~$1,278 over ~1,270 turns with ~998,000 peak context tokens. Re-sent context was 66% of the bill (~$843); output was 14% (~$179).

The compounding nobody mentions

Context doesn't just grow — it compounds your cost in two directions simultaneously:

Each turn costs more as context grows (the re-send term scales linearly with context size)
You take more turns in longer sessions

So if your context doubles and you also take twice as many turns, your re-send cost goes up 4×, not 2×. Context size sets the price of each turn, and the turn count multiplies that price.

At context sizes below ~100k tokens and turn counts below ~50, this compounding is mild. Once you cross both thresholds simultaneously, it accelerates fast.
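The compounding can be checked numerically. The sketch below is a toy model, not a measurement: it assumes context grows by a fixed number of tokens per turn, and resend_total is a hypothetical helper. The first comparison is the doubling case described above; the second shows that when context grows steadily, doubling the turn count alone roughly quadruples the re-send cost.

```python
# Cached re-send price in USD per token (assumed: $3/M input at a 10% cache-read rate).
RESEND_PRICE = 3 / 1_000_000 * 0.1


def resend_total(start_tokens: int, growth_per_turn: int, turns: int) -> float:
    """Total re-send cost when context grows by a fixed amount each turn."""
    total_tokens = sum(start_tokens + growth_per_turn * t for t in range(turns))
    return total_tokens * RESEND_PRICE


# Constant context: double the context and the turn count.
flat = resend_total(100_000, 0, 50)
doubled = resend_total(200_000, 0, 100)

# Growing context: same growth rate, twice the turns.
growing_50 = resend_total(0, 2_000, 50)
growing_100 = resend_total(0, 2_000, 100)

if __name__ == "__main__":
    print(f"flat: ${flat:.2f}, doubled: ${doubled:.2f}, ratio {doubled / flat:.2f}")
    print(f"growing context, 2x turns: ratio {growing_100 / growing_50:.2f}")
```

Under this model, a session sitting right at the two thresholds (100k tokens, 50 turns) pays about $1.50 in re-sends.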

The three decisions that move the formula

Given the formula, there are exactly three knobs that change your bill in a long session:

1. Context size (the biggest knob)
Everything else constant, halving your peak context halves the re-send term, which is the dominant term in a long session. /compact in Claude Code summarizes the accumulated history and drops the original. It reduces context at the cost of losing exact detail, which is worth it on most long-running sessions. The earlier you compact, the more future turns you protect.

2. Turn count
Starting a fresh session when your task changes resets the context, so old tokens stop being re-billed on every new turn. If you've just finished a debugging session at 200k tokens and want to start a new feature, carrying that context forward means re-paying for 200k irrelevant tokens (about $0.06 at the rates above) on every turn of the new work.

3. What you keep in context
Files read early in a session sit in context for every subsequent turn. A 30,000-token file read on turn 3 gets re-billed on turns 4 through 400. The cost of adding something to context isn't the first send; it's the sum of re-sends over all future turns. Avoid reading large files or producing verbose tool output unless later turns actually need it.
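The carrying cost of anything added to context can be estimated with a few lines. This is a sketch under the pricing-note rates, assuming the tokens stay in context and keep hitting the cache; carry_cost is an illustrative name.

```python
# Cached re-send price in USD per token (assumed: $3/M input at a 10% cache-read rate).
RESEND_PRICE = 3 / 1_000_000 * 0.1


def carry_cost(tokens: int, added_on_turn: int, last_turn: int) -> float:
    """Cost of re-sending `tokens` on every turn after the one that added them."""
    later_turns = max(last_turn - added_on_turn, 0)
    return tokens * RESEND_PRICE * later_turns


if __name__ == "__main__":
    # The 30,000-token file read on turn 3 of a 400-turn session.
    print(f"file: ${carry_cost(30_000, 3, 400):.2f}")
    # 200k tokens of finished debugging carried through 100 turns of new work.
    print(f"stale context: ${carry_cost(200_000, 0, 100):.2f}")
```

The file example comes to about $3.57 in re-sends over 397 later turns, which is the number to weigh against the convenience of keeping it in context.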

What cache efficiency actually tells you (and doesn't)

Cache efficiency — the fraction of re-sent tokens that hit the cache — is commonly cited as the key metric. It's useful but incomplete. From the benchmark: the median session ran at ~83% cache efficiency; the pooled figure was ~98%.

High cache efficiency means you're efficiently re-sending context at the cheap rate. It says nothing about how much you're re-sending. You can have 98% cache efficiency and still be burning money, because you're re-sending an enormous context a thousand times.

The metric that actually tells you where the money is going: re-sent context as a share of spend. In the typical (median) session, that was ~24%; pooled across all sessions, 60%. The gap is the whole story — the expensive sessions are the ones where re-sent context dominates.

Cache efficiency tells you the rate. Re-sent context share tells you the volume. You need both.
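To make the rate-versus-volume distinction concrete, here is a sketch that computes both metrics for two hypothetical sessions. Several things are assumptions: the token counts are invented for illustration, cache efficiency is computed as cache reads over all input-side tokens, and cache writes are priced at 1.25× the input rate.

```python
# Assumed rates (USD per token); the cache-write multiplier is an assumption.
INPUT_PRICE = 3 / 1_000_000
OUTPUT_PRICE = 15 / 1_000_000
CACHE_READ_PRICE = INPUT_PRICE * 0.1
CACHE_WRITE_PRICE = INPUT_PRICE * 1.25


def cache_efficiency(cache_read, cache_write, fresh_input, output=0):
    """Share of input-side tokens that were served from cache."""
    total = cache_read + cache_write + fresh_input
    return cache_read / total if total else 0.0


def resent_share(cache_read, cache_write, fresh_input, output):
    """Share of total spend that went to re-sent (cache-read) context."""
    resent = cache_read * CACHE_READ_PRICE
    total = (resent + cache_write * CACHE_WRITE_PRICE
             + fresh_input * INPUT_PRICE + output * OUTPUT_PRICE)
    return resent / total if total else 0.0


# Hypothetical sessions: identical cache efficiency, very different volume.
small = dict(cache_read=980_000, cache_write=15_000, fresh_input=5_000, output=40_000)
large = dict(cache_read=98_000_000, cache_write=1_500_000, fresh_input=500_000, output=400_000)

if __name__ == "__main__":
    for name, s in (("small", small), ("large", large)):
        print(f"{name}: efficiency {cache_efficiency(**s):.0%}, "
              f"re-sent share {resent_share(**s):.0%}")
```

Both made-up sessions run at 98% cache efficiency, yet re-sent context is roughly 30% of spend in the small one and roughly 69% in the large one.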

Running the formula on your own logs

You can't easily see these numbers in the Claude Code dashboard (it shows costs but not the cache breakdown). The underlying data is in your local logs: ~/.claude/projects/**/*.jsonl. Every model turn records cache_read_input_tokens, cache_creation_input_tokens, input_tokens, and output_tokens.

To see the formula's terms on your own sessions:

npx @wartzar-bee/tokenscope

It reads those JSONL files locally (read-only, nothing uploaded, no telemetry) and shows you the cost split (re-sent context vs. cache-write vs. output), the per-turn context-growth curve, and where your sessions land against the 66-session reference set. The --share flag emits a privacy-safe summary card (aggregate numbers only, no file paths or prompt content) if you want to compare.

(Disclosure: I maintain tokenscope. It's the tool that generated all the numbers in this article and the benchmark. You can replicate the analysis with the raw JSONL and the formula above — you don't need the tool.)
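For anyone who wants to replicate the split by hand, a minimal sketch might look like the following. It is not how tokenscope works internally. It assumes each model turn is one JSON object per line with the four counters nested under message.usage (check a few lines of your own logs first), it does not deduplicate repeated records if the logs contain any, and it prices cache writes at an assumed 1.25× the input rate.

```python
import glob
import json
import os
from collections import Counter

# Rates from the article's pricing note (USD per token). The cache-write rate
# (1.25x input) is an assumption; check the current price sheet.
PRICES = {
    "cache_read_input_tokens": 3e-6 * 0.1,
    "cache_creation_input_tokens": 3e-6 * 1.25,
    "input_tokens": 3e-6,
    "output_tokens": 15e-6,
}


def sum_usage(lines):
    """Add up the four usage counters across an iterable of JSONL lines."""
    totals = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        # Assumed layout: {"message": {"usage": {...}}} on model turns.
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for field in PRICES:
            totals[field] += usage.get(field) or 0
    return totals


def main():
    pattern = os.path.expanduser("~/.claude/projects/**/*.jsonl")
    totals = Counter()
    for path in glob.glob(pattern, recursive=True):
        with open(path, encoding="utf-8", errors="replace") as handle:
            totals.update(sum_usage(handle))
    costs = {field: totals[field] * price for field, price in PRICES.items()}
    total_cost = sum(costs.values())
    for field, cost in costs.items():
        share = cost / total_cost if total_cost else 0.0
        print(f"{field:30s} {totals[field]:>14,d} tokens  ${cost:>10.2f}  {share:6.1%}")


if __name__ == "__main__":
    main()
```

The cache_read_input_tokens row corresponds to the formula's first term and the output_tokens row to its second; the other two rows are the charges the one-line formula leaves out.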

Honesty note on the numbers

All figures in this article come from one of two sources: (1) the 66-session benchmark, a single-user reference set, clearly labelled, not a census; (2) the one-session data study, a single real session, n=1. The formula structure (re-sent context = context_size × turns × price × cache_rate) is a mathematical consequence of how stateless LLMs work with prompt caching; it holds regardless of the specific numbers. The percentages and dollar figures are one user's real measurements and would shift for a different usage pattern or price sheet. Nothing is fabricated or adjusted.

The formula isn't complicated. What makes Claude Code sessions expensive isn't the model doing expensive work — it's context size × turn count × re-send rate, compounding in a session that runs longer than you realize. Once you see the formula, "my session cost $60 instead of $6" stops being mysterious and starts being explainable and avoidable.

The full percentile tables, charts, and methodology: https://tokenscope.pages.dev/benchmark/
