n=1 note. This is the anatomy of one real session, not an average or benchmark. The specific numbers are this session's actual measured figures. The mechanic — re-sent context dominating long sessions — is general. Your numbers will differ with your workflow. The full dataset (percentile tables, methodology, charts) for n=66 sessions lives at the benchmark page; this article is about the one session that broke my mental model.
There's a session in my logs that cost $1,278.
Not $12.78. Not a typo. $1,278, across approximately 1,270 model turns, in a single Claude Code coding session. When I measured it properly, two things became obvious:
- I had completely the wrong mental model of where the money goes.
- Once you understand the mechanic, it stops being mysterious and becomes something you can actually control.
Here's the full honest breakdown.
The numbers
| Metric | Value |
|---|---|
| Total session cost | ~$1,278 |
| Model turns | ~1,270 |
| Cost per turn (average) | ~$1.01 |
| Peak context (tokens) | ~998,000 |
| Cache efficiency | ~98% |
The cost-per-turn number already tells you something is off. $1.01 per back-and-forth exchange sounds like the model is doing tremendous amounts of work on every turn. It wasn't. The session was long debugging and build work — not dramatically different from any other session in kind, only in length.
Where the money went
| Line item | Share of cost |
|---|---|
| Re-sent context (cache-read) | 66% (~$843) |
| New context written to cache (cache-write) | 20% (~$256) |
| Output — the model actually writing | 14% (~$179) |
| Fresh (uncached) input | ~0% |
The model writing code was 14% of the bill. Two-thirds of the session's cost was paying to re-send context the model had already seen, on every single turn, over and over.
That was my mental model failure. I'd thought of each turn as: ask → model thinks → model writes → charge. The actual cost structure is more like: ask → re-send the entire conversation → model writes → charge for all of it. And the re-sending part, even at the discounted cache-read rate, dominates in a long session.
Why this happens (the mechanics are simple)
Claude is stateless. It has no memory between turns. So on every single turn, the client sends the entire accumulated conversation (every prior message, every file you've read, every tool output) to give the model its context. This is how stateless model APIs work in general, not a Claude quirk.
Prompt caching softens the blow: if the server has seen that exact prefix before, the re-send is billed at the cache-read rate, which is roughly 0.1× the normal input price. That's the "98% cache efficiency" number — almost all the re-sent tokens hit the cache and got the cheap rate.
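To make the per-turn billing concrete, here is a minimal sketch of what one turn costs under this model. The prices are placeholders, not any model's real rates, and the cache-write multiplier is an assumption; only the roughly 0.1x cache-read discount comes from the text above. The function and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    # Placeholder USD-per-million-token rates; substitute your model's real prices.
    input_per_mtok: float = 3.00
    output_per_mtok: float = 15.00
    cache_read_multiplier: float = 0.1    # the ~0.1x discount described above
    cache_write_multiplier: float = 1.25  # assumption: cache writes cost a premium


def turn_cost(context_tokens: int, new_tokens: int, output_tokens: int,
              prices: Prices = Prices()) -> dict[str, float]:
    """Split one turn's cost into re-sent context, newly cached context, and output."""
    per_input_token = prices.input_per_mtok / 1_000_000
    return {
        # Everything already in the conversation is re-sent and billed at the cache-read rate.
        "cache_read": context_tokens * per_input_token * prices.cache_read_multiplier,
        # Only the tokens added this turn are written to the cache.
        "cache_write": new_tokens * per_input_token * prices.cache_write_multiplier,
        # The model's actual response.
        "output": output_tokens * prices.output_per_mtok / 1_000_000,
    }


# A hypothetical late-session turn: large context, small addition, short reply.
late = turn_cost(context_tokens=500_000, new_tokens=2_000, output_tokens=800)
```

With these placeholder numbers the `cache_read` entry is the largest of the three, even though it is billed at the cheapest per-token rate.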
Here's the problem: cheap per token, but paid on every turn, on the entire context.
The cost of re-sending context on a single turn is roughly:
cache_read_cost ≈ context_size × input_price × 0.1
And the session cost of re-sending is:
total_cache_read ≈ context_size × turns × input_price × 0.1
In this session, context peaked at ~998,000 tokens across ~1,270 turns. Context started small and grew, so the right context_size for the session formula is the average over all turns, which is smaller than the peak. Even so, and even at 10% of the input rate, the product is a huge number. A 98% cache hit rate means you're efficiently paying a small amount on an enormous volume, repeatedly. The efficiency sounds impressive until you realize the multiplier is "the whole accumulated history of a 20-hour session."
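The two formulas above can be sketched in a few lines, along with the inverse: given what a session spent on cache reads, what average context size does that imply? The function names are hypothetical and the $5 per million tokens price in the example is a placeholder, not the rate this session was billed at.

```python
def total_cache_read_cost(avg_context_tokens: float, turns: int,
                          input_price_per_mtok: float,
                          multiplier: float = 0.1) -> float:
    """Session-level re-send cost: average context x turns x discounted input price."""
    return avg_context_tokens * turns * input_price_per_mtok / 1_000_000 * multiplier


def implied_avg_context(cache_read_spend: float, turns: int,
                        input_price_per_mtok: float,
                        multiplier: float = 0.1) -> float:
    """Invert the formula: the average context size that explains a given spend."""
    per_token = input_price_per_mtok / 1_000_000 * multiplier
    return cache_read_spend / (turns * per_token)


# Hypothetical session: 400k average context, 1,000 turns, $5 per million input tokens.
example_cost = total_cache_read_cost(400_000, 1_000, 5.0)
example_context = implied_avg_context(example_cost, 1_000, 5.0)

# Price-independent figure from the tables above: ~$843 of cache reads over ~1,270 turns.
avg_resend_per_turn = 843 / 1270
```

The last line needs no price assumption at all: dividing the measured cache-read spend by the turn count gives the average re-send cost per turn for this session, about two-thirds of the ~$1.01 average turn.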
Meanwhile, the model's output — the code it actually writes, the explanations it gives — was ~14% of the bill. Output is priced higher per token than input. But the model writes far fewer tokens per turn than the context it re-reads. The output is the productive work; it's just not the biggest cost center.
The compounding problem
Cost doesn't grow linearly with session length. Early in the session: small context, cheap re-sends. As the session continues:
- Context grows as new files are read, tool outputs accumulate, and the conversation history lengthens.
- Each new turn re-sends a larger context than the last.
- And you're also taking more turns the longer the session runs.
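So you're multiplying a growing per-turn cost by a growing turn count. If context grows by a roughly constant amount each turn, the cumulative re-send volume grows roughly with the square of the turn count: double the turns and you pay about four times as much. That's why cost in a long session doesn't grow proportionally; it accelerates. By the time context peaks near 998k tokens, every single turn is a substantial re-send. There were turns in this session where the re-send cost alone was more than the cost of the model's entire response.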
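A toy simulation shows the acceleration. It assumes context grows by a fixed number of tokens per turn, which is a simplification (real sessions grow in bursts), and the 1,000 tokens per turn figure is arbitrary.

```python
def cumulative_resend_tokens(turns: int, growth_per_turn: int, start: int = 0) -> int:
    """Total tokens re-sent over a session where context grows by a fixed amount per turn."""
    total = 0
    context = start
    for _ in range(turns):
        total += context            # this turn re-sends everything accumulated so far
        context += growth_per_turn  # then the turn's new material joins the context
    return total


short_session = cumulative_resend_tokens(turns=100, growth_per_turn=1_000)
long_session = cumulative_resend_tokens(turns=200, growth_per_turn=1_000)
ratio = long_session / short_session  # roughly 4x the re-sent tokens for 2x the turns
```

Under this assumption, doubling the session length roughly quadruples the re-sent volume, which is the acceleration described above.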
The counterintuitive takeaways
A 98% cache hit rate is not the same as "caching solved this." High cache efficiency means you're efficiently re-sending context at the cheap rate. It says nothing about how much you're re-sending. You can have near-perfect efficiency and still be burning money, because you're re-sending an enormous context a thousand times. Cache efficiency is a per-token metric; the bill is the product of that rate × context size per turn × turns.
Output optimization is the wrong lever. If you want to reduce a long session's cost, targeting output tokens can touch at most 14% of the bill. Anything aimed only at that slice, such as terser responses or prompts that ask for less output, barely moves the total. The 66% (re-sent context) and 20% (cache-write) are where the money is.
The fix is mundane. The answer isn't a clever optimization — it's just keeping the context small:
- /compact earlier in long sessions. It summarizes and drops accumulated context, so the compacted version costs much less to re-send than the full history. Running it at turn 200 is far more effective than running it at turn 1,200, because the saving applies to every subsequent turn rather than only the last few.
- Start a fresh session when the task changes. Carrying 800k tokens of context from one task into an unrelated task means you pay to re-send 800k irrelevant tokens on every turn of the new work. A fresh session starts the re-send meter near zero.
- Watch context growth, not just the running total. The running total tells you what you've spent. The per-turn context size tells you what each future turn will cost. When I see context climbing past 200k tokens and I'm going to take a lot more turns, that's the signal to compact — not after the session, not when the bill arrives.
- Resident files and tools add up more than you think. A large file read on turn 3 gets re-billed on every subsequent turn. The cost of adding something to context early is the sum of its re-send cost over all future turns, not just the first one.
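The points in this list can be compared with a rough simulation. It models compaction as a one-time reset of context to a fixed size, which is an assumption about the effect, not a description of how /compact works internally. The growth rate, compacted size, and file size are arbitrary illustrative values.

```python
from typing import Optional


def resend_tokens(turns: int, growth_per_turn: int,
                  compact_at: Optional[int] = None,
                  compacted_size: int = 20_000) -> int:
    """Total re-sent tokens, optionally compacting once at the start of turn `compact_at`."""
    total = 0
    context = 0
    for turn in range(1, turns + 1):
        if compact_at is not None and turn == compact_at:
            # Assumed effect of compaction: context shrinks to a small summary.
            context = min(context, compacted_size)
        total += context
        context += growth_per_turn
    return total


def resident_resend_tokens(file_tokens: int, read_turn: int, total_turns: int) -> int:
    """Tokens re-billed for something read on `read_turn` and kept until the session ends."""
    return file_tokens * max(total_turns - read_turn, 0)


baseline = resend_tokens(turns=1_270, growth_per_turn=800)
early = resend_tokens(turns=1_270, growth_per_turn=800, compact_at=200)
late = resend_tokens(turns=1_270, growth_per_turn=800, compact_at=1_200)

# A hypothetical 50k-token file read on turn 3 of a 1,270-turn session.
big_file = resident_resend_tokens(file_tokens=50_000, read_turn=3, total_turns=1_270)
```

In this toy model `early` comes out lower than `late`, and both are lower than `baseline`. A single compaction still lets context regrow afterwards, so compacting repeatedly would lower the total further.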
The honest limitation
This is n=1 — one real session. The $1,278 and the 66/20/14/0 split are the actual measured numbers for this session. Your sessions will differ depending on how long you run, how large your context gets, and what you're working on.
What generalizes is the mechanic: context size × turns drives re-sent context cost, and in any session long enough that context has grown large and turns have compounded, re-sent context will be the dominant line. The specific percentages will shift; the structure won't.
The session I measured was an extreme case: nearly a million tokens of peak context and over a thousand turns. Most sessions aren't this long. In my set of 66 sessions, the median peaked at ~45k tokens and had ~29 turns; in the median session, re-sent context was a minority of the bill (24% of spend). The lesson of the n=1 study and the n=66 benchmark together: typical sessions are fine; it's the long ones you need to watch, and in those the re-sent context is the bill.
If you want to see your own
I measured this session with a small open-source CLI that reads Claude Code's local logs:
npx @wartzar-bee/tokenscope
It runs locally — read-only, nothing uploaded, no telemetry — and shows the same breakdown: output vs. cache-read vs. cache-write, the per-turn context-growth curve, and which percentile your sessions land in against the 66-session reference set. --share emits a privacy-safe summary card (aggregate numbers only, no content or file paths) you can paste into a thread.
The full study — with the hand-coded SVG cost-split chart, context-growth curve, complete data table, and methodology — is at: https://tokenscope.pages.dev/study/
(Disclosure: I maintain tokenscope. I'm linking it because it's the tool that measured the session, not because you need it — Claude Code's JSONL logs are in ~/.claude/projects/ and the math is straightforward once you know to look at cache_read_input_tokens.)
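If you would rather do the math by hand, a minimal sketch of summing token usage from one JSONL log might look like the following. It assumes each log line is a JSON object with the usage counters nested under `message.usage`; that layout is an assumption and may differ between Claude Code versions, so inspect a few lines of your own logs first. The function names are made up for illustration, and this is not how tokenscope itself works.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable

USAGE_KEYS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def sum_usage(lines: Iterable[str]) -> Counter:
    """Add up token counters across JSONL lines, skipping anything that doesn't fit."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        # Assumed layout: counters live under record["message"]["usage"].
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for key in USAGE_KEYS:
            value = usage.get(key)
            if isinstance(value, int):
                totals[key] += value
    return totals


def sum_session_file(path: str) -> Counter:
    """Convenience wrapper for one log file, e.g. a .jsonl under ~/.claude/projects/."""
    with Path(path).expanduser().open(encoding="utf-8") as handle:
        return sum_usage(handle)
```

Comparing the `cache_read_input_tokens` total against `output_tokens` gives a quick read on how much of a session was re-sent context. If your logs record the same message more than once, this simple sum will over-count.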
The $1,278 session wasn't a bug or an accident. It was a long session working on a complex project with a large, growing context. Every mechanism that drove its cost is documented, predictable, and — once you know it — partly avoidable. The model writing code was 14% of it. The rest was infrastructure: paying to re-send, again and again, the memory the model doesn't have.