SingYee
I analyzed 187 Claude Code sessions. $6,744 worth of tokens. Here's where they actually went.

I've been using Claude Code heavily for the past month. Building trading bots, automation tools, side projects.

I knew I was burning through tokens but never looked at the numbers.

So I built a small CLI to parse my local session data. The result: 187 sessions. 3.3 billion tokens. $6,744 equivalent API cost.

I'm on the Max plan, so this is the equivalent API cost, not what I actually paid. But the token patterns are what matter here.
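For anyone who wants to replicate the count, here's a minimal sketch of the parsing. It assumes sessions live as JSONL under ~/.claude/projects/ with per-message usage fields; the field names are my reading of my own local files, not a documented schema, so verify against your data:

```python
import json
from pathlib import Path

def sum_tokens(claude_dir: Path) -> dict:
    """Sum token usage across all JSONL session files under claude_dir.

    Field names (message.usage.*) are assumptions based on local files.
    """
    totals = {"input": 0, "output": 0, "cache_read": 0, "cache_creation": 0}
    for session in claude_dir.glob("projects/**/*.jsonl"):
        for line in session.read_text().splitlines():
            try:
                usage = json.loads(line).get("message", {}).get("usage", {})
            except (json.JSONDecodeError, AttributeError):
                continue  # skip malformed or non-object lines
            totals["input"] += usage.get("input_tokens", 0)
            totals["output"] += usage.get("output_tokens", 0)
            totals["cache_read"] += usage.get("cache_read_input_tokens", 0)
            totals["cache_creation"] += usage.get("cache_creation_input_tokens", 0)
    return totals
```

Everything runs offline; nothing leaves your machine.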

97% of my tokens were something I couldn't control

That was the first surprise. 97% were cache reads. Every turn, Claude re-reads the entire conversation context. Think of it like re-reading an entire book every time you turn a page.

The good news: cache reads are cheap ($1.5/M tokens) and completely normal. The bad news: it means the part you can actually control is tiny.

Only 2.8% of my tokens were controllable. Of that, 92.5% was cache creation (CLAUDE.md, MCP tools, system prompt loading), 6.6% was Claude's actual output, 0.9% was my input.
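The arithmetic behind that split, with round stand-in numbers (not my exact counts):

```python
def breakdown(cache_read, cache_creation, output, user_input):
    """Split total tokens into cache reads vs. the controllable slice,
    then break the controllable slice down by source."""
    total = cache_read + cache_creation + output + user_input
    controllable = cache_creation + output + user_input
    return {
        "cache_read_pct": round(100 * cache_read / total, 1),
        "controllable_pct": round(100 * controllable / total, 1),
        # shares *within* the controllable slice
        "creation_share": round(100 * cache_creation / controllable, 1),
        "output_share": round(100 * output / controllable, 1),
        "input_share": round(100 * user_input / controllable, 1),
    }

# Example with illustrative round numbers in the same ballpark as mine:
example = breakdown(3_200_000_000, 85_000_000, 6_000_000, 800_000)
```

With those stand-ins, cache reads come out around 97%, and cache creation dominates the small controllable remainder, which matches the shape of my real data.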

What I wouldn't have caught from /cost

This was the most useful part:

  • 86 sessions over 30 turns without /compact, each one letting context balloon to 2-3x what it needed to be
  • 840 subagent calls, every single one duplicating the full conversation context just to do a search
  • 35 anomaly sessions burning tokens at 2-3x my normal rate
  • Bash was 40% of all tool calls, pumping long command outputs back into context every time
  • Peak hours (Mon-Fri 5-11am PT) used 1.3x more tokens on average than off-peak
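A stripped-down version of the first and third checks above: flag long sessions that never compacted, and sessions burning tokens well above the typical rate. The session fields here are illustrative, not ccwhy's actual schema:

```python
from statistics import median

def flag_sessions(sessions, turn_limit=30, burn_factor=2.0):
    """Flag long un-compacted sessions and anomalous burn rates.

    `sessions` is a list of dicts with id, tokens, turns, and an
    optional `compacted` flag; a session is anomalous when its
    tokens-per-turn exceeds burn_factor times the median rate.
    """
    rates = [s["tokens"] / s["turns"] for s in sessions if s["turns"]]
    typical = median(rates) if rates else 0
    flags = []
    for s in sessions:
        if s["turns"] > turn_limit and not s.get("compacted"):
            flags.append((s["id"], "long session, no /compact"))
        if s["turns"] and s["tokens"] / s["turns"] > burn_factor * typical:
            flags.append((s["id"], "anomalous burn rate"))
    return flags
```

ccwhy's real heuristics are more involved; this just shows the shape of the analysis.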

What I actually changed

After seeing the data, three things:

  1. I use /compact after ~20 turns now instead of letting sessions run endlessly
  2. I stopped defaulting to Agent for codebase searches and use Grep/Glob directly
  3. I try to keep heavy sessions out of peak hours when possible

Small changes, but the anomaly sessions have mostly stopped showing up.

The tool

I open-sourced it as ccwhy: written in Rust, runs completely offline on your local ~/.claude/ data. No API keys needed.

```shell
brew install SingggggYee/tap/ccwhy
```

Or: `cargo install ccwhy`

Or: grab the binary

It's not a replacement for ccusage. ccusage tells you how much you spent. ccwhy tells you why, and what to change.

GitHub

Curious what other people's breakdowns look like. Is 97% cache reads normal, or is my setup unusually heavy?

Top comments (1)

Delimit.ai
Fascinating breakdown on those 187 Claude Code sessions—it's eye-opening how much token spend goes into iterative building like trading bots and automation tools.

To optimize costs in similar workflows, focus on pre-planning your prompts with clear specs upfront, which can cut down on redundant iterations based on your analysis.

I've seen this approach halve token usage in my own projects without sacrificing output quality.