How I stopped burning tokens on CLAUDE.md (and built the tool that diagnoses it)

Posted by jake

Every time I hit Claude Code's token limit I had no idea where my tokens actually went. The usage counter just said I'd hit my limit. No breakdown, no diagnosis, nothing.

Turns out Claude Code writes detailed session logs to ~/.claude/projects/ after every session - every tool call, every message, every compaction event. The data was sitting on my machine the whole time. Nobody was reading it.
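As a sketch of what reading those logs looks like, assuming they live under ~/.claude/projects/ as described above (the record fields such as "type" are illustrative assumptions, not a documented schema):

```python
import json
from pathlib import Path

def iter_session_records(projects_dir: Path):
    """Yield every JSON record from every .jsonl session log under projects_dir."""
    for log in sorted(projects_dir.rglob("*.jsonl")):
        with log.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partially written or malformed lines

# records = iter_session_records(Path.home() / ".claude" / "projects")
```

Tolerating blank and malformed lines matters here: a session that is still running may have a half-written final line.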

So I built PRISM.

What it found on my own machine

Running it for the first time was genuinely surprising:

  • A session where CLAUDE.md re-reads consumed 6738% of session tokens — a 237-line file being re-read on every single tool call. Silent. No warning. Just gone.
  • Rules in CLAUDE.md being silently ignored mid-session — the rule existed, but Claude stopped following it after line 80
  • Migration file edits that violated my own project rules
  • Retry loops burning tokens in circles with no diagnosis
  • 5 consecutive tool failures in a single session I never knew were happening

None of this was visible before. The token counter just said I'd hit my limit.

The CLAUDE.md re-read problem

Every tool call Claude Code makes re-reads your CLAUDE.md from the top of the context. At even one token per line, a 200-line CLAUDE.md × 50 tool calls = 10,000 tokens spent on instructions per session. Real prose averages several tokens per line, so the true cost is higher still.
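A minimal sketch of that arithmetic (the per-line token rate is an assumption for illustration, not a measured value):

```python
def reread_cost(claude_md_lines: int, tool_calls: int,
                tokens_per_line: float = 1.0) -> int:
    """Estimated tokens spent re-reading CLAUDE.md over one session.

    tokens_per_line=1.0 is a deliberately conservative floor; real
    prose tends to run several tokens per line.
    """
    return int(claude_md_lines * tokens_per_line * tool_calls)

print(reread_cost(200, 50))        # the 10,000-token figure above
print(reread_cost(200, 50, 10.0))  # a more realistic per-line rate
```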

If your CLAUDE.md has grown to include personality instructions, full documentation copies, or rules that only apply to one subdirectory — you're paying for all of it every time.

PRISM measures this exactly and tells you which lines are costing you the most.

The adherence problem

After line ~80 of your CLAUDE.md, Claude Code's instruction following drops significantly. LLMs follow a U-shaped attention curve — they pay most attention to the beginning and end of a prompt, least to the middle.

PRISM's attention curve scorer flags critical rules (NEVER/ALWAYS/DO NOT) buried in the middle 55% of your file where the model pays least attention. It then recommends moving them to the top or bottom.

What PRISM does

PRISM reads the JSONL files Claude Code already writes and gives you:

Health scores A-F across 5 dimensions:

  • Token Efficiency — CLAUDE.md re-read costs, compaction frequency
  • Tool Health — retry loops, edit-revert cycles, consecutive failures
  • Context Hygiene — compaction loss events, mid-task boundaries
  • CLAUDE.md Adherence — are your rules actually being followed?
  • Session Continuity — resume success rate, truncated sessions
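As an illustration of the grading, here is a minimal score-to-letter mapping; the cutoffs are conventional school-grade assumptions, not PRISM's actual thresholds:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 dimension score to an A-F grade (assumed 90/80/70/60 cutoffs)."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

scores = {"Token Efficiency": 62, "Tool Health": 91}
report = {dim: letter_grade(s) for dim, s in scores.items()}
print(report)  # {'Token Efficiency': 'D', 'Tool Health': 'A'}
```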

Concrete recommendations:
Not just "your score is low" but specific diffs:

  • "Remove lines 120-148: personality instructions Claude's system prompt already handles"
  • "Move these 3 rules to src/CLAUDE.md — they only apply in that directory"
  • "Add: always use --non-interactive flags (14 retry loops detected)"
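The retry-loop detection behind that last recommendation can be sketched roughly like this; the event shape (tool name plus a success flag) is an assumption about what the session logs contain:

```python
def count_retry_loops(events: list[tuple[str, bool]],
                      threshold: int = 3) -> int:
    """Count runs of >= threshold consecutive failures of the same tool."""
    loops, streak, last_tool = 0, 0, None
    for tool, ok in events:
        if not ok and tool == last_tool:
            streak += 1
        elif not ok:
            streak = 1          # first failure of a new tool
        else:
            streak = 0          # a success breaks the run
        if streak == threshold: # count each run exactly once
            loops += 1
        last_tool = tool
    return loops
```

Counting the run once, at the moment it reaches the threshold, keeps a 10-failure loop from being reported eight times.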

HTML dashboard:
Run prism dashboard to open a full browser view of all your project health scores. Expandable cards, grade distribution, advisor recommendations.

Install

```
pip install prism-cc
prism analyze
```

Or as a Claude Code plugin:
```
/plugin marketplace add jakeefr/prism
/plugin install prism@prism
/reload-plugins
```

Runs 100% locally. No API key. No telemetry. MIT license.

GitHub: https://github.com/jakeefr/prism


Curious what numbers other people find when they run it. The 6738% number on my own machine genuinely shocked me — I had no idea instructions were that expensive.
