If you use Claude Code seriously — Max plan, 50+ skills, a CLAUDE.md that's grown organically over months — you've probably hit this moment:
You run `/context` and it says your system prompt is sitting at 14% of your context window before you've typed anything. And `/cost` tells you today's spend but doesn't say what inside your setup is expensive.
Tokens are real money. You can't optimize what you can't see. So I wrote cc-healthcheck — a single Python file, zero dependencies, zero network, that reads ~/.claude/ locally and answers three questions:
- What auto-loads into every session? (`CLAUDE.md` + every `@`-reference + `rules/*.md` + every skill frontmatter)
- Are my hooks broken? (pipe-corruption bugs, missing timeouts, case-sensitivity traps)
- Where did the last session's tokens actually go? (per-model totals, cache hit ratio, system-reminder injection count)
Sample output on my own machine:
```
━━━ cc-healthcheck v0.1.0 ━━━

[1] Auto-Load Budget
    CLAUDE.md chain: 12.0K (420 lines across 4 file(s))
    rules/*.md (11 files): 7.9K
    skills frontmatter (76): 3.6K (full bodies: 102.5K — loaded on invocation)
    ───────────────────────────────
    Total auto-loaded: 23.4K (2.34% of 1M)
    Status: ✅ HEALTHY (soft limit: 100.0K, hard: 200.0K)

[2] Hooks (20 total across 6 events)
    Issues (5):
    ⚠️ [SessionStart] inline '|' without quoting — known Claude Code #1132 corruption risk
    ⚠️ [PreToolUse/Write] no timeout set — hook can hang indefinitely
    ...

[3] Latest Session X-Ray
    Size: 1.11 MB, 365 records (147 assistant turns)
    Cumulative API tokens: 29.4M (cache_read 90.6% — cache working)
    ⚠️ system-reminder injections: 13 occurrences
```
## How it actually works
The code is ~500 lines of Python stdlib. No tiktoken, no requests, no external anything — just `json`, `pathlib`, `re`, `argparse`.
### Counting tokens without tiktoken
For a health-check tool, exact tokenization is overkill. I use `len(text) / 4.0` as the standard English-prose approximation (OpenAI/Anthropic both document this ratio). For JSON/code it drifts to ~3.5, but order-of-magnitude is what matters when you're asking "is my `CLAUDE.md` eating 3K or 30K?"
```python
def est_tokens(s, ratio=4.0):
    if not s:
        return 0
    return max(1, int(len(s) / ratio))
```
If you need exact numbers, pipe the --json output into a real tokenizer. I'd rather keep the tool install-free than 10% more accurate.
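To make the drift concrete, here's the estimator applied at both ratios. The numbers come purely from the formula, not a real tokenizer:

```python
def est_tokens(s, ratio=4.0):
    if not s:
        return 0
    return max(1, int(len(s) / ratio))

prose = "a" * 12_000          # ~12K characters, roughly a grown CLAUDE.md chain
print(est_tokens(prose))       # 3000 tokens at the 4.0 chars/token default
print(est_tokens(prose, 3.5))  # 3428 tokens at the denser JSON/code ratio
```

A ~14% spread between the two estimates, which is exactly the kind of error you can live with when the question is "3K or 30K?".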
### Following @ references
Claude Code's `CLAUDE.md` supports `@~/path/to/file.md` at the start of a line to include other files in the auto-loaded context. To count the whole tree:
```python
import re
from pathlib import Path

AT_REF_RE = re.compile(r"^@(~[^\s]+|[^\s]+)", re.MULTILINE)
seen = set()

def include(p, via="root"):
    if p in seen or not p.exists():
        return
    seen.add(p)
    text = p.read_text(encoding="utf-8", errors="replace")
    # count tokens, record path
    ...
    for m in AT_REF_RE.finditer(text):
        ref = m.group(1).strip()
        target = resolve_at_ref(ref, p.parent)
        if target:
            include(target, via=f"@ from {p.name}")
```
The seen set prevents infinite loops if two files @-reference each other (I've seen this happen in real configs).
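`resolve_at_ref` is elided above; a minimal sketch of what it has to do (expand `~`, resolve relative paths against the referencing file's directory, skip dangling references) might look like this — the function body here is my assumption, not the tool's exact code:

```python
from pathlib import Path

def resolve_at_ref(ref, base):
    # "@~/rules/style.md" -> expand the home directory;
    # "@docs/extra.md"    -> resolve relative to the referencing file's dir.
    p = Path(ref).expanduser()
    if not p.is_absolute():
        p = base / p
    p = p.resolve()
    return p if p.is_file() else None  # None drops dangling references
```

Returning `None` for missing targets means a broken `@`-reference simply falls out of the token count instead of crashing the scan.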
### Linting hooks for known bugs
Claude Code hooks are a JSON structure in `~/.claude/settings.json`. Three recurring issues:

- Inline `|` without quoting — tracked as anthropics/claude-code#1132, marked "not planned" for a fix. The command string gets split on `|` before the shell parses it, and your hook silently mangles.
- No `timeout` field — hooks can hang indefinitely, freezing your Claude session.
- Lowercase matcher with a capitalized tool name — matchers are case-sensitive but the docs are ambiguous: `"edit"` won't match `Edit`.
The linter flags all three:
```python
if isinstance(cmd, str) and "|" in cmd and '"' not in cmd and "'" not in cmd:
    out["issues"].append({
        "severity": "warn",
        "msg": "inline '|' without quoting — known #1132 corruption risk",
    })
```
### X-raying the JSONL session
Claude Code writes every session to `~/.claude/projects/<id>/<uuid>.jsonl`. Each assistant turn has this shape:

```json
{
  "type": "assistant",
  "isSidechain": false,
  "message": {
    "model": "claude-opus-4-6",
    "usage": {
      "input_tokens": 3,
      "output_tokens": 27,
      "cache_creation_input_tokens": 59469,
      "cache_read_input_tokens": 11530
    }
  }
}
```
Sum those fields across every record with `"type": "assistant"` (including `isSidechain: true` subagent calls, which is the bit `/cost` misses) and you have the real API spend for that session.
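That summation is a dozen lines of stdlib. A sketch, using the field names from the record shape above:

```python
import json
from collections import Counter

def session_totals(jsonl_path):
    totals = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("type") != "assistant":
                continue
            # Deliberately no isSidechain filter: subagent turns count too.
            for key, val in rec.get("message", {}).get("usage", {}).items():
                if isinstance(val, int):
                    totals[key] += val
    return totals
```

A `Counter` keeps this schema-agnostic: if a future Claude Code version adds a new usage field, it gets summed without a code change.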
A bonus finding: counting `<system-reminder>` occurrences in the raw JSONL is a useful metric. On Claude Code 2.1.x, the skill-trigger list gets re-broadcast inside a system-reminder on many user turns. Those blocks are inside the cached prefix so you're only billed once per 5-minute cache window, but they still count against the context window on every turn.
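Both report numbers fall out of the same pass: the cache-hit figure from the usage sums, the injection count from a raw substring count. The ratio formula below is my reading of what the report line means, an assumption rather than the tool's documented definition:

```python
def cache_hit_ratio(totals):
    # Fraction of input-side tokens served from the prompt cache.
    read = totals.get("cache_read_input_tokens", 0)
    denom = (read
             + totals.get("cache_creation_input_tokens", 0)
             + totals.get("input_tokens", 0))
    return read / denom if denom else 0.0

def reminder_count(jsonl_path):
    # Raw substring count over the whole transcript file.
    with open(jsonl_path, encoding="utf-8") as f:
        return f.read().count("<system-reminder>")
```

Output tokens are excluded from the denominator since caching only applies to the input side.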
## Why bother?
Two recent open issues on the Claude Code repo describe the same symptom:
- #46339 — "System prompt token consumption increased ~40-50% between v2.1.92 and v2.1.100 with zero changes to user configuration"
- #46917 — "v2.1.100 sends 978 fewer bytes than v2.1.98 but is billed 20,196 MORE tokens"
Both reporters had to set up HTTP proxies or manual diffs to investigate. cc-healthcheck won't solve the server-side inflation (only Anthropic can), but it lets you separate the two pools: is it your config that grew, or the platform? Without that, it's all vibes.
## Install
Zero-install — run straight from GitHub:
```sh
curl -sSL https://raw.githubusercontent.com/Genie-J/cc-healthcheck/main/cc_healthcheck.py | python3 -
```
Or clone + run:
```sh
git clone https://github.com/Genie-J/cc-healthcheck
python3 cc-healthcheck/cc_healthcheck.py
```
Flags:
```sh
cc-healthcheck            # text report
cc-healthcheck --json     # JSON for CI
cc-healthcheck --verbose  # per-file breakdown
cc-healthcheck --version
```
## Repo
github.com/Genie-J/cc-healthcheck — MIT. Issues welcome, especially reconciliation cases where cc-healthcheck numbers don't match what /cost or your Anthropic billing shows.
If you like this flavor (small single-file local tools), I also wrote BurnCheck — same philosophy, different problem (predicting whether your weekly Opus cap is about to hit mid-task).
Keep your context tight. Your wallet will thank you.