If you use Claude Code seriously — Max plan, 50+ skills, a CLAUDE.md that's grown organically over months — you've probably hit this moment:
You run `/context` and it says your system prompt is sitting at 14% of your context window before you've typed anything. And `/cost` tells you today's spend but doesn't say what inside your setup is expensive.
Tokens are real money. You can't optimize what you can't see. So I wrote cc-healthcheck — a single Python file, zero dependencies, zero network, that reads ~/.claude/ locally and answers three questions:
- What auto-loads into every session? (`CLAUDE.md` + every `@`-reference + `rules/*.md` + every skill frontmatter)
- Are my hooks broken? (pipe-corruption bugs, missing timeouts, case-sensitivity traps)
- Where did the last session's tokens actually go? (per-model totals, cache hit ratio, system-reminder injection count)
Sample output on my own machine:
```
━━━ cc-healthcheck v0.1.0 ━━━

[1] Auto-Load Budget
    CLAUDE.md chain: 12.0K (420 lines across 4 file(s))
    rules/*.md (11 files): 7.9K
    skills frontmatter (76): 3.6K (full bodies: 102.5K — loaded on invocation)
    ───────────────────────────────
    Total auto-loaded: 23.4K (2.34% of 1M)
    Status: ✅ HEALTHY (soft limit: 100.0K, hard: 200.0K)

[2] Hooks (20 total across 6 events)
    Issues (5):
    ⚠️ [SessionStart] inline '|' without quoting — known Claude Code #1132 corruption risk
    ⚠️ [PreToolUse/Write] no timeout set — hook can hang indefinitely
    ...

[3] Latest Session X-Ray
    Size: 1.11 MB, 365 records (147 assistant turns)
    Cumulative API tokens: 29.4M (cache_read 90.6% — cache working)
    ⚠️ system-reminder injections: 13 occurrences
```
## How it actually works
The code is ~500 lines of Python stdlib. No tiktoken, no requests, no external anything — just `json`, `pathlib`, `re`, `argparse`.
### Counting tokens without tiktoken
For a health-check tool, exact tokenization is overkill. I use `len(text) / 4.0` as the standard English-prose approximation (OpenAI/Anthropic both document this ratio). For JSON/code it drifts to ~3.5, but order-of-magnitude is what matters when you're asking "is my `CLAUDE.md` eating 3K or 30K?"
```python
def est_tokens(s, ratio=4.0):
    if not s:
        return 0
    return max(1, int(len(s) / ratio))
```
If you need exact numbers, pipe the --json output into a real tokenizer. I'd rather keep the tool install-free than 10% more accurate.
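To make the drift concrete, here's the estimator applied at both ratios. The numbers come purely from the formula, not a real tokenizer:

```python
def est_tokens(s, ratio=4.0):
    if not s:
        return 0
    return max(1, int(len(s) / ratio))

prose = "a" * 12_000          # ~12K characters, roughly a grown CLAUDE.md chain
print(est_tokens(prose))       # 3000 tokens at the 4.0 chars/token default
print(est_tokens(prose, 3.5))  # 3428 tokens at the denser JSON/code ratio
```

A ~14% spread between the two estimates, which is exactly the kind of error you can live with when the question is "3K or 30K?".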
### Following @ references
Claude Code's `CLAUDE.md` supports `@~/path/to/file.md` at the start of a line to include other files in the auto-loaded context. To count the whole tree:
```python
import re
from pathlib import Path

AT_REF_RE = re.compile(r"^@(~[^\s]+|[^\s]+)", re.MULTILINE)
seen = set()

def include(p, via="root"):
    if p in seen or not p.exists():
        return
    seen.add(p)
    text = p.read_text(encoding="utf-8", errors="replace")
    # count tokens, record path
    ...
    for m in AT_REF_RE.finditer(text):
        ref = m.group(1).strip()
        target = resolve_at_ref(ref, p.parent)
        if target:
            include(target, via=f"@ from {p.name}")
```
The seen set prevents infinite loops if two files @-reference each other (I've seen this happen in real configs).
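`resolve_at_ref` is elided above; a minimal sketch of what it has to do (expand `~`, resolve relative paths against the referencing file's directory, skip dangling references) might look like this — the function body here is my assumption, not the tool's exact code:

```python
from pathlib import Path

def resolve_at_ref(ref, base):
    # "@~/rules/style.md" -> expand the home directory;
    # "@docs/extra.md"    -> resolve relative to the referencing file's dir.
    p = Path(ref).expanduser()
    if not p.is_absolute():
        p = base / p
    p = p.resolve()
    return p if p.is_file() else None  # None drops dangling references
```

Returning `None` for missing targets means a broken `@`-reference simply falls out of the token count instead of crashing the scan.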
### Linting hooks for known bugs
Claude Code hooks are a JSON structure in `~/.claude/settings.json`. Three recurring issues:

- Inline `|` without quoting — tracked as anthropics/claude-code#1132, marked "not planned" for a fix. The command string gets split on `|` before the shell parses it, and your hook silently mangles.
- No `timeout` field — hooks can hang indefinitely, freezing your Claude session.
- Lowercase matcher with a capitalized tool name — matchers are case-sensitive but the docs are ambiguous: `"edit"` won't match `Edit`.
The linter flags all three:
```python
if isinstance(cmd, str) and "|" in cmd and '"' not in cmd and "'" not in cmd:
    out["issues"].append({
        "severity": "warn",
        "msg": "inline '|' without quoting — known #1132 corruption risk",
    })
```
### X-raying the JSONL session
Claude Code writes every session to `~/.claude/projects/<id>/<uuid>.jsonl`. Each assistant turn has this shape:

```json
{
  "type": "assistant",
  "isSidechain": false,
  "message": {
    "model": "claude-opus-4-6",
    "usage": {
      "input_tokens": 3,
      "output_tokens": 27,
      "cache_creation_input_tokens": 59469,
      "cache_read_input_tokens": 11530
    }
  }
}
```
Sum those fields across every record with `"type": "assistant"` (including `isSidechain: true` subagent calls, which is the bit `/cost` misses) and you have the real API spend for that session.
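That summation is a dozen lines of stdlib. A sketch, using the field names from the record shape above:

```python
import json
from collections import Counter

def session_totals(jsonl_path):
    totals = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("type") != "assistant":
                continue
            # Deliberately no isSidechain filter: subagent turns count too.
            for key, val in rec.get("message", {}).get("usage", {}).items():
                if isinstance(val, int):
                    totals[key] += val
    return totals
```

A `Counter` keeps this schema-agnostic: if a future Claude Code version adds a new usage field, it gets summed without a code change.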
A bonus finding: counting `<system-reminder>` occurrences in the raw JSONL is a useful metric. On Claude Code 2.1.x, the skill-trigger list gets re-broadcast inside a system-reminder on many user turns. Those blocks are inside the cached prefix so you're only billed once per 5-minute cache window, but they still count against the context window on every turn.
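Both report numbers fall out of the same pass: the cache-hit figure from the usage sums, the injection count from a raw substring count. The ratio formula below is my reading of what the report line means, an assumption rather than the tool's documented definition:

```python
def cache_hit_ratio(totals):
    # Fraction of input-side tokens served from the prompt cache.
    read = totals.get("cache_read_input_tokens", 0)
    denom = (read
             + totals.get("cache_creation_input_tokens", 0)
             + totals.get("input_tokens", 0))
    return read / denom if denom else 0.0

def reminder_count(jsonl_path):
    # Raw substring count over the whole transcript file.
    with open(jsonl_path, encoding="utf-8") as f:
        return f.read().count("<system-reminder>")
```

Output tokens are excluded from the denominator since caching only applies to the input side.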
## Why bother?
Two recent open issues on the Claude Code repo describe the same symptom:
- #46339 — "System prompt token consumption increased ~40-50% between v2.1.92 and v2.1.100 with zero changes to user configuration"
- #46917 — "v2.1.100 sends 978 fewer bytes than v2.1.98 but is billed 20,196 MORE tokens"
Both reporters had to set up HTTP proxies or manual diffs to investigate. cc-healthcheck won't solve the server-side inflation (only Anthropic can), but it lets you separate the two pools: is it your config that grew, or the platform? Without that, it's all vibes.
## Install
Zero-install — run straight from GitHub:
```sh
curl -sSL https://raw.githubusercontent.com/Genie-J/cc-healthcheck/main/cc_healthcheck.py | python3 -
```
Or clone + run:
```sh
git clone https://github.com/Genie-J/cc-healthcheck
python3 cc-healthcheck/cc_healthcheck.py
```
Flags:
```sh
cc-healthcheck            # text report
cc-healthcheck --json     # JSON for CI
cc-healthcheck --verbose  # per-file breakdown
cc-healthcheck --version
```
## Repo
github.com/Genie-J/cc-healthcheck — MIT. Issues welcome, especially reconciliation cases where cc-healthcheck numbers don't match what /cost or your Anthropic billing shows.
If you like this flavor (small single-file local tools), I also wrote BurnCheck — same philosophy, different problem (predicting whether your weekly Opus cap is about to hit mid-task).
Keep your context tight. Your wallet will thank you.