If you read my earlier post on the Skill Vault pattern, you know I cut Claude Code's per-session overhead by 96%. After living with it for a few weeks, I went looking for what was still eating tokens — and found three more sinks worth killing.
This is the follow-up: smaller wins individually, but together they shaved another ~5,000 tokens off every single session with zero capability lost.
Where the Tokens Were Still Hiding
After the vault, my baseline was around 51K tokens per session. I dug into the system prompt to see what remained:
- Skill list (still): ~3K tokens
- Project
CLAUDE.md: ~2K tokens - claude-mem auto-injected timeline: ~2K tokens
- Plugin hook reminders: ~1K tokens (some recurring per turn)
Three of these were either unnecessary or way overweight. Here's how I killed each.
Sink #1: A Bloated Root CLAUDE.md (~1.5K tokens)
CLAUDE.md files auto-load into every session that touches the repo. Mine had grown to 326 lines / 8 KB with quickstart instructions for every component:
- Flutter dev commands
- Backend
npmscripts - React build steps
- ML service Python setup
- A duplicate gstack skill listing
The problem: I was loading every component's instructions on every turn, even when I was only working in one of them.
Fix: Hierarchical CLAUDE.md Files
Claude Code loads CLAUDE.md hierarchically — only the ones in your current working tree get pulled in. So instead of one fat root file, I split it:
khetisahayak/
├── CLAUDE.md # 55 lines — overview, ports, creds only
├── kheti_sahayak_app/CLAUDE.md # Flutter details
├── frontend/CLAUDE.md # React details
├── ml/CLAUDE.md # ML service details
└── kheti_sahayak_backend/CLAUDE.md # already existed
The root now contains only:
- Project overview (5 lines)
- Service ports table
- Test credentials
- Cross-cutting auth + DB notes
- Troubleshooting
Component-specific details only load when I'm actually working in that subdir.
Result: root CLAUDE.md trimmed from 326 → 55 lines (8 KB → 1.6 KB). ~1.5K tokens saved per session.
Sink #2: Plugin SessionStart Hooks Injecting "Helpful" Context
I use claude-mem for persistent memory across sessions. Genuinely useful. But it has a SessionStart hook that auto-injects a timeline of recent observations at the top of every conversation — about 50 entries, ~2K tokens:
S499 Indeed Auto-Apply — User asked how to automate job applications
S498 Indeed MCP Integration Query — clarifying JobSpy vs Apify
2337 12:33p ✅ systemd/install.sh — Backend Services Enabled
... 47 more lines
I almost never needed this auto-recall. When I want past context, I call mem-search explicitly.
Fix: Disable the Auto-Inject, Keep the Memory
The hook config lives at:
~/.claude/plugins/cache/thedotmack/claude-mem/<version>/hooks/hooks.json
The SessionStart array has three hooks. The third is the timeline injection:
{
"type": "command",
"command": "... node bun-runner.js worker-service.cjs hook claude-code context ..."
}
I removed just that one entry. The other two SessionStart hooks (install + worker-start) and the recording hooks (PostToolUse, Stop, SessionEnd) stay intact, so memory is still being captured. The MCP search server still works — I just have to ask for it.
# Always back up before editing plugin internals
cp hooks.json hooks.json.bak
# Remove the 3rd SessionStart hook (jq or manual edit)
Result: ~2K tokens saved per session. Memory still works on demand.
⚠️ Caveat: editing a file in ~/.claude/plugins/cache/ will be overwritten on plugin upgrade. For durability, mirror the change in your user-level ~/.claude/settings.json hooks block, or add a small post-upgrade re-patch script.
Sink #3: Round 2 of the Skill Vault (~1.4K tokens)
The original vault was a one-time bulk move. After several weeks of actual usage, I saw which skills I'd installed and never touched. The vault was overdue for a second pass.
Audit script (same as the first post, run again):
for f in ~/.claude/skills/*/SKILL.md; do
dir=$(dirname "$f")
name=$(basename "$dir")
size=$(wc -c < "$f")
echo "$size $name"
done | sort -rn | head -30
I ended up vaulting 27 more skills out of 73 actively loaded:
-
8 marketing skills (
mkt-content,mkt-seo,mkt-social, etc.) — I do real marketing in a separate context -
6 research skills (
research,research-deep,research-report, etc.) — episodic, not daily -
5 niche tools (
obsidian-vault,make-pdf,pair-agent,setup-browser-cookies,open-gstack-browser) -
4 design-heavy (
design-consultation,design-html,design-shotgun,devex-review) — restored only when designing -
3 plan reviews (
plan-design-review,plan-devex-review,plan-tune) -
1 interview prep (
staff-engineer-interview) — used a few times last quarter, not weekly
Bulk move:
for s in mkt-content mkt-email mkt-growth mkt-pr mkt-review mkt-seo mkt-social cmo \
research research-add-fields research-add-items research-deep research-report edit-article \
staff-engineer-interview \
obsidian-vault make-pdf pair-agent setup-browser-cookies open-gstack-browser \
plan-design-review plan-devex-review plan-tune \
design-consultation design-html design-shotgun devex-review; do
mv ~/.claude/skills/$s ~/.claude/skills-vault/ 2>/dev/null
done
The skill-vault index skill from the original post still bridges everything — Claude knows where each skill lives and restores it on demand.
Result: 73 → 46 active skills. ~1.4K tokens saved.
The Combined Result
| Fix | Tokens Saved (per session) |
|---|---|
| Sink #1 — CLAUDE.md trim + per-component split | ~1.5K |
| Sink #2 — claude-mem timeline disabled | ~2K |
| Sink #3 — Round 2 skill vault | ~1.4K |
| Total | ~4.9K |
On top of the original 96% reduction from the Skill Vault, this is another solid bite. But honestly, the dollar value isn't the point.
Why This Matters Beyond Cost
Every token in your context is a token Claude has to attend over before generating its response. The bigger your prompt, the more diluted attention becomes on the actual task.
Big context ≠ better answers. Frequently it's the opposite. Targeted context wins.
The pattern across all three fixes is the same:
- Audit what's auto-loaded vs. what's actually useful. You'll be surprised.
- Move episodic content out of the always-on path and into on-demand access.
- Trust the model to pull what it needs — when it does need the vaulted skill, the subdir's CLAUDE.md, or the memory search, it will reach for it.
Less ambient noise. Sharper signal.
Tips for Your Own Audit
-
Inspect, don't guess.
wc -cyourCLAUDE.mdfiles.ls ~/.claude/skills/. Read your pluginhooks.jsonfiles line by line. -
Hierarchy is free. Per-directory
CLAUDE.mdfiles cost nothing when you're not in that directory. -
Plugin hooks are debt. Every
SessionStartorUserPromptSubmithook is a tax. Audit them by hand. Some are essential (auth, telemetry); some inject "context" that's just clutter. - Re-vault every few weeks. Usage patterns shift. Skills that were daily three months ago may be quarterly now. Yesterday's must-have is tomorrow's vault candidate.
-
Watch for per-turn taxes. A
UserPromptSubmithook costs N tokens every single turn. Even a small reminder block adds up fast in a long session.
Conclusion
The Skill Vault pattern is still the heaviest hitter. These three follow-ups are the long tail:
- CLAUDE.md → split per component, slim the root.
- Plugin hooks → audit auto-injected context. Disable what you don't need.
- Skill vault → revisit it. Vault more.
Together: another ~5K tokens per session, zero capability lost.
When your context is tight, your model is sharp.
I'm Prakash Ponali, a Staff Engineer with 16+ years in enterprise eCommerce. Currently building Khetisahayak — a farming helper app for Telugu-speaking farmers in Andhra Pradesh. Find me on LinkedIn.
Top comments (0)