My team and I have been using Claude Code as our primary coding agent for about six months. About a month ago, I started hitting the limits on my Max plan. My prompts weren't unusually long and my projects aren't massive, so I couldn't figure out where the tokens were going.
So I did what any developer would do when something doesn't make sense - I started logging.
## The Setup
Claude Code supports lifecycle hooks - `PreToolUse`, `PostToolUse`, `SessionStart`, `Stop`. I wrote a simple `PreToolUse` hook on the Read tool that appends every file read to a JSON log with a timestamp, file path, and a rough token estimate based on file size.
Nothing fancy. Just a log.
## What I Found
After 132 sessions across 20 projects, the numbers were consistent:
71% of all file reads were files Claude had already opened in that same session.
In one project, Claude read `server.ts` four times in a single session. It read `package.json` three times. Not because the files changed - because it had no awareness of what it had already read.
The deeper issue: Claude doesn't know what a file contains until it opens it. It can't tell the difference between a 50-token config file and a 2,000-token module. It has no map of your project. When it needs to find a function, it scans directories. When it needs to understand your architecture, it opens files one by one.
It works blind.
## The Cost
I ran a comparison on a large active project. Same codebase, same prompts, three different setups:
- OpenClaw + Claude: ~3.4M tokens
- Claude CLI (no middleware): ~2.5M tokens
- With my hooks: ~425K tokens
That's roughly 80% fewer tokens for the same work compared to the plain CLI run. The savings come from two places: Claude skipping full file reads when a description is enough, and Claude not re-reading files it already has in context.
## What I Built
Those logging hooks evolved into a proper tool. Six Node.js scripts that sit in a .wolf/ directory in your project:
### 1. `anatomy.md` - the project index
Every file in your project gets a one-line description and a token estimate:
```markdown
## src/api/
- `auth.ts` - JWT validation middleware. Reads from env.JWT_SECRET. (~340 tok)
- `users.ts` - CRUD endpoints for /api/users. Pagination via query params. (~890 tok)
```
Before Claude reads a file, the PreToolUse hook injects this info. Claude sees "this file is your Express config, ~520 tokens" and can decide whether the description is enough or whether it needs the full content. This alone eliminates a significant chunk of unnecessary reads.
The index auto-updates after every write. You never maintain it manually.
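The lookup the pre-read hook does against that index can be very simple. A minimal sketch - the line format is the one shown above, but the function name and matching logic are my illustration, not necessarily what ships:

```javascript
// Illustrative sketch of the anatomy lookup in the pre-read hook.
// anatomy.md lines look like: - `auth.ts` - JWT validation middleware. (~340 tok)
function anatomyEntry(anatomyText, fileName) {
  const line = anatomyText
    .split('\n')
    .find((l) => l.startsWith('- ') && l.includes('`' + fileName + '`'));
  // Strip the bullet so the hook can inject just the description.
  return line ? line.slice(2) : null;
}
```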
### 2. `cerebrum.md` - the learning memory
This is the file that makes sessions compound. When you correct Claude - "don't use var, always use const" - it logs the correction here. When you express a preference - "named exports, never default exports" - it goes here. When you make an architecture decision, it goes here.
The critical part is the Do-Not-Repeat list:
```markdown
## Do-Not-Repeat
- 2026-03-10: Never use `var` - always `const` or `let`
- 2026-03-11: Don't mock the database in integration tests
- 2026-03-14: The auth middleware reads from `cfg.talk`, not `cfg.tts`
```
The PreToolUse hook on Write checks new code against this list before Claude writes it. It's not perfect - compliance is around 85-90% - but it catches the repeat offenders.
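As an illustration of what that check can look like - the real matcher may be smarter, but even a keyword/regex scan over the Do-Not-Repeat rules catches the common cases:

```javascript
// Illustrative sketch of the Do-Not-Repeat check in the pre-write hook.
// Each rule pairs a pattern with the logged correction it enforces.
function violations(code, rules) {
  return rules.filter((r) => r.pattern.test(code)).map((r) => r.note);
}

const rules = [
  { pattern: /\bvar\s+\w+/, note: 'Never use `var` - always `const` or `let`' },
  { pattern: /\bcfg\.tts\b/, note: 'The auth middleware reads from `cfg.talk`, not `cfg.tts`' },
];
```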
### 3. `buglog.json` - the fix memory
Every bug fix gets logged with the error message, root cause, fix, and tags. Before Claude starts debugging anything, the hook checks if that error is already in the log. Instead of spending 15 minutes rediscovering a solution, it finds it in the log and applies it.
```json
{
  "error_message": "TypeError: Cannot read properties of undefined (reading 'map')",
  "root_cause": "API response was null when users array was expected",
  "fix": "Added optional chaining: data?.users?.map() and fallback empty array",
  "tags": ["null-check", "api-response", "react"]
}
```
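The lookup before debugging can be as simple as a normalized substring match against logged entries - sketched here with a made-up function name:

```javascript
// Illustrative sketch of the buglog lookup (not the shipped code).
// Normalize whitespace and case so near-identical error strings still match.
function findFix(buglog, errorMessage) {
  const norm = (s) => s.toLowerCase().replace(/\s+/g, ' ').trim();
  const needle = norm(errorMessage);
  return buglog.find((e) => needle.includes(norm(e.error_message))) || null;
}
```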
### 4. `token-ledger.json` - the receipt
Every session gets a line item. Lifetime totals track reads, writes, anatomy hits vs misses, and repeated reads blocked. This is how I got the numbers above - and how you can verify the savings for yourself.
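The per-session shape looks roughly like this - field names and values here are illustrative, not real data:

```json
{
  "session": "2026-03-14T09:12:00Z",
  "reads": 41,
  "writes": 12,
  "anatomy_hits": 28,
  "anatomy_misses": 13,
  "repeated_reads_blocked": 9,
  "estimated_tokens": 31400
}
```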
## How the hooks work
The architecture is simple. Six hooks, three lifecycle events:
| Event | Hook | What it does |
|---|---|---|
| SessionStart | session-start.js | Creates session tracker, logs to memory |
| PreToolUse (Read) | pre-read.js | Shows anatomy info, warns on repeated reads |
| PreToolUse (Write) | pre-write.js | Checks cerebrum Do-Not-Repeat list |
| PostToolUse (Read) | post-read.js | Estimates and records token usage |
| PostToolUse (Write) | post-write.js | Updates anatomy, appends to memory |
| Stop | stop.js | Writes session summary to token ledger |
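Wiring these up happens in the project's `.claude/settings.json`. The entries below follow Claude Code's hook settings schema; the `.wolf/` script paths match the table, trimmed to four of the six hooks for brevity:

```json
{
  "hooks": {
    "SessionStart": [
      { "hooks": [{ "type": "command", "command": "node .wolf/session-start.js" }] }
    ],
    "PreToolUse": [
      { "matcher": "Read", "hooks": [{ "type": "command", "command": "node .wolf/pre-read.js" }] },
      { "matcher": "Write", "hooks": [{ "type": "command", "command": "node .wolf/pre-write.js" }] }
    ],
    "Stop": [
      { "hooks": [{ "type": "command", "command": "node .wolf/stop.js" }] }
    ]
  }
}
```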
All hooks are pure Node.js file I/O. No API calls, no network requests, no external services. The only AI calls are optional scheduled tasks (anatomy rescans, memory consolidation) that use your existing Claude subscription.
## The honest limitations
- Claude Code hooks are still relatively new. `PreToolUse`/`PostToolUse` have had reliability issues in some versions. The tool falls back to `CLAUDE.md` instructions when hooks don't fire.
- Token tracking is estimation-based (character-to-token ratio), not exact API counts. Accurate to within ~15%.
- `cerebrum.md` depends on Claude actually following instructions to update it. Compliance is ~85-90%, not 100%.
- The 80% reduction was on one large project. The average across 20 projects is 65.8%. Your results will depend on project size and how you use Claude.
## Try it
```bash
npm install -g openwolf
cd your-project
openwolf init
```
Then use `claude` normally. The hooks fire invisibly. Run `openwolf status` to see your numbers.
It's called OpenWolf. Open source, AGPL-3.0.
- GitHub: github.com/cytostack/openwolf
- Docs: openwolf.com
If you have questions about the hook architecture, the token estimation approach, or anything else, I'm happy to go deeper in the comments.