I built a Claude Code plugin that only keeps an "agent memory" rule if it can prove it saves tokens

#agents #claude #performance #showdev

Most agent-memory setups add a rule because a model thought it sounded useful. I wanted one that has to earn its spot.

token-warden watches your Claude Code sessions, distills candidate efficiency rules from the expensive ones, then benchmarks each rule on a frozen test suite (with vs without it) and only keeps it if it saves at least 2x the tokens it costs to carry around, and breaks nothing. Everything else gets evicted. It once threw out a rule that saved 38k tokens per run because that "saving" came from the agent giving up and failing the task.

Honest result so far: on a deliberately wasteful agent, a "grep for the symbol before reading the whole file" rule cut a session from ~67k to ~56k tokens (about 16%, roughly 3 cents/session on Sonnet, ~500x what the rule itself costs). On my already-optimized agents the same rule saves basically nothing and gets auto-evicted, which is the entire point. It refuses to keep junk just to look busy.

It's measured, not vibes. Open source, MIT. I'd genuinely like people to try to break it or tell me the numbers are wrong.

https://github.com/vukkt/token-warden