Built a small tool for Claude Code that tracks whether “agent memory” rules actually pay for themselves in token usage.
The idea is simple: every persistent instruction should justify itself by reducing future token cost. If it doesn’t, it gets flagged for removal.
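For anyone curious what "justify itself" could look like as a rule, here is a minimal sketch. It assumes a rule costs a fixed number of tokens every time it is loaded into context, and that some separate estimate of tokens saved exists (that estimate is the hard part and is simply an input here). `MemoryRule` and `flag_for_removal` are hypothetical names for illustration, not the tool's actual API.

```python
from dataclasses import dataclass


@dataclass
class MemoryRule:
    """One persistent instruction (hypothetical schema, not the tool's own)."""
    name: str
    context_cost: int  # tokens the rule adds to the prompt each time it is loaded
    loads: int         # times the rule was loaded into context during the period
    tokens_saved: int  # estimated tokens saved during the same period (assumed input)

    def net_tokens(self) -> int:
        # Positive: the rule saved more than it cost to carry around.
        return self.tokens_saved - self.context_cost * self.loads


def flag_for_removal(rules: list[MemoryRule]) -> list[str]:
    """Names of rules whose net token effect is neutral or negative."""
    return [r.name for r in rules if r.net_tokens() <= 0]


rules = [
    MemoryRule("use-pnpm", context_cost=20, loads=10, tokens_saved=400),
    MemoryRule("long-style-guide", context_cost=300, loads=10, tokens_saved=500),
]
print(flag_for_removal(rules))  # ['long-style-guide']
```

A short rule that saves a little each time stays; a long rule that is loaded often gets flagged even if it saves more in absolute terms.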
Once measured over time, a surprising amount of memory turns out to have neutral or negative ROI. Check it out, it would mean a lot :)
Top comments (1)
This resonates. I've been thinking about the same problem from the opposite angle: every feature I add to my extension that promises "smarter" behaviour ends up costing more tokens than it saves the user. The only ones that paid off were the ones that shrank the prompt context (selecting less, screenshotting less, sending the bare minimum).
The framing I keep getting stuck on is whether memory ROI should be calculated per session or per week. A rule that costs 50 tokens to add but saves only 30 in a single session looks negative, but if it saves 300 across a week of repeat work, the math flips. My guess is that most "memory ROI" comparisons ignore savings that span sessions because they're hard to attribute, and that might be where your negative-ROI results are hiding.
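To make the window question concrete, a small sketch of the arithmetic from that example. It treats the 50 tokens as a one-time cost, as the comment frames it; `cost_per_session` is a hypothetical knob for rules that are charged again on every load. The five-session split of the 300 saved tokens is made up for illustration.

```python
def net_over_window(savings_per_session: list[int], one_time_cost: int,
                    cost_per_session: int = 0) -> int:
    """Net tokens across a window of sessions (positive = the rule pays off).

    one_time_cost: tokens spent adding the rule.
    cost_per_session: tokens charged each session the rule is loaded
    (0 by default, matching the "costs 50 tokens to add" framing).
    """
    total_saved = sum(savings_per_session)
    total_cost = one_time_cost + cost_per_session * len(savings_per_session)
    return total_saved - total_cost


week = [30, 60, 70, 70, 70]  # hypothetical split of 300 tokens saved over five sessions
print(net_over_window(week[:1], one_time_cost=50))  # -20: per-session view looks negative
print(net_over_window(week, one_time_cost=50))      # 250: weekly view pays off
```

The same rule gets opposite verdicts depending only on the window, and which cost model you assume (one-time versus per-load) can move it again.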
I also wonder how you handle staleness: a rule that paid off in March might not pay off in June if the model behind the agent changed. Token-warden seems to flag rules for removal based on observed cost, but does it re-test the same rule periodically, or only when someone touches it? The first behaviour catches drift; the second doesn't.
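A periodic re-test could be as simple as recording which model each measurement was taken with and when, then re-measuring on either a model change or old age. This is a sketch under those assumptions; `Evaluation`, `needs_retest`, the model IDs, the dates, and the 30-day default are all hypothetical, not anything Token-warden is known to do.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Evaluation:
    """Last ROI measurement for one rule (hypothetical record)."""
    rule: str
    measured_at: datetime
    model_id: str  # model the measurement was taken with


def needs_retest(ev: Evaluation, current_model: str, now: datetime,
                 max_age: timedelta = timedelta(days=30)) -> bool:
    # Re-measure when the underlying model changed since the last evaluation...
    if ev.model_id != current_model:
        return True
    # ...or when the measurement is simply old, even if nobody touched the rule.
    return now - ev.measured_at >= max_age


def due_for_retest(evals: list[Evaluation], current_model: str, now: datetime,
                   max_age: timedelta = timedelta(days=30)) -> list[str]:
    return [e.rule for e in evals if needs_retest(e, current_model, now, max_age)]


evals = [
    Evaluation("use-pnpm", datetime(2025, 3, 1), "model-a"),
    Evaluation("long-style-guide", datetime(2025, 6, 1), "model-b"),
]
print(due_for_retest(evals, "model-b", datetime(2025, 6, 10)))  # ['use-pnpm']
```

Because the check runs on a schedule rather than on edits, a rule nobody has touched since March still gets re-measured after the model changes.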
Repo starred, going to read the implementation when I have a minute.