DEV Community

Vuk Topalović
Vuk Topalović

Posted on

"make your AI better" is guesswork — token-warden only keeps changes it can prove, with real numbers on a fair repeatable test, made the work cheaper.

token-warden is a thrifty office manager for your AI assistants. It does four things:

  1. Keeps the receipts. Every time the AI finishes a task, it quietly notes how much that cost — like saving every taxi receipt in a drawer.
  2. Notices waste. When a task costs far more than usual, it asks a cheap junior AI: "Why was that so expensive? What habit would've made it cheaper?" — and writes down a suggested habit, e.g. "search for the right file before opening files at random."
  3. Tests the habit for real — this is the important part. It doesn't just trust the suggestion. It keeps a fixed set of practice tasks (like a standardized test that never changes), and runs them twice: once with the new habit, once without. Now it has hard numbers on whether the habit actually saved money, instead of a hunch.
  4. Keeps only what pays off. A habit takes up room in the AI's memory, and that room itself costs a little every single time. So the rule is strict: a habit must save at least twice what it costs to keep, or it's thrown out. Winners get written into the AI's permanent memory so it uses them automatically forever after; losers are discarded (but remembered as "tried it, didn't work" so the same bad idea won't come back).

    GitHub logo vukkt / token-warden

    Claude Code plugin that makes coding agents measurably cheaper over time: collect token costs, distill candidate rules, benchmark them on a frozen golden suite, and keep only rules that earn their context rent.

    token-warden

    CI License: MIT

    A Claude Code plugin that makes coding agents measurably cheaper over time.

    Most "agent memory" accumulates advice nobody ever verifies. token-warden treats agent memory as an engineering problem: every rule that wants space in an agent's context must prove, on a fixed benchmark, that it saves more tokens than it costs — or it gets evicted. The result is a per-agent memory file containing only rules with measured positive return.

    • Measured, not vibes — every rule carries a token delta from real benchmark runs
    • Self-funding — rules must save ≥ 2× their own context rent to stay
    • Self-auditing — active rules are re-benchmarked round-robin and evicted when they stop earning
    • Zero session overhead — collection runs in a Stop hook that never blocks or fails your work

    Table of contents

Top comments (0)