Create account

DEV Community

Vuk Topalović

Posted on Jun 15

"make your AI better" is guesswork — token-warden only keeps changes it can prove, with real numbers on a fair repeatable test, made the work cheaper.

token-warden is a thrifty office manager for your AI assistants. It does four things:

Keeps the receipts. Every time the AI finishes a task, it quietly notes how much that cost — like saving every taxi receipt in a drawer.
Notices waste. When a task costs far more than usual, it asks a cheap junior AI: "Why was that so expensive? What habit would've made it cheaper?" — and writes down a suggested habit, e.g. "search for the right file before opening files at random."
Tests the habit for real — this is the important part. It doesn't just trust the suggestion. It keeps a fixed set of practice tasks (like a standardized test that never changes), and runs them twice: once with the new habit, once without. Now it has hard numbers on whether the habit actually saved money, instead of a hunch.
Keeps only what pays off. A habit takes up room in the AI's memory, and that room itself costs a little every single time. So the rule is strict: a habit must save at least twice what it costs to keep, or it's thrown out. Winners get written into the AI's permanent memory so it uses them automatically forever after; losers are discarded (but remembered as "tried it, didn't work" so the same bad idea won't come back).
vukkt / token-warden

Claude Code plugin that makes coding agents measurably cheaper over time: collect token costs, distill candidate rules, benchmark them on a frozen golden suite, and keep only rules that earn their context rent.
token-warden

A Claude Code plugin that makes coding agents measurably cheaper over time.

Most "agent memory" accumulates advice nobody ever verifies. token-warden treats agent memory as an engineering problem: every rule that wants space in an agent's context must prove, on a fixed benchmark, that it saves more tokens than it costs — or it gets evicted. The result is a per-agent memory file containing only rules with measured positive return.

Measured, not vibes — every rule carries a token delta from real benchmark runs

Self-funding — rules must save ≥ 2× their own context rent to stay

Self-auditing — active rules are re-benchmarked round-robin and evicted when they stop earning

Zero session overhead — collection runs in a Stop hook that never blocks or fails your work

Table of contents

How it works

Getting started

Commands

The benchmark system

Architecture

The agents

Inter-agent approval gate

Design invariants

A recorded…
View on GitHub

vukkt / token-warden

Claude Code plugin that makes coding agents measurably cheaper over time: collect token costs, distill candidate rules, benchmark them on a frozen golden suite, and keep only rules that earn their context rent.

token-warden

Table of contents