The problem isn't your prompts
If you're running Claude Code, Codex, opencode, or openclaw and the API bill keeps climbing, you've probably tried writing tighter prompts. That's not where the waste is.
Four structural patterns account for most of the wasted tokens in a typical session:
Screenshots at full resolution. The agent reads whatever images you paste or reference. A 3.3 MB screenshot from a high-DPI display lands in the model at full size. The model doesn't need native resolution to understand what's on screen.
Repeated file reads. The agent re-reads files it already touched earlier in the session. A 600-line file read three times costs the tokens of 1,800 lines. Nothing built in remembers the earlier reads, so the second and third reads cost full price.
Compaction that loses context. When a session compacts, the summary doesn't record which files were actively edited or which symbols mattered. The next request starts with the wrong picture, and the agent has to re-read files to rebuild it.
Bash output floods. Every pytest, npm install, docker build, or git log dumps hundreds of lines of passing-test names, deprecation warnings, and progress bars. The model processes all of it at full token cost.
These compound. In a session with 10+ file reads, a few images, and a test run, you can easily burn 3x the tokens you actually need.
token-goat fixes all four
token-goat (https://github.com/DFKHelper/token-goat) is a hook daemon for Claude Code, Codex CLI, opencode, and openclaw. Install once; it handles the rest.
Image shrinking. Intercepts screenshots before they reach the model and compresses them. A 3.3 MB PNG becomes 84 KB, 97.4% smaller.
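Resizing is one part of shrinking a screenshot. Re-encoding is the other. The sketch below shows only how to pick smaller dimensions while keeping the aspect ratio. It assumes a hypothetical 1024 px cap on the long edge, and token-goat's actual limits and encoder aren't described here. The token estimate uses the rough width × height / 750 rule of thumb from Anthropic's vision documentation.

```python
import math

# Hypothetical cap on the long edge; not token-goat's documented setting.
MAX_LONG_EDGE = 1024


def fit_within(width: int, height: int, max_edge: int = MAX_LONG_EDGE) -> tuple:
    """Scale (width, height) down so the long edge is at most max_edge, keeping aspect ratio."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height  # already small enough; never upscale
    scale = max_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_image_tokens(width: int, height: int) -> int:
    """Rough estimate from Anthropic's vision docs: about width * height / 750 tokens."""
    return math.ceil(width * height / 750)


# A 2x-DPI capture of a 1440x900 display.
small_w, small_h = fit_within(2880, 1800)
print((small_w, small_h), estimate_image_tokens(small_w, small_h))
```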
Session-aware read hints. Tracks every file the agent reads during the session. When the agent is about to re-read one, it gets a hint like: "you read lines 1–420 of auth.py 12 minutes ago." Most re-reads stop there.
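Here is a minimal sketch of the bookkeeping behind a hint like that. It assumes one tracker per session, keyed by file path, that keeps only the most recent read of each file. `SessionReads` and its methods are hypothetical names. This doesn't show how token-goat actually stores reads or delivers the hint to the agent.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class ReadRecord:
    start: int  # first line read
    end: int    # last line read
    at: float   # Unix timestamp of the read


@dataclass
class SessionReads:
    """Hypothetical per-session tracker: remembers the latest read of each file path."""
    reads: dict = field(default_factory=dict)

    def record(self, path: str, start: int, end: int, now: float | None = None) -> None:
        self.reads[path] = ReadRecord(start, end, time.time() if now is None else now)

    def hint(self, path: str, now: float | None = None) -> str | None:
        """Return a reminder if `path` was already read this session, else None."""
        rec = self.reads.get(path)
        if rec is None:
            return None
        now = time.time() if now is None else now
        minutes = int((now - rec.at) // 60)
        return f"you read lines {rec.start}-{rec.end} of {path} {minutes} minutes ago"


session = SessionReads()
session.record("auth.py", 1, 420, now=0.0)
print(session.hint("auth.py", now=12 * 60))
```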
Compaction assist. Before the session compacts, a hook builds a structured manifest (edited files, accessed symbols, key reads) and injects it into the compaction context. The next request starts with the right picture.
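Here is a minimal sketch of how such a manifest could be assembled. It assumes the session has already been logged as a list of simple event dicts. The event shape, the field names, and `build_manifest` are all hypothetical, not token-goat's format. Files are ranked by how often they were read, on the assumption that frequently read files are the ones worth keeping in view after compaction.

```python
import json
from collections import Counter


def build_manifest(events: list, max_reads: int = 5) -> str:
    """Condense a session's events into a JSON manifest for the compaction prompt.

    Each event is a hypothetical dict: {"kind": "edit" | "read", "path": ...}
    or {"kind": "symbol", "name": ...}.
    """
    edited = sorted({e["path"] for e in events if e["kind"] == "edit"})
    symbols = sorted({e["name"] for e in events if e["kind"] == "symbol"})
    # Most frequently read files first; ties keep first-seen order.
    read_counts = Counter(e["path"] for e in events if e["kind"] == "read")
    key_reads = [path for path, _ in read_counts.most_common(max_reads)]
    manifest = {"edited_files": edited, "accessed_symbols": symbols, "key_reads": key_reads}
    return json.dumps(manifest, indent=2)


events = [
    {"kind": "read", "path": "src/auth.py"},
    {"kind": "edit", "path": "src/auth.py"},
    {"kind": "read", "path": "src/db.py"},
    {"kind": "read", "path": "src/auth.py"},
    {"kind": "symbol", "name": "login"},
]
print(build_manifest(events))
```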
Bash output compression. Filters verbose command output before it reaches the model. pytest goes from 150 passing-test lines to a failures-first view, 80–97% smaller. npm install collapses warnings by package. docker build keeps step headers and errors and drops the rest.
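Here is a rough sketch of a failures-first filter for `pytest -v` output. It assumes the usual `PASSED`/`FAILED` line formats and is a simplified stand-in, not token-goat's filter. It drops the per-test PASSED lines and moves the summary's FAILED/ERROR lines to the top. Everything else, including tracebacks and the final tally, stays in its original order.

```python
import re

PASSED_LINE = re.compile(r"\sPASSED\b")          # e.g. "tests/test_a.py::test_one PASSED [ 33%]"
FAILURE_LINE = re.compile(r"^(FAILED|ERROR)\b")  # summary lines like "FAILED tests/...::test_x - ..."


def compress_pytest(output: str) -> str:
    """Failures-first view of `pytest -v` output: drop PASSED lines, hoist FAILED/ERROR summaries."""
    lines = output.splitlines()
    failures = [ln for ln in lines if FAILURE_LINE.match(ln)]
    passed = sum(1 for ln in lines if PASSED_LINE.search(ln))
    rest = [ln for ln in lines if not PASSED_LINE.search(ln) and not FAILURE_LINE.match(ln)]
    return "\n".join(failures + [f"({passed} passing-test lines omitted)"] + rest)


sample = "\n".join([
    "tests/test_a.py::test_one PASSED [ 33%]",
    "tests/test_a.py::test_two PASSED [ 66%]",
    "tests/test_a.py::test_three FAILED [100%]",
    "=========== short test summary info ===========",
    "FAILED tests/test_a.py::test_three - assert 1 == 2",
    "========== 1 failed, 2 passed in 0.05s ==========",
])
print(compress_pytest(sample))
```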
It's all automated, but you can also pull individual functions instead of whole files:
token-goat read "src/auth.py::login"
On a 2,000-line module, that's 85% fewer tokens than reading the full file.
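For Python sources, the standard library can already pull a single definition out of a module. The sketch below illustrates the idea behind the `path::symbol` form using `ast.get_source_segment`. It is not token-goat's implementation, it handles Python files only, and `read_symbol` and `read_spec` are hypothetical names.

```python
import ast
from typing import Optional


def read_symbol(source: str, name: str) -> Optional[str]:
    """Return the source of the first function or class named `name` (in ast.walk order), or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name == name:
            # Spans from the def/class line to the end of the body; decorators are not included.
            return ast.get_source_segment(source, node)
    return None


def read_spec(spec: str) -> Optional[str]:
    """Resolve a "path::symbol" spec such as "src/auth.py::login"."""
    path, _, name = spec.partition("::")
    with open(path, encoding="utf-8") as f:
        return read_symbol(f.read(), name)


module = "import os\n\ndef login(user):\n    return user == 'admin'\n\ndef logout():\n    pass\n"
print(read_symbol(module, "login"))
```

The model then sees only the two lines of `login` rather than the whole module, which is where the savings on large files come from.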
The numbers
100K wasted tokens per session runs about $0.30 (that's $3 per million tokens). Five sessions a week is about $78 a year, and the total grows with every extra session. At that scale, cutting AI coding costs comes from eliminating structural waste, not from writing shorter prompts. token-goat is free.
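To check the annual figure against your own usage, here is the same arithmetic as a small function. It assumes the $3-per-million-token rate implied by $0.30 per 100K tokens and a 52-week year. Substitute your model's actual price.

```python
# Rate implied by "$0.30 per 100K tokens"; an assumption, since prices vary by model and provider.
PRICE_PER_MILLION_USD = 3.00


def annual_waste(tokens_per_session: int, sessions_per_week: int, weeks: int = 52) -> float:
    """Yearly cost in USD of tokens wasted per session, rounded to cents."""
    per_session = tokens_per_session / 1_000_000 * PRICE_PER_MILLION_USD
    return round(per_session * sessions_per_week * weeks, 2)


print(annual_waste(100_000, 5))
print(annual_waste(100_000, 25))
```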
Four hours of use on my machine: 59.7 MB of data that never reached the model, 11.5 million tokens avoided. And that was just version 0.1.
Install
Requires uv (https://docs.astral.sh/uv/).
uv tool install token-goat
token-goat install
Works with Claude Code, Codex CLI, opencode, and openclaw. Windows, Linux, WSL, and macOS.