Wes

Your LLM Doesn't Need 200 Lines of Test Output

You paste a command's output into your LLM context, and half of it is noise. go test -v on a medium-sized project dumps hundreds of lines of === RUN, --- PASS:, and timing info that no model needs to see. cargo test is worse. git status includes instructions you've read a thousand times. Every line you send is tokens you're paying for, and most of them carry zero information.

The obvious fix is "just pipe through grep." That works until you need different filtering for success versus failure, or you want to keep error details while stripping pass lines, or you're using Claude Code and the output is captured automatically before you can touch it. The real fix is a tool that understands command output structure and compresses it before your LLM ever sees it.
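To make the limitation concrete, here's what the grep approach looks like for Go's test noise, using a simulated snippet of `go test -v` output (a minimal sketch; the patterns are illustrative, not exhaustive). The filter is static: it can't switch behavior on exit code or keep different detail for failures.

```shell
# The grep approach: strip go test pass noise manually.
# Works for the happy path, but the same pattern set applies
# whether the run passed or failed.
printf '%s\n' \
  '=== RUN   TestParse' \
  '--- PASS: TestParse (0.00s)' \
  'PASS' \
  'ok   example.com/parser 0.012s' \
  | grep -vE '^(=== RUN|--- PASS:|PASS$)'
# keeps only: ok   example.com/parser 0.012s
```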

That tool exists, it's two weeks old, and it already ships 40 filters.

What Is tokf?

tokf is a config-driven CLI that sits between your terminal commands and your LLM context window, compressing output down to what matters. It's built in Rust by mpecan, and the core idea is simple: each command gets a TOML filter that describes what to keep, what to skip, and how to format the result. Run git status through tokf and you get the branch name and changed files. Run cargo test and you get a one-line summary or just the failure details. The filter decides based on exit code, regex patterns, and template rendering.

It integrates with Claude Code as a hook, so filtered output flows into context automatically. It also works standalone or as a shell wrapper. About 100 stars, 5 contributors, and the commit history shows daily pushes. The project launched in mid-February 2026 and already has a server component with auth, a filter registry, and a publish workflow. That's an aggressive pace.

The Snapshot

Project: tokf
Stars: ~100 at time of writing
Maintainer: Solo developer, daily commits
Code health: 50K lines of Rust, pedantic clippy, thorough test suite
Docs: Detailed CLAUDE.md, generated README, CONTRIBUTING guide
Contributor UX: Declarative filter tests; tokf verify catches regressions instantly
Worth using: Yes, especially with Claude Code

Under the Hood

tokf is a bigger project than you'd expect from the description. The workspace has six crates: the CLI (tokf), common types (tokf-common), the filter engine (tokf-filter), a server (tokf-server), a test macro crate, and end-to-end tests. The CLI alone is about 32K lines of Rust. There's a SQLite-backed tracking system for token savings, a binary config cache using rkyv for fast startup, and a Luau scripting escape hatch for filters that can't be expressed in TOML. That's a lot of machinery for a tool that's been public for two weeks.

The filter engine is the interesting part. A filter config is a TOML file with a processing pipeline: skip and keep patterns for line filtering, section state machines for collecting structured blocks, chunk splitters for repeating output, extract for regex captures with template interpolation, and replace for per-line substitutions. The pipeline branches on exit code: on_success and on_failure can have different templates, skip patterns, and aggregations. A fallback section catches anything that doesn't match.
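Putting those pieces together, a hypothetical filter might look like the sketch below. The section names (skip, on_success, on_failure, fallback, output, tail) are the ones described above; the exact schema details are my assumption, so treat this as illustrative rather than copy-paste:

```toml
command = "mytool build"        # which command this filter matches

# top-level skip runs on every line, regardless of exit code
skip = ["^\\s*$", "^INFO "]

[on_success]
output = "✓ build: ok"          # success collapses to one line

[on_failure]
skip = ["^warning:"]            # failures keep errors, drop warnings

[fallback]
tail = 20                       # anything unmatched: last 20 lines
```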

Here's what the simplest filter in the stdlib looks like (go/vet.toml):

command = "go vet"
keep = ["\\.go:\\d+"]

[on_success]
output = "✓ go vet: ok"

[on_failure]
tail = 20

Six lines of TOML. On success, you get a checkmark. On failure, you get the last 20 lines (which will contain the file references). The more complex filters (like cargo/test.toml) use sections, chunks, and aggregation to produce per-crate breakdowns with pass/fail/ignore counts.

The stdlib ships 40 filters covering git, cargo, go, npm, docker, gradle, kubectl, and more. They're embedded into the binary via include_dir!, so tokf works out of the box with no config. Custom filters go in .tokf/filters/ for per-project overrides or ~/.config/tokf/filters/ for user-level ones.

What impressed me most is the testing infrastructure. Every filter has a _test/ directory with fixture files (real command output saved as .txt) and test case TOMLs that declare expectations: contains, not_contains, starts_with, line_count. Run tokf verify and it executes every test case against its filter and reports pass/fail. The whole stdlib suite is 103 cases, and CI enforces that every filter has coverage. That's an unusual level of rigor for a config-driven tool.
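A test case in that format might look something like this. The assertion keys (contains, not_contains, starts_with, line_count) are the ones named above; the surrounding field layout is my guess at the shape, not the exact schema:

```toml
# _test/success.toml (hypothetical layout)
fixture = "success.txt"    # real command output, saved verbatim
exit_code = 0

[expect]
starts_with = "✓ go vet"
not_contains = ["=== RUN", "--- PASS:"]
line_count = 1
```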

The rough edges are mostly consequences of speed. The CLAUDE.md is thorough, but some of the filter pipeline interactions aren't documented. I discovered that branch-level skip patterns are silently bypassed when an output template is set, because the template path in apply_branch populates {output} before skip runs. It's not a bug exactly, but it's a gotcha that cost me a debugging cycle. The project is also moving fast enough that the architecture already carries real weight: six crates, a server, auth flows, a registry. For a two-week-old tool, that's either impressive planning or a sign that scope might outpace polish.

The Contribution

Issue #42 requested a go test filter. Go's verbose test output is a poster child for the problem tokf solves: go test -v ./... on a project with 50 tests produces 150+ lines where you only care about 2 (the ok package time summaries) or 10 (the failure details).

The contribution was pure TOML and fixtures, no Rust changes. The filter strips === RUN, === PAUSE, === CONT headers, all --- PASS: lines (including indented subtest results), bare PASS markers, and "no test files" lines. On success, it prepends ✓ go test and shows only the package summary lines. On failure, it strips the passing package lines and lets everything else through: assertion errors, --- FAIL: lines, panic stack traces.

command = "go test"
skip = ["^=== RUN ", "^=== PAUSE ", "^=== CONT ",
        "--- PASS:", "^PASS$", "^\\?\\s+"]

[on_success]
output = "✓ go test\n{output}"

[on_failure]
skip = ["^ok\\s+"]

[fallback]
tail = 20

The test suite has three fixtures: an all-passing run across two packages with subtests and parallel tests, a mixed pass/fail run with subtest failures and assertion errors, and a panic with a full stack trace. Each has a test case TOML asserting that noise is stripped and signal is preserved.

Getting the filter right required understanding how tokf's pipeline works. The initial design had skip patterns on the on_success branch alongside the output template, which silently produced wrong output because the template path bypasses branch skip. Moving the universal patterns to top-level skip (which runs before {output} is populated) fixed it. The branch-specific ^ok\s+ pattern on on_failure works because that branch has no template. It's a clean solution once you understand the execution model.

PR #246 passes all 103 existing tests plus 3 new ones, clippy clean.

The Verdict

tokf is for anyone using LLMs as development tools. If you're running Claude Code, Cursor, or any agent that consumes terminal output, you're sending noise tokens on every command. tokf compresses that noise away. The Claude Code hook integration means you don't have to change your workflow at all.

The project is very early (two weeks public) and moving fast. The core filter engine is solid and well-tested. The stdlib covers the most common commands. The server and registry components show ambition, though they add complexity that a young project needs to manage carefully.

What would push tokf further? More stdlib filters, obviously. Better documentation of the filter pipeline's execution model, especially the interaction between branch skip and output templates. And adoption. The tool solves a real problem that every LLM-assisted developer hits daily. It just needs people to know it exists.

Go Look At This

If you use Claude Code or any LLM dev tool, try tokf. Install it, hook it up, run a few commands and compare the token counts. The difference on verbose commands is dramatic.
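If you want a rough feel for the savings before installing anything, byte counts work as a crude proxy for tokens. This sketch compares raw versus filtered size for a tiny simulated `go test -v` run (tokf tracks real savings itself; this is just grep as a stand-in):

```shell
# Crude before/after size comparison, bytes as a token proxy.
raw='=== RUN   TestA
--- PASS: TestA (0.00s)
=== RUN   TestB
--- PASS: TestB (0.00s)
PASS
ok   example.com/pkg 0.031s'
printf '%s\n' "$raw" | wc -c                                          # raw size
printf '%s\n' "$raw" | grep -vE '^(=== RUN|--- PASS:|PASS$)' | wc -c  # filtered size
```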

Contributing a filter is one of the lowest-friction open source contributions you can make: write a TOML file, save some fixture output, add test assertions, run tokf verify. No Rust required. Here's the one I wrote.

This is Review Bomb #5, a series where I find under-the-radar projects on GitHub, read the code, contribute something, and write it up. If you know a project that deserves more eyeballs, drop it in the comments.
