I A/B tested an MCP server that cut my Claude Code token cost


Most "I cut my token usage by X%" posts hand-wave the number. This one shows the method, the repos, and the case where the tool does basically nothing. I'd rather you trust the result than be impressed by it.

The problem: agents read whole files to see three lines

Watch a coding agent work on a large codebase and you'll see the same loop over and over:

1. grep -rn "handlePayment" src/ → a dozen file:line hits.
2. Read four of those files in full, hundreds of lines each, just to see the ~10 lines around each hit.
3. Repeat for the next symbol.

Each whole-file read is hundreds to thousands of tokens of context the model didn't need, and each one is another round trip. On a small repo it's invisible. On a real codebase it compounds into a slow, expensive session — and eventually a context window stuffed with files the agent only glanced at.

The native tools aren't wrong; they're just coarse. Grep finds lines, Read returns files, and the agent is left to staple them together at full token cost.
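To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes about 4 characters per token (a common rule of thumb, not a tokenizer measurement), and the file sizes are hypothetical, picked only to match the "four files, hundreds of lines each, ~10 lines needed" shape described above.

```typescript
// Rough heuristic: ~4 characters per token. An assumption, not a real tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(chars: number): number {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Hypothetical session: 4 files of 400 lines each, ~40 characters per line,
// where the agent only needed ~10 lines around one hit per file.
const filesRead = 4;
const linesPerFile = 400;
const charsPerLine = 40;
const linesNeededPerHit = 10;

// Cost of reading every file in full vs. reading only the relevant windows.
const wholeFileTokens = estimateTokens(filesRead * linesPerFile * charsPerLine);
const windowTokens = estimateTokens(filesRead * linesNeededPerHit * charsPerLine);

console.log({ wholeFileTokens, windowTokens }); // { wholeFileTokens: 16000, windowTokens: 400 }
```

The exact ratio depends entirely on the made-up sizes; the point is only that the gap grows with file length, not with the number of lines the agent actually needed.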

Parecode: search that returns context, not files

Parecode is an MCP server with three tools that replace that loop:

ParecodeSearch — ripgrep-backed search that returns just the matched windows with surrounding context in a single call. It runs multiple patterns in parallel, merges overlapping windows, chunks per file so a big result set can't blow up your context, reports estimatedTokens so the agent can self-budget, and lists the line ranges it omitted so nothing is silently dropped. Read-only.
ParecodeExpand — the natural follow-up: widen a specific (file, startLine, endLine) range when the agent decides it needs more around one match. Beats a full-file Read once you've located a line. Read-only.
ParecodeEdit — batched edits across many files in one call, with whitespace-tolerant fuzzy matching, pre/post conflict detection so a stale read can't silently clobber a file, and atomic same-directory rename writes. Cross-file edits run in parallel.
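To make "whitespace-tolerant matching" concrete, here is a minimal sketch of one way it could work. This is not Parecode's implementation: `replaceWhitespaceTolerant` and `handlePaymentV2` are hypothetical names, and the approach (treating every whitespace run in the search text as "any whitespace run" and refusing ambiguous matches) is an assumption.

```typescript
// Escape regex metacharacters so the snippet is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace `find` with `replacement` in `source`, letting any run of whitespace
// in `find` match any run of whitespace (spaces, tabs, newlines) in `source`.
// Returns null when there is no match or more than one, rather than guessing.
function replaceWhitespaceTolerant(
  source: string,
  find: string,
  replacement: string
): string | null {
  const tokens = find
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map(escapeRegExp);
  if (tokens.length === 0) return null;
  const pattern = new RegExp(tokens.join("\\s+"), "g");

  const hits: Array<{ start: number; end: number }> = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    hits.push({ start: m.index, end: m.index + m[0].length });
  }
  if (hits.length !== 1) return null;

  const hit = hits[0];
  return source.slice(0, hit.start) + replacement + source.slice(hit.end);
}

// The file uses a tab and three spaces; the edit request uses single spaces.
const source = "function pay() {\n\treturn   handlePayment(order);\n}\n";
const edited = replaceWhitespaceTolerant(
  source,
  "return handlePayment(order);",
  "return handlePaymentV2(order);"
);
```

Note the limit of this sketch: it only tolerates differences where the search text already has whitespace, so `foo( a )` would not match `foo(a)`.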

So the grep-then-read-four-files dance becomes one ParecodeSearch call that hands back the relevant slices — and ParecodeExpand only when the agent actually wants more. Fewer tokens and fewer turns, because the context arrives in one response instead of five.
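The window merging and "omitted ranges" ideas behind ParecodeSearch can be sketched in a few lines. This is an illustration under assumptions, not the tool's code: `buildWindows`, `omittedRanges`, and the `LineRange` shape are hypothetical names, and the fixed context size is invented for the example.

```typescript
// 1-based, inclusive line range within a single file.
interface LineRange {
  startLine: number;
  endLine: number;
}

// Turn match line numbers into context windows, merging windows that overlap or touch.
function buildWindows(matchLines: number[], context: number, totalLines: number): LineRange[] {
  const sorted = matchLines.slice().sort((a, b) => a - b);
  const windows: LineRange[] = [];
  for (const line of sorted) {
    const startLine = Math.max(1, line - context);
    const endLine = Math.min(totalLines, line + context);
    const last = windows[windows.length - 1];
    if (last && startLine <= last.endLine + 1) {
      last.endLine = Math.max(last.endLine, endLine); // extend the previous window
    } else {
      windows.push({ startLine, endLine });
    }
  }
  return windows;
}

// Everything not covered by a window, so the caller can see what was left out.
function omittedRanges(windows: LineRange[], totalLines: number): LineRange[] {
  const omitted: LineRange[] = [];
  let next = 1;
  for (const w of windows) {
    if (w.startLine > next) omitted.push({ startLine: next, endLine: w.startLine - 1 });
    next = w.endLine + 1;
  }
  if (next <= totalLines) omitted.push({ startLine: next, endLine: totalLines });
  return omitted;
}

// Hits on lines 12, 15 and 80 of a 100-line file, with 3 lines of context each side.
const windows = buildWindows([12, 15, 80], 3, 100);
const omitted = omittedRanges(windows, 100);
console.log(windows); // [ { startLine: 9, endLine: 18 }, { startLine: 77, endLine: 83 } ]
console.log(omitted); // lines 1-8, 19-76 and 84-100
```

The two nearby hits collapse into one window of 10 lines, and the agent is told explicitly which 83 lines it has not seen, which is what makes a later ParecodeExpand call a deliberate choice rather than a guess.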

The benchmark

I ran a matched A/B test instead of eyeballing it:

Model: Claude Sonnet 4.6
Runs: n = 3 per arm, order alternated to cancel warm-cache and ordering effects
Sessions: fresh each run — no carryover context
Conditions: the identical task with Parecode on vs. off
Tasks: search-and-edit work — find every call site of a symbol and edit each — on two real codebases

Repo	Task	Cost change	Turns change
TypeScript	17 sites, 8 files	−43%	−83%
Unity / C#	11 sites, 5 files	−41%	−76%

Across both: roughly 41–43% lower cost and 76–83% fewer assistant turns. The savings come from collapsing many Grep/Read/Edit round trips into single ParecodeSearch/ParecodeEdit calls, so the win scales with how much searching and multi-file fan-out a task has.
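For anyone who wants to reproduce the setup, here is a small sketch of the two mechanical parts: alternating the run order and computing the relative change between arms. The exact alternation scheme is an assumption (the post only says the order was alternated), the function names are hypothetical, and the per-run numbers are made up for illustration. They are not the measurements behind the table.

```typescript
type Arm = "parecode-on" | "parecode-off";

// Flip which arm goes first on every pair of runs, so neither arm
// consistently benefits from a warm cache or from running second.
function alternatingSchedule(runsPerArm: number): Arm[] {
  const schedule: Arm[] = [];
  for (let i = 0; i < runsPerArm; i++) {
    if (i % 2 === 0) schedule.push("parecode-on", "parecode-off");
    else schedule.push("parecode-off", "parecode-on");
  }
  return schedule;
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Relative change of the treatment arm against the baseline arm, in percent.
// Negative means the treatment arm was cheaper (or used fewer turns).
function percentChange(baseline: number[], treatment: number[]): number {
  const base = mean(baseline);
  return ((mean(treatment) - base) / base) * 100;
}

console.log(alternatingSchedule(3));

// Toy per-run values (arbitrary units), NOT the data from this post.
const nativeRuns = [10, 12, 14];
const parecodeRuns = [6, 7, 8];
console.log(Math.round(percentChange(nativeRuns, parecodeRuns)) + "%"); // -42%
```

With n = 3 per arm the mean is all you can reasonably report, which is why the table shows a single relative change per repo rather than a confidence interval.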

Where it does not help

If the whole task lives in one file you already have open, Parecode's savings shrink toward zero: there's no grep-then-read loop to collapse, so there's nothing to win. The same goes for reasoning-heavy tasks that aren't really about navigation. It earns its keep on multi-file work across a codebase the agent doesn't have memorized. I'd rather tell you that than have you install it for the wrong job and feel cheated.

One more sharp edge: the edit tool's atomicity is per file, not cross-file — one file in a batch can fail while the others apply, by design. Know that before you lean on it for a sweeping refactor.
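Here is what "atomic per file, not across files" looks like in practice. This is a simplified sketch under assumptions, not ParecodeEdit's code: it runs sequentially rather than in parallel, does only a pre-write staleness check (a content hash, which is my choice of mechanism), and every name in it (`FileEdit`, `applyEdit`, `applyBatch`) is hypothetical.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface FileEdit {
  filePath: string;
  expectedHash: string; // hash of the content the agent last read
  newContent: string;
}

function sha256(text: string): string {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// Apply one edit: refuse if the file changed since it was read, then write a
// temp file in the same directory and rename it over the target. On POSIX
// systems a rename within one filesystem replaces the file in a single step.
function applyEdit(edit: FileEdit): void {
  const current = fs.readFileSync(edit.filePath, "utf8");
  if (sha256(current) !== edit.expectedHash) {
    throw new Error(`stale read: ${edit.filePath} changed since it was read`);
  }
  const tmpName = `.${path.basename(edit.filePath)}.${process.pid}.tmp`;
  const tmpPath = path.join(path.dirname(edit.filePath), tmpName);
  fs.writeFileSync(tmpPath, edit.newContent);
  fs.renameSync(tmpPath, edit.filePath);
}

// Each file succeeds or fails on its own; there is no cross-file rollback.
function applyBatch(edits: FileEdit[]): { applied: string[]; failed: string[] } {
  const applied: string[] = [];
  const failed: string[] = [];
  for (const edit of edits) {
    try {
      applyEdit(edit);
      applied.push(edit.filePath);
    } catch {
      failed.push(edit.filePath);
    }
  }
  return { applied, failed };
}

// Demo in a throwaway directory: the second edit carries a stale hash.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "edit-sketch-"));
const fileA = path.join(dir, "a.ts");
const fileB = path.join(dir, "b.ts");
fs.writeFileSync(fileA, "oldA");
fs.writeFileSync(fileB, "oldB");

const result = applyBatch([
  { filePath: fileA, expectedHash: sha256("oldA"), newContent: "newA" },
  { filePath: fileB, expectedHash: sha256("not what is on disk"), newContent: "newB" },
]);
console.log(result); // a.ts applied, b.ts failed and left untouched
```

After a batch like this, the codebase is in a mixed state: some call sites updated, some not. For a sweeping refactor, check the list of failed files and re-run those before trusting the build.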

Using it with Claude Code

npm install -g parecode   # needs Node 20+ and ripgrep on your PATH
parecode init             # registers the MCP server, a SessionStart hook, and the explore plugin

One honest detail: init installs a SessionStart hook that nudges the agent to prefer ParecodeSearch / ParecodeEdit over the native Grep / Read / Edit. Without that nudge, the first-party tools win by default and the savings never land. There's also a bundled read-only "explore" subagent pinned to a cheaper model, so discovery passes ("where is X?", "find all usages of Y") run in a cheap, isolated context instead of your main session.

Boring on purpose: privacy

Parecode makes no network calls at runtime and ships zero telemetry — nothing about your code or your queries leaves your machine. Session logs go to your OS data directory with 0600 permissions; prune or wipe them whenever. For a tool that sits in the middle of your codebase, that's the only acceptable default.

Try it / tear it apart

It's MIT-licensed, written in TypeScript, and listed on Glama.

Repo: https://github.com/BasilSkyWalk/parecode
Install: npm install -g parecode

If you want a low-commitment gut check first, parecode stats --retroactive estimates how much it would have saved across your past Claude Code sessions (estimated, not measured — but a fair signal for your workflow).

If you run it for real, I'd like to hear what numbers you get — especially the cases where it doesn't help, since that's where the next version gets better. Issues and PRs welcome.