<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: BasilSkyWalk</title>
    <description>The latest articles on DEV Community by BasilSkyWalk (@basilskywalk).</description>
    <link>https://dev.to/basilskywalk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3962044%2Fa0375183-6f08-4f00-a742-2fdc3621e54e.png</url>
      <title>DEV Community: BasilSkyWalk</title>
      <link>https://dev.to/basilskywalk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/basilskywalk"/>
    <language>en</language>
    <item>
      <title>I A/B tested an MCP server that cut my Claude Code token cost</title>
      <dc:creator>BasilSkyWalk</dc:creator>
      <pubDate>Mon, 01 Jun 2026 05:54:23 +0000</pubDate>
      <link>https://dev.to/basilskywalk/i-ab-tested-an-mcp-server-that-cut-my-claude-code-token-cost-3egh</link>
      <guid>https://dev.to/basilskywalk/i-ab-tested-an-mcp-server-that-cut-my-claude-code-token-cost-3egh</guid>
      <description>&lt;p&gt;Most "I cut my token usage by X%" posts hand-wave the number. This one shows the method, the repos, and the case where the tool does basically nothing. I'd rather you trust the result than be impressed by it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem: agents read whole files to see three lines
&lt;/h2&gt;

&lt;p&gt;Watch a coding agent work on a large codebase and you'll see the same loop over and over:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;grep -rn "handlePayment" src/&lt;/code&gt; → a dozen &lt;code&gt;file:line&lt;/code&gt; hits.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Read&lt;/code&gt; four of those files &lt;strong&gt;in full&lt;/strong&gt; — hundreds of lines each — just to see the ~10 lines around each hit.&lt;/li&gt;
&lt;li&gt;Repeat for the next symbol.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each whole-file read is hundreds to thousands of tokens of context the model didn't need, and each one is another round trip. On a small repo it's invisible. On a real codebase it compounds into a slow, expensive session — and eventually a context window stuffed with files the agent only glanced at.&lt;/p&gt;

&lt;p&gt;The native tools aren't wrong; they're just &lt;em&gt;coarse&lt;/em&gt;. Grep finds lines, Read returns files, and the agent is left to staple them together at full token cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parecode: search that returns context, not files
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/BasilSkyWalk/parecode" rel="noopener noreferrer"&gt;Parecode&lt;/a&gt; is an MCP server with three tools that replace that loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ParecodeSearch&lt;/strong&gt; — ripgrep-backed search that returns &lt;em&gt;just the matched windows with surrounding context&lt;/em&gt; in a single call. It runs multiple patterns in parallel, merges overlapping windows, chunks per file so a big result set can't blow up your context, reports &lt;code&gt;estimatedTokens&lt;/code&gt; so the agent can self-budget, and lists the line ranges it omitted so nothing is silently dropped. Read-only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ParecodeExpand&lt;/strong&gt; — the natural follow-up: widen a specific &lt;code&gt;(file, startLine, endLine)&lt;/code&gt; range when the agent decides it needs more around one match. Beats a full-file &lt;code&gt;Read&lt;/code&gt; once you've located a line. Read-only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ParecodeEdit&lt;/strong&gt; — batched edits across many files in one call, with whitespace-tolerant fuzzy matching, pre/post conflict detection so a stale read can't silently clobber a file, and atomic same-directory rename writes. Cross-file edits run in parallel.&lt;/li&gt;
&lt;/ul&gt;
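
&lt;p&gt;To make "merges overlapping windows" and "lists the line ranges it omitted" concrete, here is a minimal TypeScript sketch of the idea. It is not parecode's implementation. The function names (&lt;code&gt;windowsForHits&lt;/code&gt;, &lt;code&gt;mergeWindows&lt;/code&gt;, &lt;code&gt;omittedRanges&lt;/code&gt;, &lt;code&gt;expandRange&lt;/code&gt;, &lt;code&gt;renderWindows&lt;/code&gt;) are hypothetical, hits are assumed to arrive as 1-based line numbers from a single file, and the token estimate uses a rough ~4-characters-per-token rule of thumb rather than a real tokenizer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical sketch of windowed search results; not parecode's real code.
type LineRange = { start: number; end: number }; // 1-based, inclusive line numbers

// One window per hit: `context` lines on each side, clamped to the file.
function windowsForHits(hits: number[], context: number, totalLines: number): LineRange[] {
  return hits.map((line) => ({
    start: Math.max(1, line - context),
    end: Math.min(totalLines, line + context),
  }));
}

// Sort by start, then fold overlapping or adjacent windows together
// so no line is ever returned twice.
function mergeWindows(windows: LineRange[]): LineRange[] {
  const sorted = [...windows].sort((a, b) => a.start - b.start);
  const merged: LineRange[] = [];
  for (const w of sorted) {
    const last = merged[merged.length - 1];
    if (last === undefined || w.start > last.end + 1) {
      merged.push({ ...w }); // gap before this window: start a new one
    } else {
      last.end = Math.max(last.end, w.end); // overlap or touch: extend the previous one
    }
  }
  return merged;
}

// Every range NOT covered by a window, so skipped lines are reported, not hidden.
function omittedRanges(merged: LineRange[], totalLines: number): LineRange[] {
  const gaps: LineRange[] = [];
  let next = 1; // first line not yet accounted for
  for (const w of merged) {
    if (w.start > next) gaps.push({ start: next, end: w.start - 1 });
    next = w.end + 1;
  }
  if (totalLines >= next) gaps.push({ start: next, end: totalLines });
  return gaps;
}

// Expand-style widening of one range by `by` lines each way, clamped to the file.
function expandRange(r: LineRange, by: number, totalLines: number): LineRange {
  return { start: Math.max(1, r.start - by), end: Math.min(totalLines, r.end + by) };
}

// Render windows with line numbers plus a rough token estimate.
// ASSUMPTION: ~4 characters per token is a rule of thumb, not a tokenizer.
function renderWindows(lines: string[], merged: LineRange[]) {
  const text = merged
    .map((w) => lines.slice(w.start - 1, w.end).map((l, i) => `${w.start + i}: ${l}`).join("\n"))
    .join("\n...\n");
  return { text, estimatedTokens: Math.ceil(text.length / 4) };
}

// Usage: hits on lines 10, 12 and 40 of a 50-line file, 3 lines of context each.
const fileLines = Array.from({ length: 50 }, (_, i) => `line ${i + 1}`);
const mergedWindows = mergeWindows(windowsForHits([10, 12, 40], 3, fileLines.length));
console.log(JSON.stringify(mergedWindows)); // [{"start":7,"end":15},{"start":37,"end":43}]
console.log(JSON.stringify(omittedRanges(mergedWindows, fileLines.length)));
// [{"start":1,"end":6},{"start":16,"end":36},{"start":44,"end":50}]
console.log(JSON.stringify(expandRange(mergedWindows[1], 10, fileLines.length))); // {"start":27,"end":50}
console.log(renderWindows(fileLines, mergedWindows).estimatedTokens);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Merging before rendering is what stops two nearby hits from printing the same lines twice, and returning the gaps as explicit ranges is what lets an agent see that lines were skipped rather than silently dropped. Widening a range is just a clamped subtract and add, which is why an expand call can stay so much smaller than re-reading the whole file.&lt;/p&gt;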

&lt;p&gt;So the grep-then-read-four-files dance becomes one &lt;code&gt;ParecodeSearch&lt;/code&gt; call that hands back the relevant slices, plus a &lt;code&gt;ParecodeExpand&lt;/code&gt; call only when the agent actually wants more. Fewer tokens &lt;em&gt;and&lt;/em&gt; fewer turns, because the context arrives in one response instead of five.&lt;/p&gt;

&lt;h2&gt;
  
  
  The benchmark
&lt;/h2&gt;

&lt;p&gt;I ran a matched A/B test instead of eyeballing it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; Claude Sonnet 4.6&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs:&lt;/strong&gt; n = 3 per arm, &lt;strong&gt;order alternated&lt;/strong&gt; to cancel warm-cache and ordering effects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sessions:&lt;/strong&gt; fresh each run — no carryover context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conditions:&lt;/strong&gt; the identical task with parecode on vs. off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks:&lt;/strong&gt; search-and-edit work — find every call site of a symbol and edit each — on two real codebases&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repo&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Cost (on vs. off)&lt;/th&gt;
&lt;th&gt;Assistant turns (on vs. off)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;17 sites, 8 files&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−43%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−83%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unity / C#&lt;/td&gt;
&lt;td&gt;11 sites, 5 files&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−41%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−76%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Across both repos: &lt;strong&gt;41–43% lower cost&lt;/strong&gt; and &lt;strong&gt;76–83% fewer assistant turns&lt;/strong&gt;. The savings come from collapsing many Grep/Read/Edit round trips into single ParecodeSearch/ParecodeEdit calls, so the win scales with how much searching and multi-file fan-out a task has.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it does &lt;em&gt;not&lt;/em&gt; help
&lt;/h2&gt;

&lt;p&gt;If the whole task lives in one file you already have open, parecode's savings shrink toward &lt;strong&gt;zero&lt;/strong&gt; — there's no grep-then-read loop to collapse, so there's nothing to win. Same for reasoning-heavy tasks that aren't really about navigation. It earns its keep on multi-file work across a codebase the agent doesn't have memorized. I'd rather tell you that than have you install it for the wrong job and feel cheated.&lt;/p&gt;

&lt;p&gt;One more sharp edge: the edit tool's atomicity is &lt;strong&gt;per file, not cross-file&lt;/strong&gt; — one file in a batch can fail while the others apply, by design. Know that before you lean on it for a sweeping refactor.&lt;/p&gt;
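
&lt;p&gt;Here is a minimal TypeScript sketch of what "per file, not cross-file" means in practice. It covers the pattern the ParecodeEdit bullet describes: whitespace-tolerant line matching, a content-hash check when the edit starts and again right before the swap, a write to a temp file in the same directory followed by a rename, and a batch run through &lt;code&gt;Promise.allSettled&lt;/code&gt; so each file succeeds or fails on its own. It is an illustration under assumptions, not parecode's code. &lt;code&gt;FileEdit&lt;/code&gt;, &lt;code&gt;applyEdit&lt;/code&gt;, &lt;code&gt;applyBatch&lt;/code&gt; and the &lt;code&gt;expectedHash&lt;/code&gt; field are hypothetical names. It also assumes Node 20+ and a POSIX filesystem, where a same-directory rename replaces the target atomically.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { promises as fs } from "node:fs";
import { createHash } from "node:crypto";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical shape of one file's edit; not parecode's real API.
type FileEdit = { file: string; expectedHash: string; oldText: string; newText: string };

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Whitespace-tolerant comparison: trim and collapse runs of spaces/tabs.
const norm = (line: string) => line.trim().replace(/\s+/g, " ");

// Index of the first run of file lines matching oldLines after normalizing, or -1.
function findFuzzy(fileLines: string[], oldLines: string[]): number {
  const target = oldLines.map(norm);
  for (let i = 0; fileLines.length - target.length >= i; i++) {
    if (target.every((t, j) => norm(fileLines[i + j]) === t)) return i;
  }
  return -1;
}

async function applyEdit(e: FileEdit) {
  const before = await fs.readFile(e.file, "utf8");
  // Pre-check: refuse to edit if the file changed since the agent read it.
  if (sha256(before) !== e.expectedHash) throw new Error(`${e.file}: stale read`);
  const lines = before.split("\n");
  const oldLines = e.oldText.split("\n");
  const at = findFuzzy(lines, oldLines);
  if (at === -1) throw new Error(`${e.file}: oldText not found`);
  lines.splice(at, oldLines.length, ...e.newText.split("\n"));
  // Temp file in the SAME directory, so the rename below stays on one filesystem.
  const tmp = path.join(path.dirname(e.file), `.${path.basename(e.file)}.${process.pid}.tmp`);
  await fs.writeFile(tmp, lines.join("\n"), "utf8");
  // Post-check right before the swap; this narrows the race window but cannot close it.
  if (sha256(await fs.readFile(e.file, "utf8")) !== e.expectedHash) {
    await fs.unlink(tmp);
    throw new Error(`${e.file}: changed during edit`);
  }
  await fs.rename(tmp, e.file); // readers see the old file or the new one, never a mix
}

// Files run in parallel and settle independently: a failure in one file
// does not roll back the others (per-file, not cross-file, atomicity).
async function applyBatch(edits: FileEdit[]) {
  const results = await Promise.allSettled(edits.map((e) => applyEdit(e)));
  return results.map((r, i) => ({ file: edits[i].file, ok: r.status === "fulfilled" }));
}

async function demo() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "edit-demo-"));
  const a = path.join(dir, "a.ts");
  const b = path.join(dir, "b.ts");
  await fs.writeFile(a, "function f() {\n    return  1;\n}\n");
  await fs.writeFile(b, "const x = 1;\n");
  const report = await applyBatch([
    // Spacing differs from the file ("return  1;"), but the fuzzy match still finds it.
    { file: a, expectedHash: sha256(await fs.readFile(a, "utf8")), oldText: "return 1;", newText: "  return 2;" },
    // Deliberately stale hash: this edit fails, and b.ts is left untouched.
    { file: b, expectedHash: "stale", oldText: "const x = 1;", newText: "const x = 2;" },
  ]);
  const contents = [await fs.readFile(a, "utf8"), await fs.readFile(b, "utf8")];
  await fs.rm(dir, { recursive: true, force: true });
  return { ok: report.map((r) => r.ok), contents };
}

demo().then((r) => console.log(JSON.stringify(r.ok))); // [true,false]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The batch report is the part to act on: after a partial failure, the files marked &lt;code&gt;ok&lt;/code&gt; are already rewritten and the others are byte-for-byte unchanged, so the caller has to re-read and retry the failed ones rather than assume all-or-nothing.&lt;/p&gt;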

&lt;h2&gt;
  
  
  Using it with Claude Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; parecode   &lt;span class="c"&gt;# needs Node 20+ and ripgrep on your PATH&lt;/span&gt;
parecode init             &lt;span class="c"&gt;# registers the MCP server, a SessionStart hook, and the explore plugin&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One honest detail: &lt;code&gt;init&lt;/code&gt; installs a SessionStart hook that nudges the agent to prefer &lt;code&gt;ParecodeSearch&lt;/code&gt; / &lt;code&gt;ParecodeEdit&lt;/code&gt; over the native &lt;code&gt;Grep&lt;/code&gt; / &lt;code&gt;Read&lt;/code&gt; / &lt;code&gt;Edit&lt;/code&gt;. Without that nudge, the first-party tools win by default and the savings never land. There's also a bundled read-only "explore" subagent pinned to a cheaper model, so discovery passes ("where is X?", "find all usages of Y") run in a cheap, isolated context instead of your main session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Boring on purpose: privacy
&lt;/h2&gt;

&lt;p&gt;Parecode makes &lt;strong&gt;no network calls at runtime&lt;/strong&gt; and ships zero telemetry — nothing about your code or your queries leaves your machine. Session logs go to your OS data directory with &lt;code&gt;0600&lt;/code&gt; permissions; prune or wipe them whenever. For a tool that sits in the middle of your codebase, that's the only acceptable default.&lt;/p&gt;
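
&lt;p&gt;If you want to see what "0600 in the OS data directory" amounts to, here is a minimal TypeScript sketch of that kind of logging. It is an illustration, not parecode's code. &lt;code&gt;dataDir&lt;/code&gt;, &lt;code&gt;appendLog&lt;/code&gt; and the &lt;code&gt;example-app&lt;/code&gt; name are hypothetical. The directory choices follow common per-OS conventions: &lt;code&gt;APPDATA&lt;/code&gt; on Windows, &lt;code&gt;~/Library/Application Support&lt;/code&gt; on macOS, and &lt;code&gt;XDG_DATA_HOME&lt;/code&gt; or &lt;code&gt;~/.local/share&lt;/code&gt; elsewhere. The permission bits only carry this meaning on POSIX systems.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical per-user data directory, following common OS conventions.
// The app name and layout are illustrative, not parecode's real paths.
function dataDir(app: string): string {
  const home = os.homedir();
  if (process.platform === "win32") {
    return path.join(process.env.APPDATA || path.join(home, "AppData", "Roaming"), app);
  }
  if (process.platform === "darwin") {
    return path.join(home, "Library", "Application Support", app);
  }
  // XDG Base Directory spec: use ~/.local/share when XDG_DATA_HOME is unset or empty.
  return path.join(process.env.XDG_DATA_HOME || path.join(home, ".local", "share"), app);
}

// Append one JSON line to a log only the owner can read or write.
// `mode` applies when the file is created (minus the umask); chmod enforces it
// on a file that already existed. POSIX only: Windows largely ignores these bits.
async function appendLog(file: string, entry: object) {
  await fs.mkdir(path.dirname(file), { recursive: true, mode: 0o700 });
  await fs.appendFile(file, JSON.stringify(entry) + "\n", { mode: 0o600 });
  await fs.chmod(file, 0o600);
}

async function main() {
  console.log("would log under:", dataDir("example-app"));
  // The demo writes to a throwaway temp directory, not the real data directory.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "log-demo-"));
  const file = path.join(dir, "sessions", "session.jsonl");
  await appendLog(file, { event: "example" });
  const perms = ((await fs.stat(file)).mode % 0o1000).toString(8); // permission bits only
  await fs.rm(dir, { recursive: true, force: true });
  return perms;
}

main().then((perms) => console.log("log file mode:", perms)); // "600" on Linux/macOS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;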

&lt;h2&gt;
  
  
  Try it / tear it apart
&lt;/h2&gt;

&lt;p&gt;It's MIT-licensed, written in TypeScript, and listed on &lt;a href="https://glama.ai/mcp/servers" rel="noopener noreferrer"&gt;Glama&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/BasilSkyWalk/parecode" rel="noopener noreferrer"&gt;https://github.com/BasilSkyWalk/parecode&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;npm install -g parecode&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want a low-commitment gut check first, &lt;code&gt;parecode stats --retroactive&lt;/code&gt; estimates how much it would have saved across your &lt;em&gt;past&lt;/em&gt; Claude Code sessions (estimated, not measured — but a fair signal for your workflow).&lt;/p&gt;

&lt;p&gt;If you run it for real, I'd like to hear what numbers you get — especially the cases where it &lt;em&gt;doesn't&lt;/em&gt; help, since that's where the next version gets better. Issues and PRs welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
