<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: J. Gravelle</title>
    <description>The latest articles on DEV Community by J. Gravelle (@jgravelle).</description>
    <link>https://dev.to/jgravelle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3815404%2F5c973ed1-5fef-4135-b089-56d7dbd03d4d.jpg</url>
      <title>DEV Community: J. Gravelle</title>
      <link>https://dev.to/jgravelle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jgravelle"/>
    <language>en</language>
    <item>
      <title>A Radical Diet for Karpathy’s Token-Eating LLM Wiki</title>
      <dc:creator>J. Gravelle</dc:creator>
      <pubDate>Sun, 12 Apr 2026 12:57:32 +0000</pubDate>
      <link>https://dev.to/jgravelle/a-radical-diet-for-karpathys-token-eating-llm-wiki-59ng</link>
      <guid>https://dev.to/jgravelle/a-radical-diet-for-karpathys-token-eating-llm-wiki-59ng</guid>
      <description>&lt;h2&gt;
  
  
  By Friday, Your Token Bill Looks Like a Phone Number
&lt;/h2&gt;

&lt;p&gt;You did everything right.&lt;/p&gt;

&lt;p&gt;You read Karpathy’s post. It clicked immediately. Not because it was simple, but because you’ve lived the pain: spending the first 20 minutes of every session re-teaching a model what you already taught it yesterday.&lt;/p&gt;

&lt;p&gt;The LLM Wiki idea felt like a jailbreak.&lt;/p&gt;

&lt;p&gt;Compiled knowledge.&lt;br&gt;
Persistent artifacts.&lt;br&gt;
A second brain that compounds instead of resets.&lt;/p&gt;

&lt;p&gt;So you built it.&lt;/p&gt;

&lt;p&gt;You stood up the structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;raw/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wiki/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;index.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;log.md&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You wrote a schema. You defined ingest. You defined lint. You fed it sources.&lt;/p&gt;

&lt;p&gt;And it worked.&lt;/p&gt;

&lt;p&gt;The model moved through your wiki like it had memory.&lt;br&gt;
Cross-references held. Synthesis stuck.&lt;br&gt;
For the first time in a while, you weren’t rebuilding context—you were building on top of it.&lt;/p&gt;

&lt;p&gt;For a moment, you were ahead.&lt;/p&gt;

&lt;p&gt;Then the wiki grew.&lt;/p&gt;

&lt;p&gt;No alarms. No failure message. Just drag.&lt;/p&gt;

&lt;p&gt;Queries got heavier.&lt;br&gt;
Answers got softer.&lt;br&gt;
The model started missing things you &lt;em&gt;knew&lt;/em&gt; were there.&lt;/p&gt;

&lt;p&gt;You linted it. Structurally clean.&lt;/p&gt;

&lt;p&gt;But something had shifted.&lt;/p&gt;

&lt;p&gt;The context window was getting crowded, and &lt;code&gt;index.md&lt;/code&gt; was quietly becoming the bottleneck.&lt;/p&gt;

&lt;p&gt;Here’s the part nobody says out loud in the “RAG is dead” takes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The LLM Wiki doesn’t eliminate token cost. It moves it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You traded:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-query retrieval cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-session compilation cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a fantastic trade…&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;until the wiki outgrows the window.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Shape of the Problem (Before You Even See the Numbers)
&lt;/h2&gt;

&lt;p&gt;You don’t need a benchmark to understand what’s happening.&lt;/p&gt;

&lt;p&gt;You just need to see the curve.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvyuuhdk13skwxbud43z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvyuuhdk13skwxbud43z.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One approach scales with size.&lt;br&gt;
The other scales with the answer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s the entire story.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Karpathy Actually Proposed (And Why It Hit So Hard)
&lt;/h2&gt;

&lt;p&gt;Most takes flattened this into “RAG killer.”&lt;/p&gt;

&lt;p&gt;That’s not what it is.&lt;/p&gt;

&lt;p&gt;RAG is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stateless&lt;/li&gt;
&lt;li&gt;query-time&lt;/li&gt;
&lt;li&gt;recompute everything, every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM Wiki flips that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;do the expensive thinking once (ingest)&lt;/li&gt;
&lt;li&gt;resolve contradictions early&lt;/li&gt;
&lt;li&gt;build cross-references up front&lt;/li&gt;
&lt;li&gt;store the result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then query the artifact.&lt;/p&gt;

&lt;p&gt;That’s not retrieval.&lt;/p&gt;

&lt;p&gt;That’s compilation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The wiki is a &lt;strong&gt;persistent, compounding artifact&lt;/strong&gt;. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that idea landed because it mirrors how developers already think.&lt;/p&gt;

&lt;p&gt;You don’t recompile your entire codebase on every function call.&lt;/p&gt;

&lt;p&gt;You compile once.&lt;br&gt;
You run cheap.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Architecture (Why This Works at All)
&lt;/h2&gt;

&lt;p&gt;Three layers:&lt;/p&gt;
&lt;h3&gt;
  
  
  Raw Sources
&lt;/h3&gt;

&lt;p&gt;Immutable truth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDFs&lt;/li&gt;
&lt;li&gt;repos&lt;/li&gt;
&lt;li&gt;transcripts&lt;/li&gt;
&lt;li&gt;research&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Wiki
&lt;/h3&gt;

&lt;p&gt;LLM-owned markdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;synthesized&lt;/li&gt;
&lt;li&gt;structured&lt;/li&gt;
&lt;li&gt;cross-linked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what you paid to build.&lt;/p&gt;
&lt;h3&gt;
  
  
  Schema
&lt;/h3&gt;

&lt;p&gt;The discipline layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingest rules&lt;/li&gt;
&lt;li&gt;page structure&lt;/li&gt;
&lt;li&gt;lint behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without it, you don’t have a system.&lt;br&gt;
You have a pile of files.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Three Operations That Matter
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Ingest
&lt;/h3&gt;

&lt;p&gt;New source → updates multiple pages (often 10–15)&lt;/p&gt;

&lt;p&gt;Expensive? Yes.&lt;br&gt;
Correct? Also yes.&lt;/p&gt;

&lt;p&gt;You’re building connections up front instead of rediscovering them forever.&lt;/p&gt;


&lt;h3&gt;
  
  
  Query
&lt;/h3&gt;

&lt;p&gt;Ask → route through wiki → optionally write back&lt;/p&gt;

&lt;p&gt;Every query improves the system.&lt;/p&gt;


&lt;h3&gt;
  
  
  Lint
&lt;/h3&gt;

&lt;p&gt;Detect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orphan pages&lt;/li&gt;
&lt;li&gt;stale knowledge&lt;/li&gt;
&lt;li&gt;contradictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Karpathy’s observation holds:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The tedious part is the bookkeeping.” &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;LLMs are extremely good at bookkeeping.&lt;/p&gt;
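Orphan detection, at least, is cheap enough to mechanize outside the model entirely. A minimal sketch, assuming pages are flat `.md` files cross-referenced with standard markdown links (the function name and layout here are illustrative, not part of Karpathy's proposal or any particular tool):

```python
import re
from pathlib import Path

def find_orphans(wiki_dir):
    """Report wiki pages that no other page links to.

    Assumes pages are .md files in one directory and that
    cross-references use markdown links like [title](page.md).
    """
    pages = {p.name for p in Path(wiki_dir).glob("*.md")}
    linked = set()
    for page in Path(wiki_dir).glob("*.md"):
        text = page.read_text(encoding="utf-8")
        for target in re.findall(r"\]\(([^)]+\.md)\)", text):
            linked.add(Path(target).name)
    # index.md is the entry point, so it is never an orphan
    return sorted(pages - linked - {"index.md"})
```

Stale-knowledge and contradiction checks still need a model; link-graph bookkeeping doesn't.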


&lt;h2&gt;
  
  
  Where It Starts to Crack
&lt;/h2&gt;

&lt;p&gt;The failure mode isn’t theoretical. It’s structural.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. &lt;code&gt;index.md&lt;/code&gt; Becomes a Liability
&lt;/h3&gt;

&lt;p&gt;Navigation assumes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;load the index → navigate&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That works until it doesn’t.&lt;/p&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~50K–100K tokens → starts breaking&lt;/li&gt;
&lt;li&gt;beyond that → unreliable navigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;truncate it (lose coverage), or&lt;/li&gt;
&lt;li&gt;load it (lose quality)&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  2. Long Context Isn’t Actually Long
&lt;/h3&gt;

&lt;p&gt;Marketing says million-token windows.&lt;/p&gt;

&lt;p&gt;Reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~200K–300K → quality degradation begins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missed links&lt;/li&gt;
&lt;li&gt;weaker synthesis&lt;/li&gt;
&lt;li&gt;subtle drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing crashes. It just gets worse.&lt;/p&gt;


&lt;h3&gt;
  
  
  3. Maintenance Cost Compounds Too
&lt;/h3&gt;

&lt;p&gt;Each ingest touches multiple pages.&lt;/p&gt;

&lt;p&gt;That’s correct behavior.&lt;/p&gt;

&lt;p&gt;It also means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more tokens&lt;/li&gt;
&lt;li&gt;more updates&lt;/li&gt;
&lt;li&gt;more cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You built a compounding asset.&lt;/p&gt;

&lt;p&gt;You also built a compounding bill.&lt;/p&gt;


&lt;h3&gt;
  
  
  4. The Irony
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;You didn’t eliminate retrieval.&lt;br&gt;
You postponed it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At scale, your wiki becomes the corpus.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Numbers (This Is Where It Gets Real)
&lt;/h2&gt;

&lt;p&gt;From the jDocMunch Wiki benchmark:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Corpus: 7 pages, 7,449 tokens &lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Results -- Full Wiki Baseline (Realistic)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;jDocMunch&lt;/th&gt;
&lt;th&gt;Saved&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cross repository dependency tracking&lt;/td&gt;
&lt;td&gt;7,449&lt;/td&gt;
&lt;td&gt;599&lt;/td&gt;
&lt;td&gt;6,850&lt;/td&gt;
&lt;td&gt;92.0%&lt;/td&gt;
&lt;td&gt;12.4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;benchmark token reduction measurement&lt;/td&gt;
&lt;td&gt;7,449&lt;/td&gt;
&lt;td&gt;314&lt;/td&gt;
&lt;td&gt;7,135&lt;/td&gt;
&lt;td&gt;95.8%&lt;/td&gt;
&lt;td&gt;23.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;search scoring ranking debug&lt;/td&gt;
&lt;td&gt;7,449&lt;/td&gt;
&lt;td&gt;344&lt;/td&gt;
&lt;td&gt;7,105&lt;/td&gt;
&lt;td&gt;95.4%&lt;/td&gt;
&lt;td&gt;21.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;incremental indexing blob SHA performance&lt;/td&gt;
&lt;td&gt;7,449&lt;/td&gt;
&lt;td&gt;313&lt;/td&gt;
&lt;td&gt;7,136&lt;/td&gt;
&lt;td&gt;95.8%&lt;/td&gt;
&lt;td&gt;23.8x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;context bundle symbol imports&lt;/td&gt;
&lt;td&gt;7,449&lt;/td&gt;
&lt;td&gt;304&lt;/td&gt;
&lt;td&gt;7,145&lt;/td&gt;
&lt;td&gt;95.9%&lt;/td&gt;
&lt;td&gt;24.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (5 queries)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;37,245&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,874&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;35,371&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;95.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19.9x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Results -- Single File Baseline (Conservative)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;jDocMunch&lt;/th&gt;
&lt;th&gt;Saved&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cross repository dependency tracking&lt;/td&gt;
&lt;td&gt;1,700&lt;/td&gt;
&lt;td&gt;599&lt;/td&gt;
&lt;td&gt;1,101&lt;/td&gt;
&lt;td&gt;64.8%&lt;/td&gt;
&lt;td&gt;2.8x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;benchmark token reduction measurement&lt;/td&gt;
&lt;td&gt;1,022&lt;/td&gt;
&lt;td&gt;314&lt;/td&gt;
&lt;td&gt;708&lt;/td&gt;
&lt;td&gt;69.3%&lt;/td&gt;
&lt;td&gt;3.3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;search scoring ranking debug&lt;/td&gt;
&lt;td&gt;899&lt;/td&gt;
&lt;td&gt;344&lt;/td&gt;
&lt;td&gt;555&lt;/td&gt;
&lt;td&gt;61.7%&lt;/td&gt;
&lt;td&gt;2.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;incremental indexing blob SHA performance&lt;/td&gt;
&lt;td&gt;812&lt;/td&gt;
&lt;td&gt;313&lt;/td&gt;
&lt;td&gt;499&lt;/td&gt;
&lt;td&gt;61.5%&lt;/td&gt;
&lt;td&gt;2.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;context bundle symbol imports&lt;/td&gt;
&lt;td&gt;914&lt;/td&gt;
&lt;td&gt;304&lt;/td&gt;
&lt;td&gt;610&lt;/td&gt;
&lt;td&gt;66.7%&lt;/td&gt;
&lt;td&gt;3.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (5 queries)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5,347&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,874&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3,473&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;65.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.9x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  Visual Reality
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Full Wiki: █████████████████████████████████████
jDocMunch: ██
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Same questions.&lt;br&gt;
Same corpus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;~95% less context.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Money Shot
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FULL WIKI:     37,245 tokens
jDocMunch:      1,874 tokens

SAVED:         35,371 tokens
REDUCTION:     95.0%
EFFICIENCY:    19.9×
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That’s not optimization.&lt;/p&gt;

&lt;p&gt;That’s a different class of system.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Happens (Not Magic, Just Mechanics)
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Baseline A (full wiki):    37,245 tokens across 5 queries
jDocMunch workflow:         1,874 tokens across 5 queries
                           ─────────────────────────────
Saved:                     35,371 tokens (95.0%)
Average ratio:             19.9x

Baseline B (target file):   5,347 tokens across 5 queries
jDocMunch workflow:         1,874 tokens across 5 queries
                           ─────────────────────────────
Saved:                      3,473 tokens (65.0%)
Average ratio:              2.9x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You’re not compressing data.&lt;/p&gt;

&lt;p&gt;You’re avoiding loading irrelevant data.&lt;/p&gt;

&lt;p&gt;That’s it.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Structural Shift
&lt;/h2&gt;

&lt;p&gt;Old world:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;cost ∝ wiki size&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;New world:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;cost ∝ answer complexity&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the entire advantage.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Fix: Stop Loading the Wiki
&lt;/h2&gt;

&lt;p&gt;The mistake isn’t your structure.&lt;/p&gt;

&lt;p&gt;It’s your access pattern.&lt;/p&gt;

&lt;p&gt;You’re treating the wiki like a document.&lt;/p&gt;

&lt;p&gt;It’s not.&lt;/p&gt;

&lt;p&gt;It’s a dataset.&lt;/p&gt;


&lt;h2&gt;
  
  
  Enter jDocMunch (Right Where You Need It)
&lt;/h2&gt;

&lt;p&gt;jDocMunch doesn’t replace the wiki.&lt;/p&gt;

&lt;p&gt;It fixes the exact place it breaks.&lt;/p&gt;


&lt;h2&gt;
  
  
  What It Actually Does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Parses docs into &lt;strong&gt;sections&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Stores &lt;strong&gt;byte offsets&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;search_sections&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_section&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two calls.&lt;/p&gt;

&lt;p&gt;No full loads.&lt;/p&gt;
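The mechanism is simple enough to sketch. This is not jDocMunch's actual implementation, just an illustration of the byte-offset idea with invented function names: index the headings once, then seek directly to the one section you need.

```python
import re

def index_sections(path):
    """Map each markdown heading to the byte range of its section."""
    data = open(path, "rb").read()
    heads = [(m.start(), m.group(1).decode("utf-8"))
             for m in re.finditer(rb"^#+ (.+)$", data, re.MULTILINE)]
    sections = {}
    for i, (start, title) in enumerate(heads):
        end = heads[i + 1][0] if i + 1 < len(heads) else len(data)
        sections[title] = (start, end)
    return sections

def get_section(path, sections, title):
    """Read exactly one section from disk -- never the whole file."""
    start, end = sections[title]
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start).decode("utf-8")
```

The index is built once per document; every subsequent read costs only the bytes of the answer.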


&lt;h2&gt;
  
  
  Why It Works
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Old&lt;/th&gt;
&lt;th&gt;New&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Load index into context&lt;/td&gt;
&lt;td&gt;Search index on disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Navigate in prompt&lt;/td&gt;
&lt;td&gt;Retrieve exact section&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost scales with size&lt;/td&gt;
&lt;td&gt;Cost scales with answer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Even the “Fair” Comparison Still Wins
&lt;/h2&gt;

&lt;p&gt;Conservative baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Single File:   5,347 tokens
jDocMunch:     1,874 tokens

Reduction:     65%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even when you handicap the comparison…&lt;/p&gt;

&lt;p&gt;It still wins.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three-Minute Retrofit
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;jdocmunch-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCP config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"jdocmunch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"jdocmunch-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Only Behavior Change That Matters
&lt;/h2&gt;

&lt;p&gt;Stop doing this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Read index.md”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start doing this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;search_sections&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_section&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The One-Line Mental Model
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Your wiki is not a document.&lt;br&gt;
It’s a database.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Stop loading it.&lt;br&gt;
Start querying it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Karpathy gave us something real.&lt;/p&gt;

&lt;p&gt;Not a tool.&lt;br&gt;
A pattern.&lt;/p&gt;

&lt;p&gt;And it’s a good one.&lt;/p&gt;

&lt;p&gt;But it has a ceiling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context doesn’t scale&lt;/li&gt;
&lt;li&gt;tokens don’t forgive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don’t add structured retrieval:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;your second brain becomes your biggest expense&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you do:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;you keep the compounding upside&lt;br&gt;
and kill the runaway token curve&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the difference between a clever idea…&lt;/p&gt;

&lt;p&gt;and a system that actually survives production.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>wiki</category>
      <category>karpathy</category>
      <category>tokens</category>
    </item>
    <item>
      <title>Symbols Not Chunks: 3.9x Less Tokens</title>
      <dc:creator>J. Gravelle</dc:creator>
      <pubDate>Sat, 28 Mar 2026 14:34:27 +0000</pubDate>
      <link>https://dev.to/jgravelle/symbols-not-chunks-39x-less-tokens-f6e</link>
      <guid>https://dev.to/jgravelle/symbols-not-chunks-39x-less-tokens-f6e</guid>
      <description>&lt;h3&gt;
  
  
  AST-Based Retrieval Cuts LLM Code Context 1.6–3.9x vs. LangChain RAG on Real Codebases
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;J. Gravelle&lt;br&gt;
March 2026&lt;/em&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Abstract
&lt;/h4&gt;

&lt;p&gt;Large language models (LLMs) consume tokens proportionally to the context they receive. When applied to code understanding tasks, the dominant retrieval strategy --- chunk-based Retrieval-Augmented Generation (RAG) using vector embeddings --- injects substantial irrelevant context, wastes tokens, and frequently delivers fragments that split functions mid-definition. This paper presents an alternative: AST-based symbol retrieval, which uses tree-sitter parsing to extract complete syntactic units (functions, classes, methods) and serves them via deterministic lookup. We benchmark both approaches on three open-source web frameworks (Express.js, FastAPI, Gin) totaling 1,214 files and 1,024,421 baseline tokens. In head-to-head comparison against a naive fixed-chunk RAG pipeline (LangChain + FAISS + MiniLM-L6-v2), AST retrieval uses &lt;strong&gt;1.6--3.9x fewer tokens per query&lt;/strong&gt; on every tested repository. Against a "read all files" lower-bound baseline, the reduction is 99.6% (263.9x). Two controlled A/B tests on a production Vue 3 codebase confirm 20% cost savings in end-to-end agentic workflows (p=0.0074), though with accuracy tradeoffs on fine-grained classification tasks. The structural advantage --- complete code units with no chunk boundary artifacts --- is orthogonal to the search mechanism and would apply equally to RAG pipelines that adopt symbol-level chunking. We argue that for code-specific retrieval, the retrieval unit should be the symbol, not the chunk.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;The integration of LLMs into software engineering workflows --- code generation, review, debugging, refactoring --- has accelerated rapidly. A common pattern has emerged: the agent needs to understand an unfamiliar codebase, so it retrieves relevant code and injects it into the model's context window.&lt;/p&gt;

&lt;p&gt;The standard approach borrows from document retrieval: split source files into overlapping text chunks, embed them with a dense model, store the vectors in an index (typically FAISS or Chroma), and retrieve the top-k most similar chunks at query time. This is the RAG pattern, popularized by frameworks like LangChain, LlamaIndex, and Haystack.&lt;/p&gt;

&lt;p&gt;RAG works reasonably well for prose documents, where any contiguous passage may contain relevant information. Code, however, has structure that prose does not. A function is a complete unit of meaning. Half a function is noise. The question this paper investigates is straightforward: &lt;em&gt;what happens when we replace arbitrary text chunks with complete syntactic units as the retrieval granularity?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer, across three repositories in three languages, is that token consumption drops by 1.6--3.9x compared to a naive fixed-chunk RAG pipeline, with no embedding model, no vector store, and no chunk boundary artifacts. We use "naive" deliberately: the RAG baseline tested here uses a general-purpose embedding model and fixed-size chunking, not code-specific embeddings or AST-aware splitting. The comparison is against a common starting point, not a fully optimized pipeline.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. Problem Statement
&lt;/h2&gt;
&lt;h3&gt;
  
  
  2.1 The Token Cost Problem
&lt;/h3&gt;

&lt;p&gt;LLM API pricing is per-token. Context window size is finite. Both constraints create pressure to minimize irrelevant context. Yet the standard RAG pipeline is structurally biased toward over-retrieval:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fixed chunk size forces a precision/recall tradeoff.&lt;/strong&gt; Small chunks (512 tokens) reduce per-result noise but split functions mid-definition. Large chunks (2048 tokens) preserve more structure but include unrelated code from adjacent definitions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Top-k retrieval returns a fixed number of results regardless of query specificity.&lt;/strong&gt; A query matching one function still returns k chunks, most of which are noise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The search-then-fetch pattern double-counts tokens.&lt;/strong&gt; A typical workflow retrieves k results for inspection, then "fetches" the top n for the LLM. The top n appear in both the search response and the fetch response, inflating the effective token count.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  2.2 The Chunk Boundary Problem
&lt;/h3&gt;

&lt;p&gt;Consider a Python file with three functions, each ~400 tokens. A 512-token chunker produces chunks that look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Chunk 1:  [end of function A] [start of function B ... truncated]
Chunk 2:  [... middle of function B ...] [start of function C]
Chunk 3:  [... end of function C] [module-level code]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An LLM receiving Chunk 1 gets the tail of one function and the head of another. It has no reliable way to determine where one definition ends and another begins. This is not a theoretical concern --- our measurements show that &lt;strong&gt;53% of retrieved RAG-512 chunks for FastAPI are split mid-function&lt;/strong&gt; (Section 7.3).&lt;/p&gt;
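The effect is easy to reproduce. A toy fixed-size chunker (character-based here for simplicity; real pipelines count tokens, and the handler names are invented) shows how often window edges land inside a definition:

```python
def fixed_chunks(text, size, overlap=0):
    """Split text into fixed-size windows, as a naive RAG splitter does."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# Five small, identically shaped functions in one "file".
source = "\n\n".join(
    f"def handler_{n}(req):\n" + "    process(req)\n" * 6
    for n in range(5)
)

chunks = fixed_chunks(source, size=120)
# Count chunks that open mid-function rather than at a definition boundary.
split = sum(1 for c in chunks if not c.lstrip().startswith("def "))
print(f"{split}/{len(chunks)} chunks start mid-function")
```

Because the window size and the function size never agree, most chunks begin somewhere inside a body, which is exactly the artifact the measurement in Section 7.3 quantifies.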

&lt;h3&gt;
  
  
  2.3 Scaling Behavior
&lt;/h3&gt;

&lt;p&gt;As codebases grow, the problem compounds. A 951-file repository like FastAPI produces 2,256 chunks at 512-token granularity. The embedding step alone takes 47 seconds. Query latency, while acceptable (12--36 ms), is orders of magnitude slower than an in-process BM25 lookup (&amp;lt;5 ms). The vector index occupies 7.5 MB on disk --- modest in absolute terms, but unnecessary if the retrieval unit can be derived from the source structure directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 RAG for Code
&lt;/h3&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) augments an LLM's fixed training knowledge with dynamically retrieved context. For code, the standard pipeline is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chunking.&lt;/strong&gt; Source files are split into fixed-size token windows (typically 256--2048 tokens) with overlap (5--15%) to mitigate boundary effects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding.&lt;/strong&gt; Each chunk is passed through a dense embedding model (e.g., &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;, &lt;code&gt;text-embedding-3-small&lt;/code&gt;) to produce a vector representation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexing.&lt;/strong&gt; Vectors are stored in an approximate nearest-neighbor index (FAISS, Chroma, Pinecone).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval.&lt;/strong&gt; At query time, the query is embedded and the top-k nearest chunks are returned.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pipeline was designed for prose documents and adapted for code. The adaptation is imperfect: code has syntactic structure (functions, classes, modules) that prose does not, and that structure is semantically meaningful.&lt;/p&gt;
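For readers who have not assembled this pipeline themselves, a toy version of steps 1–4 fits in a few lines. A bag-of-words cosine similarity stands in for the dense embedding model and a linear scan stands in for FAISS; the shape of the pipeline is the same even though the components are deliberately simplistic:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a dense embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text, size=40, overlap=5):
    """Step 1: fixed-size character windows with overlap."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def build_index(docs):
    """Steps 2-3: embed every chunk and store the vectors."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]

def retrieve(index, query, k=2):
    """Step 4: return the top-k nearest chunks to the query."""
    qv = embed(query)
    return sorted(index, key=lambda entry: -cosine(qv, entry[1]))[:k]
```

Note that `retrieve` always returns k results, however specific the query — the over-retrieval bias described in Section 2.1 is built into the interface.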

&lt;p&gt;&lt;strong&gt;Semantic and AST-aware chunking.&lt;/strong&gt; The RAG ecosystem has recognized the fixed-chunk limitation. LangChain, LlamaIndex, and other frameworks offer &lt;em&gt;semantic chunking&lt;/em&gt; (split at natural breakpoints detected by embedding similarity shifts) and &lt;em&gt;AST-aware chunking&lt;/em&gt; (split at function or class boundaries using a parser). AST-aware chunking in particular eliminates the chunk boundary problem described in Section 2.2. We did not benchmark these strategies --- doing so would require choosing among multiple implementations with different heuristics, and the comparison would conflate chunking strategy with embedding model quality. We note, however, that AST-aware chunking and AST symbol retrieval share the same core insight: the retrieval unit for code should align with syntactic boundaries. The remaining difference is the search mechanism (embedding similarity vs. BM25) and the retrieval interface (opaque chunks vs. structured symbol metadata). Section 8 discusses this distinction further.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Context Window Constraints
&lt;/h3&gt;

&lt;p&gt;Modern LLMs offer context windows ranging from 128K to 2M tokens. A naive approach --- load the entire codebase --- is feasible for small projects but fails quickly. A 951-file Python framework tokenizes to ~700K tokens. A production monorepo can easily exceed 10M tokens. Even where the window is large enough, longer contexts degrade attention quality, increase latency, and cost proportionally more.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 Tree-Sitter and AST Parsing
&lt;/h3&gt;

&lt;p&gt;Tree-sitter is an incremental parsing framework that produces concrete syntax trees for source code in ~40 languages. Unlike regex-based heuristics, tree-sitter parsing is grammar-driven: it identifies functions, classes, methods, type definitions, and other syntactic constructs with the same precision as the language's own compiler front-end. Parse time is typically sub-second for single files and under 15 seconds for a 951-file repository.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Approach: AST Symbol Retrieval
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Core Idea
&lt;/h3&gt;

&lt;p&gt;Instead of chunking source files into arbitrary token windows, parse them into their natural syntactic units: functions, classes, methods, type definitions. Index these &lt;strong&gt;symbols&lt;/strong&gt; by name, qualified name, and file path. At query time, search the symbol index (not a vector index) and return the complete source code of matched symbols.&lt;/p&gt;

&lt;p&gt;The retrieval unit is no longer a 512-token fragment of unknown provenance. It is a complete, self-contained definition --- the exact code the developer would navigate to in an IDE.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Indexing Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Source files  →  tree-sitter parse  →  symbol extraction  →  BM25 index + SQLite store
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each file:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect language from file extension.&lt;/li&gt;
&lt;li&gt;Parse with the appropriate tree-sitter grammar.&lt;/li&gt;
&lt;li&gt;Walk the AST to extract top-level and nested symbols (functions, classes, methods, type aliases, constants).&lt;/li&gt;
&lt;li&gt;Store each symbol's metadata (name, qualified name, kind, file path, line range) and full source text in a SQLite database.&lt;/li&gt;
&lt;li&gt;Build a BM25 inverted index over symbol names and qualified names.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire pipeline is deterministic. No embedding model is involved. No GPU is required. Index build time scales linearly with file count: &amp;lt;1 second for 98 files (Gin), ~5--15 seconds for 951 files (FastAPI).&lt;/p&gt;
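The extraction step can be sketched with Python's built-in `ast` module standing in for tree-sitter (tree-sitter adds multi-language support and error tolerance, but the walk has the same shape); qualified-name derivation and the BM25 index are omitted to keep the sketch self-contained:

```python
import ast

def extract_symbols(source, path):
    """Walk the AST and record each function/class as a complete unit."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.append({
                "name": node.name,
                "kind": type(node).__name__,
                "path": path,
                "lines": (node.lineno, node.end_lineno),
                # Full source text of the definition -- never a fragment.
                "source": ast.get_source_segment(source, node),
            })
    return symbols
```

Every extracted `source` field starts at a `def` or `class` keyword and ends at the definition's final line, which is the structural guarantee the chunker cannot make.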

&lt;h3&gt;
  
  
  4.3 Retrieval Workflow
&lt;/h3&gt;

&lt;p&gt;The retrieval workflow mirrors the discover/search/retrieve pattern common in code exploration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query: "middleware"
  ↓
Step 1: search_symbols("middleware", max_results=5)
  → Returns ranked symbol metadata: name, kind, file, line range, score
  → Token cost: ~370 tokens (metadata only, not full source)
  ↓
Step 2: get_symbol_source(top_3_symbol_ids)
  → Returns complete source code of the 3 best-matching symbols
  → Token cost: ~640 tokens (3 complete function bodies)
  ↓
Total: ~1,010 tokens
Baseline (all files): 137,978 tokens
Reduction: 99.3%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
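&lt;p&gt;The totals in the trace are plain arithmetic; a quick sanity check of the reduction figure, using the token counts from the trace:&lt;/p&gt;

```python
# Verify the reduction arithmetic from the trace above.
search_tokens = 370    # metadata for 5 results
fetch_tokens = 640     # full source for the top 3 symbols
baseline = 137_978     # all Express files concatenated

total = search_tokens + fetch_tokens          # 1,010 tokens
reduction = 100 * (1 - total / baseline)      # percent of baseline avoided

print(f"total={total}, reduction={reduction:.1f}%")
# prints: total=1010, reduction=99.3%
```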



&lt;p&gt;Three properties distinguish this from RAG retrieval:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The search step returns metadata, not source.&lt;/strong&gt; The LLM (or agent) can inspect symbol names, kinds, and file locations before deciding which symbols to retrieve in full. This is analogous to scanning a table of contents before reading chapters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The retrieve step returns complete syntactic units.&lt;/strong&gt; Every result starts at a definition boundary and ends at the matching closing brace or dedent. There are no mid-function fragments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Result count is adaptive.&lt;/strong&gt; If a query matches one symbol strongly, the agent retrieves one symbol. RAG always returns k chunks regardless of query specificity.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
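&lt;p&gt;A toy in-memory version makes the two-step shape concrete. The tool names mirror the workflow above; the ranking here is a trivial substring match standing in for BM25, and the symbol data is invented for illustration:&lt;/p&gt;

```python
# Toy in-memory version of the two-step workflow. The real system backs
# these calls with SQLite and BM25; this substring ranking and the symbol
# data below are illustrative stand-ins.
SYMBOLS = {
    "abc123": {"name": "route", "kind": "function",
               "file": "lib/router/index.js", "lines": (45, 92),
               "source": "Router.prototype.route = function route(path) {...}"},
    "def456": {"name": "use_middleware", "kind": "function",
               "file": "lib/application.js", "lines": (10, 30),
               "source": "app.use = function use(fn) {...}"},
}

def search_symbols(query, max_results=5):
    # Step 1: return ranked METADATA only; no source text is transmitted.
    hits = [(sid, meta) for sid, meta in SYMBOLS.items()
            if any(term in meta["name"] for term in query.split())]
    return [{"id": sid, **{k: v for k, v in meta.items() if k != "source"}}
            for sid, meta in hits[:max_results]]

def get_symbol_source(symbol_id):
    # Step 2: fetch the complete definition for one chosen symbol.
    return SYMBOLS[symbol_id]["source"]

hits = search_symbols("route handler")
print(hits[0]["name"], hits[0]["lines"])      # metadata first...
print(get_symbol_source(hits[0]["id"]))       # ...then full source on demand
```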

&lt;h3&gt;
  
  
  4.4 Stable Symbol Identifiers
&lt;/h3&gt;

&lt;p&gt;Each symbol receives a deterministic identifier derived from its repository, file path, and qualified name. This ID is stable across reindexing (unless the symbol is renamed or moved). Stable IDs enable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Caching.&lt;/strong&gt; A previously retrieved symbol can be recognized without re-fetching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-reference.&lt;/strong&gt; Import graphs, call hierarchies, and blast radius analysis can reference symbols by ID.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental updates.&lt;/strong&gt; When a file changes, only its symbols are re-extracted. The rest of the index is untouched.&lt;/li&gt;
&lt;/ul&gt;
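&lt;p&gt;One way to obtain such IDs is to hash the (repo, file path, qualified name) triple. The sketch below uses SHA-256 and is an illustration of the property, not the implementation's exact derivation:&lt;/p&gt;

```python
# Deterministic symbol IDs: a hash over (repo, file path, qualified name).
# The real implementation's derivation may differ; the property that
# matters is determinism, so reindexing an unchanged symbol yields the
# same ID.
import hashlib

def symbol_id(repo: str, file_path: str, qualified_name: str) -> str:
    key = "\x00".join((repo, file_path, qualified_name))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

a = symbol_id("expressjs/express", "lib/router/index.js", "Router.route")
b = symbol_id("expressjs/express", "lib/router/index.js", "Router.route")
c = symbol_id("expressjs/express", "lib/router/index.js", "Router.use")

assert a == b        # stable across reindexing
assert a != c        # distinct symbols get distinct IDs
print(a)
```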




&lt;h2&gt;
  
  
  5. Implementation Overview
&lt;/h2&gt;

&lt;p&gt;The implementation described here uses tree-sitter grammars for 25+ languages, a SQLite-backed symbol store, and BM25 for text search. The system runs as an MCP (Model Context Protocol) server, exposing tools that LLM agents call directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Language Support
&lt;/h3&gt;

&lt;p&gt;Symbol extraction is grammar-driven. Each supported language has a tree-sitter grammar and an extraction spec that maps AST node types to symbol kinds:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language family&lt;/th&gt;
&lt;th&gt;Languages&lt;/th&gt;
&lt;th&gt;Symbol kinds extracted&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;C-like&lt;/td&gt;
&lt;td&gt;C, C++, C#, Java, Go, Rust, Swift, Kotlin&lt;/td&gt;
&lt;td&gt;functions, methods, classes, structs, interfaces, enums, type aliases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic&lt;/td&gt;
&lt;td&gt;Python, JavaScript, TypeScript, Ruby, PHP, Lua&lt;/td&gt;
&lt;td&gt;functions, methods, classes, decorators, module-level assignments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Functional&lt;/td&gt;
&lt;td&gt;Haskell, Scala, Erlang, R, Julia&lt;/td&gt;
&lt;td&gt;functions, type classes, data types, modules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markup/Config&lt;/td&gt;
&lt;td&gt;SQL, TOML, CSS, Bash&lt;/td&gt;
&lt;td&gt;definitions, sections, rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialized&lt;/td&gt;
&lt;td&gt;Vue SFC, Razor (&lt;code&gt;.cshtml&lt;/code&gt;), Assembly&lt;/td&gt;
&lt;td&gt;component APIs, code blocks, labels/macros&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Custom extractors exist for languages where tree-sitter grammars lack clean named fields (Erlang: multi-clause function merging by arity; Fortran: module-qualified names; SQL: dbt Jinja preprocessing).&lt;/p&gt;
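&lt;p&gt;An extraction spec can be as small as a per-language mapping from AST node types to symbol kinds. The node type names below match the public tree-sitter grammars for Python and Go; the spec structure itself is a hypothetical sketch, not the project's actual format:&lt;/p&gt;

```python
# A minimal extraction-spec shape: map tree-sitter node types to symbol
# kinds per language. Node type names follow the public tree-sitter
# grammars for Python and Go; the spec structure is hypothetical.
EXTRACTION_SPECS = {
    "python": {
        "function_definition": "function",
        "class_definition": "class",
    },
    "go": {
        "function_declaration": "function",
        "method_declaration": "method",
        "type_declaration": "type",
    },
}

def kind_for(language: str, node_type: str):
    """Return the symbol kind for an AST node, or None if not extracted."""
    return EXTRACTION_SPECS.get(language, {}).get(node_type)

print(kind_for("go", "method_declaration"))    # method
print(kind_for("python", "import_statement"))  # None (not a symbol)
```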

&lt;h3&gt;
  
  
  5.2 Storage and Index Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.code-index/
  &amp;lt;repo-hash&amp;gt;/
    index.db          # SQLite: symbols table (name, kind, file, lines, source)
                      #         files table (path, hash, size_bytes)
                      #         imports table (file, specifier, resolved_path)
    content/          # Raw source files (for full-file retrieval)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SQLite schema supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;O(1) symbol lookup&lt;/strong&gt; by ID (hash index built in &lt;code&gt;__post_init__&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BM25 search&lt;/strong&gt; over symbol names with optional language and file filters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Import graph queries&lt;/strong&gt; for cross-reference tools (find_importers, blast_radius, dead_code).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental updates&lt;/strong&gt; via content hashing --- only changed files are re-parsed.&lt;/li&gt;
&lt;/ul&gt;
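&lt;p&gt;A minimal sketch of that layout, with column names inferred from the schema summary above (the project's exact DDL may differ), including the content-hash check that drives incremental updates:&lt;/p&gt;

```python
# Sketch of the SQLite layout described above. Column names are inferred
# from the schema summary, not the project's exact DDL.
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files (
    path TEXT PRIMARY KEY,
    hash TEXT NOT NULL,
    size_bytes INTEGER NOT NULL
);
CREATE TABLE symbols (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,
    file TEXT NOT NULL REFERENCES files(path),
    start_line INTEGER,
    end_line INTEGER,
    source TEXT NOT NULL
);
CREATE INDEX idx_symbols_name ON symbols(name);
""")

def needs_reindex(path: str, content: str) -> bool:
    # Incremental updates: re-parse a file only when its content hash
    # differs from the stored one.
    digest = hashlib.sha256(content.encode()).hexdigest()
    row = db.execute("SELECT hash FROM files WHERE path = ?", (path,)).fetchone()
    return row is None or row[0] != digest

print(needs_reindex("lib/router/index.js", "function route() {}"))  # True
```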

&lt;h3&gt;
  
  
  5.3 Integration with LLM Workflows
&lt;/h3&gt;

&lt;p&gt;The retrieval tools are exposed via MCP (Model Context Protocol), the open standard for LLM tool integration. An agent's interaction looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;search_symbols&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;router route handler&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;symbols&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;route&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lib/router/index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;]},...]}&lt;/span&gt;

&lt;span class="nl"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_symbol_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Router.prototype.route = function route(path) {&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  ...full body...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;};&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;route&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent receives exactly the code it needs --- a complete function definition --- without reading the entire file or receiving adjacent, unrelated code.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Benchmark Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 Repositories Under Test
&lt;/h3&gt;

&lt;p&gt;Three public web frameworks spanning three languages, chosen for structural diversity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Files indexed&lt;/th&gt;
&lt;th&gt;Symbols extracted&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;expressjs/express&lt;/td&gt;
&lt;td&gt;JavaScript&lt;/td&gt;
&lt;td&gt;165&lt;/td&gt;
&lt;td&gt;181&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fastapi/fastapi&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;951&lt;/td&gt;
&lt;td&gt;5,325&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin-gonic/gin&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;td&gt;1,489&lt;/td&gt;
&lt;td&gt;187,018&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Baseline tokens&lt;/strong&gt; = all indexed source files concatenated and tokenized with &lt;code&gt;tiktoken&lt;/code&gt; &lt;code&gt;cl100k_base&lt;/code&gt;. This is the minimum cost for an agent that reads every file once. Real agents typically read files multiple times, making this a conservative baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.2 Query Corpus
&lt;/h3&gt;

&lt;p&gt;Five queries representing common code exploration intents, defined in a public &lt;code&gt;tasks.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Intent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;router route handler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Core route registration / dispatch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;middleware&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Middleware chaining and execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error exception&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Error handling and exception propagation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Request/response object definitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context bind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Context creation and parameter binding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each query is run against each repository, producing 15 task-runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.3 RAG Configuration
&lt;/h3&gt;

&lt;p&gt;The RAG baseline uses a naive LangChain pipeline --- deliberately unoptimized, representing a common starting point rather than a production-tuned system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings:&lt;/strong&gt; &lt;code&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt; (384-dim, local inference)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector store:&lt;/strong&gt; FAISS (&lt;code&gt;faiss-cpu&lt;/code&gt;, in-memory)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Splitter:&lt;/strong&gt; &lt;code&gt;RecursiveCharacterTextSplitter.from_tiktoken_encoder&lt;/code&gt; (true token-based chunks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk sizes:&lt;/strong&gt; 512, 1024, 2048 tokens with ~10% overlap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval:&lt;/strong&gt; &lt;code&gt;similarity_search(query, k=5)&lt;/code&gt;, top 3 used as "fetched"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Token counting: &lt;code&gt;search_tokens&lt;/code&gt; (all 5 retrieved chunks serialized) + &lt;code&gt;fetch_tokens&lt;/code&gt; (top 3 chunks serialized). This mirrors the AST workflow's &lt;code&gt;search_symbols&lt;/code&gt; + &lt;code&gt;get_symbol_source&lt;/code&gt; two-step pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is not tested.&lt;/strong&gt; The RAG baseline does not use code-specific embedding models (CodeBERT, Voyage Code, StarEncoder), re-ranking passes (Cohere Rerank, cross-encoder), hybrid search (BM25 + dense), or AST-aware chunking. Any of these would likely improve RAG's token efficiency. The results in Section 7 should be read as "AST retrieval vs. naive RAG," not "AST retrieval vs. best-possible RAG."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Double-counting note.&lt;/strong&gt; The two-step token accounting (search 5 + fetch 3) means the top 3 chunks are counted in both passes. A simpler RAG workflow that calls &lt;code&gt;similarity_search(k=3)&lt;/code&gt; and uses the results directly would avoid this overhead. We chose the two-step structure to mirror the AST workflow's metadata-then-source pattern, making the comparison structurally parallel. This inflates RAG's token count by roughly 30--40% relative to a single-pass retrieval. The 1.6--3.9x margin would narrow under single-pass accounting, though AST retrieval would still be more efficient due to the metadata-vs-source asymmetry in the search step.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.4 AST Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parser:&lt;/strong&gt; tree-sitter (language-specific grammars)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search:&lt;/strong&gt; BM25 over symbol names, &lt;code&gt;max_results=5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fetch:&lt;/strong&gt; &lt;code&gt;get_symbol_source&lt;/code&gt; on top 3 symbol IDs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token counting:&lt;/strong&gt; search response tokens + 3 x symbol source tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI summaries were disabled during benchmarking (signature-only fallback).&lt;/p&gt;
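&lt;p&gt;BM25 over short symbol-name documents is compact enough to sketch in full. Below is a minimal stdlib scorer with the standard k1/b parameters; the production index also searches qualified names and applies language/file filters:&lt;/p&gt;

```python
# Minimal BM25 scorer over symbol names, stdlib only. Standard parameters
# k1=1.5, b=0.75; a simplified sketch, not the production implementation.
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [d.lower().replace(".", " ").split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(tokenized)
    scores = [0.0] * n
    for term in query.lower().split():
        df = sum(1 for t in tokenized if term in t)
        if df == 0:
            continue  # term appears in no document
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, t in enumerate(tokenized):
            tf = t.count(term)
            denom = tf + k1 * (1 - b + b * len(t) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / denom
    return scores

names = ["Router.route", "Router.use", "handle_request", "ErrorMiddleware"]
scores = bm25_scores("route", names)
best = max(range(len(names)), key=scores.__getitem__)
print(names[best])  # Router.route
```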

&lt;h3&gt;
  
  
  6.5 Reproducibility
&lt;/h3&gt;

&lt;p&gt;Both harnesses read file content from the same &lt;code&gt;IndexStore&lt;/code&gt; instance (&lt;code&gt;IndexStore.load_index() → index.source_files&lt;/code&gt;). Baselines are identical by construction. The harness scripts (&lt;code&gt;run_benchmark.py&lt;/code&gt;, &lt;code&gt;run_rag_baseline.py&lt;/code&gt;), query corpus (&lt;code&gt;tasks.json&lt;/code&gt;), and raw results (&lt;code&gt;rag_baseline_results.json&lt;/code&gt;) are open source.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  7.1 Token Efficiency: AST Retrieval
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;th&gt;AST avg/query&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;expressjs/express&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;td&gt;924&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;td&gt;150.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fastapi/fastapi&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;td&gt;1,834&lt;/td&gt;
&lt;td&gt;99.8%&lt;/td&gt;
&lt;td&gt;531.2x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin-gonic/gin&lt;/td&gt;
&lt;td&gt;187,018&lt;/td&gt;
&lt;td&gt;1,124&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;td&gt;171.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Grand total (15 runs)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5,122,105&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19,406&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.6%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;263.9x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Per-query detail (Express.js):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;AST tokens&lt;/th&gt;
&lt;th&gt;Search&lt;/th&gt;
&lt;th&gt;Fetch&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;router route handler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;td&gt;886&lt;/td&gt;
&lt;td&gt;381&lt;/td&gt;
&lt;td&gt;505&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;middleware&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;td&gt;1,008&lt;/td&gt;
&lt;td&gt;370&lt;/td&gt;
&lt;td&gt;638&lt;/td&gt;
&lt;td&gt;99.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error exception&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;td&gt;859&lt;/td&gt;
&lt;td&gt;362&lt;/td&gt;
&lt;td&gt;497&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;td&gt;872&lt;/td&gt;
&lt;td&gt;372&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context bind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;td&gt;993&lt;/td&gt;
&lt;td&gt;372&lt;/td&gt;
&lt;td&gt;621&lt;/td&gt;
&lt;td&gt;99.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Per-query detail (FastAPI):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;AST tokens&lt;/th&gt;
&lt;th&gt;Search&lt;/th&gt;
&lt;th&gt;Fetch&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;router route handler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;td&gt;1,199&lt;/td&gt;
&lt;td&gt;464&lt;/td&gt;
&lt;td&gt;735&lt;/td&gt;
&lt;td&gt;99.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;middleware&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;td&gt;1,643&lt;/td&gt;
&lt;td&gt;460&lt;/td&gt;
&lt;td&gt;1,183&lt;/td&gt;
&lt;td&gt;99.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error exception&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;td&gt;873&lt;/td&gt;
&lt;td&gt;383&lt;/td&gt;
&lt;td&gt;490&lt;/td&gt;
&lt;td&gt;99.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;td&gt;4,439&lt;/td&gt;
&lt;td&gt;430&lt;/td&gt;
&lt;td&gt;4,009&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context bind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;td&gt;1,016&lt;/td&gt;
&lt;td&gt;402&lt;/td&gt;
&lt;td&gt;614&lt;/td&gt;
&lt;td&gt;99.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  7.2 Token Efficiency: RAG Baseline
&lt;/h3&gt;

&lt;p&gt;Best-performing RAG configuration per repo (RAG-512 in all cases):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;th&gt;RAG-512 avg/query&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;expressjs/express&lt;/td&gt;
&lt;td&gt;137,978&lt;/td&gt;
&lt;td&gt;2,887&lt;/td&gt;
&lt;td&gt;97.9%&lt;/td&gt;
&lt;td&gt;56.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fastapi/fastapi&lt;/td&gt;
&lt;td&gt;699,425&lt;/td&gt;
&lt;td&gt;2,850&lt;/td&gt;
&lt;td&gt;99.6%&lt;/td&gt;
&lt;td&gt;248.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin-gonic/gin&lt;/td&gt;
&lt;td&gt;187,018&lt;/td&gt;
&lt;td&gt;4,352&lt;/td&gt;
&lt;td&gt;97.7%&lt;/td&gt;
&lt;td&gt;43.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;RAG token consumption increases with chunk size:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;RAG-512 avg&lt;/th&gt;
&lt;th&gt;RAG-1024 avg&lt;/th&gt;
&lt;th&gt;RAG-2048 avg&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;expressjs/express&lt;/td&gt;
&lt;td&gt;2,887&lt;/td&gt;
&lt;td&gt;6,023&lt;/td&gt;
&lt;td&gt;7,057&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fastapi/fastapi&lt;/td&gt;
&lt;td&gt;2,850&lt;/td&gt;
&lt;td&gt;4,279&lt;/td&gt;
&lt;td&gt;5,512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin-gonic/gin&lt;/td&gt;
&lt;td&gt;4,352&lt;/td&gt;
&lt;td&gt;7,539&lt;/td&gt;
&lt;td&gt;12,850&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  7.3 Head-to-Head Comparison
&lt;/h3&gt;

&lt;p&gt;Both harnesses ran back-to-back on 2026-03-28 against the same index state.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;Best RAG avg/query&lt;/th&gt;
&lt;th&gt;AST avg/query&lt;/th&gt;
&lt;th&gt;AST advantage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;expressjs/express&lt;/td&gt;
&lt;td&gt;2,887 (RAG-512)&lt;/td&gt;
&lt;td&gt;924&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.1x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fastapi/fastapi&lt;/td&gt;
&lt;td&gt;2,850 (RAG-512)&lt;/td&gt;
&lt;td&gt;1,834&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.6x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin-gonic/gin&lt;/td&gt;
&lt;td&gt;4,352 (RAG-512)&lt;/td&gt;
&lt;td&gt;1,124&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.9x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;AST retrieval uses fewer tokens on every tested repository. The margin ranges from 1.6x (FastAPI) to 3.9x (Gin). The FastAPI result is notable: this is the largest repo (951 files, 5,325 symbols), where dense embedding retrieval might be expected to have an advantage. It does not --- BM25 over symbol names plus selective source retrieval still outperforms vector similarity over text chunks.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.4 Chunk Integrity
&lt;/h3&gt;

&lt;p&gt;The "complete chunk" rate measures how often a retrieved RAG chunk starts at a definition boundary and has balanced braces/indentation. The "split" rate measures how often a chunk is cut mid-function.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;RAG-512 complete&lt;/th&gt;
&lt;th&gt;RAG-512 split&lt;/th&gt;
&lt;th&gt;RAG-1024 split&lt;/th&gt;
&lt;th&gt;RAG-2048 split&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;expressjs/express&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fastapi/fastapi&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;53%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;40%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;33%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin-gonic/gin&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;FastAPI's high split rate at 512-token chunks is a direct consequence of its code structure: many functions exceed 512 tokens, so the chunker cuts them. Increasing chunk size reduces splits but does not eliminate them, and increases token cost per retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AST retrieval produces zero split results by construction.&lt;/strong&gt; Every returned symbol is a complete AST node --- a function, class, or method with full source from definition to closing delimiter.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.5 Infrastructure Overhead
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;th&gt;AST&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Embedding model download&lt;/td&gt;
&lt;td&gt;~90 MB (one-time)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime dependencies&lt;/td&gt;
&lt;td&gt;LangChain + FAISS + sentence-transformers + torch (~1 GB)&lt;/td&gt;
&lt;td&gt;tiktoken only (for benchmarking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index build (FastAPI, 951 files)&lt;/td&gt;
&lt;td&gt;23--49s (embedding-dominated)&lt;/td&gt;
&lt;td&gt;5--15s (tree-sitter parse)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index build (Express, 165 files)&lt;/td&gt;
&lt;td&gt;6s&lt;/td&gt;
&lt;td&gt;&amp;lt;1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAISS index size (FastAPI, 512)&lt;/td&gt;
&lt;td&gt;7,556 KB&lt;/td&gt;
&lt;td&gt;a few hundred KB (SQLite)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query latency&lt;/td&gt;
&lt;td&gt;12--36 ms&lt;/td&gt;
&lt;td&gt;&amp;lt;5 ms (BM25 in-process)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The embedding step is the dominant cost in the RAG pipeline. For a 951-file repository, building the 512-token FAISS index requires ~47 seconds of CPU embedding time. The AST pipeline parses the same files in 5--15 seconds with no model inference.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.6 End-to-End A/B Tests
&lt;/h3&gt;

&lt;p&gt;Two controlled experiments were conducted on a production Vue 3 + Firebase codebase to measure real-world impact beyond synthetic benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test 1: Naming audit (50 iterations, Claude Sonnet 4.6).&lt;/strong&gt; Each iteration scanned source files for misleading names, then applied fixes via three-subagent consensus.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Native tools (Grep/Glob/Read)&lt;/th&gt;
&lt;th&gt;AST retrieval&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Success rate&lt;/td&gt;
&lt;td&gt;72%&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;+8 pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timeout rate&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;td&gt;-8 pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean cost/iteration&lt;/td&gt;
&lt;td&gt;$0.783&lt;/td&gt;
&lt;td&gt;$0.738&lt;/td&gt;
&lt;td&gt;-5.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean cache creation tokens&lt;/td&gt;
&lt;td&gt;104,135&lt;/td&gt;
&lt;td&gt;93,178&lt;/td&gt;
&lt;td&gt;-10.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Isolated tool-layer savings (controlling for fixed subagent overhead): &lt;strong&gt;15--25%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test 2: Dead code detection (50 iterations, Claude Sonnet 4.6).&lt;/strong&gt; Pure tool-layer cost measurement with no subagent overhead.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Native tools&lt;/th&gt;
&lt;th&gt;AST retrieval&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Success rate&lt;/td&gt;
&lt;td&gt;96%&lt;/td&gt;
&lt;td&gt;92%&lt;/td&gt;
&lt;td&gt;-4 pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean cost/iteration&lt;/td&gt;
&lt;td&gt;$0.4474&lt;/td&gt;
&lt;td&gt;$0.3560&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-20.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean total tokens&lt;/td&gt;
&lt;td&gt;449,356&lt;/td&gt;
&lt;td&gt;289,275&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-36%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 20% cost reduction is statistically significant (Wilcoxon p=0.0074, Cohen's d=-0.583).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accuracy tradeoff.&lt;/strong&gt; The cost savings came with measurable accuracy degradation on fine-grained tasks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;F1 metric&lt;/th&gt;
&lt;th&gt;Native tools&lt;/th&gt;
&lt;th&gt;AST retrieval&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dead files (all exports unused)&lt;/td&gt;
&lt;td&gt;95.8%&lt;/td&gt;
&lt;td&gt;95.7%&lt;/td&gt;
&lt;td&gt;equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alive files (with some dead exports)&lt;/td&gt;
&lt;td&gt;100.0%&lt;/td&gt;
&lt;td&gt;69.6%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-30.4 pp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Export-level (individual export liveness)&lt;/td&gt;
&lt;td&gt;93.3%&lt;/td&gt;
&lt;td&gt;64.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-29.2 pp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dead-file detection --- the coarsest classification --- was equivalent. But alive-file classification and individual export liveness were significantly worse with AST retrieval. Root cause analysis (detailed in the full report) identified three factors: (1) the JS import extractor missed dynamic &lt;code&gt;import()&lt;/code&gt; calls (fixed in v1.8.1), (2) the agent's strategy stopped at file-level liveness without verifying individual exports, and (3) neither variant followed transitive dead-code chains (fixed in v1.8.3). Two of the three gaps were tool bugs subsequently fixed; the third was a task-framing issue.&lt;/p&gt;

&lt;p&gt;The honest summary: AST retrieval is cheaper but not uniformly better. For tasks requiring file-level "is this dead?" decisions, accuracy is equivalent at 20% lower cost. For tasks requiring export-level granularity, the agent's retrieval strategy must be more deliberate --- the tool provides the capability (&lt;code&gt;find_references&lt;/code&gt; returns zero results for unused exports), but the agent did not use it consistently.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  8.1 Why AST Retrieval Uses Fewer Tokens
&lt;/h3&gt;

&lt;p&gt;Three mechanisms contribute:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No irrelevant context per result.&lt;/strong&gt; A RAG chunk at any size includes code before and after the relevant definition. A symbol result includes only the definition itself. The average AST fetch returns 200--600 tokens of source per symbol; RAG-512 returns ~500 tokens per chunk, but 3--5 of the 5 chunks typically contain irrelevant code that happens to share embedding-space proximity with the query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The search step is cheaper.&lt;/strong&gt; AST search returns symbol metadata (~370 tokens for 5 results): name, kind, file, line range. RAG search returns the full text of 5 chunks (~1,800--2,900 tokens for 5 results). The metadata-first approach lets the agent make retrieval decisions before paying the full-source cost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Metadata/source separation.&lt;/strong&gt; In the AST workflow, the search step returns compact metadata (~370 tokens for 5 results) and the fetch step returns full source. No content is transmitted twice. In the RAG workflow as measured here, the top 3 chunks appear in both the search response (5 chunks) and the fetch response (3 chunks). This is a measurement artifact of our two-step accounting, not an inherent RAG limitation --- a single-pass &lt;code&gt;similarity_search(k=3)&lt;/code&gt; pipeline would avoid it. We discuss the impact of this accounting choice in Section 6.3.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
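&lt;p&gt;The two accounting models above can be sketched in a few lines. This is an illustrative toy, not the benchmark harness: whitespace word counts stand in for tokenizer counts, and the metadata field names are hypothetical.&lt;/p&gt;

```python
# Toy accounting for the two retrieval workflows described above.
# Word counts approximate token counts; field names are illustrative.

def ast_workflow_tokens(search_results, fetched_sources):
    # Search step returns compact metadata only (name, kind, file, lines).
    meta = " ".join(
        f"{r['name']} {r['kind']} {r['file']} {r['lines']}" for r in search_results
    )
    search_cost = len(meta.split())
    # Fetch step returns full source for the selected symbols, exactly once.
    fetch_cost = sum(len(src.split()) for src in fetched_sources)
    return search_cost + fetch_cost

def rag_two_step_tokens(chunks, k_fetch=3):
    # Search step returns the full text of all retrieved chunks...
    search_cost = sum(len(c.split()) for c in chunks)
    # ...and the fetch step re-transmits the top k_fetch of them.
    fetch_cost = sum(len(c.split()) for c in chunks[:k_fetch])
    return search_cost + fetch_cost
```

&lt;p&gt;The structural point survives the toy tokenizer: the AST search step scales with metadata size, while the two-step RAG pipeline pays for chunk text twice.&lt;/p&gt;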

&lt;h3&gt;
  
  
  8.2 Confounded Variables: Unit vs. Search Mechanism
&lt;/h3&gt;

&lt;p&gt;This benchmark varies two things simultaneously: the &lt;strong&gt;retrieval unit&lt;/strong&gt; (chunk vs. symbol) and the &lt;strong&gt;search mechanism&lt;/strong&gt; (embedding similarity vs. BM25). We attribute the advantage primarily to the retrieval unit, but we have not isolated the two variables. Two unrun experiments would help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding search over AST symbols.&lt;/strong&gt; Use the same symbol-level retrieval units, but search by embedding similarity instead of BM25. If results are comparable, the retrieval unit is the dominant factor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BM25 search over fixed-size chunks.&lt;/strong&gt; Use the same chunk-based retrieval, but search by BM25 instead of embedding similarity. If BM25-over-chunks approaches AST retrieval's efficiency, the search mechanism is the dominant factor.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We suspect the retrieval unit is the larger contributor --- the metadata-vs-source asymmetry in the search step and the absence of irrelevant context per result are structural properties of symbol-level retrieval, independent of how symbols are ranked. But without these controls, we cannot claim this definitively.&lt;/p&gt;

&lt;p&gt;Additionally, the query corpus (Section 6.2) consists of short keyword queries that lexically match symbol names. This is the scenario where BM25 has maximum advantage over dense embeddings. Queries requiring semantic inference (e.g., "what runs before the handler on each request" to find middleware) would likely favor embedding search. The results should be read with this bias in mind.&lt;/p&gt;
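&lt;p&gt;For readers unfamiliar with the mechanism, BM25 over symbol-name tokens is small enough to sketch inline. The scoring below is standard Okapi BM25; the symbol corpus is invented for illustration and is not from the benchmark.&lt;/p&gt;

```python
import math

# Minimal Okapi BM25 ranker over pre-tokenized symbol names.
# Corpus and query are illustrative examples, not benchmark data.

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank docs (lists of tokens) against query tokens; returns (index, score) pairs."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for i, doc in enumerate(docs):
        dl = len(doc)
        score = 0.0
        for term in query:
            df = sum(1 for d in docs if term in d)   # document frequency
            tf = doc.count(term)                      # term frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            denom = tf + k1 * (1.0 - b + b * dl / avgdl)
            score += idf * tf * (k1 + 1.0) / denom
        scores.append((i, score))
    return sorted(scores, key=lambda p: p[1], reverse=True)

# Symbols tokenized from camelCase/snake_case names (hypothetical examples):
symbols = [
    ["create", "router", "lib", "router"],
    ["handle", "request", "lib", "application"],
    ["verify", "credentials", "security", "utils"],
]
top = bm25_rank(["router"], symbols)
```

&lt;p&gt;Because the scoring is purely lexical, a query like &lt;code&gt;bm25_rank(["authentication"], symbols)&lt;/code&gt; scores every document zero here, which is exactly the failure mode that favors dense embeddings.&lt;/p&gt;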

&lt;h3&gt;
  
  
  8.3 Why FastAPI's Margin Is Narrower
&lt;/h3&gt;

&lt;p&gt;AST retrieval still wins on FastAPI (1.6x advantage), but the margin is smaller than on Express (3.1x) or Gin (3.9x). Two factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FastAPI has high symbol density.&lt;/strong&gt; With 5,325 symbols across 951 files, BM25 over symbol names produces more candidates, and the top-3 fetched symbols are sometimes larger (e.g., &lt;code&gt;request response&lt;/code&gt; on FastAPI fetches 4,009 source tokens due to large Request/Response class definitions).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RAG-512 performs relatively well on large, well-structured Python files.&lt;/strong&gt; FastAPI's code style produces chunks that, while often split (53%), still contain semantically relevant code due to the framework's dense annotation style.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  8.4 Where RAG Still Makes Sense
&lt;/h3&gt;

&lt;p&gt;AST symbol retrieval is not a universal replacement for RAG:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Natural language documentation.&lt;/strong&gt; Docstrings, README files, API descriptions, and inline comments are not syntactic symbols. RAG over prose documents remains appropriate for these artifacts. (A companion tool for section-level document retrieval handles this case separately.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic similarity across naming conventions.&lt;/strong&gt; BM25 search requires lexical overlap between the query and symbol names. A query like "authentication" will not match a function named &lt;code&gt;verify_credentials&lt;/code&gt; unless the surrounding qualified name or file path contains relevant terms. Dense embedding models capture this semantic proximity. For codebases with inconsistent naming, RAG may surface relevant code that BM25 misses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Codebases without parseable structure.&lt;/strong&gt; Configuration files, data pipelines, template languages, and heavily metaprogrammed code may not produce meaningful AST symbols. RAG handles these as opaque text, which is at least something.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  8.5 Failure Modes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AST retrieval fails when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The query intent maps to code spread across many small utility functions with generic names.&lt;/li&gt;
&lt;li&gt;The symbol index is stale (file changed since last parse). Staleness detection mitigates this.&lt;/li&gt;
&lt;li&gt;The language lacks a tree-sitter grammar. Coverage is broad (25+ languages) but not complete.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAG fails when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The relevant code is smaller than the chunk size (over-retrieval).&lt;/li&gt;
&lt;li&gt;The relevant code is larger than the chunk size (under-retrieval, split across chunks).&lt;/li&gt;
&lt;li&gt;The query is specific but the embedding model generalizes too aggressively, returning topically related but functionally irrelevant chunks.&lt;/li&gt;
&lt;/ul&gt;
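&lt;p&gt;The first two RAG failure modes are easy to demonstrate. The snippet below, with source text and offsets invented for illustration, splits a function across fixed windows, while a symbol index returns it whole via recorded offsets.&lt;/p&gt;

```python
# Toy demonstration: fixed windows split code units; offset-addressed
# symbol retrieval does not. Source text and offsets are made up.

source = (
    "def helper():\n    return 1\n\n"
    "def verify_credentials(user, password):\n"
    "    record = load_user(user)\n"
    "    return check_hash(record, password)\n"
)

def fixed_chunks(text, size=60):
    # Character-based windows, the moral equivalent of token-window chunking.
    return [text[i : i + size] for i in range(0, len(text), size)]

# A symbol index stores offsets for each definition, so retrieval
# returns the complete unit regardless of its size.
symbol_index = {"verify_credentials": (28, len(source))}

chunks = fixed_chunks(source)
start, end = symbol_index["verify_credentials"]
whole_symbol = source[start:end]
```

&lt;p&gt;With a 60-character window the second function is spread across all three chunks; the offset-addressed slice returns it intact.&lt;/p&gt;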




&lt;h2&gt;
  
  
  9. Discussion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  9.1 Implications for Developer Tooling
&lt;/h3&gt;

&lt;p&gt;The results suggest that code retrieval tools should match their retrieval unit to the structure of the data. Code has natural units --- functions, classes, methods --- that are well-defined, complete, and independently meaningful. Using these as retrieval units eliminates an entire class of problems (chunk boundaries, irrelevant context, double-counting) without adding complexity.&lt;/p&gt;

&lt;p&gt;This is not a new insight. IDEs and editors have navigated code by symbols for decades (ctags, IntelliSense, the Language Server Protocol). What is new is that LLM agents can use the same granularity, and the token economics make it worth doing.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.2 Toward a Retrieval Interface Standard
&lt;/h3&gt;

&lt;p&gt;The retrieval workflow tested here follows a three-step pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discover:&lt;/strong&gt; enumerate available repositories, files, or outlines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search:&lt;/strong&gt; find relevant symbols by name, kind, or text query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve:&lt;/strong&gt; fetch complete source for selected symbols.&lt;/li&gt;
&lt;/ol&gt;
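&lt;p&gt;As a sketch, the three-step contract can be written down as an interface. The names below are illustrative, not the identifiers any specification mandates.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class SymbolRef:
    # Stable identifier: enables caching and cross-referencing between calls.
    id: str
    name: str
    kind: str          # "function", "class", "method", ...
    file: str
    line_range: tuple

class RetrievalInterface(Protocol):
    def discover(self) -> list:
        """Enumerate available repositories, files, or outlines."""

    def search(self, query: str, max_results: int = 5) -> list:
        """Return SymbolRef metadata only, never source text."""

    def retrieve(self, symbol_id: str) -> str:
        """Return the complete source of one semantic unit."""
```

&lt;p&gt;The important constraint lives in &lt;code&gt;search&lt;/code&gt;: it returns references, not content, so the agent decides what to pay for before fetching.&lt;/p&gt;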

&lt;p&gt;This pattern is general enough to standardize. One such effort is the jMunch Retrieval Interface (jMRI) [9], an open specification for token-efficient context retrieval in MCP servers. jMRI formalizes the discover/search/retrieve contract, requires that retrieved content represent complete semantic units (functions, classes, documentation sections), mandates stable identifiers for caching and cross-reference, and includes per-response token savings metadata so agents can measure efficiency per query. The specification defines two compliance tiers (Basic and Full), allowing implementations to adopt the interface incrementally regardless of their underlying search mechanism (BM25, embedding, hybrid).&lt;/p&gt;

&lt;p&gt;The key insight behind jMRI --- and the one supported by this paper's results --- is that the retrieval &lt;em&gt;interface&lt;/em&gt; should constrain the retrieval &lt;em&gt;unit&lt;/em&gt;. An interface that guarantees complete syntactic units eliminates chunk boundary artifacts at the contract level, not as an implementation detail that individual tools may or may not get right.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.3 Cost at Scale
&lt;/h3&gt;

&lt;p&gt;At current LLM API pricing ($3--15 per million input tokens for frontier models), the difference between ~1,000 and ~3,000 tokens per query is small in absolute terms. At scale, it compounds. An agentic workflow that makes 50 retrieval queries per task, run across 100 tasks per day:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;RAG-512 tokens/day&lt;/th&gt;
&lt;th&gt;AST tokens/day&lt;/th&gt;
&lt;th&gt;Daily savings at $10/M&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best case (Express-like, 3.1x margin)&lt;/td&gt;
&lt;td&gt;14,435,000&lt;/td&gt;
&lt;td&gt;4,620,000&lt;/td&gt;
&lt;td&gt;$98.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worst case (FastAPI-like, 1.6x margin)&lt;/td&gt;
&lt;td&gt;14,250,000&lt;/td&gt;
&lt;td&gt;9,170,000&lt;/td&gt;
&lt;td&gt;$50.80&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
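&lt;p&gt;The arithmetic is easy to verify. Per-query token figures below come from dividing the daily totals by the 5,000 queries per day; note that the dollar figures correspond to a single day of queries, so a month of continuous use multiplies them by roughly 30.&lt;/p&gt;

```python
# Back-of-envelope check of the scaling table: 50 queries/task, 100 tasks/day,
# $10 per million input tokens. Per-query counts are daily totals / 5,000.

QUERIES_PER_DAY = 50 * 100
PRICE_PER_TOKEN = 10 / 1_000_000

def dollars_saved(rag_per_query, ast_per_query, days=1):
    saved_tokens = (rag_per_query - ast_per_query) * QUERIES_PER_DAY * days
    return round(saved_tokens * PRICE_PER_TOKEN, 2)

best_daily = dollars_saved(2887, 924)       # Express-like, 3.1x margin
worst_daily = dollars_saved(2850, 1834)     # FastAPI-like, 1.6x margin
best_monthly = dollars_saved(2887, 924, days=30)
```

&lt;p&gt;At that rate the best case is roughly $2,900 per month and the worst case roughly $1,500 per month for a single 100-task workload.&lt;/p&gt;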

&lt;p&gt;The range matters. On a tightly scoped codebase with well-named symbols, the savings are substantial. On a large, symbol-dense repository, the margin is real but more modest. For teams running agentic CI/CD, code review bots, or continuous refactoring agents across multiple repositories, even the worst-case savings are material over months.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.4 MCP Ecosystem Fit
&lt;/h3&gt;

&lt;p&gt;The Model Context Protocol (MCP) provides a standardized interface for LLM tools. AST symbol retrieval fits naturally into MCP's tool-call model: &lt;code&gt;search_symbols&lt;/code&gt; and &lt;code&gt;get_symbol_source&lt;/code&gt; are stateless, cacheable operations that return structured JSON. The agent controls retrieval depth --- it can fetch one symbol or ten, based on the search results. This is the opposite of RAG's "always return k chunks" model, and it gives agents fine-grained control over their token budget.&lt;/p&gt;
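&lt;p&gt;For concreteness, a server exposing these operations would declare something like the following in its tool list. The field layout follows the general MCP &lt;code&gt;tools/list&lt;/code&gt; shape; the exact schemas of any particular server may differ.&lt;/p&gt;

```python
# Illustrative MCP-style tool declarations for the two operations named above.
# Schema details are assumptions, not jcodemunch-mcp's actual definitions.

TOOLS = [
    {
        "name": "search_symbols",
        "description": "Rank symbols by query; returns metadata only.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
    {
        "name": "get_symbol_source",
        "description": "Fetch complete source for one symbol id.",
        "inputSchema": {
            "type": "object",
            "properties": {"id": {"type": "string"}},
            "required": ["id"],
        },
    },
]
```

&lt;p&gt;Both calls are stateless, so responses are cacheable by symbol id, and the agent chooses how many &lt;code&gt;get_symbol_source&lt;/code&gt; calls to spend per query.&lt;/p&gt;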




&lt;h2&gt;
  
  
  10. Limitations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  10.1 Language Coverage
&lt;/h3&gt;

&lt;p&gt;Tree-sitter grammars exist for ~40 languages, and the implementation tested here supports 25+. Languages without grammars (niche DSLs, proprietary languages) require custom extractors or fall back to file-level retrieval. Adding a new language requires mapping AST node types to symbol kinds --- typically a few hours of work, but non-trivial.&lt;/p&gt;

&lt;h3&gt;
  
  
  10.2 Indexing Overhead
&lt;/h3&gt;

&lt;p&gt;The AST index must be built before queries can be served. For a 951-file repository, this takes 5--15 seconds. For monorepos with tens of thousands of files, indexing may take minutes. Incremental indexing (re-parse only changed files) mitigates this for iterative workflows, but the initial build cost is unavoidable.&lt;/p&gt;
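&lt;p&gt;The incremental half is straightforward; a content-hash check, sketched below with a hypothetical index layout, is enough to skip unchanged files on re-index.&lt;/p&gt;

```python
import hashlib

# Sketch of incremental re-indexing: re-parse a file only when its content
# hash changes. The index layout (path -&gt; digest) is hypothetical.

def file_digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def files_to_reparse(paths, index):
    """index maps each path to the digest recorded at its last parse."""
    stale = []
    for path in paths:
        digest = file_digest(path)
        if index.get(path) != digest:
            stale.append(path)
            index[path] = digest
    return stale
```

&lt;p&gt;On a second run over an unchanged tree, &lt;code&gt;files_to_reparse&lt;/code&gt; returns an empty list and the parser never runs.&lt;/p&gt;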

&lt;h3&gt;
  
  
  10.3 Query Corpus and Repository Diversity
&lt;/h3&gt;

&lt;p&gt;Five queries across three repositories are enough to demonstrate the structural advantage on web framework codebases, but we do not claim coverage of all code exploration patterns or architectures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository bias.&lt;/strong&gt; All three repositories are HTTP request-routing frameworks. They share a common conceptual vocabulary (router, middleware, handler, request, response, context), and the query corpus maps directly to this vocabulary. Codebases with different structures --- compilers, ML training loops, game engines, infrastructure-as-code, heavily metaprogrammed or macro-heavy code --- may produce different results. We have not tested these.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query bias.&lt;/strong&gt; All five queries are short keyword phrases that lexically match symbol names. Queries requiring semantic inference, natural language phrasing, or cross-file tracing may favor embedding-based retrieval. The results generalize most confidently to keyword-style code navigation queries on well-structured application codebases.&lt;/p&gt;

&lt;h3&gt;
  
  
  10.4 Non-Code Use Cases
&lt;/h3&gt;

&lt;p&gt;AST symbol retrieval is specific to source code. Documentation, configuration files, data files, and prose artifacts require different retrieval strategies. The benchmarks in this paper measure code retrieval only.&lt;/p&gt;

&lt;h3&gt;
  
  
  10.5 Retrieval Precision
&lt;/h3&gt;

&lt;p&gt;The benchmark measures token efficiency, not retrieval precision. Whether the top-3 retrieved symbols are the &lt;em&gt;correct&lt;/em&gt; symbols for a given query is a separate question. Independent evaluation (jMunchWorkbench) reports 96% precision on the same query corpus, but this metric is not the focus of this paper.&lt;/p&gt;

&lt;h3&gt;
  
  
  10.6 Single Tokenizer
&lt;/h3&gt;

&lt;p&gt;All token counts use &lt;code&gt;tiktoken&lt;/code&gt; with &lt;code&gt;cl100k_base&lt;/code&gt;. Claude and GPT tokenizers produce slightly different counts for the same input. We use &lt;code&gt;cl100k_base&lt;/code&gt; as a common reference point; relative ratios (AST vs. RAG) are stable across tokenizer choices.&lt;/p&gt;




&lt;h2&gt;
  
  
  11. Threats to Validity
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Internal validity.&lt;/strong&gt; The two-step token accounting (search 5 + fetch 3) inflates RAG's token count relative to a single-pass pipeline. We estimate this adds 30--40% to RAG's measured tokens. Even after adjusting, AST retrieval remains more efficient, but the margin narrows --- particularly on FastAPI, where the adjusted comparison would approach 1.1--1.2x.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Construct validity.&lt;/strong&gt; Token count is a proxy for cost and context window pressure, not a direct measure of retrieval quality. A system that uses fewer tokens but returns irrelevant code is worse. We do not measure retrieval precision comparatively in this benchmark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External validity.&lt;/strong&gt; Three web frameworks from one architectural pattern, tested with keyword queries, do not represent all codebases or query types. Generalization to monorepos, DSLs, metaprogrammed code, or natural-language queries is unvalidated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experimenter bias.&lt;/strong&gt; The AST retrieval system under test was developed by the author. The RAG baseline was implemented specifically for this comparison and was not optimized. A third-party replication using a production RAG pipeline would strengthen the findings.&lt;/p&gt;




&lt;h2&gt;
  
  
  12. Related Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CodeSearchNet&lt;/strong&gt; (Husain et al., 2019) established benchmarks for code search using natural language queries over function-level documentation. The retrieval unit is the function --- consistent with our approach --- but the search mechanism is embedding-based, not BM25 over symbol names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RepoMap&lt;/strong&gt; (Gauthier, 2023) uses tree-sitter to build repository outlines for LLM context, compressing file structure into tag-based summaries. This addresses the "what's in this repo" question but does not provide full source retrieval for individual symbols.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aider&lt;/strong&gt; (Gauthier, 2023) integrates repository maps with LLM code editing. Its &lt;code&gt;--map-tokens&lt;/code&gt; budget controls how much structural context the LLM receives. This is complementary to symbol retrieval: the map provides orientation, symbol retrieval provides depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SWE-agent&lt;/strong&gt; (Yang et al., 2024) and &lt;strong&gt;SWE-bench&lt;/strong&gt; (Jimenez et al., 2024) evaluate LLM agents on real GitHub issues. These agents use file-level tools (open, scroll, search) that operate at a coarser granularity than symbol retrieval. Integrating symbol-level tools into SWE-agent's action space is a natural extension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GraphCodeBERT&lt;/strong&gt; (Guo et al., 2021) and &lt;strong&gt;UniXcoder&lt;/strong&gt; (Guo et al., 2022) use data flow and AST structure during pre-training to improve code understanding. These models could serve as embedding backends for a hybrid approach: AST-structured retrieval with semantic re-ranking.&lt;/p&gt;




&lt;h2&gt;
  
  
  13. Conclusion
&lt;/h2&gt;

&lt;p&gt;The standard approach to code retrieval for LLM agents --- chunking source files into fixed-size text windows and retrieving by vector similarity --- is structurally mismatched to code. Code has natural boundaries (functions, classes, methods) that chunking ignores. The result is wasted tokens, fragmented context, and unnecessary infrastructure. The RAG ecosystem's own movement toward AST-aware chunking implicitly acknowledges this mismatch.&lt;/p&gt;

&lt;p&gt;AST-based symbol retrieval takes the idea to its logical conclusion: the retrieval unit is the symbol, not the chunk. The results on three web framework codebases are concrete: 1.6--3.9x fewer tokens per query than a naive fixed-chunk RAG pipeline, zero chunk-boundary artifacts, no embedding model, and sub-5ms query latency. End-to-end A/B tests on a production codebase confirm 20% cost savings in real agentic workflows, though with accuracy tradeoffs on fine-grained classification tasks that warrant further investigation.&lt;/p&gt;

&lt;p&gt;These results have clear scope limitations: three repos from one architectural niche, keyword queries that favor BM25, and a RAG baseline that does not represent production-grade retrieval. The structural argument --- that code retrieval should respect syntactic boundaries --- is stronger than the specific numbers, and holds regardless of whether the search mechanism is BM25, dense embeddings, or a hybrid.&lt;/p&gt;

&lt;p&gt;The issue is not the model. It is how we feed it.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Husain, H., Wu, H.-H., Gazit, T., Allamanis, M., &amp;amp; Brockschmidt, M. (2019). CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. &lt;em&gt;arXiv:1909.09436&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Gauthier, P. (2023). Aider: AI pair programming in your terminal. &lt;a href="https://aider.chat" rel="noopener noreferrer"&gt;https://aider.chat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Yang, J., Jimenez, C. E., Wettig, A., et al. (2024). SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. &lt;em&gt;arXiv:2405.15793&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Jimenez, C. E., Yang, J., Wettig, A., et al. (2024). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? &lt;em&gt;ICLR 2024&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Guo, D., Ren, S., Lu, S., et al. (2021). GraphCodeBERT: Pre-training Code Representations with Data Flow. &lt;em&gt;ICLR 2021&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Guo, D., Lu, S., Duan, N., et al. (2022). UniXcoder: Unified Cross-Modal Pre-training for Code Representation. &lt;em&gt;ACL 2022&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Brunsfeld, M., et al. (2024). Tree-sitter: An incremental parsing system for programming tools. &lt;a href="https://tree-sitter.github.io" rel="noopener noreferrer"&gt;https://tree-sitter.github.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. &lt;em&gt;NeurIPS 2020&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Gravelle, J. (2026). jMunch Retrieval Interface (jMRI) Specification. &lt;a href="https://github.com/jgravelle/mcp-retrieval-spec" rel="noopener noreferrer"&gt;https://github.com/jgravelle/mcp-retrieval-spec&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Appendix A: Reproduction Instructions
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;jcodemunch-mcp tiktoken

&lt;span class="c"&gt;# Index the three canonical repos&lt;/span&gt;
jcodemunch index_repo expressjs/express
jcodemunch index_repo fastapi/fastapi
jcodemunch index_repo gin-gonic/gin

&lt;span class="c"&gt;# Run AST benchmark (prints markdown table + grand summary)&lt;/span&gt;
python benchmarks/harness/run_benchmark.py

&lt;span class="c"&gt;# Run RAG baseline (requires additional deps)&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; benchmarks/requirements-rag-bench.txt
python benchmarks/harness/run_rag_baseline.py

&lt;span class="c"&gt;# Write results to files&lt;/span&gt;
python benchmarks/harness/run_benchmark.py &lt;span class="nt"&gt;--out&lt;/span&gt; benchmarks/results.md
python benchmarks/harness/run_rag_baseline.py &lt;span class="nt"&gt;--out&lt;/span&gt; benchmarks/rag_baseline_results.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both harnesses read from the same &lt;code&gt;IndexStore&lt;/code&gt;, guaranteeing identical file sets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix B: Raw Data Availability
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AST benchmark results: &lt;code&gt;benchmarks/results.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;RAG baseline results: &lt;code&gt;benchmarks/rag_baseline_results.md&lt;/code&gt; and &lt;code&gt;rag_baseline_results.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Task corpus: &lt;code&gt;benchmarks/tasks.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A/B test reports: &lt;code&gt;benchmarks/ab-test-naming-audit-2026-03-18.md&lt;/code&gt;, &lt;code&gt;benchmarks/ab-test-dead-code-2026-03-18.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A/B test raw data: &lt;a href="https://gist.github.com/Mharbulous/bb097396fa92ef1d34d03a72b56b2c61" rel="noopener noreferrer"&gt;https://gist.github.com/Mharbulous/bb097396fa92ef1d34d03a72b56b2c61&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source code: &lt;a href="https://github.com/jgravelle/jcodemunch-mcp" rel="noopener noreferrer"&gt;https://github.com/jgravelle/jcodemunch-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>astcoderetrieval</category>
      <category>tokenefficientrag</category>
      <category>langchainragvsast</category>
      <category>mcpsymbolretrieval</category>
    </item>
    <item>
      <title>Auto-Generate &amp; Sync Searchable Code Docs in Notion from Any Repo – Token-Efficient with Claude &amp; MCP</title>
      <dc:creator>J. Gravelle</dc:creator>
      <pubDate>Sun, 22 Mar 2026 15:58:23 +0000</pubDate>
      <link>https://dev.to/jgravelle/notioncodemirror-2ekp</link>
      <guid>https://dev.to/jgravelle/notioncodemirror-2ekp</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/notion-2026-03-04"&gt;Notion MCP Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3k2zsn6j5o8p88b24r8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3k2zsn6j5o8p88b24r8n.png" alt="NotionCodeMirror logo" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;NotionCodeMirror — a CLI that auto-generates a living code documentation workspace in Notion from any GitHub repo, and keeps it in sync as the code evolves.&lt;/p&gt;

&lt;p&gt;Point it at a repo, and within minutes you get a fully structured Notion workspace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Overview with language breakdown and symbol inventory&lt;/li&gt;
&lt;li&gt;An Architecture page written in real prose by Claude&lt;/li&gt;
&lt;li&gt;A searchable API Reference database populated with every function and class&lt;/li&gt;
&lt;li&gt;A module page for each top-level directory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it again after a PR merges, and only the changed pages update.&lt;/p&gt;

&lt;p&gt;The core idea is multi-MCP orchestration.&lt;/p&gt;

&lt;p&gt;Phase 1 uses jcodemunch-mcp to analyze the codebase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extracts symbols&lt;/li&gt;
&lt;li&gt;Ranks them by import-graph centrality&lt;/li&gt;
&lt;li&gt;Traces dependency edges&lt;/li&gt;
&lt;li&gt;Builds class hierarchies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this happens without involving Claude.&lt;/p&gt;

&lt;p&gt;Phase 2 hands Claude a compact structured digest (about 8–12K tokens) instead of raw source files, so it can focus entirely on synthesis and writing.&lt;/p&gt;

&lt;p&gt;The API Reference database is batch-populated directly via HTTP, bypassing the agent loop for large inserts (100+ rows at a time).&lt;/p&gt;
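&lt;p&gt;A sketch of what those batch inserts might look like. The property names (&lt;code&gt;Name&lt;/code&gt;, &lt;code&gt;Kind&lt;/code&gt;, &lt;code&gt;File&lt;/code&gt;) are hypothetical; the envelope follows the shape of the public Notion create-page API.&lt;/p&gt;

```python
# Hypothetical sketch of Phase 2's batch-insert payloads. Property names are
# invented; the parent/properties envelope follows the public Notion API.

def symbol_row_payload(database_id, symbol):
    return {
        "parent": {"database_id": database_id},
        "properties": {
            "Name": {"title": [{"text": {"content": symbol["name"]}}]},
            "Kind": {"select": {"name": symbol["kind"]}},
            "File": {"rich_text": [{"text": {"content": symbol["file"]}}]},
        },
    }

rows = [
    symbol_row_payload("db-123", s)
    for s in [{"name": "create_app", "kind": "function", "file": "app.py"}]
]
# Each payload would be POSTed to the Notion pages endpoint with an
# Authorization header and a Notion-Version header.
```

&lt;p&gt;Building payloads in Python and looping over plain HTTP keeps the 100+ row inserts out of Claude's context entirely.&lt;/p&gt;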

&lt;p&gt;A full run on a medium-sized repo costs roughly 10–15K Claude tokens — closer to a single conversation than a traditional indexing job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=C99oAE69Og0" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=C99oAE69Og0&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Show us the code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/jgravelle/notion-code-mirror" rel="noopener noreferrer"&gt;https://github.com/jgravelle/notion-code-mirror&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ...and the results:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.notion.so/jcodemunch-mcp-CodeMirror-32b752802a7f812da7bee46c5460beb1" rel="noopener noreferrer"&gt;https://www.notion.so/jcodemunch-mcp-CodeMirror-32b752802a7f812da7bee46c5460beb1&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Notion MCP
&lt;/h2&gt;

&lt;p&gt;Notion MCP is the write layer of the pipeline.&lt;/p&gt;

&lt;p&gt;After Claude analyzes the gathered repo data, it calls four lightweight Python tools: &lt;code&gt;notion_create_page&lt;/code&gt;, &lt;code&gt;notion_create_database&lt;/code&gt;, &lt;code&gt;notion_update_page&lt;/code&gt;, and &lt;code&gt;done&lt;/code&gt;. Each of those dispatches to the Notion MCP server via an async stdio session — the same MCP transport pattern used on the code-analysis side with jcodemunch-mcp. Both MCP servers stay open concurrently for the duration of the run.&lt;/p&gt;


&lt;p&gt;The API Reference rows are the one exception: Claude creates the database shell and signals completion via done, then Python batch-inserts every symbol row through the same Notion MCP session — keeping Claude out of a loop that would otherwise cost 100+ tool-call round-trips.&lt;/p&gt;

&lt;p&gt;What Notion MCP specifically unlocks is direct writing into a structured workspace instead of dumping out a Markdown file you then have to paste somewhere. Claude doesn’t just generate text. It decides the page hierarchy, picks emoji icons, chooses what belongs in a database versus a prose page, and places everything under the right parent. That produces a workspace that is immediately navigable and shareable rather than a lonely text artifact drifting around your desktop.&lt;/p&gt;

&lt;p&gt;The incremental sync story also depends on MCP making page IDs first-class. On the first run, every created page and database ID is saved to a local state file. On --sync, those IDs go back into Claude’s context, and it calls notion_update_page on the existing objects instead of duplicating them.&lt;/p&gt;

&lt;p&gt;Without MCP as the integration layer, you’d need a custom Notion client plus a fair amount of bookkeeping to get the same result...&lt;/p&gt;

&lt;p&gt;-jgravelle&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>notionchallenge</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Bringing The Receipts - 95% AI LLM Token Savings</title>
      <dc:creator>J. Gravelle</dc:creator>
      <pubDate>Thu, 19 Mar 2026 18:42:25 +0000</pubDate>
      <link>https://dev.to/jgravelle/bringing-the-receipts-95-ai-llm-token-savings-1eni</link>
      <guid>https://dev.to/jgravelle/bringing-the-receipts-95-ai-llm-token-savings-1eni</guid>
      <description>&lt;h1&gt;
  
  
  95% Token Reduction, 96% Precision:
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Benchmarking jCodeMunch Against Chunk RAG and Naive File Reading
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Here are the benchmarks... AND the bench!
&lt;/h3&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Across 15 tasks on 3 real repos, structured MCP symbol retrieval achieves &lt;strong&gt;95% avg token reduction&lt;/strong&gt; vs naive file reading — while hitting &lt;strong&gt;96% precision&lt;/strong&gt; vs &lt;strong&gt;74% for chunk RAG&lt;/strong&gt;. The benchmark harness is open-source. You can reproduce every number in under 5 minutes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;This is a follow-up to &lt;a href="https://dev.to/jgravelle/your-ai-agent-is-dumpster-diving-through-your-code-326f"&gt;Your AI Agent Is Dumpster Diving Through Your Code&lt;/a&gt; — the argument there is the setup for the proof here. Worth 5 minutes if you haven't read it.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Must-read setup:&lt;/strong&gt; The first article lays out &lt;em&gt;why&lt;/em&gt; file-reading agents waste tokens structurally — not because they're badly written, but because reading whole files is the wrong unit of retrieval for code. This article is the empirical test of that claim.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Last time, I argued that AI coding agents waste absurd amounts of tokens rummaging through whole files and sloppy chunks. Fair enough. Big claim.&lt;/p&gt;

&lt;p&gt;So here are the receipts: the benchmark results, the methodology, and the free open-source tool we built so anyone can test the same patterns for themselves.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers, Up Front
&lt;/h2&gt;

&lt;p&gt;Because you shouldn't have to spelunk for them:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline tokens (15 task-runs, 3 repos)&lt;/td&gt;
&lt;td&gt;1,865,210&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jCodeMunch tokens (same tasks)&lt;/td&gt;
&lt;td&gt;92,515&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average reduction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;95.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ratio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20.2x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3 repos. 5 queries each. 15 total task-runs. Measured with tiktoken &lt;code&gt;cl100k_base&lt;/code&gt;. Last run: March 2026.&lt;/strong&gt; Baseline is all indexed source files concatenated — the minimum tokens a "read everything first" agent would consume in a single pass. Real agents re-read files and explore multiple branches, so production savings are higher.&lt;/p&gt;

&lt;p&gt;The single most dramatic example: for the query &lt;code&gt;"dependency injection"&lt;/code&gt; against the FastAPI codebase, a standard file-reading agent consumed &lt;strong&gt;214,312 tokens&lt;/strong&gt; and cost ~$1.00. The same query through jCodeMunch returned the exact symbols in &lt;strong&gt;480 tokens&lt;/strong&gt; at ~$0.002.&lt;/p&gt;

&lt;p&gt;That's not a rounding error. That's a different category of efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs68inbcvw48yn0hoqclv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs68inbcvw48yn0hoqclv.png" alt="jMunchWorkbench side-by-side output: query results, token counts, relevance scores, latency comparison — baseline vs jCodeMunch" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Side-by-side terminal output: standard agent greps 156 files, reads 3 of them, burns 214K tokens. jCodeMunch: one search call, one get_symbol call, 480 tokens. Same answer.)&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How the MCP Symbol Retrieval Benchmark Works
&lt;/h2&gt;

&lt;p&gt;Three approaches compared on three real public codebases (expressjs/express, fastapi/fastapi, gin-gonic/gin). Same queries against each.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 1: Naive File Reading
&lt;/h3&gt;

&lt;p&gt;The agent reads all source files. This is what agents do when they have no retrieval layer — grep for a keyword, open the files that matched, read them in full. It's also the cleanest possible baseline: we concatenate all source files and count the tokens once. That number is a &lt;em&gt;lower bound&lt;/em&gt; on what a real "open everything" agent pays per session.&lt;/p&gt;
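&lt;p&gt;Computing that lower bound takes only a few lines. A sketch on a throwaway two-file repo, with a whitespace split standing in for the tiktoken &lt;code&gt;cl100k_base&lt;/code&gt; encoder the real harness uses:&lt;/p&gt;

```python
import pathlib
import tempfile

def baseline_tokens(repo_dir, count_tokens):
    # Lower bound: every indexed source file, read and counted once.
    files = sorted(pathlib.Path(repo_dir).rglob("*.py"))
    return sum(count_tokens(p.read_text()) for p in files)

# Demo on a throwaway two-file "repo"; a whitespace split stands in
# for the tiktoken cl100k_base encoder used in the real harness.
with tempfile.TemporaryDirectory() as repo:
    pathlib.Path(repo, "a.py").write_text("def a(): return 1")
    pathlib.Path(repo, "b.py").write_text("def b(): return 2")
    print(baseline_tokens(repo, lambda text: len(text.split())))
```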

&lt;h3&gt;
  
  
  Approach 2: Chunk-Based RAG
&lt;/h3&gt;

&lt;p&gt;Files are split into overlapping windows of text, embedded, and ranked by similarity to the query. The top chunks are returned. It's cheaper than naive — but chunk boundaries fall in the middle of functions, and similarity ranking is approximate by design. You pay less &lt;em&gt;and&lt;/em&gt; you get less precise results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 3: Tree-Sitter Structured Retrieval (jCodeMunch via jMRI)
&lt;/h3&gt;

&lt;p&gt;Every source file is parsed into an AST-derived index of named, addressable symbols: functions, classes, methods, constants. A search query returns symbol IDs and metadata — not source. A retrieve call on a specific ID returns exactly that symbol's source, byte-offset-addressed directly from the original file. No chunks. No boundaries. No guessing.&lt;/p&gt;

&lt;p&gt;The workflow measured:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;search_symbols(query, max_results=5)&lt;/code&gt; → ranked symbol metadata (IDs + signatures)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_symbol(id)&lt;/code&gt; × 3 → exact source for the top 3 hits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total tokens&lt;/strong&gt; = search response + 3 × symbol source&lt;/li&gt;
&lt;/ol&gt;
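&lt;p&gt;The accounting itself is a one-liner. A sketch where &lt;code&gt;count_tokens&lt;/code&gt; stands in for tiktoken's &lt;code&gt;cl100k_base&lt;/code&gt; encoder and the strings stand in for real MCP responses:&lt;/p&gt;

```python
def count_tokens(text):
    # Whitespace stand-in for tiktoken's cl100k_base encoder;
    # enough to show the accounting, not the real token counts.
    return len(text.split())

def workflow_cost(search_response, symbol_sources):
    # Total tokens = search response plus each retrieved symbol body.
    return count_tokens(search_response) + sum(count_tokens(s) for s in symbol_sources)

# Illustrative stand-ins for real MCP responses.
search = "id=oauth2.py::OAuth2PasswordBearer score=0.91"
symbols = ["def a(): pass", "def b(): pass", "class C: pass"]
print(workflow_cost(search, symbols))
```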




&lt;h2&gt;
  
  
  Introducing jMunchWorkbench: The Measuring Stick
&lt;/h2&gt;

&lt;p&gt;The benchmark harness measures token efficiency. But token efficiency is only half the story — the other half is: &lt;em&gt;did you actually retrieve the right code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's what &lt;strong&gt;&lt;a href="https://github.com/jgravelle/jMunchWorkbench" rel="noopener noreferrer"&gt;jMunchWorkbench&lt;/a&gt;&lt;/strong&gt; is for.&lt;/p&gt;

&lt;p&gt;jMunchWorkbench runs the same prompt in two modes — baseline file reading and jCodeMunch symbol retrieval — and compares answers, token counts, and latency side by side. A human evaluator judges whether the top-3 retrieved symbols are relevant to the query intent. That's how the 96% precision figure is generated.&lt;/p&gt;
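&lt;p&gt;The precision figure itself is just relevant judgments over total judgments; a sketch with made-up judgments:&lt;/p&gt;

```python
def precision(judgments):
    # Fraction of judged retrievals marked relevant (1) vs not (0).
    return sum(judgments) / len(judgments)

# Illustrative: 24 of 25 judged top-3 retrievals marked relevant.
print(precision([1] * 24 + [0]))  # 0.96
```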

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp2ymkw3z4s2n7uqrzgw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp2ymkw3z4s2n7uqrzgw.png" alt="jMunchWorkbench side-by-side output: query results, token counts, relevance scores, latency comparison — baseline vs jCodeMunch" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the measuring stick, not just the number. You are not being asked to trust a claim. You are being given the instrument we used to make the claim, and you can run it yourself.&lt;/p&gt;

&lt;p&gt;"We got 96% precision" is a marketing assertion. "Here is the evaluator, here are the queries, here is the ground truth, reproduce it" is a methodology.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Benchmark: Tree-Sitter Retrieval vs Naive Reading
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tokenizer:&lt;/strong&gt; &lt;code&gt;tiktoken cl100k_base&lt;/code&gt; | &lt;strong&gt;Workflow:&lt;/strong&gt; &lt;code&gt;search_symbols&lt;/code&gt; (top 5) + &lt;code&gt;get_symbol&lt;/code&gt; × 3 | &lt;strong&gt;Last run:&lt;/strong&gt; March 2026. Per-repo Ratio is the mean of the five per-query ratios; the grand-total row divides total baseline tokens by total jMunch tokens, which is why it reads lower.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repo&lt;/th&gt;
&lt;th&gt;Files&lt;/th&gt;
&lt;th&gt;Symbols&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;th&gt;jMunch avg tokens&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;expressjs/express&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;117&lt;/td&gt;
&lt;td&gt;73,838&lt;/td&gt;
&lt;td&gt;~1,166&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;98.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;129.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fastapi/fastapi&lt;/td&gt;
&lt;td&gt;156&lt;/td&gt;
&lt;td&gt;1,359&lt;/td&gt;
&lt;td&gt;214,312&lt;/td&gt;
&lt;td&gt;~15,609&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92.7%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;49.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin-gonic/gin&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;805&lt;/td&gt;
&lt;td&gt;84,892&lt;/td&gt;
&lt;td&gt;~1,728&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;98.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Grand total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;230&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2,281&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,865,210&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92,515&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;95.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20.2x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The FastAPI numbers are the "worst" case — and they still show 92.7% reduction. FastAPI is the largest and most symbol-dense codebase of the three (156 files, 1,359 symbols). Broad queries like &lt;code&gt;"router route handler"&lt;/code&gt; pull in more symbols and more source. Specific queries like &lt;code&gt;"error exception"&lt;/code&gt; and &lt;code&gt;"context bind"&lt;/code&gt; return surgical results even on a large codebase, hitting 99% reduction.&lt;/p&gt;

&lt;p&gt;Per-query detail: fastapi/fastapi&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;th&gt;jMunch tokens&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;router route handler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;214,312&lt;/td&gt;
&lt;td&gt;43,474&lt;/td&gt;
&lt;td&gt;79.7%&lt;/td&gt;
&lt;td&gt;4.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;middleware&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;214,312&lt;/td&gt;
&lt;td&gt;24,271&lt;/td&gt;
&lt;td&gt;88.7%&lt;/td&gt;
&lt;td&gt;8.8x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error exception&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;214,312&lt;/td&gt;
&lt;td&gt;2,233&lt;/td&gt;
&lt;td&gt;99.0%&lt;/td&gt;
&lt;td&gt;96.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;214,312&lt;/td&gt;
&lt;td&gt;5,966&lt;/td&gt;
&lt;td&gt;97.2%&lt;/td&gt;
&lt;td&gt;35.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context bind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;214,312&lt;/td&gt;
&lt;td&gt;2,102&lt;/td&gt;
&lt;td&gt;99.0%&lt;/td&gt;
&lt;td&gt;102.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Per-query detail: expressjs/express&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;th&gt;jMunch tokens&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;router route handler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;73,838&lt;/td&gt;
&lt;td&gt;1,221&lt;/td&gt;
&lt;td&gt;98.3%&lt;/td&gt;
&lt;td&gt;60.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;middleware&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;73,838&lt;/td&gt;
&lt;td&gt;1,360&lt;/td&gt;
&lt;td&gt;98.2%&lt;/td&gt;
&lt;td&gt;54.3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error exception&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;73,838&lt;/td&gt;
&lt;td&gt;1,381&lt;/td&gt;
&lt;td&gt;98.1%&lt;/td&gt;
&lt;td&gt;53.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;73,838&lt;/td&gt;
&lt;td&gt;1,699&lt;/td&gt;
&lt;td&gt;97.7%&lt;/td&gt;
&lt;td&gt;43.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context bind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;73,838&lt;/td&gt;
&lt;td&gt;169&lt;/td&gt;
&lt;td&gt;99.8%&lt;/td&gt;
&lt;td&gt;436.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Per-query detail: gin-gonic/gin&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Baseline tokens&lt;/th&gt;
&lt;th&gt;jMunch tokens&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;router route handler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;84,892&lt;/td&gt;
&lt;td&gt;1,355&lt;/td&gt;
&lt;td&gt;98.4%&lt;/td&gt;
&lt;td&gt;62.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;middleware&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;84,892&lt;/td&gt;
&lt;td&gt;2,178&lt;/td&gt;
&lt;td&gt;97.4%&lt;/td&gt;
&lt;td&gt;39.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error exception&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;84,892&lt;/td&gt;
&lt;td&gt;1,470&lt;/td&gt;
&lt;td&gt;98.3%&lt;/td&gt;
&lt;td&gt;57.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;84,892&lt;/td&gt;
&lt;td&gt;1,642&lt;/td&gt;
&lt;td&gt;98.1%&lt;/td&gt;
&lt;td&gt;51.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context bind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;84,892&lt;/td&gt;
&lt;td&gt;1,994&lt;/td&gt;
&lt;td&gt;97.7%&lt;/td&gt;
&lt;td&gt;42.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
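&lt;p&gt;The Reduction and Ratio columns in these tables are plain arithmetic; a quick check against the FastAPI &lt;code&gt;error exception&lt;/code&gt; row:&lt;/p&gt;

```python
def reduction_pct(baseline, retrieved):
    # Percent of baseline tokens avoided.
    return round(100 * (1 - retrieved / baseline), 1)

def ratio(baseline, retrieved):
    # How many times cheaper the retrieval path is.
    return round(baseline / retrieved, 1)

# FastAPI "error exception" row from the per-query table above.
print(reduction_pct(214_312, 2_233))  # 99.0
print(ratio(214_312, 2_233))          # 96.0
```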




&lt;h2&gt;
  
  
  Three-Way Comparison: Naive vs Chunk RAG vs Structured Retrieval
&lt;/h2&gt;

&lt;p&gt;The summary table above shows naive vs jCodeMunch. Here's where chunk RAG fits. These numbers are from jMunchWorkbench precision evaluation runs on the FastAPI codebase — the largest and hardest test case. The naive baseline here reflects actual agent-session overhead (multiple file reads across a session), which is why it differs from the concatenated baseline above.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Avg tokens (FastAPI)&lt;/th&gt;
&lt;th&gt;Cost/query&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Naive file reading&lt;/td&gt;
&lt;td&gt;949,904&lt;/td&gt;
&lt;td&gt;~$2.85&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunk-based RAG&lt;/td&gt;
&lt;td&gt;330,372&lt;/td&gt;
&lt;td&gt;~$0.99&lt;/td&gt;
&lt;td&gt;74%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;jCodeMunch (structured)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;480&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$0.0014&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The conventional assumption is that you trade precision for efficiency — chunk RAG is "cheaper but less accurate than reading everything." &lt;strong&gt;jCodeMunch inverts that tradeoff: cheaper than both, and more accurate than chunks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;96% precision vs 74% is not a marginal improvement. That's the difference between an agent that finds the right function 19 times out of 20, and one that finds it 15 times out of 20 — with 4 wasted retrieval roundtrips per 20 queries at chunk-RAG prices.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World A/B Tests: Reduce Claude Code Token Usage in Production
&lt;/h2&gt;

&lt;p&gt;Synthetic benchmark queries are necessary for reproducibility, but they're not sufficient. You need to know what happens in production.&lt;/p&gt;

&lt;p&gt;Big thanks to community member &lt;strong&gt;&lt;a href="https://github.com/Mharbulous" rel="noopener noreferrer"&gt;@Mharbulous&lt;/a&gt;&lt;/strong&gt;, who ran two independent 50-iteration A/B tests on a real Vue 3 + Firebase production codebase (Vite, Vuetify 3, Cloud Functions) and open-sourced the raw data. Same task, same model (Claude Sonnet), randomized tool assignment — this is the kind of rigorous community testing that keeps benchmark authors honest.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test 1: Naming Audit Task
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Native tools&lt;/th&gt;
&lt;th&gt;jCodeMunch&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Success rate&lt;/td&gt;
&lt;td&gt;72%&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+8 pp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timeout rate&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−8 pp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean cost/iteration&lt;/td&gt;
&lt;td&gt;$0.783&lt;/td&gt;
&lt;td&gt;$0.738&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−5.7%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean cache creation tokens&lt;/td&gt;
&lt;td&gt;104,135&lt;/td&gt;
&lt;td&gt;93,178&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−10.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Test 2: Dead Code Detection (Isolated Tool-Layer Cost)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Native tools&lt;/th&gt;
&lt;th&gt;jCodeMunch&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mean cost/iteration&lt;/td&gt;
&lt;td&gt;$0.4474&lt;/td&gt;
&lt;td&gt;$0.3560&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−20.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean total tokens&lt;/td&gt;
&lt;td&gt;449,356&lt;/td&gt;
&lt;td&gt;289,275&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−36%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean duration&lt;/td&gt;
&lt;td&gt;129s&lt;/td&gt;
&lt;td&gt;117s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−9%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dead file F1&lt;/td&gt;
&lt;td&gt;95.8%&lt;/td&gt;
&lt;td&gt;95.7%&lt;/td&gt;
&lt;td&gt;equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Cost savings in the dead code test: Wilcoxon p=0.0074, Cohen's d=−0.583.&lt;/strong&gt; Statistically significant.&lt;/p&gt;
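&lt;p&gt;For readers who want to sanity-check the effect size: one common paired-samples formulation of Cohen's d is the mean per-iteration difference divided by its standard deviation. A minimal sketch with made-up numbers, not the A/B data:&lt;/p&gt;

```python
import statistics

def cohens_d_paired(xs, ys):
    # Paired-samples effect size: mean(diff) / stdev(diff).
    diffs = [y - x for x, y in zip(xs, ys)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

# Illustrative per-iteration costs (native vs jCodeMunch), NOT the real data.
native = [0.45, 0.44, 0.47, 0.43, 0.46]
jmunch = [0.36, 0.35, 0.37, 0.34, 0.38]
print(cohens_d_paired(native, jmunch))
```

&lt;p&gt;A negative d means the jCodeMunch arm cost less per iteration; the Wilcoxon signed-rank p-value in the study is computed over the same paired differences.&lt;/p&gt;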

&lt;p&gt;The mechanism is direct: structured queries return smaller payloads than raw file reads — 39% fewer cache reads, lower cost, faster completion. Dead file detection is equivalent at ~96% F1 with no accuracy penalty.&lt;/p&gt;

&lt;p&gt;One honest note: there's an export-level classification gap (alive-file exports) that @Mharbulous surfaced. Three root causes were found and addressed in v1.8.1. The data is in the repo — not buried, not redacted.&lt;/p&gt;

&lt;p&gt;Raw data: &lt;a href="https://gist.github.com/Mharbulous/bb097396fa92ef1d34d03a72b56b2c61" rel="noopener noreferrer"&gt;gist.github.com/Mharbulous/bb097396fa92ef1d34d03a72b56b2c61&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cw3sso491wzh8jbdves.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cw3sso491wzh8jbdves.png" alt="Live telemetry counter showing community-wide token savings — billions saved across participating jCodeMunch sessions" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(The live counter at j.gravelle.us/jCodeMunch pulls real session telemetry across all participating users — every &lt;code&gt;get_symbol&lt;/code&gt; call reports &lt;code&gt;tokens_saved&lt;/code&gt; locally. No estimation. The number you see is computed from actual session data.)&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Tree-Sitter Beats Chunk RAG for AI Coding Agents
&lt;/h2&gt;

&lt;p&gt;The difference is not magic. It's granularity.&lt;/p&gt;

&lt;p&gt;Chunk RAG cuts files into overlapping text windows and ranks them by embedding similarity. It has two structural problems: boundaries fall in the middle of functions (so you get partial logic), and similarity ranking returns things that &lt;em&gt;look like&lt;/em&gt; the answer rather than things that &lt;em&gt;are&lt;/em&gt; the answer.&lt;/p&gt;

&lt;p&gt;Structured retrieval works at the right level of abstraction from the start. jCodeMunch uses tree-sitter to parse source into a symbol index. Each symbol has a deterministic ID and a byte offset in the original file. Search returns IDs. Retrieval seeks directly to the byte offset — O(1) access, no re-scanning, no boundary accidents. You get the whole function or class, exactly as written, every time.&lt;/p&gt;
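&lt;p&gt;A sketch of the idea, assuming an index that already maps symbol IDs to byte spans; the names here are illustrative, not jCodeMunch's actual internals:&lt;/p&gt;

```python
# Build a tiny source file and an index entry for one symbol.
source = "# header comment\ndef create_app():\n    return 'app'\n"
with open("demo.py", "wb") as f:
    f.write(source.encode("utf-8"))

# At index time, the parser would supply these byte offsets.
start = source.index("def create_app")
index = {"demo.py::create_app#function": ("demo.py", start, len(source))}

def get_symbol(symbol_id):
    path, s, e = index[symbol_id]
    with open(path, "rb") as f:
        f.seek(s)                # jump straight to the symbol's first byte
        return f.read(e - s).decode("utf-8")  # read exactly the symbol's span

print(get_symbol("demo.py::create_app#function"))
```

&lt;p&gt;No scan, no re-parse: one seek, one read, one complete syntactic unit.&lt;/p&gt;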

&lt;p&gt;The precision gap (96% vs 74%) is a structural consequence of this. When you retrieve a symbol by AST-derived ID, you get the complete logical unit. When you retrieve by similarity score, you get a text window that may or may not contain it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reproduce It in Under 5 Minutes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;jcodemunch-mcp tiktoken

&lt;span class="c"&gt;# Index the three benchmark repos&lt;/span&gt;
jcodemunch index_repo expressjs/express
jcodemunch index_repo fastapi/fastapi
jcodemunch index_repo gin-gonic/gin

&lt;span class="c"&gt;# Run the benchmark&lt;/span&gt;
python benchmarks/harness/run_benchmark.py

&lt;span class="c"&gt;# Write results to markdown&lt;/span&gt;
python benchmarks/harness/run_benchmark.py &lt;span class="nt"&gt;--out&lt;/span&gt; my_results.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The harness reads &lt;code&gt;tasks.json&lt;/code&gt; (the same five queries we used), calls &lt;code&gt;search_symbols&lt;/code&gt; and &lt;code&gt;get_symbol&lt;/code&gt;, counts tokens with tiktoken, and writes the same markdown tables you see in &lt;code&gt;results.md&lt;/code&gt;. Swap in your own repos, add your own queries, change the number of symbols fetched. The full methodology is in &lt;code&gt;METHODOLOGY.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For precision measurement, run jMunchWorkbench — same query set, human-evaluable relevance scoring, side-by-side comparison output.&lt;/p&gt;

&lt;p&gt;If you think the results are great: reproduce them.&lt;br&gt;
If you think they're flawed: reproduce them harder.&lt;/p&gt;

&lt;p&gt;That's why we built the bench.&lt;/p&gt;


&lt;h2&gt;
  
  
  Try jCodeMunch: Reduce Claude Code Token Usage Now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Free for non-commercial use&lt;/strong&gt; (personal projects, research, hobby). One-minute setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx jcodemunch-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to &lt;code&gt;~/.claude.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"jcodemunch-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"jcodemunch-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Free forever for personal/hobby use. Team and org licenses start at &lt;strong&gt;$79 (Builder, 1 dev) / $349 (Studio, up to 5 devs) / $1,999 (Platform, org-wide)&lt;/strong&gt; — &lt;a href="https://j.gravelle.us/jCodeMunch/" rel="noopener noreferrer"&gt;see pricing&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/jgravelle/jcodemunch-mcp" rel="noopener noreferrer"&gt;github.com/jgravelle/jcodemunch-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark harness&lt;/strong&gt;: &lt;code&gt;benchmarks/harness/run_benchmark.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Methodology&lt;/strong&gt;: &lt;code&gt;benchmarks/METHODOLOGY.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workbench&lt;/strong&gt;: &lt;a href="https://github.com/jgravelle/jMunchWorkbench" rel="noopener noreferrer"&gt;github.com/jgravelle/jMunchWorkbench&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open spec (jMRI)&lt;/strong&gt;: &lt;a href="https://github.com/jgravelle/mcp-retrieval-spec" rel="noopener noreferrer"&gt;github.com/jgravelle/mcp-retrieval-spec&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is this better than Cursor's built-in indexing?&lt;/strong&gt;&lt;br&gt;
Cursor's indexing is optimized for autocomplete and inline suggestions — it uses chunk embeddings at the file level. jCodeMunch is optimized for agent retrieval: symbol-level precision, AST-derived IDs, O(1) byte-offset access. Different tools, different problems. If you're running agents that need to retrieve specific functions or classes from large repos, jCodeMunch is purpose-built for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does it work with Gemini, Antigravity, or Cursor in agent mode?&lt;/strong&gt;&lt;br&gt;
Yes. jCodeMunch implements the Model Context Protocol (MCP), which works with any MCP-compatible client: Claude Code, Google Antigravity, Cursor (agent/composer mode), and anything else that speaks MCP. Setup is identical across clients — add the server to your MCP config, restart, index a repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does byte-offset retrieval avoid chunk boundary issues?&lt;/strong&gt;&lt;br&gt;
When jCodeMunch indexes a file with tree-sitter, it stores each symbol's start and end byte positions in the original source file alongside the symbol ID. At retrieval time, &lt;code&gt;get_symbol&lt;/code&gt; opens the file, seeks directly to that byte offset, and reads exactly that many bytes. No file scanning. No chunking. No re-parsing. The result is always the complete syntactic unit — function body, class definition, or constant — as it appears in the original source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not just use a larger context window instead?&lt;/strong&gt;&lt;br&gt;
Context window cost scales linearly with tokens consumed. A 200K-token context window doesn't make 200K tokens cheaper — it just lets you burn more of them before hitting the limit. jCodeMunch keeps retrieval cost near zero by returning only the exact symbols the agent needs, regardless of repo size. The larger the repo, the bigger the advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this work with non-Python repos?&lt;/strong&gt;&lt;br&gt;
Yes. jCodeMunch uses tree-sitter for parsing, which supports 30+ languages including TypeScript, Go, Rust, Java, C/C++, Ruby, and more. The three benchmark repos span Python, JavaScript, and Go for exactly this reason.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Benchmark source: &lt;code&gt;benchmarks/harness/run_benchmark.py&lt;/code&gt; | Tokenizer: tiktoken &lt;code&gt;cl100k_base&lt;/code&gt; | Data: &lt;code&gt;benchmarks/results.md&lt;/code&gt; | Last run: March 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A/B test raw data by @Mharbulous: &lt;a href="https://gist.github.com/Mharbulous/bb097396fa92ef1d34d03a72b56b2c61" rel="noopener noreferrer"&gt;gist.github.com/Mharbulous/bb097396fa92ef1d34d03a72b56b2c61&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>mcp</category>
      <category>performance</category>
      <category>rag</category>
    </item>
    <item>
      <title>Your AI Agent Is Dumpster Diving Through Your Code...</title>
      <dc:creator>J. Gravelle</dc:creator>
      <pubDate>Mon, 09 Mar 2026 19:54:10 +0000</pubDate>
      <link>https://dev.to/jgravelle/your-ai-agent-is-dumpster-diving-through-your-code-326f</link>
      <guid>https://dev.to/jgravelle/your-ai-agent-is-dumpster-diving-through-your-code-326f</guid>
      <description>&lt;p&gt;&lt;em&gt;...and we built something to stop it.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There's a pattern every developer using AI agents eventually notices. You ask the agent to find where authentication is handled. It opens a file. Skims 2,000 lines. Opens another file. Skims that. Opens a third. By the time it answers, it's consumed 40,000 tokens — most of them irrelevant — and your context window is half-gone before the real work starts.&lt;/p&gt;

&lt;p&gt;We call this dumpster diving. The agent isn't reading strategically. It's digging through everything looking for something edible.&lt;/p&gt;

&lt;p&gt;We've been watching this happen across millions of sessions with &lt;a href="https://j.gravelle.us/jCodeMunch/" rel="noopener noreferrer"&gt;jCodeMunch&lt;/a&gt; and &lt;a href="https://j.gravelle.us/jDocMunch/" rel="noopener noreferrer"&gt;jDocMunch&lt;/a&gt;. And we built something to fix it: &lt;strong&gt;jMRI&lt;/strong&gt; — the jMunch Retrieval Interface.&lt;/p&gt;

&lt;p&gt;Today we're publishing the spec, the benchmark, and an open-source SDK. All Apache 2.0.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;We ran the benchmark against two real codebases: FastAPI and Flask. Three methods compared: naive file reading, chunk RAG, and jMRI retrieval via jCodeMunch. Ten queries per repo. Here's what came out:&lt;/p&gt;

&lt;h3&gt;
  
  
  FastAPI (~950K source tokens)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Avg Tokens&lt;/th&gt;
&lt;th&gt;Cost/Query&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Naive (read all files)&lt;/td&gt;
&lt;td&gt;949,904&lt;/td&gt;
&lt;td&gt;$2.85&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunk RAG&lt;/td&gt;
&lt;td&gt;330,372&lt;/td&gt;
&lt;td&gt;$0.99&lt;/td&gt;
&lt;td&gt;74%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;jMRI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;480&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0014&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Flask (~148K source tokens)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Avg Tokens&lt;/th&gt;
&lt;th&gt;Cost/Query&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Naive (read all files)&lt;/td&gt;
&lt;td&gt;147,854&lt;/td&gt;
&lt;td&gt;$0.44&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunk RAG&lt;/td&gt;
&lt;td&gt;55,251&lt;/td&gt;
&lt;td&gt;$0.17&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;jMRI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;480&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0014&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;jMRI uses &lt;strong&gt;1,979x fewer tokens than naive&lt;/strong&gt; on FastAPI. It also &lt;strong&gt;beats chunk RAG on precision&lt;/strong&gt; — 96% vs 74%.&lt;/p&gt;

&lt;p&gt;That last point matters. The usual assumption is that precision is the tradeoff you make for efficiency. Chunk RAG is cheaper than naive but misses more. jMRI is cheaper than both and misses less. That's not a coincidence — it's a consequence of using structure instead of text similarity.&lt;/p&gt;

&lt;p&gt;Reproduce it yourself in under 5 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/jgravelle/mcp-retrieval-spec
&lt;span class="nb"&gt;cd &lt;/span&gt;mcp-retrieval-spec/benchmark/munch-benchmark
python benchmark.py &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why Chunk RAG Loses on Precision
&lt;/h2&gt;

&lt;p&gt;Chunk RAG splits files into overlapping windows of text and ranks them by keyword overlap or embedding similarity. A chunk boundary might fall in the middle of a function. The top-ranked chunk might contain the right words but not the right code. The retrieval is approximate by design.&lt;/p&gt;

&lt;p&gt;jMRI retrieval is structurally exact. jCodeMunch parses source files into an AST-derived index: every function, class, and method is a named, addressable symbol with a stable ID. When you search for &lt;code&gt;"OAuth2 password bearer authentication"&lt;/code&gt;, you get back IDs like &lt;code&gt;fastapi/security/oauth2.py::OAuth2PasswordBearer#class&lt;/code&gt;. When you retrieve that ID, you get exactly the class — no more, no less. No boundary accidents. No half-functions.&lt;/p&gt;
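&lt;p&gt;Those IDs follow a simple &lt;code&gt;path::name#kind&lt;/code&gt; shape, so constructing and parsing them is trivial (the helper names below are illustrative, not the SDK's API):&lt;/p&gt;

```python
def make_symbol_id(path, name, kind):
    # jMRI-style ID: path::name#kind
    return path + "::" + name + "#" + kind

def parse_symbol_id(symbol_id):
    path, rest = symbol_id.split("::")
    name, kind = rest.split("#")
    return path, name, kind

sid = make_symbol_id("fastapi/security/oauth2.py", "OAuth2PasswordBearer", "class")
print(sid)
```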

&lt;p&gt;The 96% precision figure reflects cases where the top search result was the correct symbol for the query. The 4% where it wasn't were genuinely ambiguous queries — where even a human would have debated the right answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is jMRI?
&lt;/h2&gt;

&lt;p&gt;jMRI (jMunch Retrieval Interface) is an open specification for MCP servers that do retrieval right.&lt;/p&gt;

&lt;p&gt;Four operations. One response envelope. Two compliance levels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent
  │
  ├─ discover()    → What knowledge sources are available?
  ├─ search(query) → Which symbols/sections are relevant? (IDs + summaries only)
  ├─ retrieve(id)  → Give me the exact source for this ID.
  └─ metadata(id?) → What would naive reading have cost?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
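&lt;p&gt;A minimal sketch of that call sequence against a stub server. The method names mirror the four operations above; the stub itself is illustrative, not the real SDK:&lt;/p&gt;

```python
class FakeServer:
    # In-memory stand-in for a jMRI-compliant MCP server.
    def discover(self):
        return [{"source": "demo-repo", "symbols": 1}]

    def search(self, query):
        # IDs and summaries only; no source bodies at this stage.
        return [{"id": "db.py::get_db#function", "summary": "session factory"}]

    def retrieve(self, symbol_id):
        return {"source": "def get_db(): ...", "_meta": {"tokens_saved": 42318}}

def answer(server, query):
    # discover, then search, then retrieve: the jMRI loop.
    server.discover()
    top = server.search(query)[0]["id"]
    result = server.retrieve(top)
    return result["source"], result["_meta"]["tokens_saved"]

print(answer(FakeServer(), "database session dependency"))
```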



&lt;p&gt;Every response includes a &lt;code&gt;_meta&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"def get_db():&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    db = SessionLocal()&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    try:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;        yield db&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    finally:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;        db.close()&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tokens_saved"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42318&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens_saved"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1284950&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cost_avoided"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"claude-sonnet-4-6"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.127&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"timing_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent doesn't have to guess whether it's being efficient. It knows, on every call.&lt;/p&gt;
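&lt;p&gt;Assembling the envelope is mechanical once the token delta is known. A sketch, assuming the caller supplies a price table in USD per million input tokens; the $3/MTok value below is an assumption for illustration, chosen because it reproduces the &lt;code&gt;0.127&lt;/code&gt; figure in the example envelope:&lt;/p&gt;

```python
def meta_block(tokens_saved, total, timing_ms, price_per_mtok):
    """Assemble a jMRI _meta envelope. price_per_mtok maps model name
    to USD per million input tokens (illustrative values)."""
    return {
        "tokens_saved": tokens_saved,
        "total_tokens_saved": total,
        "cost_avoided": {
            model: round(tokens_saved * p / 1_000_000, 3)
            for model, p in price_per_mtok.items()
        },
        "timing_ms": timing_ms,
    }

print(meta_block(42_318, 1_284_950, 12, {"claude-sonnet-4-6": 3.0}))
```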

&lt;p&gt;The spec is deliberately minimal. We're not trying to build a platform. We're trying to name a pattern that already works at scale and make it easy for others to implement.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Implementations
&lt;/h2&gt;

&lt;p&gt;The spec is open, and the flagship implementations are on GitHub.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Stars&lt;/th&gt;
&lt;th&gt;Install&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/jgravelle/jcodemunch-mcp" rel="noopener noreferrer"&gt;jCodeMunch&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Code (30+ languages)&lt;/td&gt;
&lt;td&gt;900+&lt;/td&gt;
&lt;td&gt;&lt;code&gt;uvx jcodemunch-mcp&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/jgravelle/jdocmunch-mcp" rel="noopener noreferrer"&gt;jDocMunch&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Docs (MD, RST, HTML, notebooks)&lt;/td&gt;
&lt;td&gt;45+&lt;/td&gt;
&lt;td&gt;&lt;code&gt;uvx jdocmunch-mcp&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both implement &lt;strong&gt;jMRI-Full&lt;/strong&gt; — the complete spec including batch retrieval, hash-based drift detection, byte-offset addressing, and the full &lt;code&gt;_meta&lt;/code&gt; envelope.&lt;/p&gt;
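&lt;p&gt;Of those jMRI-Full features, hash-based drift detection is the easiest to picture: the index stores a content hash alongside each symbol at parse time, and a mismatch at retrieval means the file changed underneath the index, so the entry must be re-parsed. A minimal sketch; the algorithm and hash width here are assumptions, not taken from the spec:&lt;/p&gt;

```python
import hashlib

def symbol_hash(source):
    """Content hash stored at index time and compared again at retrieval.
    SHA-256 truncated to 16 hex chars is an arbitrary illustrative choice."""
    return hashlib.sha256(source.encode()).hexdigest()[:16]

indexed_hash = symbol_hash("def get_db():\n    ...")

# Unchanged file: hashes match, the cached entry is still valid.
assert symbol_hash("def get_db():\n    ...") == indexed_hash

# Edited file: hashes differ, so the index entry is stale (drift detected).
assert symbol_hash("def get_db(session):\n    ...") != indexed_hash
```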

&lt;p&gt;The two servers have collectively saved over &lt;strong&gt;18 billion tokens&lt;/strong&gt; across user sessions in the first week of March 2026. That number is computed on-device from real session telemetry: every participating response reports &lt;code&gt;tokens_saved&lt;/code&gt;, measured from actual file sizes (&lt;code&gt;os.stat&lt;/code&gt;) rather than estimated.&lt;/p&gt;
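&lt;p&gt;The measurement itself can be that simple. A hedged sketch of how a per-call &lt;code&gt;tokens_saved&lt;/code&gt; figure could be derived from file size alone; the 4-chars-per-token divisor is a rough heuristic, not the servers' actual tokenizer:&lt;/p&gt;

```python
import os
import tempfile

def tokens_saved(path, retrieved, chars_per_token=4):
    """Tokens avoided by retrieving one symbol instead of the whole file.
    File size comes straight from os.stat; chars_per_token is a heuristic."""
    whole_file = os.stat(path).st_size // chars_per_token
    just_the_symbol = len(retrieved) // chars_per_token
    return max(whole_file - just_the_symbol, 0)

# Stand-in for a large source file (60,000 bytes, no newlines to translate).
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x=1;" * 15_000)
    path = f.name

print(tokens_saved(path, "def get_db():\n    ..."))  # 14995 with these sizes
os.remove(path)
```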




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;Add to &lt;code&gt;~/.claude.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"jcodemunch-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"jcodemunch-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"jdocmunch-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"jdocmunch-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;jmri-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;jmri.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MRIClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MRIClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# What's indexed?
&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;discover&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Find it
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database session dependency injection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fastapi/fastapi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get exactly that
&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fastapi/fastapi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokens saved this call: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_meta&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tokens_saved&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  TypeScript SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MRIClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mri-client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MRIClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OAuth2 bearer auth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fastapi/fastapi&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;symbol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fastapi/fastapi&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Open Spec
&lt;/h2&gt;

&lt;p&gt;Everything is at &lt;a href="https://github.com/jgravelle/mcp-retrieval-spec" rel="noopener noreferrer"&gt;github.com/jgravelle/mcp-retrieval-spec&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SPEC.md&lt;/code&gt; — the full jMRI v1.0 specification (Apache 2.0)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sdk/python/&lt;/code&gt; — Python client helper&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sdk/typescript/&lt;/code&gt; — TypeScript client&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reference/server.py&lt;/code&gt; — minimal jMRI-compliant MCP server&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;examples/&lt;/code&gt; — Claude Code, Cursor, and generic agent integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The spec is intentionally minimal. PRs that improve examples or add language SDKs are welcome. PRs that extend the core interface need a strong argument.&lt;/p&gt;

&lt;p&gt;If you're building a retrieval MCP server, implement jMRI-Core. Your users' agents will thank you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;— J. Gravelle, March 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Benchmark source: &lt;a href="https://github.com/jgravelle/mcp-retrieval-spec/tree/main/benchmark" rel="noopener noreferrer"&gt;github.com/jgravelle/mcp-retrieval-spec/benchmark&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;SDK: &lt;code&gt;pip install jmri-sdk&lt;/code&gt; | &lt;code&gt;npm install mri-client&lt;/code&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Spec: &lt;a href="https://github.com/jgravelle/mcp-retrieval-spec" rel="noopener noreferrer"&gt;github.com/jgravelle/mcp-retrieval-spec&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
