<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Blas Rodriguez Irizar</title>
    <description>The latest articles on DEV Community by Blas Rodriguez Irizar (@blasrodri).</description>
    <link>https://dev.to/blasrodri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F893796%2F69e3a1c6-9a42-45e4-8030-33441add4de8.jpeg</url>
      <title>DEV Community: Blas Rodriguez Irizar</title>
      <link>https://dev.to/blasrodri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/blasrodri"/>
    <language>en</language>
    <item>
      <title>RAG for codebases is hard. Trusting the answer is harder.</title>
      <dc:creator>Blas Rodriguez Irizar</dc:creator>
      <pubDate>Mon, 29 Jun 2026 17:16:36 +0000</pubDate>
      <link>https://dev.to/blasrodri/rag-for-codebases-is-hard-trusting-the-answer-is-harder-4lfi</link>
      <guid>https://dev.to/blasrodri/rag-for-codebases-is-hard-trusting-the-answer-is-harder-4lfi</guid>
      <description>&lt;p&gt;There's a good post making the rounds — &lt;a href="https://dev.to/mahima_thacker/rag-for-codebases-is-harder-than-it-looks-1nhg"&gt;"RAG for Codebases Is Harder Than It Looks"&lt;/a&gt; — on why naive retrieval falls apart on code. Filtering out &lt;code&gt;node_modules&lt;/code&gt;, chunking on AST boundaries instead of token counts, preserving file-path metadata, treating architecture questions as graph traversals rather than nearest-neighbor lookups. All correct. If you're building code RAG, read it.&lt;/p&gt;

&lt;p&gt;But notice where the author lands at the very end:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Developer trust depends on retrieval quality, &lt;strong&gt;honest uncertainty acknowledgment, and verifiable sources&lt;/strong&gt; — not just algorithmic sophistication.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the part I want to pull on. Because everything before it — better chunking, better embeddings, repo maps — makes the model's answer &lt;em&gt;more likely to be right&lt;/em&gt;. None of it makes the answer &lt;em&gt;checkable&lt;/em&gt;. And once an LLM is writing your code, "more likely to be right" is not the same problem as "did it actually do what it just told me it did."&lt;/p&gt;

&lt;h2&gt;
  
  
  The retrieval is upstream of a bigger trust gap
&lt;/h2&gt;

&lt;p&gt;Good RAG gets the model better context. The model then makes an edit and reports back: &lt;em&gt;"I added the &lt;code&gt;/v1/refund&lt;/code&gt; route, set &lt;code&gt;MAX_RETRIES&lt;/code&gt; to 5, only touched the parser, and the tests pass."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every one of those is a factual claim about the repo as it exists right now. And here's the uncomfortable measurement: across 100 real SWE-agent runs on real GitHub issues, &lt;strong&gt;30% of the attempts that &lt;em&gt;failed&lt;/em&gt; the test suite still claimed they fixed the issue&lt;/strong&gt; — "the method has been successfully added," "the issue has been resolved." Ground truth from the SWE-bench eval, claims judged by an LLM, reproducible.&lt;/p&gt;

&lt;p&gt;Better retrieval doesn't fix that. A model with perfect context can still over-claim what it did. The hallucination moves from "wrong fact about the codebase" to "wrong fact about my own change" — which is harder to catch, because it sounds like a status update, not a guess.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "verifiable sources" actually requires
&lt;/h2&gt;

&lt;p&gt;The article's three closing words — honest uncertainty, verifiable sources, no speculation — are exactly the design constraints I've been building &lt;code&gt;truth&lt;/code&gt; around. But I'd argue you can't get them from a better RAG pipeline, because they're not retrieval properties. They're &lt;em&gt;verification&lt;/em&gt; properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Honest uncertainty&lt;/strong&gt; isn't "the model hedges." It's a system that can say &lt;strong&gt;Refused&lt;/strong&gt; to &lt;em&gt;"this is cleaner"&lt;/em&gt; because there's no measurable subject — and mean it. A verifier that bluffs is worse than none.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verifiable sources&lt;/strong&gt; isn't "cite a chunk." It's a verdict that rests on a binary fact you read directly: a file present/absent in the git diff, an AST symbol that exists or doesn't, an exact value, a command's exit code. Cited to &lt;code&gt;file:line&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No speculation&lt;/strong&gt; isn't a prompt instruction the model can ignore. It's a hard architectural rule: a fixed-rule engine decides the verdict, not a model. The agent cannot talk it into a different answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So &lt;code&gt;truth&lt;/code&gt; does the inverse of RAG. RAG retrieves context &lt;em&gt;to help the model answer&lt;/em&gt;. &lt;code&gt;truth&lt;/code&gt; retrieves evidence &lt;em&gt;to check what the model claimed&lt;/em&gt; — against the real code, the working-tree git diff, recorded command runs, and logs — and returns &lt;strong&gt;Supported / Contradicted / Refused&lt;/strong&gt;, every verdict cited.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;truth verify-turn &lt;span class="s2"&gt;"I added the /v1/refund endpoint, set MAX_RETRIES to 3, I only changed src/api.rs, and tests pass"&lt;/span&gt;
&lt;span class="go"&gt;
  ✓ Supported     I added the /v1/refund endpoint  (src/api.rs)
  ✗ Contradicted  set MAX_RETRIES to 3             (src/config.rs:1)
  ✗ Contradicted  I only changed src/api.rs        (src/api.rs)
  ✗ Contradicted  tests pass                       (recorded 2026-06-11 09:14 UTC)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An LLM only parses the sentence into a structured claim. The verdict comes from a deterministic engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The asymmetry the article gestures at, made explicit
&lt;/h2&gt;

&lt;p&gt;"Honest uncertainty acknowledgment" has a sharp edge the article doesn't dwell on: &lt;strong&gt;a verifier that false-accuses gets uninstalled on day one.&lt;/strong&gt; Crying wolf on a truthful agent is the one unforgivable failure.&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;truth&lt;/code&gt; only ever says &lt;code&gt;Contradicted&lt;/code&gt; on a structured binary fact — never on a noisy count (a text scan over-counts comments; a log window samples). &lt;em&gt;"Nobody uses this,"&lt;/em&gt; &lt;em&gt;"updated all 4 call sites,"&lt;/em&gt; &lt;em&gt;"X is unused"&lt;/em&gt; — those surface their evidence as &lt;strong&gt;Inconclusive&lt;/strong&gt;: a suspicion to act on, not a verdict that blocks. And it's measured, not asserted: a labeled corpus of &lt;em&gt;real&lt;/em&gt; agent over-claims holds the false-contradiction rate at &lt;strong&gt;0&lt;/strong&gt; as a hard CI gate. When &lt;code&gt;truth&lt;/code&gt; blocks, it means it.&lt;/p&gt;

&lt;p&gt;This is the same instinct the article has about retrieval noise — "if chunks are too large, retrieval becomes noisy" — applied to verdicts. A noisy verifier is as useless as noisy retrieval. Worse, because it erodes the exact trust it was supposed to provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where they meet
&lt;/h2&gt;

&lt;p&gt;The author is right that code RAG needs preprocessing, structure-awareness, and citations. I'd add one thing to the conclusion: those buy you a &lt;em&gt;better-informed&lt;/em&gt; model, not a &lt;em&gt;trustworthy&lt;/em&gt; one. When the model is editing your repo, the last mile isn't retrieval quality — it's whether you can independently confirm what it told you, with a citation, deterministically, and without it ever crying wolf.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;truth&lt;/code&gt; is that last mile. It runs entirely locally (a single SQLite file in &lt;code&gt;.truth/&lt;/code&gt;; your code never leaves the machine), exposes one MCP tool an agent calls on &lt;em&gt;its own&lt;/em&gt; message before reporting done, and can be a hook the agent can't skip.&lt;/p&gt;

&lt;p&gt;If you're building code RAG, your model's answers are about to get a lot better. Make them checkable too.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/blasrodri/truth" rel="noopener noreferrer"&gt;github.com/blasrodri/truth&lt;/a&gt;&lt;/p&gt;

</description>
      <category>truth</category>
      <category>ai</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
