<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: shy The</title>
    <description>The latest articles on DEV Community by shy The (@shy_the_a91bfb236d4eeb5bb).</description>
    <link>https://dev.to/shy_the_a91bfb236d4eeb5bb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3962650%2Fa3db3dac-7b61-4c75-8ba7-bcf038b369f9.png</url>
      <title>DEV Community: shy The</title>
      <link>https://dev.to/shy_the_a91bfb236d4eeb5bb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shy_the_a91bfb236d4eeb5bb"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>shy The</dc:creator>
      <pubDate>Thu, 04 Jun 2026 16:09:11 +0000</pubDate>
      <link>https://dev.to/shy_the_a91bfb236d4eeb5bb/-51ib</link>
      <guid>https://dev.to/shy_the_a91bfb236d4eeb5bb/-51ib</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55" class="crayons-story__hidden-navigation-link"&gt;I Thought My AI Code Reviewer Was Finished. Then a Single Hallucinated Line Number Broke Everything.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
      &lt;a href="https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55" class="crayons-article__context-note crayons-article__context-note__feed"&gt;&lt;p&gt;GitHub “Finish-Up-A-Thon” Challenge Submission&lt;/p&gt;

&lt;/a&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/shy_the_a91bfb236d4eeb5bb" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3962650%2Fa3db3dac-7b61-4c75-8ba7-bcf038b369f9.png" alt="shy_the_a91bfb236d4eeb5bb profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/shy_the_a91bfb236d4eeb5bb" class="crayons-story__secondary fw-medium m:hidden"&gt;
              shy The
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                shy The
                
              
              &lt;div id="story-author-preview-content-3810406" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/shy_the_a91bfb236d4eeb5bb" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3962650%2Fa3db3dac-7b61-4c75-8ba7-bcf038b369f9.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;shy The&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 3&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55" id="article-link-3810406"&gt;
          I Thought My AI Code Reviewer Was Finished. Then a Single Hallucinated Line Number Broke Everything.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/devchallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;devchallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/githubchallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;githubchallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/typescript"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;typescript&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt;&amp;nbsp;reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              1&lt;span class="hidden s:inline"&gt;&amp;nbsp;comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>I Thought My AI Code Reviewer Was Finished. Then a Single Hallucinated Line Number Broke Everything.</title>
      <dc:creator>shy The</dc:creator>
      <pubDate>Wed, 03 Jun 2026 11:04:20 +0000</pubDate>
      <link>https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55</link>
      <guid>https://dev.to/shy_the_a91bfb236d4eeb5bb/i-thought-my-ai-code-reviewer-was-finished-then-a-single-hallucinated-line-number-broke-everything-2a55</guid>
      <description>&lt;h3&gt;
  
  
  What I Built
&lt;/h3&gt;

&lt;p&gt;Difflens is a LangGraph-powered automated code review pipeline that analyzes pull requests and posts review comments through GitHub Actions.&lt;/p&gt;

&lt;p&gt;I thought building a multi-agent reviewer would be the hard part. I was wrong.&lt;/p&gt;

&lt;p&gt;The real challenge was making sure AI-generated comments could actually survive GitHub's API validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why My PR Comments Kept Disappearing&lt;/strong&gt;&lt;br&gt;
Getting the pipeline to trigger GitHub Actions felt like a huge win. But then, I noticed a fatal issue: the comments were disappearing.&lt;/p&gt;

&lt;p&gt;The logs showed the LLM was generating brilliant security and logic feedback, but on the actual Pull Request page, nothing was posted. After digging into the Octokit error logs, I found a subtle reliability bug: Coordinate Hallucination.&lt;/p&gt;

&lt;p&gt;Even when the model correctly identified a vulnerability, it couldn't reliably anchor that feedback to a valid line in the Git diff. My system had a strict VerifierNode designed to block invalid API requests. When a request carried these hallucinated out-of-bounds coordinates, Octokit threw a hard &lt;code&gt;422 Unprocessable Entity&lt;/code&gt; error: &lt;code&gt;Validation Failed: {"resource":"PullRequestReviewComment","code":"invalid","field":"line"}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The system then silently dropped the comments. A single hallucinated number was wiping out the entire review pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumtobm0xknnttxvvwokr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumtobm0xknnttxvvwokr.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix: Deterministic Guardrails&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During early testing, I discovered that many otherwise valid review comments were never reaching GitHub because their generated coordinates failed validation.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlkvle4cpy2izadyh012.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlkvle4cpy2izadyh012.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core issue isn't just the AI being 'wrong'; it's that the system trusts non-deterministic output to interact with a strict API. I implemented a deterministic layer to bridge this gap. Instead of hoping the model gets line numbers right, I parse the raw diff into an index of valid line numbers that serves as the source of truth. The VerifierNode uses this index to intercept and sanitize agent outputs before they reach the GitHub API.&lt;/p&gt;

&lt;p&gt;At the boundary of this layer, I enforce a hard check that strips out-of-bounds coordinates and malformed responses. To generate the source of truth for this check, I implemented the parser below. It reads each hunk header in a file's patch and records every new-file line number that the hunk spans, so a comment can only be anchored to a line that actually appears in the diff.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseDiffToValidLines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePatch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validLines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// 1. Git Diff Protocol Compatibility: When changes are only 1 line, ',1' is omitted.&lt;/span&gt;
  &lt;span class="c1"&gt;// Regex fix: Added ^ and m flags to anchor at the start of the line, preventing false matches within code content.&lt;/span&gt;
  &lt;span class="c1"&gt;// Make the ',count' for both left and right sides optional capture groups (?:,...)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hunkHeader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^@@ -&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)?&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\+(\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)(?:&lt;/span&gt;&lt;span class="sr"&gt;,&lt;/span&gt;&lt;span class="se"&gt;(\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;))?&lt;/span&gt;&lt;span class="sr"&gt; @@/gm&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;hunkHeader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePatch&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// If the second group (count) is not matched, it means there is only 1 line of change. Default to 1.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Design Choice: Why extract only the line numbers after '+'?&lt;/span&gt;
    &lt;span class="c1"&gt;// Because the GitHub PR Review API strict rules require anchoring comments to valid lines in the NEW file (Right Side).&lt;/span&gt;
    &lt;span class="c1"&gt;// This loop fully extracts all context and added lines within the current block.&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;validLines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;validLines&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implementing this &lt;code&gt;parseDiffToValidLines&lt;/code&gt; parser was the real engineering bottleneck. Translating raw unified diffs into a stable TypeScript Set requires meticulous edge-case handling: accounting for context lines, additions, and the GitHub API's strict positioning rules.&lt;/p&gt;

&lt;p&gt;Copilot turned out to be most useful when dealing with repetitive regex iterations, edge cases, and test scaffolding. Wrestling with standard diff protocols is a nightmare; handling multi-hunk shifts and avoiding encoding offsets is incredibly tedious.&lt;/p&gt;

&lt;p&gt;Instead of debugging regex boundary errors for days, I laid out the core mathematical constraints, and Copilot did the heavy lifting. We co-authored the precise regex patterns for hunk headers and built an exhaustive suite of unit tests to smoke out boundary shifts. This tight feedback loop compressed days of frustrating manual diff-parsing into a single afternoon of rapid iteration.&lt;br&gt;
&lt;em&gt;(If the diff is empty or malformed, the regex match fails gracefully, returning an empty set and blocking the comment pipeline entirely.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result: From Defensive Filtering to Proactive Injection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During early testing across 5 intentionally varied dummy pull requests, 9 out of 14 generated comments were rejected due to invalid coordinates. That's a 64% silent failure rate: brilliant engineering ideas lost in the ether simply because the AI couldn't read the Git diff index accurately. While the sample size is small, it was enough to expose a structural reliability issue.&lt;/p&gt;

&lt;p&gt;The breakthrough came when I inverted the architecture. By shifting the deterministic &lt;code&gt;validLines&lt;/code&gt; set upstream, I injected it directly into the LLM's prompt context as a hard constraint. The system moved from a "try-and-fail" model to a "pre-verified" execution model. Because the agent must anchor its reasoning to the deterministic list &lt;em&gt;before&lt;/em&gt; generating coordinates, the coordinate hallucination issue no longer appeared in my test runs.&lt;/p&gt;
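
&lt;p&gt;As a rough illustration of that upstream injection, here is a minimal sketch of a prompt builder. The function names &lt;code&gt;formatValidLines&lt;/code&gt; and &lt;code&gt;buildReviewPrompt&lt;/code&gt;, the file path, and the prompt wording are all assumptions made for this example, not the actual Difflens code. It collapses the valid line numbers into compact ranges and states them as a constraint ahead of the patch.&lt;/p&gt;

```typescript
// Hypothetical sketch: names and prompt wording are assumptions, not Difflens code.

// Collapse line numbers into compact ranges such as "120-122, 130".
function formatValidLines(lines: number[]): string {
  const sorted = lines.slice().sort((a, b) => a - b);
  const parts: string[] = [];
  let start = sorted[0];
  let prev = sorted[0];
  for (const n of sorted.slice(1)) {
    // Extend the current range on duplicates and consecutive numbers.
    if (n === prev || n === prev + 1) {
      prev = n;
      continue;
    }
    parts.push(start === prev ? String(start) : start + "-" + prev);
    start = n;
    prev = n;
  }
  if (sorted.length > 0) {
    parts.push(start === prev ? String(start) : start + "-" + prev);
  }
  return parts.join(", ");
}

// State the allowed coordinates before the patch so the model sees them first.
function buildReviewPrompt(path: string, patch: string, validLines: number[]): string {
  return [
    "You are reviewing the file " + path + ".",
    "Attach comments ONLY to these new-file line numbers: " + formatValidLines(validLines) + ".",
    "If an issue does not map to one of these lines, report it without a line number.",
    "",
    patch,
  ].join("\n");
}

const examplePrompt = buildReviewPrompt(
  "src/config.ts",
  "@@ -120,3 +120,3 @@\n-const timeout = 3000;\n+const timeout = 10000;\n initialize();",
  [120, 121, 122, 130],
);
```

&lt;p&gt;Even with a constraint like this in the prompt, keeping the downstream check in place seems prudent, since a prompt instruction lowers the odds of a bad coordinate but cannot rule one out.&lt;/p&gt;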

&lt;h3&gt;
  
  
  The Next Horizon: From Hackathon to Production-Grade
&lt;/h3&gt;

&lt;p&gt;While the deterministic guardrail solved my immediate coordinate hallucination crisis, taking Difflens to a true enterprise-grade standard requires a structured architectural evolution.&lt;/p&gt;

&lt;p&gt;I am currently mapping out the next phase: implementing cross-file comment deduplication using &lt;code&gt;path + line + content&lt;/code&gt; hashing, and building a hallucination telemetry pipeline to systematically trace prompt drift. These aren't just features—they are the safety nets required to transition from a single-agent prototype to a production-ready, multi-agent orchestrator. &lt;/p&gt;
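
&lt;p&gt;A deduplication key built from &lt;code&gt;path + line + content&lt;/code&gt; might look something like the sketch below. Everything here is hypothetical: the &lt;code&gt;Suggestion&lt;/code&gt; shape, the whitespace and case normalization, and the choice of a 32-bit FNV-1a hash (picked only because it needs no imports). A production version would more likely use a cryptographic digest to keep collisions negligible.&lt;/p&gt;

```typescript
// Hypothetical sketch: the key format and hash choice are assumptions, not Difflens code.
type Suggestion = { path: string; line: number; body: string };

// 32-bit FNV-1a over UTF-16 code units: non-cryptographic, dependency-free.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (const ch of input.split("")) {
    hash ^= ch.charCodeAt(0);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function dedupeKey(s: Suggestion): string {
  // Normalize whitespace and case so trivially different copies share a key.
  const normalized = s.body.trim().replace(/\s+/g, " ").toLowerCase();
  return fnv1a(s.path + "\u0000" + s.line + "\u0000" + normalized);
}

// Keep the first suggestion seen for each key and drop later repeats.
function dedupe(suggestions: Suggestion[]): Suggestion[] {
  const seen: { [key: string]: true } = {};
  const unique: Suggestion[] = [];
  for (const s of suggestions) {
    const key = dedupeKey(s);
    if (seen[key]) continue;
    seen[key] = true;
    unique.push(s);
  }
  return unique;
}

const deduped = dedupe([
  { path: "src/auth.ts", line: 42, body: "Validate the token before use." },
  { path: "src/auth.ts", line: 42, body: "  validate the   token before use. " },
  { path: "src/auth.ts", line: 57, body: "Validate the token before use." },
]);
```

&lt;p&gt;In this sketch the second suggestion collapses into the first because only spacing and case differ, while the third survives because its line number differs.&lt;/p&gt;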

&lt;p&gt;Copilot was helpful here as well, particularly for prototyping validation flows and data structures around comment deduplication and telemetry collection.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Impact (Before vs. After)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before (initial benchmark across 5 multi-file PRs):&lt;/strong&gt; 14 suggestions generated ➔ 5 posted (&lt;strong&gt;64% lost due to hallucination&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After:&lt;/strong&gt; 14 suggestions generated ➔ 14 posted (&lt;strong&gt;0% lost during testing&lt;/strong&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Demo
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Automated PR Feedback (GitHub Action):&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuf7msbdfz85kquws0x8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuf7msbdfz85kquws0x8a.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Engineering Implementation (VS Code):&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8i1a1pjdcodjrvmjykp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8i1a1pjdcodjrvmjykp.png" alt=" " width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;(Note on the output: The system successfully intercepted an empty validation array. This demonstrates the deterministic guardrail in action—rather than firing a broken API call, the bot executed a graceful fallback.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent days wondering why brilliant code reviews were disappearing into the void, all because of a single hallucinated line number. Building Difflens taught me that we need to stop expecting LLMs to be perfect. The turning point wasn't a better prompt; it was accepting the AI's flaws. &lt;/p&gt;

&lt;p&gt;By letting the agents do the creative reasoning, but forcing their output through a ruthless, old-school validation layer, the silent failures finally stopped. Blindly trusting AI "magic" is a production hazard. We have to build a solid, deterministic box for that magic to safely operate in. This is where Copilot continues to be useful: it acts as an engineering accelerator, helping me build the rigid boundaries and safety nets needed to actually make a multi-agent system production-ready.&lt;/p&gt;

&lt;p&gt;I'm not done yet, though. The next major headache is handling coordinate drift when a PR gets rebased mid-review. If you are building LLM pipelines and wrestling with similar Git parsing nightmares, I would love to hear how you are tackling them in the comments below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Source Code
&lt;/h3&gt;

&lt;p&gt;You can check out the full implementation of this deterministic guardrail here: &lt;a href="https://github.com/ywu593412-afk/difflens" rel="noopener noreferrer"&gt;Explore Difflens on GitHub&lt;/a&gt;. If this deep-dive helped you debug your own pipeline, I'd appreciate a ⭐️ on the project.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>ai</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Why AI Code Review Tools Keep Commenting on Lines That Don’t Exist</title>
      <dc:creator>shy The</dc:creator>
      <pubDate>Mon, 01 Jun 2026 11:51:51 +0000</pubDate>
      <link>https://dev.to/shy_the_a91bfb236d4eeb5bb/why-ai-code-review-tools-keep-commenting-on-lines-that-dont-exist-ald</link>
      <guid>https://dev.to/shy_the_a91bfb236d4eeb5bb/why-ai-code-review-tools-keep-commenting-on-lines-that-dont-exist-ald</guid>
      <description>&lt;p&gt;While experimenting with AI-powered code review systems, I kept running into a strange problem.&lt;/p&gt;

&lt;p&gt;The model would generate a perfectly reasonable review comment.&lt;/p&gt;

&lt;p&gt;The code issue was real.&lt;/p&gt;

&lt;p&gt;The explanation made sense.&lt;/p&gt;

&lt;p&gt;But the comment was attached to a line that didn’t exist in the pull request.&lt;/p&gt;

&lt;p&gt;At first, I assumed this was just another example of LLM hallucination.&lt;/p&gt;

&lt;p&gt;After digging deeper, I found something more specific.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem Isn’t Code Understanding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most modern LLMs are surprisingly good at understanding code changes.&lt;/p&gt;

&lt;p&gt;They can often identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Potential bugs&lt;/li&gt;
&lt;li&gt;Missing edge cases&lt;/li&gt;
&lt;li&gt;Naming issues&lt;/li&gt;
&lt;li&gt;Logic problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strange part was that many review comments correctly identified a problem while referencing the wrong line.&lt;/p&gt;

&lt;p&gt;The model understood what was wrong.&lt;/p&gt;

&lt;p&gt;It failed to understand where it was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Unified Diff Trap&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most AI review systems operate on unified diffs.&lt;/p&gt;

&lt;p&gt;A simplified example looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@@ -120,7 +120,8 @@
-const timeout = 3000;
+const timeout = 10000;
 initialize();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Humans rarely think about line coordinates because GitHub handles them automatically.&lt;/p&gt;

&lt;p&gt;For an LLM, however, the situation is different.&lt;/p&gt;

&lt;p&gt;The model must reconstruct file positions using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hunk headers&lt;/li&gt;
&lt;li&gt;Added lines&lt;/li&gt;
&lt;li&gt;Deleted lines&lt;/li&gt;
&lt;li&gt;Context lines&lt;/li&gt;
&lt;li&gt;Running offsets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single counting mistake can shift every coordinate that follows.&lt;/p&gt;
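
&lt;p&gt;To make that bookkeeping concrete, here is a minimal sketch of the counting a model would have to do implicitly. The &lt;code&gt;DiffLine&lt;/code&gt; type and the &lt;code&gt;walkPatch&lt;/code&gt; function are hypothetical names made up for this illustration. The walker resets two counters at every hunk header, then advances the old-file counter on deletions, the new-file counter on additions, and both on context lines.&lt;/p&gt;

```typescript
// Hypothetical sketch: names and shapes are illustrative, not taken from DiffLens.
type DiffLine = {
  kind: "add" | "del" | "context";
  oldLine: number | null; // position in the old file, null for added lines
  newLine: number | null; // position in the new file, null for deleted lines
  text: string;
};

function walkPatch(patch: string): DiffLine[] {
  const out: DiffLine[] = [];
  const header = /^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/;
  let oldLine = 0;
  let newLine = 0;
  let inHunk = false;

  for (const raw of patch.split("\n")) {
    const m = header.exec(raw);
    if (m) {
      // A hunk header resets both counters to the hunk's starting lines.
      oldLine = parseInt(m[1], 10);
      newLine = parseInt(m[2], 10);
      inHunk = true;
      continue;
    }
    // Skip text before the first hunk, empty trailing lines,
    // and the "\ No newline at end of file" marker.
    if (!inHunk || raw === "" || raw.charAt(0) === "\\") continue;

    const text = raw.slice(1);
    if (raw.charAt(0) === "+") {
      out.push({ kind: "add", oldLine: null, newLine, text });
      newLine += 1;
    } else if (raw.charAt(0) === "-") {
      out.push({ kind: "del", oldLine, newLine: null, text });
      oldLine += 1;
    } else {
      out.push({ kind: "context", oldLine, newLine, text });
      oldLine += 1;
      newLine += 1;
    }
  }
  return out;
}

const walked = walkPatch(
  [
    "@@ -120,7 +120,8 @@",
    "-const timeout = 3000;",
    "+const timeout = 10000;",
    " initialize();",
  ].join("\n"),
);
```

&lt;p&gt;In this sketch the deleted line has no new-file position at all, which is why a comment aimed at it cannot be attached to the new side of the diff.&lt;/p&gt;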

&lt;p&gt;&lt;strong&gt;What I Observed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Across repeated testing, several patterns appeared frequently:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deleted-Line References&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model sometimes generated comments that pointed to deleted lines.&lt;/p&gt;

&lt;p&gt;The feedback itself was often valid.&lt;/p&gt;

&lt;p&gt;The target location wasn’t.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Coordinate Drift&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Large diffs increased the error rate significantly.&lt;/p&gt;

&lt;p&gt;After enough additions and deletions, line references would gradually drift away from the intended location.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Out-of-Range Targets&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Occasionally, comments referenced line numbers that simply didn’t exist inside the patch.&lt;/p&gt;

&lt;p&gt;These comments could not be attached to the pull request at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Prompt Engineering Wasn’t Enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My first instinct was to improve prompting.&lt;/p&gt;

&lt;p&gt;I tried:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More explicit instructions&lt;/li&gt;
&lt;li&gt;Structured outputs&lt;/li&gt;
&lt;li&gt;Additional examples&lt;/li&gt;
&lt;li&gt;Coordinate reminders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The error rate improved.&lt;/p&gt;

&lt;p&gt;It never disappeared.&lt;/p&gt;

&lt;p&gt;The reason seems straightforward.&lt;/p&gt;

&lt;p&gt;Predicting text and maintaining exact positional bookkeeping are fundamentally different tasks.&lt;/p&gt;

&lt;p&gt;A model can understand a code issue while simultaneously making a counting error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Different Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Eventually I stopped treating coordinates as trusted output.&lt;/p&gt;

&lt;p&gt;Instead of assuming the model was correct, I added a deterministic verification step.&lt;/p&gt;

&lt;p&gt;Every generated review comment is checked against the actual diff structure before being returned.&lt;/p&gt;

&lt;p&gt;The validator verifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File existence&lt;/li&gt;
&lt;li&gt;Valid patch coordinates&lt;/li&gt;
&lt;li&gt;Added-line targets&lt;/li&gt;
&lt;li&gt;Hunk boundaries&lt;/li&gt;
&lt;li&gt;Out-of-range references&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a comment fails validation, it is either corrected or discarded.&lt;/p&gt;
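
&lt;p&gt;One way to picture the correct-or-discard decision is the sketch below. The types, the &lt;code&gt;verifyComment&lt;/code&gt; name, and especially the rule of snapping to the nearest valid line within two lines are assumptions chosen for illustration; the article does not say how DiffLens decides when a comment is recoverable.&lt;/p&gt;

```typescript
// Hypothetical sketch: shapes and the snapping rule are assumptions, not DiffLens code.
type ReviewComment = { path: string; line: number; body: string };
type ValidLines = { [path: string]: number[] };
type Verdict =
  | { status: "ok"; comment: ReviewComment }
  | { status: "corrected"; comment: ReviewComment; from: number }
  | { status: "discarded"; reason: string };

function verifyComment(comment: ReviewComment, valid: ValidLines, maxSnap = 2): Verdict {
  // Guard against inherited keys so only real diff paths are looked up.
  const known = Object.prototype.hasOwnProperty.call(valid, comment.path);
  const lines = known ? valid[comment.path] : [];
  if (lines.length === 0) {
    return { status: "discarded", reason: "file not in diff" };
  }
  if (lines.indexOf(comment.line) !== -1) {
    return { status: "ok", comment };
  }
  // Find the closest valid line to the requested one.
  let best = lines[0];
  for (const candidate of lines) {
    if (Math.abs(best - comment.line) > Math.abs(candidate - comment.line)) {
      best = candidate;
    }
  }
  // Only snap when the target is close; otherwise the anchor is a guess.
  if (Math.abs(best - comment.line) > maxSnap) {
    return { status: "discarded", reason: "line outside every hunk" };
  }
  return { status: "corrected", comment: { ...comment, line: best }, from: comment.line };
}

const validIndex: ValidLines = { "src/config.ts": [120, 121, 122] };
const exact = verifyComment({ path: "src/config.ts", line: 121, body: "Timeout is too long." }, validIndex);
const near = verifyComment({ path: "src/config.ts", line: 123, body: "Timeout is too long." }, validIndex);
const far = verifyComment({ path: "src/config.ts", line: 400, body: "Timeout is too long." }, validIndex);
const missing = verifyComment({ path: "README.md", line: 1, body: "Typo." }, validIndex);
```

&lt;p&gt;Here the comment on line 123 is moved to line 122, while the comment on line 400 and the comment on a file outside the diff are both dropped.&lt;/p&gt;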

&lt;p&gt;The goal isn’t to make the reviewer smarter.&lt;/p&gt;

&lt;p&gt;The goal is to prevent invalid comments from reaching the pull request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One lesson stood out during this project:&lt;/p&gt;

&lt;p&gt;Semantic understanding and coordinate accuracy are different problems.&lt;/p&gt;

&lt;p&gt;LLMs are often better at the first than the second.&lt;/p&gt;

&lt;p&gt;As AI tooling becomes more integrated into developer workflows, deterministic validation layers may become just as important as the models themselves.&lt;/p&gt;

&lt;p&gt;I ended up open-sourcing the implementation here:&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/ywu593412-afk/DiffLens" rel="noopener noreferrer"&gt;https://github.com/ywu593412-afk/DiffLens&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m curious whether other developers building AI review systems have encountered similar coordinate-mapping issues.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>softwareengineering</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
