<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: shy The</title>
    <description>The latest articles on DEV Community by shy The (@shy_the_a91bfb236d4eeb5bb).</description>
    <link>https://dev.to/shy_the_a91bfb236d4eeb5bb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3962650%2Fa3db3dac-7b61-4c75-8ba7-bcf038b369f9.png</url>
      <title>DEV Community: shy The</title>
      <link>https://dev.to/shy_the_a91bfb236d4eeb5bb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shy_the_a91bfb236d4eeb5bb"/>
    <language>en</language>
    <item>
      <title>Why AI Code Review Tools Keep Commenting on Lines That Don’t Exist</title>
      <dc:creator>shy The</dc:creator>
      <pubDate>Mon, 01 Jun 2026 11:51:51 +0000</pubDate>
      <link>https://dev.to/shy_the_a91bfb236d4eeb5bb/why-ai-code-review-tools-keep-commenting-on-lines-that-dont-exist-ald</link>
      <guid>https://dev.to/shy_the_a91bfb236d4eeb5bb/why-ai-code-review-tools-keep-commenting-on-lines-that-dont-exist-ald</guid>
      <description>&lt;p&gt;While experimenting with AI-powered code review systems, I kept running into a strange problem.&lt;/p&gt;

&lt;p&gt;The model would generate a perfectly reasonable review comment.&lt;/p&gt;

&lt;p&gt;The code issue was real.&lt;/p&gt;

&lt;p&gt;The explanation made sense.&lt;/p&gt;

&lt;p&gt;But the comment was attached to a line that didn’t exist in the pull request.&lt;/p&gt;

&lt;p&gt;At first, I assumed this was just another example of LLM hallucination.&lt;/p&gt;

&lt;p&gt;After digging deeper, I found a more specific cause.&lt;/p&gt;

&lt;h2&gt;The Problem Isn’t Code Understanding&lt;/h2&gt;

&lt;p&gt;Most modern LLMs are surprisingly good at understanding code changes.&lt;/p&gt;

&lt;p&gt;They can often identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Potential bugs&lt;/li&gt;
&lt;li&gt;Missing edge cases&lt;/li&gt;
&lt;li&gt;Naming issues&lt;/li&gt;
&lt;li&gt;Logic problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strange part was that many review comments correctly identified a problem while referencing the wrong line.&lt;/p&gt;

&lt;p&gt;The model understood what was wrong.&lt;/p&gt;

&lt;p&gt;It failed to understand where it was.&lt;/p&gt;

&lt;h2&gt;The Unified Diff Trap&lt;/h2&gt;

&lt;p&gt;Most AI review systems operate on unified diffs.&lt;/p&gt;

&lt;p&gt;A simplified example looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@@ -120,2 +120,2 @@
-const timeout = 3000;
+const timeout = 10000;
 initialize();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Humans rarely think about line coordinates because GitHub handles them automatically.&lt;/p&gt;

&lt;p&gt;For an LLM, however, the situation is different.&lt;/p&gt;

&lt;p&gt;The model must reconstruct file positions using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hunk headers&lt;/li&gt;
&lt;li&gt;Added lines&lt;/li&gt;
&lt;li&gt;Deleted lines&lt;/li&gt;
&lt;li&gt;Context lines&lt;/li&gt;
&lt;li&gt;Running offsets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single counting mistake can shift every coordinate that follows.&lt;/p&gt;
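
&lt;p&gt;That bookkeeping is mechanical, which makes it a better fit for ordinary code than for a language model. As a rough illustration, here is a minimal Python sketch that walks the hunks of a single file’s patch and assigns old-file and new-file line numbers to every row. The function name &lt;code&gt;map_new_lines&lt;/code&gt; is hypothetical, and the sketch assumes standard unified-diff hunk headers; it is not the DiffLens implementation.&lt;/p&gt;

```python
from __future__ import annotations

import re

# Matches a hunk header such as "@@ -120,2 +120,2 @@".
# The ",count" parts are optional in the format (an omitted count means 1).
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def map_new_lines(patch: str) -> list[tuple[str, int | None, int | None, str]]:
    """Return (kind, old_line, new_line, text) for every row inside a hunk.

    kind is "+" (added), "-" (deleted) or " " (context). Deleted rows have
    no new-file number and added rows have no old-file number.
    Assumes the patch covers a single file.
    """
    rows = []
    old_no = new_no = None
    for raw in patch.splitlines():
        header = HUNK_RE.match(raw)
        if header:
            # Every hunk header resets both counters from its own start values.
            old_no, new_no = int(header.group(1)), int(header.group(3))
            continue
        if old_no is None or raw.startswith("\\"):
            # Skip file headers before the first hunk and
            # "\ No newline at end of file" markers.
            continue
        kind, text = raw[:1] or " ", raw[1:]
        if kind == "+":
            rows.append(("+", None, new_no, text))
            new_no += 1
        elif kind == "-":
            rows.append(("-", old_no, None, text))
            old_no += 1
        else:
            # Context rows exist on both sides, so both counters advance.
            rows.append((" ", old_no, new_no, text))
            old_no += 1
            new_no += 1
    return rows


patch = """@@ -120,2 +120,2 @@
-const timeout = 3000;
+const timeout = 10000;
 initialize();"""

for row in map_new_lines(patch):
    print(row)
# ('-', 120, None, 'const timeout = 3000;')
# ('+', None, 120, 'const timeout = 10000;')
# (' ', 121, 121, 'initialize();')
```

&lt;p&gt;Each row carries both coordinates, so later checks can ask which side of the diff a given line number actually belongs to.&lt;/p&gt;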

&lt;h2&gt;What I Observed&lt;/h2&gt;

&lt;p&gt;Across repeated tests, three failure patterns kept showing up:&lt;/p&gt;

&lt;h3&gt;1. Deleted-Line References&lt;/h3&gt;

&lt;p&gt;The model sometimes generated comments that pointed to deleted lines, which no longer exist in the new version of the file.&lt;/p&gt;

&lt;p&gt;The feedback itself was often valid.&lt;/p&gt;

&lt;p&gt;The target location wasn’t.&lt;/p&gt;
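
&lt;p&gt;One way to catch this case, assuming the review prompt asks the model to echo the code line it is commenting on (a hypothetical &lt;code&gt;quote&lt;/code&gt; field), is to look that text up in the patch and see which side of the diff it survives on. A minimal sketch, with hypothetical function names:&lt;/p&gt;

```python
import re

# Hunk header, e.g. "@@ -120,2 +120,2 @@" (counts are optional).
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def locate_quote(patch: str, quote: str) -> list[tuple[str, int]]:
    """Return (kind, line) for every row whose text matches the quoted code.

    "added" and "context" hits carry new-file line numbers; "deleted" hits
    carry old-file line numbers, because those rows are gone from the new file.
    """
    hits = []
    old_no = new_no = None
    for raw in patch.splitlines():
        header = HUNK_RE.match(raw)
        if header:
            old_no, new_no = int(header.group(1)), int(header.group(3))
            continue
        if old_no is None or raw.startswith("\\"):
            continue  # file headers or "\ No newline at end of file"
        kind, text = raw[:1], raw[1:]
        matches = text.strip() == quote.strip()
        if kind == "-":
            if matches:
                hits.append(("deleted", old_no))
            old_no += 1
        elif kind == "+":
            if matches:
                hits.append(("added", new_no))
            new_no += 1
        else:
            if matches:
                hits.append(("context", new_no))
            old_no += 1
            new_no += 1
    return hits


def points_at_deleted_code(patch: str, quote: str) -> bool:
    """True if the quoted code appears in the patch only on deleted rows."""
    kinds = {kind for kind, _ in locate_quote(patch, quote)}
    return kinds == {"deleted"}


patch = """@@ -120,2 +120,2 @@
-const timeout = 3000;
+const timeout = 10000;
 initialize();"""

print(locate_quote(patch, "const timeout = 3000;"))             # [('deleted', 120)]
print(points_at_deleted_code(patch, "const timeout = 3000;"))   # True
print(points_at_deleted_code(patch, "const timeout = 10000;"))  # False
```

&lt;p&gt;A quote that matches nothing at all is a separate signal; this sketch simply returns an empty list in that case.&lt;/p&gt;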

&lt;h3&gt;2. Coordinate Drift&lt;/h3&gt;

&lt;p&gt;Large diffs increased the error rate significantly.&lt;/p&gt;

&lt;p&gt;After enough additions and deletions, line references would gradually drift away from the intended location.&lt;/p&gt;
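
&lt;p&gt;Drift of this kind is easy to reproduce with a single, consistent counting slip. The sketch below is illustrative only: it assumes the slip is advancing the new-file counter on deleted rows (one plausible mistake, not a measured one) and compares that against correct counting. The function name is hypothetical.&lt;/p&gt;

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_line_numbers(patch: str, count_deleted: bool) -> list[int]:
    """New-file line numbers assigned to each added row, in patch order.

    count_deleted=False is correct bookkeeping. count_deleted=True simulates
    a hypothetical slip: also advancing the new-file counter on deleted rows.
    """
    numbers = []
    new_no = None
    for raw in patch.splitlines():
        header = HUNK_RE.match(raw)
        if header:
            new_no = int(header.group(3))
            continue
        if new_no is None or raw.startswith("\\"):
            continue
        if raw.startswith("+"):
            numbers.append(new_no)
            new_no += 1
        elif raw.startswith("-"):
            if count_deleted:
                new_no += 1  # wrong: deleted rows do not exist in the new file
        else:
            new_no += 1  # context rows do exist in the new file
    return numbers


patch = """@@ -10,3 +10,3 @@
-a = 1
+a = 2
-b = 1
+b = 2
-c = 1
+c = 2"""

correct = added_line_numbers(patch, count_deleted=False)
slipped = added_line_numbers(patch, count_deleted=True)
print(correct)                                    # [10, 11, 12]
print(slipped)                                    # [11, 13, 15]
print([s - c for s, c in zip(slipped, correct)])  # [1, 2, 3]
```

&lt;p&gt;Each deleted row above a target adds one more line of error, which is consistent with drift that grows on large diffs. Here the hunk only covers new-file lines 10 to 12, so two of the slipped numbers already fall outside it.&lt;/p&gt;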

&lt;h3&gt;3. Out-of-Range Targets&lt;/h3&gt;

&lt;p&gt;Occasionally, comments referenced line numbers that simply didn’t exist inside the patch.&lt;/p&gt;

&lt;p&gt;These comments could not be attached to the pull request at all.&lt;/p&gt;
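
&lt;p&gt;This failure is the cheapest one to detect, because the hunk headers alone say which new-file lines a patch covers. A minimal sketch, assuming comments target new-file line numbers; the helper names are hypothetical:&lt;/p&gt;

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def new_side_ranges(patch: str) -> list[range]:
    """New-file line range covered by each hunk, read from the headers only."""
    ranges = []
    for raw in patch.splitlines():
        header = HUNK_RE.match(raw)
        if header:
            start = int(header.group(3))
            # An omitted count means the hunk spans exactly one line.
            count = int(header.group(4)) if header.group(4) is not None else 1
            ranges.append(range(start, start + count))
    return ranges


def in_patch(patch: str, line: int) -> bool:
    """True if a new-file line number falls inside one of the hunks."""
    return any(line in hunk for hunk in new_side_ranges(patch))


patch = """@@ -120,2 +120,2 @@
-const timeout = 3000;
+const timeout = 10000;
 initialize();
@@ -200,3 +200,4 @@
 start();
+log("started");
 run();
 stop();"""

print(new_side_ranges(patch))  # [range(120, 122), range(200, 204)]
print(in_patch(patch, 121))    # True
print(in_patch(patch, 150))    # False: between the two hunks
print(in_patch(patch, 204))    # False: past the end of the last hunk
```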

&lt;h2&gt;Why Prompt Engineering Wasn’t Enough&lt;/h2&gt;

&lt;p&gt;My first instinct was to improve prompting.&lt;/p&gt;

&lt;p&gt;I tried:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More explicit instructions&lt;/li&gt;
&lt;li&gt;Structured outputs&lt;/li&gt;
&lt;li&gt;Additional examples&lt;/li&gt;
&lt;li&gt;Coordinate reminders&lt;/li&gt;
&lt;/ul&gt;
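
&lt;p&gt;Structured output constrains the shape of a comment, which catches malformed or missing fields. A minimal sketch of that kind of shape check, assuming the model returns a JSON array of comments with hypothetical &lt;code&gt;path&lt;/code&gt;, &lt;code&gt;line&lt;/code&gt;, and &lt;code&gt;body&lt;/code&gt; fields, also shows its limit: it validates the format, not the coordinates.&lt;/p&gt;

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewComment:
    """Hypothetical shape of one model-generated review comment."""
    path: str
    line: int
    body: str


def parse_comments(raw: str) -> list[ReviewComment]:
    """Parse a JSON array of comments, dropping entries with the wrong shape."""
    comments = []
    for item in json.loads(raw):
        if not isinstance(item, dict):
            continue
        path, line, body = item.get("path"), item.get("line"), item.get("body")
        # type(...) is int rejects bools, which are a subclass of int.
        if isinstance(path, str) and type(line) is int and isinstance(body, str):
            comments.append(ReviewComment(path, line, body))
    return comments


raw = """[
  {"path": "src/app.js", "line": 9999, "body": "This timeout looks too long."},
  {"path": "src/app.js", "line": "120"}
]"""

print(parse_comments(raw))
# [ReviewComment(path='src/app.js', line=9999, body='This timeout looks too long.')]
```

&lt;p&gt;The second entry is rejected, but the first one passes even if the diff contains nothing near line 9999: a schema has no way to know which lines the patch actually covers.&lt;/p&gt;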

&lt;p&gt;The error rate dropped.&lt;/p&gt;

&lt;p&gt;It never disappeared.&lt;/p&gt;

&lt;p&gt;The reason seems straightforward.&lt;/p&gt;

&lt;p&gt;Predicting text and maintaining exact positional bookkeeping are fundamentally different tasks.&lt;/p&gt;

&lt;p&gt;A model can understand a code issue while simultaneously making a counting error.&lt;/p&gt;

&lt;h2&gt;A Different Approach&lt;/h2&gt;

&lt;p&gt;Eventually, I stopped treating the model’s line coordinates as trusted output.&lt;/p&gt;

&lt;p&gt;Instead of assuming the model was correct, I added a deterministic verification step.&lt;/p&gt;

&lt;p&gt;Every generated review comment is checked against the actual diff structure before being returned.&lt;/p&gt;

&lt;p&gt;The validator verifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File existence&lt;/li&gt;
&lt;li&gt;Valid patch coordinates&lt;/li&gt;
&lt;li&gt;Added-line targets&lt;/li&gt;
&lt;li&gt;Hunk boundaries&lt;/li&gt;
&lt;li&gt;Out-of-range references&lt;/li&gt;
&lt;/ul&gt;
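
&lt;p&gt;None of these checks needs anything beyond the patch text. Here is a minimal Python sketch of such a validator under a few assumptions of mine: each comment carries a file path and a new-file line number, patches are keyed by file path, and only added lines are accepted as targets. The names &lt;code&gt;index_patch&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt; are hypothetical and not taken from DiffLens.&lt;/p&gt;

```python
import re
from dataclasses import dataclass

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class FileIndex:
    hunks: list[range]  # new-file line range of each hunk
    added: set[int]     # new-file line numbers of added ("+") rows


def index_patch(patch: str) -> FileIndex:
    """Collect hunk boundaries and added-line numbers for one file's patch."""
    hunks, added = [], set()
    new_no = None
    for raw in patch.splitlines():
        header = HUNK_RE.match(raw)
        if header:
            new_no = int(header.group(3))
            count = int(header.group(4)) if header.group(4) is not None else 1
            hunks.append(range(new_no, new_no + count))
            continue
        if new_no is None or raw.startswith(("-", "\\")):
            continue  # deleted rows and "\ No newline" markers have no new-file line
        if raw.startswith("+"):
            added.add(new_no)
        new_no += 1  # added and context rows both occupy a new-file line
    return FileIndex(hunks, added)


def validate(comment: dict, patches: dict[str, str]) -> tuple[bool, str]:
    """Return (is_valid, reason) for a comment shaped like {"path": ..., "line": ...}."""
    path, line = comment.get("path"), comment.get("line")
    if not isinstance(path, str) or path not in patches:
        return False, "file is not part of the diff"
    if type(line) is not int or not line >= 1:
        return False, "line is not a positive integer"
    index = index_patch(patches[path])
    if not any(line in hunk for hunk in index.hunks):
        return False, "line is outside every hunk"
    if line not in index.added:
        return False, "line is not an added line"
    return True, "ok"


patches = {"src/app.js": """@@ -120,2 +120,2 @@
-const timeout = 3000;
+const timeout = 10000;
 initialize();"""}

print(validate({"path": "src/app.js", "line": 120}, patches))  # (True, 'ok')
print(validate({"path": "src/app.js", "line": 121}, patches))  # (False, 'line is not an added line')
print(validate({"path": "src/app.js", "line": 300}, patches))  # (False, 'line is outside every hunk')
print(validate({"path": "src/other.js", "line": 1}, patches))  # (False, 'file is not part of the diff')
```

&lt;p&gt;Whether context lines should also be accepted is a policy choice; this sketch follows the added-line rule from the list above.&lt;/p&gt;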

&lt;p&gt;If a comment fails validation, it is either corrected or discarded.&lt;/p&gt;
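
&lt;p&gt;The correction step is where judgment comes in. One conservative option, sketched here as an assumption rather than a description of DiffLens, is to keep a comment only if its line is already an added line or can be snapped to a nearby added line in the same hunk; everything else is dropped. The function name and the &lt;code&gt;max_distance&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

```python
from typing import Optional


def correct_or_discard(
    line: int, hunks: list[range], added: set[int], max_distance: int = 2
) -> Optional[int]:
    """Return a corrected new-file line for a comment, or None to discard it."""
    if line in added:
        return line  # already a valid target
    for hunk in hunks:
        if line in hunk:
            candidates = [n for n in hunk if n in added]
            if not candidates:
                return None  # the hunk has no added lines to snap to
            # Closest added line wins; on a tie, prefer the earlier line.
            best = min(candidates, key=lambda n: (abs(n - line), n))
            return best if max_distance >= abs(best - line) else None
    return None  # outside every hunk: nothing nearby to snap to


hunks = [range(120, 126)]
added = {120, 124}
print(correct_or_discard(120, hunks, added))                  # 120
print(correct_or_discard(122, hunks, added))                  # 120 (tie with 124)
print(correct_or_discard(125, hunks, added))                  # 124
print(correct_or_discard(122, hunks, added, max_distance=1))  # None
print(correct_or_discard(300, hunks, added))                  # None
```

&lt;p&gt;Snapping guarantees that a comment lands on a real line, but not necessarily on the line the model meant, so a small &lt;code&gt;max_distance&lt;/code&gt; (or discarding outright) trades some recall for precision.&lt;/p&gt;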

&lt;p&gt;The goal isn’t to make the reviewer smarter.&lt;/p&gt;

&lt;p&gt;The goal is to prevent invalid comments from reaching the pull request.&lt;/p&gt;

&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;One lesson stood out during this project:&lt;/p&gt;

&lt;p&gt;Semantic understanding and coordinate accuracy are different problems.&lt;/p&gt;

&lt;p&gt;LLMs are often better at the first than the second.&lt;/p&gt;

&lt;p&gt;As AI tooling becomes more integrated into developer workflows, deterministic validation layers may become just as important as the models themselves.&lt;/p&gt;

&lt;p&gt;I ended up open-sourcing the implementation here:&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/ywu593412-afk/DiffLens" rel="noopener noreferrer"&gt;https://github.com/ywu593412-afk/DiffLens&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m curious whether other developers building AI review systems have encountered similar coordinate-mapping issues.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>softwareengineering</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
