<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akash Khanna</title>
    <description>The latest articles on DEV Community by Akash Khanna (@akahkhanna).</description>
    <link>https://dev.to/akahkhanna</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4012589%2F2c3d32e0-bd94-45e5-8dc8-f71db7c27c33.png</url>
      <title>DEV Community: Akash Khanna</title>
      <link>https://dev.to/akahkhanna</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akahkhanna"/>
    <language>en</language>
    <item>
      <title>My coding agent said "everything preserved." A method was already gone.</title>
      <dc:creator>Akash Khanna</dc:creator>
      <pubDate>Fri, 03 Jul 2026 08:27:08 +0000</pubDate>
      <link>https://dev.to/akahkhanna/my-coding-agent-said-everything-preserved-a-method-was-already-gone-5851</link>
      <guid>https://dev.to/akahkhanna/my-coding-agent-said-everything-preserved-a-method-was-already-gone-5851</guid>
      <description>&lt;p&gt;I've spent most of two decades in enterprise Java, where "done" meant it is working and you have done what you were asked for.  It's in production, it works, and you can prove it. So when I started leaning on a coding agent and it told me "done — everything preserved, tests pass,"my instinct was the one every long-tenured engineer has: &lt;em&gt;show me.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's the one that actually stung. I had Claude collapse three classes into one — it had grown too complex — and later split another apart. Copy-paste the whole refactored class, done in the chat window and pasted into my IDE, not in an agent that touches the repo directly. It reported everything preserved. I believed it. Days later, something broke in a path no test covered, and I traced it to a method that  wasn't there anymore. And this happened even when I asked three different agents to compare and even with the max usage and the best of models I could use at that point. It hadn't made the trip during the merge. Nothing flagged it, because nothing forced the claim to be checked. I just prayed the refactor was clean, and it wasn't.&lt;/p&gt;

&lt;p&gt;The thing that clicked: &lt;strong&gt;a claim is not evidence, and a rule in the agent's context is not enforcement.&lt;/strong&gt; "Everything preserved" is an assertion about a mapping the agent reshaped and never verified. "Tests pass" is a claim about a run that, it turned out, never happened that session. I'd been treating the agent's summary as if it were the ground truth. It's the press release, not the primary source.&lt;/p&gt;

&lt;p&gt;So I built a small thing to read the primary source. It's a Claude Code Stop hook. When the agent says it's finished, the hook reads the actual &lt;code&gt;git diff&lt;/code&gt; and the session transcript — not the summary — and checks the claim against what really changed, before the turn is allowed to end. It's deterministic: no model sits in the check itself, which matters, because the same confident narration that talks an LLM reviewer into "looks good" has nothing to grab onto in a diff parser.&lt;/p&gt;

&lt;p&gt;On the refactoring pain specifically, here's the honest scope, so I too do not sound like the agent I faced an issue with. It doesn't try to detect "a method was dropped" — that needs intent, and guessing intent is where false positives come from. It detects the observable &lt;em&gt;consequence&lt;/em&gt;: a call that no longer resolves, in a turn that claimed nothing changed. That one reframe makes the messy cases fall out correctly — a rename with the callers updated stays silent; a rename that &lt;em&gt;missed&lt;/em&gt; a caller fires on the real broken call; a merge or a split doesn't matter, because it checks whether the symbol is defined anywhere in the whole tree, not just where it used to live. What it can't catch, and says so plainly: a method dropped with no caller left behind — no dangling reference, no textual signal, invisible to a diff. It trades recall for precision on purpose. It would not have caught &lt;em&gt;every&lt;/em&gt; possible silent drop in my refactor — but it would have caught the one that broke, because something still called it.&lt;/p&gt;

&lt;p&gt;The broader set of things it flags, in plain terms: "tests pass" when no test ran this session, stubs and TODOs in new code, a file the agent said it changed that isn't in the diff, imports pointing at nothing, hardcoded secrets, and rules from your own project docs that got overridden anyway.&lt;/p&gt;

&lt;p&gt;What surprised me is how much company this idea has now. There's a wave of people landing in the same place — that you shouldn't ask the same agent to write code and verify it, because it has no incentive to fail its own work; that the validator needs separate context; that the answer is guardrails that make the bad behaviour hard, not prompts that ask nicely. A year ago this felt like a personal irritation. Now it feels like a category forming, which is a good sign I'm not the only one muttering "show me" at a green checkmark.&lt;/p&gt;

&lt;p&gt;It's open source (MIT), it's deterministic, and in-session it's tamper-&lt;em&gt;evident&lt;/em&gt;, not tamper-&lt;em&gt;proof&lt;/em&gt; , not perfect and I am still fixing issues as I find them— the hook shares a filesystem with the agent it audits, so the real guarantee lives in the pre-commit/CI rung, in a separate trust domain (that's in there too). It checks that the agent did what it &lt;em&gt;said&lt;/em&gt; — not that what it said was &lt;em&gt;right&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Link's below. The one thing I actually want back: tell me where it false-flags you. False positives are what kill a tool like this, and I'd rather hear about them than not.&lt;br&gt;
&lt;a href="https://github.com/akahkhanna/groundtruth" rel="noopener noreferrer"&gt;https://github.com/akahkhanna/groundtruth&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>claude</category>
      <category>coding</category>
    </item>
  </channel>
</rss>
