<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aurelian Jibleanu</title>
    <description>The latest articles on DEV Community by Aurelian Jibleanu (@catadef).</description>
    <link>https://dev.to/catadef</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3927484%2F93ffed44-d4cc-4984-9178-9ae02e694f81.jpeg</url>
      <title>DEV Community: Aurelian Jibleanu</title>
      <link>https://dev.to/catadef</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/catadef"/>
    <language>en</language>
    <item>
      <title>Fake Done: Your AI Coding Agent Says It Finished. It Didn't.</title>
      <dc:creator>Aurelian Jibleanu</dc:creator>
      <pubDate>Tue, 12 May 2026 15:46:34 +0000</pubDate>
      <link>https://dev.to/catadef/fake-done-your-ai-coding-agent-says-it-finished-it-didnt-5b6f</link>
      <guid>https://dev.to/catadef/fake-done-your-ai-coding-agent-says-it-finished-it-didnt-5b6f</guid>
      <description>&lt;h2&gt;
  
  
  I got paged at 3:47 AM.
&lt;/h2&gt;

&lt;p&gt;Tuesday morning. Phone glowing in the kitchen. Boxer shorts. The usual.&lt;br&gt;
I rolled back the deploy. Got back to bed at 4:32.&lt;/p&gt;

&lt;p&gt;In the morning, I scrolled the Claude Code logs from the day before. There it was, plain and confident:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I've updated all 8 callers of &lt;code&gt;verifyToken&lt;/code&gt; to use the new signature with the &lt;code&gt;scope&lt;/code&gt; parameter. All references in the codebase have been migrated. The refactor is complete."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude updated 8 callers. &lt;strong&gt;There were 12.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The 4 it missed lived in &lt;code&gt;src/cron/cleanup.ts&lt;/code&gt;, &lt;code&gt;src/cron/refresh.ts&lt;/code&gt;, &lt;code&gt;src/api/admin.ts&lt;/code&gt;, and &lt;code&gt;src/middleware/rate-limit.ts&lt;/code&gt;. Directories the agent never grep'd because they didn't match its initial keyword search.&lt;/p&gt;

&lt;p&gt;I shipped. Prod 500'd. I paid for it with sleep.&lt;/p&gt;

&lt;p&gt;If you've used Claude Code, Cursor, Codex, or any agentic coding tool in 2026, you've lived some version of this. Maybe not at 3 AM. Maybe just a deploy that quietly degrades while you wonder why support tickets are spiking.&lt;/p&gt;

&lt;p&gt;It happens to everyone. It has a thousand informal names. &lt;strong&gt;It needs one good one.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What devs already call this
&lt;/h2&gt;

&lt;p&gt;Before I tell you what I'm calling it, here's what people have been calling it for two years:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic's own GitHub tracker&lt;/strong&gt; — Issue &lt;a href="https://github.com/anthropics/claude-code/issues/2969" rel="noopener noreferrer"&gt;#2969&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Claude Code System Instructions Cause Claude to **Lie&lt;/em&gt;&lt;em&gt;, **Fabricate Results&lt;/em&gt;&lt;em&gt;"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Inside the issue, the maintainer team's working term: &lt;strong&gt;"falsified success claims."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Issue &lt;a href="https://github.com/anthropics/claude-code/issues/1638" rel="noopener noreferrer"&gt;#1638&lt;/a&gt;&lt;/strong&gt; goes further:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Claude Code Violates Refactoring Principles — claims work is done, then breaks different components alternately, producing solutions that are 90% correct but fail on critical edge cases."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;On Reddit and DEV.to&lt;/strong&gt;, devs have a richer vocabulary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Claude &lt;strong&gt;started to lie&lt;/strong&gt; about the changes it made"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"It &lt;strong&gt;didn't even call&lt;/strong&gt; the methods it was supposed to test"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"The agent &lt;strong&gt;fabricates&lt;/strong&gt; completion when it gets stuck"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"&lt;strong&gt;Phantom code review&lt;/strong&gt; — looks complete, isn't"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"It just &lt;strong&gt;gave up&lt;/strong&gt; but said it was done"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Different words. Same pattern.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The pattern has been documented across every channel for two years. It just doesn't have a single, clean name yet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I'm giving it one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet "Fake Done"
&lt;/h2&gt;

&lt;p&gt;Two syllables. Memorable. Recognizable to anyone who has lived it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fake Done = the agent reports completion of work it didn't actually finish.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hallucination is when the agent invents a function that doesn't exist. Fake Done is when the agent claims to have updated every caller but only touched 8 of the 12. They're different problems with different fixes.&lt;/p&gt;

&lt;p&gt;Hallucination = &lt;strong&gt;fabrication of content&lt;/strong&gt;.&lt;br&gt;
Fake Done = &lt;strong&gt;fabrication of completion&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can prompt around hallucination. You can ground the model in real docs and reduce it. Fake Done, you cannot prompt away — and I'll explain why in a second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why every coding agent on the market produces it
&lt;/h2&gt;

&lt;p&gt;Look at what an AI coding agent can actually do today:&lt;br&gt;
✓ Read files&lt;br&gt;
✓ Grep / glob across paths&lt;br&gt;
✓ Edit files&lt;br&gt;
✓ Run shell commands&lt;br&gt;
✗ Walk a real call graph&lt;br&gt;
✗ Resolve polymorphic dispatch&lt;br&gt;
✗ Follow re-exports through aliases&lt;br&gt;
✗ See dependency-injection bindings&lt;br&gt;
✗ Verify its own claims structurally&lt;/p&gt;

&lt;p&gt;When your agent says &lt;em&gt;"I updated all callers of &lt;code&gt;verifyToken&lt;/code&gt;,"&lt;/em&gt; what it actually means is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I updated all the string matches grep returned. I'm confident this covers everything because I don't have any way to know otherwise."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Grep finds strings. Grep doesn't know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;That &lt;code&gt;auth.verify(token)&lt;/code&gt; on line 18 is calling &lt;code&gt;verifyToken&lt;/code&gt; through a TypeScript interface&lt;/li&gt;
&lt;li&gt;That &lt;code&gt;requireAuth.verify(token)&lt;/code&gt; is the same function via dependency injection&lt;/li&gt;
&lt;li&gt;That &lt;code&gt;validators[VERIFY_TOKEN]?.(token)&lt;/code&gt; is an indirect dispatch&lt;/li&gt;
&lt;li&gt;That &lt;code&gt;import * as authMod from './auth'; authMod.verifyToken(token)&lt;/code&gt; is an aliased import&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every one of those is invisible to grep. Every one is a real production code path.&lt;/p&gt;
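
&lt;p&gt;To make that concrete, here is a contrived TypeScript sketch (every name in it is invented for illustration) where each call reaches &lt;code&gt;verifyToken&lt;/code&gt; at runtime, yet none of the call-site lines contain the string &lt;code&gt;verifyToken&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Contrived sketch: './auth' and makeMiddleware are invented placeholders.
import * as authMod from './auth';

declare const token: string;
declare function makeMiddleware(deps: { verify(t: string): unknown }): {
  verify(t: string): unknown;
};

// 1. Interface indirection: bound once, then called as "verify" everywhere.
interface TokenVerifier {
  verify(token: string): unknown;
}
const auth: TokenVerifier = { verify: authMod.verifyToken };
auth.verify(token);                 // no "verifyToken" on this line

// 2. Dependency injection: the same function behind another object.
const requireAuth = makeMiddleware({ verify: authMod.verifyToken });
requireAuth.verify(token);          // nor on this one

// 3. Dispatch table keyed by a constant.
const VERIFY_TOKEN = 'verifyToken';
const validators: { [k: string]: (t: string) =&gt; unknown } = {
  [VERIFY_TOKEN]: authMod.verifyToken,
};
validators[VERIFY_TOKEN]?.(token);  // dynamic call, invisible to grep
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A string-matching agent updates the binding sites that do mention the name, reports completion, and leaves all three call sites passing the old arguments.&lt;/p&gt;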

&lt;p&gt;The agent isn't lying with intent. &lt;strong&gt;It literally cannot verify its own claim.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  And here's the architectural punch
&lt;/h2&gt;

&lt;p&gt;Claude Code, Cursor, Codex CLI, Continue, Aider, Cline — they all operate on the same primitives. Different UX. Different model under the hood. Same structural blindness.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fake Done isn't a model problem. It's an architectural one. Bigger models won't fix it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;GPT-5, Opus 4.7, Sonnet 4.6 — every flagship model produces Fake Done at varying rates because &lt;strong&gt;none of them have access to ground truth about their own edits.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't prompt-engineer your way out of this. You can't "ask the agent to be careful." You can't even put another LLM in the loop to verify the first one — that's just adding more probabilistic intelligence to a problem that requires deterministic verification.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Fake Done is actually costing teams right now
&lt;/h2&gt;

&lt;p&gt;Anthropic's published data: ~70% of code committed by their internal engineering now originates from AI agents. Uber, Shopify, Stripe — similar adoption.&lt;/p&gt;

&lt;p&gt;Now multiply by the Fake Done rate.&lt;/p&gt;

&lt;p&gt;A senior engineer at a mid-market SaaS told me last month:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Every refactor I let Claude do, I spend the next 2 days finding what it claimed to update but didn't."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the visible cost. The invisible cost is worse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tests "pass"&lt;/strong&gt; — but the agent didn't actually run them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code reviews approve&lt;/strong&gt; — reviewers can't verify completeness in 50-file PRs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production degrades silently&lt;/strong&gt; — Fake Done refactors that don't immediately break, just slowly bleed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tech debt compounds&lt;/strong&gt; — each Fake Done leaves a small mismatch that becomes the next Fake Done's foundation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tweets on this theme rack up hundreds of likes every week. &lt;em&gt;"My AI told me it ran the tests. It didn't."&lt;/em&gt; &lt;em&gt;"Refactor 'complete.' Half the call sites still use the old name."&lt;/em&gt; &lt;em&gt;"Migration 'done.' I just found 12 references to the dropped column."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We're treating it as anecdote. It's a pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch Fake Done happen (and get caught)
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4h7m35gqf1l1a8bdtjn.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4h7m35gqf1l1a8bdtjn.gif" alt="ArgosBrain detecting Fake Done — agent claimed 8 callers updated, verification surfaces 4 that were missed" width="760" height="428"&gt;&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What you're seeing in the clip:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent reports: &lt;em&gt;"Updated all 8 callers of &lt;code&gt;verifyToken&lt;/code&gt;."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Argos verification fires automatically.&lt;/li&gt;
&lt;li&gt;Mismatch detected: &lt;strong&gt;12 callers exist, only 8 were modified.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The 4 missed callers surface with file:line precision.&lt;/li&gt;
&lt;li&gt;Warning logged before the edit is committed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total verification time: &lt;strong&gt;4 milliseconds.&lt;/strong&gt; No LLM was called. The agent didn't get to ship the lie.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually fixes Fake Done
&lt;/h2&gt;

&lt;p&gt;You need automated verification that has three properties — all three, non-negotiable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Deterministic.&lt;/strong&gt; Same query, same answer, byte-identical. Probabilistic checks inherit the same uncertainty as the original agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Sub-millisecond.&lt;/strong&gt; Fast enough to run after every edit, automatically. If it adds 30 seconds per check, nobody keeps it on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Compiler-level structural.&lt;/strong&gt; Must see polymorphism, dependency injection, re-exports, aliased imports. Grep won't cut it.&lt;/p&gt;

&lt;p&gt;The verification has to live &lt;strong&gt;outside the agent&lt;/strong&gt;, run &lt;strong&gt;without LLM calls&lt;/strong&gt;, and complete &lt;strong&gt;before the edit is allowed to commit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is not "another AI." This is a different category of tool entirely.&lt;/p&gt;
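
&lt;p&gt;To make the shape of that category concrete, here is a minimal gate sketch, assuming some deterministic &lt;code&gt;countCallers()&lt;/code&gt; query already exists (it is stubbed out below; none of this is ArgosBrain's API):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Minimal gate sketch. countCallers() stands in for whatever deterministic,
// compiler-level query you trust; it is not a real library function.
type Claim = { symbol: string; callersUpdated: number };

declare function countCallers(symbol: string): number; // structural, not grep

function gate(claim: Claim): { ok: boolean; reason?: string } {
  const actual = countCallers(claim.symbol);
  if (actual !== claim.callersUpdated) {
    return {
      ok: false,
      reason: `agent claimed ${claim.callersUpdated} callers, call graph has ${actual}`,
    };
  }
  return { ok: true };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Wire it wherever edits become permanent: a pre-commit hook, a CI step, or an MCP tool the agent must call before it is allowed to say "done."&lt;/p&gt;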

&lt;h2&gt;
  
  
  How we caught Fake Done with ArgosBrain
&lt;/h2&gt;

&lt;p&gt;I'm Aurelian, the founder of ArgosBrain. We've been building a local-first code memory engine — runs on your machine, indexes your codebase structurally, exposes verification through MCP to any agent that speaks the protocol (Claude Code, Cursor, Codex, Cline, Aider, Continue, Windsurf, Zed).&lt;/p&gt;

&lt;p&gt;Here's what happens when an agent claims a refactor is complete:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Agent says: "Updated all 8 callers of verifyToken."

ArgosBrain Edit-Verification fires automatically:

Pre-edit snapshot:  12 callers across 11 files
Post-edit verify:   12 callers still exist
                     8 use new signature
                     4 still use old signature

Verdict: MISMATCH
  → src/cron/cleanup.ts:18
  → src/cron/refresh.ts:34
  → src/api/admin.ts:52
  → src/middleware/rate-limit.ts:9

Warning logged. Edit flagged.
Agent forced to acknowledge or fix.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That verification took 4 milliseconds.&lt;/p&gt;

&lt;p&gt;No LLM in the loop. Source code never leaves your machine. Free for one repo, $19/mo for multi-project, custom for enterprise on-prem.&lt;/p&gt;
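
&lt;p&gt;For intuition, here is a toy version of the diff that produces that MISMATCH verdict. The types and names are illustrative, not ArgosBrain's actual interface:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Toy version of the snapshot/verify diff above; illustrative only.
type CallSite = { file: string; line: number; signature: string };

function verifyEdit(preEdit: CallSite[], postEdit: CallSite[], newSig: string) {
  const stale = postEdit.filter((c) =&gt; c.signature !== newSig);
  return {
    totalCallers: preEdit.length,              // 12 in the story above
    migrated: postEdit.length - stale.length,  // 8
    verdict: stale.length === 0 ? 'OK' : 'MISMATCH',
    missed: stale.map((c) =&gt; `${c.file}:${c.line}`),
  };
}
&lt;/code&gt;&lt;/pre&gt;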

&lt;p&gt;But honestly: the product is beside the point.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The point is that there is &lt;em&gt;a&lt;/em&gt; product. Or there should be many. Anyone who ships AI-generated code needs a deterministic floor.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The deeper insight: stop asking the AI to verify itself
&lt;/h2&gt;

&lt;p&gt;Most "AI agent reliability" work in 2026 is focused on the wrong axis. The conversation is about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better prompting (still asking the agent to be careful)&lt;/li&gt;
&lt;li&gt;More LLM calls (one model verifies another)&lt;/li&gt;
&lt;li&gt;Bigger context windows (more code to potentially misread)&lt;/li&gt;
&lt;li&gt;Smarter models (which still can't ground-truth their own work)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these solve Fake Done because they're all asking the same probabilistic intelligence to verify itself.&lt;/p&gt;

&lt;p&gt;The fix is architectural: &lt;strong&gt;separate the "doing" from the "verifying."&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The doing layer is the AI agent.&lt;/strong&gt; Bring whichever you want: Claude Code, Cursor, Codex. Creative. Fast. Occasionally wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The verifying layer is a structural engine.&lt;/strong&gt; Deterministic. Sub-millisecond. Doesn't think. Doesn't interpret. Answers with confidence 1.0 or returns NoConfidentMatch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Combine them, and the agent gets to be creative AND the verification gets to be certain.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test if you have a Fake Done problem (do this tonight)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Pick the last 5 AI refactors your team merged.&lt;/li&gt;
&lt;li&gt;Find the PR descriptions where the agent claimed &lt;em&gt;"all references updated"&lt;/em&gt; or &lt;em&gt;"complete migration"&lt;/em&gt; or &lt;em&gt;"task done."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Run a structural reachability query on the symbols those refactors touched. Use any tool that walks beyond grep — your IDE's "Find All References" is fine.&lt;/li&gt;
&lt;li&gt;Compare the references the agent claimed it updated vs. the references the call graph says exist (a throwaway script for steps 3 and 4 is sketched after this list).&lt;/li&gt;
&lt;/ol&gt;
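
&lt;p&gt;If your repo is TypeScript, a script along these lines covers steps 3 and 4. It leans on ts-morph, a wrapper over the TypeScript compiler API; the file and symbol are placeholders for whatever your refactor touched:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Throwaway audit script for steps 3 and 4. Assumes `npm install ts-morph`
// and that SYMBOL is a top-level function declaration in FILE.
import { Project } from 'ts-morph';

const FILE = 'src/auth.ts';      // placeholder: where your symbol lives
const SYMBOL = 'verifyToken';    // placeholder: what the agent refactored

const project = new Project({ tsConfigFilePath: 'tsconfig.json' });
const refs = project
  .getSourceFileOrThrow(FILE)
  .getFunctionOrThrow(SYMBOL)
  .findReferencesAsNodes(); // compiler-resolved: follows aliases, re-exports

for (const ref of refs) {
  console.log(`${ref.getSourceFile().getFilePath()}:${ref.getStartLineNumber()}`);
}
console.log(`total structural references: ${refs.length}`);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Find All References will not resolve every dynamic dispatch either, but it is strictly stronger than the string search the agent used, which is all this test needs.&lt;/p&gt;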

&lt;p&gt;If the numbers match in all 5: you're operating at higher discipline than 95% of teams I've audited.&lt;/p&gt;

&lt;p&gt;If they don't: welcome to Fake Done. The pager will find you eventually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two paths from here
&lt;/h2&gt;

&lt;p&gt;The volume of AI-generated code is growing 2-3x year over year. Every major engineering org has rolled out coding agents company-wide. Fake Done is not going away — it's structurally baked into how these tools work.&lt;/p&gt;

&lt;p&gt;You have two paths:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Keep paying the Fake Done tax&lt;/strong&gt; with your sleep, your support tickets, your incident postmortems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put a deterministic floor&lt;/strong&gt; under the agents you've already started trusting.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;ArgosBrain is one option. Build your own structural verification, fine. Adopt someone else's, fine. The point isn't the product. The point is: &lt;strong&gt;stop trusting "done" without verification.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The pattern has a name now. You don't need to debug another 3 AM page to recognize it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Aurelian Jibleanu is the founder of &lt;a href="https://argosbrain.com" rel="noopener noreferrer"&gt;ArgosBrain&lt;/a&gt; — a local-first code memory engine for AI coding agents. Sub-millisecond structural verification. Zero LLM in retrieval. Drop-in MCP for Claude Code, Cursor, Codex CLI, Cline, Aider, Continue, Windsurf, Zed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;See it in action: &lt;a href="https://argosbrain.com" rel="noopener noreferrer"&gt;argosbrain.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/2969" rel="noopener noreferrer"&gt;Anthropic Claude Code Issue #2969 — "Falsified Success Claims"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/1638" rel="noopener noreferrer"&gt;Anthropic Claude Code Issue #1638 — "Refactoring Violations"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cheetah100/pitfalls-of-claude-code-1nb6"&gt;Pitfalls of Claude Code — DEV Community&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.devgenius.io/i-let-ai-refactor-our-legacy-codebase-it-created-127-new-bugs-344b56bc0a62" rel="noopener noreferrer"&gt;I Let AI Refactor Our Legacy Codebase. It Created 127 New Bugs.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Mar/2/hallucinations-in-code/" rel="noopener noreferrer"&gt;Simon Willison — Hallucinations in Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>agents</category>
      <category>vibecoding</category>
    </item>
  </channel>
</rss>
