<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gregory Shevchenko</title>
    <description>The latest articles on DEV Community by Gregory Shevchenko (@gshevchenko).</description>
    <link>https://dev.to/gshevchenko</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949227%2Fe4b28cd9-b923-4f06-844a-11d9617804f9.jpeg</url>
      <title>DEV Community: Gregory Shevchenko</title>
      <link>https://dev.to/gshevchenko</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gshevchenko"/>
    <language>en</language>
    <item>
      <title>AI Agent Failure Loops: When Persistence Becomes a Quality Bug</title>
      <dc:creator>Gregory Shevchenko</dc:creator>
      <pubDate>Sun, 24 May 2026 16:16:51 +0000</pubDate>
      <link>https://dev.to/gshevchenko/ai-agent-failure-loops-when-persistence-becomes-a-quality-bug-4mmg</link>
      <guid>https://dev.to/gshevchenko/ai-agent-failure-loops-when-persistence-becomes-a-quality-bug-4mmg</guid>
      <description>&lt;p&gt;In 2026, I want my AI coding agents to have one more rule: know when to stop.&lt;/p&gt;

&lt;p&gt;AI agents do not always fail by stopping.&lt;/p&gt;

&lt;p&gt;Sometimes they fail by continuing.&lt;/p&gt;

&lt;p&gt;I ran into this while building a custom Cyrillic font extension for a real brand system. The task looked concrete: make Cyrillic letters, Latin letters, numerals, and special symbols feel like one editorial type family.&lt;/p&gt;

&lt;p&gt;Claude Code and Codex kept working. They generated files, exported proofs, reported progress, and fixed the last visible complaint.&lt;/p&gt;

&lt;p&gt;But the same defect class kept returning.&lt;/p&gt;

&lt;p&gt;That is the dangerous version of an AI-agent failure loop: the workflow looks productive while the real quality problem survives.&lt;/p&gt;

&lt;h2&gt;What is a failure loop?&lt;/h2&gt;

&lt;p&gt;A failure loop is a repeated pattern where an agent keeps producing new candidate fixes while the same underlying defect remains unresolved.&lt;/p&gt;

&lt;p&gt;It usually has five steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user rejects the same kind of defect again.&lt;/li&gt;
&lt;li&gt;The agent patches the latest symptom.&lt;/li&gt;
&lt;li&gt;The proof gate is too weak to catch the issue.&lt;/li&gt;
&lt;li&gt;The agent asks for another manual review.&lt;/li&gt;
&lt;li&gt;Everyone spends another cycle on the same problem.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One mistake is normal.&lt;/p&gt;

&lt;p&gt;The real process bug appears when the agent continues after its validation system has already failed.&lt;/p&gt;

&lt;h2&gt;Why normal proof loops can fail&lt;/h2&gt;

&lt;p&gt;Proof loops are useful. Tests, screenshots, build checks, linting, diffs, and generated reports all matter.&lt;/p&gt;

&lt;p&gt;But proof loops can also become theater if they measure the wrong thing.&lt;/p&gt;

&lt;p&gt;In my font project, the agent could prove that the font compiled, the PDF rendered, the screenshot existed, bounding boxes changed, and a numeric score improved.&lt;/p&gt;

&lt;p&gt;That did not prove the letters looked right.&lt;/p&gt;

&lt;p&gt;Users were rejecting something none of those checks measured: visual inconsistency.&lt;/p&gt;

&lt;p&gt;Some Cyrillic glyphs felt too short, too thick, too loosely spaced, or structurally wrong next to Latin letters.&lt;/p&gt;

&lt;p&gt;If the gate cannot see the defect the human keeps seeing, the gate is not allowed to declare the task done.&lt;/p&gt;

&lt;h2&gt;The rule I now use&lt;/h2&gt;

&lt;p&gt;After the same visible defect class appears twice, stop normal implementation.&lt;/p&gt;

&lt;p&gt;Do not make one more speculative patch.&lt;/p&gt;

&lt;p&gt;Do not relax the threshold.&lt;/p&gt;

&lt;p&gt;Do not ask the user to inspect another candidate artifact.&lt;/p&gt;

&lt;p&gt;Switch into failure-loop breaker mode.&lt;/p&gt;
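
&lt;p&gt;As a rough illustration, the rule can be reduced to a counter keyed by defect class. This is a minimal sketch, not code from the repo: the class name, the mode strings, and the defect labels are all hypothetical, and only the threshold of two comes from the rule above.&lt;/p&gt;

```python
from collections import Counter

# Threshold taken from the rule above: two rejections of the same defect class.
REPEAT_LIMIT = 2


class LoopTracker:
    """Counts user rejections per defect class and picks the working mode.

    Hypothetical helper for illustration; the names are assumptions.
    """

    def __init__(self, repeat_limit=REPEAT_LIMIT):
        self.repeat_limit = repeat_limit
        self.rejections = Counter()

    def record_rejection(self, defect_class):
        """Register one rejection and return the mode the agent should be in."""
        self.rejections[defect_class] += 1
        return self.mode(defect_class)

    def mode(self, defect_class):
        # "breaker" means: stop patching, build a rejected corpus and a gate.
        if self.rejections[defect_class] >= self.repeat_limit:
            return "breaker"
        return "normal"


tracker = LoopTracker()
first = tracker.record_rejection("cyrillic-x-height")   # hypothetical label
second = tracker.record_rejection("cyrillic-x-height")
other = tracker.record_rejection("kerning")              # a different class
print(first, second, other)  # normal breaker normal
```

&lt;p&gt;The count is kept per defect class on purpose: a new kind of complaint does not trip the breaker, but a second rejection of the same kind does.&lt;/p&gt;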

&lt;h2&gt;What a failure-loop breaker does&lt;/h2&gt;

&lt;p&gt;A failure-loop breaker is a hard mode switch for AI-agent work.&lt;/p&gt;

&lt;p&gt;Its next output should be a diagnostic package, not another candidate fix.&lt;/p&gt;

&lt;p&gt;It should include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the repeated failure class;&lt;/li&gt;
&lt;li&gt;a rejected corpus of known-bad examples;&lt;/li&gt;
&lt;li&gt;a red-first gate that fails on those examples;&lt;/li&gt;
&lt;li&gt;a fix that turns the gate green;&lt;/li&gt;
&lt;li&gt;blind or independent validation when the author has seen the answer;&lt;/li&gt;
&lt;li&gt;a clear continue, stop, or human-decision recommendation.&lt;/li&gt;
&lt;/ol&gt;
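
&lt;p&gt;One way to keep that package honest is to treat it as a record with a completeness check. The sketch below assumes a Python dataclass whose field names simply mirror the six items above. The names and the sample file names are illustrative, not the format used by any particular tool.&lt;/p&gt;

```python
from dataclasses import dataclass, field

ALLOWED_RECOMMENDATIONS = ("continue", "stop", "human-decision")


@dataclass
class DiagnosticPackage:
    """Illustrative record; each field mirrors one item in the list above."""

    failure_class: str = ""
    rejected_corpus: list = field(default_factory=list)
    gate_red_on_corpus: bool = False      # gate failed on every known-bad example
    gate_green_on_fix: bool = False       # the fix turned that same gate green
    independent_validation: bool = False  # someone other than the author checked
    recommendation: str = ""

    def missing_parts(self):
        """Return the names of the items that are still empty or unproven."""
        checks = {
            "failure_class": bool(self.failure_class),
            "rejected_corpus": bool(self.rejected_corpus),
            "gate_red_on_corpus": self.gate_red_on_corpus,
            "gate_green_on_fix": self.gate_green_on_fix,
            "independent_validation": self.independent_validation,
            "recommendation": self.recommendation in ALLOWED_RECOMMENDATIONS,
        }
        return [name for name, ok in checks.items() if not ok]


package = DiagnosticPackage(
    failure_class="Cyrillic glyphs look too short next to Latin",
    rejected_corpus=["proof_v3.pdf", "proof_v4.pdf"],  # hypothetical file names
    gate_red_on_corpus=True,
)
print(package.missing_parts())
# ['gate_green_on_fix', 'independent_validation', 'recommendation']
```

&lt;p&gt;An agent that reports "done" while &lt;code&gt;missing_parts()&lt;/code&gt; is non-empty is still inside the loop.&lt;/p&gt;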

&lt;p&gt;This is not only a retry limit.&lt;/p&gt;

&lt;p&gt;A retry limit stops cost growth. A failure-loop breaker changes the work itself.&lt;/p&gt;

&lt;h2&gt;The red-first gate matters&lt;/h2&gt;

&lt;p&gt;A useful gate must fail before the fix, because otherwise it has not proven that it can see the old failure.&lt;/p&gt;

&lt;p&gt;If the agent cannot make the new checker fail on previous bad artifacts, it has not built a checker for the real problem.&lt;/p&gt;

&lt;p&gt;Many agent workflows skip this part.&lt;/p&gt;

&lt;p&gt;They add a new metric, see the new candidate score higher, and call it progress. The metric was never forced to reject the old failure.&lt;/p&gt;

&lt;p&gt;For subjective or visual tasks, this matters even more because the rejected corpus becomes the bridge between human taste and deterministic validation.&lt;/p&gt;
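
&lt;p&gt;A minimal sketch of the red-first idea, assuming a gate is any function that returns &lt;code&gt;True&lt;/code&gt; when an artifact passes. The x-height ratio check, the tolerance, and the numbers are made up for illustration. They are not measurements from my font project, and a real visual gate would need more than one metric.&lt;/p&gt;

```python
def red_first_check(gate, rejected_corpus, candidate):
    """Only trust a green result from a gate that is red on every known-bad artifact."""
    if not rejected_corpus:
        return "no-corpus"    # nothing to prove the gate against
    passed_bad = [bad for bad in rejected_corpus if gate(bad)]
    if passed_bad:
        return "gate-blind"   # the gate accepted an artifact the user rejected
    if not gate(candidate):
        return "still-red"    # the gate sees the defect, the fix is not there yet
    return "green"


def x_height_gate(artifact, tolerance=0.02):
    """Hypothetical gate: Cyrillic x-height must match Latin within a tolerance."""
    ratio = artifact["cyrillic_x_height"] / artifact["latin_x_height"]
    return tolerance >= abs(ratio - 1.0)


def compiles_gate(artifact):
    """A weak gate of the 'font compiled' kind: it passes everything."""
    return True


# Invented measurements in font units, standing in for rejected proofs.
rejected = [
    {"cyrillic_x_height": 480, "latin_x_height": 520},
    {"cyrillic_x_height": 460, "latin_x_height": 520},
]
candidate = {"cyrillic_x_height": 515, "latin_x_height": 520}

strong = red_first_check(x_height_gate, rejected, candidate)
weak = red_first_check(compiles_gate, rejected, candidate)
print(strong, weak)  # green gate-blind
```

&lt;p&gt;The weak gate would have reported success on the candidate, but it is disqualified first because it also passes the artifacts that were already rejected.&lt;/p&gt;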

&lt;h2&gt;When the agent is contaminated&lt;/h2&gt;

&lt;p&gt;Another trap is contaminated validation: the same agent writes the fix, knows the target, and grades the result.&lt;/p&gt;

&lt;p&gt;That can be useful during iteration, but it is not independent validation.&lt;/p&gt;

&lt;p&gt;If the agent has already seen the expected answer, the final check needs a deterministic gate with withheld examples, a blind reviewer, a separate model that does not receive the author reasoning, or a human decision when the requirement is taste rather than computation.&lt;/p&gt;

&lt;p&gt;Same-author validation is often self-consistency, not proof.&lt;/p&gt;

&lt;h2&gt;I packaged this as a small public skill&lt;/h2&gt;

&lt;p&gt;I turned the rule into a small public repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/g-shevchenko/agent-failure-loop-breaker" rel="noopener noreferrer"&gt;https://github.com/g-shevchenko/agent-failure-loop-breaker&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It installs a compact skill and repo-local rules for Claude Code, Codex, Cursor, and Windsurf.&lt;/p&gt;

&lt;p&gt;Its installed rule is deliberately simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If the same defect class appears twice, the agent must stop normal patching and build a rejected corpus plus a red-first gate before continuing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This package is not meant to make the model smarter.&lt;/p&gt;

&lt;p&gt;It makes the workflow less willing to confuse motion with progress.&lt;/p&gt;

&lt;h2&gt;Where companies go wrong&lt;/h2&gt;

&lt;p&gt;Teams often treat agent persistence as an asset by default.&lt;/p&gt;

&lt;p&gt;That is reasonable for well-scoped implementation tasks with strong tests. It is risky for work where the acceptance criterion is visual, editorial, architectural, or operational.&lt;/p&gt;

&lt;p&gt;If Claude Code, Codex, Cursor, or Windsurf keeps failing the same class of review, the next investment should go into the validation contract.&lt;/p&gt;

&lt;p&gt;The best prompt in the world will still loop when the gate rewards the wrong artifact.&lt;/p&gt;

&lt;h2&gt;Where this helps&lt;/h2&gt;

&lt;p&gt;This pattern is useful for UI polish loops, visual regression work, PDF and presentation generation, typography systems, content QA, and agentic coding tasks where the same bug returns.&lt;/p&gt;

&lt;p&gt;Here is the signal:&lt;/p&gt;

&lt;p&gt;If the user says “this is still the same problem” twice, the process should change.&lt;/p&gt;

&lt;h2&gt;Practical takeaway&lt;/h2&gt;

&lt;p&gt;Do not ask an AI agent to “keep trying” forever.&lt;/p&gt;

&lt;p&gt;Ask it to prove that its checker can catch the last failed attempt.&lt;/p&gt;

&lt;p&gt;If it cannot, the next task is not implementation.&lt;/p&gt;

&lt;p&gt;It is building a better gate.&lt;/p&gt;

&lt;p&gt;Full write-up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gregshevchenko.com/notes/ai-agent-failure-loop-breakers/" rel="noopener noreferrer"&gt;https://gregshevchenko.com/notes/ai-agent-failure-loop-breakers/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
