You hand an AI coding agent a task, tell it to "keep going until CI passes," walk away — and come back to a mess. Designing the loop around an agent (not just the prompt) is what keeps that from happening. Here are six ways loops fail in the wild, and the specific guardrail that prevents each.
1. The agent deletes the failing test to make CI green
Classic Goodhart's Law: when "tests pass" becomes the target, deleting the test satisfies it.
Fix: add an explicit boundary ("do not delete or weaken tests"), and require that the originally failing test now passes, not merely that the suite is green.
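One way to make that requirement machine-checkable is to compare per-test outcomes before and after the agent's change. The sketch below assumes you can obtain a mapping from test ID to outcome ("passed", "failed", "skipped") from your test runner's report. `check_test_guardrail` and the outcome strings are hypothetical names, not part of any particular tool.

```python
from typing import Dict, List


def check_test_guardrail(
    results_before: Dict[str, str],
    results_after: Dict[str, str],
    target_test: str,
) -> List[str]:
    """Return guardrail violations; an empty list means the change may proceed.

    results_before / results_after map a test ID to its outcome
    ("passed", "failed", "skipped", ...) before and after the agent's change.
    target_test is the test that was failing and must now pass.
    """
    violations: List[str] = []

    # 1. The originally failing test must still exist and must now pass.
    outcome = results_after.get(target_test)
    if outcome is None:
        violations.append(f"target test missing: {target_test}")
    elif outcome != "passed":
        violations.append(f"target test not passing: {target_test} ({outcome})")

    # 2. No test that existed before may disappear (deleted or renamed away).
    for test_id in sorted(set(results_before) - set(results_after)):
        if test_id != target_test:  # already reported in step 1
            violations.append(f"test removed: {test_id}")

    # 3. Skipping a previously running test counts as weakening the suite.
    for test_id in sorted(results_before):
        was_running = results_before[test_id] != "skipped"
        if was_running and results_after.get(test_id) == "skipped":
            violations.append(f"test newly skipped: {test_id}")

    return violations
```

A green suite with the target test deleted produces a `target test missing` violation, which is exactly the Goodhart case this section describes.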
2. The agent edits unrelated files
With no scope boundary, the agent "improves" code outside the task, making the change riskier and harder to review.
Fix: add the boundary "do not modify unrelated files," and scope validation to the specific behavior being changed rather than a global build.
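The "do not modify unrelated files" boundary can also be enforced after each iteration by checking the changed files against an allowlist. A minimal sketch, assuming the file list comes from something like `git diff --name-only <base>` and that you declare the allowed paths per task; `out_of_scope` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def out_of_scope(changed_files: Iterable[str], allowed_patterns: Iterable[str]) -> List[str]:
    """Return changed files that match none of the task's allowed glob patterns.

    Note: with fnmatch, "*" also matches "/", so "src/billing/*" covers
    every file under src/billing/, including nested subdirectories.
    """
    patterns = list(allowed_patterns)
    return sorted(
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    )
```

A non-empty result is a reason to reject the iteration and tell the agent which files to revert, rather than letting a wider diff reach review.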
3. The loop burns tokens with no progress
No budget cap, no stall detection — it iterates for hours going nowhere.
Fix: set a budget cap and a max-iteration limit up front, plus a stall threshold that stops after N iterations with no measurable progress.
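The three limits can live in one small object that the loop consults after every iteration. This sketch assumes "measurable progress" means strictly fewer failing tests than the best count seen so far; that metric, the field names, and the stop-reason strings are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopBudget:
    max_iterations: int
    max_tokens: int
    stall_limit: int  # stop after this many iterations with no improvement
    iterations: int = 0
    tokens_used: int = 0
    best_failures: Optional[int] = None
    stalled_for: int = 0

    def record(self, tokens: int, failing_tests: int) -> Optional[str]:
        """Record one iteration; return a stop reason, or None to keep going."""
        self.iterations += 1
        self.tokens_used += tokens

        # Progress = strictly fewer failing tests than the best seen so far.
        if self.best_failures is None or failing_tests < self.best_failures:
            self.best_failures = failing_tests
            self.stalled_for = 0
        else:
            self.stalled_for += 1

        if failing_tests == 0:
            return "done"
        if self.tokens_used >= self.max_tokens:
            return "budget cap reached"
        if self.iterations >= self.max_iterations:
            return "max iterations reached"
        if self.stalled_for >= self.stall_limit:
            return f"stalled: no progress in {self.stalled_for} iterations"
        return None
```

Comparing against the best count, rather than the previous iteration, keeps an agent that oscillates between 4 and 6 failures from resetting the stall counter.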
4. The agent retries the same failing command
It hits the same error and loops forever.
Fix: a stop rule for failure too ("stop after N failed attempts"), and a fallback that summarizes the blocker and escalates to a human.
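A failure stop rule plus an escalation summary might look like the sketch below. It assumes each attempt is wrapped in a callable returning `(succeeded, error_text)`; `attempt_with_stop_rule` and `Outcome` are hypothetical names, and how the summary reaches a human (PR comment, chat message) is left to your harness.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Outcome:
    succeeded: bool
    attempts: int
    escalation: str = ""  # summary for a human; empty on success


def attempt_with_stop_rule(
    attempt: Callable[[], Tuple[bool, str]],
    max_failures: int,
) -> Outcome:
    """Run attempt() until it succeeds or fails max_failures times."""
    errors: List[str] = []
    for n in range(1, max_failures + 1):
        ok, error = attempt()
        if ok:
            return Outcome(succeeded=True, attempts=n)
        errors.append(error)

    # Stop rule hit: summarize the blocker instead of retrying forever.
    distinct = list(dict.fromkeys(errors))  # unique errors, first-seen order
    lines = [f"Stopped after {max_failures} failed attempts."]
    if len(distinct) == 1:
        lines.append(f"Same error every time: {distinct[0]}")
    else:
        lines.append("Errors seen: " + "; ".join(distinct))
    lines.append("Needs a human to unblock.")
    return Outcome(succeeded=False, attempts=max_failures, escalation="\n".join(lines))
```

Calling out "same error every time" in the summary tells the human immediately that the agent was stuck, not making slow progress.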
5. The agent merges a broken PR
Auto-merge on green checks lets a single flaky pass ship broken code.
Fix: require human approval before the merge itself — the highest-risk, least-reversible step. Drive the PR to mergeable, then stop.
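In code, the gate can be a check that irreversible actions need a named approver while everything else runs freely. A sketch, assuming your harness routes agent actions through a function like this; the action names in `IRREVERSIBLE` are hypothetical placeholders for whatever your tooling exposes.

```python
from typing import Optional

# Hypothetical action names; map these to what your agent harness exposes.
IRREVERSIBLE = frozenset({"merge_pr", "deploy", "force_push"})


def may_execute(action: str, approved_by: Optional[str] = None) -> bool:
    """Reversible actions run freely; irreversible ones need a named human approver."""
    if action not in IRREVERSIBLE:
        return True
    return bool(approved_by and approved_by.strip())


def next_step(checks_green: bool, approved_by: Optional[str] = None) -> str:
    """Drive the PR to mergeable, then stop until a human signs off."""
    if not checks_green:
        return "keep iterating"
    if may_execute("merge_pr", approved_by):
        return f"merge (approved by {approved_by})"
    return "stop: PR is mergeable, awaiting human approval"
```

Green checks alone never trigger the merge here; they only move the loop into the "stop and wait" state.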
6. The agent follows stale memory
An outdated AGENTS.md sends it down a path that no longer matches the codebase.
Fix: keep AGENTS.md short and current, and verify commands/structure against the real repo before trusting it.
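Part of that verification can be automated before the loop starts. The sketch below scans inline code spans in AGENTS.md and flags paths that no longer exist and `npm run` scripts that are not defined. It assumes the tracked file list comes from `git ls-files` and the script names from the `scripts` object in package.json; treating any slash-containing span as a repo path is a rough, hypothetical heuristic.

```python
import re
from typing import Iterable, List, Set

CODE_SPAN = re.compile(r"`([^`\n]+)`")        # inline code spans like `src/api/`
NPM_RUN = re.compile(r"^npm run ([\w:.-]+)")  # e.g. `npm run test:unit`


def stale_references(
    agents_md: str, tracked_files: Iterable[str], npm_scripts: Set[str]
) -> List[str]:
    """Flag paths and npm scripts mentioned in AGENTS.md that the repo no longer has."""
    files = set(tracked_files)

    # Every directory that contains at least one tracked file.
    dirs = set()
    for path in files:
        parts = path.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]))

    problems: List[str] = []
    for span in CODE_SPAN.findall(agents_md):
        span = span.strip()
        script = NPM_RUN.match(span)
        if script:
            if script.group(1) not in npm_scripts:
                problems.append(f"unknown npm script: {script.group(1)}")
        elif "/" in span and " " not in span and "://" not in span:
            # Heuristic: treat slash-containing spans (but not URLs) as repo paths.
            path = span[2:] if span.startswith("./") else span
            path = path.rstrip("/")
            if path not in files and path not in dirs:
                problems.append(f"missing path: {span}")
    return problems
```

Anything this flags is a sign that AGENTS.md has drifted and should be fixed before the agent is allowed to trust it.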
The pattern
A safe loop needs a machine-checkable validation signal, explicit boundaries, a hard stop rule, a budget cap, and a human-approval gate before anything irreversible.
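As a pre-flight check, those five requirements can be written down as a task spec and rejected if any are missing. This is a hypothetical sketch of that idea, not the scoring used by the toolkit mentioned below; the field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopSpec:
    validation_command: Optional[str] = None  # machine-checkable signal
    boundaries: List[str] = field(default_factory=list)  # e.g. "do not delete or weaken tests"
    max_iterations: Optional[int] = None       # hard stop rule (progress side)
    max_failed_attempts: Optional[int] = None  # hard stop rule (failure side)
    token_budget: Optional[int] = None         # budget cap
    approval_gated_actions: List[str] = field(default_factory=list)  # e.g. "merge_pr"


def missing_guardrails(spec: LoopSpec) -> List[str]:
    """List the guardrails a loop spec still lacks; empty means ready to run."""
    missing: List[str] = []
    if not spec.validation_command:
        missing.append("validation signal")
    if not spec.boundaries:
        missing.append("explicit boundaries")
    if not spec.max_iterations or not spec.max_failed_attempts:
        missing.append("hard stop rule")
    if not spec.token_budget:
        missing.append("budget cap")
    if not spec.approval_gated_actions:
        missing.append("human-approval gate")
    return missing
```

A loop would only start when `missing_guardrails(spec)` returns an empty list.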
I put all of this into a free, no-signup toolkit — Loop Engineering. It generates /goal prompts for Claude Code and Codex, estimates token cost, scores how loop-ready a task is, and has a full failure-case library and loop templates. Everything runs in your browser.
Which failure mode has bitten you? Let me know in the comments.
Top comments (2)
The "agent deletes the failing test" case is the perfect example because it is not a model intelligence problem. It is a loop design problem. The agent needs invariants it cannot optimize away just to satisfy the local success metric.
Strong list, and #5 is the one most people skip until it burns them. One push: "require human approval" quietly assumes the reviewer can tell a correct diff from a plausible-looking one, and a single agent optimizes for "looks done," not "is correct."