6 ways AI agent loops fail — and how to prevent each

Ming Tian — Mon, 22 Jun 2026 12:59:31 +0000

You hand an AI coding agent a task, tell it to "keep going until CI passes," walk away — and come back to a mess. Designing the loop around an agent (not just the prompt) is what keeps that from happening. Here are six ways loops fail in the wild, and the specific guardrail that prevents each.

1. The agent deletes the failing test to make CI green

Classic Goodhart's Law: when "tests pass" becomes the target, deleting the test satisfies it.
Fix: add an explicit boundary ("do not delete or weaken tests"), and require that the same failing test now passes — not that the suite is merely green.
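Here is a minimal Python sketch of that validation. It assumes your test runner can report results as a mapping from test ID to outcome ("passed", "failed", "skipped"). The function name and data shape are hypothetical and don't belong to any particular tool.

```python
def validate_fix(before: dict[str, str], after: dict[str, str], target: str) -> list[str]:
    """Return a list of problems; an empty list means the fix is acceptable.

    before/after map test IDs to outcomes (hypothetical shape: "passed",
    "failed", "skipped"); target is the test that was failing originally.
    """
    problems = []
    # The originally failing test must still exist...
    if target not in after:
        problems.append(f"target test {target!r} is missing")
    # ...and must now pass. A merely green suite is not enough.
    elif after[target] != "passed":
        problems.append(f"target test {target!r} is {after[target]}")
    # No test that existed before may disappear.
    removed = sorted(set(before) - set(after))
    if removed:
        problems.append(f"tests removed: {removed}")
    # A previously passing test must not be quietly turned into a skip.
    skipped = sorted(
        test for test, outcome in before.items()
        if outcome == "passed" and after.get(test) == "skipped"
    )
    if skipped:
        problems.append(f"tests newly skipped: {skipped}")
    return problems
```

Skip detection only catches one kind of weakening. Loosened assertions inside a test still need a look at the test-file diff.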

2. The agent edits unrelated files

With no scope boundary, the agent "improves" code that was never part of the task, which makes the change riskier and harder to review.
Fix: add the boundary "do not modify unrelated files," and scope validation to the specific behavior you asked it to change rather than a global build.
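The file boundary can also be checked mechanically after each iteration. This is a sketch that assumes the work happens in a git checkout and that you list the allowed paths per task as glob patterns. The helper names and the patterns in the usage note are hypothetical.

```python
import subprocess
from fnmatch import fnmatch


def changed_files(base: str = "main") -> list[str]:
    """Paths that differ between the working tree and `base` (tracked files only)."""
    # `git diff --name-only <base>` prints one changed path per line.
    # Brand-new untracked files don't appear; `git status --porcelain` would list them.
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def out_of_scope(changed: list[str], allowed_patterns: list[str]) -> list[str]:
    """Changed paths that match none of the allowed glob patterns."""
    # fnmatch's "*" also matches "/", so "src/parser/*" covers subdirectories too.
    return [
        path for path in changed
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]
```

For a parser fix you might run `out_of_scope(changed_files(), ["src/parser/*", "tests/test_parser.py"])` and fail the iteration if the result is non-empty.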

3. The loop burns tokens with no progress

No budget cap, no stall detection — it iterates for hours going nowhere.
Fix: set a budget cap and a max-iteration limit up front, plus a stall threshold that stops after N iterations with no measurable progress.
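One way to enforce all three limits is in the harness that drives the agent, rather than trusting the prompt alone. This sketch assumes each agent iteration can be wrapped in a `step()` callable that returns the tokens it used and the number of failing tests, which serves as the "measurable progress" signal. Both the callable and the default limits are assumptions, not recommendations from any tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopLimits:
    max_iterations: int = 20
    token_budget: int = 500_000
    stall_threshold: int = 3  # consecutive iterations without progress


def run_loop(step: Callable[[], tuple[int, int]], limits: LoopLimits) -> str:
    """Run step() until success or a limit trips.

    step() is a hypothetical wrapper around one agent iteration that
    returns (tokens_used, failing_tests).
    """
    tokens_spent = 0
    best = None  # fewest failing tests seen so far
    stalled = 0
    for i in range(1, limits.max_iterations + 1):
        tokens, failing = step()
        tokens_spent += tokens
        if failing == 0:
            return f"done after {i} iterations"
        if tokens_spent >= limits.token_budget:
            return f"stopped: budget cap hit ({tokens_spent} tokens)"
        if best is None or failing < best:
            best, stalled = failing, 0  # real progress resets the stall counter
        else:
            stalled += 1
            if stalled >= limits.stall_threshold:
                return f"stopped: no progress for {stalled} iterations"
    return f"stopped: max iterations ({limits.max_iterations}) reached"
```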

4. The agent retries the same failing command

It hits the same error and loops forever.
Fix: add a stop rule for failure, too ("stop after N failed attempts"), plus a fallback that summarizes the blocker and escalates to a human.
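Here is a sketch of that stop rule plus fallback. It assumes the harness sees each failed command and its error output. Treating the last line of the error as its signature is a heuristic of mine, and the class name is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FailureGuard:
    """Halt after N failed attempts at the same command with the same error."""
    max_failed_attempts: int = 3
    failures: Counter = field(default_factory=Counter)

    def record(self, command: str, error: str) -> Optional[str]:
        """Record one failed attempt; return an escalation summary once the limit is hit."""
        # Heuristic (assumption): the last non-empty line is the error's signature.
        # For a Python traceback, that line holds the exception message.
        lines = [ln for ln in error.strip().splitlines() if ln.strip()]
        signature = lines[-1] if lines else "<no error output>"
        key = (command, signature)
        self.failures[key] += 1
        if self.failures[key] >= self.max_failed_attempts:
            # Fallback: stop retrying and hand a short blocker summary to a human.
            return (
                f"BLOCKED after {self.failures[key]} failed attempts\n"
                f"Command: {command}\n"
                f"Error: {signature}\n"
                "Next step: human review needed; the agent has stopped retrying."
            )
        return None
```

In the loop, a non-`None` return means: break out, post the summary wherever your team will see it, and don't start another iteration.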

5. The agent merges a broken PR

Auto-merging on green checks means a single flaky pass can ship a broken change.
Fix: require human approval before the merge itself — the highest-risk, least-reversible step. Drive the PR to mergeable, then stop.
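A minimal sketch of an approval gate in the harness follows. The action names are hypothetical, so map them to whatever your agent setup can actually do. The default is to stop, which leaves the PR open and mergeable for a human.

```python
from typing import Callable

# Hypothetical action names; adapt to what your harness exposes.
IRREVERSIBLE_ACTIONS = {"merge_pr", "deploy_production", "force_push"}


def gated_execute(
    action: str,
    run: Callable[[str], None],
    approved_by_human: Callable[[str], bool],
) -> str:
    """Run reversible actions directly; stop at irreversible ones unless a human approves."""
    if action in IRREVERSIBLE_ACTIONS and not approved_by_human(action):
        return f"stopped before {action}: waiting for human approval"
    run(action)
    return f"ran {action}"


def ask_on_terminal(action: str) -> bool:
    # Simplest possible approval channel: an explicit "yes" typed at the terminal.
    answer = input(f"Agent wants to {action}. Type 'yes' to allow: ")
    return answer.strip().lower() == "yes"
```

Pushing a branch or updating the PR description goes straight through, while `merge_pr` waits for `ask_on_terminal` (or a chat approval, a review requirement, and so on) to return `True`.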

6. The agent follows stale memory

An outdated AGENTS.md sends it down a path that no longer matches the codebase.
Fix: keep AGENTS.md short and current, and verify commands/structure against the real repo before trusting it.
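Part of that verification can be scripted. This sketch assumes AGENTS.md wraps paths and commands in backticks. It treats anything with a space, or without a "/" or ".", as a command whose executable should be on PATH, and everything else as a path relative to the repo root. That classification is a rough heuristic, not a standard.

```python
import re
import shutil
from pathlib import Path

INLINE_CODE = re.compile(r"`([^`\n]+)`")


def stale_references(agents_md: str, repo_root: Path) -> list[str]:
    """List backticked references in AGENTS.md that don't match the real repo."""
    stale = []
    for ref in INLINE_CODE.findall(agents_md):
        ref = ref.strip()
        if not ref:
            continue
        if " " in ref or ("/" not in ref and "." not in ref):
            # Heuristic: treat it as a command and check its executable is on PATH.
            exe = ref.split()[0]
            if shutil.which(exe) is None:
                stale.append(f"command not found: {ref}")
        elif not (repo_root / ref).exists():
            # Otherwise treat it as a path relative to the repo root.
            stale.append(f"path not found: {ref}")
    return stale
```

A typical call is `stale_references(Path("AGENTS.md").read_text(), Path("."))` before a loop starts. Any hits mean the file needs updating before the agent trusts it.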

The pattern

A safe loop needs a machine-checkable validation signal, explicit boundaries, a hard stop rule, a budget cap, and a human-approval gate before anything irreversible.
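Those five pieces can be written down as a spec and checked before a loop starts. The sketch below is an independent illustration, not how the Loop Engineering toolkit scores tasks or formats its /goal prompts. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LoopSpec:
    goal: str
    validation_command: str = ""  # machine-checkable signal, e.g. one test's ID
    boundaries: list[str] = field(default_factory=list)
    max_iterations: int = 0       # hard stop rule; 0 means "not set"
    max_failed_attempts: int = 0
    token_budget: int = 0         # budget cap, enforced by the harness; 0 = not set
    approval_required_for: list[str] = field(default_factory=list)


def missing_guardrails(spec: LoopSpec) -> list[str]:
    """Name each of the five guardrails the spec leaves unset."""
    checks = [
        ("validation signal", bool(spec.validation_command)),
        ("boundaries", bool(spec.boundaries)),
        ("stop rule", spec.max_iterations > 0 and spec.max_failed_attempts > 0),
        ("budget cap", spec.token_budget > 0),
        ("approval gate", bool(spec.approval_required_for)),
    ]
    return [name for name, ok in checks if not ok]


def render_prompt(spec: LoopSpec) -> str:
    """Turn a complete spec into plain-text instructions for the agent."""
    missing = missing_guardrails(spec)
    if missing:
        raise ValueError(f"not loop-ready, missing: {', '.join(missing)}")
    lines = [
        f"Goal: {spec.goal}",
        f"Done when this command passes: {spec.validation_command}",
        *(f"Boundary: {b}" for b in spec.boundaries),
        f"Stop after {spec.max_iterations} iterations, or after "
        f"{spec.max_failed_attempts} failed attempts at the same step.",
        "Stop and ask a human before: " + ", ".join(spec.approval_required_for),
        "If you stop without finishing, summarize the blocker.",
    ]
    return "\n".join(lines)
```

The token budget is deliberately left out of the rendered prompt because the agent can't reliably count its own spend. The harness enforces it instead.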

I put all of this into a free, no-signup toolkit — Loop Engineering. It generates /goal prompts for Claude Code and Codex, estimates token cost, scores how loop-ready a task is, and has a full failure-case library and loop templates. Everything runs in your browser.

Which failure mode has bitten you? Let me know in the comments.

What I Learned Setting Up OpenClaw (and the Onboarding Path I Wish I Had)

Ming Tian — Sat, 21 Mar 2026 14:19:27 +0000

Why I'm Writing This

I spent about two weeks getting comfortable with OpenClaw, an open-source AI agent runtime. It's not a chatbot or a coding assistant; it's a platform where you connect real communication channels, install skills, and build actual workflows.

The tool itself is impressive, but my getting-started experience was rough, and that was almost entirely my own fault.

What Went Wrong

The official documentation is thorough. Maybe too thorough for a first-timer. When you open it, you see sections on models, channels, skills, permissions, cloud deployment, multi-agent architectures, and more. There's no clear "start here" arrow.

So I did what seemed logical and tried to set up everything at once: models, channels, skills, and permissions, all in one session.

When the agent stopped responding, I had no idea which layer had failed. Was it the model connection? The channel config? A skill permission issue? I spent hours debugging what should have been a 30-minute setup.

The Path That Actually Worked

After resetting and starting over, I found an order that works much better:

Step 1: Local Install Only

Ignore cloud deployment and install locally. On macOS it's a single Homebrew command. The goal is the shortest possible path to "this thing runs."

Step 2: One Real Channel

Pick a channel you actually use daily. I chose Feishu (the Lark equivalent in China) because my team already lives there. The key metric is simple: can you send a message and get a meaningful response? If yes, the core loop works.

Step 3: Minimal Skills

I installed exactly four skills:

Web search (real-time information access)
Page reader (parse web content)
File handler (read/write documents)
Message sender (proactive notifications)

That's it: four skills. In my experience they cover about 80% of basic agent needs, and, more importantly, they're easy to debug when something goes wrong.

Step 4: Security Basics

Three questions, ten minutes:

Can I trace where each skill came from?
Are permissions set to the minimum needed?
Is each installed skill still actively maintained?
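If you keep a small inventory of installed skills, the three questions can be run as a quick audit. OpenClaw may not expose these fields directly. This sketch assumes you fill them in by hand from each skill's repository and permission settings. The field names and the 180-day maintenance window are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SkillInfo:
    name: str
    source: str             # repo or registry URL; empty if you can't trace it
    permissions: set[str]   # what the skill is currently granted
    needed: set[str]        # what your workflow actually needs it to do
    last_updated: date      # date of the skill's most recent release or commit


def audit(skill: SkillInfo, today: date, max_age_days: int = 180) -> list[str]:
    """Answer the three security questions for one skill; empty means it passes."""
    findings = []
    if not skill.source:
        findings.append("source cannot be traced")
    extra = sorted(skill.permissions - skill.needed)
    if extra:
        findings.append(f"broader than needed: {extra}")
    age = (today - skill.last_updated).days
    if age > max_age_days:
        findings.append(f"no update in {age} days")
    return findings
```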

After the Basics

Once this foundation was solid, I started experimenting with more interesting setups:

Daily content pipeline: search, documents, spreadsheets, and podcast audio, all automated on a daily cron job. I wake up to a ready-to-publish content package.

Multi-agent routing: Three bots handling different domains (coordination, content, operations) with automatic task routing. Different tasks no longer pile into the same conversation.
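To make "automatic task routing" concrete, here is a deliberately simple keyword router across three bots. It is not OpenClaw's routing mechanism. The bot names match the domains above, but the keyword lists and the fallback to the coordination bot are my assumptions.

```python
import re

# Hypothetical keyword lists per domain bot; anything unmatched goes to coordination.
ROUTES = {
    "content": {"draft", "article", "podcast", "post", "script"},
    "operations": {"deploy", "server", "cron", "backup", "invoice"},
}
DEFAULT_BOT = "coordination"


def route(message: str) -> str:
    """Send a message to the bot whose keywords it mentions most."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {bot: len(words & keywords) for bot, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)  # ties go to the first bot listed
    return best if scores[best] > 0 else DEFAULT_BOT
```

In practice an LLM classifier usually replaces the keyword lists, but the shape stays the same: one routing decision up front, so each conversation only sees its own kind of task.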

Knowledge base: Document ingestion, QA testing, gap identification, and manual backfill in a maintainable loop.
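The QA-testing and gap-identification part of that loop can be sketched in a few lines. Naive word-overlap retrieval stands in here for whatever retrieval the real knowledge base uses. The question set, the expected-answer check, and the overlap threshold are all hypothetical.

```python
import re
from typing import Optional


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], min_overlap: int = 2) -> Optional[str]:
    """Naive retrieval: the doc sharing the most words with the question."""
    q = tokenize(question)
    score, best = max(((len(q & tokenize(d)), d) for d in docs), default=(0, ""))
    return best if score >= min_overlap else None


def find_gaps(qa_pairs: list[tuple[str, str]], docs: list[str]) -> list[str]:
    """Questions whose expected answer the knowledge base can't currently surface."""
    gaps = []
    for question, expected in qa_pairs:
        hit = retrieve(question, docs)
        if hit is None or expected.lower() not in hit.lower():
            gaps.append(question)
    return gaps
```

Each returned question becomes a manual backfill task. After adding the missing documents, rerunning `find_gaps` with the same questions shows whether the gap actually closed, and that rerun is what keeps the loop maintainable.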

The Guide

I turned this path into a small reference site: clawpath.dev/en

It's not a documentation mirror; it's a decision tree that answers "What should I do first? What should I do next?" and links to the right official docs when you need depth.

Still adding content. If you've used OpenClaw (or are thinking about it), I'd genuinely love to know where you got stuck. The whole point of this project is to smooth out the parts that trip people up.

DEV Community: Ming Tian