
UC Jung


Chapter 5. The Structural Limits of AI Agents

Why an Advanced Section?

If you've worked through the basics — installation and tool usage — one important question remains.

"Why does an AI Agent sometimes go off in completely the wrong direction?"

The answer is straightforward. AI Agents carry the same kinds of problems that arise when working with people. The misunderstandings, unilateral decisions, and wasted effort you see with human colleagues appear in AI Agents too. The difference is that AI Agents create these problems faster and at greater scale.

The Advanced Section covers these structural limits and how to keep them under control in real-world work.


0. The 4 Structural Limits of AI Agents

Limit ① — Without explicit instructions, it acts on its own judgment

When instructions are incomplete, an AI Agent interprets and executes based on what it already knows.

This happens with people too. If you tell a new hire "take care of this" without further guidance, they'll handle it their own way. By the time you see the result and say "that's not what I meant," the time is already gone.

With AI Agents, the problem is more severe. A person will stop and ask when they hit something they don't know. An AI Agent fills in the gaps with inference and keeps going to the end. It doesn't distinguish between what it inferred and what it was told — it proceeds as if everything is certain.

[With a human colleague]
  "I'm not sure about this part — how should I handle it?" → asks

[With an AI Agent]
  "This wasn't specified... JWT seems appropriate" → just does it
  (the user has no idea this judgment was even made)

Why this is dangerous: The user can't tell which parts the agent decided on its own. The output looks like everything was done according to instructions, but internally there are arbitrary decisions mixed in.
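One practical mitigation is to force the inferences to the surface: wrap every instruction so the agent must list what it filled in before executing. A minimal sketch — the wording and the function name are my own, not part of any agent SDK:

```python
def with_assumption_report(instruction: str) -> str:
    """Wrap a task instruction so the agent must separate
    inferred decisions from explicit requirements."""
    return (
        f"{instruction}\n\n"
        "Before executing, list every decision this instruction does not\n"
        "specify, mark each one as INFERRED, and wait for confirmation\n"
        "on any INFERRED item that affects architecture."
    )

prompt = with_assumption_report("Build a payment API. Integrate with TossPayments.")
print(prompt)
```

This doesn't stop the agent from inferring — nothing can — but it turns silent judgments into visible ones the user can veto.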


Limit ② — As a conversation grows longer, it prioritizes conversation flow over written guidelines

Even when guidelines are defined in CLAUDE.md, as the conversation accumulates, the agent gives more weight to recent exchanges when deciding what to do next.

This mirrors what happens with people. A team might agree on a set of principles at the start of a project, but those principles gradually fade under the pressure of urgent requests in daily meetings.

An AI Agent's context window is 200K tokens. As the conversation grows, the guidelines document becomes a tiny fraction of the total context, while recent exchanges dominate. The agent centers its next action on the most recent back-and-forth.
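The dilution is easy to quantify. With illustrative token counts (the guideline and per-exchange sizes below are assumptions, not measurements), the guidelines' share of occupied context collapses as exchanges pile up:

```python
CONTEXT_WINDOW = 200_000     # tokens, per the figure above
GUIDELINE_TOKENS = 2_000     # assumed size of CLAUDE.md
TOKENS_PER_EXCHANGE = 1_500  # assumed average user + agent turn

def guideline_share(num_exchanges: int) -> float:
    """Fraction of the occupied context taken up by the guidelines."""
    used = GUIDELINE_TOKENS + num_exchanges * TOKENS_PER_EXCHANGE
    return GUIDELINE_TOKENS / min(used, CONTEXT_WINDOW)

print(f"{guideline_share(5):.0%}")   # → 21%  (early in the session)
print(f"{guideline_share(50):.0%}")  # → 3%   (after 50 exchanges)
```

At 3% of the active context, the guidelines are no longer competing with the conversation — they're background noise.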

[Early in the session]
  CLAUDE.md guideline: "Apply all changes only after approval"
  → follows the guideline well

[Mid-session — after 50 exchanges]
  User: "Fix this too"
  User: "Change that too"
  → a pattern of "execute immediately" takes hold in the conversation flow
  → starts making changes without asking for approval

Why this is dangerous: The user believes the guidelines are still being followed, but the agent may have already effectively abandoned them due to conversational drift.


Limit ③ — When a problem goes unsolved, it falls into a vicious cycle

When a problem resists resolution, the agent ① acts on its own judgment while ② getting buried in the conversation flow, sinking progressively deeper.

The cycle unfolds like this:

[The Vicious Cycle]

  A problem occurs
    ↓
  AI Agent attempts to solve it on its own judgment
    ↓
  Partial fix — addresses symptoms, not the root cause
    ↓
  Derivative problems emerge
    ↓
  Agent attempts to solve those on its own judgment too
    ↓
  Yet more derivative problems...
    ↓
  As the conversation lengthens, the original goal and guidelines are lost
    ↓
  Result: the entire codebase drifts away from the intended design

Human developers can fall into similar traps during debugging: "I fixed that error and broke something else, fixed that and broke something else again..." But a person can reach a point of saying "I think the whole approach here is wrong" and stop. AI Agents are weak at this kind of judgment. They keep digging without stopping, and once they've committed to a direction, they rarely reverse course.

Why this is dangerous: Ten minutes of a vicious cycle can produce code changes that take 30 or more minutes to undo. When the changes span multiple files, even git revert may not cleanly restore the original state.
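One way to break the cycle is a fixed attempt budget: after N failed fixes on the same problem, stop and hand it back to a human instead of digging deeper. A sketch of the idea — the class name and the threshold of 3 are my own choices, not a feature of any agent tool:

```python
class FixAttemptBudget:
    """Stop an automated fix loop before it spirals."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1

    def should_escalate(self) -> bool:
        # Past the budget, the approach itself is suspect:
        # escalate to a human rather than attempt fix number N+1.
        return self.failures >= self.max_attempts

budget = FixAttemptBudget(max_attempts=3)
for _ in range(3):
    budget.record_failure()
print(budget.should_escalate())  # → True
```

The budget plays the role of the human reflex the agent lacks: the moment of saying "I think the whole approach here is wrong."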


Limit ④ — Ambiguous parts are filled in by inference (and the agent doesn't say so)

This is an extension of Limit ①, but important enough to call out separately.

When instructions are ambiguous or incomplete, an AI Agent fills in the gaps through inference — and does not tell the user it has done so.

[User's instruction]
> Build a payment API. Integrate with TossPayments.

[Parts filled in by AI Agent's inference — the user has no idea]
  · Payment state management: custom state machine (vs. relying on Toss webhooks)  ← inferred
  · Error handling: fail after 3 retries (vs. fail immediately)                    ← inferred
  · Partial refunds: not implemented (judged to be out of scope)                   ← inferred
  · Concurrent requests: optimistic locking (vs. pessimistic locking)              ← inferred
  · Logging: console.log (vs. structured logger)                                   ← inferred

The output may appear to work fine. But if even one of those five decisions doesn't match the intended design, the cost of finding and fixing it later is high. Architecture-level decisions in particular — state management strategy, concurrency handling — can require a full structural overhaul to change after the fact.

Why this is dangerous: The moment a user thinks "I gave the instruction, so the result must match it," they move on to the next step without verifying. It's usually three or four steps later that the inferred parts cause a problem.
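A lightweight countermeasure is to keep an explicit log of every decision and whether it was instructed or inferred, so the inferred ones can be reviewed before they surface three steps later. A sketch — this structure is my own illustration, not a feature of Claude or any agent framework:

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    topic: str      # e.g. "payment state management"
    choice: str     # e.g. "custom state machine"
    inferred: bool  # True if the instruction did not cover it

@dataclass
class DecisionLog:
    decisions: list[Decision] = field(default_factory=list)

    def record(self, topic: str, choice: str, inferred: bool) -> None:
        self.decisions.append(Decision(topic, choice, inferred))

    def needs_review(self) -> list[Decision]:
        """Everything the agent filled in on its own judgment."""
        return [d for d in self.decisions if d.inferred]

log = DecisionLog()
log.record("payment state management", "custom state machine", inferred=True)
log.record("payment provider", "TossPayments", inferred=False)
print([d.topic for d in log.needs_review()])  # → ['payment state management']
```

Reviewing `needs_review()` at each checkpoint is exactly the verification step that "I gave the instruction, so the result must match it" skips.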


The Common Root Cause Behind All Four Limits

These limits all stem from a single underlying cause:

AI Agents are optimized to "execute plausibly" rather than to say "I don't know."

A person will admit when they don't know something, and ask for confirmation when uncertain. An AI Agent acts as though it knows things it doesn't, and as though it's confident when it isn't.

Every countermeasure covered in the Advanced Section ultimately converges on one principle:

┌──────────────────────────────────────────────────────────┐
│                                                          │
│   Minimize the areas where the AI Agent decides          │
│   on its own, and have a person explicitly make          │
│   any decision that requires judgment.                   │
│                                                          │
│   → Clear instructions (Advanced Ch. 1)                  │
│   → Preserve outputs as files for verification           │
│     (Advanced Ch. 1)                                     │
│   → Provide structured guidelines as decision            │
│     criteria (Advanced Ch. 2)                            │
│                                                          │
└──────────────────────────────────────────────────────────┘
