dengkui yang

Why AI Agents Get Stuck in Loops

Subtitle: An ontology-inspired view of why repeated action is not the same thing as recovery.

Author note: This article is written for AI builders, prompt engineers, automation teams, and founders experimenting with long-running AI agents.


Summary

Most AI agent loops are described as planning bugs, reasoning bugs, or memory bugs.

Sometimes they are.

But many looping failures have a more structural cause:

the agent can keep acting, but it cannot convert world feedback into internal adjustment.

That is why the behavior looks so repetitive. The agent retries the same tool, rewrites the same plan, restates the same assumption in more careful language, and burns more tokens without becoming more correct.

From an ontology-inspired perspective, the problem is not only whether the agent can produce the next action.

The problem is whether the agent knows what kind of thing a failure is:

  • a broken assumption
  • a boundary condition
  • a validation failure
  • a stop signal
  • a reason to escalate

If failure is not represented correctly, the agent will keep moving without really changing.

That is what a loop is.


1. Looping Is Not Persistence

From the outside, looping can look like effort.

The agent keeps going. It tries again. It produces more output. It searches longer. It calls more tools. It explains itself more confidently.

But persistence and looping are not the same thing.

Persistence means the agent remains committed to the goal while changing its internal model after new feedback.

Looping means the agent remains committed to the motion while refusing, or failing, to change its internal model.

That distinction matters.

An agent can be energetic, articulate, and even superficially rational while still being trapped in a dead behavioral cycle.

The hidden question is not:

Can the agent do another step?

The hidden question is:

Did the last step change what the agent now believes about the task?

If the answer is no, repeated action is usually just a prettier failure.


2. The Failure Pattern

Here is a common example.

User:
Pull customer records from the CRM, summarize the churn risks,
and send a short report.

Agent:
Understood. I will retrieve the records and prepare the report.

Step 1:
The agent calls the CRM API.

Feedback:
403 permission denied.

Bad recovery:
The agent retries.
Then rewrites the request and retries.
Then searches internal notes and retries again.
Then says the tool may be unstable and retries again.

At this point, the agent is still active, but nothing essential has changed.

Reality has already provided a strong signal:

This path is blocked by an authorization boundary.

Yet the agent does not turn that signal into a different internal state.

It does not move from:

  • "task in progress"

to:

  • "blocked by permission"
  • "needs escalation"
  • "must choose alternative path"

It simply repeats action without transformation.

Diagram: Retry Loop vs Recovery Path

The important difference is not whether the agent keeps moving.

It is whether the movement contains state transition.
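
To make that contrast concrete, here is a minimal Python sketch of the two paths. The names (call_crm_api, AgentState) are hypothetical, not a real API; the only point is where the 403 goes.

```python
# Minimal sketch: retry loop vs. recovery path.
# call_crm_api and AgentState are hypothetical, for illustration only.

from enum import Enum, auto

class AgentState(Enum):
    IN_PROGRESS = auto()
    BLOCKED_BY_PERMISSION = auto()
    NEEDS_ESCALATION = auto()

def retry_loop(call_crm_api):
    """Motion without transformation: the 403 never changes anything."""
    for _ in range(10):
        response = call_crm_api()
        if response.status == 403:
            continue  # same action, same assumptions, same internal state
        return response

def recovery_path(call_crm_api):
    """The same feedback, converted into a state transition."""
    response = call_crm_api()
    if response.status == 403:
        # Classify the feedback: an authorization boundary, not transient noise.
        # Retrying cannot produce new information, so change state instead.
        return AgentState.BLOCKED_BY_PERMISSION, "ask for access or choose another route"
    return AgentState.IN_PROGRESS, response
```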


3. Where the Loop Actually Forms

Most people look for the loop at the level of output.

I think the loop usually forms one layer deeper.

It forms at the point where feedback should have become self-revision, but did not.

An agent normally needs a chain like this:

  1. Act
  2. Receive feedback
  3. Classify the meaning of the feedback
  4. Update assumptions or boundaries
  5. Choose a different next move

Looping happens when step 3 or step 4 is weak.
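
A rough sketch of that chain in code, with hypothetical classify_feedback, update_model, and choose_next_move helpers rather than any specific framework:

```python
# Sketch of the act -> feedback -> classify -> update -> choose chain.
# The agent's methods are hypothetical placeholders, not a framework API.

def step(agent, action):
    feedback = agent.act(action)                # 1. act, 2. receive feedback

    kind = agent.classify_feedback(feedback)    # 3. what kind of thing is this?
    if kind is None:
        # The weak point: feedback treated as noise, nothing updates,
        # so the next move is just a variation of the last one.
        return action

    agent.update_model(kind, feedback)          # 4. revise assumptions and boundaries
    return agent.choose_next_move(kind)         # 5. continue, narrow, ask, replan, stop, escalate
```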

The agent receives a signal, but the signal never becomes an ontological event inside the agent.

It is treated as noise, friction, or a temporary inconvenience rather than as evidence that the task model itself must change.

Diagram: The Feedback Conversion Gap

That gap is where many agent loops live.

The world speaks.
The agent hears something.
But nothing structural is updated.

So the next action remains a variation of the previous action.


4. A Small Ontology of Recovery

When I use the word ontology here, I do not mean an abstract metaphysical system.

I mean a practical map of what the agent treats as real inside a task.

For recovery, at least six things need to exist inside the agent's model:

  1. Goal: What should be preserved or achieved?
  2. Assumption: What am I currently presuming is true?
  3. Boundary: What am I not allowed, not able, or not ready to do?
  4. Feedback: What did the world just reveal?
  5. State transition: What must change inside me now?
  6. Next move: Continue, narrow, ask, replan, stop, or escalate.

If one of these is missing, loops become much more likely.

For example:

  • if boundary is weak, the agent retries forbidden paths
  • if assumption is weak, the agent never notices what became false
  • if state transition is weak, the agent narrates failure without changing behavior
  • if next-move selection is weak, the agent keeps producing action-shaped noise

The loop is not caused by a lack of words.

It is caused by a lack of structure.
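
One way to make that structure explicit is to hold it as state rather than prose. A minimal sketch; the fields mirror the six elements above and are not a standard schema:

```python
# Sketch: the recovery ontology as explicit agent state.
# Field names follow the six elements above; this is not a standard schema.

from dataclasses import dataclass, field
from enum import Enum, auto

class NextMove(Enum):
    CONTINUE = auto()
    NARROW = auto()
    ASK = auto()
    REPLAN = auto()
    STOP = auto()
    ESCALATE = auto()

@dataclass
class RecoveryState:
    goal: str                                                    # what should be preserved or achieved
    assumptions: dict[str, bool] = field(default_factory=dict)   # what is currently presumed true
    boundaries: list[str] = field(default_factory=list)          # what is not allowed, able, or ready
    last_feedback: str | None = None                             # what the world just revealed
    state: str = "in_progress"                                   # current internal state
    next_move: NextMove = NextMove.CONTINUE                      # the chosen transition

    def register_boundary(self, description: str, new_state: str, move: NextMove) -> None:
        """A feedback event that changes the model, not just the transcript."""
        self.boundaries.append(description)
        self.state = new_state
        self.next_move = move
```

By this definition, an agent that keeps retrying a forbidden path without ever touching boundaries, state, or next_move is looping, however fluent its output.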


5. Why Prompt Patches Often Make Loops Worse

When an agent loops, the reflex is to add more instructions:

  • do not retry too many times
  • think step by step
  • reflect before acting
  • ask for help if blocked
  • verify your answer

These patches can help in narrow cases.

But they often fail because they remain external commands.

A prompt can say:

If something goes wrong, fix it.

But that is not the same as giving the agent a reliable method for answering:

  • What kind of wrong is this?
  • Which assumption failed?
  • Which boundary appeared?
  • Is this a recoverable obstacle or a stop condition?
  • Should I continue, ask, narrow scope, or escalate?

Long prompts often increase behavioral surface area without improving transition quality.

That is why some agents become more verbose in failure rather than more adaptive.

They gain more language for retrying, not more architecture for changing.
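
As a contrast, here is a sketch of structure rather than wording: a failure classifier and a retry budget that enforce a transition instead of a prompt asking for one. The status-code mapping and budget numbers are illustrative assumptions, not a recommendation.

```python
# Sketch: a structural guard instead of a prompt patch.
# The status-code mapping and budget numbers are illustrative assumptions.

RETRY_BUDGET = {"transient": 3, "validation": 1, "permission": 0}

def classify(status_code: int) -> str:
    if status_code in (401, 403):
        return "permission"    # a boundary; retrying adds no information
    if status_code in (408, 429, 500, 502, 503):
        return "transient"     # possibly recoverable by retrying
    if status_code in (400, 422):
        return "validation"    # the request itself has to change
    return "unknown"

def next_move(kind: str, attempts: int) -> str:
    if attempts >= RETRY_BUDGET.get(kind, 0):
        # The transition is enforced by structure, not by prompt wording.
        return "escalate" if kind == "permission" else "replan"
    return "retry"
```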


6. What This Changes in Training

If the main problem is failed internal adjustment, then training should not focus only on successful task completion.

It should also focus on failure transitions.

Instead of asking only whether the agent eventually got the answer, we should ask:

  • Did the agent classify the failure correctly?
  • Did it name the broken assumption?
  • Did it detect a boundary?
  • Did it update its state?
  • Did it choose a meaningfully different next move?
  • Did it know when to stop and escalate?

This changes the role of teacher AI.

The teacher should not only reward good outputs.
It should also interrogate the student's recovery logic.

Diagram: Teacher-Student Recovery Training

The key teaching question becomes:

What changed in the world, and what should therefore change in you?

That is the center of recovery-oriented agent training.
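
In code, a recovery-oriented check on an agent trace might look something like this sketch. The trace fields are hypothetical; the checks mirror the questions above.

```python
# Sketch: score a failure episode on its transition, not only its final answer.
# The trace fields are hypothetical; each check mirrors one question above.

def score_recovery(trace: dict) -> dict:
    return {
        "classified_failure": trace.get("failure_kind") is not None,
        "named_broken_assumption": bool(trace.get("broken_assumptions")),
        "detected_boundary": bool(trace.get("boundaries")),
        "updated_state": trace.get("state_before") != trace.get("state_after"),
        "changed_next_move": trace.get("next_move") != trace.get("previous_move"),
        "escalated_when_blocked": (
            trace.get("failure_kind") != "permission"
            or trace.get("next_move") in ("ask", "escalate")
        ),
    }
```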


7. A Small Example

Here is a compact teacher-student pattern.

Teacher:
The agent called a tool three times and got a 403 each time.
What happened?

Student:
The retrieval failed. It should try again with a clearer request.

Teacher:
That is an action answer, not a recovery answer.
What did the world reveal?

Student:
The current path is blocked by a permission boundary.
Retrying will not produce new information.

Teacher:
Good. What should change internally?

Student:
The agent should update its state from "task in progress"
to "authorization blocked," stop retrying, record the boundary,
and ask for access or choose another route.

The critical move is not the next tool call.

The critical move is the state transition.

That is the difference between motion and learning.


8. Open Question

I suspect a large share of agent loops come from missing self-adjustment architecture rather than missing intelligence in the narrow sense.

If that is right, then "better reasoning" alone may not be the main fix.

We may need agents with a clearer ontology of:

  • goals
  • assumptions
  • boundaries
  • feedback
  • state transitions
  • stop conditions

I would be curious where others disagree.

Are agent loops mainly a memory problem, a search problem, a reward problem, or do they reflect a deeper failure to turn feedback into self-revision?

If you have a looping agent example, I can map it to this framework.
