The Retry Pattern That Breaks Your Agent (And the One That Works)

#ai #agents #llm #engineering

Every agent framework has a retry button. Most of them are wrong.

When an agent fails, the instinct is to retry. Same prompt, same inputs, same tools. Maybe it was a fluke. Maybe the model had a bad moment.

This approach works occasionally. But it breaks systematically.

Why naive retry fails:

Determinism in nondeterminism. LLMs are stochastic, but they are not random. Given the same context, they tend toward similar outputs. If the first attempt failed due to context issues, retrying with the same context often fails again.
Error compounding. Each failed attempt adds to the conversation history. The model now has more context about what went wrong. Sometimes this helps. Often it biases the next attempt toward similar failure modes.
No introspection. Naive retry does not ask why it failed. It just tries again. But the model cannot fix what it does not understand.

The pattern that actually works:

Instead of retry, use reflect-then-retry.

Capture the failure. What was the expected output? What went wrong?
Ask the model to diagnose. Why did this fail? What would need to change?
Modify the approach. Add context, adjust the prompt, or change the tool parameters.
Retry with the new context.

This is not just retry. It is guided retry.

Implementation sketch:

Most frameworks expose the error but not the reasoning. The fix is simple: inject a reflection step.

When a tool call fails:

Do not immediately retry
Ask the model: "The previous action failed with error X. What should we try differently?"
Let the model propose a new approach
Execute that approach

Why this matters:

Agents fail. That is inevitable. What separates reliable agents from fragile ones is not the failure rate, but the recovery rate.

An agent that fails 30 percent of the time but recovers 80 percent of failures is more reliable than an agent that fails 10 percent of the time but never recovers.

The retry pattern is not a detail. It is the architecture.

Build agents that learn from failure, not ones that repeat it.