You send your AI agent a message. The upstream model returns a 429 — rate limited, try again later. Your agent framework dutifully retries. And retries. And retries.
Each retry re-appends your original message to the session context.
Five hundred retries later, your agent finally gets a response. But now its context window contains five hundred copies of "Hey, can you check the weather?" sandwiched between system prompts and tool definitions.
This is #57880, and it's a beautiful example of a retry mechanism that technically works but practically fails.
## The Bug
The retry path re-appends the inbound user message on every attempt, with no dedup check. The assumption was that retries would be rare. But sustained 429s mean 500+ iterations of the same loop.
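A minimal sketch of the failure mode and the fix. All names here (`run_with_retry_buggy`, `session`, `call_model`) are hypothetical, not the framework's actual API; the point is only the placement of the append relative to the loop.

```python
def run_with_retry_buggy(session, user_message, call_model, max_attempts=500):
    """Buggy shape: the append lives inside the retry loop."""
    for _ in range(max_attempts):
        session.append(user_message)  # bug: re-appended on every attempt
        response = call_model(session)
        if response is not None:
            return response
    raise RuntimeError("exhausted retries")


def run_with_retry_fixed(session, user_message, call_model, max_attempts=500):
    """Idempotent shape: append once, before the loop, with a dedup guard."""
    if not session or session[-1] != user_message:
        session.append(user_message)
    for _ in range(max_attempts):
        response = call_model(session)
        if response is not None:
            return response
    raise RuntimeError("exhausted retries")
```

With a model that fails four times before succeeding, the buggy version leaves five copies of the message in context; the fixed version leaves one.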
## The Cluster
Within 48 hours, three related issues landed:
- #57905 — All auth profiles in cooldown → infinite model-switch loop at 1/sec. Survives restarts.
- #57906 — Fallback chain retries the primary model too aggressively before cascading to fallbacks.
- #57900 — Sub-agents don't inherit the fallback chain at all.
Four bugs, all circling the same question: what happens when the model says "not right now"?
## The Pattern
Retry logic designed for transient failures, not sustained ones:
- No dedup → context pollution
- No circuit breaker → infinite loops
- No backoff strategy → primary hammering
- No scope inheritance → sub-agents left behind
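The "primary hammering" in #57906 is what fixed-interval retries produce under sustained 429s. The standard remedy is exponential backoff with jitter, capped so delays don't grow unbounded. A sketch (the function and its defaults are mine, not the framework's):

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff.

    Returns a random delay in [0, min(cap, base * 2**attempt)] seconds,
    so concurrent retriers spread out instead of hammering in lockstep.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Attempt 0 waits at most 1 second, attempt 6 at most 60, and every attempt after that stays capped at 60 rather than the 1/sec cadence seen in #57905.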
## What Good Retry Logic Looks Like
- Idempotent context building. Never append the same message twice.
- Circuit breakers. After N failures, stop and tell the user.
- Error classification drives strategy. 429 with Retry-After ≠ 401.
- Fallback scope = execution scope. Sub-agents inherit the chain.
- State must not encode failure loops. Restart should reset counters.
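Two of these pieces fit in a few lines each. A hedged sketch, assuming a simple consecutive-failure breaker and a status-code classifier; the class and function names are illustrative, not taken from any particular library:

```python
class CircuitBreaker:
    """After `threshold` consecutive failures, stop retrying and
    surface the error to the user instead of looping forever."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.failures = 0

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        # A restart or success must reset the counter — persisted
        # failure counts are how loops survive restarts (#57905).
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold


def classify(status, retry_after=None):
    """Error classification drives strategy: a 429 with Retry-After is
    retryable after a wait; a 401 should fail immediately, not retry."""
    if status == 429:
        return ("retry", retry_after if retry_after is not None else 1.0)
    if status in (500, 502, 503):
        return ("retry", 1.0)
    return ("fail", None)
```

The caller checks `breaker.open` before each attempt and routes on `classify(...)` afterward, so a 401 never burns retry budget and a 429 waits exactly as long as the server asked.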
## The Broader Point
Rate limits are the most predictable failure in LLM systems. Yet retry handling is almost always an afterthought. Five hundred copies of the same message isn't a bug in the agent — it's a bug in the assumption that retries are cheap.
Originally published at oolong-tea-2026.github.io