During a live demo last week, our sales‑assistant agent kept asking the same clarification question for 4 minutes, exhausting the API quota and costing the client $2,180 in extra usage.
1️⃣ Unbounded Recursion in Prompt Templates
Why template self‑reference spirals
Prompt engineering feels like stitching together a puzzle, but a single placeholder that feeds its own output back into the next request is a classic trap. When the template contains something like {{conversation_history}} and you concatenate the entire previous prompt each turn, the prompt can only grow: every request carries all of the previous one plus whatever the template adds on top. The model never sees a clean “stop” condition; it just keeps expanding the context until the provider‑imposed max‑tokens ceiling forces a truncation or a timeout.
In our own travel‑booking agent, the bug showed up as a 12‑token bump per iteration. After ten cycles the prompt had grown by 120 tokens, enough to push an already long request over the 4K‑token limit and trigger a 429 rate‑limit error that the orchestrator interpreted as “need more detail.” The loop kept feeding the same question back to the user, and the UI froze.
Detecting the pattern with token counters
The first line of defense is a cheap runtime token counter. Every time you render a prompt, record its token count in a gauge and compare it against a static ceiling (e.g., 3,500 tokens). If the count grows by more than a small threshold (≈ 30 tokens) between two consecutive calls, raise a warning. Most teams ignore this metric because it lives outside the model’s response payload, but once you surface it on a Prometheus dashboard the growth becomes impossible to miss.
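A minimal sketch of that counter, under a few assumptions. Token counts use a whitespace split as a stand-in for the provider's tokenizer, so absolute numbers will differ. `PromptGrowthMonitor` is a hypothetical helper that uses the 3,500-token ceiling and ≈30-token delta from the paragraph. Because a slow leak like the 12-token-per-turn bug stays under a 30-token delta, the sketch also flags prompts that grow on several consecutive turns. That extra rule is an added assumption, not part of the original setup. In production the count would be exported as a Prometheus gauge rather than kept in a list of warnings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def count_tokens(prompt: str) -> int:
    # Stand-in tokenizer (whitespace split); a real BPE tokenizer will count differently.
    return len(prompt.split())


@dataclass
class PromptGrowthMonitor:
    """Hypothetical helper that flags runaway prompt growth between renders."""
    ceiling: int = 3500          # static per-prompt token ceiling
    max_delta: int = 30          # allowed growth between two consecutive renders
    max_growth_streak: int = 5   # assumption: catch slow leaks that stay under max_delta
    last_count: Optional[int] = None
    growth_streak: int = 0
    warnings: List[str] = field(default_factory=list)

    def observe(self, prompt: str) -> int:
        """Record one rendered prompt; in production, export `count` as a gauge."""
        count = count_tokens(prompt)
        if count > self.ceiling:
            self.warnings.append(f"prompt has {count} tokens (ceiling {self.ceiling})")
        if self.last_count is not None:
            delta = count - self.last_count
            if delta > self.max_delta:
                self.warnings.append(f"prompt grew by {delta} tokens in one turn")
            # Count consecutive turns with any growth; reset as soon as a turn doesn't grow.
            self.growth_streak = self.growth_streak + 1 if delta > 0 else 0
            if self.growth_streak >= self.max_growth_streak:
                self.warnings.append(f"prompt grew on {self.growth_streak} consecutive turns")
        self.last_count = count
        return count


# Simulate a slow leak: 12 extra tokens per turn, as in the travel-booking bug.
monitor = PromptGrowthMonitor()
prompt = "You are a travel-booking assistant."
for _ in range(6):
    monitor.observe(prompt)
    prompt += " clarify" * 12
```

The per-turn delta never trips here (12 < 30), which is why a pure delta threshold can miss this class of bug while the streak rule catches it on the sixth render.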
Data point: 38% of all timeout alerts trace back to a single recursive placeholder in the prompt string.
Real‑world example
A travel‑booking agent re‑injects the full user query into the next prompt, causing the token count to grow by 12 tokens per iteration until the model hits the max‑token limit. The fix was to isolate the user query in a separate variable and rebuild each prompt from that variable plus the current system instructions, instead of appending to the previous prompt, cutting the per‑iteration growth to zero.
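A before-and-after sketch of that fix. `render_leaky` mimics the bug by feeding the previous prompt back in as history. `render_isolated` rebuilds the prompt every turn from the isolated query plus the current instructions. The function names and template text are hypothetical, and token counts use a whitespace split as a rough stand-in for a real tokenizer.

```python
SYSTEM = "You are a travel-booking assistant. Ask one clarifying question at most."


def render_leaky(previous_prompt: str, user_query: str) -> str:
    # Bug: the whole previous prompt (which already contains the query) is re-injected.
    return f"{SYSTEM}\nHistory: {previous_prompt}\nUser: {user_query}"


def render_isolated(user_query: str, instructions: str) -> str:
    # Fix: the query lives in its own variable and is rendered exactly once per turn.
    return f"{SYSTEM}\n{instructions}\nUser: {user_query}"


def tokens(text: str) -> int:
    return len(text.split())  # rough stand-in for a real tokenizer


query = "Book a flight from Lisbon to Berlin on Friday"
leaky = ""
leaky_sizes, isolated_sizes = [], []
for turn in range(5):
    leaky = render_leaky(leaky, query)
    leaky_sizes.append(tokens(leaky))
    isolated_sizes.append(tokens(render_isolated(query, "Confirm dates before booking.")))
```

The leaky version grows by the full template overhead every turn, while the isolated version stays flat no matter how many turns the conversation runs.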
2️⃣ Missing Termination Conditions in State Machines
Finite‑state vs. event‑driven loops
Most orchestration layers are built on finite‑state machines (FSMs) because they give you a clear picture of where an execution can go. However, developers often assume that every external call will eventually produce a “success” event. When a downstream service returns an unexpected status code, the FSM can silently re‑enter the same state, creating an invisible loop.
Our order‑fulfillment workflow waited for a payment confirmation. The payment gateway, during a brief outage, returned HTTP 202 Accepted (meaning “processing”) instead of the expected 200 OK. The state machine was coded to transition on 200 only, so it fell back to the same “await‑payment” node, which immediately re‑issued the request. The loop added roughly 2 seconds per cycle.
Instrumentation with Prometheus counters
Add a counter per state transition: state_transitions_total{from="await_payment",to="await_payment"}. If a particular edge spikes above a baseline (e.g., > 0.1 Hz), fire an alert.
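A stdlib-only sketch of that alert rule, so the logic can be tested in-process. `TransitionMonitor` is hypothetical. It counts transitions per (from, to) edge the way `state_transitions_total` would, and it flags a self-loop whose rate over a sliding window exceeds the 0.1 Hz baseline. In production the counting would typically be a labelled `prometheus_client.Counter`, and the rate check would be a PromQL alert over `rate(...)` instead of Python.

```python
import time
from collections import Counter, deque
from typing import Callable, Deque, Dict, Tuple


class TransitionMonitor:
    """Counts FSM edges and flags self-loops firing faster than `max_rate_hz` (hypothetical)."""

    def __init__(self, window_sec: float = 60.0, max_rate_hz: float = 0.1,
                 clock: Callable[[], float] = time.monotonic):
        self.window_sec = window_sec
        self.max_rate_hz = max_rate_hz
        self.clock = clock
        self.totals: Counter = Counter()  # mirrors state_transitions_total{from, to}
        self._recent: Dict[Tuple[str, str], Deque[float]] = {}

    def record(self, src: str, dst: str) -> bool:
        """Record one transition; return True if this edge is a self-loop above the alert rate."""
        edge = (src, dst)
        self.totals[edge] += 1
        now = self.clock()
        stamps = self._recent.setdefault(edge, deque())
        stamps.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and now - stamps[0] > self.window_sec:
            stamps.popleft()
        rate_hz = len(stamps) / self.window_sec
        return src == dst and rate_hz > self.max_rate_hz


# Replay the 202 self-loop: one re-entry into await_payment roughly every 2 s.
fake_now = [0.0]
monitor = TransitionMonitor(clock=lambda: fake_now[0])
alerts = []
for _ in range(10):
    fake_now[0] += 2.0
    alerts.append(monitor.record("await_payment", "await_payment"))
```

Scoping the alert to `src == dst` keeps legitimate high-traffic edges (say, `await_payment` to `shipped` during a sale) from paging anyone.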
Data point: The monitoring dashboard showed 187 ms average latency per transition, but a hidden self‑loop added 2 s per cycle, inflating total runtime by 13×.
Real‑world example
An order‑fulfillment workflow never exits the “await‑payment” state when the payment service returns a 202 Accepted instead of 200 OK. The remedy was to treat any 2xx as a terminal success or to add an explicit timeout that forces a transition to a “payment‑failed” fallback after 30 seconds.
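A sketch of the second remedy, the explicit timeout, with hypothetical names: `PaymentState`, `await_payment`, and the `check_payment` callable stand in for the real workflow and gateway client. The sketch keeps 202 as a non-terminal "still processing" signal but bounds how long the workflow may sit in `await_payment`, so a gateway stuck on 202 ends in `payment_failed` after 30 seconds. The first remedy (treat any 2xx as success) would amount to changing the `status == 200` check to a 2xx range check.

```python
import time
from enum import Enum
from typing import Callable


class PaymentState(Enum):
    AWAIT_PAYMENT = "await_payment"
    PAID = "paid"
    PAYMENT_FAILED = "payment_failed"


def await_payment(check_payment: Callable[[], int], timeout_sec: float = 30.0,
                  poll_interval_sec: float = 2.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> PaymentState:
    """Poll the gateway until it confirms, rejects, or the deadline passes."""
    deadline = clock() + timeout_sec
    while True:
        status = check_payment()
        if status == 200:
            return PaymentState.PAID
        if status != 202:
            # Any other status is treated as a failure here, for simplicity.
            return PaymentState.PAYMENT_FAILED
        # 202 Accepted: still processing. Poll again only if the deadline allows it.
        if clock() + poll_interval_sec > deadline:
            return PaymentState.PAYMENT_FAILED
        sleep(poll_interval_sec)
```

Injecting `clock` and `sleep` keeps the deadline logic testable without actually waiting 30 seconds.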
3️⃣ Over‑eager Retry Policies on Transient Errors
Exponential backoff misconfiguration
Retry logic is supposed to smooth out flaky endpoints, but when the backoff curve is too aggressive you end up hammering the same service until the orchestrator’s own timeout fires. A common mistake is to set the initial delay to 100 ms and the multiplier to 1.5 but forget to cap the number of attempts (and the maximum delay). On a 429 Too Many Requests response the agent will spin: each retry spawns another request that again receives 429, and the cycle never breaks.
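A sketch of a bounded schedule using the paragraph's numbers (100 ms initial delay, 1.5× multiplier), with the two caps that usually get forgotten: a maximum delay and a maximum attempt count. `backoff_delays` is a hypothetical helper; real retry libraries expose equivalent knobs under their own names. Jitter is left out to keep the output deterministic, though production retries normally randomize each delay to avoid synchronized bursts.

```python
from typing import List


def backoff_delays(initial: float = 0.1, multiplier: float = 1.5,
                   max_delay: float = 2.0, max_attempts: int = 5) -> List[float]:
    """Seconds to sleep before each retry; returns max_attempts - 1 delays."""
    delays = []
    delay = initial
    for _ in range(max_attempts - 1):  # the first attempt happens immediately
        delays.append(round(min(delay, max_delay), 4))
        delay *= multiplier
    return delays
```

With both caps in place the total time one request can spend retrying is bounded and known up front, which is what lets the orchestrator's own timeout be set sensibly.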
Circuit‑breaker as a safety net
Implement a circuit‑breaker that trips after N consecutive failures (e.g., 5) and stays open for a configurable cool‑down period. The breaker should surface a distinct error code that the FSM can handle, rather than silently looping back into the same step.
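A minimal, single-threaded sketch of such a breaker; the class names are hypothetical. `CircuitOpenError` is the distinct error the FSM can route on. After the cool-down the breaker lets one trial call through (the usual "half-open" state) and re-trips on the first failure. A version shared across threads or asyncio tasks would also need locking.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Distinct error the FSM can route to a fallback state instead of retrying."""


class CircuitBreaker:
    """Trips after `failure_threshold` consecutive failures; stays open for `cooldown_sec`."""

    def __init__(self, failure_threshold: int = 5, cooldown_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_sec = cooldown_sec
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_sec:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: half-open, so a single further failure re-trips the breaker.
            self.opened_at = None
            self.consecutive_failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.consecutive_failures = 0  # any success closes the circuit fully
        return result
```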
Data point: $4,200/mo extra spend was incurred when the agent retried a flaky vector‑search endpoint 15 times per request.
Real‑world example
A recommendation agent treats every 429 response as retryable, leading to a cascade where the same query loops through the cache layer indefinitely. The fix was to add a Retry-After header parser, cap retries at three, and route the request to a fallback index when the circuit is open.
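A sketch of those three changes, under stated assumptions. The client is assumed to raise a hypothetical `RateLimited` exception carrying the raw `Retry-After` header. `primary` and `fallback` are placeholder callables for the main and fallback indexes. `parse_retry_after` accepts both forms the header can take: a delay in seconds or an HTTP date. For brevity, exhausting the three retries is what routes the query to the fallback index here. In the fix described above that routing was driven by the circuit breaker, so treat this as a simplified stand-in.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, List, Optional


class RateLimited(Exception):
    """Hypothetical exception a client raises on HTTP 429, carrying the raw Retry-After value."""

    def __init__(self, retry_after: Optional[str] = None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait per Retry-After (delay-seconds or HTTP-date); None if absent or invalid."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # dates with "-0000" parse as naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def search_with_fallback(primary: Callable[[str], List[str]],
                         fallback: Callable[[str], List[str]], query: str,
                         max_retries: int = 3, default_delay: float = 1.0,
                         max_delay: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    for attempt in range(max_retries + 1):  # one initial call plus up to max_retries retries
        try:
            return primary(query)
        except RateLimited as exc:
            if attempt == max_retries:
                break
            delay = parse_retry_after(exc.retry_after)
            # Honor the server's hint, but never sleep longer than max_delay.
            sleep(min(delay if delay is not None else default_delay, max_delay))
    return fallback(query)  # retry budget exhausted: serve from the fallback index
```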
4️⃣ Shared Memory Leaks Between Agents
Global context pollution
When multiple agents read and write to the same key‑value store without namespacing, they effectively share a global brain. A stale “last_intent” entry written by a marketing bot can be consumed by a support bot minutes later, causing the latter to think the user is still in a previous conversation branch. The result is a phantom loop that only surfaces under load because the shared cache fills up with obsolete blobs.
Scoped context patterns
The antidote is to enforce strict scoping: prepend a unique session ID or agent identifier to every cache key (session:{session_id}:last_intent). Additionally, set TTLs that match the expected conversation length (usually 5–10 minutes). Periodic cache eviction jobs prevent the store from ballooning.
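A stdlib-only sketch of the scoped-key pattern. `ScopedContextStore` and `context_key` are hypothetical and stand in for the shared Redis instance. Keys follow the `session:{session_id}:last_intent` shape from the text, prefixed with an agent identifier. Every write carries a TTL: 600 seconds here, inside the 5 to 10 minute range. With redis-py the write would be `r.set(key, value, ex=600)` and Redis would expire the key itself. `evict_expired` plays the role of the periodic eviction job.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def context_key(agent_id: str, session_id: str, field: str) -> str:
    """Namespaced key, e.g. 'support-bot:session:abc123:last_intent'."""
    return f"{agent_id}:session:{session_id}:{field}"


class ScopedContextStore:
    def __init__(self, ttl_sec: float = 600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_sec = ttl_sec
        self.clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set(self, agent_id: str, session_id: str, field: str, value: str) -> None:
        key = context_key(agent_id, session_id, field)
        self._data[key] = (value, self.clock() + self.ttl_sec)

    def get(self, agent_id: str, session_id: str, field: str) -> Optional[str]:
        key = context_key(agent_id, session_id, field)
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value

    def evict_expired(self) -> int:
        """What a periodic eviction job would call; returns the number of keys removed."""
        now = self.clock()
        expired = [k for k, (_, exp) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)
```

Because the agent identifier is part of the key, a support bot reading `last_intent` for the same session simply misses instead of inheriting the marketing bot's stale value.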
Data point: 12 deployments suffered from cross‑talk bugs after a shared Redis cache grew to 1.3 GB of stale session blobs.
Real‑world example
Two independent chatbots read/write the same “last_intent” key, causing one bot to re‑trigger the other's fallback loop. After moving to a namespaced key schema and adding a 600‑second TTL, the issue vanished. The pattern is now documented in our internal best‑practice guide, which we reference whenever we spin up a new agent.
5️⃣ Fixes: Guardrails, Timeouts, and Idempotent Design
Static analysis of prompts
Run a linter that flags any placeholder appearing more than once in a prompt template, or any placeholder that references the entire prior prompt. Tools like promptlint can be integrated into the CI pipeline; a failing lint job stops the merge before the buggy prompt hits production.
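The rule is simple enough to sketch without a dedicated tool. This is a hypothetical stand-in, not promptlint itself. It extracts `{{placeholder}}` names with a regex and flags any name used more than once. It also flags names on a denylist of variables assumed to carry the whole prior prompt (`conversation_history`, `previous_prompt`; adjust to your own naming). The non-zero return code from `main` is what lets a CI step block the merge. The script name and paths in the trailing comment are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, List

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
# Assumed names for placeholders that carry the full previous prompt; adjust to your templates.
FULL_HISTORY_NAMES = frozenset({"conversation_history", "previous_prompt"})


def lint_template(template: str,
                  full_history_names: Iterable[str] = FULL_HISTORY_NAMES) -> List[str]:
    """Return human-readable problems; an empty list means the template passes."""
    names = PLACEHOLDER.findall(template)
    problems = [f"placeholder '{name}' appears {count} times"
                for name, count in Counter(names).items() if count > 1]
    problems += [f"placeholder '{name}' re-injects the prior prompt"
                 for name in sorted(set(names) & set(full_history_names))]
    return problems


def main(paths: List[str]) -> int:
    """Lint every template file and return a CI exit code (1 if anything was flagged)."""
    failures = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for problem in lint_template(fh.read()):
                print(f"{path}: {problem}")
                failures += 1
    return 1 if failures else 0

# Hypothetical CI step: `python lint_prompts.py prompts/*.txt`,
# with `sys.exit(main(sys.argv[1:]))` as the script's entry point.
```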
Runtime watchdogs
A watchdog is a lightweight async wrapper that monitors elapsed time and iteration count for each step. If the step exceeds max_iterations or timeout_sec, the wrapper aborts and reports a LoopAbort metric.
Idempotent action contracts
Every external call should be idempotent or explicitly marked as non‑idempotent. When an action is retried, the downstream service must either safely ignore duplicates or return a deterministic “already‑done” response. This eliminates the need for the orchestrator to guess whether a retry succeeded, which is a frequent source of hidden loops.
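A sketch of the "deterministic already-done response" on the receiving side. It assumes the caller sends a stable idempotency key with each action, commonly derived from the request or workflow step ID. `IdempotentActionHandler` and `ActionResult` are hypothetical. Results are kept in memory and the handler is not concurrency-safe; a real service would persist results (with a TTL) so duplicates arriving after a restart are still recognized.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ActionResult(Generic[T]):
    value: T
    already_done: bool  # True when this call was a duplicate of a completed action


class IdempotentActionHandler(Generic[T]):
    def __init__(self, action: Callable[[str], T]):
        self._action = action
        self._completed: Dict[str, T] = {}

    def handle(self, idempotency_key: str, payload: str) -> ActionResult[T]:
        if idempotency_key in self._completed:
            # Duplicate (e.g. an orchestrator retry): return the stored result, do not re-run.
            return ActionResult(self._completed[idempotency_key], already_done=True)
        value = self._action(payload)
        self._completed[idempotency_key] = value
        return ActionResult(value, already_done=False)
```

With this contract the orchestrator can retry blindly: a duplicate charge request comes back flagged `already_done` instead of charging twice or leaving the workflow guessing.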
Data point: Implementing a 5‑second watchdog cut infinite‑loop incidents from 9 per week to 0 in the following sprint.
Real‑world example
Adding a loop_counter variable to the orchestrator and aborting when it exceeds 7 now keeps the sales‑assistant from repeating the loop it fell into during the demo. The counter is exposed as a Prometheus gauge (agent_loop_counter) so we can see at a glance when a particular request is approaching the safety ceiling.
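A sketch of that orchestrator-level counter. `run_turns`, the `next_agent_message` callable and `LoopLimitExceeded` are hypothetical; only the limit of 7 comes from the text. The gauge is shown as a plain callback, which in production could be the `set` method of the `agent_loop_counter` gauge. As a second cheap check aimed at the demo's failure mode, the sketch also aborts when the agent repeats an identical message. That rule is an added assumption, not part of the original fix.

```python
from typing import Callable, List, Optional

MAX_LOOPS = 7


class LoopLimitExceeded(RuntimeError):
    """Raised when the orchestrator gives up on a conversation that is not converging."""


def run_turns(next_agent_message: Callable[[List[str]], Optional[str]],
              report_gauge: Callable[[int], None] = lambda n: None,
              max_loops: int = MAX_LOOPS) -> List[str]:
    """Drive the agent until it returns None (done); abort past max_loops or on a repeat."""
    transcript: List[str] = []
    loop_counter = 0
    while True:
        message = next_agent_message(transcript)
        if message is None:
            return transcript
        loop_counter += 1
        report_gauge(loop_counter)  # e.g. the agent_loop_counter gauge
        if loop_counter > max_loops:
            raise LoopLimitExceeded(f"aborted after {loop_counter} turns")
        if message in transcript:
            raise LoopLimitExceeded(f"agent repeated itself: {message!r}")
        transcript.append(message)
```

An agent that keeps asking "Which dates work for you?" is stopped on its second attempt by the repeat check, and one that keeps producing new questions is stopped on its eighth turn by the hard cap.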
A reusable guard: @loop_guard decorator
```python
import asyncio
import functools
import time

from prometheus_client import Counter, Gauge

# Prometheus metrics
loop_abort_counter = Counter(
    "agent_loop_aborts_total",
    "Number of times a loop was aborted by the guard",
    ["agent_name"],
)
loop_iteration_gauge = Gauge(
    "agent_loop_iteration",
    "Current iteration count for a guarded step",
    ["agent_name"],
)


class LoopAbort(RuntimeError):
    """Raised when a step exceeds its iteration or time limits."""


def loop_guard(max_iterations: int = 5, timeout_sec: float = 3.0):
    """
    Decorator for async agent steps.
    - Re-runs the step when an attempt takes longer than `timeout_sec`.
    - Gives up after `max_iterations` timed-out attempts.
    Records the current attempt in a gauge and raises LoopAbort on breach.
    """
    def decorator(func):
        @functools.wraps(func)  # keep the step's name and docstring for logs and tracing
        async def wrapper(*args, **kwargs):
            agent_name = func.__module__  # label metrics by the module that defines the step
            start = time.monotonic()
            for i in range(1, max_iterations + 1):
                loop_iteration_gauge.labels(agent_name=agent_name).set(i)
                try:
                    # Enforce the per-attempt timeout; wait_for cancels the attempt on expiry.
                    return await asyncio.wait_for(func(*args, **kwargs), timeout=timeout_sec)
                except asyncio.TimeoutError:
                    # This attempt timed out; pause briefly so retries don't spin, then retry.
                    if i < max_iterations:
                        await asyncio.sleep(0.01)
            # Every attempt timed out: record the abort and hand control back to the orchestrator.
            loop_abort_counter.labels(agent_name=agent_name).inc()
            elapsed = time.monotonic() - start
            raise LoopAbort(
                f"{agent_name}: {max_iterations} attempts timed out "
                f"({timeout_sec}s each, {elapsed:.1f}s total)"
            )
        return wrapper
    return decorator


# Example usage
@loop_guard(max_iterations=7, timeout_sec=5)
async def fetch_recommendations(query: str):
    # call to external vector search service
    ...
```
The snippet above is the kind of thing we now sprinkle over every async step that talks to an external system. The Prometheus counters give us a clear signal when a guard trips, and the LoopAbort exception bubbles up to the orchestrator where we transition to a safe fallback state.
TL;DR
Instrument every orchestration hop, enforce hard iteration caps, and treat timeouts as first‑class failures. That combination alone eliminated all runaway loops in our production fleet.
Top comments (1)
Loop-forever is the single most expensive agent failure because it fails silently into your bill rather than crashing, so a root-cause breakdown is genuinely useful. The ones I keep hitting map to yours, I'd bet: no clear termination condition (the agent doesn't know what "done" looks like), oscillation between two states (does A, undoes A, repeats), retrying a deterministically-failing action expecting a different result, and goal drift where it loses the original objective and wanders. The fixes split into two families - better stopping criteria (the agent knows when to quit) and an external circuit breaker (a step/cost cap that kills it regardless of what it "thinks").
My strong opinion: you need both, but the external cap is non-negotiable, because you can't trust the thing that's looping to correctly decide it should stop looping. That belt-and-suspenders is exactly how I build Moonshift, the thing I work on - a multi-agent pipeline that takes a prompt to a deployed SaaS, with idle-watchdogs and hard step/cost caps so a wedged agent gets killed by the harness, not by its own judgment (which is why a full build stays ~$3 flat, first run free no card). Genuinely useful post - this is the failure that bites everyone once. Of your 4, which did you find hardest to detect? Oscillation is the sneaky one for me; it looks like progress.