AI agent reasoning loops occur when an agent calls the same tool repeatedly without making progress, convinced that one more attempt will produce the perfect answer. The agent wastes tokens, time, and money without delivering a result. This post shows how to detect and block repeated calls, validated with a demo where ambiguous tool feedback caused 14 calls while clear SUCCESS states stopped the agent after 2.
This demo uses Strands Agents. The patterns — debounce hooks, clear tool states, and call limits — are framework-agnostic and apply to any agent that supports lifecycle hooks, including LangGraph, AutoGen, and CrewAI.
Working code: github.com/aws-samples/sample-why-agents-fail
Series: Why AI Agents Fail
- Context Window Overflow — Memory Pointer Pattern for large data
- MCP Tools That Never Respond — Async pattern for slow external APIs
- AI Agent Reasoning Loops (this post) — Detect and block repeated tool calls
The Problem: Agents That Overthink
AI agents don't just fail by giving wrong answers; they fail by never finishing. Research shows agents get trapped in reasoning loops where they call the same tool repeatedly, convinced that "one more step" will produce the perfect answer — burning tokens and time without ever delivering a result.
The Decoder (Jan 2025) found that even with unlimited computing power, overthinking leads to poor decisions. Incomplete understanding of the world causes compounding errors. Each additional reasoning step makes things worse, not better.
Particula (Jul 2025) (community observation) documented an extreme case: an agent executed 847 reasoning steps at $47 per minute and never delivered a final answer. It kept refining logic, questioning conclusions, and requesting more data in an endless cycle.
CodiesHub (Dec 2025) (community observation) identifies the root causes:
- Unclear goals — agent doesn't know when the task is complete
- Ambiguous tool feedback — tools don't return clear success/failure states
- No stopping criteria — no hard limits on iterations or time
Why Loops Happen: Ambiguous Tool Feedback
Ambiguous tool feedback occurs when tools return partial results or hint that "more data may be available" without a clear terminal state. The agent reads this as an invitation to retry the same call:
```python
import random

from strands import tool

@tool
def search_flights(origin: str, destination: str, max_price: float) -> str:
    """Search for flights under a max price."""
    prices = [random.randint(200, 800) for _ in range(3)]
    matching = [p for p in prices if p <= max_price]
    # The problem: "More results may be available" signals the LLM to retry.
    # The agent interprets this as "I should search again to find a better deal."
    return (
        f"Found {len(matching)} flights under ${max_price} "
        f"(out of {len(prices)} checked). "
        "Note: More results may be available. Prices change frequently."
    )
```
That "Note: More results may be available" triggers the loop. The agent sees it and thinks: "Maybe if I search again, I'll find a better deal." It retries with the same parameters, gets similar results, and the cycle continues.
Solution 1: Debounce Hook with Strands
Strands Hooks intercept the agent lifecycle at any point. A Debounce Hook uses BeforeToolCallEvent to detect duplicate calls before they execute:
```python
from strands import Agent
from strands.hooks import BeforeInvocationEvent, BeforeToolCallEvent, HookProvider

class DebounceHook(HookProvider):
    def __init__(self, window_size=3):
        self.call_history = []  # Tracks (tool_name, input) pairs
        self.window_size = window_size  # Sliding window size for duplicate detection
        self.blocked_count = 0

    def register_hooks(self, registry):
        # BeforeInvocationEvent fires once at the start of each agent.invoke() call
        registry.add_callback(BeforeInvocationEvent, self.reset)
        # BeforeToolCallEvent fires before every tool execution — this is where we intercept
        registry.add_callback(BeforeToolCallEvent, self.check_duplicate)

    def reset(self, event):
        # Clear history at the start of each invocation so limits don't bleed across calls
        self.call_history = []

    def check_duplicate(self, event):
        # Build a fingerprint from tool name + exact inputs
        key = (event.tool_use["name"], str(event.tool_use["input"]))
        recent = self.call_history[-self.window_size:]
        if recent.count(key) >= 2:
            # cancel_tool is a native Strands API: blocks execution and returns this message to the LLM
            event.cancel_tool = "BLOCKED: Duplicate call detected"
            self.blocked_count += 1
            return
        self.call_history.append(key)

agent = Agent(tools=[search_flights], hooks=[DebounceHook()])
```
The hook tracks the last 3 tool calls. If the same tool with the same parameters appears twice, the third attempt is blocked via event.cancel_tool, a native Strands API that prevents tool execution and returns an error message to the LLM.
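The sliding-window check itself has no Strands dependency, which is why the pattern ports to other frameworks. Here is a rough framework-free sketch of the same fingerprinting logic (the class and method names are illustrative, not part of any SDK):

```python
class Debouncer:
    """Sliding-window duplicate detector for (tool_name, input) fingerprints."""

    def __init__(self, window_size=3, max_repeats=2):
        self.window_size = window_size
        self.max_repeats = max_repeats
        self.history = []

    def allow(self, tool_name, tool_input):
        # Fingerprint the call; str() makes dict inputs comparable
        key = (tool_name, str(tool_input))
        recent = self.history[-self.window_size:]
        if recent.count(key) >= self.max_repeats:
            return False  # this would be the 3rd identical call in the window
        self.history.append(key)
        return True

d = Debouncer()
params = {"origin": "JFK", "destination": "LHR", "max_price": 400}
print(d.allow("search_flights", params))  # True  (1st call)
print(d.allow("search_flights", params))  # True  (2nd call)
print(d.allow("search_flights", params))  # False (3rd identical call blocked)
```

Note that a blocked call is not appended to the history, so the agent can recover by changing parameters or switching tools.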
Solution 2: Clear SUCCESS/FAILED States
Tools that return explicit terminal states help agents know when to stop:
```python
import random

from strands import tool

@tool
def book_hotel(hotel: str, guest: str, nights: int) -> str:
    """Book a hotel room. Returns clear SUCCESS or FAILED.

    Returns:
        SUCCESS: Booking confirmed with ID
        FAILED: Booking failed with reason
    """
    if random.random() > 0.15:
        conf = f"HT{random.randint(10000, 99999)}"
        price = random.randint(150, 350)
        return f"SUCCESS: Booking {conf} confirmed — {guest} at {hotel}, {nights} nights, ${price * nights} total"
    return f"FAILED: {hotel} fully booked"
```
When the agent receives "SUCCESS: Booking HT79265 confirmed", it knows the task is done. No ambiguity, no extra calls.
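A side benefit of the prefix convention is that it can be checked mechanically, for example in tests that assert every tool returns a terminal state. A minimal sketch (the SUCCESS:/FAILED: prefixes come from the tool above; the helper name is my own):

```python
def is_terminal(tool_response: str) -> bool:
    """A tool response is terminal when it starts with an explicit state prefix."""
    return tool_response.startswith(("SUCCESS:", "FAILED:"))

print(is_terminal("SUCCESS: Booking HT79265 confirmed"))       # True
print(is_terminal("Found 2 flights, more may be available"))   # False
```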
Solution 3: Hard Limits with LimitToolCounts
CodiesHub recommends: "Iterations, tokens, time, spend are non-negotiable." Strands provides LimitToolCounts in the Hooks Cookbook — a hook that caps tool calls per invocation:
```python
from threading import Lock

from strands import Agent
from strands.hooks import BeforeInvocationEvent, BeforeToolCallEvent, HookProvider

class LimitToolCounts(HookProvider):
    """Limits tool calls per invocation. From the Strands Hooks Cookbook."""

    def __init__(self, max_tool_counts: dict[str, int]):
        # Per-tool call budgets: {"search_flights": 2} means max 2 searches per invocation
        self.max_tool_counts = max_tool_counts
        self.tool_counts = {}
        self._lock = Lock()  # Thread-safe for concurrent tool calls in Swarm scenarios

    def register_hooks(self, registry):
        registry.add_callback(BeforeInvocationEvent, self.reset_counts)
        registry.add_callback(BeforeToolCallEvent, self.intercept_tool)

    def reset_counts(self, event):
        # Reset per invocation so limits apply per task, not per agent lifetime
        with self._lock:
            self.tool_counts = {}

    def intercept_tool(self, event):
        tool_name = event.tool_use["name"]
        with self._lock:
            max_count = self.max_tool_counts.get(tool_name)
            count = self.tool_counts.get(tool_name, 0) + 1
            self.tool_counts[tool_name] = count
        if max_count and count > max_count:
            # Hard ceiling: block the call and tell the LLM explicitly to stop
            event.cancel_tool = f"Tool '{tool_name}' limit reached. DO NOT CALL ANYMORE."

# Enforce a hard limit of 2 flight searches per booking task — prevents runaway costs
limit_hook = LimitToolCounts(max_tool_counts={"search_flights": 2})
agent = Agent(tools=[search_flights], hooks=[limit_hook])
```
Even if the agent wants to search 10 times, it's capped at 2. Hard ceiling, predictable costs.
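Like the debounce check, the budget logic is portable. A framework-free sketch of the same reset-per-task ceiling (illustrative class name, not a Strands API):

```python
class CallBudget:
    """Per-tool call ceiling; reset() starts a fresh task."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.counts: dict[str, int] = {}

    def reset(self):
        # New task, fresh budget: limits apply per task, not per agent lifetime
        self.counts = {}

    def allow(self, tool_name: str) -> bool:
        count = self.counts.get(tool_name, 0) + 1
        self.counts[tool_name] = count
        limit = self.limits.get(tool_name)
        return limit is None or count <= limit

budget = CallBudget({"search_flights": 2})
print([budget.allow("search_flights") for _ in range(3)])  # [True, True, False]
```

Tools without an entry in `limits` are unbounded, so in production it's safer to also set a global default ceiling.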
Demo Results
We tested with a travel booking agent that searches for flights and hotels:
| Scenario | Tool Calls | Time | Result |
|---|---|---|---|
| Ambiguous Feedback | 14 | 21s | Agent retried organically — "prices may change" caused loops |
| DebounceHook | 12 | 15s | Reduced retries but some variation in parameters |
| Clear SUCCESS States | 2 | 4s | Agent stopped immediately after SUCCESS |
| LimitToolCounts | 6 (2 blocked) | 6s | Hard ceiling enforced — no runaway |
The contrast is dramatic: 14 calls with ambiguous tools vs 2 calls with clear SUCCESS states. That is a 7x difference caused purely by tool feedback design.
When to Use Each Solution
DebounceHook — prevents duplicate calls with identical parameters. Use when tools are idempotent and retrying with the same input is wasteful.
Clear SUCCESS/FAILED states — the simplest solution. Design tools to return explicit terminal states. The agent knows when to stop.
LimitToolCounts — hard ceiling on tool calls per invocation. Use in production to prevent runaway costs regardless of tool design. From the Strands Hooks Cookbook.
All three together — defense in depth. Clear states prevent most loops, debounce catches duplicates, and hard limits guarantee bounded execution.
Try It Yourself
You need Python 3.9+, uv, and an OpenAI API key.
```bash
git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail/stop-ai-agents-wasting-tokens/03-reasoning-loops-demo
uv venv && uv pip install -r requirements.txt
export OPENAI_API_KEY="your-key-here"
uv run python test_reasoning_loops.py  # Runs all 4 scenarios
```
Or open test_reasoning_loops.ipynb in Jupyter, JupyterLab, VS Code, or your preferred notebook environment.
Key Takeaways
- Ambiguous tool feedback causes organic loops — "more results may be available" makes agents retry
- 14 calls vs 2 calls — clear SUCCESS states reduce calls by 7x in our demo
- Hooks intercept before execution — `BeforeToolCallEvent.cancel_tool` blocks the call before the tool runs. The `DebounceHook` is ~30 lines of code
- Hard limits are mandatory — every agent needs caps on iterations, time, and spend
- 847 steps at $47/min was documented (Particula, community observation) — unbounded agents burn money without delivering answers
Frequently Asked Questions
Why do AI agents repeat the same tool call?
Agents repeat tool calls when tool responses contain ambiguous feedback such as "more results may be available" or "prices change frequently." The LLM interprets these signals as a reason to retry, expecting different or better results. Without clear terminal states (SUCCESS/FAILED), the agent has no way to know the task is complete.
What is a DebounceHook and how does it prevent reasoning loops?
A DebounceHook tracks recent tool calls in a sliding window. When the same tool is called with identical parameters more than a set threshold (typically 2 times within a window of 3), the hook blocks the call using event.cancel_tool before the tool executes. The LLM receives a "BLOCKED: Duplicate call" message and must try a different approach. In Strands Agents, this is about 30 lines of code using the HookProvider API.
How do clear SUCCESS/FAILED states reduce tool calls?
When a tool returns "SUCCESS: Booking HT79265 confirmed," the LLM recognizes the task is complete and stops calling that tool. Ambiguous responses such as "Found 2 flights, more may be available" lack this signal, causing the agent to retry. In our demo, clear states reduced tool calls from 14 to 2, a 7x improvement.
References
Research
- Language models can overthink — The Decoder, Jan 2025
- How many reasoning steps do AI agents need — Particula (community observation), Jul 2025
- How to Prevent Infinite Loops and Spiraling Costs — CodiesHub (community observation), Dec 2025
Implementation
- Strands Hooks — Lifecycle event interception and tool cancellation
Thanks!

