How to Stop Your LLM Agent From Looping Itself Into Oblivion

#ai #python #llm #debugging

You build a shiny new agent. It works great in the demo. Then you deploy it, and the next morning you wake up to find it called the same search function 47 times in a row before finally giving up. Sound familiar?

I hit this exact problem last week on a client project. The agent was supposed to research a topic, summarize findings, and write a report. Instead, it kept fetching the same URL, getting the same content, "reflecting" on whether it had enough information, deciding no, and fetching it again. Beautiful infinite loop. Expensive infinite loop.

This is one of those problems that doesn't show up in tutorials. Every "build an agent in 50 lines" post conveniently skips it. So let's actually dig into why it happens and how to fix it.

Why Agents Loop

There are three main reasons your agent gets stuck repeating itself.

Fuzzy completion criteria. Most agent loops look something like: "keep calling tools until the model says it's done." That works fine when the task is clear. It falls apart when the model isn't sure whether it has enough information. Without a hard stopping rule, "I'll just check one more time" can repeat indefinitely.

Context degradation. As tool results pile up in the context window, the model starts losing track of what it has already done. By turn 20, the system prompt and original task are buried under JSON blobs. The model essentially forgets that it already searched for "user authentication patterns" and searches again.

No structured memory of past tool calls. Many agent loops naively dump tool results back into context with no separate tracking. The model has no easy way to ask "have I already called search('X')?" because that information lives somewhere in 30k tokens of half-remembered chat history.

Step One: Add a Hard Iteration Cap

This sounds obvious, but you'd be amazed how many production agent loops don't have one. Always set a hard upper bound:

MAX_ITERATIONS = 15

def run_agent(task, tools):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task},
    ]

    for i in range(MAX_ITERATIONS):
        response = call_model(messages, tools)

        # Model decided it's done — return the final answer
        if response.stop_reason == "end_turn":
            return response.content

        # Execute tool calls and feed results back into the conversation
        tool_results = execute_tools(response.tool_calls)
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    # Fail loudly instead of silently burning more tokens
    raise AgentLoopExceeded("Hit max iterations without completing task")

Yes, you'll occasionally truncate a legitimate long task. That's fine. Failing loudly is much better than racking up a $400 inference bill at 3am.

Step Two: Deduplicate Tool Calls

Track what the agent has already called. If it tries to call the same tool with the same arguments, intercept it:

import hashlib
import json

class ToolCallTracker:
    def __init__(self):
        self.seen = {}  # fingerprint -> cached result

    def fingerprint(self, name, args):
        # Stable hash of the call signature (sort keys for determinism)
        canonical = json.dumps({"name": name, "args": args}, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def get_or_execute(self, name, args, executor):
        fp = self.fingerprint(name, args)
        if fp in self.seen:
            # Return the cached result plus a nudge for the model
            return {
                "result": self.seen[fp],
                "warning": "You already called this. Try something different.",
            }
        result = executor(name, args)
        self.seen[fp] = result
        return {"result": result}

The warning field is the secret sauce. It tells the model "hey, you've been here before." In my testing this alone reduced loops by something like 70 percent. I haven't run a rigorous benchmark — that's just from eyeballing trace logs across maybe 200 runs.

Step Three: Detect Semantic Loops

Sometimes the agent doesn't repeat the exact same call. It does search("python async"), then search("async in python"), then search("python asyncio"). Same intent, different arguments.

For this you need fuzzy matching. The cheap version uses embeddings — sentence-transformers is fine for this:

from sentence_transformers import SentenceTransformer
import numpy as np
import json

model = SentenceTransformer("all-MiniLM-L6-v2")

class SemanticLoopDetector:
    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.history = []  # list of (embedding, call_repr)

    def check(self, name, args):
        repr_str = f"{name}({json.dumps(args, sort_keys=True)})"
        emb = model.encode(repr_str)

        for prev_emb, prev_repr in self.history:
            # Cosine similarity between the new call and each prior one
            sim = float(np.dot(emb, prev_emb) / (
                np.linalg.norm(emb) * np.linalg.norm(prev_emb)
            ))
            if sim > self.threshold:
                return prev_repr  # Loop detected — return what matched

        self.history.append((emb, repr_str))
        return None

Tune the threshold to taste. Around 0.92 is roughly where you catch real loops without flagging genuinely different queries. Higher and you miss loops; lower and you start blocking useful exploration.

Step Four: Force a Decision

If you detect three near-duplicate calls in a row, stop being polite. Inject a system message telling the agent to either commit to an answer or stop:

if detector.consecutive_duplicates >= 3:
    messages.append({
        "role": "user",
        "content": (
            "You have repeated similar tool calls three times. "
            "Based on what you already know, either provide your "
            "best answer now or explicitly say you cannot complete "
            "this task. Do not call any more tools."
        ),
    })

This is brutal and it works. Most stuck agents will produce a reasonable answer once you take the option to keep looping off the table.

Prevention: Habits to Bake In From Day One

A few things that save real pain later:

Log every tool call with timestamps. When something goes wrong you want to read the trace, not guess.
Set a token budget per task, not just an iteration count. A loop that fits in 5 iterations but each one pulls a 50k-token document is just as bad.
Write completion criteria into the system prompt. "Stop after you have at least 3 sources" beats "stop when you're done."
Test with adversarial inputs. Give the agent a task with no good answer. Make sure it gives up gracefully instead of looping forever.
Make tool errors visible to the model. If a tool failed, say so plainly in the result. Silent failures push the model into retry-storm territory.

The agent ecosystem is still figuring out best practices. Most of the tooling we'd actually want — proper tracing, deterministic replay, structured tool-call memory — is being reinvented in every framework. Until things settle, defensive coding is the price of admission.

If you've already shipped an agent to production without these guards, I'd suggest checking your billing dashboard before you check anything else. Ask me how I know.