The 2 a.m. Runaway Loop: Detecting and Killing an Agent That Won't Stop

#ai #agents #python #llm

Book: Agents in Production — Building, Tracing, and Shipping Multi-Step AI You Can Trust
Also by me: Observability for LLM Applications — the companion book in The AI Engineer's Library (2-book series)
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

In April 2025 a user filed Issue #44726 against the Claude Code repo. The title said [BUG][Billing]. The body was a table of token counts. In a normal session, output tokens trail input tokens by a wide margin. This one read 74:1. Another attached session read 175:1. The account balance had gone negative.

The report traces it to a compounding loop: conversation history and file context growing unbounded across tool calls. The agent had no idea anything was wrong. From its own point of view it was working.

That is the whole problem in one sentence. The thing running in the loop cannot tell you it is running in the loop, because it is the thing that decides whether to keep running. So the detection and the kill switch have to live outside the model, in code the model never gets to touch.

How the loop actually starts

A runaway is rarely one dramatic decision. It is a small pattern that repeats.

Issue #27281 is a clean example: an agent stuck repeating "let me write the document" turn after turn, narrating its own intent, never actually calling the Write tool. A full context window, burned, on an agent talking about work it never did. Issue #6004 is an "infinite compaction loop" — the agent compacts its own history, fails to make progress, compacts again, and users hit their usage limit far sooner than expected.

These share a shape. The model emits a valid response. It maybe calls a tool. The tool maybe returns something. Every individual step is fine. Your load balancer sees a healthy endpoint. The bug is in the trajectory, not in any one step, and a trajectory is exactly the thing a request-level test suite cannot see.

The model does not "notice" the repetition because a forward pass sees only what is in the window right now. If the loop is short, the window shows a repeating pattern and the model might break out. If the loop compacts into a summary, the repetition looks like progress. There is no internal clock. There is no back-pressure unless you build it.

Ceiling first: the loop the model can't see

Before any clever detection, put a hard ceiling around the loop. Three axes: steps, tokens, wall-clock. This is the floor of the whole system, and it is a plain for loop the model never gets to override.

import time

MAX_STEPS = 12
MAX_TOKENS = 200_000
MAX_WALL_CLOCK = 300  # seconds


def run_agent(step_fn, task):
    tokens = 0
    start = time.monotonic()
    for _ in range(MAX_STEPS):
        if tokens >= MAX_TOKENS:
            raise RuntimeError("token cap hit")
        if time.monotonic() - start > MAX_WALL_CLOCK:
            raise RuntimeError("wall-clock cap hit")
        resp = step_fn(task)          # one model call
        tokens += resp["tokens"]
        if resp["done"]:
            return resp["result"]
    raise RuntimeError("step cap hit")

step_fn is your single provider call. The caps live in the code that calls it, never in the prompt that asks it to stop. Hand this wrapper the exact loop from Issue #44726 (a model that never returns done) and it ends the session instead of the invoice:

stuck = lambda task: {
    "done": False,
    "tokens": 5000,
    "result": None,
}
run_agent(stuck, "summarise the repo")

RuntimeError: step cap hit

Use time.monotonic for elapsed time, not time.time. An NTP correction on a wall-clock timestamp will eventually give you a negative elapsed value in a long-running loop, and your kill switch will quietly stop firing.

The ceiling is necessary. It is not sufficient. A 12-step cap still lets a loop burn twelve full turns before it stops, and a compaction loop can do real damage in twelve turns. You want to catch the runaway earlier, at the moment it stops making progress.

No-progress detection: catch it before the ceiling

A ceiling asks "have you gone too far?" No-progress detection asks the better question: "are you still getting anywhere?" A healthy agent's turns look different from each other: new tool, new argument, new result. A stuck agent repeats itself.

The cheapest signal that generalizes well is a hash of each proposed action. Same tool with the same arguments, back to back, is the fingerprint of a stuck loop. Track the recent fingerprints and stop when one repeats too often.

import hashlib
import json
from collections import deque


class NoProgressDetector:
    def __init__(self, window=6, repeat_limit=3):
        self.recent = deque(maxlen=window)
        self.repeat_limit = repeat_limit

    def _fingerprint(self, tool_name, tool_input):
        blob = json.dumps(
            [tool_name, tool_input],
            sort_keys=True,
            default=str,
        )
        return hashlib.sha256(
            blob.encode()
        ).hexdigest()

    def check(self, tool_name, tool_input):
        fp = self._fingerprint(tool_name, tool_input)
        self.recent.append(fp)
        count = self.recent.count(fp)
        if count >= self.repeat_limit:
            return False, (
                f"repeated action {count}x: "
                f"{tool_name}"
            )
        return True, None

check returns (ok, reason). Call it before you run each tool call. If it comes back False, you have a loop that is cycling on the same action, and you stop now instead of waiting for the step cap.

The "narrating intent" loop from Issue #27281 needs one more signal: turns that produce no tool call at all. An agent that talks for three turns in a row without acting is not thinking, it is stalling. Count consecutive no-tool turns and treat that as no-progress too.

class StallDetector:
    def __init__(self, max_idle_turns=3):
        self.max_idle_turns = max_idle_turns
        self.idle_turns = 0

    def observe(self, made_tool_call):
        if made_tool_call:
            self.idle_turns = 0
            return True, None
        self.idle_turns += 1
        if self.idle_turns >= self.max_idle_turns:
            return False, (
                f"{self.idle_turns} turns with no "
                "tool call"
            )
        return True, None

Neither detector is perfect. A legitimate retry can repeat an action once or twice, which is why repeat_limit is 3 and not 1. Some tasks genuinely need a couple of pure-reasoning turns, which is why max_idle_turns gives slack before it fires. Tune both against your own traces. The point is that the model does not get a vote on either one.

The kill switch that ends the horror story

Detection is only half the job. When something fires, you have to actually stop, and stop in a way the caller can reason about. Do not raise a bare exception and lose the context. Return a structured stop reason so the layer above knows whether the agent finished or got killed, and can show the user the right thing.

Here is the outer loop wiring the ceiling, both detectors, and a clean stop together.

from dataclasses import dataclass


@dataclass
class StopReason:
    kind: str      # done | step_cap | no_progress | stall
    detail: str


def run(model_call, run_tool, messages, tools):
    progress = NoProgressDetector()
    stall = StallDetector()

    for step in range(MAX_STEPS):
        resp = model_call(messages, tools)
        messages.append(
            {"role": "assistant",
             "content": resp.content}
        )

        calls = [b for b in resp.content
                 if b.type == "tool_use"]

        ok, why = stall.observe(bool(calls))
        if not ok:
            return messages, StopReason("stall", why)

        if not calls:
            return messages, StopReason(
                "done", "model finished")

        results = []
        for c in calls:
            ok, why = progress.check(c.name, c.input)
            if not ok:
                return messages, StopReason(
                    "no_progress", why)
            results.append(run_tool(c))

        messages.append(
            {"role": "user", "content": results})

    return messages, StopReason(
        "step_cap", "hit MAX_STEPS")

model_call is your provider call. With the Anthropic SDK it wraps client.messages.create(...) with your messages and tools, and Claude is a sensible default when the loop is doing real tool use. The detectors sit between the model's proposal and the tool actually running, which is the only place a kill switch belongs: after the model decides, before the action lands.

The structured StopReason is what turns a kill into a usable outcome. done shows the final answer. step_cap, no_progress, and stall show a partial answer plus an honest "this run was capped — want me to continue?" hatch. The user is never left staring at a spinner that has secretly been dead for thirty-six hours.

What to log, so 2 a.m. never repeats

The reason the Issue #44726 user found out from a billing table is that nobody was watching the trajectory. You can do better with four fields per session:

session_id
stop_reason.kind and stop_reason.detail
steps, input_tokens, output_tokens, wall_seconds
per tool: call_count, plus a repeated_action flag when the detector tripped

Chart the stop-reason mix over time. You want done to dominate. A rising share of no_progress or stall is your early warning that the model is getting stuck on a class of task — long before finance sends the message that starts with "is this number right?"

None of this makes the agent smarter. It makes the agent stoppable, which for anything running unattended at 2 a.m. matters more. The model owns the control flow. Your job is to own the off switch.

Runaway loops are one of the failure modes that only show up in the trajectory, never in a single request — which is why they slip past the test pyramid you already have. Agents in Production walks through building and shipping multi-step agents with these rails in place, and Observability for LLM Applications covers the tracing and evals that let you watch trajectories instead of guessing from an invoice. Together they are The AI Engineer's Library, and this post is a slice of what both books argue: you are the operator of a system whose author is a language model.