Mukunda Rao Katta

Posted on May 25

tool-loop-guard-rs: Break Agent Loops Before They Drain Your Budget

#hermeschallenge #ai #rust #agents

The web search that would not stop

Production agent. User asked it to find recent news about a company. The agent called search_web with the query "company name quarterly results". Got a result. Parsed the result. Concluded it needed more context. Called search_web again with the exact same query. Got the same result. Concluded it still needed more context.

This ran 47 times before the session hit the total token limit and terminated.

The agent was not broken in an obvious way. It was running a legitimate reasoning loop. The problem was that the loop's exit condition depended on new information arriving, and the tool was returning the same information every time. The agent kept hoping the next call would be different. It was not.

The cost was not catastrophic but it was also not zero. Forty-seven LLM turns, each consuming the full conversation history that grew longer each turn. Plus 47 identical search API calls. The whole session cost about $2 for a task that should have cost $0.15.

More concerning: the agent was running on behalf of a user who expected a quick answer. They waited 8 minutes for a response that eventually failed.

tool-loop-guard-rs catches this pattern. When the same tool gets called with the same arguments more than N times within a window of M recent calls, check returns a LoopDetected error instead of Ok. You break the loop.

Shape of the fix

[dependencies]
tool-loop-guard-rs = "0.1"
serde_json = "1"

Add the guard to your agent loop:

use tool_loop_guard_rs::{LoopGuard, LoopDetected};
use serde_json::json;

// Flag a call when the same tool+args already appear 3 times in the last 10 recorded calls.
let mut guard = LoopGuard::new(3, 10);

loop {
    let tool_call = llm.next_tool_call().await?;

    match guard.check(&tool_call.name, &tool_call.arguments) {
        Ok(()) => { /* not a loop, proceed */ }
        Err(LoopDetected { tool, count, window_size }) => {
            eprintln!(
                "Loop detected: '{}' called {} times in last {} calls.",
                tool, count, window_size
            );
            // Return an error to the LLM as tool_result content,
            // or abort the session, or redirect to a different strategy.
            break;
        }
    }

    let result = dispatch_tool(&tool_call).await?;
    llm.add_tool_result(result).await?;
}

The guard is stateful and keeps the call history internally. Call check before executing the tool; check updates the history itself. If the call is clean, check records it and returns Ok(()). If the pattern fires, it returns Err(LoopDetected {...}) and does NOT update the history (the rejected call was not executed, so it should not be recorded as a call).

Adjust the threshold and window per agent:

// Very tight: block on 2 repeats in the last 5 calls.
// For agents that should never repeat a tool call.
let mut guard = LoopGuard::new(2, 5);

// Loose: block on 5 repeats in the last 20 calls.
// For agents where occasional repeated lookups are expected.
let mut guard = LoopGuard::new(5, 20);

// Reset history when a new user message arrives
// (each user turn is a fresh session):
guard.reset();

You can also query history without checking:

let recent = guard.recent_calls();
for call in recent {
    println!("{} {:?}", call.tool, call.args);
}

What it does NOT do

The guard does not understand semantically similar calls. It checks for exact arg matches only. If the agent calls search_web({"query": "company results"}) and then search_web({"query": "company quarterly results"}), those are different keys. The guard does not know that those two queries are likely to return overlapping results. Semantic deduplication would require embedding the queries and computing similarity, which is a different problem and a much heavier dependency.

Inside the lib

The call history is a fixed-size ring buffer of (tool_name, sha256(canonical_args)) pairs. The buffer size is the window parameter. New calls push to the back; when the buffer is full, the oldest call is evicted from the front.

On each check call, the guard:

1. Computes the SHA-256 hash of the canonical JSON form of (tool_name, args).
2. Counts how many of the last window_size entries have the same hash.
3. If that count equals or exceeds the threshold, returns Err(LoopDetected).
4. Otherwise, appends the new entry to the buffer and returns Ok(()).
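
The four steps above can be sketched with only the standard library. This is a minimal sketch, not the crate's source: SketchGuard and LoopHit are hypothetical names, the arguments are assumed to arrive as an already-canonical string, and DefaultHasher (64-bit, not cryptographic) stands in for SHA-256 to avoid a dependency.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};

/// Hypothetical error type for this sketch (not the crate's `LoopDetected`).
#[derive(Debug, PartialEq)]
pub struct LoopHit {
    pub count: usize,
    pub window_size: usize,
}

/// Hypothetical stand-in for the guard described above.
pub struct SketchGuard {
    threshold: usize,
    window: usize,
    history: VecDeque<u64>, // one fingerprint per recorded call
}

impl SketchGuard {
    pub fn new(threshold: usize, window: usize) -> Self {
        Self { threshold, window, history: VecDeque::with_capacity(window) }
    }

    /// `canonical_args` is assumed to be serialized with sorted keys already.
    pub fn check(&mut self, tool: &str, canonical_args: &str) -> Result<(), LoopHit> {
        // Step 1: fingerprint (tool, args). DefaultHasher replaces SHA-256 here.
        let mut hasher = DefaultHasher::new();
        (tool, canonical_args).hash(&mut hasher);
        let key = hasher.finish();

        // Step 2: linear scan over the window.
        let count = self.history.iter().filter(|&&h| h == key).count();

        // Step 3: reject the call without recording it.
        if count >= self.threshold {
            return Err(LoopHit { count, window_size: self.window });
        }

        // Step 4: record the call, evicting the oldest entry when the window is full.
        if self.history.len() == self.window {
            self.history.pop_front();
        }
        self.history.push_back(key);
        Ok(())
    }
}
```

In this sketch, SketchGuard::new(3, 10) lets three identical calls through and rejects the fourth, because the count is taken before the new call is recorded.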

The hash-based comparison means each stored digest is 32 bytes regardless of how large the tool args are. A window of 100 calls needs about 3.2 KB for the digests (100 × 32 bytes).

Canonical JSON ordering is the same approach used in tool-result-cache-rs and llm-message-hash. Keys in JSON objects are sorted before serialization. Two calls with the same key-value pairs in different orders are treated as identical. This matters because LLMs sometimes produce JSON objects with keys in different orderings across turns even when the logical content is the same.

check("search_web", {"query": "company results", "limit": 10})
check("search_web", {"limit": 10, "query": "company results"})
   -> same hash, counted as the same call
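
A minimal sketch of why sorting keys makes the two calls above collide. It models the arguments as flat key/value pairs rather than full JSON, and canonical_args is a hypothetical helper, not part of the crate. Nested objects would need the same treatment applied recursively.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: renders flat key/value pairs in sorted key order.
/// Values are assumed to be pre-rendered JSON fragments (e.g. `10`, `"text"`).
pub fn canonical_args(pairs: &[(&str, &str)]) -> String {
    // BTreeMap iterates in ascending key order, which gives the canonical ordering.
    let sorted: BTreeMap<&str, &str> = pairs.iter().copied().collect();
    let body: Vec<String> = sorted
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", k, v))
        .collect();
    format!("{{{}}}", body.join(","))
}
```

Both orderings of query and limit produce the same string here, so any hash computed over that string is the same too.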

The window check is O(window_size) per call. For typical agent window sizes (10 to 50 calls), this is fast. The implementation does not use a HashMap for the count because the window is small enough that a linear scan is faster than hash map overhead.

The guard is not thread-safe by design. Agent loops are typically single-threaded in the tool dispatch path. If you run parallel tool calls and need shared loop detection, wrap in Arc<Mutex<LoopGuard>>. For parallel tool dispatch where tools can run concurrently, you probably want a different detection strategy anyway (a parallel loop looks different from a sequential one).
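
A minimal sketch of the Arc<Mutex<...>> wrapping, assuming a guard with a check method that takes &mut self. StandInGuard is a hypothetical stand-in so the example compiles without the crate; swap in LoopGuard in real code.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for `LoopGuard`; compares raw strings instead of hashes.
pub struct StandInGuard {
    threshold: usize,
    window: usize,
    history: VecDeque<String>,
}

impl StandInGuard {
    pub fn new(threshold: usize, window: usize) -> Self {
        Self { threshold, window, history: VecDeque::new() }
    }

    /// Returns Err(count) when the same call already appears `threshold` times.
    pub fn check(&mut self, tool: &str, args: &str) -> Result<(), usize> {
        let key = format!("{}:{}", tool, args);
        let count = self.history.iter().filter(|k| **k == key).count();
        if count >= self.threshold {
            return Err(count);
        }
        if self.history.len() == self.window {
            self.history.pop_front();
        }
        self.history.push_back(key);
        Ok(())
    }
}

/// Spawns `workers` threads that each attempt the same call once and
/// returns how many attempts the shared guard rejected.
pub fn rejected_calls(workers: usize) -> usize {
    let guard = Arc::new(Mutex::new(StandInGuard::new(3, 10)));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let guard = Arc::clone(&guard);
            thread::spawn(move || {
                // Hold the lock only for the check, not for the tool call itself.
                let mut g = guard.lock().unwrap();
                let rejected = g.check("search_web", r#"{"query":"company results"}"#).is_err();
                rejected
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|rejected| *rejected)
        .count()
}
```

With a threshold of 3, five concurrent identical attempts leave exactly three recorded and two rejected, whatever order the threads acquire the lock in.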

When useful

Agents that run LLM-driven loops where the exit condition depends on what the tool returns. If the tool always returns the same thing, the exit condition may never be reached.
Long-running batch agents where a loop in one item should not block the whole job. Detect the loop early, mark the item as failed, and move on.
Customer-facing agents where an invisible loop causes a multi-minute wait. Fast loop detection cuts the wait and returns a degraded response instead of a timeout.
Agents that call rate-limited external APIs. A loop that makes 50 identical calls burns 50 quota units. Detecting the loop after 3 calls cuts that to 3.
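
A minimal sketch of the batch case: detect the loop, mark the item as failed, and move on. Everything here is hypothetical scaffolding. StandInGuard imitates the guard, and each item is a scripted list of calls standing in for what an LLM would emit.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for `LoopGuard`; compares raw strings instead of hashes.
pub struct StandInGuard {
    threshold: usize,
    window: usize,
    history: VecDeque<String>,
}

impl StandInGuard {
    pub fn new(threshold: usize, window: usize) -> Self {
        Self { threshold, window, history: VecDeque::new() }
    }

    pub fn check(&mut self, tool: &str, args: &str) -> Result<(), usize> {
        let key = format!("{}:{}", tool, args);
        let count = self.history.iter().filter(|k| **k == key).count();
        if count >= self.threshold {
            return Err(count);
        }
        if self.history.len() == self.window {
            self.history.pop_front();
        }
        self.history.push_back(key);
        Ok(())
    }

    pub fn reset(&mut self) {
        self.history.clear();
    }
}

#[derive(Debug, PartialEq)]
pub enum ItemStatus {
    Done,
    FailedLoop,
}

/// Runs every item; a looping item is marked failed and the batch continues.
pub fn run_batch(items: &[Vec<(&str, &str)>]) -> Vec<ItemStatus> {
    let mut guard = StandInGuard::new(3, 10);
    let mut statuses = Vec::new();
    for calls in items {
        guard.reset(); // each item starts with a fresh history
        let mut status = ItemStatus::Done;
        for (tool, args) in calls {
            if guard.check(tool, args).is_err() {
                status = ItemStatus::FailedLoop;
                break;
            }
            // A real agent would dispatch the tool here.
        }
        statuses.push(status);
    }
    statuses
}
```

The reset between items matters: without it, history from one item could trip the guard on the next.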

When NOT

Agents where repeated tool calls are expected and valid. A poll_job_status tool is called repeatedly by design. Set the threshold high (or do not guard at all) for polling patterns.
Extremely short agent loops (2 to 3 turns total). With a threshold of 3, the guard cannot fire in a loop that ends after 3 turns, because three identical calls must already be in the history before a call is rejected. The guard is most useful for loops that are expected to run for 10 to 100 turns.
Agents where the concern is cost across multiple sessions, not within a single session. Each LoopGuard instance tracks only its own history. It does not track cross-session patterns.

Install

[dependencies]
tool-loop-guard-rs = "0.1"
serde_json = "1"

crates.io: tool-loop-guard-rs
GitHub: MukundaKatta/tool-loop-guard-rs

Siblings

Crate / Package	What it does
tool-loop-guard (Python)	Python port with async support for Python agent loops
agent-deadline	Cooperative per-task deadline; kill loops on wall-clock time
llm-stop-conditions	Composable stop conditions for agent loops (turn count, cost, time)
tool-result-cache-rs	Cache results so repeated identical calls at least do not cost API quota
llm-circuit-breaker	Stop calling a provider that is consistently failing; complements loop detection

What is next

Per-tool thresholds are the most useful missing feature. Right now, one LoopGuard has one threshold that applies to all tools. A config like {"search_web": 3, "get_status": 20, "*": 5} would let you set a tight limit on search calls (which should converge) and a loose limit on polling calls (which are expected to repeat). Today you either use one universal threshold or create multiple guards for different tool categories.
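
A minimal sketch of the multiple-guards workaround, assuming one guard per tool name and a "*" fallback threshold. PerToolGuards and StandInGuard are hypothetical names for illustration; neither is part of the crate.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a single guard; compares raw argument strings.
pub struct StandInGuard {
    threshold: usize,
    window: usize,
    history: VecDeque<String>,
}

impl StandInGuard {
    pub fn new(threshold: usize, window: usize) -> Self {
        Self { threshold, window, history: VecDeque::new() }
    }

    /// Err(count) when `args` already appears `threshold` times in the window.
    pub fn check(&mut self, args: &str) -> Result<(), usize> {
        let count = self.history.iter().filter(|a| a.as_str() == args).count();
        if count >= self.threshold {
            return Err(count);
        }
        if self.history.len() == self.window {
            self.history.pop_front();
        }
        self.history.push_back(args.to_string());
        Ok(())
    }
}

/// Hypothetical wrapper: lazily creates one guard per tool.
pub struct PerToolGuards {
    thresholds: HashMap<String, usize>,
    window: usize,
    guards: HashMap<String, StandInGuard>,
}

impl PerToolGuards {
    pub fn new(thresholds: &[(&str, usize)], window: usize) -> Self {
        Self {
            thresholds: thresholds.iter().map(|(t, n)| (t.to_string(), *n)).collect(),
            window,
            guards: HashMap::new(),
        }
    }

    pub fn check(&mut self, tool: &str, args: &str) -> Result<(), usize> {
        // Tool-specific threshold, else "*", else effectively unguarded.
        let threshold = self
            .thresholds
            .get(tool)
            .or_else(|| self.thresholds.get("*"))
            .copied()
            .unwrap_or(usize::MAX);
        let window = self.window;
        self.guards
            .entry(tool.to_string())
            .or_insert_with(|| StandInGuard::new(threshold, window))
            .check(args)
    }
}
```

One behavioral difference to keep in mind: with a guard per tool, each window counts only that tool's calls, so calls to other tools no longer push old entries out of the window.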

A LoopDetected hook that runs a user-provided callback instead of returning an error would also help. Some agents want to respond to a detected loop by injecting a message into the conversation ("I notice I have searched for the same thing multiple times. Let me try a different approach.") rather than aborting. Right now, catching LoopDetected and injecting the message into the LLM context is the caller's job. A callback that fires on detection and can return a recovery message would make that pattern easier to implement consistently.
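
Until such a hook exists, the caller-side version can be sketched like this. LoopInfo is a hypothetical mirror of the three fields destructured from LoopDetected in the earlier example, and NextStep and on_check are illustrative names, not crate API.

```rust
/// Hypothetical mirror of the fields carried by `LoopDetected`.
#[derive(Debug)]
pub struct LoopInfo {
    pub tool: String,
    pub count: usize,
    pub window_size: usize,
}

/// What the agent loop should do after a check.
#[derive(Debug, PartialEq)]
pub enum NextStep {
    /// No loop detected: dispatch the tool as usual.
    RunTool,
    /// Loop detected: skip the tool and hand this text to the LLM as the tool result.
    InjectMessage(String),
}

/// Turns a check result into the next action for the agent loop.
pub fn on_check(result: Result<(), LoopInfo>) -> NextStep {
    match result {
        Ok(()) => NextStep::RunTool,
        Err(info) => NextStep::InjectMessage(format!(
            "Loop guard: '{}' was already called {} times with the same arguments \
             in the last {} calls. Try a different query or tool, or answer with \
             what you have.",
            info.tool, info.count, info.window_size
        )),
    }
}
```

Keeping the decision in one function makes the recovery wording consistent across agents, which is most of what a built-in callback would provide.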

Part of the Hermes Agent Challenge sprint. All crates shipped on crates.io.
