DEV Community

Joakim William Hauge

Preventing Recursive Tool Loops in LangChain Agents

One of the fastest ways for LangChain agents to become unstable in production has nothing to do with model quality.

It’s recursive tool loops.

A workflow starts normally:

  • search
  • retrieve
  • summarize

Then suddenly:

  • the same tool gets called repeatedly
  • retries compound
  • context grows
  • token usage spikes
  • execution drifts indefinitely

The agent technically remains “alive.”

Operationally, it stopped making progress a long time ago.

This article shows a simple way to detect and interrupt recursive tool loops in LangChain agents using TypeScript.


The Problem

A basic agent workflow often looks harmless:

```ts
const result = await agentExecutor.invoke({
  input: userPrompt
});
```

But production agents can drift into patterns like:



```txt
search_documents
→ search_documents
→ search_documents
→ search_documents
```

or:

```txt
search
→ summarize
→ retry
→ search
→ summarize
→ retry
```

This usually happens because:

  • the model fails to converge
  • tool outputs are ambiguous
  • retries reinforce uncertainty
  • the agent misinterprets partial progress

The result is:

runaway execution.


Why This Is Dangerous

Most AI workflows behave normally most of the time.
The problem comes from tail events:

  • recursive retries
  • unstable recovery behavior
  • escalating context windows
  • repeated tool invocation

A tiny percentage of unstable runs can consume a disproportionate amount of:

  • inference cost
  • latency
  • compute
  • operational attention

This is not just an observability issue.

It’s a runtime governance issue.

Basic Strategy

We want to:

  • track recent tool usage
  • detect repetition patterns
  • interrupt execution safely

before the workflow spirals.

The simplest version:



```txt
“If the same tool is called too many times consecutively, stop execution.”
```

Simple.
Effective.
Easy to implement.


Step 1 — Track Tool History

We’ll maintain lightweight runtime state:

```ts
type ExecutionState = {
  // Names of the tools called so far, in call order.
  toolHistory: string[];
};
```

Initialize it:



```ts
const state: ExecutionState = {
  toolHistory: []
};
```

Step 2 — Detect Recursive Patterns

Now create a helper:

```ts
// Returns true when the last `threshold` calls all used the same tool.
function detectRecursiveLoop(
  toolHistory: string[],
  threshold = 3
): boolean {
  // Not enough history yet to decide.
  if (toolHistory.length < threshold) {
    return false;
  }

  const recent = toolHistory.slice(-threshold);

  return recent.every(
    tool => tool === recent[0]
  );
}
```

This checks:



```txt
Did the same tool run 3 times in a row?
```
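
To make the check concrete, here is a small worked example. The helper is repeated so the snippet runs on its own, and the three histories are made-up sample data, not output from a real agent.

```ts
// Same helper as in Step 2, repeated so this snippet is self-contained.
function detectRecursiveLoop(toolHistory: string[], threshold = 3): boolean {
  if (toolHistory.length < threshold) {
    return false;
  }
  const recent = toolHistory.slice(-threshold);
  return recent.every(tool => tool === recent[0]);
}

// Hypothetical tool-call histories, for illustration only.
const converging = ["search_documents", "retrieve", "summarize"];
const stuck = ["retrieve", "search_documents", "search_documents", "search_documents"];
const tooShort = ["search_documents", "search_documents"];

console.log(detectRecursiveLoop(converging));  // false: last three calls differ
console.log(detectRecursiveLoop(stuck));       // true: last three calls match
console.log(detectRecursiveLoop(tooShort));    // false: fewer than three calls
console.log(detectRecursiveLoop(tooShort, 2)); // true: threshold lowered to 2
```

Note that the check only looks at tool names. Three legitimate calls to the same tool with different inputs would also trip it, so the threshold is worth tuning per tool.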

Step 3 — Wrap Tool Execution

Now intercept tool calls:

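```ts
// Wraps a tool invocation: records the call, then aborts if a loop is detected.
async function guardedToolCall<T>(
  toolName: string,
  execute: () => Promise<T>
): Promise<T> {
  state.toolHistory.push(toolName);

  // Check before executing so the repeated call never runs.
  if (detectRecursiveLoop(state.toolHistory)) {
    throw new Error(
      `Recursive loop detected for tool: ${toolName}`
    );
  }

  return execute();
}
```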

Step 4 — Use Inside LangChain Tools

Example:



```ts
const result = await guardedToolCall(
  "search_documents",
  async () => {
    return searchTool.invoke(query);
  }
);
```

That’s it.

Now your workflow can:

  • detect runaway repetition
  • interrupt unstable execution
  • prevent unnecessary cost escalation
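
One caveat: the `state` object from Step 1 is shared by every call to `guardedToolCall`, so if it lives at module level, history from one agent run can carry over into the next. A minimal sketch of one way around this, assuming one guard instance per run, is below. `createLoopGuard` and `RecursiveLoopError` are hypothetical names introduced for illustration; they are not part of LangChain.

```ts
// Hypothetical error type so callers can tell a guard trip from a tool failure.
class RecursiveLoopError extends Error {
  readonly toolName: string;

  constructor(toolName: string) {
    super(`Recursive loop detected for tool: ${toolName}`);
    this.name = "RecursiveLoopError";
    this.toolName = toolName;
  }
}

// Creates a guard with its own history, so one run cannot trip another.
function createLoopGuard(threshold = 3) {
  const toolHistory: string[] = [];

  // Records a call and throws if the last `threshold` calls used the same tool.
  function record(toolName: string): void {
    toolHistory.push(toolName);
    const recent = toolHistory.slice(-threshold);
    if (recent.length === threshold && recent.every(t => t === toolName)) {
      throw new RecursiveLoopError(toolName);
    }
  }

  // Same shape as guardedToolCall: check first, then run the tool.
  async function call<T>(toolName: string, execute: () => Promise<T>): Promise<T> {
    record(toolName);
    return execute();
  }

  return { record, call, history: (): readonly string[] => toolHistory };
}

// Usage: create one guard per agent run.
const guard = createLoopGuard();
guard.record("search_documents");
guard.record("search_documents");
try {
  guard.record("search_documents"); // third consecutive call trips the guard
} catch (err) {
  console.log((err as Error).message);
  // Recursive loop detected for tool: search_documents
}
```

A dedicated error type also lets the caller catch a guard trip separately from an ordinary tool failure and return a partial result instead of crashing the request.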

Why Simple Detection Works Surprisingly Well

A lot of teams initially assume they need:

  • anomaly detection
  • reinforcement learning
  • advanced telemetry pipelines

But simple operational heuristics already eliminate many expensive failures.

Especially:

  • recursive retries
  • retry storms
  • repeated tool churn

You do not need perfect intelligence initially.

You need:

bounded execution.


Production Improvements

The minimal approach above works surprisingly well, but production systems usually add:

  • semantic similarity detection
  • token velocity monitoring
  • execution depth limits
  • tool-call budgets
  • runtime ceilings
  • timeout policies
  • adaptive thresholds
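
Two of the items above, tool-call budgets and runtime ceilings, can be sketched in a few lines. This is an illustration under stated assumptions, not a LangChain feature: `RunLimits`, `createRunBudget`, and `charge` are hypothetical names, and the injectable `now` clock is added only to make the logic easy to test.

```ts
type RunLimits = {
  maxToolCalls: number; // total tool calls allowed in one run
  maxRuntimeMs: number; // wall-clock ceiling for one run
};

// Creates a per-run budget. `now` defaults to Date.now and can be replaced in tests.
function createRunBudget(limits: RunLimits, now: () => number = Date.now) {
  const startedAt = now();
  let toolCalls = 0;

  // Call once before each tool invocation; throws when a limit is exceeded.
  function charge(toolName: string): void {
    toolCalls += 1;
    if (toolCalls > limits.maxToolCalls) {
      throw new Error(
        `Tool-call budget exceeded (${limits.maxToolCalls}) at tool: ${toolName}`
      );
    }
    const elapsedMs = now() - startedAt;
    if (elapsedMs > limits.maxRuntimeMs) {
      throw new Error(
        `Runtime ceiling exceeded (${limits.maxRuntimeMs} ms) at tool: ${toolName}`
      );
    }
  }

  return { charge, used: (): number => toolCalls };
}

// Usage: charge the budget before every tool call in a run.
const budget = createRunBudget({ maxToolCalls: 20, maxRuntimeMs: 60000 });
budget.charge("search_documents");
console.log(budget.used()); // 1
```

Unlike the repetition check, a budget bounds the run even when the agent alternates between different tools, so the two guards complement each other.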

Example:

```txt
search
→ search
→ search
```

is easy to detect.

More advanced loops look like:



```txt
search
→ summarize
→ retry
→ search
→ summarize
→ retry
```

These require broader trajectory analysis.
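
As a first step toward that, a multi-tool cycle can be caught by checking whether the tail of the history is the same short block repeated. The sketch below is one possible approach, not a complete trajectory analysis; `detectCycle`, `maxPeriod`, and `repeats` are hypothetical names. It starts at period 2 because single-tool repetition is already covered by the Step 2 helper.

```ts
// Returns the length of a repeating block at the end of the history, or null.
// A block of length `period` counts as a cycle when the last `period * repeats`
// calls consist of that block repeated back to back.
function detectCycle(
  toolHistory: string[],
  maxPeriod = 4,
  repeats = 2
): number | null {
  for (let period = 2; period <= maxPeriod; period++) {
    const windowSize = period * repeats;
    if (toolHistory.length < windowSize) {
      break; // longer periods would need even more history
    }
    const recent = toolHistory.slice(-windowSize);
    // Every call must match the call at the same position in the first block.
    const isCycle = recent.every((tool, i) => tool === recent[i % period]);
    if (isCycle) {
      return period;
    }
  }
  return null;
}

// Hypothetical histories, for illustration only.
const looping = ["search", "summarize", "retry", "search", "summarize", "retry"];
const progressing = ["search", "retrieve", "summarize"];

console.log(detectCycle(looping));     // 3
console.log(detectCycle(progressing)); // null
```

This still compares tool names only. A cycle whose inputs change meaningfully on each pass may be real progress, which is where the semantic checks listed above come in.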


The Distributed Systems Parallel

Distributed systems eventually evolved:

  • retry limits
  • circuit breakers
  • bounded failure domains
  • timeout controls

because unconstrained retries became dangerous at scale.

Autonomous agent systems are beginning to encounter similar operational realities.

As agents become:

  • more autonomous
  • more persistent
  • more deeply integrated

runtime governance becomes increasingly important.


Final Thoughts

Most teams focus heavily on:

  • prompts
  • model quality
  • orchestration frameworks

But production AI systems also need:

  • bounded execution
  • runtime constraints
  • operational safeguards
  • economic stability

Because eventually:
the challenge is not just building autonomous agents.

It is building governable autonomous agents.
