LLM agents repeat themselves. It is a real problem. The model calls a search tool with the same query it called two turns ago. It hits a write API three times because the first two calls timed out on the client side but succeeded on the server. It runs an expensive computation it already ran and got a result for. These failures cost money, cause side effects, and are hard to notice until something breaks.
Three different tools solve three different versions of this problem. This post shows how they differ and how to compose them.
Hook
Here are three agent failure modes that look similar but require different fixes:
1. The agent calls search("open issues") on turn 4 and gets a result. It calls search("open issues") again on turn 9 and gets the same result. You paid twice for an identical API call.
2. The agent calls search five times in six turns with variations on "error logs" because it keeps not finding what it wants. No individual call is a duplicate. But the pattern is a loop, and the loop needs to be broken.
3. The agent calls create_ticket(title="Deploy failed", body="..."). The HTTP call returns a 500 after 30 seconds. The agent retries. The ticket was actually created on the first attempt. Now you have two tickets for the same failure.
These three scenarios need three different fixes. A result cache solves scenario 1. A loop guard solves scenario 2. An idempotency key solves scenario 3.
Main Code
import hashlib
import json
import time
from typing import Any

import anthropic
from tool_result_cache import ResultCache
from tool_loop_guard import LoopGuard, LoopDetectedError
from agentidemp import IdempotencyStore, IdemKeyExistsError

# Setup
cache = ResultCache(maxsize=256, ttl=300)  # 5-minute LRU cache
loop_guard = LoopGuard(window_size=10, max_repeats=3)  # block after 3 repeats in 10 turns
idemp_store = IdempotencyStore()  # in-process idempotency registry
client = anthropic.Anthropic()


def make_idem_key(tool_name: str, args: dict) -> str:
    """Stable key based on tool + args content."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def dispatch_tool(tool_name: str, args: dict, side_effects: bool = False) -> Any:
    """
    Route a tool call through all three dedup layers.

    - side_effects=False: cache applies (safe to return stored result)
    - side_effects=True: idempotency key applies (safe to send once)
    - Loop guard applies in all cases
    """
    # Layer 1: loop guard (always runs)
    loop_guard.record(tool_name, args)  # raises LoopDetectedError if threshold crossed

    if not side_effects:
        # Layer 2: result cache (read-only / idempotent tools)
        result = cache.get(tool_name, args)
        if result is not None:
            print(f"[cache hit] {tool_name}")
            return result
        # Execute and cache
        result = execute_tool(tool_name, args)
        cache.set(tool_name, args, result)
        return result
    else:
        # Layer 3: idempotency key (write tools, side-effectful calls)
        idem_key = make_idem_key(tool_name, args)
        try:
            with idemp_store.scope(idem_key):
                result = execute_tool(tool_name, args)
                idemp_store.record_result(idem_key, result)
                return result
        except IdemKeyExistsError:
            print(f"[idemp hit] {tool_name} already executed, returning stored result")
            return idemp_store.get_result(idem_key)


def execute_tool(tool_name: str, args: dict) -> Any:
    """Actual tool execution. Replace with your real implementations."""
    if tool_name == "search":
        return {"results": [f"Result for: {args['query']}"]}
    if tool_name == "create_ticket":
        return {"ticket_id": f"TICKET-{hash(args['title']) % 10000}"}
    if tool_name == "read_file":
        return {"content": f"Contents of {args['path']}"}
    return {"ok": True}


# Tool definitions for the LLM
tools = [
    {
        "name": "search",
        "description": "Search for information. Read-only.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket. Has side effects.",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["title", "body"],
        },
    },
    {
        "name": "read_file",
        "description": "Read a file from disk. Read-only.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]

SIDE_EFFECT_TOOLS = {"create_ticket", "send_email", "post_comment", "delete_record"}


def agent_loop(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # end_turn, max_tokens, and so on: there are no tool calls to run,
            # so return whatever text the model produced instead of looping.
            return "".join(b.text for b in response.content if b.type == "text")

        # Collect all tool calls from this response
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                has_side_effects = block.name in SIDE_EFFECT_TOOLS
                result = dispatch_tool(block.name, block.input, side_effects=has_side_effects)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
            except LoopDetectedError as e:
                # Report the loop as a tool error so the model can change course.
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Error: loop detected. {e}. Try a different approach.",
                    "is_error": True,
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})


if __name__ == "__main__":
    result = agent_loop(
        "Search for 'deploy errors', then search for 'deploy errors' again, "
        "then create a ticket titled 'Deploy fix needed'."
    )
    print(result)
What It Does NOT Do
This does not deduplicate across process restarts. The ResultCache, the LoopGuard, and the IdempotencyStore as configured here are all in-memory. A crash clears them. For persistent dedup across restarts, back the cache with Redis and the idempotency store with a database.
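To make the restart case concrete, here is a minimal sketch of a result cache that survives a process restart. It uses SQLite from the standard library as a stand-in for Redis so it runs without a server. The class name PersistentCache, the table layout, and the now parameter are assumptions made for illustration; none of them come from tool-result-cache.

```python
import json
import os
import sqlite3
import tempfile
import time
from typing import Any, Optional


class PersistentCache:
    """Hypothetical disk-backed cache with a TTL (not the tool-result-cache API)."""

    def __init__(self, path: str, ttl: float = 300.0) -> None:
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
        )
        self.db.commit()

    @staticmethod
    def _key(tool_name: str, args: dict) -> str:
        # Sorted keys so that argument order does not change the cache key.
        return json.dumps({"tool": tool_name, "args": args}, sort_keys=True)

    def get(self, tool_name: str, args: dict, now: Optional[float] = None) -> Any:
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, stored_at FROM results WHERE key = ?",
            (self._key(tool_name, args),),
        ).fetchone()
        if row is None or now - row[1] > self.ttl:
            return None  # miss, or entry older than the TTL
        return json.loads(row[0])

    def set(self, tool_name: str, args: dict, result: Any,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO results (key, value, stored_at) VALUES (?, ?, ?)",
            (self._key(tool_name, args), json.dumps(result), now),
        )
        self.db.commit()


# Write with one instance, then read with a fresh one, as after a restart.
path = os.path.join(tempfile.mkdtemp(), "cache.db")
before_restart = PersistentCache(path, ttl=60)
before_restart.set("search", {"query": "deploy errors"}, {"results": ["r1"]}, now=1000.0)
before_restart.db.close()

after_restart = PersistentCache(path, ttl=60)
hit = after_restart.get("search", {"query": "deploy errors"}, now=1030.0)
expired = after_restart.get("search", {"query": "deploy errors"}, now=1061.0)
```

Wall-clock time is used rather than a monotonic clock because a monotonic reading is not comparable across processes. Results must be JSON-serializable for this sketch to work.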
The idempotency key here is content-addressed: same tool name and same args produce the same key. This is wrong for tools where timing matters. If send_email(to="user@example.com", subject="Hello") should only send once per session but can send again tomorrow, the key needs a session scope or timestamp suffix.
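One way to add that scope is sketched below. The function name make_scoped_idem_key and its session_id, window_seconds, and now parameters are hypothetical; they extend the make_idem_key helper from this post and are not part of agentidemp.

```python
import hashlib
import json
import time
from typing import Optional


def make_scoped_idem_key(
    tool_name: str,
    args: dict,
    session_id: str,
    window_seconds: Optional[int] = None,
    now: Optional[float] = None,
) -> str:
    """Content hash plus a session scope and an optional time bucket."""
    payload = {"tool": tool_name, "args": args, "session": session_id}
    if window_seconds is not None:
        now = time.time() if now is None else now
        # Calls that fall in the same bucket share a key; a later bucket gets a new one.
        payload["bucket"] = int(now // window_seconds)
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()[:16]


email_args = {"to": "user@example.com", "subject": "Hello"}
same_session = make_scoped_idem_key("send_email", email_args, "session-a")
other_session = make_scoped_idem_key("send_email", email_args, "session-b")
today = make_scoped_idem_key("send_email", email_args, "session-a", 86400, now=100.0)
tomorrow = make_scoped_idem_key("send_email", email_args, "session-a", 86400, now=86500.0)
```

A fixed bucket has an edge case: two calls a second apart can land on either side of a bucket boundary and get different keys. If that matters, scope by session alone and expire keys in the store instead.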
The loop guard works within a single agent run. It counts calls in a sliding window of recent turns. It does not detect loops across separate runs started by the same user over a long period.
Design Reasoning
The three-layer approach respects the difference between read and write operations.
A read tool is safe to cache. The agent calls search("open issues") and gets back 10 results. Calling it again within the 5-minute TTL returns the same 10 results from the cache. The real API is not called again.
A write tool is not safe to cache. If you cache a create_ticket call and return the stored result on a retry, the caller does not know whether the ticket was actually created this time or last time. The idempotency key solves this without caching: it lets the call through once, records the result, and returns the stored result on any subsequent identical call. The external API may have its own idempotency key support (Stripe, GitHub, and others do). When it does, pass the key through.
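The "let it through once, record, replay" behavior can be sketched in a few lines. SimpleIdempotencyStore and run_once are hypothetical names for illustration; this is not how agentidemp is implemented, and it only covers the in-process half of the problem.

```python
import threading
from typing import Any, Callable, Dict, List


class SimpleIdempotencyStore:
    """Hypothetical in-memory store: one execution per key, replayed afterwards."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, fn: Callable[[], Any]) -> Any:
        # Holding the lock while fn runs serializes every call. Simple, not fast.
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay the recorded result
            result = fn()  # if fn raises, nothing is recorded
            self._results[key] = result
            return result


calls: List[str] = []


def create_ticket() -> dict:
    calls.append("create_ticket")  # stands in for the real side effect
    return {"ticket_id": "TICKET-1"}


store = SimpleIdempotencyStore()
first = store.run_once("abc123", create_ticket)
retry = store.run_once("abc123", create_ticket)
```

Note the limit: if fn raises after the server has already acted, as in the timeout scenario, nothing is recorded locally and a retry executes again. Only a key that the remote API itself honors closes that gap, which is why passing the key through matters.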
The loop guard is orthogonal to both. It does not care about results. It cares about call frequency. A tool that is called three times with different args in a short window is not a cache problem. It is a behavior problem. The guard surfaces it as an error the LLM can reason about and route around.
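A frequency-based guard can be sketched as a fixed-size window over recent calls. SlidingWindowGuard and LoopError are hypothetical names, and the choices here (the window counts calls rather than turns, calls are keyed by tool name only, the error fires on the call that exceeds max_repeats) are assumptions; tool-loop-guard may behave differently.

```python
from collections import deque
from typing import Deque


class LoopError(RuntimeError):
    """Raised when one tool dominates the recent call window."""


class SlidingWindowGuard:
    """Hypothetical guard: counts calls per tool name over the last N calls."""

    def __init__(self, window_size: int = 10, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        # deque(maxlen=...) drops the oldest call automatically.
        self.recent: Deque[str] = deque(maxlen=window_size)

    def record(self, tool_name: str) -> None:
        self.recent.append(tool_name)
        count = self.recent.count(tool_name)
        if count > self.max_repeats:
            raise LoopError(
                f"{tool_name} called {count} times in the last {len(self.recent)} calls"
            )


guard = SlidingWindowGuard(window_size=6, max_repeats=3)
for _ in range(3):
    guard.record("search")  # args ignored, so query variations still count
guard.record("read_file")

try:
    guard.record("search")  # fourth search inside the window
    loop_detected = False
except LoopError:
    loop_detected = True
```

Keying on the tool name alone is what lets the guard catch the "keeps trying variations" pattern, where no two calls share the same args.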
When This Applies / Does Not Apply
Use the result cache for any tool that is deterministic and read-only: database reads, API GETs, file reads, search queries. The TTL should match how stale the data can be. A search index that updates every hour can tolerate a 5-minute TTL. A live sensor feed cannot tolerate any cache.
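To show how the TTL and the size bound interact, here is a minimal LRU cache with expiry. TTLCache and its injectable clock parameter are assumptions made for this sketch, chosen so expiry can be demonstrated without sleeping; it is not the ResultCache implementation.

```python
import json
import time
from collections import OrderedDict
from typing import Any, Callable, Tuple


class TTLCache:
    """Hypothetical LRU cache whose entries expire after ttl seconds."""

    def __init__(self, maxsize: int = 256, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock
        self._entries: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    @staticmethod
    def _key(tool_name: str, args: dict) -> str:
        return json.dumps([tool_name, args], sort_keys=True)

    def get(self, tool_name: str, args: dict) -> Any:
        key = self._key(tool_name, args)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, tool_name: str, args: dict, value: Any) -> None:
        key = self._key(tool_name, args)
        self._entries[key] = (self.clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.maxsize:
            self._entries.popitem(last=False)  # evict least recently used


fake_now = [0.0]  # a fake clock the example can advance by hand
ttl_cache = TTLCache(maxsize=2, ttl=300.0, clock=lambda: fake_now[0])
ttl_cache.set("search", {"query": "open issues"}, ["issue-1"])

fake_now[0] = 299.0
fresh = ttl_cache.get("search", {"query": "open issues"})  # still inside the TTL

fake_now[0] = 301.0
stale = ttl_cache.get("search", {"query": "open issues"})  # past the TTL
```

A read does not refresh the stored timestamp here, so the TTL bounds staleness from the time of the real call, not from the last cache hit.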
Use the idempotency key for any tool that mutates state: writes, creates, sends, deletes. Any tool that appears in your SIDE_EFFECT_TOOLS set.
Use the loop guard everywhere. It catches behavior patterns that caching cannot prevent and that idempotency keys do not address.
Skip all three for agents that run once and discard state. The overhead is not worth it for a single-call pipeline.
Quick-Start Snippet
pip install tool-result-cache tool-loop-guard agentidemp-py
Minimal cache-only usage:
from tool_result_cache import ResultCache

cache = ResultCache(maxsize=128, ttl=60)


def my_tool(query: str) -> dict:
    cached = cache.get("my_tool", {"query": query})
    if cached is not None:  # an empty result is still a valid cache hit
        return cached
    result = expensive_api_call(query)  # placeholder for your own function
    cache.set("my_tool", {"query": query}, result)
    return result
Siblings Table
| Library | What it prevents | Applies to |
|---|---|---|
| tool-result-cache | Repeated computation of identical calls | Read-only tools |
| tool-loop-guard | Repeated call patterns within a session | All tools |
| agentidemp-py | Duplicate side effects on retry | Write tools |
| tool-call-cache | LRU cache variant with manual key control | Read-only tools |
| tool-side-effects-tag | Tag tools READ/WRITE/IDEMPOTENT for routing | All tools |
What's Next
The tool-side-effects-tag library lets you declare the side-effect class of each tool at registration time. That makes the side_effects routing in dispatch_tool automatic rather than manual. When a tool is tagged READ, the cache applies. When it is tagged WRITE, the idempotency store applies. When it is tagged IDEMPOTENT, both are skipped and the call goes through every time.
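The routing rule described above can be sketched with a plain enum and a lookup table. The Effect enum, the TOOL_EFFECTS mapping, the route function, and the set_status tool are all hypothetical; this is not the tool-side-effects-tag API, only an illustration of the three-way decision.

```python
from enum import Enum


class Effect(Enum):
    READ = "read"
    WRITE = "write"
    IDEMPOTENT = "idempotent"


# Hypothetical registration table: each tool declares its side-effect class once.
TOOL_EFFECTS = {
    "search": Effect.READ,
    "read_file": Effect.READ,
    "create_ticket": Effect.WRITE,
    "set_status": Effect.IDEMPOTENT,  # made-up tool, safe to repeat
}


def route(tool_name: str) -> str:
    """Return which dedup layer a call should pass through."""
    # Untagged tools default to WRITE, the conservative choice.
    effect = TOOL_EFFECTS.get(tool_name, Effect.WRITE)
    if effect is Effect.READ:
        return "cache"
    if effect is Effect.WRITE:
        return "idempotency_store"
    return "direct"  # IDEMPOTENT: skip both layers, call every time


routes = {name: route(name) for name in ["search", "create_ticket", "set_status", "unknown"]}
```

Defaulting unknown tools to WRITE trades a missed cache hit for never replaying a stale result on a tool that mutates state.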
The other gap is cross-session dedup. An agent that creates a ticket on behalf of a user should not create a second ticket if the user sends the same request from a different device ten minutes later. That requires a session-scoped idempotency key backed by a shared store. The libraries here give you the primitives. The session-scoping logic is yours to add.
All repos are at MukundaKatta on GitHub.