Mukunda Rao Katta

Posted on May 25

Tool Call Deduplication: Stop Your Agent from Doing the Same Thing Twice

#hermeschallenge #ai #python #agents

LLM agents repeat themselves, and it is a real problem. The model calls a search tool with the same query it called two turns ago. It hits a write API three times because the first two calls timed out on the client side but succeeded on the server. It reruns an expensive computation it already has a result for. These failures cost money, cause duplicate side effects, and are hard to notice until something breaks.

Three different tools solve three different versions of this problem. This post shows how they differ and how to compose them.

Hook

Here are three agent failure modes that look similar but require different fixes:

The agent calls search("open issues") on turn 4. Gets a result. Calls search("open issues") again on turn 9. Still gets the same result. You paid twice for an identical API call.
The agent calls search("error logs") five times in six turns because it keeps trying variations and not finding what it wants. No individual call is a duplicate. But the pattern is a loop, and the loop needs to be broken.
The agent calls create_ticket(title="Deploy failed", body="..."). The HTTP call returns a 500 after 30 seconds. The agent retries. The ticket was actually created on the first attempt. Now you have two duplicate tickets.

These three scenarios need three different fixes. A result cache solves scenario 1. A loop guard solves scenario 2. An idempotency key solves scenario 3.

Main Code

import hashlib
import json
import time
from typing import Any
import anthropic
from tool_result_cache import ResultCache
from tool_loop_guard import LoopGuard, LoopDetectedError
from agentidemp import IdempotencyStore, IdemKeyExistsError

# Setup
cache = ResultCache(maxsize=256, ttl=300)           # 5-minute LRU cache
loop_guard = LoopGuard(window_size=10, max_repeats=3)  # block after 3 repeats in 10 turns
idemp_store = IdempotencyStore()                    # in-process idempotency registry

client = anthropic.Anthropic()


def make_idem_key(tool_name: str, args: dict) -> str:
    """Stable key based on tool + args content."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def dispatch_tool(tool_name: str, args: dict, side_effects: bool = False) -> Any:
    """
    Route a tool call through all three dedup layers.

    - side_effects=False: cache applies (safe to return stored result)
    - side_effects=True: idempotency key applies (safe to send once)
    - Loop guard applies in all cases
    """

    # Layer 1: loop guard (always runs)
    loop_guard.record(tool_name, args)   # raises LoopDetectedError if threshold crossed

    if not side_effects:
        # Layer 2: result cache (read-only / idempotent tools)
        result = cache.get(tool_name, args)
        if result is not None:
            print(f"[cache hit] {tool_name}")
            return result

        # Execute and cache
        result = execute_tool(tool_name, args)
        cache.set(tool_name, args, result)
        return result

    else:
        # Layer 3: idempotency key (write tools, side-effectful calls)
        idem_key = make_idem_key(tool_name, args)
        try:
            with idemp_store.scope(idem_key):
                result = execute_tool(tool_name, args)
                idemp_store.record_result(idem_key, result)
                return result
        except IdemKeyExistsError:
            print(f"[idemp hit] {tool_name} already executed, returning stored result")
            return idemp_store.get_result(idem_key)


def execute_tool(tool_name: str, args: dict) -> Any:
    """Actual tool execution. Replace with your real implementations."""
    if tool_name == "search":
        return {"results": [f"Result for: {args['query']}"]}
    if tool_name == "create_ticket":
        return {"ticket_id": f"TICKET-{hash(args['title']) % 10000}"}
    if tool_name == "read_file":
        return {"content": f"Contents of {args['path']}"}
    return {"ok": True}


# Tool definitions for the LLM
tools = [
    {
        "name": "search",
        "description": "Search for information. Read-only.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket. Has side effects.",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["title", "body"],
        },
    },
    {
        "name": "read_file",
        "description": "Read a file from disk. Read-only.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]

SIDE_EFFECT_TOOLS = {"create_ticket", "send_email", "post_comment", "delete_record"}


def agent_loop(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

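        if response.stop_reason != "tool_use":
            # end_turn, max_tokens, etc.: there are no tool calls to answer. Without this
            # early return the loop would fall through and send an empty tool_result list.
            return "".join(block.text for block in response.content if block.type == "text")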

        # Collect all tool calls from this response
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue

            try:
                has_side_effects = block.name in SIDE_EFFECT_TOOLS
                result = dispatch_tool(block.name, block.input, side_effects=has_side_effects)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
            except LoopDetectedError as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Error: loop detected. {e}. Try a different approach.",
                    "is_error": True,
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})


if __name__ == "__main__":
    result = agent_loop(
        "Search for 'deploy errors', then search for 'deploy errors' again, "
        "then create a ticket titled 'Deploy fix needed'."
    )
    print(result)

What It Does NOT Do

This does not deduplicate across process restarts. The ResultCache, LoopGuard, and IdempotencyStore in this example are all in-memory, so a crash clears them. For persistent dedup across restarts, back the cache with Redis and the idempotency store with a database.
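As one illustration of a persistent idempotency store, here is a minimal sketch that uses Python's built-in sqlite3 in place of a production database. The class SqliteIdempotencyStore and its run_once method are hypothetical, not the agentidemp API. It relies on the PRIMARY KEY constraint so that only the first claim of a key succeeds, and pointing it at a file instead of ":memory:" lets the record survive a restart.

```python
import json
import sqlite3
from typing import Any, Callable


class SqliteIdempotencyStore:
    """Hypothetical persistent idempotency store (not the agentidemp API)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS idem ("
            " key TEXT PRIMARY KEY,"  # uniqueness enforces at-most-once
            " result TEXT)"           # NULL until the call finishes
        )
        self.conn.commit()

    def run_once(self, key: str, fn: Callable[[], Any]) -> Any:
        try:
            # Claim the key first; a second claim violates the PRIMARY KEY.
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute("INSERT INTO idem (key, result) VALUES (?, NULL)", (key,))
        except sqlite3.IntegrityError:
            row = self.conn.execute("SELECT result FROM idem WHERE key = ?", (key,)).fetchone()
            if row[0] is None:
                # Claimed but never finished: the outcome is unknown, so refuse
                # to re-execute instead of risking a duplicate side effect.
                raise RuntimeError(f"call for {key} was started but has no recorded result")
            return json.loads(row[0])  # replay the stored result

        result = fn()
        with self.conn:
            self.conn.execute(
                "UPDATE idem SET result = ? WHERE key = ?", (json.dumps(result), key)
            )
        return result
```

The choice to raise on a claimed-but-unfinished key is deliberate in this sketch: that is exactly the state left behind by a call that timed out, and replaying it blindly could duplicate the write.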

The idempotency key here is content-addressed: the same tool name and the same args produce the same key. That is the wrong scope for tools where timing matters. If send_email(to="user@example.com", subject="Hello") should only send once per session but can send again tomorrow, the key needs a session ID or a time bucket (such as the date) mixed in. A raw timestamp would not work, because it would make every key unique and disable dedup entirely.
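A minimal sketch of a scoped key, assuming the scope is a plain string the caller chooses (a session ID, or a day bucket). make_scoped_idem_key and daily_scope are hypothetical helpers that follow the same hashing scheme as make_idem_key in the main code.

```python
import hashlib
import json
from datetime import date


def make_scoped_idem_key(tool_name: str, args: dict, scope: str) -> str:
    """Hypothetical variant of make_idem_key that mixes a scope into the hash.

    A session id gives once-per-session semantics; a day bucket gives
    once-per-day semantics.
    """
    payload = json.dumps({"tool": tool_name, "args": args, "scope": scope}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def daily_scope(day: date) -> str:
    # One bucket per calendar day, so the same email may go out again tomorrow.
    return f"day:{day.isoformat()}"
```

Because the scope is part of the hashed payload, two calls collide only when tool, args, and scope all match; argument order does not matter thanks to sort_keys=True.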

The loop guard works within a single agent run. It counts calls in a sliding window of recent turns. It does not detect loops across separate runs started by the same user over a long period.

Design Reasoning

The three-layer approach respects the difference between read and write operations.

A read tool is safe to cache. The agent calls search("open issues") and gets back 10 results. Calling it again within the cache's 5-minute TTL returns the same 10 results from the cache, and the real API is not called again.
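To make the cache behavior concrete, here is a minimal LRU + TTL cache sketch built on collections.OrderedDict. TTLResultCache is a hypothetical stand-in with the same get/set shape as ResultCache in the main code; it is not the tool-result-cache implementation. The clock is injectable so expiry can be exercised without sleeping.

```python
import json
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLResultCache:
    """Hypothetical LRU + TTL cache for read-only tool results."""

    def __init__(self, maxsize: int = 256, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock
        self._data: "OrderedDict[str, tuple]" = OrderedDict()  # key -> (stored_at, value)

    @staticmethod
    def _key(tool_name: str, args: dict) -> str:
        # Canonical JSON, so {"a": 1, "b": 2} and {"b": 2, "a": 1} share a key.
        return json.dumps({"tool": tool_name, "args": args}, sort_keys=True)

    def get(self, tool_name: str, args: dict) -> Optional[Any]:
        key = self._key(tool_name, args)
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._data[key]  # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, tool_name: str, args: dict, value: Any) -> None:
        key = self._key(tool_name, args)
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
```

One consequence of returning None for a miss: a tool whose real result is None can never be served from this cache, which matches the `is not None` check in dispatch_tool.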

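A write tool is not safe to cache. If you cache a create_ticket call behind a TTL and return the stored result on a retry, the caller does not know whether the ticket was actually created this time or last time, and once the entry expires the next identical call creates a second ticket. The idempotency key handles this differently: it lets the call through once per key, records the result, and returns that stored result on any subsequent identical call, with no expiry tied to data freshness. A client-side store only knows about calls that finished, though. In scenario 3 the client saw a 500 while the server created the ticket, so no result was recorded and a retry still goes out. Only a key that the external API itself understands closes that gap. Some APIs have their own idempotency key support (Stripe, for example, accepts an Idempotency-Key header). When yours does, pass the key through.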
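To illustrate why passing the key through matters for scenario 3, here is a self-contained simulation. FakeTicketAPI is a hypothetical in-memory server that honors an idempotency key and, on the first request, creates the ticket but "loses" the response. create_ticket_with_retry is also hypothetical; it derives the key the same way make_idem_key does and reuses it on every retry. Real APIs differ in where the key goes (Stripe takes it as an HTTP header).

```python
import hashlib
import json


class FakeTicketAPI:
    """Hypothetical server that honors an idempotency key."""

    def __init__(self) -> None:
        self.tickets: list = []
        self.seen: dict = {}  # idempotency key -> original response
        self.drop_next_response = True  # simulate: work done, reply lost

    def create_ticket(self, title: str, body: str, idempotency_key: str) -> dict:
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]  # replay; no second ticket
        ticket = {"ticket_id": f"TICKET-{len(self.tickets) + 1}", "title": title}
        self.tickets.append(ticket)
        self.seen[idempotency_key] = ticket
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("ticket created server-side, but the client timed out")
        return ticket


def create_ticket_with_retry(api: FakeTicketAPI, args: dict, max_attempts: int = 3) -> dict:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # Derive the key once, outside the loop, so every attempt sends the same key.
    payload = json.dumps({"tool": "create_ticket", "args": args}, sort_keys=True)
    key = hashlib.sha256(payload.encode()).hexdigest()[:16]
    for attempt in range(1, max_attempts + 1):
        try:
            return api.create_ticket(args["title"], args["body"], idempotency_key=key)
        except TimeoutError:
            if attempt == max_attempts:
                raise
    raise RuntimeError("unreachable")
```

The first attempt creates TICKET-1 and times out; the retry carries the same key, so the server replays TICKET-1 instead of creating a duplicate. Without the key, the same retry loop would leave two tickets.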

The loop guard is orthogonal to both. It does not care about results. It cares about call frequency. A tool that is called three times with different args in a short window is not a cache problem. It is a behavior problem. The guard surfaces it as an error the LLM can reason about and route around.
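A minimal sketch of frequency-based loop detection, assuming a guard that counts calls per tool name (ignoring args) over the last N tool calls. SlidingWindowLoopGuard and LoopDetected are hypothetical names, not the tool-loop-guard API. A deque with maxlen drops the oldest call automatically as new ones arrive.

```python
from collections import deque


class LoopDetected(Exception):
    """Raised when one tool dominates the recent call window."""


class SlidingWindowLoopGuard:
    """Hypothetical guard: flags a tool used more than max_calls times
    within the last window_size tool calls, whatever its arguments."""

    def __init__(self, window_size: int = 10, max_calls: int = 3) -> None:
        self.max_calls = max_calls
        self.window: deque = deque(maxlen=window_size)  # oldest calls fall off

    def record(self, tool_name: str) -> None:
        self.window.append(tool_name)
        count = sum(1 for name in self.window if name == tool_name)
        if count > self.max_calls:
            raise LoopDetected(
                f"{tool_name} called {count} times in the last {len(self.window)} calls"
            )
```

Keying on the tool name alone is what catches scenario 2, where every query is slightly different. The blocked call stays in the window, so the tool remains blocked until older calls slide out, and the agent loop can return the exception text as an is_error tool result, as the main code does.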

When This Applies / Does Not Apply

Use the result cache for any tool that is deterministic and read-only: database reads, API GETs, file reads, search queries. The TTL should match how stale the data can be. A search index that updates every hour can tolerate a 5-minute TTL. A live sensor feed cannot tolerate any cache.
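One way to encode "the TTL should match how stale the data can be" is a per-tool freshness budget. This is a sketch under assumptions: TTL_SECONDS, cached_call, and the read_sensor tool are hypothetical, the numbers are illustrative, and a TTL of 0 means the tool is never cached.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical per-tool freshness budgets, in seconds. 0 means "never cache".
TTL_SECONDS: Dict[str, float] = {
    "search": 300,     # index refreshes hourly; 5 minutes of staleness is fine
    "read_file": 30,   # files may change while the agent works
    "read_sensor": 0,  # live feed: always hit the source
}

_cache: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)


def cached_call(tool_name: str, args: dict, fn: Callable[[], Any],
                now: Callable[[], float] = time.monotonic) -> Any:
    """Return a cached result younger than the tool's TTL, else call fn()."""
    ttl = TTL_SECONDS.get(tool_name, 0)  # tools without a budget are never cached
    if ttl <= 0:
        return fn()
    key = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    hit = _cache.get(key)
    if hit is not None and now() - hit[0] < ttl:
        return hit[1]
    value = fn()
    _cache[key] = (now(), value)
    return value
```

Defaulting unknown tools to a TTL of 0 keeps the failure mode safe: forgetting to list a tool costs an extra API call rather than serving stale data.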

Use the idempotency key for any tool that mutates state: writes, creates, sends, deletes. Every such tool must appear in your SIDE_EFFECT_TOOLS set, because dispatch_tool sends any tool missing from that set down the cache path.

Use the loop guard everywhere. It catches behavior patterns that caching cannot prevent and that idempotency keys do not address.

Skip all three for single-call pipelines that make one tool call and discard their state. With only one call there is nothing to deduplicate against, so the overhead buys nothing.

Quick-Start Snippet

pip install tool-result-cache tool-loop-guard agentidemp-py

Minimal cache-only usage:

from tool_result_cache import ResultCache

cache = ResultCache(maxsize=128, ttl=60)

def my_tool(query: str) -> dict:
    cached = cache.get("my_tool", {"query": query})
    if cached is not None:  # an empty dict is still a valid cached result
        return cached
    result = expensive_api_call(query)  # placeholder for your real API call
    cache.set("my_tool", {"query": query}, result)
    return result

Siblings Table

Library	What it prevents	Applies to
tool-result-cache	Repeated computation of identical calls	Read-only tools
tool-loop-guard	Repeated call patterns within a session	All tools
agentidemp-py	Duplicate side effects on retry	Write tools
tool-call-cache	LRU cache variant with manual key control	Read-only tools
tool-side-effects-tag	Tag tools READ/WRITE/IDEMPOTENT for routing	All tools

What's Next

The tool-side-effects-tag library lets you declare the side-effect class of each tool at registration time. That makes the side_effects routing in dispatch_tool automatic rather than manual. When a tool is tagged READ, the cache applies. When it is tagged WRITE, the idempotency store applies. When it is tagged IDEMPOTENT, both are skipped and the call goes through every time.
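A minimal sketch of tag-based routing, assuming a decorator that records each tool's side-effect class at registration time. SideEffect, tool, and route are hypothetical names, not the tool-side-effects-tag API, and set_ticket_status is an invented example of an idempotent write. route() only returns which layer would apply; the loop guard would still run on every route.

```python
from enum import Enum
from typing import Callable, Dict


class SideEffect(Enum):
    READ = "read"              # safe to cache
    WRITE = "write"            # run once per idempotency key
    IDEMPOTENT = "idempotent"  # safe to repeat: no cache, no key


_TAGS: Dict[str, SideEffect] = {}


def tool(effect: SideEffect) -> Callable[[Callable], Callable]:
    """Hypothetical registration decorator that records a tool's tag."""
    def register(fn: Callable) -> Callable:
        _TAGS[fn.__name__] = effect
        return fn
    return register


def route(tool_name: str) -> str:
    # Untagged tools default to WRITE: wrongly caching a write is worse
    # than wrongly putting an idempotency key on a read.
    effect = _TAGS.get(tool_name, SideEffect.WRITE)
    return {
        SideEffect.READ: "cache",
        SideEffect.WRITE: "idempotency_store",
        SideEffect.IDEMPOTENT: "direct",
    }[effect]


@tool(SideEffect.READ)
def search(query: str) -> dict:
    return {"results": [query]}


@tool(SideEffect.WRITE)
def create_ticket(title: str, body: str) -> dict:
    return {"ticket_id": "TICKET-1"}


@tool(SideEffect.IDEMPOTENT)
def set_ticket_status(ticket_id: str, status: str) -> dict:
    return {"ticket_id": ticket_id, "status": status}
```

The safe default is the main design choice here: in the manual version, a write tool left out of SIDE_EFFECT_TOOLS is silently cached, while this sketch treats anything untagged as a write.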

The other gap is cross-session dedup. An agent that creates a ticket on behalf of a user should not create a second ticket if the user sends the same request from a different device ten minutes later. A different device usually means a different session, so this requires an idempotency key scoped to the user and a time window, backed by a shared store. The libraries here give you the primitives. The scoping logic is yours to add.

All repos are at MukundaKatta on GitHub.

DEV Community