Per-Tool Call-Count Caps: Prevent Any Single Tool From Running Away

#hermeschallenge #ai #python #agents

Your agent has a web_search tool. During a long task, the model calls it 47 times. Your search API bill is $18.00 for a single agent run. There was no reason to search 47 times; the model looped.

You have a send_email tool. The model calls it 3 times in one run. Three emails went to the same user. That is a bug with real user impact.

tool-call-budgets enforces per-tool call-count caps so no single tool can be called more than N times per agent session.

The Shape of the Fix

from tool_call_budgets import ToolCallBudgets, BudgetExhausted

budgets = ToolCallBudgets({
    "web_search": 5,        # Max 5 searches per run
    "send_email": 1,        # Max 1 email per run
    "create_ticket": 3,     # Max 3 tickets per run
    "database_write": 10,   # Max 10 writes per run
})

# `tools` is assumed to be your own registry mapping tool names to callables.
def dispatch_tool(name: str, args: dict) -> dict:
    try:
        budgets.acquire(name)
    except BudgetExhausted as e:
        return {
            "error": f"Tool {name} budget exhausted ({e.limit} calls per session).",
            "calls_used": e.used,
            "limit": e.limit,
        }

    return tools[name](**args)

When web_search is called a 6th time, acquire() raises BudgetExhausted. The error is returned to the model so it knows it cannot search again. The model must find another approach.
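
To see the cap in action without calling a real API, the sketch below drives the dispatcher with a scripted run of seven web_search calls. `ToolCallBudgets` and `BudgetExhausted` here are compact stand-ins that mirror the listing in "Inside the Library" so the snippet runs on its own; `fake_search` and the `tools` registry are hypothetical.

```python
class BudgetExhausted(Exception):
    """Stand-in for the library's exception (same three fields)."""

    def __init__(self, tool, used, limit):
        super().__init__(f"Tool '{tool}' budget exhausted: {used}/{limit} calls used")
        self.tool, self.used, self.limit = tool, used, limit


class ToolCallBudgets:
    """Stand-in with the same acquire() semantics as the listing in this post."""

    def __init__(self, limits):
        self._limits = dict(limits)
        self._counts = {}

    def acquire(self, tool):
        if tool not in self._limits:
            return -1  # no limit configured for this tool
        used = self._counts.get(tool, 0)
        if used >= self._limits[tool]:
            raise BudgetExhausted(tool, used, self._limits[tool])
        self._counts[tool] = used + 1
        return self._limits[tool] - used - 1


def fake_search(query):
    # Hypothetical tool: a real one would call a search API here.
    return {"results": [f"result for {query}"]}


tools = {"web_search": fake_search}
budgets = ToolCallBudgets({"web_search": 5})


def dispatch_tool(name, args):
    try:
        budgets.acquire(name)
    except BudgetExhausted as e:
        return {
            "error": f"Tool {name} budget exhausted ({e.limit} calls per session).",
            "calls_used": e.used,
            "limit": e.limit,
        }
    return tools[name](**args)


# Simulate a model that keeps searching: calls 1-5 succeed, 6 and 7 are refused.
outcomes = [dispatch_tool("web_search", {"query": f"q{i}"}) for i in range(1, 8)]
refused = [i for i, out in enumerate(outcomes, start=1) if "error" in out]
print(refused)  # [6, 7]
```

The refused calls never reach `fake_search`, so the search API is hit at most five times no matter how long the model keeps trying.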

What It Does NOT Do

tool-call-budgets does not rate limit by time. It limits by count within a session. If you need rate limiting over time (e.g., 10 searches per minute), use agent-rate-fence, which provides sliding-window limits.

It does not enforce budgets across sessions. One ToolCallBudgets instance, one session. Create a new instance per agent run. Cross-session limits (e.g., one email per user per day) require external state (Redis, database).
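
A cross-session limit such as "one email per user per day" could be sketched as follows. This is not part of tool-call-budgets: it uses the standard-library `sqlite3` module as a stand-in for Redis or a production database, and the table name `daily_calls` and the functions `open_store` and `try_acquire_daily` are hypothetical.

```python
import sqlite3
from datetime import date


def open_store(path=":memory:"):
    """Open (or create) the table that holds per-user, per-day call counts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_calls ("
        " user_id TEXT, tool TEXT, day TEXT, used INTEGER NOT NULL,"
        " PRIMARY KEY (user_id, tool, day))"
    )
    return conn


def try_acquire_daily(conn, user_id, tool, limit, today=None):
    """Return True and count the call if the user is under today's limit."""
    day = (today or date.today()).isoformat()
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT used FROM daily_calls WHERE user_id=? AND tool=? AND day=?",
            (user_id, tool, day),
        ).fetchone()
        used = row[0] if row else 0
        if used >= limit:
            return False
        conn.execute(
            "INSERT OR REPLACE INTO daily_calls (user_id, tool, day, used)"
            " VALUES (?, ?, ?, ?)",
            (user_id, tool, day, used + 1),
        )
        return True


conn = open_store()
day = date(2024, 1, 1)
print(try_acquire_daily(conn, "user-42", "send_email", limit=1, today=day))  # True
print(try_acquire_daily(conn, "user-42", "send_email", limit=1, today=day))  # False
```

The read-then-write here is fine for a single process. With several workers sharing the store you would want an atomic increment or an explicit write lock instead.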

It does not differentiate between tool call success and failure. Every call to acquire() counts against the budget, whether the tool succeeds or fails. If you want to count only successful calls, gate the call with is_available() and call acquire() after the tool completes. Calling acquire() alone after the tool would let the over-limit call run before the exception is raised.
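
The success-only pattern can be sketched as a small helper. `call_counting_successes` is a hypothetical name, and `_Budgets` is a minimal stand-in exposing only is_available() and acquire() (with the semantics shown in the library listing) so the snippet runs on its own.

```python
class _Budgets:
    """Minimal stand-in exposing is_available() and acquire() like the library."""

    def __init__(self, limits):
        self._limits, self._counts = dict(limits), {}

    def is_available(self, tool):
        return tool not in self._limits or self._counts.get(tool, 0) < self._limits[tool]

    def acquire(self, tool):
        if tool in self._limits:
            if not self.is_available(tool):
                raise RuntimeError(f"{tool} budget exhausted")
            self._counts[tool] = self._counts.get(tool, 0) + 1


def call_counting_successes(budgets, name, fn, **args):
    """Run fn(**args); charge the budget only if fn returns without raising."""
    # Gate first: otherwise the over-limit call would run before acquire() fails.
    if not budgets.is_available(name):
        return {"error": f"Tool {name} budget exhausted."}
    try:
        result = fn(**args)
    except Exception as exc:
        # Failed calls are reported to the model but not charged.
        return {"error": f"{name} failed: {exc}"}
    budgets.acquire(name)
    return result


attempts = []


def flaky_send(to):
    # Hypothetical tool that fails on its first attempt only.
    attempts.append(to)
    if len(attempts) == 1:
        raise ConnectionError("SMTP timeout")
    return {"sent_to": to}


budgets = _Budgets({"send_email": 1})
first = call_counting_successes(budgets, "send_email", flaky_send, to="a@example.com")
second = call_counting_successes(budgets, "send_email", flaky_send, to="a@example.com")
third = call_counting_successes(budgets, "send_email", flaky_send, to="a@example.com")
```

Here the first attempt fails and is not charged, the second succeeds and uses the only slot, and the third is refused before `flaky_send` runs. The trade-off is that failed calls are no longer bounded, so a tool that keeps failing can still loop.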

Inside the Library

The implementation is minimal:

from dataclasses import dataclass

@dataclass
class BudgetExhausted(Exception):
    tool: str
    used: int
    limit: int

    def __str__(self):
        return f"Tool '{self.tool}' budget exhausted: {self.used}/{self.limit} calls used"

class ToolCallBudgets:
    def __init__(self, limits: dict[str, int]):
        self._limits = dict(limits)
        self._counts: dict[str, int] = {}

    def acquire(self, tool: str) -> int:
        """Consume one call from the tool's budget. Returns remaining calls, or -1 if the tool has no limit."""
        if tool not in self._limits:
            return -1  # Unlimited tools pass through

        limit = self._limits[tool]
        used = self._counts.get(tool, 0)

        if used >= limit:
            raise BudgetExhausted(tool=tool, used=used, limit=limit)

        self._counts[tool] = used + 1
        return limit - (used + 1)  # remaining calls

    def remaining(self, tool: str) -> int | None:
        """Returns remaining calls for a tool, or None if unlimited."""
        if tool not in self._limits:
            return None
        return self._limits[tool] - self._counts.get(tool, 0)

    def usage(self) -> dict[str, dict]:
        return {
            tool: {
                "used": self._counts.get(tool, 0),
                "limit": limit,
                "remaining": limit - self._counts.get(tool, 0),
            }
            for tool, limit in self._limits.items()
        }

    def reset(self) -> None:
        self._counts.clear()

    def is_available(self, tool: str) -> bool:
        if tool not in self._limits:
            return True
        return self._counts.get(tool, 0) < self._limits[tool]

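Tools not in the limits dict are unrestricted: they pass through acquire() without counting. Note that acquire() returns -1 for these tools while remaining() returns None, so check for the right sentinel depending on which method you call. This means you can add budgets incrementally: start with budgets on your most expensive or side-effect-prone tools and leave others unrestricted.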

The remaining() method lets you inject a hint into the prompt:

if (r := budgets.remaining("web_search")) is not None and r <= 2:
    system_prompt_addition = f"NOTE: You have {r} web searches remaining. Use them carefully."
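
The same idea extends to every budgeted tool at once by reading the dict that usage() returns. The sketch below is one way to do it; `budget_hints` and its `threshold` parameter are hypothetical, and the function only assumes the `used` / `limit` / `remaining` shape shown in the usage() listing.

```python
def budget_hints(usage: dict[str, dict], threshold: int = 2) -> str:
    """Build a prompt note for each budgeted tool with `threshold` or fewer calls left."""
    lines = []
    for tool, info in sorted(usage.items()):
        remaining = info["remaining"]
        if remaining == 0:
            lines.append(f"- {tool}: budget exhausted, do not call it again.")
        elif remaining <= threshold:
            lines.append(f"- {tool}: {remaining} of {info['limit']} calls remaining.")
    if not lines:
        return ""  # nothing is running low, so add nothing to the prompt
    return "NOTE: Tool budgets are running low.\n" + "\n".join(lines)


# Sample data in the shape returned by usage(); in real code pass budgets.usage().
sample_usage = {
    "web_search": {"used": 4, "limit": 5, "remaining": 1},
    "send_email": {"used": 1, "limit": 1, "remaining": 0},
    "database_write": {"used": 2, "limit": 10, "remaining": 8},
}
print(budget_hints(sample_usage))
```

Returning an empty string when nothing is low keeps the prompt unchanged for most turns, so the note only appears when it is actionable.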

When to Use It

Use it for any tool with a cost per call (search APIs, external data APIs, AI vision endpoints). A per-session cap converts an unbounded cost risk into a predictable maximum.
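
As a worked example of that maximum, the sketch below multiplies each limit by a per-call price and sums the result. `max_session_cost` is a hypothetical helper, the flat per-call pricing is an assumption, and the search price is simply the opening example's $18.00 divided by its 47 calls.

```python
def max_session_cost(limits: dict[str, int], price_per_call: dict[str, float]) -> float:
    """Upper bound on per-session spend: sum of limit x price over priced tools."""
    # Tools without a listed price are treated as free.
    return sum(limit * price_per_call.get(tool, 0.0) for tool, limit in limits.items())


# Illustrative price derived from the opening example: $18.00 over 47 searches.
search_price = 18.00 / 47
limits = {"web_search": 5, "send_email": 1}
bound = max_session_cost(limits, {"web_search": search_price})
print(f"${bound:.2f}")  # $1.91
```

With a cap of 5, the same per-call price that produced an $18.00 run now tops out at about $1.91 per session.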

Use it for tools with side effects that should not repeat. One email per run. One payment per run. One ticket creation per run. The budget enforces the invariant.

Use it as a soft guardrail alongside tool-call-dedup. Dedup blocks exact-duplicate calls. Budgets block any call past the limit, even when the arguments differ. Together they prevent cycles and runaway costs.

Skip it for tools that legitimately need many calls. A list_next_page tool or a read_file tool may validly be called many times in a single run. Setting a budget on these would break legitimate behavior.

Install

pip install git+https://github.com/MukundaKatta/tool-call-budgets

# Or from PyPI
pip install tool-call-budgets

Then wire the budgets into your tool-call handler:

from tool_call_budgets import ToolCallBudgets, BudgetExhausted

# Create per-run budgets
budgets = ToolCallBudgets({
    "web_search": 5,
    "get_page_content": 10,
    "send_email": 1,
    "send_sms": 1,
    "create_order": 1,
    "charge_payment": 1,
})

def handle_tool_call(block) -> dict:
    tool_name = block.name
    tool_input = block.input

    # acquire() both checks and counts, so a separate remaining() pre-check is not needed
    try:
        remaining = budgets.acquire(tool_name)  # -1 means the tool has no budget
    except BudgetExhausted as e:
        return {"error": str(e), "usage": budgets.usage()}

    # execute_tool is your own executor; it is not part of the library
    result = execute_tool(tool_name, tool_input)

    # Optionally warn the model when 0 or 1 calls are left
    if 0 <= remaining <= 1 and isinstance(result, dict):
        result["_budget_warning"] = f"Only {remaining} calls left for {tool_name}"

    return result

Sibling Libraries

Library	What it solves
`tool-call-dedup`	Block exact-duplicate calls within a session
`tool-loop-guard`	Sliding-window repeated-call detector
`agent-rate-fence`	Per-key time-window rate limiting
`tool-side-effects-tag`	Tag tools as IDEMPOTENT/DESTRUCTIVE to inform budget decisions
`agent-step-log`	Log call counts for post-run analysis

The call control stack: tool-call-budgets for per-session count caps, tool-call-dedup for exact duplicate prevention, tool-loop-guard for time-window cycle detection, agent-rate-fence for per-key rate limits.

What's Next

Budget refresh: budgets.refresh("web_search", new_limit=3) that resets the count for one tool, allowing a mid-run increase. Useful for multi-phase tasks where the first phase uses the normal budget and a second phase gets additional allowance.

Warning hooks: budgets.on_near_exhaustion(callback, threshold=0.2) that fires when a tool's budget drops to 20% remaining. Lets the caller inject a warning into the next prompt without polling remaining() on every call.

Budget inheritance: budgets.derive(overrides={"web_search": 10}) that creates a new ToolCallBudgets instance with the same limits except where overridden. Useful for creating a "premium tier" budget from a default budget without duplicating all the entries.

Built as part of the agent-stack family: composable Python primitives for production LLM agents.