Composable Budget Caps for Agent Runs: Enforce Cost Limits at Multiple Levels


You have a per-request budget of $0.10. You have a per-user daily budget of $2.00. You have a total system budget of $500/day. These are three different constraints. Each one should trigger a different action: abort the current run, reject new requests for this user, alert the on-call engineer.

Keeping all three checks in your agent loop means a tangle of conditionals. And when the product team adds a per-team budget next month, you are back in the loop editing code.

agent-budget-coordinator composes multiple budget caps into one check() call.

The Shape of the Fix

from agent_budget_coordinator import BudgetCoordinator, BudgetExceeded
from token_budget_py import BudgetPool

coordinator = BudgetCoordinator()

# Per-request cap
coordinator.add("per_request", lambda usd: usd >= 0.10)

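# Per-user daily pool (token_budget_py manages the pool).
# try_reserve() reserves on success, so a passing check also consumes budget.
# A real service would keep one pool per user; a single pool is shown for brevity.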
user_pool = BudgetPool(limit_usd=2.00)
coordinator.add("per_user_daily", lambda usd: not user_pool.try_reserve(usd))

# System-wide daily pool
system_pool = BudgetPool(limit_usd=500.00)
coordinator.add("system_daily", lambda usd: not system_pool.try_reserve(usd))

def before_llm_call(run_id: str, estimated_usd: float) -> None:
    try:
        coordinator.check(estimated_usd, context={"run_id": run_id})
    except BudgetExceeded as e:
        raise RuntimeError(f"Budget exceeded [{e.budget_name}]: {e.detail}") from e

Three budget constraints, one check() call. When any one of them triggers, BudgetExceeded is raised with the name of the budget that was exceeded and optional detail. The agent loop handles the exception once.

What It Does NOT Do

agent-budget-coordinator does not track spend itself. It evaluates conditions. You pass a dollar amount to check() (an estimate for the next call, as in the example above, or a running total), and each condition decides whether that amount trips its limit. The tracking (how much has been spent on this run, this user, this system) lives in your pools or counters. The coordinator is a routing layer, not a ledger.

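It does not handle async budget checks natively. The conditions are sync callables. For async checks (querying Redis, a database row), you can wrap the coroutine with asyncio.run() inside a sync condition, but only when no event loop is already running in that thread: asyncio.run() raises RuntimeError when called from inside a running loop, which rules it out for checks invoked from async agent code. An AsyncBudgetCoordinator that accepts async callables and evaluates them with asyncio.gather() is planned (see What's Next).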
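For checks that need a network round-trip, concurrent evaluation could look roughly like the sketch below. It is not the library's API: `check_async`, the local `BudgetExceeded` stand-in, and the two simulated lookups are hypothetical names introduced for illustration. The sketch assumes the same fail-closed rule as the sync coordinator, so a condition that raises is reported as exceeded.

```python
import asyncio
from typing import Awaitable, Callable, Optional

AsyncCondition = Callable[[float], Awaitable[bool]]


class BudgetExceeded(Exception):
    """Local stand-in (assumption) mirroring the attributes used in this post."""

    def __init__(self, budget_name: str, detail: str = "", context: Optional[dict] = None):
        super().__init__(f"{budget_name}: {detail}")
        self.budget_name = budget_name
        self.detail = detail
        self.context = context


async def check_async(
    checks: list[tuple[str, AsyncCondition]],
    usd: float,
    context: Optional[dict] = None,
) -> None:
    """Run all conditions concurrently, then raise for the first failure in registration order."""
    results = await asyncio.gather(
        *(condition(usd) for _, condition in checks),
        return_exceptions=True,  # collect errors as results instead of propagating the first one
    )
    for (name, _), result in zip(checks, results):
        if isinstance(result, BaseException):
            # Fail closed: a check that errors counts as exceeded.
            raise BudgetExceeded(name, f"Check raised: {result}", context)
        if result:
            raise BudgetExceeded(name, f"Limit exceeded at ${usd:.4f}", context)


async def user_daily_check(usd: float) -> bool:
    await asyncio.sleep(0)  # placeholder for a Redis or database lookup
    return False  # False means "not exceeded"


async def system_daily_check(usd: float) -> bool:
    await asyncio.sleep(0)  # placeholder for a Redis or database lookup
    return True  # True means "exceeded"
```

Inside async agent code you would `await check_async(...)` directly. Note that gather() runs every condition, so there is no short-circuit: conditions with side effects all execute even when an earlier one would have tripped.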

It does not enforce a specific response to budget exhaustion. When BudgetExceeded is raised, you decide what to do: abort the run, return a cached partial result, queue the request for later, notify the user. The coordinator raises the exception; your code decides the recovery.

Inside the Library

The coordinator is a list of (name, callable) pairs:

from typing import Callable

class BudgetCoordinator:
    def __init__(self):
        self._checks: list[tuple[str, Callable[[float], bool]]] = []

    def add(self, name: str, condition: Callable[[float], bool]) -> None:
        self._checks.append((name, condition))

    def check(self, usd: float, context: dict | None = None) -> None:
        for name, condition in self._checks:
            try:
                exceeded = condition(usd)
            except Exception as e:
                # A failing check is treated as exceeded (fail closed);
                # "from e" keeps the original error as __cause__ for debugging
                raise BudgetExceeded(
                    budget_name=name,
                    detail=f"Check raised: {e}",
                    context=context,
                ) from e

            if exceeded:
                raise BudgetExceeded(
                    budget_name=name,
                    detail=f"Limit exceeded at ${usd:.4f}",
                    context=context,
                )

Fail-closed semantics: if a budget check function raises an exception, the coordinator treats it as "exceeded." A broken budget check does not silently allow unlimited spending.
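To see the fail-closed rule in action, here is a self-contained sketch. The `BudgetExceeded` class and the compact coordinator are local stand-ins reconstructed from the listing above so the example runs without the package installed; the exception's constructor shape is an assumption, and `flaky_store_check` is a hypothetical condition.

```python
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Assumed shape, inferred from the attributes used in this post."""

    def __init__(self, budget_name: str, detail: str = "", context: Optional[dict] = None):
        super().__init__(f"{budget_name}: {detail}")
        self.budget_name = budget_name
        self.detail = detail
        self.context = context


class BudgetCoordinator:
    """Compact stand-in mirroring the listing above; not the packaged class."""

    def __init__(self) -> None:
        self._checks: list[tuple[str, Callable[[float], bool]]] = []

    def add(self, name: str, condition: Callable[[float], bool]) -> None:
        self._checks.append((name, condition))

    def check(self, usd: float, context: Optional[dict] = None) -> None:
        for name, condition in self._checks:
            try:
                exceeded = condition(usd)
            except Exception as e:
                # Fail closed: an erroring check is reported as exceeded.
                raise BudgetExceeded(name, f"Check raised: {e}", context) from e
            if exceeded:
                raise BudgetExceeded(name, f"Limit exceeded at ${usd:.4f}", context)


def flaky_store_check(usd: float) -> bool:
    # Hypothetical condition whose backing counter store is unreachable.
    raise ConnectionError("counter store unreachable")


coordinator = BudgetCoordinator()
coordinator.add("per_user_daily", flaky_store_check)

try:
    coordinator.check(0.01, context={"run_id": "run-1"})
except BudgetExceeded as e:
    outcome = (e.budget_name, e.detail)
else:
    outcome = None

print(outcome)
```

The caller sees an ordinary BudgetExceeded naming the broken check, so the same handler that rejects over-budget runs also stops spending while the budget backend is down.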

The conditions can close over any state. They are plain callables: lambdas, bound methods, closures over counters or pools:

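# Closure over a running counter.
# Each check() call adds its amount to the total, so call check() once per
# LLM call with that call's cost, not with a cumulative figure.
run_usd = {"total": 0.0}

def update_and_check(usd: float) -> bool:
    run_usd["total"] += usd
    return run_usd["total"] >= 0.10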

coordinator.add("per_request", update_and_check)

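Priority: checks are evaluated in the order they were added. The first one to trigger raises immediately without evaluating the remaining checks. Order your checks from cheapest-to-evaluate to most-expensive. Order also matters for conditions with side effects: a condition that calls try_reserve() keeps its reservation even if a later check then raises, so put side-effect-free checks first and release or reconcile reservations when a run is rejected.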
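Because evaluation stops at the first trigger, a reservation made by an earlier condition is not undone when a later one refuses. One possible way to handle that is to reserve across pools in order and roll back on failure, sketched below. `SimplePool`, its `release()` method, and `reserve_all` are hypothetical stand-ins for illustration; they are not the token_budget_py API, and this post does not confirm whether BudgetPool offers a release call.

```python
from typing import Optional


class SimplePool:
    """Hypothetical single-threaded pool; not the token_budget_py BudgetPool."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.reserved_usd = 0.0

    def try_reserve(self, usd: float) -> bool:
        # Refuse if the reservation would push the pool past its limit.
        if self.reserved_usd + usd > self.limit_usd:
            return False
        self.reserved_usd += usd
        return True

    def release(self, usd: float) -> None:
        # Give back a previous reservation, never dropping below zero.
        self.reserved_usd = max(0.0, self.reserved_usd - usd)


def reserve_all(pools: list[tuple[str, SimplePool]], usd: float) -> Optional[str]:
    """Reserve from every pool in order; on a refusal, undo earlier reservations.

    Returns the name of the pool that refused, or None if all reserved.
    """
    reserved: list[SimplePool] = []
    for name, pool in pools:
        if not pool.try_reserve(usd):
            for earlier in reserved:
                earlier.release(usd)
            return name
        reserved.append(pool)
    return None


user_pool = SimplePool(limit_usd=2.00)
system_pool = SimplePool(limit_usd=0.05)  # deliberately tiny so it refuses

refused = reserve_all(
    [("per_user_daily", user_pool), ("system_daily", system_pool)],
    0.10,
)
```

Here the system pool refuses, and the user pool's reservation is released, so a rejected run does not eat into the user's daily allowance.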

When to Use It

Use it when you have more than one budget constraint and want to centralize the enforcement. Two constraints is the threshold where a coordinator pays off: it eliminates the if-else chain and makes adding a third constraint a one-liner.

Use it for multi-tenant systems. Per-request, per-user, per-team, and per-system-daily are the four natural budget levels, and the coordinator unifies them. The budget_name on the exception tells you which level triggered, so you can log and respond appropriately.

Use it with token_budget_py for the actual pool management. token_budget_py provides thread-safe atomic reservation with try_reserve(). The coordinator provides the compositional layer on top.

Skip it for simple single-budget scenarios. If you have one budget check, just write the if statement. The coordinator adds value when the number of checks grows.

Install

pip install git+https://github.com/MukundaKatta/agent-budget-coordinator

# Or from PyPI
pip install agent-budget-coordinator

from agent_budget_coordinator import BudgetCoordinator, BudgetExceeded

coordinator = BudgetCoordinator()

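# Add checks in priority order.
# user_daily_usd(), system_daily_usd(), estimate_cost(), alert_oncall(), and
# _run_agent_loop() are assumed to be your own application helpers.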
coordinator.add("request_hard_limit", lambda usd: usd >= 0.50)
coordinator.add("user_soft_limit", lambda usd: user_daily_usd() >= 2.00)
coordinator.add("system_daily", lambda usd: system_daily_usd() >= 500.00)

async def run_agent(user_id: str, task: str) -> str:
    estimated = estimate_cost(task)

    try:
        coordinator.check(estimated, context={"user_id": user_id})
    except BudgetExceeded as e:
        if e.budget_name == "request_hard_limit":
            return "Task too large. Please break it into smaller parts."
        elif e.budget_name == "user_soft_limit":
            return "Daily limit reached. Resets at midnight UTC."
        else:
            # System-level — alert on-call
            alert_oncall(f"System budget exceeded: {e.detail}")
            return "Service temporarily unavailable."

    return await _run_agent_loop(user_id, task)

Sibling Libraries

Library	What it solves
`token-budget-py`	Thread-safe atomic USD/token pool with try_reserve()
`llm-cost-cap`	Pre-flight USD cost gate for single requests
`llm-budget-window`	Time-windowed (hourly/daily) token and USD budget
`llm-stop-conditions`	Composable stop conditions including MaxUsd
`agent-deadline`	Cooperative wall-clock time deadline

The budget stack: token-budget-py for pool accounting, llm-budget-window for time-window tracking, agent-budget-coordinator for composing multiple caps into one check point.

What's Next

Named budget groups: tag checks with a scope (request, user, system) and raise with the scope attached. Lets callers route the exception without string-matching the budget name.
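As a rough idea of what scope-based routing might look like, the sketch below tags each check with an enum and routes on it. Everything here is hypothetical: `Scope`, `ScopedBudgetExceeded`, `check_scoped`, and `respond` are illustrative names, not a preview of the eventual API, and the fail-closed handling is omitted for brevity.

```python
from enum import Enum
from typing import Callable, Optional


class Scope(Enum):
    REQUEST = "request"
    USER = "user"
    SYSTEM = "system"


class ScopedBudgetExceeded(Exception):
    """Hypothetical exception carrying the scope alongside the budget name."""

    def __init__(self, budget_name: str, scope: Scope, detail: str = "") -> None:
        super().__init__(f"[{scope.value}] {budget_name}: {detail}")
        self.budget_name = budget_name
        self.scope = scope
        self.detail = detail


ScopedCheck = tuple[str, Scope, Callable[[float], bool]]


def check_scoped(checks: list[ScopedCheck], usd: float) -> None:
    """Evaluate checks in order and raise with the scope of the first one that trips."""
    for name, scope, condition in checks:
        if condition(usd):
            raise ScopedBudgetExceeded(name, scope, f"Limit exceeded at ${usd:.4f}")


# Callers route on the scope, so renaming a budget does not break handlers.
RESPONSES = {
    Scope.REQUEST: "Task too large. Please break it into smaller parts.",
    Scope.USER: "Daily limit reached. Resets at midnight UTC.",
    Scope.SYSTEM: "Service temporarily unavailable.",
}


def respond(checks: list[ScopedCheck], usd: float) -> Optional[str]:
    """Return a user-facing message if any budget trips, else None."""
    try:
        check_scoped(checks, usd)
    except ScopedBudgetExceeded as e:
        return RESPONSES[e.scope]
    return None
```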

Check metadata: each condition can return a BudgetStatus instead of a bool, including the current value and the limit. Useful for building a dashboard endpoint that shows how close each budget is to its limit.
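A minimal sketch of how such a status object could feed a dashboard, under stated assumptions: the field names on `BudgetStatus` and the `snapshot` helper are hypothetical, chosen only to illustrate the idea, and the `>=` comparison follows the convention used in the examples in this post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BudgetStatus:
    """Hypothetical return type for a condition; field names are assumptions."""

    current_usd: float
    limit_usd: float

    @property
    def exceeded(self) -> bool:
        return self.current_usd >= self.limit_usd

    @property
    def remaining_usd(self) -> float:
        return max(0.0, self.limit_usd - self.current_usd)


def snapshot(
    checks: dict[str, Callable[[float], BudgetStatus]],
    usd: float,
) -> dict[str, dict]:
    """Evaluate every check without raising, e.g. to back a dashboard endpoint."""
    report = {}
    for name, condition in checks.items():
        status = condition(usd)
        report[name] = {
            "current_usd": status.current_usd,
            "limit_usd": status.limit_usd,
            "exceeded": status.exceeded,
        }
    return report
```

Unlike check(), a reporting helper like this would evaluate every condition rather than stopping at the first one that trips, so it should only be used with conditions that have no side effects.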

Async native: AsyncBudgetCoordinator that accepts both sync and async conditions and uses asyncio.gather() to evaluate them concurrently. Concurrent evaluation is useful when checks involve network round-trips (e.g. checking Redis counters).

Built as part of the agent-stack family: composable Python primitives for production LLM agents.