DEV Community

Albert Mavashev
Albert Mavashev

Posted on

How to Add Budget Control to a LangChain Agent

How to Add Budget Control to a LangChain Agent

LangChain makes it easy to build agents that call LLMs, search the web, execute code, and chain tool calls together. What it doesn't give you is any way to cap how much a single agent run is allowed to spend.

That's fine when you're experimenting. It's a real problem when you're running agents in production — especially across multiple users or tenants. A single misbehaving agent loop can burn through hundreds of dollars before anyone notices.

This guide shows how to add per-run budget control to a LangChain agent using Cycles — without rewriting your agent logic.

tip Already using the callback handler?
If you want per-LLM-call budget tracking (a reservation around every model invocation), see Integrating Cycles with LangChain. This guide covers a different pattern: a single reservation around the entire agent run, plus optional tool-level checks.
:::

The problem

Here's a typical LangChain agent loop:

from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return your_search_implementation(query)

llm = ChatOpenAI(model="gpt-4o")
agent = create_openai_functions_agent(llm, [search_web], prompt)
executor = AgentExecutor(agent=agent, tools=[search_web])

result = executor.invoke({"input": "Research the top 10 competitors..."})
Enter fullscreen mode Exit fullscreen mode

This works. But there's no limit on how many LLM calls the agent makes, how many tool invocations it triggers, or what it costs. If the agent gets stuck in a loop, retries a failing tool, or expands scope unexpectedly, it keeps running — and spending — until it either finishes or hits the provider's rate limits.

The fix: reserve before, commit after

The pattern Cycles uses is borrowed from database transactions:

  1. Reserve budget before the agent run starts
  2. Execute the agent if the reservation is granted
  3. Commit actual usage after — releases unused budget back to the pool
  4. Release the full reservation if the run fails

This gives you hard limits that are enforced before spend happens — not discovered afterward on your bill.

Prerequisites

pip install runcycles langchain langchain-openai
Enter fullscreen mode Exit fullscreen mode
export CYCLES_BASE_URL="http://localhost:7878"
export CYCLES_API_KEY="your-api-key"
export OPENAI_API_KEY="sk-..."
Enter fullscreen mode Exit fullscreen mode

Need an API key? Create one via the Admin Server — see Deploy the Full Stack or API Key Management.

Per-run budget wrapper

Wrap your AgentExecutor invocation in a single Cycles reservation:

import uuid
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from runcycles import (
    CyclesClient, CyclesConfig, ReservationCreateRequest,
    CommitRequest, ReleaseRequest, Subject, Action, Amount,
    Unit, BudgetExceededError, CyclesProtocolError,
)

client = CyclesClient(CyclesConfig.from_env())

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return your_search_implementation(query)

def run_agent_with_budget(
    user_input: str,
    tenant: str,
    budget_microcents: int,
) -> dict:
    key = str(uuid.uuid4())

    # 1. Reserve budget for the entire run
    res = client.create_reservation(ReservationCreateRequest(
        idempotency_key=key,
        subject=Subject(tenant=tenant, workflow="research"),
        action=Action(kind="agent.run", name="research-task"),
        estimate=Amount(unit=Unit.USD_MICROCENTS, amount=budget_microcents),  # 1 USD = 100_000_000 microcents
        ttl_ms=120_000,
    ))

    if not res.is_success:
        error = res.get_error_response()
        if error and error.error == "BUDGET_EXCEEDED":
            raise BudgetExceededError(
                error.message, status=res.status,
                error_code=error.error, request_id=error.request_id,
            )
        msg = error.message if error else (res.error_message or "Reservation failed")
        raise CyclesProtocolError(msg, status=res.status)

    reservation_id = res.get_body_attribute("reservation_id")
    decision = res.get_body_attribute("decision")

    # 2. Execute the agent — optionally downgrade if budget is tight
    try:
        if decision == "ALLOW_WITH_CAPS":
            llm = ChatOpenAI(model="gpt-4o-mini")
        else:
            llm = ChatOpenAI(model="gpt-4o")
        agent = create_openai_functions_agent(llm, [search_web], prompt)
        executor = AgentExecutor(
            agent=agent, tools=[search_web], max_iterations=10,
        )

        result = executor.invoke({"input": user_input})

        # 3. Commit actual usage
        client.commit_reservation(reservation_id, CommitRequest(
            idempotency_key=f"commit-{key}",
            actual=Amount(
                unit=Unit.USD_MICROCENTS,
                amount=budget_microcents // 2,  # replace with real tracking
            ),
        ))
        return result

    except Exception:
        # 4. Release on failure — budget returns to the pool
        client.release_reservation(
            reservation_id,
            ReleaseRequest(idempotency_key=f"release-{key}"),
        )
        raise

# Run it
result = run_agent_with_budget(
    user_input="Research the top 10 competitors in the CRM space",
    tenant="acme",
    budget_microcents=5_000_000_000,  # $50.00
)
Enter fullscreen mode Exit fullscreen mode

::: info Crash safety
If the agent crashes before committing or releasing, the reservation expires automatically after ttl_ms and the held budget returns to the pool. See TTL, Grace Period, and Extend.
:::

Adding tool-level budget checks

Individual tools can also reserve budget before costly operations. If the tool's reservation is denied, it returns a skip message instead of failing:

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    tool_key = str(uuid.uuid4())

    # Reserve before the tool call
    res = client.create_reservation(ReservationCreateRequest(
        idempotency_key=tool_key,
        subject=Subject(tenant="acme", toolset="web-search"),
        action=Action(kind="tool.call", name="search-web"),
        estimate=Amount(unit=Unit.USD_MICROCENTS, amount=100_000_000),  # $1.00
        ttl_ms=30_000,
    ))

    if not res.is_success:
        return "Budget exhausted — skipping web search."

    tool_reservation_id = res.get_body_attribute("reservation_id")

    # Execute the tool
    results = your_search_implementation(query)

    # Commit actual usage
    client.commit_reservation(tool_reservation_id, CommitRequest(
        idempotency_key=f"commit-{tool_key}",
        actual=Amount(unit=Unit.USD_MICROCENTS, amount=40_000_000),  # $0.40
    ))

    return results
Enter fullscreen mode Exit fullscreen mode

Multi-tenant scoping

Use the Subject hierarchy to give each customer their own budget scope:

def run_for_customer(customer_id: str, user_input: str):
    return run_agent_with_budget(
        user_input=user_input,
        tenant=customer_id,
        budget_microcents=10_000_000_000,  # $100.00, or pull from the customer's plan
    )
Enter fullscreen mode Exit fullscreen mode

Each customer's spend is tracked independently. One customer burning through their budget doesn't affect others.

Graceful degradation with ALLOW_WITH_CAPS

When budget is running low, Cycles can return ALLOW_WITH_CAPS instead of a hard denial. Use the decision to switch to a cheaper model or limit tool access:

res = client.create_reservation(ReservationCreateRequest(
    idempotency_key=key,
    subject=Subject(tenant=tenant, workflow="research"),
    action=Action(kind="agent.run", name="research-task"),
    estimate=Amount(unit=Unit.USD_MICROCENTS, amount=5_000_000_000),  # $50.00
    ttl_ms=120_000,
))

if not res.is_success:
    # Budget denial arrives as a 409 BUDGET_EXCEEDED error — handle it here
    error = res.get_error_response()
    if error and error.error == "BUDGET_EXCEEDED":
        raise BudgetExceededError(
            error.message, status=res.status,
            error_code=error.error, request_id=error.request_id,
        )
    raise CyclesProtocolError(...)

# On success, decision is ALLOW or ALLOW_WITH_CAPS
decision = res.get_body_attribute("decision")

if decision == "ALLOW_WITH_CAPS":
    # Budget is tight — switch to a cheaper model
    llm = ChatOpenAI(model="gpt-4o-mini")
else:
    # ALLOW — full capacity
    llm = ChatOpenAI(model="gpt-4o")
Enter fullscreen mode Exit fullscreen mode

See Caps and Three-Way Decisions for more on how ALLOW_WITH_CAPS works and what cap fields are available.

What you get

With this pattern in place:

  • Per-tenant isolationSubject(tenant="acme") means each customer's budget is tracked and enforced independently
  • Graceful degradationALLOW_WITH_CAPS lets agents downgrade instead of stopping cold
  • Automatic reconciliation — committing less than the reserved amount releases the difference back to the pool
  • Crash safety — if the agent crashes before committing, the reservation expires automatically and budget is released

Next steps

Top comments (0)