DEV Community

Patrick Hughes

Posted on • Originally published at bmdpat.com

Stop Runaway LLM Spend: AI Agent Cost Control (Python)

I ran 100 ML experiments overnight with an autonomous AI agent. It worked -- 25% model performance gain, zero human intervention. But before I got comfortable leaving agents running unsupervised, I had to solve one problem first: what happens when an agent loops, hallucinates tool calls, or just keeps going when it should have stopped?

The answer, without guardrails, is a large API bill and a lot of regret.

So I built AgentGuard -- a Python SDK that enforces budget, token, time, and rate limits on AI agents at runtime. This post covers the problem it solves, how it works, and how to drop it into your own agents in about four lines of code.


The Real Problem: Agents Don't Know When to Stop

Monitoring tools like LangSmith and Langfuse are useful. They tell you what happened. But they can't stop anything mid-run. By the time you get a cost alert from your dashboard, the damage is done.

The failure modes are predictable:

  • Loop traps: An agent keeps calling the same tool because each response is slightly different and the termination condition never triggers
  • Hallucinated tool chains: The agent invents multi-step plans that require dozens of API calls to execute
  • Runaway research tasks: A research agent finds one more source, then another, then another -- with no concept of diminishing returns
  • Clock drift: A task that should take 2 minutes runs for 45 because something upstream is slow

None of these are bugs in the traditional sense. They're emergent behaviors from giving a language model agency. The fix isn't better prompting -- it's enforcement at the infrastructure level.


How AgentGuard Works

AgentGuard installs as a lightweight wrapper around your existing LLM calls. It tracks cost, tokens, time, and tool call patterns in real time, and kills the agent the moment any limit is breached.

Install it:

pip install agentguard47

Drop it into an existing OpenAI agent in four lines:

from agentguard import Tracer, BudgetGuard, patch_openai
tracer = Tracer(guards=[BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)])
patch_openai(tracer)
# From here, all OpenAI calls are tracked and enforced automatically

That's it. Your existing agent code doesn't change. AgentGuard patches the OpenAI client and intercepts every call. When the cumulative cost hits $4.00 (80% of the limit), it fires a warning callback. At $5.00, it raises a BudgetExceededError and terminates the run.
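Under the hood, the enforcement pattern is simple: wrap the client's call method, accumulate cost, warn at the threshold, and raise once the cap is hit. Here's a minimal sketch of that idea in pure Python -- illustrative only, not AgentGuard's actual internals (`FakeClient`, the per-call cost, and the helper names are made up for the example):

```python
class BudgetExceededError(RuntimeError):
    pass

class FakeClient:
    """Stand-in for an LLM client; each call costs a flat $1.50."""
    def create(self, prompt):
        return {"text": "ok", "cost_usd": 1.50}

def patch_with_budget(client, max_cost_usd, warn_at_pct=0.8, on_warn=print):
    """Replace client.create with a wrapper that tracks and enforces spend."""
    state = {"spent": 0.0, "warned": False}
    original = client.create

    def guarded(prompt):
        response = original(prompt)
        state["spent"] += response["cost_usd"]
        if not state["warned"] and state["spent"] >= warn_at_pct * max_cost_usd:
            state["warned"] = True
            on_warn(f"warning: ${state['spent']:.2f} of ${max_cost_usd:.2f} spent")
        if state["spent"] > max_cost_usd:
            raise BudgetExceededError(f"budget exceeded: ${state['spent']:.2f}")
        return response

    client.create = guarded
    return state

client = FakeClient()
state = patch_with_budget(client, max_cost_usd=5.00)
client.create("step 1")  # cumulative $1.50
client.create("step 2")  # cumulative $3.00
client.create("step 3")  # cumulative $4.50 -> warning fires (>= 80% of $5)
try:
    client.create("step 4")  # cumulative $6.00 -> raises BudgetExceededError
except BudgetExceededError:
    pass
```

The key point is that enforcement happens inside the call path, not in a dashboard polling loop -- the fourth call never returns to the agent.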


The Full Guard Suite

A dollar budget isn't always the right constraint. AgentGuard ships five guard types:

BudgetGuard -- Hard dollar and token limits with configurable warning thresholds. Supports per-model pricing for OpenAI, Anthropic, Google, Mistral, and Meta out of the box.

BudgetGuard(max_cost_usd=10.00, max_tokens=100_000, warn_at_pct=0.75)

LoopGuard -- Detects exact repeated tool calls. Useful when an agent is stuck calling the same function with the same arguments.

LoopGuard(max_repeats=3)
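Exact-repeat detection can be as simple as counting consecutive identical (tool, arguments) pairs. A rough sketch of the idea -- not AgentGuard's source, just the mechanism:

```python
class LoopDetectedError(RuntimeError):
    pass

class ExactLoopDetector:
    """Flags an agent that makes the exact same tool call too many times in a row."""
    def __init__(self, max_repeats):
        self.max_repeats = max_repeats
        self.last_call = None
        self.count = 0

    def record(self, tool_name, args):
        # Normalize args so dict ordering doesn't matter.
        call = (tool_name, tuple(sorted(args.items())))
        self.count = self.count + 1 if call == self.last_call else 1
        self.last_call = call
        if self.count > self.max_repeats:
            raise LoopDetectedError(f"{tool_name} repeated {self.count} times")

detector = ExactLoopDetector(max_repeats=3)
for _ in range(3):
    detector.record("search", {"query": "python"})  # three repeats: allowed
try:
    detector.record("search", {"query": "python"})  # fourth: raises
except LoopDetectedError:
    caught = True
```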

FuzzyLoopGuard -- Detects similar (not identical) patterns. Better for agents that slightly vary their inputs but are functionally stuck.

FuzzyLoopGuard(max_tool_repeats=5)
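Fuzzy matching swaps exact equality for a similarity score over the call arguments. One way to sketch that is with the standard library's `difflib` -- again an illustration of the technique, not AgentGuard's implementation (the 0.85 threshold is an assumption):

```python
import difflib

class FuzzyLoopError(RuntimeError):
    pass

class FuzzyLoopDetector:
    """Flags near-identical repeated tool calls, e.g. the same query lightly reworded."""
    def __init__(self, max_tool_repeats, similarity=0.85):
        self.max_tool_repeats = max_tool_repeats
        self.similarity = similarity
        self.history = []

    def record(self, tool_name, arg_text):
        # Count prior calls to the same tool whose arguments are "close enough".
        similar = sum(
            1 for name, text in self.history
            if name == tool_name
            and difflib.SequenceMatcher(None, text, arg_text).ratio() >= self.similarity
        )
        self.history.append((tool_name, arg_text))
        if similar + 1 > self.max_tool_repeats:
            raise FuzzyLoopError(f"{tool_name} called {similar + 1} similar times")

detector = FuzzyLoopDetector(max_tool_repeats=2)
detector.record("search", "best python web framework")
detector.record("search", "best python web frameworks")       # similar: count 2
try:
    detector.record("search", "the best python web framework")  # count 3: raises
except FuzzyLoopError:
    caught = True
```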

TimeoutGuard -- Wall-clock enforcement. If the agent hasn't finished in N seconds, it's terminated.

TimeoutGuard(max_seconds=300)

RateLimitGuard -- Caps calls per minute. Useful for shared environments or when you're working within upstream API rate limits.

RateLimitGuard(max_calls_per_minute=60)
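Per-minute caps are usually a sliding window: keep the timestamps of recent calls, drop anything older than 60 seconds, and reject when the window is full. A small sketch of that pattern (an injectable clock keeps the example instant; this is the general technique, not AgentGuard's code):

```python
import time
from collections import deque

class RateLimitError(RuntimeError):
    pass

class CallRateLimiter:
    """Sliding-window limiter: rejects a call when too many landed in the last 60s."""
    def __init__(self, max_calls_per_minute, clock=time.monotonic):
        self.max_calls = max_calls_per_minute
        self.clock = clock              # injectable clock makes this testable
        self.timestamps = deque()

    def check(self):
        now = self.clock()
        # Evict timestamps that have aged out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_calls:
            raise RateLimitError("rate limit hit")
        self.timestamps.append(now)

# Fake clock so the example runs without sleeping.
t = {"now": 0.0}
limiter = CallRateLimiter(max_calls_per_minute=3, clock=lambda: t["now"])
for _ in range(3):
    limiter.check()       # three calls at t=0: fine
try:
    limiter.check()       # fourth within the window: rejected
except RateLimitError:
    caught = True
t["now"] = 61.0
limiter.check()           # window has slid; allowed again
```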

Combine them:

tracer = Tracer(guards=[
    BudgetGuard(max_cost_usd=5.00),
    LoopGuard(max_repeats=3),
    TimeoutGuard(max_seconds=600),
])

Framework Support

Most real agents aren't built with raw OpenAI calls. AgentGuard integrates with the frameworks people actually use:

  • LangChain: AgentGuardCallbackHandler plugs into the standard callback interface
  • LangGraph: @guarded_node decorator wraps individual nodes
  • CrewAI: AgentGuardCrewHandler via step callbacks
  • Direct patching: patch_openai() and patch_anthropic() for everything else

Tracing and Evaluation

Every run generates a JSONL trace file with full event history, span data, and cost attribution. You can run assertions against it after the fact:

from agentguard import EvalSuite
EvalSuite("traces.jsonl") \
    .assert_no_loops() \
    .assert_budget_under(tokens=50_000) \
    .assert_completes_within(seconds=30) \
    .run()

Useful for testing agent behavior before deploying to production, or for building regression tests after you've tuned a prompt.
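The same post-hoc checks can be sketched in plain Python against a JSONL trace. The event schema below (`tool`, `tokens`, `cost_usd` fields) is invented for illustration -- AgentGuard's actual trace format may differ:

```python
import io
import json

# A tiny synthetic trace: one JSON event per line, as in a JSONL file.
trace_jsonl = "\n".join(json.dumps(e) for e in [
    {"tool": "search", "tokens": 1200, "cost_usd": 0.02},
    {"tool": "fetch",  "tokens": 3400, "cost_usd": 0.05},
    {"tool": "search", "tokens": 900,  "cost_usd": 0.01},
])

events = [json.loads(line) for line in io.StringIO(trace_jsonl)]

def assert_no_consecutive_repeats(events):
    """A crude no-loops check: no tool appears twice in a row."""
    for prev, cur in zip(events, events[1:]):
        assert prev["tool"] != cur["tool"], f"loop: {cur['tool']} repeated"

def assert_budget_under(events, tokens):
    total = sum(e["tokens"] for e in events)
    assert total < tokens, f"{total} tokens exceeds the {tokens} limit"

assert_no_consecutive_repeats(events)
assert_budget_under(events, tokens=50_000)
total_cost = sum(e["cost_usd"] for e in events)
```

Because the trace is just JSONL, checks like these drop straight into pytest for prompt-regression testing.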


Why Not Just Use LangSmith / Langfuse?

Observability tools are necessary but not sufficient. They show you the trace after execution. AgentGuard acts during execution. It's the difference between a security camera and a deadbolt.

Feature                     LangSmith / Langfuse    AgentGuard
Cost monitoring             Yes                     Yes
Hard budget enforcement     No                      Yes
Kill switch mid-run         No                      Yes
Loop detection              No                      Yes
Zero external dependencies  No                      Yes
Self-hosted                 Partial                 Yes

AgentGuard isn't an alternative to observability -- it's what you add when you move from development to unsupervised production runs.


Putting It Into Production

The setup I use for serious overnight runs:

from agentguard import Tracer, BudgetGuard, LoopGuard, TimeoutGuard, patch_anthropic
import logging

def on_warning(event):
    logging.warning(f"AgentGuard warning: {event}")

tracer = Tracer(
    guards=[
        BudgetGuard(max_cost_usd=20.00, warn_at_pct=0.8, on_warn=on_warning),
        LoopGuard(max_repeats=4),
        TimeoutGuard(max_seconds=3600),
    ],
    trace_file="run_traces.jsonl"
)
patch_anthropic(tracer)

That's a $20 hard cap, a warning at $16, loop detection, and a 1-hour timeout. If any of those trigger, the agent stops cleanly and I get a trace file explaining exactly what happened.


If you're building autonomous agents, this is the infrastructure piece most people skip until they get burned. MIT licensed, 93% test coverage, zero runtime dependencies.

pip install agentguard47 or check the full docs at bmdpat.com/tools/agentguard.
