I ran 100 ML experiments overnight with an autonomous AI agent. It worked -- 25% model performance gain, zero human intervention. But before I got comfortable leaving agents running unsupervised, I had to solve one problem first: what happens when an agent loops, hallucinates tool calls, or just keeps going when it should have stopped?
The answer, without guardrails, is a large API bill and a lot of regret.
So I built AgentGuard -- a Python SDK that enforces budget, token, time, and rate limits on AI agents at runtime. This post covers the problem it solves, how it works, and how to drop it into your own agents in about four lines of code.
## The Real Problem: Agents Don't Know When to Stop
Monitoring tools like LangSmith and Langfuse are useful. They tell you what happened. But they can't stop anything mid-run. By the time you get a cost alert from your dashboard, the damage is done.
The failure modes are predictable:
- Loop traps: An agent keeps calling the same tool because each response is slightly different and the termination condition never triggers
- Hallucinated tool chains: The agent invents multi-step plans that require dozens of API calls to execute
- Runaway research tasks: A research agent finds one more source, then another, then another -- with no concept of diminishing returns
- Clock drift: A task that should take 2 minutes runs for 45 because something upstream is slow
None of these are bugs in the traditional sense. They're emergent behaviors from giving a language model agency. The fix isn't better prompting -- it's enforcement at the infrastructure level.
## How AgentGuard Works
AgentGuard installs as a lightweight wrapper around your existing LLM calls. It tracks cost, tokens, time, and tool call patterns in real time, and kills the agent the moment any limit is breached.
Install it:

```bash
pip install agentguard47
```
Drop it into an existing OpenAI agent in four lines:

```python
from agentguard import Tracer, BudgetGuard, patch_openai

tracer = Tracer(guards=[BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)])
patch_openai(tracer)
# From here, all OpenAI calls are tracked and enforced automatically
```
That's it. Your existing agent code doesn't change. AgentGuard patches the OpenAI client and intercepts every call. When the cumulative cost hits $4.00 (80% of the limit), it fires a warning callback. At $5.00, it raises a BudgetExceededError and terminates the run.
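To make that enforcement model concrete, here is a minimal sketch of cumulative-cost tracking in plain Python. The names (`BudgetTracker`, `BudgetExceeded`) are illustrative, not AgentGuard's internal API:

```python
class BudgetExceeded(Exception):
    pass

class BudgetTracker:
    """Illustrative sketch: accumulate per-call cost, warn at a threshold,
    raise once the hard cap is breached."""

    def __init__(self, max_cost_usd, warn_at_pct=0.8, on_warn=print):
        self.max_cost_usd = max_cost_usd
        self.warn_threshold = max_cost_usd * warn_at_pct
        self.on_warn = on_warn
        self.spent = 0.0
        self.warned = False

    def record(self, call_cost_usd):
        self.spent += call_cost_usd
        if not self.warned and self.spent >= self.warn_threshold:
            self.warned = True
            self.on_warn(f"spent ${self.spent:.2f} of ${self.max_cost_usd:.2f}")
        if self.spent > self.max_cost_usd:
            raise BudgetExceeded(f"budget of ${self.max_cost_usd:.2f} exceeded")

tracker = BudgetTracker(max_cost_usd=5.00)
tracker.record(3.90)   # below the $4.00 warning threshold: nothing happens
tracker.record(0.50)   # crosses $4.00: warning fires, run continues
# tracker.record(1.00) # would push past $5.00 and raise BudgetExceeded
```

The interesting design point is that the check happens on every call, so the run stops at the first call that breaches the cap rather than at the next dashboard refresh.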
## The Full Guard Suite
A dollar budget isn't always the right constraint. AgentGuard ships five guard types:
**`BudgetGuard`** -- Hard dollar and token limits with configurable warning thresholds. Supports per-model pricing for OpenAI, Anthropic, Google, Mistral, and Meta out of the box.

```python
BudgetGuard(max_cost_usd=10.00, max_tokens=100_000, warn_at_pct=0.75)
```

**`LoopGuard`** -- Detects exact repeated tool calls. Useful when an agent is stuck calling the same function with the same arguments.

```python
LoopGuard(max_repeats=3)
```

**`FuzzyLoopGuard`** -- Detects similar (not identical) patterns. Better for agents that slightly vary their inputs but are functionally stuck.

```python
FuzzyLoopGuard(max_tool_repeats=5)
```

**`TimeoutGuard`** -- Wall-clock enforcement. If the agent hasn't finished in N seconds, it's terminated.

```python
TimeoutGuard(max_seconds=300)
```

**`RateLimitGuard`** -- Caps calls per minute. Useful for shared environments or when you're working within upstream API rate limits.

```python
RateLimitGuard(max_calls_per_minute=60)
```
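A calls-per-minute cap is typically a sliding window over recent call timestamps. Here is a sketch of that idea in plain Python; it assumes nothing about AgentGuard's actual internals, and the names are hypothetical:

```python
import time
from collections import deque

class RateLimitExceeded(Exception):
    pass

class SlidingWindowLimiter:
    """Illustrative sliding-window limiter: at most N calls per 60 seconds."""

    def __init__(self, max_calls_per_minute, clock=time.monotonic):
        self.max_calls = max_calls_per_minute
        self.clock = clock          # injectable for testing
        self.timestamps = deque()

    def check(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_calls:
            raise RateLimitExceeded("call rate above limit")
        self.timestamps.append(now)

limiter = SlidingWindowLimiter(max_calls_per_minute=60)
limiter.check()  # first call in the window: allowed
```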
Combine them:
```python
tracer = Tracer(guards=[
    BudgetGuard(max_cost_usd=5.00),
    LoopGuard(max_repeats=3),
    TimeoutGuard(max_seconds=600),
])
```
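The two loop guards above can be understood with a few lines of plain Python. This sketch uses exact tuple comparison for the `LoopGuard` case and `difflib` similarity for the fuzzy case; it is illustrative pseudologic, not the library's real matching:

```python
from difflib import SequenceMatcher

class LoopDetected(Exception):
    pass

def check_exact_loop(history, call, max_repeats=3):
    """Raise if the identical (tool, args) call occurs max_repeats times in a row."""
    history.append(call)
    recent = history[-max_repeats:]
    if len(recent) == max_repeats and all(c == call for c in recent):
        raise LoopDetected(f"{call!r} repeated {max_repeats} times")

def is_fuzzy_repeat(prev_args, args, threshold=0.9):
    """True when string arguments are near-duplicates of the previous call."""
    return SequenceMatcher(None, prev_args, args).ratio() >= threshold

history = []
check_exact_loop(history, ("search", "python agents"))
check_exact_loop(history, ("search", "python agents"))
# a third identical call would raise LoopDetected
```

The fuzzy variant catches the common failure where the agent rephrases its query slightly each turn while making no actual progress.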
## Framework Support
Most real agents aren't built with raw OpenAI calls. AgentGuard integrates with the frameworks people actually use:
- **LangChain**: `AgentGuardCallbackHandler` plugs into the standard callback interface
- **LangGraph**: the `@guarded_node` decorator wraps individual nodes
- **CrewAI**: `AgentGuardCrewHandler` via step callbacks
- **Direct patching**: `patch_openai()` and `patch_anthropic()` for everything else
## Tracing and Evaluation
Every run generates a JSONL trace file with full event history, span data, and cost attribution. You can run assertions against it after the fact:
```python
from agentguard import EvalSuite

EvalSuite("traces.jsonl") \
    .assert_no_loops() \
    .assert_budget_under(tokens=50_000) \
    .assert_completes_within(seconds=30) \
    .run()
```
Useful for testing agent behavior before deploying to production, or for building regression tests after you've tuned a prompt.
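If you want to sanity-check a trace without the SDK, the same kind of assertions are easy to run by hand. A sketch assuming each JSONL line is an event with `tool`, `args`, and `tokens` fields (the real trace schema may differ):

```python
import json

def check_trace(path, max_tokens=50_000, max_repeats=3):
    """Scan a JSONL trace for token overruns and exact repeated tool calls."""
    total_tokens = 0
    streak, last_call = 0, None
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            total_tokens += event.get("tokens", 0)
            call = (event.get("tool"), event.get("args"))
            streak = streak + 1 if call == last_call else 1
            last_call = call
            assert streak < max_repeats, f"loop: {call!r} repeated {streak} times"
    assert total_tokens <= max_tokens, f"token budget exceeded: {total_tokens}"
```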
## Why Not Just Use LangSmith / Langfuse?
Observability tools are necessary but not sufficient. They show you the trace after execution. AgentGuard acts during execution. It's the difference between a security camera and a deadbolt.
| Feature | LangSmith / Langfuse | AgentGuard |
|---|---|---|
| Cost monitoring | Yes | Yes |
| Hard budget enforcement | No | Yes |
| Kill switch mid-run | No | Yes |
| Loop detection | No | Yes |
| Zero external dependencies | No | Yes |
| Self-hosted | Partial | Yes |
AgentGuard isn't an alternative to observability -- it's what you add when you move from development to unsupervised production runs.
## Putting It Into Production
The setup I use for serious overnight runs:
```python
from agentguard import Tracer, BudgetGuard, LoopGuard, TimeoutGuard, patch_anthropic
import logging

def on_warning(event):
    logging.warning(f"AgentGuard warning: {event}")

tracer = Tracer(
    guards=[
        BudgetGuard(max_cost_usd=20.00, warn_at_pct=0.8, on_warn=on_warning),
        LoopGuard(max_repeats=4),
        TimeoutGuard(max_seconds=3600),
    ],
    trace_file="run_traces.jsonl",
)
patch_anthropic(tracer)
```
That's a $20 hard cap, a warning at $16, loop detection, and a 1-hour timeout. If any of those trigger, the agent stops cleanly and I get a trace file explaining exactly what happened.
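For completeness, wall-clock enforcement is the simplest of these guards to reason about: record a deadline up front and check it before every step. A hypothetical sketch, not AgentGuard's implementation:

```python
import time

class TimeoutExceeded(Exception):
    pass

class WallClock:
    """Illustrative deadline check to run before each agent step."""

    def __init__(self, max_seconds, clock=time.monotonic):
        self.clock = clock                      # injectable for testing
        self.deadline = clock() + max_seconds

    def check(self):
        if self.clock() > self.deadline:
            raise TimeoutExceeded("run exceeded its time budget")
```

One caveat worth knowing: a between-steps check like this stops the agent at the next call boundary, so a single slow upstream request can still overshoot the limit slightly.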
If you're building autonomous agents, this is the infrastructure piece most people skip until they get burned. MIT licensed, 93% test coverage, zero runtime dependencies.
`pip install agentguard47`, or check the full docs at bmdpat.com/tools/agentguard.