Watchdog: When Your AI Agent Needs a Babysitter
Here's the problem. Your agent has a self_modify tool. It can change its own config, rewrite its prompts, update its system instructions. Sounds great until it enters a death spiral.
I've seen it happen. Agent decides its temperature is too low. Bumps it to 0.9. Output gets weirder. Agent decides it needs MORE creativity. Bumps to 1.2. Now it's hallucinating database schemas. Agent tries to "fix" itself by rewriting its system prompt. Now it thinks it's a pirate. You get billed $47 before someone notices.
The fix? A watchdog. Not the Kubernetes kind. Something that watches the agent itself.
What Actually Breaks
Three failure modes. You've seen all of them.
Infinite loops. Agent calls a tool. Tool returns data. Agent calls the same tool with the same args. Again. Again. Token count hits 500K. You're paying for nothing.
Unresponsive LLM. API timeout. Rate limit. Network blip. Agent doesn't handle it gracefully — just sits there waiting. No error. No retry. Just silence.
Self-modify degradation. Agent changes its prompt from "helpful assistant" to "maximize engagement." Suddenly it's arguing with users. Or it drops the temperature too low and produces the same response to every question.
The Watchdog Pattern
class AgentWatchdog:
def __init__(self, agent, config: WatchdogConfig):
self.agent = agent
self.config = config
self.metrics = MetricsCollector()
self.anomaly_detector = AnomalyDetector()
async def monitor(self):
while self.agent.is_running:
snapshot = await self.collect_snapshot()
if self.anomaly_detector.detect(snapshot):
await self.handle_anomaly(snapshot)
await asyncio.sleep(self.config.poll_interval)
Three things to watch. Always.
1. Loop Detection
class LoopDetector:
def __init__(self, max_repeats: int = 5, window_seconds: int = 60):
self.max_repeats = max_repeats
self.call_history = deque(maxlen=100)
def check(self, tool_name: str, args: dict) -> bool:
self.call_history.append((tool_name, args, time.time()))
# Count identical calls in window
recent = [c for c in self.call_history
if c[2] > time.time() - self.window_seconds]
repeats = sum(1 for c in recent
if c[0] == tool_name and c[1] == args)
return repeats >= self.max_repeats
Simple. Effective. Catches the "call the same search API with the same query 47 times" pattern.
2. Health Check
class HealthChecker:
def __init__(self, timeout_seconds: int = 30):
self.timeout = timeout_seconds
async def check(self, agent) -> HealthStatus:
try:
start = time.time()
response = await asyncio.wait_for(
agent.ping(),
timeout=self.timeout
)
latency = time.time() - start
return HealthStatus(
alive=True,
latency_ms=latency * 1000,
last_response=response
)
except asyncio.TimeoutError:
return HealthStatus(alive=False, error="timeout")
except Exception as e:
return HealthStatus(alive=False, error=str(e))
Ping the LLM. If it doesn't respond in 30 seconds, something's wrong. Don't wait for the user to notice.
3. Anomaly Detection
class AnomalyDetector:
def __init__(self, baseline_window: int = 100):
self.baseline = deque(maxlen=baseline_window)
self.threshold_multiplier = 3.0 # Standard deviation multiplier
def update_baseline(self, metrics: dict):
self.baseline.append(metrics)
def is_anomalous(self, current: dict) -> bool:
if len(self.baseline) < 10:
return False # Not enough data yet
for key in current:
values = [m[key] for m in self.baseline if key in m]
if not values:
continue
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = variance ** 0.5
if abs(current[key] - mean) > self.threshold_multiplier * std_dev:
return True
return False
Token count suddenly 10x normal? Latency spiking? Cost per step went from $0.02 to $0.50? Flag it.
The Automatic Response
Detection is useless without action. Here's what you do:
class WatchdogAction:
def __init__(self, git_repo: str):
self.repo = git_repo
async def handle_anomaly(self, snapshot: AgentSnapshot):
# 1. Log everything
await self.log_incident(snapshot)
# 2. Kill the current execution
await self.agent.stop()
# 3. Revert to last known good state
last_good = await self.find_last_good_commit()
await self.git_revert(last_good)
# 4. Notify
await self.send_alert(snapshot)
async def git_revert(self, commit_hash: str):
subprocess.run(["git", "revert", "--no-commit", commit_hash])
subprocess.run(["git", "commit", "-m",
f"auto-revert: watchdog detected anomaly"])
subprocess.run(["git", "push"])
Three steps. Stop the bleeding. Revert the damage. Tell someone.
The Hard Parts
This sounds simple. It's not. Two things will bite you.
False positives. Your anomaly detector flags a legit spike in usage. Now you've killed a running agent and reverted configs for nothing. Solution: require multiple consecutive anomalies before acting. Or use a confirmation window.
What's "good"? The agent modifies itself constantly. Which commit is the "last good" one? You need a baseline — a snapshot taken right after deployment, before any self-modification. Mark it as golden.
class GoldenBaseline:
def __init__(self):
self.golden_commit = None
def mark_golden(self):
result = subprocess.run(["git", "rev-parse", "HEAD"],
capture_output=True, text=True)
self.golden_commit = result.stdout.strip()
subprocess.run(["git", "tag", "golden", self.golden_commit])
What This Looks Like in Practice
An agent runs for 6 hours. It's modified its system prompt 12 times. Token count per step is stable at ~2K. Then it hits a weird edge case. The self_modify tool runs and sets max_tokens to 999999. Next LLM call tries to generate a million tokens. Cost spikes from $0.03 to $15 per step.
The watchdog catches it. Token count is 500x baseline. Latency is 30x normal. It kills the agent, reverts to the golden commit, and sends a Slack message. Total time from anomaly to recovery: 12 seconds.
Without the watchdog? Someone notices the billing alert 4 hours later.
The Missing Piece
You need one more thing. The watchdog itself needs monitoring. If it crashes, you're blind.
python
class WatchdogSupervisor:
def __init__(self):
self.watchdog = AgentWatchdog()
self.heartbeat_interval = 5 # seconds
async def run(self):
while True:
try:
await asyncio.wait_for(
self.watchdog.monitor(),
timeout=self.heartbeat_interval * 3
)
except as
---
**Debugging AI agents shouldn't feel like reading The Matrix.**
Join other engineers who are building reliable autonomous workflows in our community: [TracePilot Discord](https://discord.gg/KzXRAXFM8)
Top comments (0)