DEV Community

Tracepilot

Watchdog: When Your AI Agent Needs a Babysitter

Here's the problem. Your agent has a self_modify tool. It can change its own config, rewrite its prompts, update its system instructions. Sounds great until it enters a death spiral.

I've seen it happen. Agent decides its temperature is too low. Bumps it to 0.9. Output gets weirder. Agent decides it needs MORE creativity. Bumps to 1.2. Now it's hallucinating database schemas. Agent tries to "fix" itself by rewriting its system prompt. Now it thinks it's a pirate. You get billed $47 before someone notices.

The fix? A watchdog. Not the Kubernetes kind. Something that watches the agent itself.

What Actually Breaks

Three failure modes. You've seen all of them.

Infinite loops. Agent calls a tool. Tool returns data. Agent calls the same tool with the same args. Again. Again. Token count hits 500K. You're paying for nothing.

Unresponsive LLM. API timeout. Rate limit. Network blip. Agent doesn't handle it gracefully — just sits there waiting. No error. No retry. Just silence.

Self-modify degradation. Agent changes its prompt from "helpful assistant" to "maximize engagement." Suddenly it's arguing with users. Or it drops the temperature too low and produces the same response to every question.
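
The unresponsive-LLM case is the cheapest of the three to guard at the call site. A minimal sketch, assuming your LLM call can be wrapped in a zero-argument callable (`make_call` is a hypothetical name, and the timeout and backoff defaults are illustrative, not tuned):

```python
import asyncio
import random


async def call_with_retry(make_call, *, timeout_s=30.0, max_attempts=3, base_delay_s=1.0):
    """Await make_call() with a per-attempt timeout, retrying with backoff.

    make_call is a hypothetical zero-argument callable that returns a fresh
    awaitable each time, e.g. a lambda wrapping your LLM client call.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            # Bound each attempt so a hung request cannot stall the agent.
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break
            # Exponential backoff plus jitter before the next attempt.
            delay = base_delay_s * (2 ** attempt) + random.uniform(0, base_delay_s)
            await asyncio.sleep(delay)
    # Fail loudly instead of waiting in silence.
    raise RuntimeError(f"LLM call failed after {max_attempts} attempts") from last_error
```

Silence becomes an exception. The agent, or the watchdog, can act on an exception.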

The Watchdog Pattern

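import asyncio

# WatchdogConfig and MetricsCollector are assumed to be defined elsewhere.
class AgentWatchdog:
    def __init__(self, agent, config: WatchdogConfig):
        self.agent = agent
        self.config = config
        self.metrics = MetricsCollector()
        self.anomaly_detector = AnomalyDetector()

    async def monitor(self):
        # Poll until the agent stops.
        while self.agent.is_running:
            # Assumed to return a dict of numeric metrics for this step.
            snapshot = await self.collect_snapshot()

            # is_anomalous() is the method AnomalyDetector exposes (see below).
            if self.anomaly_detector.is_anomalous(snapshot):
                await self.handle_anomaly(snapshot)
            else:
                # Only healthy snapshots feed the baseline.
                self.anomaly_detector.update_baseline(snapshot)

            await asyncio.sleep(self.config.poll_interval)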

Three things to watch. Always.

1. Loop Detection

import time
from collections import deque

class LoopDetector:
    def __init__(self, max_repeats: int = 5, window_seconds: int = 60):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds  # look-back window for counting repeats
        self.call_history = deque(maxlen=100)

    def check(self, tool_name: str, args: dict) -> bool:
        now = time.time()
        self.call_history.append((tool_name, args, now))

        # Keep only the calls that fall inside the window
        recent = [c for c in self.call_history
                  if c[2] > now - self.window_seconds]

        # Count calls identical to this one (same tool, same args)
        repeats = sum(1 for c in recent
                      if c[0] == tool_name and c[1] == args)

        return repeats >= self.max_repeats

Simple. Effective. Catches the "call the same search API with the same query 47 times" pattern.
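
One variation worth considering: store a short fingerprint of each call instead of the full args dict. A sketch, assuming tool args are JSON-serializable (`call_fingerprint`, `FingerprintLoopDetector`, and the injectable `clock` are hypothetical names, not part of the detector above):

```python
import hashlib
import json
import time
from collections import deque


def call_fingerprint(tool_name: str, args: dict) -> str:
    """Stable short hash of a tool call; key order in args does not matter."""
    payload = json.dumps([tool_name, args], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class FingerprintLoopDetector:
    def __init__(self, max_repeats=5, window_seconds=60.0, clock=time.monotonic):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.history = deque(maxlen=100)  # (fingerprint, timestamp) pairs

    def check(self, tool_name: str, args: dict) -> bool:
        now = self.clock()
        fingerprint = call_fingerprint(tool_name, args)
        self.history.append((fingerprint, now))
        cutoff = now - self.window_seconds
        # Count calls with the same fingerprint inside the window.
        repeats = sum(1 for fp, ts in self.history if fp == fingerprint and ts > cutoff)
        return repeats >= self.max_repeats
```

History stays small even when args carry large payloads, and the fingerprints are safe to drop into logs.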

2. Health Check

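import asyncio
import time
from dataclasses import dataclass
from typing import Any, Optional

# Assumed shape of HealthStatus, inferred from how it is constructed below.
@dataclass
class HealthStatus:
    alive: bool
    latency_ms: Optional[float] = None
    last_response: Any = None
    error: Optional[str] = None

class HealthChecker:
    def __init__(self, timeout_seconds: int = 30):
        self.timeout = timeout_seconds

    async def check(self, agent) -> HealthStatus:
        try:
            # monotonic() is unaffected by system clock adjustments
            start = time.monotonic()
            response = await asyncio.wait_for(
                agent.ping(),
                timeout=self.timeout
            )
            latency = time.monotonic() - start

            return HealthStatus(
                alive=True,
                latency_ms=latency * 1000,
                last_response=response
            )
        except asyncio.TimeoutError:
            return HealthStatus(alive=False, error="timeout")
        except Exception as e:
            # Any other failure (network, auth, rate limit) also counts as unhealthy
            return HealthStatus(alive=False, error=str(e))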

Ping the LLM. If it doesn't respond in 30 seconds, something's wrong. Don't wait for the user to notice.

3. Anomaly Detection

from collections import deque

class AnomalyDetector:
    def __init__(self, baseline_window: int = 100):
        # Rolling window of recent metric dicts, e.g. {"tokens": 2000, "latency_ms": 850}
        self.baseline = deque(maxlen=baseline_window)
        self.threshold_multiplier = 3.0  # Standard deviation multiplier

    def update_baseline(self, metrics: dict):
        self.baseline.append(metrics)

    def is_anomalous(self, current: dict) -> bool:
        if len(self.baseline) < 10:
            return False  # Not enough data yet

        for key in current:
            values = [m[key] for m in self.baseline if key in m]
            if not values:
                continue

            # Population mean and standard deviation over the window
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            std_dev = variance ** 0.5

            # Note: a perfectly flat baseline (std_dev == 0) flags any change at all
            if abs(current[key] - mean) > self.threshold_multiplier * std_dev:
                return True

        return False

Token count suddenly 10x normal? Latency spiking? Cost per step went from $0.02 to $0.50? Flag it.
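
To make the 3-sigma rule concrete, here is a worked example using the standard library's `statistics` module. The baseline numbers are invented for illustration, and `z_score` is a hypothetical helper, not part of the detector above:

```python
import statistics


def z_score(baseline, value):
    """How many standard deviations value sits from the baseline mean."""
    mean = statistics.fmean(baseline)
    std_dev = statistics.pstdev(baseline)  # population std dev, as in the detector
    if std_dev == 0:
        # Flat baseline: identical values are fine, anything else is off the chart.
        return 0.0 if value == mean else float("inf")
    return (value - mean) / std_dev


# Illustrative token counts per step, hovering around 2K.
baseline_tokens = [1900, 2000, 2100, 2050, 1950, 2000, 1980, 2020, 2010, 1990]

print(z_score(baseline_tokens, 2100))   # roughly 2: a busy step, under the threshold
print(z_score(baseline_tokens, 20000))  # in the hundreds: the 10x spike, flag it
```

A normal busy step stays under 3. A 10x spike lands hundreds of standard deviations out, so the threshold has plenty of headroom.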

The Automatic Response

Detection is useless without action. Here's what you do:

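import subprocess

# log_incident(), find_last_good_commit() and send_alert() are assumed to be defined elsewhere.
class WatchdogAction:
    def __init__(self, agent, git_repo: str):
        self.agent = agent    # needed so handle_anomaly() can stop it
        self.repo = git_repo  # path to the repo holding the agent's config

    async def handle_anomaly(self, snapshot: AgentSnapshot):
        # 1. Log everything
        await self.log_incident(snapshot)

        # 2. Kill the current execution
        await self.agent.stop()

        # 3. Revert to last known good state
        last_good = await self.find_last_good_commit()
        await self.git_revert(last_good)

        # 4. Notify
        await self.send_alert(snapshot)

    async def git_revert(self, commit_hash: str):
        # Undo every commit made after commit_hash. Reverting commit_hash itself
        # would undo the good state instead of restoring it.
        # This fails if HEAD is already at commit_hash or the range contains merges.
        subprocess.run(["git", "revert", "--no-commit", f"{commit_hash}..HEAD"],
                       cwd=self.repo, check=True)
        subprocess.run(["git", "commit", "-m",
                        "auto-revert: watchdog detected anomaly"],
                       cwd=self.repo, check=True)
        subprocess.run(["git", "push"], cwd=self.repo, check=True)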

Four steps. Log the evidence. Stop the bleeding. Revert the damage. Tell someone.
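
One caveat: `subprocess.run` blocks the event loop while git runs, which stalls every other coroutine in the process. A sketch of a non-blocking alternative using `asyncio.create_subprocess_exec` (`run_command` is a hypothetical helper name):

```python
import asyncio
import sys
from typing import Optional


async def run_command(*argv: str, cwd: Optional[str] = None) -> str:
    """Run a command without blocking the event loop; raise if it exits non-zero."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {stderr.decode().strip()}"
        )
    return stdout.decode().strip()


if __name__ == "__main__":
    # Demo with the current Python interpreter so it runs anywhere.
    print(asyncio.run(run_command(sys.executable, "--version")))
```

Inside the watchdog this would look like `await run_command("git", "push", cwd=repo_path)`, where `repo_path` is whatever path you passed in. A failed push raises instead of passing silently.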

The Hard Parts

This sounds simple. It's not. Two things will bite you.

False positives. Your anomaly detector flags a legit spike in usage. Now you've killed a running agent and reverted configs for nothing. Solution: require multiple consecutive anomalies before acting. Or use a confirmation window.
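
The consecutive-anomalies idea fits in a few lines. A minimal sketch (`ConsecutiveAnomalyGate` and its default of 3 are assumptions, tune them to your traffic):

```python
class ConsecutiveAnomalyGate:
    """Only trip after `required` anomalous checks in a row."""

    def __init__(self, required: int = 3):
        self.required = required
        self.streak = 0

    def observe(self, is_anomalous: bool) -> bool:
        if is_anomalous:
            self.streak += 1
        else:
            # One healthy check resets the count.
            self.streak = 0
        return self.streak >= self.required


gate = ConsecutiveAnomalyGate(required=3)
readings = [True, True, False, True, True, True]
decisions = [gate.observe(r) for r in readings]
print(decisions)  # → [False, False, False, False, False, True]
```

The trade-off is latency. With a 5-second poll and `required=3`, a real incident runs for about 15 seconds before the watchdog acts.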

What's "good"? The agent modifies itself constantly. Which commit is the "last good" one? You need a baseline — a snapshot taken right after deployment, before any self-modification. Mark it as golden.

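import subprocess

class GoldenBaseline:
    def __init__(self):
        self.golden_commit = None

    def mark_golden(self):
        # Record the commit currently checked out (run right after deployment)
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True)
        self.golden_commit = result.stdout.strip()
        # Fails if a "golden" tag already exists; pass -f to move it
        subprocess.run(["git", "tag", "golden", self.golden_commit], check=True)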

What This Looks Like in Practice

An agent runs for 6 hours. It's modified its system prompt 12 times. Token count per step is stable at ~2K. Then it hits a weird edge case. The self_modify tool runs and sets max_tokens to 999999. Next LLM call tries to generate a million tokens. Cost spikes from $0.03 to $15 per step.

The watchdog catches it. Token count is 500x baseline. Latency is 30x normal. It kills the agent, reverts to the golden commit, and sends a Slack message. Total time from anomaly to recovery: 12 seconds.

Without the watchdog? Someone notices the billing alert 4 hours later.

The Missing Piece

You need one more thing. The watchdog itself needs monitoring. If it crashes, you're blind.


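import asyncio

class WatchdogSupervisor:
    def __init__(self, agent, config: WatchdogConfig):
        # AgentWatchdog needs the agent and its config
        self.watchdog = AgentWatchdog(agent, config)
        self.heartbeat_interval = 5  # seconds

    async def run(self):
        while True:
            try:
                await asyncio.wait_for(
                    self.watchdog.monitor(),
                    timeout=self.heartbeat_interval * 3
                )
            except asyncio.TimeoutError:
                # monitor() was still running and has been cancelled; start a fresh run
                continue
            except Exception:
                # Assumed handler: the watchdog crashed, so pause and restart it
                await asyncio.sleep(self.heartbeat_interval)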

---

**Debugging AI agents shouldn't feel like reading The Matrix.** 
Join other engineers who are building reliable autonomous workflows in our community: [TracePilot Discord](https://discord.gg/KzXRAXFM8)