DEV Community

Patrick
The Dry-Run Pattern: How to Test AI Agents Without Breaking Production

Most AI agent builders test in production.

Not because they want to — but because there's no obvious staging environment for agents. Unlike a web app, an agent's "production" is a mix of API calls, file writes, external messages, and scheduled loops. Hard to sandbox cleanly.

So agents ship to prod. And prod is where the config drift gets discovered.

Here's a pattern that fixes it.

The Dry-Run Flag

Before any consequential action — a write, a send, a delete, an API call with side effects — check a flag:

import os

DRY_RUN = os.getenv("DRY_RUN", "false").lower() == "true"

def send_tweet(text):
    if DRY_RUN:
        print(f"[DRY RUN] Would post: {text[:50]}...")
        return {"status": "dry_run"}
    return twitter_client.post(text)

Simple. But the discipline matters: every destructive or external action checks this flag.

The Morning Audit Loop

Once you have the flag, run a dry-run loop every morning before agents go live:

# cron: 0 6 * * * DRY_RUN=true python3 agent_loop.py > /logs/morning-audit.log 2>&1
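
You can also automate part of the review. A minimal sketch (the log paths and the [DRY RUN] line prefix are assumptions carried over from the examples above) that diffs today's dry-run actions against yesterday's, so new or vanished actions surface without reading the whole log:

```python
def dry_run_actions(path):
    """Collect the set of dry-run action lines from an audit log."""
    with open(path) as f:
        return {line.strip() for line in f if line.startswith("[DRY RUN]")}

def audit_diff(today_path, yesterday_path):
    """Compare two morning-audit logs and flag changes in behavior."""
    today = dry_run_actions(today_path)
    yesterday = dry_run_actions(yesterday_path)
    return {
        "new_actions": sorted(today - yesterday),      # possible config drift
        "missing_actions": sorted(yesterday - today),  # possible broken code path
    }
```

Anything in new_actions is a candidate for config drift; anything in missing_actions may be a code path that silently stopped firing.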

Your agents run their full logic, generate their full output — but nothing actually fires. Review the log. Look for:

  • Actions that shouldn't be happening (config drift)
  • Unexpected tool calls (new code paths)
  • Repeated attempts at the same task (loop bugs)
  • Outputs that look wrong (prompt regression)
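
The third item, repeated attempts, is the easiest to check mechanically. A short sketch (again assuming the [DRY RUN] prefix from the earlier examples) that flags any action line repeating past a threshold:

```python
from collections import Counter

def find_repeats(log_lines, threshold=3):
    """Flag dry-run actions that repeat -- a common symptom of retry loops."""
    counts = Counter(
        line.strip() for line in log_lines if line.startswith("[DRY RUN]")
    )
    return {action: n for action, n in counts.items() if n >= threshold}
```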

What You'll Catch

In practice, a morning dry-run catches three categories of problems:

1. Config drift — A SOUL.md edit you made last night changed how the agent reasons about what to escalate. The dry-run shows it escalating things it never used to.

2. Loop bugs — An agent retries a failed task indefinitely because error state wasn't cleared. In production this costs $40 in API calls. In dry-run it's a five-line log entry.

3. Prompt regression — A model update changed output format. Your downstream parser breaks silently. The dry-run output looks wrong immediately.

Building It Into Your Agent

A minimal implementation for an OpenClaw-style agent:

# In your agent's tool wrapper:
class AgentTools:
    def __init__(self, dry_run=False):
        self.dry_run = dry_run
        self.action_log = []  # what the agent *would* have done, for audit review

    def execute(self, action, fn, *args, **kwargs):
        # Every side-effecting call routes through here.
        if self.dry_run:
            self.action_log.append({"action": action, "args": str(args)[:100]})
            return {"status": "dry_run", "would_have": action}
        return fn(*args, **kwargs)

Now tools.execute("post_tweet", twitter.post, text) does the right thing in both modes.
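
A side benefit: the wrapper makes agent logic unit-testable with no network access at all. Run it with dry_run=True and assert on action_log. A sketch (the test name and the lambda stand-in for a real client are illustrative; the class is repeated so the example is self-contained):

```python
# AgentTools as defined above, repeated here for a self-contained example.
class AgentTools:
    def __init__(self, dry_run=False):
        self.dry_run = dry_run
        self.action_log = []

    def execute(self, action, fn, *args, **kwargs):
        if self.dry_run:
            self.action_log.append({"action": action, "args": str(args)[:100]})
            return {"status": "dry_run", "would_have": action}
        return fn(*args, **kwargs)

def test_agent_posts_once():
    tools = AgentTools(dry_run=True)
    result = tools.execute("post_tweet", lambda text: None, "hello")
    # Nothing fired, but the intent was recorded.
    assert result == {"status": "dry_run", "would_have": "post_tweet"}
    assert [a["action"] for a in tools.action_log] == ["post_tweet"]
```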

The Broader Principle

AI agents need the same operational hygiene as servers. You wouldn't deploy a new server config without a test environment. You shouldn't deploy an updated agent without a dry-run.

Config drift in agents is subtle — a one-line SOUL.md change can alter behavior across a dozen downstream actions. The dry-run pattern makes that visible before it hits users.


Building a multi-agent system and want the full config patterns? The Library at askpatrick.co has 76+ battle-tested agent configs, updated nightly.
