The Problem: Agents Guess When They Should Pause
AI agents fail on ambiguity. Not because they're dumb — because they're decisive. When a task could mean two things, the agent picks one and runs. It doesn't stop to ask. It doesn't flag the ambiguity. It just acts.
Sometimes it guesses right. Often it doesn't.
This is one of the most common failure patterns I see in production agent systems: a perfectly capable agent that takes confident action on an ambiguous instruction — and causes more work than if it had done nothing.
Why This Happens
Most agent configs tell the agent what to do. Few tell it when to stop and ask.
The implicit assumption is that the agent will recognize ambiguity and handle it gracefully. But agents are completion machines. Their default is to complete — to take the most plausible interpretation and act on it.
Without an explicit instruction to pause, the agent moves.
The One-Line Fix
Add this to your SOUL.md:
```
clarification_protocol: If task scope is ambiguous and the action is irreversible, write the ambiguity to outbox.json and stop.
```
That's it. One line. It does three things:
- Defines when to trigger: Ambiguity + irreversibility = pause
- Defines what to do: Write to outbox.json (not interrupt, not crash — document)
- Defines the outcome: Stop. Don't guess.
What Goes in outbox.json
```json
{
  "type": "clarification_needed",
  "task": "Archive old project files",
  "ambiguity": "'Old' could mean >30 days or >90 days. Both are plausible.",
  "proposed_interpretations": [
    "Archive files not modified in >30 days",
    "Archive files not modified in >90 days"
  ],
  "recommended": "Archive files not modified in >90 days (safer)",
  "timestamp": "2026-03-08T08:30:00Z"
}
```
The agent documents what it knows, what it's uncertain about, and what it would do if forced to choose — then stops.
This is better than:
- Crashing with an error
- Guessing silently
- Sending an interrupt at 3 AM asking for clarification
The Irreversibility Test
Not all ambiguity requires a pause. The trigger is the combination of ambiguity + irreversibility.
Ask two questions:
- Could this instruction mean more than one thing?
- Is the action reversible?
If yes + no → pause and write to outbox.json.
If yes + yes → pick the safest interpretation, log it, proceed.
If no → proceed normally.
This keeps the agent moving on low-stakes decisions while protecting against the cases where a wrong guess causes real damage.
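The two-question test maps cleanly to a small dispatch function. This is a sketch of the decision logic described above; the function and return values are illustrative, not from any particular framework:

```python
def decide(ambiguous: bool, reversible: bool) -> str:
    """Apply the irreversibility test: ambiguity + irreversibility = pause."""
    if ambiguous and not reversible:
        return "pause"         # write to outbox.json and stop
    if ambiguous and reversible:
        return "proceed_safe"  # pick the safest interpretation, log it
    return "proceed"           # unambiguous: act normally

decide(True, False)   # -> "pause"
decide(True, True)    # -> "proceed_safe"
decide(False, True)   # -> "proceed"
```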
Common Irreversible Actions to Flag
- Sending external communications (emails, messages, posts)
- Deleting or archiving files without explicit scope
- Making financial transactions of any kind
- Publishing content to public channels
- Modifying configuration files that affect other systems
- Adding or removing user permissions
For these, ambiguity should always trigger a pause.
The Compounding Benefit: An Audit Trail
Every time your agent writes a clarification to outbox.json, you get a free audit trail of:
- Which task descriptions were unclear
- Which ambiguities came up repeatedly
- Which interpretations the agent would have chosen
After two weeks, you'll know exactly which parts of your workflow need clearer instructions. It's a diagnostic tool as much as a safety tool.
Review your outbox.json entries weekly. The recurring ambiguities are where your agent config needs work.
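The weekly review can be partly automated. A minimal sketch that counts recurring ambiguities across outbox.json entries, assuming the entry shape shown earlier:

```python
import json
from collections import Counter

def summarize_outbox(path="outbox.json"):
    """Count how often each ambiguity recurs, to find unclear instructions."""
    try:
        with open(path) as f:
            entries = json.load(f)
    except FileNotFoundError:
        return Counter()  # no flagged ambiguities yet
    return Counter(
        e["ambiguity"]
        for e in entries
        if e.get("type") == "clarification_needed"
    )
```

Anything with a count above one is a candidate for a clearer instruction in your agent config.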
Why "Stop" Is Better Than "Ask"
You might think: why not have the agent ask for clarification in real time?
Two reasons:
Timing: Most agents run on cron schedules or in the background. Real-time clarification means blocking the task until someone responds — often hours later.
Interruption cost: Getting interrupted to resolve an ambiguity breaks flow. Reading an outbox.json entry at your next review cycle doesn't.
The outbox pattern lets you batch your clarifications. One review window, multiple resolved ambiguities, agent resumes.
Full Implementation Pattern
In SOUL.md:
```markdown
## Clarification Protocol

Before acting on any ambiguous instruction:

1. Check: could this mean more than one thing?
2. Check: is the action irreversible?
3. If both yes: write to outbox.json and stop.
4. If ambiguous but reversible: pick safest interpretation, log decision, proceed.
5. Never guess on irreversible actions. Never.
```
In your task runner:
```python
import json
from datetime import datetime, timezone

def flag_ambiguity(task, ambiguity, interpretations, recommended):
    """Record an ambiguity in outbox.json and signal the runner to stop."""
    entry = {
        "type": "clarification_needed",
        "task": task,
        "ambiguity": ambiguity,
        "proposed_interpretations": interpretations,
        "recommended": recommended,
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    }
    try:
        with open("outbox.json", "r") as f:
            outbox = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        outbox = []  # start fresh if the outbox is missing or corrupted
    outbox.append(entry)
    with open("outbox.json", "w") as f:
        json.dump(outbox, f, indent=2)
    return False  # Signal: stop processing
```
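In the runner loop, the return value gates execution. A self-contained sketch of how that wiring might look; the task dictionary shape and the stand-in `flag_ambiguity` are illustrative:

```python
# Minimal stand-in so this example runs on its own; in practice this is
# the flag_ambiguity function from the task runner above.
def flag_ambiguity(task, ambiguity, interpretations, recommended):
    print(f"PAUSED: {task} -- {ambiguity}")
    return False

def run_task(task, execute):
    """Run a task, pausing on ambiguous + irreversible combinations."""
    if task.get("ambiguous") and not task.get("reversible"):
        return flag_ambiguity(
            task["description"],
            task["ambiguity"],
            task["options"],
            task["options"][-1],  # assumption: last option listed is safest
        )
    execute(task)
    return True  # task ran; nothing was flagged
```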
The Bottom Line
Ambiguity is not a model failure. It's a workflow failure. Your instructions weren't clear enough, or the task was genuinely underdetermined.
The clarification protocol doesn't fix bad instructions — but it stops your agent from making that problem worse by guessing on irreversible actions.
One line in SOUL.md. Pause instead of guess. That's the whole thing.
If you're building AI agent systems and want battle-tested configs for patterns like this, askpatrick.co has a library of real-world agent configurations updated nightly.