The Trust Boundary Rule: What Your AI Agent Should Never Do Without You
Most teams think about AI agents in terms of capability: what can it do? The better question is: what should it never do without human sign-off?
This is the trust boundary — and most agent configs don't have one.
What Happens Without a Trust Boundary
An agent without a trust boundary will optimize for task completion. That sounds good. It's not.
Optimizing for task completion means:
- Sending emails you didn't review
- Deleting files that seemed redundant
- Making API calls that cost money
- Publishing content that wasn't ready
None of these are malicious. They're exactly what you asked for. The agent just didn't know where to stop.
The Four-Zone Trust Model
Every action your agent can take belongs in one of four zones:
Zone 1 — Autonomous (do it)
Read files, check APIs, analyze data, write drafts, log decisions. Low-stakes, reversible, doesn't leave the system.
Zone 2 — Log and Proceed
Cost-incurring operations under a threshold (e.g., API calls under $0.10) and file writes inside the workspace. Agent executes but writes a clear log entry. Reviewable.
Zone 3 — Flag Before Acting
Anything that touches external systems: sending messages, posting content, making purchases. Agent drafts the action, writes it to outbox.json, waits for confirmation.
Zone 4 — Never Without Explicit Permission
Deletion, financial transfers, credential changes, anything touching production data. Hard-coded stop. No exceptions.
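As a rough illustration, the four zones can be expressed as a lookup table plus one cost check. This is a minimal sketch, not a prescribed implementation: the action names, the `classify` function, and the two fallback rules (unknown actions land in Zone 4; a cost at or above the threshold escalates to Zone 3) are assumptions made for the example.

```python
from enum import IntEnum


class Zone(IntEnum):
    AUTONOMOUS = 1                # do it
    LOG_AND_PROCEED = 2           # do it, but leave a log entry
    FLAG_BEFORE_ACTING = 3        # draft it and wait for confirmation
    NEVER_WITHOUT_PERMISSION = 4  # hard stop


# Hypothetical action names; substitute whatever your agent actually exposes.
ZONE_BY_ACTION = {
    "read_file": Zone.AUTONOMOUS,
    "call_api": Zone.AUTONOMOUS,
    "write_draft": Zone.AUTONOMOUS,
    "send_message": Zone.FLAG_BEFORE_ACTING,
    "publish_post": Zone.FLAG_BEFORE_ACTING,
    "make_purchase": Zone.FLAG_BEFORE_ACTING,
    "delete_file": Zone.NEVER_WITHOUT_PERMISSION,
    "change_credentials": Zone.NEVER_WITHOUT_PERMISSION,
}

COST_THRESHOLD_USD = 0.10  # the example threshold from Zone 2


def classify(action: str, cost_usd: float = 0.0) -> Zone:
    """Return the zone for an action, taking its estimated cost into account."""
    # Assumption: anything not listed falls into the strictest zone.
    zone = ZONE_BY_ACTION.get(action, Zone.NEVER_WITHOUT_PERMISSION)
    # Assumption: a cost at or above the threshold is treated like a purchase.
    if zone == Zone.AUTONOMOUS and cost_usd > 0:
        if cost_usd < COST_THRESHOLD_USD:
            return Zone.LOG_AND_PROCEED
        return Zone.FLAG_BEFORE_ACTING
    return zone
```

With this table, `classify("call_api", 0.05)` lands in Zone 2 and `classify("delete_file")` lands in Zone 4. Defaulting unlisted actions to the strictest zone is a design choice: a new capability has to be placed in a zone deliberately before the agent can use it freely.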
What This Looks Like in Practice
In a SOUL.md (identity file):
## Trust Boundary
Autonomous: read, analyze, draft, log
Log and proceed: API calls under $0.10, file writes to /workspace
Flag before acting: any outbound message, any public post, any purchase
Never without permission: delete files, change credentials, touch /prod
Four lines. Every edge case in your agent's life maps to one of them.
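Those four lines are instructions the agent reads. If you also want the code around the agent to see them, one option is to parse the section into a structure a wrapper can check. The sketch below assumes the `Label: item, item` layout shown above; `parse_trust_boundary` is a hypothetical helper, not part of any SOUL.md tooling.

```python
from __future__ import annotations

SOUL_MD = """\
## Trust Boundary
Autonomous: read, analyze, draft, log
Log and proceed: API calls under $0.10, file writes to /workspace
Flag before acting: any outbound message, any public post, any purchase
Never without permission: delete files, change credentials, touch /prod
"""


def parse_trust_boundary(text: str) -> dict[str, list[str]]:
    """Collect the 'Label: a, b, c' lines under the '## Trust Boundary' heading."""
    rules: dict[str, list[str]] = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Only lines under the trust boundary heading are parsed.
            in_section = stripped[3:].strip().lower() == "trust boundary"
            continue
        if in_section and ":" in stripped:
            label, _, items = stripped.partition(":")
            rules[label.strip().lower()] = [
                item.strip() for item in items.split(",") if item.strip()
            ]
    return rules


rules = parse_trust_boundary(SOUL_MD)
```

Here `rules["flag before acting"]` holds the three flagged categories as plain strings. Mapping those phrases onto concrete tool calls is still up to your wrapper; the parser only keeps the file and the code reading from the same source.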
Why This Works
The trust boundary doesn't limit your agent's usefulness. It focuses it.
An agent that knows it can't send emails without review will draft better emails, because it knows it only gets one shot when you approve. An agent with a hard stop on deletion will find alternative approaches, such as archiving or flagging for review.
Constraints create discipline. The same is true for agents as it is for humans.
The Five-Minute Audit
For any agent you're running right now:
1. List every action it can take
2. Assign each to a zone (1-4)
3. Check your config: does it actually enforce zones 3 and 4?
4. Add the four-line trust boundary to your SOUL.md
5. Test zone 3: trigger a flaggable action and confirm it writes to outbox.json instead of firing
If step 5 fails, you don't have a trust boundary. You have a to-do list.
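The zone 3 test can be automated. The sketch below is one way to do it under stated assumptions: `execute` stands in for whatever dispatches your agent's actions, the `FLAGGED` set and the handler names are hypothetical, and the outbox entry layout (a JSON list of objects with a `status` field) is invented for the example rather than taken from a published outbox.json format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical zone 3 action types.
FLAGGED = {"send_email", "publish_post", "make_purchase"}


def execute(action_type: str, payload: dict, handlers: dict, outbox: Path) -> str:
    """Run an action, or queue it in the outbox if it is flagged."""
    if action_type in FLAGGED:
        entries = json.loads(outbox.read_text()) if outbox.exists() else []
        entries.append({"type": action_type, "payload": payload, "status": "pending"})
        outbox.write_text(json.dumps(entries, indent=2))
        return "queued"
    handlers[action_type](payload)
    return "executed"


def audit_zone3() -> bool:
    """Trigger a flaggable action and confirm it was queued, not fired."""
    sent = []  # a fake sender records anything that actually fires
    with tempfile.TemporaryDirectory() as tmp:
        outbox = Path(tmp) / "outbox.json"
        result = execute(
            "send_email",
            {"to": "someone@example.com", "body": "draft"},
            {"send_email": sent.append},
            outbox,
        )
        queued = json.loads(outbox.read_text())
    return (
        result == "queued"
        and sent == []
        and len(queued) == 1
        and queued[0]["status"] == "pending"
    )
```

`audit_zone3()` returns `True` only when the fake sender was never called and exactly one pending entry reached the outbox. Swapping the fake sender for a real one is the one thing this test should never need.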
The Real Cost of Getting This Wrong
I've heard from teams who discovered their agents had been sending automated emails for days before anyone noticed, or making API calls in an infinite loop because the exit condition wasn't quite right.
Those are recoverable. The teams that built trust boundaries before deploying never had those stories.
The full trust boundary template — including the SOUL.md pattern, outbox.json format, and escalation triggers — is in the Ask Patrick Library at askpatrick.co. Updated nightly as new patterns emerge from real deployments.