DEV Community

ETB Protocol


Why Your AI Agent Keeps Overreaching — And How to Fix It with a Boundary Contract

A design protocol born from DeFi infrastructure, now applied to AI systems


The Problem

You've built an AI agent. It works — sometimes brilliantly.

But then it starts doing things you didn't ask for.

  • It makes assumptions and acts on them
  • It fills in missing data instead of saying "I don't know"
  • It optimizes when you only asked it to observe
  • It gives confident answers when it should refuse

This isn't a model problem. It's an architecture problem.

Your agent has no boundary contract.


Where This Idea Came From

I built a DeFi risk observer for Aave v3 — a system that watches on-chain positions and reports liquidation risk in real time.

The hardest design decision wasn't the data model or the state machine.

It was this question:

When should the system refuse to output anything at all?

In DeFi, a wrong answer isn't just useless — it can cause real financial loss. So I designed a system that explicitly separates:

  • What is verified (direct from protocol)
  • What is derived (computed from verified data)
  • What is estimated (approximate, labeled as such)
  • What should be refused (uncertain, inconsistent, or unsafe to show)

When I applied this same philosophy to an AI agent I was building for content automation — something completely unrelated to DeFi — the agent's overreach dropped significantly.

The principle transferred. The boundary contract worked.


The Core Principle

Refusal over Uncertainty. Boundary over Prediction. Observability over Automation.

Most AI systems are designed to always produce output. Silence feels like failure. Uncertainty gets smoothed over. Gaps get filled with plausible-sounding content.

The result: agents that confidently do the wrong thing.

A boundary contract inverts this default.


The Four Layers

Every AI output can be classified into one of four trust layers:

1. VERIFIED

Directly observable. The system retrieved this from a reliable source and can confirm it.

"The article was published on June 1, 2026."

2. CONSISTENT

Derived deterministically from verified data. The logic is transparent and repeatable.

"Based on the publication date, this is within the 30-day window."

3. ESTIMATED

An approximation. Useful, but explicitly labeled as such. Not to be treated as fact.

"The reading time is approximately 4 minutes."

4. REFUSED

The system cannot produce a trustworthy output. It withholds the output and states why, rather than saying something wrong.

Output withheld. Reason: source data inconsistent.
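
One way to make the four layers concrete is to attach a layer to every value the agent emits. The sketch below is only an illustration, not part of the published contract: the `Labeled` type, the `render` function, and the bracketed output format are hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TrustLayer(Enum):
    VERIFIED = "VERIFIED"      # retrieved directly from a reliable source
    CONSISTENT = "CONSISTENT"  # derived deterministically from verified data
    ESTIMATED = "ESTIMATED"    # an approximation, must be labeled as such
    REFUSED = "REFUSED"        # no trustworthy output available


@dataclass(frozen=True)
class Labeled:
    """A value tagged with its trust layer (hypothetical type)."""
    layer: TrustLayer
    text: Optional[str] = None    # None when the output is withheld
    reason: Optional[str] = None  # required for REFUSED

    def __post_init__(self) -> None:
        # A refused output carries a reason and no content;
        # every other layer must carry content.
        if self.layer is TrustLayer.REFUSED:
            if self.text is not None or not self.reason:
                raise ValueError("REFUSED needs a reason and no text")
        elif not self.text:
            raise ValueError(f"{self.layer.value} needs text")


def render(item: Labeled) -> str:
    """Format a labeled value so its layer is always visible."""
    if item.layer is TrustLayer.REFUSED:
        return f"[REFUSED] Output withheld. Reason: {item.reason}."
    return f"[{item.layer.value}] {item.text}"


print(render(Labeled(TrustLayer.ESTIMATED,
                     "The reading time is approximately 4 minutes.")))
print(render(Labeled(TrustLayer.REFUSED, reason="source data inconsistent")))
```

Because the constructor rejects a REFUSED value that still carries text, a guess cannot be attached to a refusal by accident.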


The State Model

Pair the trust layer with an observable state:

State                 Meaning
STABLE                Operating within safe boundaries
WATCH                 Approaching a boundary; caution advised
BOUNDARY_APPROACHING  Near-limit; intervention may be needed
DEGRADED              Output possible but quality is reduced
REFUSAL               Output withheld intentionally

These aren't errors. REFUSAL is a feature, not a failure.
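
To illustrate REFUSAL as an ordinary result rather than an error, here is a minimal sketch in which the state travels with the return value instead of being raised as an exception. The `Report` type, the `report_reading` function, and the rule that a missing reading triggers REFUSAL are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    STABLE = "Operating within safe boundaries"
    WATCH = "Approaching a boundary, caution advised"
    BOUNDARY_APPROACHING = "Near-limit, intervention may be needed"
    DEGRADED = "Output possible but quality is reduced"
    REFUSAL = "Output withheld intentionally"


@dataclass(frozen=True)
class Report:
    """An observation paired with its state (hypothetical type)."""
    state: State
    value: Optional[float] = None
    reason: Optional[str] = None


def report_reading(raw: Optional[float]) -> Report:
    """Build a Report for one reading; the rule here is illustrative only."""
    if raw is None:
        # Missing input is reported as a state, not raised as an error.
        return Report(State.REFUSAL, reason="no reading available")
    return Report(State.STABLE, value=raw)


for report in (report_reading(1.25), report_reading(None)):
    print(report.state.name, "-", report.state.value)
```

The caller receives a `Report` in both cases, so a refusal flows through the same code path as a normal result.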


Applying This to AI Agents

Here's a practical example. Suppose your agent summarizes recent news articles.

Without a boundary contract:

  • Missing article → agent invents plausible content
  • Stale data → agent presents it as current
  • Conflicting sources → agent picks one and ignores the other

With a boundary contract:

  • Missing article → REFUSED with reason: "Source unavailable"
  • Stale data → ESTIMATED with label: "Data may be outdated"
  • Conflicting sources → DEGRADED with label: "Sources inconsistent"

The agent becomes honest about what it knows and doesn't know.
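
The three cases above can be sketched as a small classifier that runs before any summary is written. Everything specific here is an assumption for illustration: the `Source` type, the `classify` function, the 24-hour staleness threshold, and the idea that two sources conflict when their headlines differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Source:
    name: str
    headline: Optional[str]  # None means the article could not be fetched
    fetched_at: datetime


def classify(sources: List[Source], now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> Tuple[str, str]:
    """Return (marker, label) for a set of sources covering one story."""
    available = [s for s in sources if s.headline is not None]
    if not available:
        return ("REFUSED", "Source unavailable")
    if len({s.headline for s in available}) > 1:
        return ("DEGRADED", "Sources inconsistent")
    if now - max(s.fetched_at for s in available) > max_age:
        return ("ESTIMATED", "Data may be outdated")
    return ("VERIFIED", "")


now = datetime(2026, 6, 2, tzinfo=timezone.utc)
fresh = Source("wire-a", "Rates held steady", now - timedelta(hours=1))
stale = Source("wire-a", "Rates held steady", now - timedelta(days=3))
other = Source("wire-b", "Rates cut", now - timedelta(hours=2))
missing = Source("wire-c", None, now)

print(classify([missing], now))       # missing article
print(classify([stale], now))         # stale data
print(classify([fresh, other], now))  # conflicting sources
```

The checks run in order of severity, so a missing article is reported before staleness or conflict is even considered.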


The System Prompt Pattern

Here's a minimal implementation in a system prompt:

You are an observer agent. Your role is to report state, not to act.

For every output, classify it as one of:
- VERIFIED: directly confirmed from source
- CONSISTENT: derived from verified data
- ESTIMATED: approximate — label it clearly
- REFUSED: do not output if data is missing, inconsistent, or unsafe

Rules:
- Never fill gaps with assumptions
- Never produce output when sources conflict
- Never optimize, advise, or act — only observe and report
- When in doubt, refuse

Refusal is correct behavior. Silence is safer than a confident wrong answer.

This single addition changed the behavior of my agents more than any other prompt engineering technique I've tried.
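
A prompt alone does not guarantee the model follows it, so it can help to check replies in code. The sketch below assumes something the prompt above does not state: that the model is also asked to start each reply with the layer name followed by a colon. `parse_reply` is a hypothetical helper, and treating an unlabeled reply as refused is a design choice for this example, not part of the prompt.

```python
import re
from typing import Tuple

LAYERS = ("VERIFIED", "CONSISTENT", "ESTIMATED", "REFUSED")

# Matches e.g. "ESTIMATED: about 4 minutes" (assumed reply format).
_LABEL = re.compile(r"^\s*(" + "|".join(LAYERS) + r")\s*:\s*(.*)$", re.DOTALL)


def parse_reply(reply: str) -> Tuple[str, str]:
    """Split a model reply into (layer, body), failing closed."""
    match = _LABEL.match(reply)
    if match is None:
        # No recognizable label: refuse rather than pass the text along.
        return ("REFUSED", "reply did not declare a trust layer")
    layer, body = match.group(1), match.group(2).strip()
    if layer != "REFUSED" and not body:
        return ("REFUSED", "reply declared a layer but had no content")
    return (layer, body)


print(parse_reply("ESTIMATED: The reading time is approximately 4 minutes."))
print(parse_reply("The article is definitely trending."))
```

Failing closed keeps the contract intact even when the model ignores the instructions: anything that does not declare its layer is handled as a refusal.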


Why This Works

The underlying principle is simple:

The protocol restricts transitions, not states.

An AI agent can end up in a bad state through external circumstances — bad data, ambiguous input, conflicting context. That's unavoidable.

What you can control is whether the agent acknowledges that state and handles it explicitly, or papers over it with confident-sounding output.

The boundary contract makes the agent's epistemic state legible — to you, and to downstream systems.
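
One reading of "handles it explicitly" is that a downstream consumer should have a declared behavior for every state and should fail loudly if one is missing. The sketch below shows that idea only; the handler behaviors and the `deliver` function are illustrative assumptions, and the transition rules themselves are not reproduced here.

```python
from enum import Enum
from typing import Callable, Dict


class State(Enum):
    STABLE = "STABLE"
    WATCH = "WATCH"
    BOUNDARY_APPROACHING = "BOUNDARY_APPROACHING"
    DEGRADED = "DEGRADED"
    REFUSAL = "REFUSAL"


# What a downstream consumer does with a payload in each state.
# These particular behaviors are illustrative assumptions.
HANDLERS: Dict[State, Callable[[str], str]] = {
    State.STABLE: lambda payload: payload,
    State.WATCH: lambda payload: f"{payload} (caution advised)",
    State.BOUNDARY_APPROACHING: lambda payload: f"{payload} (near limit)",
    State.DEGRADED: lambda payload: f"{payload} (reduced quality)",
    State.REFUSAL: lambda payload: "Output withheld.",
}

# Fail at import time if any state lacks an explicit handler.
missing = set(State) - set(HANDLERS)
if missing:
    raise RuntimeError(f"unhandled states: {sorted(s.name for s in missing)}")


def deliver(state: State, payload: str) -> str:
    """Pass a payload downstream according to its declared state."""
    return HANDLERS[state](payload)


print(deliver(State.DEGRADED, "Summary of 3 articles"))
print(deliver(State.REFUSAL, "Summary of 3 articles"))
```

Adding a sixth state without a handler would stop the program at startup, which keeps an unacknowledged state from slipping through silently.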


What I'm Releasing

I've formalized this into a document:

Boundary Contract for AI Systems v0.1

It includes:

  • The full trust layer specification (VERIFIED / CONSISTENT / ESTIMATED / REFUSED)
  • The state model with transition rules
  • System prompt templates for common agent patterns
  • The Non-Advisory Integrity Clause (what your agent must never do)
  • Refusal protocol with trigger conditions

Available on Gumroad: https://arcthree.gumroad.com/l/etb-boundary-contract


Final Thought

The most reliable AI systems I've seen have one thing in common:

They know what they don't know.

Building in that awareness requires explicit design. It doesn't happen by default.

A boundary contract is how you make it intentional.


Built on the UEH (Universal Exchange Adapters) design philosophy.
Originally developed for DeFi risk observation infrastructure.
GitHub: github.com/ueh-labs/ueh-observer
