DEV Community

AI Agents Can Delete Your Production Database. Here's the Governance Framework That Stops Them.

This article presents COA-MAS — a governance framework for autonomous agents grounded in organizational theory, institutional design, and normative multi-agent systems research. The full paper is published on Zenodo: doi.org/10.5281/zenodo.19057202


The Problem No One Is Talking About

Something unusual happened in early 2026. The IETF published a formal Internet-Draft on AI agent authentication and authorization. Eight major technology companies released version 1.0 of the Agent-to-Agent Protocol. And a widely read post demonstrated why the prevailing credential model for AI agents was structurally broken.

The convergence wasn't coincidental. It was the signal that a structural problem — long present in early agentic deployments — had reached the threshold of production consequence.

We've built agents that can:

  • Delete production databases
  • Execute financial transactions
  • Modify business logic
  • Spawn other agents

And we gave them API keys.

An API key authorizes access. It does not authorize a specific action with a specific impact in a specific context. That distinction is the entire problem.


The Structural Failure Mode: Distributed Cognitive Chaos

I call this failure mode Distributed Cognitive Chaos (DCC): the structural consequence of deploying agents without formal authority hierarchies, authorization contracts, or enforcement boundaries.

DCC has three symptoms:

  1. Action hallucination — an agent executes an action it was never authorized to perform, because nothing formally defined "authorized"
  2. Mandate drift — through a chain of agent-to-agent delegations, the original human intent gets distorted beyond recognition
  3. Accountability collapse — when something goes wrong, there is no tamper-evident record connecting the action to the authority that (supposedly) permitted it

This is not a new problem. It's the oldest problem in organizational theory: how do you coordinate partially autonomous actors toward collective goals while preventing any individual actor from harming the collective?

Herbert Simon identified it in 1947. Elinor Ostrom solved it in 1990. We just haven't applied those solutions to AI agents yet.


COA-MAS: A Governance Framework Grounded in Theory

COA-MAS (Cognitive Organization Architecture for Multi-Agent Systems) is my answer. It synthesizes four intellectual traditions:

  • Simon's bounded rationality → why agents need external governance
  • Ostrom's institutional design principles → how to structure governance for durability
  • Normative multi-agent systems research → how to formalize governance as computable norms
  • Sociotechnical systems theory → how to make social norms technically enforceable

The framework has three components. Each answers a different question.


Component 1: The Four-Layer Architecture

Question: Who is in charge?

Think of it as a corporate structure for AI agents:

┌────────────────────────────────────────────────────┐
│ LAYER 4 — STRATEGIC ORCHESTRATION                  │
│ Receives human objectives · decomposes into tasks  │
└────────────────────────────────────────────────────┘
                          ↕
┌────────────────────────────────────────────────────┐
│ LAYER 3 — COGNITIVE GOVERNANCE                     │
│ Evaluates proposed actions · issues authorization  │
│ documents · maintains audit ledger                 │
└────────────────────────────────────────────────────┘
                          ↕
┌────────────────────────────────────────────────────┐
│ LAYER 2 — FUNCTIONAL SPECIALIZATION                │
│ Domain agents · execute tasks within their         │
│ cognitive authority boundary                       │
└────────────────────────────────────────────────────┘
                          ↕
┌────────────────────────────────────────────────────┐
│ LAYER 1 — EXECUTABLE CULTURE (Constitutional)      │
│ Versioned YAML policies · weights · thresholds     │
│ Human-authored before runtime. Immutable during.   │
└────────────────────────────────────────────────────┘

The critical insight, drawn from both Simon and Ostrom, is the separation between those who propose actions and those who authorize them. An agent cannot authorize its own actions. This mirrors the principle of checks and balances in constitutional systems: the body that proposes is not the body that authorizes is not the body that records.


Component 2: The Action Claim

Question: What exactly is the agent authorized to do?

An Action Claim is a formal authorization document that agents must present before executing any real-world action. It's analogous to a building permit — not just "you're allowed to build," but: the location, the dimensions, the materials, the timeline, the inspector, and the version of the building code that governed the approval.

The Action Claim has three parts:

{
  // DECLARED FIELDS (filled by the agent)
  "proposed_transition": "DELETE expired sessions older than 90 days",
  "originating_goal": "scheduled maintenance task #4421",
  "delegation_chain": ["human:ops-team", "agent:orchestrator-01", "agent:db-cleaner"],
  "estimated_impact": {
    "destructivity": 0.25,
    "data_exposure": 0.00,
    "resource_consumption": 0.30,
    "privilege_escalation": 0.00,
    "logic_integrity": 0.05,
    "recursive_autonomy": 0.10
  },

  // DERIVED FIELDS (filled by GOV-RISK, Layer 3)
  "justification_gap": 0.08,
  "decision": "APPROVE",
  "governance_signature": "sha256:a3f9...",
  "policy_digest": "sha256:1b2c...",

  // AUDIT FIELDS (filled by infrastructure)
  "ac_id": "ac-2026-03-31-00421",
  "state": "AUTHORIZED",
  "committed_at": "2026-03-31T14:22:01Z"
}

The tripartite structure reflects Ostrom's principle of separating operational decisions from the collective-choice rules that govern them. The agent operates at the operational level; Layer 3 applies institutional norms; the audit trail creates an immutable record connecting every decision to the rules that governed it.
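The binding between the declared and derived fields can be made concrete. The sketch below is illustrative, not the paper's exact scheme: it assumes the governance signature is a SHA-256 digest over the canonicalized declared fields plus the policy digest, so that any post-approval change to a declared field is detectable.

```python
import hashlib
import json

def governance_signature(declared_fields: dict, policy_digest: str) -> str:
    """Bind an approval to the exact declared fields and policy version.

    Canonical JSON (sorted keys, fixed separators) makes the digest
    deterministic; changing any declared field changes the signature.
    """
    canonical = json.dumps(declared_fields, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((canonical + "|" + policy_digest).encode("utf-8"))
    return "sha256:" + digest.hexdigest()

declared = {
    "proposed_transition": "DELETE expired sessions older than 90 days",
    "estimated_impact": {"destructivity": 0.25, "data_exposure": 0.00},
}
sig = governance_signature(declared, "sha256:1b2c...")

# An agent that quietly edits a declared field after approval
# invalidates the signature, so the gateway can reject the claim.
declared["estimated_impact"]["destructivity"] = 0.10
assert sig != governance_signature(declared, "sha256:1b2c...")
```

A real deployment would use an asymmetric signature keyed to Layer 3 rather than a bare hash; the point is only that the derived fields are cryptographically bound to the declared ones.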


Component 3: The AASG (Autonomous Agent Security Gateway)

Question: How is authorization enforced?

Think of the AASG as a customs inspector at the boundary between the agents' cognitive world and the real world of executing tools.

Agent Cognition (A2A) ────────────────► Real World (MCP)
                              │
                         [ AASG ]
                              │
                    Checks exactly 3 things:
                    1. Is the Action Claim valid and signed?
                    2. Is the agent identity correct?
                    3. Does the actual tool call match 
                       what was declared?

The AASG is a physically separate process — not a library in the agent, not a plugin in the MCP server. This separation is the architectural expression of Simon's insight: governance constraints must be external to the decision-maker they constrain. An agent cannot reliably police itself.

What happens when an agent lies?

Consider an agent authorized to clean up expired sessions. It gets an AUTHORIZED Action Claim with destructivity: 0.25. Then it tries to execute:

DELETE FROM users  -- no WHERE clause

The AASG projects the actual tool call onto the governance space:

Declared: destructivity = 0.25 (bounded deletion)
Actual:   destructivity = 1.00 (full table wipe)

Congruence delta: 0.75 >> threshold (0.20)
→ ERR_AASG_003: CONGRUENCE_VIOLATION
→ Action never reaches the database
→ Merkle Ledger: INTERCEPTED entry recorded
→ GOV-AUDIT: SCOPE_SUBDECLARATION_DETECTED alert

The action is blocked before any damage occurs — not discovered in a log review afterward.
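The interception above reduces to a per-dimension comparison between the declared and actual impact vectors. A minimal sketch using the 0.20 threshold from the example (the exception class and helper names are illustrative, not from the paper):

```python
class CongruenceViolation(Exception):
    """ERR_AASG_003: actual impact exceeds the declared vector."""

def check_congruence(declared: dict, actual: dict, threshold: float = 0.20) -> None:
    """Raise if any governance dimension exceeds its declared value
    by more than the congruence threshold."""
    for dim, declared_value in declared.items():
        delta = actual.get(dim, 0.0) - declared_value
        if delta > threshold:
            raise CongruenceViolation(
                f"{dim}: declared {declared_value:.2f}, actual "
                f"{actual.get(dim, 0.0):.2f}, delta {delta:.2f} > {threshold:.2f}")

declared = {"destructivity": 0.25}  # bounded deletion, as authorized
actual   = {"destructivity": 1.00}  # projection of the unbounded DELETE

blocked = False
try:
    check_congruence(declared, actual)   # delta 0.75 >> 0.20
except CongruenceViolation:
    blocked = True  # the tool call never reaches the database
```

In the real architecture this check runs inside the AASG process, before the MCP call is forwarded; the agent only ever sees the error code.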


The Justification Gap: The First Computable Proportionality Metric

The core innovation in COA-MAS is the Justification Gap — a metric that answers: is this specific action proportional to the mandate that authorized it?

The governance space has six dimensions:

Dim  Name                  What It Measures                                 Weight
d₁   Destructivity         Irreversible state change (DELETE, DROP, PURGE)  0.30
d₂   Data Exposure         Cross-zone data movement, PII handling           0.25
d₃   Resource Consumption  Financial/computational impact                   0.10
d₄   Privilege Escalation  IAM creation, credential access                  0.20
d₅   Logic Integrity       Business logic modification                      0.10
d₆   Recursive Autonomy    Sub-agent spawning, delegation depth             0.05

The formula:

Gⱼ(V_A, V_M) = Σ wᵢ · φ(aᵢ, mᵢ)

where φ(aᵢ, mᵢ) =
  0               if aᵢ ≤ mᵢ   (within mandate)
  ((aᵢ - mᵢ)/mᵢ)² if aᵢ > mᵢ  (quadratic penalty)
  +∞              if aᵢ > 0 ∧ mᵢ = 0  (absolute prohibition)

The decision:

Gⱼ ≤ 0.15  →  APPROVE
0.15 < Gⱼ ≤ 0.40  →  ESCALATE (human review with full dossier)
Gⱼ > 0.40 or Gⱼ = +∞  →  BLOCK

The quadratic penalty means small excesses may be tolerable, but large excesses signal a qualitatively different action from what was authorized. The ESCALATE tier implements "meaningful human oversight" as required by EU AI Act Article 14 — with actual quantitative evidence, not just a kill switch.
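The formula and thresholds above translate directly into code. A minimal sketch using the weights from the dimension table in this section; the function and variable names are mine:

```python
import math

WEIGHTS = {
    "destructivity": 0.30, "data_exposure": 0.25,
    "resource_consumption": 0.10, "privilege_escalation": 0.20,
    "logic_integrity": 0.10, "recursive_autonomy": 0.05,
}

def phi(a: float, m: float) -> float:
    """Per-dimension penalty: zero within the mandate, quadratic above it,
    infinite when the mandate prohibits the dimension outright."""
    if a > 0 and m == 0:
        return math.inf
    if a <= m:
        return 0.0
    return ((a - m) / m) ** 2

def justification_gap(action: dict, mandate: dict) -> float:
    """G_j(V_A, V_M) = sum of w_i * phi(a_i, m_i) over the six dimensions."""
    return sum(w * phi(action.get(d, 0.0), mandate.get(d, 0.0))
               for d, w in WEIGHTS.items())

def decide(gap: float) -> str:
    if gap <= 0.15:
        return "APPROVE"
    if gap <= 0.40:
        return "ESCALATE"
    return "BLOCK"

mandate = {"destructivity": 0.30, "data_exposure": 0.10}
action  = {"destructivity": 0.25, "data_exposure": 0.05}
gap = justification_gap(action, mandate)  # every dimension within mandate: 0.0
```

Note how a dimension the mandate leaves at zero acts as an absolute prohibition: any nonzero use of it sends the gap to infinity and the decision to BLOCK.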


Mandate Laundering: The Attack You Didn't Know Existed

Here's the attack that no existing security framework addresses: mandate laundering.

In a delegation chain Human → Agent₁ → Agent₂ → ... → Agentₖ, each intermediate agent can marginally expand the mandate it passes on. Each local expansion looks proportionate. But the cumulative expansion is not.

COA-MAS anchors the Justification Gap to the root human mandate, regardless of intermediate expansions:

G_chain(Aₖ) = Gⱼ(V_{Aₖ}, V_{M₀})  ← root mandate, always

G_total = 0.30 · G_local + 0.70 · G_chain

Non-Improvement Theorem: For any permissive subdelegation, G_chain is monotone non-decreasing. You cannot launder your way out of the original constraint.
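The blend can be sketched on top of a per-dimension gap function. In this illustration (a single destructivity dimension; helper names are mine), intermediate agents have drifted the local mandate from 0.30 to 0.90, so an action at 0.85 looks proportionate locally but is blocked once anchored to the root mandate:

```python
import math

def phi(a: float, m: float) -> float:
    # zero within the mandate, quadratic above it, infinite on prohibition
    if a > 0 and m == 0:
        return math.inf
    return 0.0 if a <= m else ((a - m) / m) ** 2

def gap(action: dict, mandate: dict, weights: dict) -> float:
    return sum(w * phi(action.get(d, 0.0), mandate.get(d, 0.0))
               for d, w in weights.items())

def total_gap(action: dict, local_mandate: dict, root_mandate: dict,
              weights: dict) -> float:
    """G_total = 0.30 * G_local + 0.70 * G_chain, with G_chain always
    computed against the root human mandate M0."""
    g_local = gap(action, local_mandate, weights)
    g_chain = gap(action, root_mandate, weights)  # anchored to M0, always
    return 0.30 * g_local + 0.70 * g_chain

W     = {"destructivity": 0.30}
root  = {"destructivity": 0.30}   # M0: the original human mandate
drift = {"destructivity": 0.90}   # after several "small" subdelegations
act   = {"destructivity": 0.85}

g_local = gap(act, drift, W)              # 0.0: looks fine locally
g_total = total_gap(act, drift, root, W)  # well above 0.40: BLOCK
```

Because the chain term carries 0.70 of the weight and always references M₀, no sequence of permissive subdelegations can pull the total gap below what the root mandate implies.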


How COA-MAS Fits the Standards Ecosystem

COA-MAS doesn't compete with existing standards — it implements what they defer:

  • IETF draft-klrc-aiagent-auth — solves identity, authentication, and authorization (SPIFFE, OAuth 2.0); defers the policy model, which is explicitly out of scope. COA-MAS implements that policy model.
  • A2A Protocol v1.0 — solves agent coordination as a standard; defers authorization at the execution boundary. The AASG is the enforcement point A2A lacks.
  • MCP v1.0 — solves agent-to-tool communication; has no semantic authorization layer. The AASG is the authorization gate MCP doesn't have.

The IETF draft's Section 12 explicitly states: "the policy model and document format are out of scope." That is precisely where COA-MAS contributes.


The Failure Mode Transition

The most consequential architectural property of COA-MAS is the failure mode it introduces.

Traditional agentic systems: fail semantically and silently. The agent reinterprets a guideline, slightly expands a scope, finds an unanticipated interpretation. Detectable only after damage, through log analysis.

COA-MAS: introduces the explicit CONGRUENCE_VIOLATION failure mode. When an agent attempts an action that violates its declared impact vector, the AASG returns:

  • A specific error code
  • The dimension violated
  • The quantitative delta
  • A Merkle Ledger entry with full context

This is the organizational equivalent of a building inspector catching a code violation before the foundation is poured — not after the building collapses.


What's Published

The full paper, COA-MAS: A Governance Framework for Autonomous Agents in Production Environments, is available on Zenodo:

📄 zenodo.org/records/19057202

🔑 DOI: doi.org/10.5281/zenodo.19057202

📜 License: CC BY 4.0

The paper covers:

  • Full formal specification of the Action Claim ontology
  • Complete mathematical treatment of the Justification Gap
  • Attack pattern neutralization (scope subdeclaration, decomposition attack, mandate laundering)
  • EU AI Act regulatory alignment (Articles 9, 11, 13, 14)
  • Positioning against the IETF draft, A2A, MCP, and the AIMS model

Final Thought

The governance of autonomous agents is not a new problem. Simon identified its theoretical roots in 1947. Ostrom identified the institutional design solutions in 1990. Normative MAS researchers formalized the computational analogues through the 1990s and 2000s.

What's new in 2026 is the urgency.

Agents that can delete production databases and execute financial transactions are being deployed without the governance infrastructure this body of knowledge prescribes.

COA-MAS applies established principles to a new domain. The question is not whether governance is necessary — it's whether we build it before or after the first major incident.


If you're building multi-agent systems in production, I'd be genuinely interested in feedback on whether these primitives map to the problems you're encountering. The paper is open access — feel free to cite, critique, or extend.

— Rudson Kiyoshi Souza Carvalho, Independent Researcher

doi.org/10.5281/zenodo.19057202
