DEV Community

AI Agents Can Delete Your Production Database. Here's the Governance Framework That Stops Them.

This article presents COA-MAS — a governance framework for autonomous agents grounded in organizational theory, institutional design, and normative multi-agent systems research. The full paper is published on Zenodo: doi.org/10.5281/zenodo.19057202


The Problem No One Is Talking About

Something unusual happened in early 2026. The IETF published a formal Internet-Draft on AI agent authentication and authorization. Eight major technology companies released version 1.0 of the Agent-to-Agent Protocol. And a widely read post demonstrated why the prevailing credential model for AI agents was structurally broken.

The convergence wasn't coincidental. It was the signal that a structural problem — long present in early agentic deployments — had reached the threshold of production consequence.

We've built agents that can:

  • Delete production databases
  • Execute financial transactions
  • Modify business logic
  • Spawn other agents

And we gave them API keys.

An API key authorizes access. It does not authorize a specific action with a specific impact in a specific context. That distinction is the entire problem.


The Structural Failure Mode: Distributed Cognitive Chaos

I call this failure mode Distributed Cognitive Chaos (DCC): the structural consequence of deploying agents without formal authority hierarchies, authorization contracts, or enforcement boundaries.

DCC has three symptoms:

  1. Action hallucination — an agent executes an action it was never authorized to perform, because nothing formally defined "authorized"
  2. Mandate drift — through a chain of agent-to-agent delegations, the original human intent gets distorted beyond recognition
  3. Accountability collapse — when something goes wrong, there is no tamper-evident record connecting the action to the authority that (supposedly) permitted it

This is not a new problem. It's the oldest problem in organizational theory: how do you coordinate partially autonomous actors toward collective goals while preventing any individual actor from harming the collective?

Herbert Simon identified it in 1947. Elinor Ostrom solved it in 1990. We just haven't applied those solutions to AI agents yet.


COA-MAS: A Governance Framework Grounded in Theory

COA-MAS (Cognitive Organization Architecture for Multi-Agent Systems) is my answer. It synthesizes four intellectual traditions:

  • Simon's bounded rationality → why agents need external governance
  • Ostrom's institutional design principles → how to structure governance for durability
  • Normative multi-agent systems research → how to formalize governance as computable norms
  • Sociotechnical systems theory → how to make social norms technically enforceable

The framework has three components. Each answers a different question.


Component 1: The Four-Layer Architecture

Question: Who is in charge?

Think of it as a corporate structure for AI agents:

┌────────────────────────────────────────────────────┐
│ LAYER 4 — STRATEGIC ORCHESTRATION                  │
│ Receives human objectives · decomposes into tasks  │
└────────────────────────────────────────────────────┘
                          ↕
┌────────────────────────────────────────────────────┐
│ LAYER 3 — COGNITIVE GOVERNANCE                     │
│ Evaluates proposed actions · issues authorization  │
│ documents · maintains audit ledger                 │
└────────────────────────────────────────────────────┘
                          ↕
┌────────────────────────────────────────────────────┐
│ LAYER 2 — FUNCTIONAL SPECIALIZATION                │
│ Domain agents · execute tasks within their         │
│ cognitive authority boundary                       │
└────────────────────────────────────────────────────┘
                          ↕
┌────────────────────────────────────────────────────┐
│ LAYER 1 — EXECUTABLE CULTURE (Constitutional)      │
│ Versioned YAML policies · weights · thresholds     │
│ Human-authored before runtime. Immutable during.   │
└────────────────────────────────────────────────────┘

The critical insight, drawn from both Simon and Ostrom, is the separation between those who propose actions and those who authorize them. An agent cannot authorize its own actions. This mirrors the principle of checks and balances in constitutional systems: the body that proposes is not the body that authorizes is not the body that records.


Component 2: The Action Claim

Question: What exactly is the agent authorized to do?

An Action Claim is a formal authorization document that agents must present before executing any real-world action. It's analogous to a building permit — not just "you're allowed to build," but: the location, the dimensions, the materials, the timeline, the inspector, and the version of the building code that governed the approval.

The Action Claim has three parts:

{
  // DECLARED FIELDS (filled by the agent)
  "proposed_transition": "DELETE expired sessions older than 90 days",
  "originating_goal": "scheduled maintenance task #4421",
  "delegation_chain": ["human:ops-team", "agent:orchestrator-01", "agent:db-cleaner"],
  "estimated_impact": {
    "destructivity": 0.25,
    "data_exposure": 0.00,
    "resource_consumption": 0.30,
    "privilege_escalation": 0.00,
    "logic_integrity": 0.05,
    "recursive_autonomy": 0.10
  },

  // DERIVED FIELDS (filled by GOV-RISK, Layer 3)
  "justification_gap": 0.08,
  "decision": "APPROVE",
  "governance_signature": "sha256:a3f9...",
  "policy_digest": "sha256:1b2c...",

  // AUDIT FIELDS (filled by infrastructure)
  "ac_id": "ac-2026-03-31-00421",
  "state": "AUTHORIZED",
  "committed_at": "2026-03-31T14:22:01Z"
}

The tripartite structure reflects Ostrom's principle of separating operational decisions from the collective-choice rules that govern them. The agent operates at the operational level; Layer 3 applies institutional norms; the audit trail creates an immutable record connecting every decision to the rules that governed it.
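The binding between the declared and derived fields can be made concrete. The sketch below is illustrative, not the paper's exact scheme: it assumes the governance signature is a SHA-256 digest over the canonicalized declared fields plus the policy digest, so that any post-approval change to a declared field is detectable.

```python
import hashlib
import json

def governance_signature(declared_fields: dict, policy_digest: str) -> str:
    """Bind an approval to the exact declared fields and policy version.

    Canonical JSON (sorted keys, fixed separators) makes the digest
    deterministic; changing any declared field changes the signature.
    """
    canonical = json.dumps(declared_fields, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((canonical + "|" + policy_digest).encode("utf-8"))
    return "sha256:" + digest.hexdigest()

declared = {
    "proposed_transition": "DELETE expired sessions older than 90 days",
    "estimated_impact": {"destructivity": 0.25, "data_exposure": 0.00},
}
sig = governance_signature(declared, "sha256:1b2c...")

# An agent that quietly edits a declared field after approval
# invalidates the signature, so the gateway can reject the claim.
declared["estimated_impact"]["destructivity"] = 0.10
assert sig != governance_signature(declared, "sha256:1b2c...")
```

A real deployment would use an asymmetric signature keyed to Layer 3 rather than a bare hash; the point is only that the derived fields are cryptographically bound to the declared ones.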


Component 3: The AASG (Autonomous Agent Security Gateway)

Question: How is authorization enforced?

Think of the AASG as a customs inspector at the boundary between the agents' cognitive world and the real world of executing tools.

Agent Cognition (A2A) ────────────────► Real World (MCP)
                              │
                         [ AASG ]
                              │
                    Checks exactly 3 things:
                    1. Is the Action Claim valid and signed?
                    2. Is the agent identity correct?
                    3. Does the actual tool call match 
                       what was declared?

The AASG is a physically separate process — not a library in the agent, not a plugin in the MCP server. This separation is the architectural expression of Simon's insight: governance constraints must be external to the decision-maker they constrain. An agent cannot reliably police itself.

What happens when an agent lies?

Consider an agent authorized to clean up expired sessions. It gets an AUTHORIZED Action Claim with destructivity: 0.25. Then it tries to execute:

DELETE FROM users  -- no WHERE clause

The AASG projects the actual tool call onto the governance space:

Declared: destructivity = 0.25 (bounded deletion)
Actual:   destructivity = 1.00 (full table wipe)

Congruence delta: 0.75 >> threshold (0.20)
→ ERR_AASG_003: CONGRUENCE_VIOLATION
→ Action never reaches the database
→ Merkle Ledger: INTERCEPTED entry recorded
→ GOV-AUDIT: SCOPE_SUBDECLARATION_DETECTED alert

The action is blocked before any damage occurs — not discovered in a log review afterward.
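The interception above reduces to a per-dimension comparison between the declared and actual impact vectors. A minimal sketch using the 0.20 threshold from the example (the exception class and helper names are illustrative, not from the paper):

```python
class CongruenceViolation(Exception):
    """ERR_AASG_003: actual impact exceeds the declared vector."""

def check_congruence(declared: dict, actual: dict, threshold: float = 0.20) -> None:
    """Raise if any governance dimension exceeds its declared value
    by more than the congruence threshold."""
    for dim, declared_value in declared.items():
        delta = actual.get(dim, 0.0) - declared_value
        if delta > threshold:
            raise CongruenceViolation(
                f"{dim}: declared {declared_value:.2f}, actual "
                f"{actual.get(dim, 0.0):.2f}, delta {delta:.2f} > {threshold:.2f}")

declared = {"destructivity": 0.25}  # bounded deletion, as authorized
actual   = {"destructivity": 1.00}  # projection of the unbounded DELETE

blocked = False
try:
    check_congruence(declared, actual)   # delta 0.75 >> 0.20
except CongruenceViolation:
    blocked = True  # the tool call never reaches the database
```

In the real architecture this check runs inside the AASG process, before the MCP call is forwarded; the agent only ever sees the error code.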


The Justification Gap: The First Computable Proportionality Metric

The core innovation in COA-MAS is the Justification Gap — a metric that answers: is this specific action proportional to the mandate that authorized it?

The governance space has six dimensions:

Dim  Name                  What It Measures                                 Weight
d₁   Destructivity         Irreversible state change (DELETE, DROP, PURGE)  0.30
d₂   Data Exposure         Cross-zone data movement, PII handling           0.25
d₃   Resource Consumption  Financial/computational impact                   0.10
d₄   Privilege Escalation  IAM creation, credential access                  0.20
d₅   Logic Integrity       Business logic modification                      0.10
d₆   Recursive Autonomy    Sub-agent spawning, delegation depth             0.05

The formula:

Gⱼ(V_A, V_M) = Σ wᵢ · φ(aᵢ, mᵢ)

where φ(aᵢ, mᵢ) =
  0               if aᵢ ≤ mᵢ   (within mandate)
  ((aᵢ - mᵢ)/mᵢ)² if aᵢ > mᵢ  (quadratic penalty)
  +∞              if aᵢ > 0 ∧ mᵢ = 0  (absolute prohibition)

The decision:

Gⱼ ≤ 0.15  →  APPROVE
0.15 < Gⱼ ≤ 0.40  →  ESCALATE (human review with full dossier)
Gⱼ > 0.40 or Gⱼ = +∞  →  BLOCK

The quadratic penalty means small excesses may be tolerable, but large excesses signal a qualitatively different action from what was authorized. The ESCALATE tier implements "meaningful human oversight" as required by EU AI Act Article 14 — with actual quantitative evidence, not just a kill switch.
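The formula and thresholds above translate directly into code. A minimal sketch using the weights from the dimension table in this section; the function and variable names are mine:

```python
import math

WEIGHTS = {
    "destructivity": 0.30, "data_exposure": 0.25,
    "resource_consumption": 0.10, "privilege_escalation": 0.20,
    "logic_integrity": 0.10, "recursive_autonomy": 0.05,
}

def phi(a: float, m: float) -> float:
    """Per-dimension penalty: zero within the mandate, quadratic above it,
    infinite when the mandate prohibits the dimension outright."""
    if a > 0 and m == 0:
        return math.inf
    if a <= m:
        return 0.0
    return ((a - m) / m) ** 2

def justification_gap(action: dict, mandate: dict) -> float:
    """G_j(V_A, V_M) = sum of w_i * phi(a_i, m_i) over the six dimensions."""
    return sum(w * phi(action.get(d, 0.0), mandate.get(d, 0.0))
               for d, w in WEIGHTS.items())

def decide(gap: float) -> str:
    if gap <= 0.15:
        return "APPROVE"
    if gap <= 0.40:
        return "ESCALATE"
    return "BLOCK"

mandate = {"destructivity": 0.30, "data_exposure": 0.10}
action  = {"destructivity": 0.25, "data_exposure": 0.05}
gap = justification_gap(action, mandate)  # every dimension within mandate: 0.0
```

Note how a dimension the mandate leaves at zero acts as an absolute prohibition: any nonzero use of it sends the gap to infinity and the decision to BLOCK.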


Mandate Laundering: The Attack You Didn't Know Existed

Here's the attack that no existing security framework addresses: mandate laundering.

In a delegation chain Human → Agent₁ → Agent₂ → ... → Agentₖ, each intermediate agent can marginally expand the mandate it passes on. Each local expansion looks proportionate. But the cumulative expansion is not.

COA-MAS anchors the Justification Gap to the root human mandate, regardless of intermediate expansions:

G_chain(Aₖ) = Gⱼ(V_{Aₖ}, V_{M₀})  ← root mandate, always

G_total = 0.30 · G_local + 0.70 · G_chain

Non-Improvement Theorem: For any permissive subdelegation, G_chain is monotone non-decreasing. You cannot launder your way out of the original constraint.
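The blend can be sketched on top of a per-dimension gap function. In this illustration (a single destructivity dimension; helper names are mine), intermediate agents have drifted the local mandate from 0.30 to 0.90, so an action at 0.85 looks proportionate locally but is blocked once anchored to the root mandate:

```python
import math

def phi(a: float, m: float) -> float:
    # zero within the mandate, quadratic above it, infinite on prohibition
    if a > 0 and m == 0:
        return math.inf
    return 0.0 if a <= m else ((a - m) / m) ** 2

def gap(action: dict, mandate: dict, weights: dict) -> float:
    return sum(w * phi(action.get(d, 0.0), mandate.get(d, 0.0))
               for d, w in weights.items())

def total_gap(action: dict, local_mandate: dict, root_mandate: dict,
              weights: dict) -> float:
    """G_total = 0.30 * G_local + 0.70 * G_chain, with G_chain always
    computed against the root human mandate M0."""
    g_local = gap(action, local_mandate, weights)
    g_chain = gap(action, root_mandate, weights)  # anchored to M0, always
    return 0.30 * g_local + 0.70 * g_chain

W     = {"destructivity": 0.30}
root  = {"destructivity": 0.30}   # M0: the original human mandate
drift = {"destructivity": 0.90}   # after several "small" subdelegations
act   = {"destructivity": 0.85}

g_local = gap(act, drift, W)              # 0.0: looks fine locally
g_total = total_gap(act, drift, root, W)  # well above 0.40: BLOCK
```

Because the chain term carries 0.70 of the weight and always references M₀, no sequence of permissive subdelegations can pull the total gap below what the root mandate implies.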


How COA-MAS Fits the Standards Ecosystem

COA-MAS doesn't compete with existing standards — it implements what they defer:

  • IETF draft-klrc-aiagent-auth — solves identity, authentication, and authorization (SPIFFE, OAuth 2.0); defers the policy model, which is explicitly out of scope. COA-MAS implements that policy model.
  • A2A Protocol v1.0 — solves agent coordination as a standard; defers authorization at the execution boundary. The AASG is the enforcement point A2A lacks.
  • MCP v1.0 — solves agent-to-tool communication; has no semantic authorization layer. The AASG is the authorization gate MCP doesn't have.

The IETF draft's Section 12 explicitly states: "the policy model and document format are out of scope." That is precisely where COA-MAS contributes.


The Failure Mode Transition

The most consequential architectural property of COA-MAS is the failure mode it introduces.

Traditional agentic systems: fail semantically and silently. The agent reinterprets a guideline, slightly expands a scope, finds an unanticipated interpretation. Detectable only after damage, through log analysis.

COA-MAS: introduces the explicit CONGRUENCE_VIOLATION failure mode. When an agent attempts an action that violates its declared impact vector, the AASG returns:

  • A specific error code
  • The dimension violated
  • The quantitative delta
  • A Merkle Ledger entry with full context

This is the organizational equivalent of a building inspector catching a code violation before the foundation is poured — not after the building collapses.


What's Published

The full paper, COA-MAS: A Governance Framework for Autonomous Agents in Production Environments, is available on Zenodo:

📄 zenodo.org/records/19057202

🔑 DOI: doi.org/10.5281/zenodo.19057202

📜 License: CC BY 4.0

The paper covers:

  • Full formal specification of the Action Claim ontology
  • Complete mathematical treatment of the Justification Gap
  • Attack pattern neutralization (scope subdeclaration, decomposition attack, mandate laundering)
  • EU AI Act regulatory alignment (Articles 9, 11, 13, 14)
  • Positioning against the IETF draft, A2A, MCP, and the AIMS model

Final Thought

The governance of autonomous agents is not a new problem. Simon identified its theoretical roots in 1947. Ostrom identified the institutional design solutions in 1990. Normative MAS researchers formalized the computational analogues through the 1990s and 2000s.

What's new in 2026 is the urgency.

Agents that can delete production databases and execute financial transactions are being deployed without the governance infrastructure this body of knowledge prescribes.

COA-MAS applies established principles to a new domain. The question is not whether governance is necessary — it's whether we build it before or after the first major incident.


If you're building multi-agent systems in production, I'd be genuinely interested in feedback on whether these primitives map to the problems you're encountering. The paper is open access — feel free to cite, critique, or extend.

— Rudson Kiyoshi Souza Carvalho, Independent Researcher

doi.org/10.5281/zenodo.19057202
