I run about 20 AI agents. They delegate work to each other, deploy code, scan for vulnerabilities, and handle compliance checks. Over time, I kept hitting the same gaps — things that made autonomous workflows fragile in ways that took hours to debug.
I published a 7-layer model for agent infrastructure that describes how I think about these problems. Some layers already have strong industry standards: Google's A2A protocol handles agent-to-agent coordination (L5), and Anthropic's MCP standardises how agents discover and use tools (L3–L4). At the identity layer, the W3C DID standard defines decentralised identifiers. For governance, there's the NIST AI Risk Management Framework.
The rest of the stack — the layers that make autonomous agents trustworthy, auditable, and production-safe — still has gaps. These seven protocols fill them. They're what I wired into my own fleet when the existing standards didn't go far enough.
All are CC BY 4.0. Five have live reference implementations; two are specified but not yet implemented.
Industry Standards This Builds On
| Standard | Layer | Organization |
|---|---|---|
| A2A Protocol | L5 Coordination | Google / a2aproject |
| Model Context Protocol | L3–L4 Discovery + Session | Anthropic |
| W3C DID Core | L2 Communication | W3C |
| NIST AI RMF | L7 Governance | NIST |
1. Trust Score — Should I Delegate to This Agent?
When one of my agents delegates work to another, it needs to know if the target is reliable. Not "does it respond" — does it actually complete tasks correctly and consistently.
The score is weighted across success rate, pitfall history, skill quality, and uptime.
```python
from workswithagents import TrustScoreClient

ts = TrustScoreClient()
if ts.get("target-agent")["tier"] == "trusted":
    delegate(task, to="target-agent")
```
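To make the weighting concrete, here is a sketch of how the four components might combine. The weights and the tier cut-off below are illustrative assumptions of mine, not values from the spec:

```python
# Illustrative weights only -- the spec names the components, not these numbers.
WEIGHTS = {
    "success_rate": 0.4,
    "pitfall_history": 0.2,  # inverted: fewer recorded pitfalls -> higher score
    "skill_quality": 0.2,
    "uptime": 0.2,
}

def trust_score(metrics):
    """Combine per-dimension scores (each normalised to 0..1) into one number."""
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

score = trust_score({
    "success_rate": 0.97,
    "pitfall_history": 0.9,
    "skill_quality": 0.85,
    "uptime": 0.999,
})  # 0.9378; a tier threshold (say >= 0.9 -> "trusted") would sit on top of this
```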
2. Deployment Manifest — Declare a Fleet, Deploy With One Command
I got tired of manually tracking which agents run where, how many instances, and what capabilities they have. One YAML file, one command.
```yaml
fleet:
  name: "my-fleet"
  agents:
    - id: "builder"
      capabilities:
        - action: "build"
          target: "spfx"
      count: 3
```

```shell
wwa fleet deploy fleet.yaml
```
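The deploy step has to turn that declaration into concrete instances. A minimal sketch of the expansion, with the manifest shown as an already-parsed dict (the `expand_instances` helper is mine, not part of the CLI):

```python
# The manifest above, after YAML parsing.
fleet = {
    "name": "my-fleet",
    "agents": [
        {"id": "builder",
         "capabilities": [{"action": "build", "target": "spfx"}],
         "count": 3},
    ],
}

def expand_instances(fleet):
    """One deploy target per declared instance: builder-0, builder-1, builder-2."""
    return [
        f"{agent['id']}-{i}"
        for agent in fleet["agents"]
        for i in range(agent.get("count", 1))
    ]

expand_instances(fleet)  # ['builder-0', 'builder-1', 'builder-2']
```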
3. SLA Framework — Track Whether Agents Meet Their Promises
Three tiers:

- Best-Effort: free, no guarantees
- Production: 99.5% uptime, 90% task accuracy
- Regulated: 99.9% uptime, 95% accuracy, 7-year audit retention
```python
from workswithagents import SLAMetrics

sla = SLAMetrics("my-fleet", tier="production")
sla.report("agent-1", "task-42", duration_seconds=187, success=True)
status = sla.status()  # {breaches: [], status: "ok"}
```
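Breach detection is just a comparison against the tier's floor. A sketch using the thresholds quoted above (the dict layout and `breaches` helper are my assumptions, not the library's API):

```python
# Floors taken from the tier definitions above.
TIERS = {
    "production": {"uptime": 0.995, "accuracy": 0.90},
    "regulated":  {"uptime": 0.999, "accuracy": 0.95},
}

def breaches(tier, uptime, accuracy):
    """Return the SLA dimensions currently below the tier's floor."""
    floor = TIERS[tier]
    out = []
    if uptime < floor["uptime"]:
        out.append("uptime")
    if accuracy < floor["accuracy"]:
        out.append("accuracy")
    return out

breaches("production", uptime=0.997, accuracy=0.88)  # ['accuracy']
```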
4. Handoff Protocol — Cryptographic Handoff Between Agents
When an agent passes a task to another, how do you know the output wasn't tampered with? Ed25519-signed handoffs with chain-of-custody verification. Built above MCP's tool-use layer.
```python
from workswithagents import Handoff

h = Handoff(from_agent="planner", to_agent="scanner", payload={"plan": "..."})
signed = h.sign(planner_key)
verified = Handoff.verify(signed, planner_public_key)
```
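The chain-of-custody part works independently of the signature scheme: each hop commits to a hash of the previous hop, so a tampered link breaks every hash after it. A stdlib-only sketch of that idea (in the real protocol each envelope is additionally Ed25519-signed; the function names here are mine):

```python
import hashlib
import json

def envelope(payload, prev_hash):
    """One hop in the chain of custody: payload plus a link to the previous hop."""
    body = {"payload": payload, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def verify_chain(chain):
    """Recompute each hop's hash and check every link points at the hop before."""
    prev = None
    for hop in chain:
        body = {"payload": hop["payload"], "prev": hop["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if hop["hash"] != digest or hop["prev"] != prev:
            return False
        prev = hop["hash"]
    return True

hop1 = envelope({"plan": "scan repo"}, None)
hop2 = envelope({"findings": []}, hop1["hash"])
verify_chain([hop1, hop2])  # True
```

Because each hash covers the previous hash, verifying only the final hop's signature is never enough; the whole chain has to replay cleanly.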
5. Identity Protocol — Verifiable Agent Identity
Cryptographic agent identity with Ed25519 keypairs. Signed messages. Verification against registry. Extends the W3C DID standard with agent-specific key management and fleet-scoped verification.
```python
from workswithagents import AgentIdentity

ai = AgentIdentity("my-agent")
ai.register()
sig = ai.sign({"type": "heartbeat"})

# Verify a signed message received from another agent against the registry
valid = AgentIdentity.verify("other-agent", message, signature)
```
6. Compliance-as-Code — Regulation as Executable Validation
NHS DTAC, FCA, GDS, GDPR — as rules agents can validate against at runtime. Extends frameworks like the NIST AI RMF from documentation into executable checks.
```python
from workswithagents import ComplianceEngine

ce = ComplianceEngine()
dtac = ce.load("dtac-v2.1")
if dtac.validate(action).passed:
    execute(action)
else:
    escalate_to_human()
```
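"Executable" here just means each rule is a predicate over a proposed action rather than prose. An illustrative sketch; the rule IDs and action fields below are invented, and the real DTAC rule set lives in the spec:

```python
# Invented rule IDs and fields, for illustration only.
RULES = {
    "dtac-data-residency": lambda action: action.get("region") == "uk",
    "dtac-audit-trail":    lambda action: action.get("audited", False),
}

def failed_rules(action):
    """IDs of rules the action fails; an empty list means it passed."""
    return [rule_id for rule_id, check in RULES.items() if not check(action)]

failed_rules({"region": "uk", "audited": True})  # []
failed_rules({"region": "us", "audited": True})  # ['dtac-data-residency']
```

The point is that a failed rule names itself, so the escalation path gets a specific violation rather than a generic "blocked".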
7. Onboarding Protocol — Systematic Agent Creation
Interview → generate → calibrate → benchmark → register. Instead of writing a prompt file and hoping, run a pipeline that produces a scored agent.
```python
from workswithagents import OnboardingClient

ob = OnboardingClient()
result = ob.full_onboard(
    "nhs-auditor",
    "Audit agent actions for NHS DTAC compliance",
    capabilities=["audit:compliance"],
    skills=["compliance-as-code"]
)
```
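The five stages compose into a straight pipeline. A sketch with stub stage functions, all hypothetical; the real pipeline sits behind `full_onboard`:

```python
# Stub stages -- each is hypothetical; the real pipeline is in the Onboarding spec.
def interview(name, desc):
    return {"name": name, "desc": desc}

def generate(spec, capabilities, skills):
    return {**spec, "capabilities": capabilities, "skills": skills}

def calibrate(agent):
    return {**agent, "calibrated": True}

def benchmark(agent):
    return 0.92  # placeholder score; the real value comes from the test suite

def register(agent, score):
    agent["score"] = score  # would also publish identity + score to the registry

def onboard(name, desc, capabilities, skills):
    """interview -> generate -> calibrate -> benchmark -> register"""
    agent = calibrate(generate(interview(name, desc), capabilities, skills))
    register(agent, benchmark(agent))
    return agent
```

Each stage's output feeds the next, so a failed benchmark can abort before anything reaches the registry.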
The Stack
Where each protocol fits alongside existing industry standards:
| Layer | Industry standard | Protocols here |
|---|---|---|
| L7 Governance | NIST AI RMF | Compliance-as-Code · SLA Framework |
| L6 Verification | (no standard yet) | Agent Test Suite · Pitfall Registry |
| L5 Coordination | A2A (Google) | Trust Score |
| L4 Session | MCP (Anthropic) | Handoff Protocol |
| L3 Discovery | MCP (Anthropic) | Trust Score · Capability Manifest |
| L2 Communication | W3C DID | Identity Protocol |
| L1 Execution | (no standard yet) | Onboarding Protocol · Deployment Manifest |
A2A (Google) — agent-to-agent task coordination at L5. MCP (Anthropic) — tool discovery and context sharing at L3–L4. W3C DID — decentralised identity at L2. NIST AI RMF — governance framework at L7. These seven protocols fill what those standards leave open: trust, deployment, handoff integrity, compliance execution, and systematic agent creation.
Get Started
```shell
pip install workswithagents
```
All specs: workswithagents.dev/specs/
All code: CC BY 4.0