All For Science

Posted on Apr 26

I built a multi-agent system without governance. Here's the 3-layer stack I wish I'd had.

#security #python #ai #webdev

Let me describe a situation you've either been in or are about to be in.

You've built a multi-agent system. It works. The orchestrator dispatches tasks to specialist agents, they call external APIs, things happen. You ship it.

Then, three weeks later, you discover that your payment agent processed:

a $4,200 refund at 1:47am on a Saturday with no approval;
that a customer's data was accessed by an agent that technically shouldn't have had that scope;
and that you have absolutely zero logs to figure out what triggered any of it.

This is not a hypothetical. It's the default outcome if you ship agents without thinking about the three infrastructure layers that make them safe to run in production.

Here's what those layers are, and how they fit together.

Layer 1: Conduit — see the whole pipeline

The operational problem hits first.
You're managing agents that connect to MCP servers, call LLMs, trigger webhooks, and chain into each other. The configuration for all of this lives in JSON files, environment variables, and READMEs. When something breaks at 2am, you're staring at logs across five different services trying to reconstruct what happened.

Conduit replaces that with a visual pipeline studio. Connect your MCP servers once, build pipelines on a canvas, and get real-time execution logs for every step — latency, token cost, inputs, outputs. API keys go into an AES-256 encrypted vault and are decrypted in memory at runtime. The pipeline configuration is stored centrally, not scattered across machines.

The practical difference: when you need to debug a broken workflow, you open Conduit's trace view instead of manually correlating logs across services. Every step is there in execution order.

Layer 2: Codios — make agents verify each other

This is the one most teams skip until it bites them.

Your payment agent currently accepts a POST to /charge from anything on your network. Maybe that's fine today. It won't be fine when:

a misconfigured agent sends it a malformed request;
when a replay attack resubmits a token;
or when someone figures out they can call it directly.

Codios gives every agent an Ed25519 identity (did:key) and issues signed contracts between them. The contract specifies exactly what the caller is allowed to do on the callee:

  contract = codios.contracts.issue(                                            
      caller_did=order_agent.did,
      callee_did=payment_agent.did,                                             
      scopes=["payment:charge:max_10000usd"],  # refund deliberately excluded   
      ttl_seconds=3600
  )

python
The caller attaches this as a header. The receiver verifies it locally — no call to Codios, just a 10µs Ed25519 check:

  contract = verify_contract(                               
      token=request.headers.get("X-Codios-Contract"),
      required_scope="payment:charge",                                          
      platform_public_key=CODIOS_PUBLIC_KEY,
  )

Why this matters

The scope limit is cryptographically bound. The payment agent cannot process a refund using this contract, regardless of what the request body says. The scope is enforced at verification time, not checked against a database.

Two additional protections come with it automatically:
nonce validation (replay protection — each contract token can only be used once) and expiry (contracts are time-limited, issued contracts can be revoked).

Layer 3: A2A — govern what runs

Even with Conduit giving you visibility and Codios locking down inter-agent trust, you still need a layer that watches what agents do with their permissions and intervenes when something looks wrong.

A2A adds four modules:

Observe

5 lines to wrap any agent loop with distributed tracing. Every LLM call, every tool invocation, every agent handoff becomes a span with timing and I/O. You get a full audit trail without building one yourself.

Policy

YAML rules evaluated before actions execute. Block payments over $50K. Flag any agent reading PII fields. Deny external HTTP calls from agents that don't have that scope. Rules run in-process at <5ms.

Approval

For the actions where a human needs to decide. The agent creates an approval request and parks. The reviewer gets an email with Approve/Reject. The agent resumes when a decision is made. No blocking, no polling loop, full async.

Firewall

Scans every message before it reaches an LLM. If an agent reads customer-supplied data and passes it to a model, that data needs to be checked first. <2ms per scan, runs locally.

How they fit together in a real workflow

  User submits order                                                            
      │                                                                         
      ▼
  Conduit pipeline executes                                                     
      │                                                     
      ├─ Order Agent → [Codios contract check] → Inventory Agent ✓
      ├─ Order Agent → [Codios contract check] → Payment Agent ✓                
      │       │
      │       └─ Amount > $500? → [A2A Approval] → Human reviews → ✓            
      │                                                                         
      ├─ Fulfillment Agent reads shipping address
      │       └─ [A2A Firewall] scans for injection → ✓ → LLM call              
      │                                                                         
      └─ A2A Observe captures full trace of everything above

The summary

Layer	Tool	What It Stops
Build/operate	Conduit	Invisible pipelines, scattered config, no execution visibility
Trust	Codios	Unauthorized agent calls, replay attacks, scope creep
Govern	A2A	Runaway actions, missing audit trail, prompt injection

The Bottom Line

You don't need all three on day one. But if you're running agents in production without any of them, you're one incident away from having to explain to your CTO why an agent did something it shouldn't have — with no logs to back you up.

All three are live. Free tiers.

Happy to answer questions about implementation in the comments.

DEV Community