Authora Dev

Why multi-agent AI security is broken (and the identity patterns that actually work)

Last Tuesday, a “harmless” coding agent in staging opened a PR, fetched secrets from the wrong environment, and kicked off a deploy it was never supposed to touch.

Nothing “hacked” us. The agent did exactly what the system allowed.

That’s the part I think a lot of teams miss with multi-agent setups: the problem usually isn’t model quality. It’s identity.

Once you have more than one agent — planner, coder, reviewer, deployer, support bot, whatever — you need answers to very boring questions:

  • Who is this agent, exactly?
  • What is it allowed to do?
  • Can it act on behalf of someone else?
  • How do we prove what happened later?

If you don’t answer those, your “AI fleet” becomes a shared root account with vibes.

The pattern that breaks first: shared credentials

A lot of agent systems still look like this:

Agent A ----\
Agent B -----+----> same API key / same GitHub token / same MCP access
Agent C ----/

It works great until:

  • one agent gets prompt-injected
  • one workflow needs narrower permissions
  • you need an audit trail
  • you want approvals for risky actions
  • you need to revoke one agent without breaking all of them

Shared credentials are convenient, but they destroy attribution and least privilege.

The identity pattern that actually works

The most reliable pattern we’ve seen is:

  1. Give each agent its own cryptographic identity
  2. Issue short-lived delegated access
  3. Enforce policy at the tool boundary
  4. Log every action with agent identity + delegation chain

In practice, it looks like this:

[Human/User]
    |
    | delegates task
    v
[Planner Agent] -- short-lived token --> [Coder Agent]
    |                                        |
    | policy check                           | calls tool / MCP server
    v                                        v
[Approval / Policy Engine] -------------> [GitHub, CI, Cloud, DB]

Audit log = who delegated what to whom, for which action, when

That’s the difference between “an agent did something” and “the review agent, acting on behalf of the release workflow, was allowed to update only this repo for 10 minutes.”
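What does that look like in a log? A single audit record for the flow above can be as small as the object below — field names here are illustrative, not a standard schema:

```javascript
// One audit-log entry per tool call: who acted, on whose behalf,
// what they did, and how long the grant lasted.
// All names and values are illustrative.
const auditRecord = {
  actor: "review-agent",                              // the agent that made the call
  delegationChain: ["alice", "release-workflow", "review-agent"],
  action: "repo:update",
  resource: "github.com/acme/api",
  grantedAt: "2025-01-14T10:00:00Z",
  expiresAt: "2025-01-14T10:10:00Z",                  // the 10-minute window
  decision: "allowed",
};

console.log(JSON.stringify(auditRecord, null, 2));
```

Keeping the delegation chain in every record is the point: it answers "on whose behalf?" without reconstructing state after the fact.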

What to implement first

You do not need a giant platform rollout to improve this.

1) Per-agent identity

Use a distinct identity for every agent process or role. Ideally, that identity is cryptographic, not just a string in config.

Ed25519 keys are a good fit here because they’re fast, small, and easy to verify.

Why it matters:

  • revocation is targeted
  • audit logs become useful
  • tools can verify the caller instead of trusting network location

2) Delegation, not credential sharing

If Agent A needs Agent B to perform work, don’t hand over a long-lived secret. Mint a scoped, short-lived token representing delegated rights.

OAuth token exchange / delegation-chain patterns are solid here. If you’re already using standards like RFC 8693, great. If not, even a simple internal delegation model is better than “just reuse the deploy token.”

3) Policy at the edge

Your tools should not trust every “internal” caller equally.

Put policy checks at the MCP server, gateway, or edge proxy:

  • this agent can read issues
  • that agent can open PRs
  • only approved agents can trigger deploys
  • production actions require human approval

If OPA fits your stack, use OPA. Seriously. You don’t need to reinvent policy engines for this.
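Before you reach for a policy engine, the check itself can start as a lookup table at the tool boundary. Here's a toy version of the bullet list above — an OPA policy would replace this table, and the agent names and action strings are made up for illustration:

```javascript
// A toy policy table enforced at the tool boundary. An MCP server or
// gateway would do this lookup before executing any call.
// Agent names and actions are illustrative.
const policy = {
  "triage-agent": new Set(["issues:read"]),
  "coder-agent": new Set(["issues:read", "pr:open"]),
  "deploy-agent": new Set(["deploy:trigger"]),
};

// Actions that always need a human in the loop, regardless of agent.
const needsApproval = new Set(["deploy:trigger"]);

function authorize(agent, action, humanApproved = false) {
  const allowed = policy[agent]?.has(action) ?? false;
  if (!allowed) return { allow: false, reason: "not in policy" };
  if (needsApproval.has(action) && !humanApproved) {
    return { allow: false, reason: "approval required" };
  }
  return { allow: true };
}

console.log(authorize("coder-agent", "pr:open"));         // { allow: true }
console.log(authorize("deploy-agent", "deploy:trigger")); // { allow: false, reason: 'approval required' }
```

Note the default: an unknown agent gets nothing. Deny-by-default is the property worth keeping when you graduate to a real engine.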

4) Approval workflows for destructive actions

Treat delete, deploy, rotate, publish, and charge as special.

Agents are great at moving fast. That’s exactly why risky actions need explicit approval gates.
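An approval gate can be as simple as "queue the action instead of running it." A minimal in-memory sketch — a real system would persist the queue and notify a reviewer, and the action names here just mirror the list above:

```javascript
// Approval-gate sketch: risky actions are parked in a pending queue and
// only execute after an explicit approve() call. In-memory only.
const RISKY = new Set(["delete", "deploy", "rotate", "publish", "charge"]);
const pending = new Map();
let nextId = 1;

function requestAction(agent, action, run) {
  if (!RISKY.has(action)) return { status: "executed", result: run() };
  const id = nextId++;
  pending.set(id, { agent, action, run });
  return { status: "pending", id };
}

function approve(id) {
  const req = pending.get(id);
  if (!req) throw new Error("unknown approval id");
  pending.delete(id);
  return req.run();
}

const r = requestAction("deploy-agent", "deploy", () => "deployed v1.2.3");
console.log(r.status);      // pending
console.log(approve(r.id)); // deployed v1.2.3
```

The design choice that matters: the gate lives in the execution path, not in the prompt. An agent can be talked into *asking* for a deploy; it can't be talked past the queue.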

A tiny runnable example: generate an agent identity

Here’s a minimal Node example using Ed25519:

npm install tweetnacl tweetnacl-util
node agent-id.js

// agent-id.js
const nacl = require("tweetnacl");
const util = require("tweetnacl-util");

const keypair = nacl.sign.keyPair();

const publicKey = util.encodeBase64(keypair.publicKey);
const secretKey = util.encodeBase64(keypair.secretKey);

console.log("Agent public key:", publicKey);
console.log("Store secret key securely:", secretKey.slice(0, 24) + "...");

This isn’t a full identity system, but it’s the right direction: every agent gets its own keypair, and downstream systems verify who’s calling.

Common mistake: securing the model, not the workflow

Teams spend a lot of time on model guardrails and not enough on execution boundaries.

But in multi-agent systems, the blast radius usually comes from what the agent can do, not what it can say.

A secure fleet is mostly boring infrastructure:

  • identities
  • scoped tokens
  • policy checks
  • approvals
  • audit logs
  • isolation for untrusted execution

That’s true whether you’re orchestrating coding agents, support agents, or background task runners.

Try it yourself

If you want to tighten up your agent security without buying anything first, start with the four steps above: per-agent keys, short-lived delegated tokens, policy checks at the tool boundary, and approval gates for risky actions.

These are useful starting points even if you end up building the rest yourself.

The big shift is simple: stop thinking of agents as “features” and start treating them like workloads with identities.

That’s when multi-agent systems become governable instead of mysterious.

How are you handling agent identity in your stack today? Drop your approach below.

-- Authora team

This post was created with AI assistance.
