If you’re letting AI agents call tools, open pull requests, touch production data, or coordinate work across services, you already have an identity problem.
A lot of agent systems still rely on soft trust: API keys in environment variables, tool access based on network location, or a vague assumption that “the agent running in this session is the same one we started with.” That works right up until it doesn’t. An agent gets replayed, a tool call is spoofed, a session token leaks, or a delegated workflow quietly gains more access than intended.
That’s agent hijacking in practice: an attacker, buggy integration, or misconfigured workflow causes actions to be executed by the wrong agent, with the wrong permissions, and without a reliable way to prove what happened.
The fix is not “more prompts.” It’s the same thing we’ve learned in every other security domain: strong identity, least privilege, and auditable authorization.
## What agent hijacking actually looks like
In most real systems, hijacking doesn’t mean a dramatic Hollywood-style takeover. It usually looks more boring:
- A long-lived API key gets reused by multiple agents
- An MCP server trusts any client that can connect
- An agent delegates a task, but the delegated worker inherits full upstream privileges
- Tool calls aren’t signed, so you can’t prove which agent initiated them
- Approval workflows happen in Slack or email with no cryptographic binding to the final action
- Logs tell you what happened, but not who actually authorized it
Once agents are acting on your behalf, “close enough” identity stops being enough.
## Why API keys and shared service accounts break down
A shared service account can identify an application. It does not identify an individual agent execution, a delegated subtask, or a bounded approval chain.
For agents, you usually need to answer questions like:
- Which agent requested this tool call?
- Was it the original planner agent or a delegated worker?
- What exact permissions did it have at the time?
- Did a human approve this step?
- Can I revoke this one agent without breaking the whole system?
API keys are bad at this because they’re typically:
- static
- shared
- over-scoped
- hard to rotate
- disconnected from execution context
A better model is to give each agent a cryptographic identity, then enforce RBAC or policy-based access at the tool boundary.
## The security baseline: cryptographic identity + RBAC
At a minimum, an agent platform should support:
- A unique cryptographic identity per agent
- Signed requests or assertions
- Short-lived delegated credentials
- Role-based access control
- Audit logs tied to identity, not just sessions
A practical implementation often uses public-key cryptography such as Ed25519 for agent identity. That gives you a keypair per agent, where the private key signs requests and the public key verifies them.
Then layer authorization on top:
- `reader`: can query docs or status APIs
- `coder`: can read/write specific repos
- `reviewer`: can comment, not merge
- `deployer`: can trigger staging deploys
- `approver-required`: can execute only with human approval
This is where RBAC still shines: it's understandable, debuggable, and usually enough to get started. If your environment is more dynamic, a policy engine like OPA is a better fit.
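As a concrete sketch, roles like the ones above can be enforced with a few lines at the tool boundary. The role and scope names here are illustrative, not a fixed API:

```typescript
// Hypothetical role-to-scope matrix, checked at every tool invocation.
type Role = "reader" | "coder" | "reviewer" | "deployer";

const ROLE_SCOPES: Record<Role, string[]> = {
  reader: ["docs:read", "status:read"],
  coder: ["repo:read", "repo:write", "pr:create"],
  reviewer: ["pr:comment"], // can comment, cannot merge
  deployer: ["deploy:staging"],
};

// true if any of the agent's assigned roles grants the required scope
function isAllowed(roles: Role[], requiredScope: string): boolean {
  return roles.some((role) => ROLE_SCOPES[role].includes(requiredScope));
}
```

The point of keeping the matrix this small is that you can read the entire authorization surface in one screen before adding anything dynamic.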
## A simple pattern for signed agent actions
Here’s the rough shape of what you want:
- Agent gets a keypair
- Agent requests a short-lived token with role claims
- Agent signs a tool request
- Tool verifies both:
  - the signature
  - the role or policy claims
Example in TypeScript using Ed25519 signing via `tweetnacl`:

```typescript
import nacl from "tweetnacl";
import { encodeBase64 } from "tweetnacl-util";

// One keypair per agent. In production, provision and store the private
// key securely rather than generating it ad hoc at runtime.
const keyPair = nacl.sign.keyPair();

const requestBody = JSON.stringify({
  tool: "create_pull_request",
  repo: "acme/api",
  branch: "agent/fix-auth",
  role: "coder",
  timestamp: Date.now() // lets the verifier reject stale or replayed requests
});

// Detached signature over the exact request bytes
const signature = nacl.sign.detached(
  Buffer.from(requestBody),
  keyPair.secretKey
);

const envelope = {
  body: requestBody,
  signature: encodeBase64(signature),
  publicKey: encodeBase64(keyPair.publicKey)
};

console.log(envelope);
```
Verification on the tool side:

```typescript
import nacl from "tweetnacl";
import { decodeBase64 } from "tweetnacl-util";

// Returns true only if the body was signed by the holder of the private
// key corresponding to envelope.publicKey.
function verifyEnvelope(envelope: {
  body: string;
  signature: string;
  publicKey: string;
}): boolean {
  return nacl.sign.detached.verify(
    Buffer.from(envelope.body),
    decodeBase64(envelope.signature),
    decodeBase64(envelope.publicKey)
  );
}
```
This only proves the message was signed by the holder of the private key. In production, you still need to bind that key to:
- a registered agent identity
- a role set
- a trust chain
- an expiration time
- optionally, a delegation chain
That’s where identity infrastructure matters.
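A minimal sketch of that binding step, assuming a simple in-memory registry that maps agent IDs to their registered public keys and roles. All names here (`AgentRecord`, `REGISTRY`, `checkEnvelope`) are hypothetical:

```typescript
// Illustrative registry: agentId -> key and roles registered out of band.
interface AgentRecord {
  publicKey: string; // base64-encoded public key
  roles: string[];
}

const REGISTRY = new Map<string, AgentRecord>();

function checkEnvelope(
  agentId: string,
  envelope: { body: string; publicKey: string },
  maxAgeMs = 60_000,
): boolean {
  const record = REGISTRY.get(agentId);
  // Reject unknown agents and keys that don't match the registry:
  // trusting the key carried inside the envelope alone would let
  // anyone self-sign as any agent.
  if (!record || record.publicKey !== envelope.publicKey) return false;
  // Bound the request's validity window to limit replay.
  const { timestamp } = JSON.parse(envelope.body);
  return Date.now() - timestamp <= maxAgeMs;
}
```

Signature verification (as in `verifyEnvelope` above) still runs first; this check adds the "is this key actually that agent's key, and is the request fresh" layer on top.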
## Delegation is where things get dangerous
Many agent systems are multi-step by design:
- planner agent receives a goal
- planner delegates coding to a worker
- worker delegates testing to another worker
- final deployment requires approval
If every delegated agent gets the parent’s full permissions, you’ve built a privilege escalation machine.
Instead, use bounded delegation:
- short-lived delegated tokens
- narrowed scopes
- explicit audience restrictions
- traceable chains of who delegated to whom
Standards like RFC 8693 token exchange are useful here. The important idea is simple: a delegated worker should receive less access than its parent, not more.
For example:

- Planner can access `repo:read`, `repo:write`, `deploy:staging`
- Test worker gets only `repo:read`, `ci:run`
- Docs worker gets only `docs:write`
- No worker gets `deploy:prod` directly
That one design choice dramatically reduces the blast radius of a hijacked sub-agent.
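A sketch of how a delegated credential might be minted under that rule: the child's scopes are the intersection of what it requests and what the parent holds, bounded by a short TTL. This is an illustrative shape, not an RFC 8693 implementation, and all names (`DelegatedToken`, `mintDelegated`) are made up:

```typescript
// Illustrative delegated credential with traceability and expiry.
interface DelegatedToken {
  subject: string;   // the child agent receiving the token
  actor: string;     // who delegated, for the audit chain
  scopes: string[];
  expiresAt: number; // epoch ms
}

function mintDelegated(
  parent: { id: string; scopes: string[] },
  child: string,
  requested: string[],
  ttlMs = 5 * 60_000,
): DelegatedToken {
  // The child can only receive scopes the parent already holds -
  // requesting deploy:prod without having it silently yields nothing.
  const scopes = requested.filter((s) => parent.scopes.includes(s));
  return { subject: child, actor: parent.id, scopes, expiresAt: Date.now() + ttlMs };
}

const isExpired = (t: DelegatedToken) => Date.now() > t.expiresAt;
```

The intersection is the whole trick: there is no code path by which a child ends up with more access than its parent.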
## MCP servers need zero-trust thinking
MCP is making tool use easier, but it also creates a bigger attack surface. If an MCP server assumes any connected client is trusted, it becomes a soft target.
A safer MCP model includes:
- authenticating the calling agent
- verifying cryptographic identity
- checking role/policy before every tool invocation
- logging decisions with actor identity
- requiring approvals for high-risk tools
If you’re exposing an MCP server internally or publicly, treat it like any other production API: authenticate every request, authorize every action, and assume the network is hostile.
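Those checks can be sketched as a single gate in front of every tool handler. Here `verifySignature`, `isAllowed`, and `hasApproval` are stand-ins for your real identity, policy, and approval layers, not a real MCP API:

```typescript
// Illustrative zero-trust gate run before every tool invocation.
type ToolRequest = { agentId: string; tool: string; signature: string };

function gateToolCall(
  req: ToolRequest,
  verifySignature: (req: ToolRequest) => boolean,
  isAllowed: (agentId: string, tool: string) => boolean,
  highRisk: Set<string>,
  hasApproval: (req: ToolRequest) => boolean,
): { ok: boolean; reason?: string } {
  // 1. Authenticate: who is calling, cryptographically.
  if (!verifySignature(req)) return { ok: false, reason: "bad signature" };
  // 2. Authorize: is this agent allowed to use this tool.
  if (!isAllowed(req.agentId, req.tool)) return { ok: false, reason: "not authorized" };
  // 3. High-risk tools additionally require a human approval.
  if (highRisk.has(req.tool) && !hasApproval(req)) {
    return { ok: false, reason: "approval required" };
  }
  // 4. Log the decision with the actor's identity attached.
  console.log(`allow ${req.agentId} -> ${req.tool}`);
  return { ok: true };
}
```

The ordering matters: authentication before authorization, and approval checks only after both, so a denied request never reaches the approval queue.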
## Getting started: a practical rollout plan
You do not need to rebuild your whole stack this week. Start with the highest-risk path.
### 1. Inventory agent actions
List what your agents can actually do:
- read code
- write code
- open PRs
- access tickets
- query internal docs
- run CI
- deploy
- touch customer data
This gives you the first draft of roles and scopes.
### 2. Split identities
Stop sharing one credential across multiple agents or workflows.
Each agent, worker, or execution context should have its own identity. Even if you start with a simple key registry, that’s better than one giant service account.
### 3. Add least-privilege roles
Define a small RBAC matrix before you add complexity:

```yaml
roles:
  reader:
    - docs:read
    - repo:read
  coder:
    - repo:read
    - repo:write
    - pr:create
  tester:
    - repo:read
    - ci:run
  deployer:
    - deploy:staging
```
If your rules depend heavily on environment, repo, branch, or data sensitivity, move to OPA or another policy engine.
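One way to picture that transition, sketched in TypeScript rather than Rego: policies become predicates over request context instead of static role lists. The context fields and rules here are illustrative:

```typescript
// Illustrative context-aware policies: each is a predicate over the request.
type Ctx = { role: string; env: string; branch: string };

const policies: Array<(ctx: Ctx) => boolean> = [
  // deployer may act, but only against staging
  (c) => c.role === "deployer" && c.env === "staging",
  // coder may act, but only on agent-owned branches
  (c) => c.role === "coder" && c.branch.startsWith("agent/"),
];

// Allow if any policy matches; deny by default.
const allow = (ctx: Ctx) => policies.some((p) => p(ctx));
```

In OPA you would express the same rules declaratively in Rego and query the engine at the tool boundary; the deny-by-default shape stays the same.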
### 4. Use short-lived delegation
When one agent spawns another, mint a short-lived delegated credential with reduced privileges. Avoid passing parent credentials downstream.
### 5. Log identity and authorization decisions
Your logs should answer:
- which agent acted
- what it tried to do
- what role or policy allowed it
- whether approval was required
- whether delegation was involved
If you can’t reconstruct that chain later, incident response will be painful.
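One possible record shape that can answer all five questions above; the field names are illustrative, not a standard schema:

```typescript
// Illustrative audit record tied to identity rather than sessions.
interface AuditRecord {
  agentId: string;           // which agent acted
  action: string;            // what it tried to do
  decision: "allow" | "deny";
  matchedRole?: string;      // what role or policy allowed it
  approvalRequired: boolean; // whether approval was required
  approvedBy?: string;       // human approver, if any
  delegationChain: string[]; // e.g. ["planner", "test-worker"]
  timestamp: string;         // ISO 8601
}

function makeRecord(partial: Omit<AuditRecord, "timestamp">): AuditRecord {
  return { ...partial, timestamp: new Date().toISOString() };
}
```

If every authorization decision emits one of these, reconstructing "who did what, under whose delegation, with whose approval" becomes a query instead of an investigation.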
## Where Authora fits
If you’re building this yourself, the core ideas still apply: Ed25519 identities, scoped delegation, RBAC or OPA-backed policy, and verified tool access.
Authora works in this area with agent identity, authorization, delegation chains, MCP authorization, and auditability, but the main point here is architectural: agents need first-class identity and access control. Whether you implement that with your own stack, OPA, or a platform, the security model matters more than the branding.
## Try it yourself
A few free tools that can help immediately:
- Want to check your MCP server? Try https://tools.authora.dev
- Run `npx @authora/agent-audit` to scan your codebase
- Add a verified badge to your agent: https://passport.authora.dev
- Check out https://github.com/authora-dev/awesome-agent-security for more resources
The biggest mindset shift is this: stop treating agents like invisible application glue, and start treating them like security principals.
Once an agent can act, it needs identity.
Once it has identity, it needs authorization.
And once it has authorization, you need a way to prove what happened.
That’s how you prevent hijacking from turning into a production incident.
-- Authora team
This post was created with AI assistance.