Mark Ivashinko

Your AI agent is running as an identity nobody audited

Most AI security work starts after the agent is already in production and already has more access than anyone signed off on. The order is backwards, and it's backwards in a predictable way.

Here's the pattern, repeated across environments. A team ships an agent. Copilot Studio, a LangChain workflow, a Semantic Kernel orchestration, doesn't matter. It can read mail, call internal APIs, and write to a system of record. The permission model is a system prompt that says "only use these tools when appropriate." The data-flow design is whatever the SDK defaulted to. It demos well. It goes to prod.

Six weeks later someone asks which identity the agent actually authenticates as when it hits the internal API. The room goes quiet, because the answer is usually "a service principal with far more scope than this workload needs, and the same one three other things share."

The system prompt is not an authorization boundary

This is the core mistake, so it's worth being blunt about it. A system prompt is a request. An attacker who can get text into the model's context (a poisoned document in the RAG store, a crafted email the agent summarizes, a tool result it ingests) is negotiating with that request directly. Prompt injection is not a content-moderation problem you filter your way out of. It is a privilege-escalation problem. The model is the confused deputy, and the deputy is holding your API tokens.

If the only thing standing between "summarize this email" and "exfiltrate the CRM" is the model deciding to behave, there is no boundary. There's a suggestion.
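
To make that concrete, here's the anti-pattern reduced to a few lines. This is a deliberately naive sketch, not any particular framework's API; the tool names and the shape of the model output are hypothetical. The point is what's missing: nothing sits between the model's proposal and execution.

```python
# Hypothetical, stripped-down host loop: the model proposed a tool call after
# reading its context, and the host just runs it. No identity check, no policy,
# no scope. The system prompt said "only use these tools when appropriate,"
# but nothing here enforces that.

TOOLS = {
    "summarize_email": lambda args: f"summary of message {args['id']}",
    "export_crm_contacts": lambda args: "every contact in the CRM",  # far more than the task needs
}

def run_agent_turn(model_output: dict) -> str:
    """model_output is whatever call the model emitted. The only 'authorization'
    is that the model asked for it -- a suggestion, not a boundary."""
    tool = TOOLS[model_output["tool"]]
    return tool(model_output.get("args", {}))

# A poisoned email in the context can steer the model toward the second call
# just as easily as the first, and this host can't tell the difference.
print(run_agent_turn({"tool": "summarize_email", "args": {"id": "msg-123"}}))
print(run_agent_turn({"tool": "export_crm_contacts", "args": {}}))
```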

What an actual permission model looks like

The boundary has to live in the architecture, outside the model, where the model's output is treated as untrusted input:

  • A scoped identity per agent. The agent authenticates as itself, with least-privilege access to exactly the tools and data its job requires. Not a shared service principal. Not the deploying user's token.
  • Tool-use authorization enforced by the host, not the prompt. The orchestration layer decides whether a tool call is permitted based on identity and policy, regardless of what the model "asked" for. The model proposes. The host disposes. (There's a short sketch of this right after the list.)
  • Data-flow boundaries that survive the model. If the agent can read a document the user can't, that's a leak the moment the model paraphrases it. Document permissions and sensitivity have to propagate through retrieval and into the response, not get laundered through the context window.
  • The AI surface wired into the same identity, telemetry, and access reviews as everything else. An agent that takes actions in your environment is a workload. It should show up in the same logs as every other workload, not sit in an unmonitored island because "it's just an AI feature."
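
Here are the first two bullets as a minimal sketch. It's framework-agnostic and every name in it is illustrative; in a real deployment the identity comes from your IdP and the policy from whatever authorization service you already run. What matters is where the decision happens: in the host, after the model has spoken.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentIdentity:
    """A scoped identity for one agent: it authenticates as itself, with a
    least-privilege tool allowlist. Not a shared service principal."""
    name: str
    allowed_tools: frozenset

@dataclass
class ToolCall:
    """Whatever the model proposed. Treated as untrusted input."""
    tool: str
    args: dict = field(default_factory=dict)

class ToolAuthorizationError(PermissionError):
    pass

def execute_tool_call(identity: AgentIdentity, call: ToolCall, tools: dict):
    """The host decides. The model can propose any call it likes; only calls
    inside this agent's scope ever run, no matter how the context was poisoned."""
    if call.tool not in identity.allowed_tools:
        raise ToolAuthorizationError(f"{identity.name} may not call {call.tool}")
    return tools[call.tool](call.args)

# The mailbox agent can summarize, and nothing else.
mailbox_agent = AgentIdentity("mailbox-summarizer", frozenset({"summarize_email"}))
tools = {
    "summarize_email": lambda args: f"summary of message {args['id']}",
    "export_crm_contacts": lambda args: "every contact in the CRM",
}

print(execute_tool_call(mailbox_agent, ToolCall("summarize_email", {"id": "msg-123"}), tools))
# An injected instruction that talks the model into proposing the CRM export now
# fails at the host instead of succeeding quietly:
# execute_tool_call(mailbox_agent, ToolCall("export_crm_contacts"), tools)  # raises
```

The authorization check is also the natural place to emit the same audit telemetry every other workload produces, which gets you the last bullet for free.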

None of this is exotic. It's the same authorization discipline you'd apply to any service that holds credentials and takes actions. The only new part is that the thing deciding what to do is a model that can be talked into things, which raises the stakes on getting the boundary outside it.
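
The data-flow bullet gets the same treatment on the retrieval path. Again a sketch under simple assumptions, with a plain ACL standing in for whatever your document store actually enforces: results are trimmed to the requesting user's permissions before they ever reach the context window, so there's nothing privileged for the model to paraphrase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_readers: frozenset  # sensitivity and permissions travel with the document

def retrieve_for_user(user: str, candidates: list) -> list:
    """Filter by the *user's* rights, not the agent's. A document the user can't
    open never enters the prompt, so it can't be laundered through a summary."""
    return [d for d in candidates if user in d.allowed_readers]

store = [
    Document("handbook", "Expense policy ...", frozenset({"alice", "bob"})),
    Document("comp-review", "Salary bands ...", frozenset({"alice"})),
]

# Bob's question only ever grounds against documents Bob can read.
context = retrieve_for_user("bob", store)
print([d.doc_id for d in context])  # ['handbook'] -- comp-review never reaches the model
```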

Build it and secure it in the same pass

The expensive version of this lesson is retrofitting permission boundaries onto an agent that already shipped, already has the broad token, and already has six weeks of behavior people depend on. The cheap version is designing the identity, the tool-use authz, and the data-flow controls into the system before it goes out. Same engineering either way. The only variable is whether you do it before or after the access is already loose.

This is the work WhiteBoxTek does on the AI side: agentic systems and RAG pipelines architected with the permission model built in from day one, across Copilot Studio, Semantic Kernel, AutoGen, LangChain, AI Foundry, Bedrock, and Vertex. Full breakdown of the approach: AI security architecture.
