Paulo Victor Leite Lima Gomes

Posted on Jun 8

permission prompts are not an agent security strategy

#ai #agents #security #platformengineering

Docker published a practical guide last week on securing AI agents, and one sentence in it should be printed on a sticker for every engineering team adopting coding agents:

Permission prompts are not a security strategy.

That is not the whole guide, obviously. Docker talks about isolation, tool access, identity, credentials, runtime monitoring, MCP provenance, and multi-agent trust boundaries. Good. Those are the grown-up topics.

But the permission prompt line is the one that stuck with me, because it names a habit I keep seeing in agent products and internal demos: the agent wants to do something, the UI asks for permission, and the human clicks approve.

This feels safe because a human is technically in the loop. It also feels familiar because developers already approve things all day: browser permissions, OAuth scopes, package installs, CI reruns, deploy buttons, cloud console warnings, and the occasional horrifying Terraform plan.

The problem is that prompts are usually a speed bump, not a boundary.

prompts train people to click yes

Security controls that depend on constant human attention tend to decay into theater.

Not because developers are careless. Because the workflow teaches them that approval is the price of progress.

If an agent asks for permission twenty times during a coding task, the first prompt might get real scrutiny. The second still gets a glance. By the tenth, the developer is reading for whether the action looks vaguely aligned with the task. By the twentieth, the developer is clicking approve because the alternative is babysitting a machine that was supposed to save time.

This is not a character flaw. It is interface design.

Agents make this worse because the sequence is dynamic. A coding agent may need to inspect files, install packages, run tests, generate migrations, call an MCP server, open documentation, and retry after a failure. Each step can be individually reasonable. The risky thing is the chain.

Approving one command at a time does not mean the whole workflow is safe. It just means the danger has been sliced into small enough pieces that each piece looks acceptable.

the agent needs a sandbox, not a chaperone

The better model is to give the agent autonomy inside a disposable boundary.

That boundary can be a container, a microVM, a remote development environment, or some other sandbox. The implementation matters, but the shape matters more: the agent gets enough room to do useful work, and the host machine does not become part of the blast radius.

This is why I like the container framing for agents. Not because containers are magic security dust. They are not. But because they force better questions: what filesystem can the agent see, what network can it reach, which credentials are present, and what happens when the environment is destroyed?

A prompt asks a tired human to make a local judgment. A sandbox makes the dangerous default impossible or at least contained.

If the agent needs to run tests or install packages, fine. But do that in an environment where the agent cannot casually read ~/.ssh, scrape unrelated repositories, phone home to arbitrary domains, or inherit a developer's entire cloud identity because that token happened to be in the shell.

That is the difference between supervising a risky process and designing a system where the risky process has less to break.
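To make the shape concrete, here is a minimal sketch of how a wrapper might launch an agent in a throwaway container. It is an illustration, not Docker's recommended setup: the function name, the `/workspace` mount point, and the image name in the usage note are my assumptions. The flags themselves are standard `docker run` options.

```python
from pathlib import Path


def sandbox_command(workspace: Path, image: str, agent_cmd: list[str]) -> list[str]:
    """Build a `docker run` invocation for a disposable agent container."""
    return [
        "docker", "run",
        "--rm",                                  # container is removed when the agent exits
        "--network", "none",                     # no network unless a task adds one on purpose
        "--read-only",                           # root filesystem cannot be modified
        "--tmpfs", "/tmp",                       # scratch space that vanishes with the container
        "--cap-drop", "ALL",                     # drop Linux capabilities the agent does not need
        "--security-opt", "no-new-privileges",   # block privilege escalation via setuid binaries
        # Only the project directory is mounted: no ~/.ssh, no ~/.aws, no sibling repos.
        "--volume", f"{workspace.resolve()}:/workspace",
        "--workdir", "/workspace",
        # No --env or --env-file flags, so host tokens in the shell are not forwarded.
        image,       # hypothetical image name supplied by the caller
        *agent_cmd,
    ]
```

Something like `sandbox_command(Path("."), "agent-sandbox:latest", ["pytest", "-q"])` returns an argument list you could hand to `subprocess.run`. With `--network none` the agent cannot install packages, so a real setup would swap in a network that only reaches an allowlisted registry.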

tool access is the real permission system

The next weak spot is tool access.

Most agent demos treat tools as capability candy. Add GitHub, Slack, Jira, docs, the database, the deployment system, a browser, and the internal MCP server someone wrote last month. The agent becomes more useful, the demo gets better, and the trust boundary quietly expands.

This is where permission prompts are especially misleading.

The important question is not whether the agent asked before using a tool. It is whether that tool should have been available for this task at all.

A frontend refactor agent does not need production database access. A dependency update agent does not need customer transcripts. A docs summarizer does not need deploy permissions. A local coding assistant probably does not need arbitrary internet access while reading private code.

This sounds obvious until you look at how teams wire tools. The easiest integration model is "give the agent everything and rely on the model to choose wisely." That is not a permission model. That is wishful thinking with JSON schemas.

Tool access should be scoped by task, environment, repository, data classification, and identity. Ideally, a gateway enforces that policy consistently instead of leaving every agent runtime to invent its own rules.
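A rough sketch of what that gateway check could look like, assuming a static policy table keyed by task. The task names, tool names, and repository names are invented for illustration, and I only model task, environment, and repository here; data classification and identity would add more dimensions to the same idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContext:
    task: str          # e.g. "frontend-refactor"
    environment: str   # e.g. "dev" or "prod"
    repository: str    # e.g. "web-app"


# Hypothetical policy table: what each task may call, and where.
POLICY: dict[str, dict[str, frozenset[str]]] = {
    "frontend-refactor": {
        "environments": frozenset({"dev"}),
        "repositories": frozenset({"web-app"}),
        "tools": frozenset({"fs.read", "fs.write", "tests.run"}),
    },
    "dependency-update": {
        "environments": frozenset({"dev"}),
        "repositories": frozenset({"web-app", "api"}),
        "tools": frozenset({"fs.read", "fs.write", "registry.fetch", "tests.run"}),
    },
}


def tools_for(ctx: TaskContext) -> frozenset[str]:
    """Return the tools exposed for this context; anything unknown gets nothing."""
    rule = POLICY.get(ctx.task)
    if rule is None:
        return frozenset()
    if ctx.environment not in rule["environments"]:
        return frozenset()
    if ctx.repository not in rule["repositories"]:
        return frozenset()
    return rule["tools"]


def authorize(ctx: TaskContext, tool: str) -> bool:
    """Gateway check run on every tool call, regardless of what the model asked for."""
    return tool in tools_for(ctx)
```

The design choice that matters is that `tools_for` decides what the agent can even see. A frontend refactor task never gets offered a database tool, so there is nothing to prompt about.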

MCP makes this more urgent, not less. It standardizes how agents connect to tools, which means tool descriptions, server provenance, and tool behavior become part of the security surface. A malicious or sloppy tool is not just bad code. It is an instruction source the agent may trust.

If you would not install a random GitHub App across your production organization, do not casually hand the equivalent MCP server to every coding agent.

agents need their own identities

One of the fastest ways to make agent security unmanageable is to run agents as the developer. It is convenient. It also makes the audit story terrible.

If the agent uses my token, every action looks like me. If it pushes a branch, calls an API, reads a document, opens a ticket, or touches infrastructure, the system records that Paulo did it. Maybe I asked the agent to do it. Maybe a prompt injection steered it. Maybe a tool description was poisoned. The logs do not know.

Service accounts exist because we learned this lesson already. Automated systems need identities that are scoped, auditable, revocable, and understandable. Agents are automated systems. They should have their own identities.

Not one universal "ai-agent" account with god permissions. Real identities with purpose: this repo migration agent, this incident triage agent, this documentation agent. Each one should have the minimum useful access for the job and a clear owner.

Short-lived credentials help. Runtime secret injection helps. Logs that connect the human request, agent identity, tool call, and result help even more.

The point is not to remove humans from accountability. The point is to make the machine actor visible enough that accountability means something.
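Here is a toy model of what a dedicated, short-lived agent credential carries. In practice this is the job of your identity provider or secrets manager, not hand-rolled code; the class, field names, and scope strings below are all assumptions made for the sketch.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCredential:
    agent_id: str            # a purpose-specific identity, never a developer's account
    requested_by: str        # the human who started the task, kept for the audit trail
    scopes: frozenset        # the minimum access this job needs
    expires_at: float        # Unix timestamp after which the credential is dead
    token: str


def issue(agent_id: str, requested_by: str, scopes: set,
          ttl_seconds: int = 900, now: Optional[float] = None) -> AgentCredential:
    """Mint a credential tied to one agent that expires without anyone revoking it."""
    issued_at = time.time() if now is None else now
    return AgentCredential(
        agent_id=agent_id,
        requested_by=requested_by,
        scopes=frozenset(scopes),
        expires_at=issued_at + ttl_seconds,
        token=secrets.token_urlsafe(32),
    )


def permits(cred: AgentCredential, scope: str, now: Optional[float] = None) -> bool:
    """Allow a call only while the credential is live and the scope was granted."""
    current = time.time() if now is None else now
    return current < cred.expires_at and scope in cred.scopes
```

Keeping `agent_id` and `requested_by` as separate fields is the whole point: the log can say the repo migration agent pushed the branch on my behalf, instead of saying I did it.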

what i would actually do

If a team asked me how to start securing coding agents, I would not begin with a giant AI governance committee. I would start with four defaults.

First, agents run in disposable environments. No direct host access unless there is a specific reason. No accidental inheritance of local secrets.

Second, network access is denied by default and allowlisted by task. Package registries, docs, and internal APIs should be explicit.

Third, tools are granted per workflow. Do not preload every MCP server because it might be useful someday. Useful someday is how access sprawl becomes normal.

Fourth, every agent action worth caring about is logged with enough context to explain it later: tool, parameters, identity, policy, outcome. Not because auditors are fun, but because production systems deserve evidence. Prompts cannot give you that memory after an incident.
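The second and fourth defaults fit in a few lines each. This is a sketch under assumptions: the allowlist contents, the `http.request` tool name, and the record layout are mine, and I use the task name as the stand-in for "policy". A real deployment would enforce egress at the network layer rather than in application code.

```python
import json
from urllib.parse import urlsplit

# Hypothetical per-task allowlist; any host not listed is denied.
ALLOWED_HOSTS = {
    "dependency-update": {"pypi.org", "files.pythonhosted.org"},
}


def egress_allowed(task: str, url: str) -> bool:
    """Deny by default: a request passes only if its host is listed for the task."""
    host = urlsplit(url).hostname
    return host is not None and host in ALLOWED_HOSTS.get(task, set())


def audit_record(identity: str, task: str, tool: str,
                 parameters: dict, outcome: str) -> str:
    """Serialize one action with the fields needed to explain it later."""
    return json.dumps({
        "identity": identity,
        "policy": task,          # which policy was applied to this action
        "tool": tool,
        "parameters": parameters,
        "outcome": outcome,
    }, sort_keys=True)


def fetch_decision(identity: str, task: str, url: str) -> str:
    """Decide on an outbound request and return the log line for that decision."""
    outcome = "allowed" if egress_allowed(task, url) else "denied"
    return audit_record(identity, task, "http.request", {"url": url}, outcome)
```

Note that denied requests get a record too. After an incident, the calls the agent tried and failed to make are often as interesting as the ones that went through.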

Permission prompts can still exist. Sometimes they are useful. A prompt before deleting files is fine. A prompt before spending money is fine. A prompt before pushing a branch is fine.

But prompts should be the last-mile confirmation for unusual actions, not the main wall between the agent and the rest of your environment.
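One way to picture that ordering, as a sketch with made-up action names: policy runs first, and the prompt only appears for actions that are already inside the boundary but unusual enough to deserve a second look.

```python
from enum import Enum


class Decision(Enum):
    DENY = "deny"
    CONFIRM = "confirm"
    ALLOW = "allow"


# Hypothetical action names for deleting files, spending money, and pushing a branch.
UNUSUAL_ACTIONS = {"fs.delete", "billing.spend", "git.push"}


def decide(action: str, granted: set) -> Decision:
    """Policy decides first; a human is asked only about unusual, permitted actions."""
    if action not in granted:
        return Decision.DENY       # no prompt can approve something outside the boundary
    if action in UNUSUAL_ACTIONS:
        return Decision.CONFIRM    # last-mile confirmation
    return Decision.ALLOW          # routine work proceeds without interrupting anyone
```

With `granted = {"fs.read", "tests.run", "git.push"}`, running tests is allowed silently, pushing a branch asks for confirmation, and deleting files is denied without ever showing a dialog.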

the punchline

Agents are becoming useful because they can act without asking us about every tiny step.

That is also why "just ask the human" is such a weak security model.

If the agent needs a chaperone for every action, it is not autonomous enough to deliver the workflow. If the agent can act autonomously, then the security model has to live in the environment: sandboxing, scoped tools, dedicated identities, network policy, credential hygiene, and logs.

The industry is slowly rediscovering an old systems lesson. You do not secure dangerous work by hoping every operator makes perfect decisions under interruption. You secure it by shaping the room where the work happens.

So yes, keep the prompt for the weird command.

But build the boundary first.


Top comments (2)

Rahul S • Jun 8

The practical collapse point I keep hitting is credential inheritance during debugging. A team sets up the sandbox correctly: containerized agent, no host filesystem, network deny-by-default. Then someone needs to debug an API integration and they mount ~/.aws as a volume, pass GITHUB_TOKEN through an env var, or forward the SSH agent "just for this session." The sandbox technically still exists but it's now running with the developer's full cloud identity. This isn't even negligence, it's workflow pressure: the agent can't do useful integration work without some credentials, and the path of least resistance is always "pass through what I already have."

The hard design problem isn't sandbox vs prompt, it's making the sandbox permeable enough to be useful without inheriting the host's ambient authority. Short-lived credentials scoped to the specific task and auto-revoked when the container exits are closer to the right primitive, but most teams don't have that infrastructure yet, so they just toggle the isolation off when it gets in the way.

Armorer Labs • Jun 21

This framing matches what we kept hitting at Armorer Labs (we maintain Armorer Guard, a small local Rust tool-call scanner): the prompt is not the boundary, the tool boundary is. A few operator things that helped us in production:

Scope at the call, not at the session. The agent's session is "make this PR." The actual calls are git.push, gh.api.patch, http.request. If the permission check happens once at session start, you cannot tell which call actually mattered after the fact. Scope to the tool+verb+target tuple.
Pair every approved or denied side effect with a durable receipt (request, granted scope, page context, final state, URL). The receipt is what you inspect later; the prompt log is not.
Sandboxes without receipts are not enough. A container can leak via env, a network policy can be misconfigured, a tool wrapper can be bypassed. Receipts let you prove what ran, which is the actual question after an incident.

The thing we still struggle with is making the receipt path fast enough that it does not get bypassed when the agent is under load. The target is default-on, sub-millisecond in the hot path, with an async flush to durable storage.

Curious how you handle the "click yes is the default" problem at the UI layer. We end up relying on a runtime guard as the source of truth and treating user prompts as audit hints rather than enforcement.