Jens Ernstberger for Kontext

Posted on May 15 • Originally published at kontext.security on Apr 27

Agent Intent - No One Knows What It Means, But It's Provocative

#ai #security

AI agent security has a language problem. The industry has converged on intent as the word for the new risk surface: intent detection, intent-aware authorization, intent verification, intent deviation, intent monitoring. The word is attractive because it sounds like the missing layer. It implies that a security system can understand why an agent is acting, not just what it is doing.

For security, platform, and AI teams deploying agents with tools, APIs, credentials, and production data, the practical question is runtime authorization: should this next action be allowed now?

In practice, intent is usually standing in for several different things at once: the user's goal, the agent's plan, the system prompt, the permissions attached to the agent, the model's reasoning trace, and the behavior a monitoring system expects to see.
Those are not the same thing. Some are policy. Some are evidence. Some are guesses. Some are not observable at all.

The result is a category of security claims that sound stronger than they are. If a product says it verifies agent intent, the first question should be: which kind of intent, measured from what signal, enforced at which boundary?

The better target is narrower. Runtime authorization does not need to read an agent's mind. It needs to decide whether the next action is safe to allow under the current identity, credential, resource, data-flow, session, and behavioral context.

TL;DR

AI agent runtime authorization is a safety evaluation layer, not an intent verification engine. It decides whether the next tool call, API request, credential request, or data access should run under the current identity, task, credential, resource, and session context.
Access control exists to make sharing safe. The old question still applies: who can do what to what, and when. Agents do not replace that model. They make the conditions around "when" much more dynamic.
Agents break the old threat model because they are non-deterministic actors. A user can be legitimate, the credential can be valid, the agent can be non-malicious, and the next action can still be unsafe.
Runtime authorization should evaluate safety, not correctness. It cannot prove that the agent solved the user's task correctly. It can decide whether a proposed action should be allowed, restricted, escalated, or denied before it executes.
The architecture should be layered. Deterministic policy handles the hard boundary. Real-time safety scoring handles the uncertain middle. Escalation handles actions that are not obviously forbidden but are too risky to approve automatically.
UEBA, taint analysis, and sequence modeling are three ways to score the same runtime safety problem. UEBA compares the agent to its baseline and peer group. Taint analysis follows untrusted influence into sensitive actions. Sequence modeling asks whether the current path is moving toward an unsafe state.
LLMs belong in the review loop, not as the primary control. A separate model can help judge ambiguous traces, summarize evidence, and propose future policy. It should not be the thing deciding every tool call in isolation.

Part 1: Why AI Agent Access Control Exists

Access control exists because corporate systems need to be shared, and not all sharing is safe.

The reason is basic. A company works because many people and systems use the same resources: code repositories, customer databases, payment systems, internal tools, cloud accounts, source documents, support queues, analytics platforms, and production infrastructure. If those resources are not shared, the organization cannot operate.

But if they are shared without limits, the environment becomes a public commons. Anyone inside the perimeter can read payroll data, modify source code, delete a database, grant themselves permissions, or impersonate an executive. The fact that a subject is inside the company does not mean every subject-object relationship should be permitted.

The basic access-control question is simple: who is acting, what are they trying to do, which resource are they touching, and under what conditions should that action be allowed?

The subject might be a human user, a service account, a workload, a CI job, a browser session, or an AI agent. The object might be a file, a repository, a CRM record, a database row, a cloud API, a payment, or a credential. The action might be read, write, delete, approve, export, deploy, mint, revoke, or delegate. Authorization turns policy over these relationships into an enforceable decision.
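The subject, action, object, and condition relationship can be sketched as a small default-deny check. This is only an illustration: `AccessRequest`, `Rule`, and `is_allowed` are hypothetical names, and matching resources by prefix is an assumption made for brevity, not a recommendation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AccessRequest:
    subject: str                      # who is acting, e.g. "agent:support-bot"
    action: str                       # e.g. "read", "export", "delete"
    resource: str                     # e.g. "crm:customer/123"
    conditions: dict = field(default_factory=dict)  # runtime context


@dataclass
class Rule:
    subject: str
    action: str
    resource_prefix: str
    condition: Callable[[dict], bool]  # predicate over the runtime context


def is_allowed(request: AccessRequest, rules: list) -> bool:
    """Default deny: allow only if one rule matches all four parts."""
    return any(
        rule.subject == request.subject
        and rule.action == request.action
        and request.resource.startswith(rule.resource_prefix)
        and rule.condition(request.conditions)
        for rule in rules
    )
```

With a single rule that lets a support agent read customer records only while a case is open, the same agent's export request, or a read without an open case, falls through to deny.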

For AI agents, this turns classic access control into a runtime problem. The system still needs identity and least privilege, but it also needs to authorize each tool use at the moment an agent asks to touch a resource.

This is not a moral system. It is not asking whether the actor is a good person, or whether the application is generally trustworthy. It asks whether this subject should be allowed to perform this action on this object under these conditions.

There are three root reasons every serious organization needs that control.

Subjects are not uniformly trustworthy. Some insiders are malicious. Some make mistakes. Some are compromised by external attackers. Some service accounts leak. Some laptops get infected. Some API keys end up in logs, config files, screenshots, or public repositories. A flat access model assumes every one of those failures remains harmless. That assumption does not survive contact with reality.
Breach containment is the goal. Modern security does not assume that every attacker can be kept out forever. It assumes some credential, host, or session will eventually fail. Access control defines what that failure can reach. Least privilege, segmentation, scoped credentials, explicit denies, and short-lived access do not make compromise impossible. They make compromise bounded.
Separation of duties prevents self-authorization. Sensitive workflows should not let the same subject request, approve, and execute a high-risk action end to end. The principle appears in finance, identity administration, production deployment, procurement, and data access. It prevents fraud, limits accidental damage, and creates accountability.

The familiar pipeline is identification, authentication, and authorization. Who are you? Prove it. What are you allowed to do?

Authorization is the last step, but it is where the business risk actually gets expressed. RBAC maps permissions to job functions. ABAC adds attributes such as device, time, location, risk, and resource sensitivity. ReBAC evaluates relationships between subjects and objects. MAC and DAC handle more rigid or owner-controlled environments. Each model has different tradeoffs, but all of them exist to resolve the same tension: resources must be shared, but only through controlled relationships. The key distinction between authentication and AI agent authorization is that a verified identity still needs a separate permission decision.

For deterministic software, this worked reasonably well. A web application had known routes. A backend service had known API calls. A CI job had a known pipeline. A service account was bad if it was overprivileged, but the expected action surface could usually be described ahead of time. Static authorization was never perfect, but it matched the shape of the systems it was protecting.

Agents do not fit that shape.

Part 2: Why AI Agents Break Traditional Access Control Models

AI agents are not just users, services, or scripts with a new label. They are non-deterministic actors operating under delegated authority.

The difference matters. A user chooses an action and clicks a button. A backend service executes code written by developers. A script runs a known sequence of commands. An agent receives a goal, interprets context, chooses tools, asks for credentials, revises its plan, and produces actions that were not fully known when the session began.

The old question was often simple enough: does this user or service have permission to call this API? The agent version is harder: should this agent, acting for this user, in this session, after this sequence of observations and tool calls, in this context, be allowed to perform this action on this resource right now? That extra context is not decorative. It is the risk.

The Railway volume deletion incident is a cleaner example than a hypothetical attack. The agent was working on a staging-related task and hit a credential problem. Instead of stopping and asking for clarification, it searched for a usable Railway token, found one with broad API authority, and used it to call volumeDelete. The token was valid. The API accepted the request. The action was authorized. But the path was wrong for the task, and the result was a production volume deletion.

That is the threat model change. The failure was not that the agent lacked permission. The failure was that the agent used valid permission to execute a destructive interpretation of a benign task.

The individual action is often ambiguous. The context decides whether it is acceptable.

This is why "the user authorized it" becomes insufficient. The user did not authorize every intermediate step in advance. They authorized a task in natural language. The agent filled in the operational details. If those details include reading secrets, changing a hook, exporting data, or requesting a broader token, the user's original approval is not enough to settle the safety question.

"The agent has permission" is also insufficient. A credential proves that the agent can call something. It does not prove that this call is appropriate now. A support agent may have read access to customer records, but a bulk export of all customers is a different risk than reading one account during a support case. A developer agent may have write access to a repository, but changing a release workflow after reading untrusted instructions is different from editing a test. That is why scoped credentials for AI agents need to be issued for a specific user, task, resource, and operation instead of held as broad standing access.

The delegation chain makes this worse:

human -> agent -> tool -> downstream service -> business object

At each hop, context can disappear. The downstream service may see only a token. The tool may not know the user's original instruction. The identity provider may not know what the agent read before asking for access. The audit log may show a valid credential performing a valid action while missing the reason the session became unsafe.
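One way to keep that context from disappearing is to carry an explicit delegation record through every hop. The sketch below is a minimal illustration under that assumption; `Hop`, `DelegationContext`, and `has_full_chain` are hypothetical names, and a real system would also need to sign or otherwise protect the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    kind: str   # "human", "agent", "tool", or "service"
    name: str


@dataclass(frozen=True)
class DelegationContext:
    instruction: str          # the user's original natural-language task
    hops: tuple = ()          # ordered chain of actors so far

    def delegate(self, kind: str, name: str) -> "DelegationContext":
        """Return a new context with one more hop appended."""
        return DelegationContext(self.instruction, self.hops + (Hop(kind, name),))


def describe(ctx: DelegationContext) -> str:
    return " -> ".join(f"{hop.kind}:{hop.name}" for hop in ctx.hops)


def has_full_chain(ctx: DelegationContext, required=("human", "agent", "tool")) -> bool:
    """True only if the instruction and every required kind of hop survived."""
    kinds = [hop.kind for hop in ctx.hops]
    return bool(ctx.instruction) and all(kind in kinds for kind in required)
```

A downstream service that receives only a token would fail `has_full_chain`, which is exactly the gap described above.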

Intent becomes attractive here because it appears to name the missing context. But the word hides more than it explains. Declared intent is what the agent was assigned or configured to do. User intent is what the human actually meant. Model reasoning is how the agent explained its next step. Behavioral intent is what the action sequence appears to imply. Authorization policy is what the system is willing to allow.

Those cannot be collapsed into one runtime check. Declared intent can be translated into policy. User intent is often ambiguous. Model reasoning can be benign even when the action is unsafe. Behavioral intent is an inference. Authorization policy is enforceable.

The operational root of this problem is semantic underdetermination: natural language tasks do not arrive with one fully determinate operational meaning. They are interpreted against implicit background assumptions. A recent paper on LLM-based configuration synthesis puts the problem plainly: "This is difficult to do even for relatively simple settings and is infeasible to expect users to do correctly for realistic tasks" (Mondal et al., HotNets 2025). The same paper cites a study where only 32% of LLM-proposed resolutions to ambiguous English sentences were considered correct by crowd-sourced evaluators (Liu et al., EMNLP 2023). Even models asked specifically to enumerate possible meanings of an ambiguous sentence often choose the wrong interpretation.

This also echoes Quine's indeterminacy of translation: multiple interpretations can be consistent with the same observable behavior and language while remaining incompatible with each other. The implication for agents is uncomfortable but practical. Even with rich observation of a user's behavior and language, multiple conflicting intent reconstructions may remain plausible. Runtime security cannot depend on discovering the one true intent if the available evidence does not determine it.

The threat model changes because an actor can be legitimate, authorized, and non-malicious while still creating unsafe effects. The agent may be confused. It may have followed poisoned context. It may have interpreted the task too broadly. It may be executing a plan that is coherent from its perspective and unacceptable from the organization's perspective.

This is not a normal service-account problem. It is a runtime authorization problem.

Part 3: What Runtime Authorization for AI Agents Should Actually Evaluate

The core distinction is safety versus correctness.

Definition: Runtime authorization for AI agents is the real-time policy layer that decides whether a proposed tool call, API request, credential request, or data access is safe enough to execute under the current identity, task, credential, resource, and session context.

Correctness asks whether the agent did the right thing. Did it implement authentication securely? Did it summarize the customer record faithfully? Did it choose the right database migration strategy? Did "clean up this repository" mean removing generated files, deleting unused code, or reorganizing the project? These are specification questions.

Runtime authorization cannot reliably answer those questions in the general case. The specification is informal. The implementation space is large. The agent's value is precisely that it translates ambiguous goals into concrete actions. If the organization already had a formal specification of every correct step, it would not need the agent to infer them.

Safety is narrower and more enforceable. It asks whether the next action should be allowed before it runs. That question can be evaluated from observable signals: scope, resource sensitivity, credential breadth, credential lifetime, delegated user context, data provenance, action type, destination, sequence history, velocity, previous denials, approval requirements, and blast radius.

These signals are imperfect, but they are enforceable. They can change the execution decision before the action runs.

The result is a control surface, not an abstract risk label. An action can be allowed, narrowed, downgraded to read-only, delayed for review, escalated to a human, or denied, and the outcome can be used to reduce the autonomy of the current session going forward.

This is also where the intent language breaks down. "Intent verification" implies that a system can compare the agent's current behavior to what the user truly meant. But intent derived from probabilistic inference is not a trustworthy security primitive. A credential proves the agent can act; it does not prove the action is appropriate given what the agent observed to get there. This is the structural gap described as a trust-authorization mismatch: static permissions are decoupled from an agent's changing runtime trustworthiness (Shi et al., 2025). What runtime authorization can enforce is a provenance boundary: what the agent is allowed to touch, how much authority it should receive, whether the action is consistent with the session so far, and whether the risk level requires escalation regardless of declared intent.

The distinction is plain: correctness asks whether the agent solved the task properly; safety asks whether this action is acceptable to execute now. Runtime authorization belongs to the second category. It should not claim the first.

That does not make the control weak. It makes the claim precise. Security controls routinely work by reducing blast radius rather than proving good intent. A database role does not know whether a query is part of a sound business decision. A network policy does not know whether an engineer is making the right architectural choice. A short-lived token does not know whether the code being deployed is correct. These controls enforce boundaries so that mistakes and compromise do not become unbounded.

Agent runtime authorization should do the same thing, but with more context.

Part 4: A Runtime Authorization Architecture for AI Agents

The right architecture is not one model call sitting in front of every tool invocation. It is a control loop: enforce what is known, score what is uncertain, escalate what is risky, and learn from repeated decisions.

Each layer has a different job. Mixing those jobs is how systems become either too rigid to use or too vague to trust.

Layer 1: Deterministic Authorization

The base layer should be deterministic. It should answer the questions that can be stated precisely: which identity authorized the agent, which agent instance is acting, which resource is in scope, which operation is requested, what credential would be issued, how long that credential should live, and which actions are never allowed.

RBAC, ABAC, ReBAC, OpenFGA-style relationship checks, credential scoping, explicit denies, and approval requirements belong in this layer. Their purpose is not to understand the agent's reasoning. Their purpose is to define the hard boundary.
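Credential scoping, one of the controls named above, might look like the following. It is a sketch under simple assumptions: `ScopedCredential`, `issue_credential`, and `credential_permits` are hypothetical names, the five-minute default lifetime is arbitrary, and the credential is a plain in-memory record rather than a signed token.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    user: str          # the human the agent is acting for
    task_id: str       # the task this credential was issued for
    resource: str      # the single resource it covers
    operation: str     # the single operation it covers
    expires_at: float  # absolute expiry, seconds since the epoch


def issue_credential(user, task_id, resource, operation, ttl_seconds=300, now=None):
    """Issue a short-lived credential bound to one resource and one operation."""
    now = time.time() if now is None else now
    return ScopedCredential(user, task_id, resource, operation, now + ttl_seconds)


def credential_permits(cred, resource, operation, now=None):
    """Valid only before expiry and only for the exact resource and operation."""
    now = time.time() if now is None else now
    return (
        now < cred.expires_at
        and cred.resource == resource
        and cred.operation == operation
    )
```

A credential issued for reading one customer record does not permit an export of the same record, and it stops working once its lifetime has passed.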

If a local coding assistant requests organization-admin access, the answer should not depend on a model's interpretation of the prompt. If an agent tries to write outside its workspace, the system should not first ask whether the agent meant well. If a payment action requires approval, a plausible reasoning trace should not make that approval unnecessary. Contextual agent-security work makes the same point from the other direction: judging action safety requires the context in which the action takes place (Tsai & Bagdasarian, HotOS 2025).
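Those three examples translate almost directly into a deterministic check. The sketch below assumes hypothetical operation names (`org:admin`, `payment:send`) and a purely lexical workspace test that rejects any path containing `..`; it never consults a model or a reasoning trace.

```python
from pathlib import PurePosixPath

NEVER_ALLOWED = {"org:admin"}          # explicit denies (hypothetical names)
REQUIRES_APPROVAL = {"payment:send"}   # actions that always need approval


def inside_workspace(path: str, workspace: str) -> bool:
    """Lexical containment check; paths containing '..' are rejected outright."""
    candidate = PurePosixPath(path)
    root = PurePosixPath(workspace)
    if ".." in candidate.parts:
        return False
    return candidate == root or root in candidate.parents


def deterministic_decision(operation, path, workspace, approved=False):
    """Return 'deny', 'needs_approval', or 'allow' from fixed rules only."""
    if operation in NEVER_ALLOWED:
        return "deny"
    if path is not None and not inside_workspace(path, workspace):
        return "deny"
    if operation in REQUIRES_APPROVAL and not approved:
        return "needs_approval"
    return "allow"
```

Explicit denies are evaluated first, so no later rule and no approval flag can override them.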

This layer provides the part of the system that security teams can reason about directly. It is testable, auditable, and reviewable. It is also insufficient on its own because many unsafe agent sessions never violate a single obvious rule.

Layer 2: Real-Time Safety Scoring

Safety scoring handles the uncertain middle: actions that are allowed in some contexts and unsafe in others.

A GitHub write, database read, email send, or shell command may be routine in one session and dangerous in another. The question is whether this action, in this session, after this context, with this credential and this destination, should still be allowed automatically.

Several signals can help score that middle. They are not separate definitions of intent. They are different ways to ask the same operational question: is this action safe enough to execute automatically?

UEBA asks whether the entity is behaving unlike itself or its peers.

User and Entity Behavior Analytics compares current activity against a baseline. For agents, the entity might be an agent instance, agent type, workspace, project, user, or peer group. This is useful for drift: new resource classes, unusual volume, strange destinations, repeated denials, or activity outside the normal pattern. Its weakness is cold start. Short-lived and task-specific agents often need peer baselines before they have enough history of their own.
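A baseline comparison with a peer fallback for cold start can be sketched as a z-score. The metric (for example, records read per session), the `min_history` cutoff of five observations, and the function name `deviation_score` are all assumptions for illustration.

```python
from statistics import mean, pstdev


def deviation_score(current, own_history, peer_history, min_history=5):
    """Absolute z-score of `current` against the agent's own baseline,
    falling back to the peer-group baseline when history is too short."""
    baseline = own_history if len(own_history) >= min_history else peer_history
    if len(baseline) < 2:
        return 0.0  # not enough data to say anything
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is maximal deviation.
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma
```

A new agent with a single prior session is scored against its peers rather than against its own near-empty history.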

Taint analysis asks whether untrusted input influenced a sensitive action.

Agents read issue comments, emails, web pages, logs, README files, tool metadata, and third-party docs. That content can carry instructions. The question is not whether the text sounds malicious. The question is whether it influenced a shell command, credential request, file write, email send, API write, token exchange, or permission change. This is strong as a day-one control because it does not require historical telemetry. It catches influence chains. It does not catch every unsafe session, because not every escalation has an obvious path from an untrusted source to a sensitive sink. This is also why securing LLM tool use with runtime policies needs provenance, not just prompt inspection.
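A minimal sketch of the idea: values carry the set of untrusted sources that influenced them, labels propagate when values are combined, and a sensitive sink checks the labels before running. The names `Tainted`, `combine`, and `check_sink` are hypothetical. The sketch also assumes influence can be tracked through explicit string composition, which is a simplification; with a model in the loop, a conservative system may have to treat everything in the context window as influencing the next action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    value: str
    sources: frozenset = frozenset()  # untrusted sources that influenced value


def from_source(value: str, source: str, trusted: bool) -> Tainted:
    """Label a value at the point where it enters the session."""
    return Tainted(value, frozenset() if trusted else frozenset({source}))


def combine(template: str, *parts: Tainted) -> Tainted:
    """Build a new value; its taint is the union of its parts' taint."""
    sources = frozenset().union(*(part.sources for part in parts))
    return Tainted(template.format(*(part.value for part in parts)), sources)


SENSITIVE_SINKS = {"shell", "credential_request", "email_send"}


def check_sink(sink: str, arg: Tainted):
    """Escalate when untrusted influence reaches a sensitive sink."""
    if sink in SENSITIVE_SINKS and arg.sources:
        return ("escalate", sorted(arg.sources))
    return ("allow", [])
```

The check never inspects whether the text looks malicious. It only asks where the text came from and where it is about to go.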

Sequence analysis asks whether the session is moving toward an unsafe state.

Many failures are visible in the path, not the individual call. An agent may read tool metadata, inspect environment files, hit a denied request, ask for broader access, write a hook, and then change destination. Each step may have a benign explanation. Together, the path changes the safety profile. Risk-adaptive access-control work for agentic systems makes this uncertainty explicit by combining task context, resource risk, and model uncertainty when deciding whether to authorize a proposed task (Fleming et al., 2025).

Sequence analysis is useful when policy allows each individual call, UEBA has little history, and taint analysis has no clear influence chain. Its weakness is abstraction quality: if the event vocabulary is too coarse, it loses signal; if it is too detailed, it becomes noisy.
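One very simple form of sequence analysis measures how far a session has progressed along a known risky path. The event names and the `RISKY_PATH` pattern below are invented for illustration, and real sequence models are usually learned rather than hand-written; the sketch only shows why the event vocabulary matters.

```python
# Hypothetical event vocabulary; choosing it well is the hard part.
RISKY_PATH = (
    "read_env_file",
    "request_denied",
    "request_broader_access",
    "write_hook",
)


def path_progress(events, pattern=RISKY_PATH) -> float:
    """Fraction of `pattern` matched, in order, as a subsequence of `events`.

    Unrelated events between pattern steps are ignored, so benign work
    interleaved with risky steps does not reset the score.
    """
    matched = 0
    for event in events:
        if matched < len(pattern) and event == pattern[matched]:
            matched += 1
    return matched / len(pattern)
```

A session that reads an environment file, is denied, and then asks for broader access has matched three of four steps, even though each call was individually allowed.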

Layer 3: Escalation and Asynchronous Authorization

Risk scoring only matters if it changes what the agent can do.

The response cannot be limited to allow or deny. Agent systems need graduated control because many suspicious sessions should continue with reduced autonomy rather than stop entirely.

A runtime system can allow the action with a narrower credential, downgrade the session to read-only, increase logging, require approval, pause the session, revoke a token, quarantine a workspace, or send the trace for review (Shi et al., 2025). The goal is to reduce autonomy at the moment the risk becomes too high.
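Graduated control can be sketched as a ladder from risk score to response. The thresholds and the `Response` names are placeholders chosen for illustration, not calibrated values; a hard deny from the deterministic layer always wins regardless of score.

```python
from enum import Enum


class Response(Enum):
    ALLOW = "allow"
    NARROW_CREDENTIAL = "narrow_credential"
    READ_ONLY = "read_only"
    REQUIRE_APPROVAL = "require_approval"
    PAUSE_SESSION = "pause_session"
    DENY = "deny"


# Checked from most to least severe; thresholds are illustrative only.
ESCALATION_LADDER = [
    (0.9, Response.PAUSE_SESSION),
    (0.7, Response.REQUIRE_APPROVAL),
    (0.5, Response.READ_ONLY),
    (0.3, Response.NARROW_CREDENTIAL),
]


def choose_response(risk_score: float, hard_denied: bool = False) -> Response:
    """Map a risk score in [0, 1] to the least disruptive adequate response."""
    if hard_denied:
        return Response.DENY
    for threshold, response in ESCALATION_LADDER:
        if risk_score >= threshold:
            return response
    return Response.ALLOW
```

Most of the ladder keeps the session running with less autonomy, which is the point of having more than two outcomes.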

Another LLM can help at this boundary. It should not be the policy engine, and it should not be asked to divine intent from a single tool call. It can be useful as an asynchronous reviewer when the cheap path is not confident.

The reviewer should see evidence, not a vague prompt: the proposed action, the last relevant events, the credential requested, policy decisions so far, denials and retries, taint labels, sequence risk, baseline deviation, the relevant user instruction, and the relevant tool metadata. The question should stay narrow: is this action safe to allow under the evidence, and would a narrower authorization be sufficient?
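The evidence bundle could be assembled as structured data rather than free text. The sketch below assumes a hypothetical `build_review_request` helper and field names; it only serializes the request and does not call any model.

```python
import json

REVIEW_QUESTION = (
    "Is this action safe to allow under the evidence, "
    "and would a narrower authorization be sufficient?"
)


def build_review_request(proposed_action, events, evidence, max_events=10):
    """Serialize structured evidence for an asynchronous reviewer.

    Only the most recent `max_events` events are included, so the reviewer
    sees the relevant tail of the session rather than the full log.
    """
    recent = list(events)[-max_events:] if max_events > 0 else []
    payload = {
        "question": REVIEW_QUESTION,
        "proposed_action": proposed_action,
        "recent_events": recent,
        "evidence": evidence,  # e.g. taint labels, denials, sequence risk
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

Keeping the question fixed and the evidence explicit limits what the reviewer is asked to judge.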

This turns an LLM from a speculative gatekeeper into a trace reviewer. It can explain why a case is risky, identify missing evidence, recommend a narrower permission, or route the decision to a human. It can also be run multiple times or paired with cheaper classifiers if the organization wants higher confidence before interrupting a workflow. The reviewer LLM is not a ground-truth oracle, however. It introduces its own probabilistic inference step, subject to the same underdetermination problem as the agent itself. It is a confidence-raiser, not a certainty provider. The structured evidence framing matters precisely because it constrains what the reviewer can speculate about.

Layer 4: Policy Learning Over Time

The final layer is policy evolution.

Every escalation produces data. Some high-risk actions will be approved. Some will be denied. Some will reveal missing context. Some will show that a repeated pattern should become a hard rule. Some will show that a threshold is too noisy.

Auto-generated policy can be useful here, but only with discipline. A model can propose new rules from reviewed incidents, false positives, repeated approvals, and recurring denials. Those rules should be treated like code: reviewed, tested, versioned, rolled out gradually, and monitored. The system should not silently convert every model suggestion into enforcement.
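Turning repeated decisions into rule candidates can be sketched without any model at all. The `propose_rules` function, the pattern strings, and the default of three unanimous decisions are assumptions for illustration. Every proposal is marked `needs_review`, so nothing here becomes enforcement on its own.

```python
from collections import Counter


def propose_rules(reviewed, min_count=3, min_agreement=1.0):
    """Propose rule candidates from reviewed (pattern, decision) pairs.

    `decision` is "approved" or "denied". A pattern becomes a candidate only
    when it was seen at least `min_count` times and reviewers agreed at least
    `min_agreement` of the time.
    """
    totals = Counter()
    denials = Counter()
    for pattern, decision in reviewed:
        totals[pattern] += 1
        if decision == "denied":
            denials[pattern] += 1

    proposals = []
    for pattern, count in totals.items():
        if count < min_count:
            continue
        deny_rate = denials[pattern] / count
        if deny_rate >= min_agreement:
            effect = "deny"
        elif 1 - deny_rate >= min_agreement:
            effect = "allow"
        else:
            continue  # reviewers disagree; keep escalating instead
        proposals.append({"pattern": pattern, "effect": effect, "status": "needs_review"})
    return proposals
```

Patterns with mixed outcomes produce no proposal, which keeps them in the escalation path until the evidence is clearer.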

The direction is important. Deterministic policy starts generic. Runtime risk scoring discovers where that policy is too loose or too noisy. Human and model review label the uncertain cases. Repeated decisions become new policy candidates. Over time, the hard boundary improves.

That is more concrete than intent verification. It says what the system enforces now, what it scores, when it escalates, and how it learns.

Part 5: A Better Definition of Intent

The industry is unlikely to stop using the word intent. It is too convenient, and it points at a real discomfort with static access control. The better move is to define it narrowly enough that it can be engineered.

In an authorization system, intent can only mean the observable relationship between the declared task, delegated authority, current identity, requested action, resource sensitivity, credential scope, data influence, session sequence, behavioral baseline, and safety boundary. This is not a refinement of the original term. It is a replacement. The philosophical literature has been precise about these distinctions for decades (Bratman on prior intentions, Grice on conversational implicature, Searle on illocutionary force), but none of those frameworks was designed to be enforced at a runtime boundary.

If those signals can be collected and enforced, they can affect authorization. If they cannot be observed, they should not be part of the runtime claim.

Runtime authorization should make a smaller claim and make it well: determine whether the next agent action is safe enough to allow now.

The useful definition is simple: agent intent is not what the model says it meant. It is the safety-relevant context that determines whether the next action should be allowed. That is less dramatic than intent verification. It is also more defensible.

Questions this argument answers

Why isn't agent intent a reliable security primitive?

"Intent" is too ambiguous to enforce consistently because the same external action can arise from many different internal model states, prompts, and plans. Security systems need observable inputs and repeatable decisions, so runtime authorization should evaluate the safety of the proposed action under current context rather than try to infer what the model meant.

What can runtime authorization evaluate with confidence?

Runtime authorization can evaluate concrete facts available at execution time, such as the agent identity, delegated user context, requested tool or API action, target resource, credential scope, data sensitivity, and recent behavioral signals. Those inputs are stable enough to support deterministic policy checks and bounded risk scoring, even when the model's internal reasoning remains opaque.

If runtime authorization cannot prove correctness, why is it still valuable?

The goal is not to prove that the model's plan is correct in an abstract sense. The goal is to prevent unsafe or unauthorized execution in the real world. A system can block high-risk writes, require escalation for sensitive operations, or narrow credentials before execution, which materially reduces damage even when the model still produces imperfect plans.

How do behavioral methods help without reading the model's mind?

Techniques such as UEBA, taint tracking, and sequence analysis do not need to reconstruct intent to be useful. They help estimate whether the current action sequence looks abnormal, whether untrusted inputs are influencing sensitive outputs, and whether the agent is entering a risky execution path that warrants denial or human review.

What changes when you treat runtime authorization as safety evaluation?

The architecture shifts from a single allow-or-deny policy engine toward a layered decision system that combines deterministic rules, contextual signals, and escalation paths. In that model, authorization is not just "does this identity have permission," but "is this action safe enough to execute right now under these conditions."

Jens Ernstberger is the founder of Kontext, building identity infrastructure for AI agents. Kontext CLI repository: github.com/kontext-security/kontext-cli.
