sai varma

Originally published at Medium

Your AI Agent Has No Runtime Policy. That's the Actual Security Problem.

TL;DR: Model alignment ≠ agent security. The gap between a trained model and a governed agent is where the next wave of enterprise AI incidents will come from. This post breaks down the four policy planes you actually need and why traditional access control doesn't map to inference-time decisions.


Everyone secures the model. Nobody governs the agent.

Here's a pattern I keep seeing in enterprise AI deployments:

  • ✅ Model is fine-tuned and benchmarked
  • ✅ Jailbreak resistance tested
  • ✅ API authentication in place
  • ❌ Zero runtime policy enforcement around the agent itself

The assumption is: "We aligned the model, so the agent is safe."

That assumption is wrong. And it's going to cause incidents.

An agent is not a model. It's a model + tools + memory + integrations + decision loops running on top of it. It reads emails, queries your DB, calls internal APIs, chains actions together — all dynamically, at inference time.

The model is fine. The wrapper around it is unprotected.


Why traditional access control breaks here

Traditional RBAC works brilliantly for deterministic systems:

```
ALLOW /api/customers WHERE role = 'analyst'
DENY  /api/payroll   WHERE role != 'hr'
```

You enumerate the actions, write the rules, enforce everywhere. Clean.

AI agents make that impossible. The action space isn't a fixed graph — it's open-ended natural language. The same prompt, run twice, can hit entirely different tool call paths. You cannot write a static rule for:

```
# This rule does not exist in any access control framework
DENY response WHERE data_contains('salary')
     AND requesting_user.level < 'senior'
     AND session.context == 'customer_support'
```

Static rules enumerate actions. AI policies govern reasoning. Those are different things.

The policy has to live at inference time. Continuously. Not once at login.


The four policy planes every production agent needs

Most deployments ship zero of these. Here's what a governed agent actually looks like.


1. RBAC Guardrails — at inference time, not just login time

Role-based access that travels with the session all the way down to the agent's reasoning layer.

What it enforces:

  • A contractor role cannot trigger write operations through natural language prompting, even if the underlying API allows it
  • A support_agent persona cannot escalate its own tool permissions mid-session
  • Every tool call, every retrieval, every response is scoped to the active role

The key insight: auth at the gateway ≠ auth at inference time. Both need to exist.
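
Here's a minimal sketch of what inference-time enforcement can look like, assuming an in-memory role-to-capability table. Everything in it — `ROLE_CAPABILITIES`, `Session`, the tool names — is illustrative, not a real framework API:

```python
from dataclasses import dataclass

# Hypothetical role table: which tools each role may invoke
ROLE_CAPABILITIES = {
    "contractor":    {"read_docs"},                                # read-only
    "support_agent": {"read_docs", "lookup_ticket"},
    "admin":         {"read_docs", "lookup_ticket", "write_record"},
}

@dataclass(frozen=True)  # role is fixed at login; the session can't mutate it mid-run
class Session:
    user_id: str
    role: str

def authorize_step(session: Session, tool_name: str) -> bool:
    """Called before *every* tool invocation, not once at the gateway."""
    return tool_name in ROLE_CAPABILITIES.get(session.role, set())

# Even if a prompt talks the model into planning a write,
# the dispatch layer refuses it structurally:
session = Session(user_id="u123", role="contractor")
assert authorize_step(session, "read_docs")
assert not authorize_step(session, "write_record")
```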


2. Tool Policies — dynamic, not a static blocklist

```python
# Pseudo-code: what a tool policy evaluator looks like
ALLOW, DENY = "allow", "deny"  # simple verdict constants for the sketch

def can_invoke_tool(tool_name, session_context):
    user_role   = session_context.role         # e.g. "junior_dev"
    dept        = session_context.department   # e.g. "engineering"
    sensitivity = session_context.data_class   # e.g. "internal" (also a policy input)

    # In a real system these rules come from a policy store,
    # keyed on role, department, and data classification.
    if tool_name == "execute_shell" and user_role == "junior_dev":
        return DENY, "Shell execution not permitted for this role"

    if tool_name == "call_infra_api" and dept != "infrastructure":
        return DENY, "Cross-department tool call blocked"

    return ALLOW, None
```

A marketing analyst's agent shouldn't call infrastructure provisioning APIs. A junior dev's agent shouldn't run arbitrary shell commands. These aren't hypotheticals — they're real capability escalation vectors in production multi-tool agents.
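
Where does that evaluator sit? A hedged sketch, assuming a simple dispatch layer in front of the agent's tools (`execute` is a placeholder, not a real framework call): the verdict is checked on every invocation, and a denial is returned to the model as data rather than executed.

```python
def dispatch(tool_name, args, session_context):
    # Policy check runs per call, with live session context
    verdict, reason = can_invoke_tool(tool_name, session_context)
    if verdict == DENY:
        # The model sees a structured refusal it can't route around
        return {"error": f"policy_denied: {reason}"}
    return execute(tool_name, args)  # placeholder for the real tool runner
```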


3. Data Policies — field-level, classification-aware

This is the most underrated plane, and the one that causes actual breaches.

The scenario that plays out:

  1. Agent has no write access. Security review passes. ✅
  2. Agent can read salary records, legal memos, acquisition plans
  3. Agent surfaces them in fluent, confident natural language to whoever asked
  4. You have a breach — not because of what was written, but because of what was read and returned

Data policies enforce what the agent can retrieve and return, not just what it can write to. At field-level granularity. With classification awareness.

| Field | Classification | Admin | Manager | Analyst | Contractor |
| --- | --- | --- | --- | --- | --- |
| customer_name | Public | visible | visible | visible | visible |
| contract_value | Restricted | visible | visible | [REDACTED] | [REDACTED] |
| employee_salary | Confidential | visible | [REDACTED] | [REDACTED] | [REDACTED] |
| acquisition_plans | Confidential | visible | [REDACTED] | [REDACTED] | [REDACTED] |

The redaction happens before the response forms — not after.

"The model didn't exfiltrate the data. The missing data policy did."


4. Agent Behavioral Policies — the hardest one

Agents have emergent behaviors. They chain tool calls in sequences nobody designed. They infer context across tool outputs. They take actions that feel logical to the model but would horrify a compliance team.

Behavioral policies define:

  • Allowed reasoning patterns
  • Disallowed action sequences
  • Mandatory human-in-the-loop gates for irreversible operations
  • Hard stop conditions regardless of what the model decides is a good idea

```python
# Pseudo-code: behavioral policy check on action chains
from dataclasses import dataclass

ALLOW, BLOCK = "allow", "block"  # verdict constants for the sketch

@dataclass
class ToolCall:
    name: str
    is_irreversible: bool = False
    escalates_role: bool = False
    human_approved: bool = False

def validate_action_chain(chain: list[ToolCall]):
    names = {t.name for t in chain}

    # Flag irreversible operations missing a human checkpoint
    if any(t.is_irreversible and not t.human_approved for t in chain):
        return BLOCK, "Irreversible action requires human confirmation"

    # Flag external data exfiltration patterns
    if "read_internal_data" in names and "send_external_http" in names:
        return BLOCK, "Read → external send pattern blocked"

    # Flag privilege escalation attempts
    if any(t.escalates_role for t in chain):
        return BLOCK, "Role escalation during session not permitted"

    return ALLOW, None
```

The agent doesn't stop because you asked nicely in the system prompt. It stops because the policy enforces it structurally, at the architecture level.
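
A quick usage sketch (the `ToolCall` shape comes from the block above; `execute_call` is a placeholder, not a specific framework): the agent proposes a chain, the policy validates it, and only then does anything run.

```python
proposed = [
    ToolCall("read_internal_data"),
    ToolCall("send_external_http"),
]

verdict, reason = validate_action_chain(proposed)
if verdict == BLOCK:
    print(f"blocked: {reason}")   # blocked: Read → external send pattern blocked
else:
    for call in proposed:
        execute_call(call)        # placeholder for actual execution
```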


Why this is architecturally hard

The reason traditional access control worked:

  • Deterministic inputs
  • Enumerable action space
  • Write once, enforce everywhere

AI agents break all three. Same prompt → different tool paths. Natural language inputs → unbounded intent space. Probabilistic outputs → unpredictable downstream calls.

So the policy engine has to match the agent's dynamism. It needs to understand (see the sketch after this list):

  • Who is asking (role, department, clearance level)
  • What context they're in (session history, current tool state)
  • What the agent is about to do (intent inference, not just syntax matching)
  • What it's done so far in this session (action chain history)
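
One way to picture that input: a single decision context that every policy plane reads on every step. The shape below is a hypothetical sketch, not an existing standard:

```python
# Hypothetical: the evaluation context a runtime policy engine consumes
# on each agent step. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class PolicyContext:
    # Who is asking
    role: str
    department: str
    clearance: str
    # What context they're in
    session_history: list[str] = field(default_factory=list)
    tool_state: dict = field(default_factory=dict)
    # What the agent is about to do
    proposed_tool: str = ""
    inferred_intent: str = ""   # intent inference, not just syntax matching
    # What it's done so far in this session
    action_chain: list[str] = field(default_factory=list)

def evaluate(ctx: PolicyContext) -> bool:
    """Each plane (RBAC, tool, data, behavioral) reads this same
    context and can independently veto the step."""
    ...
```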

This is a new class of runtime infrastructure. It doesn't exist off the shelf in most stacks today.


What this looks like in practice

The control plane that actually governs this sits between the model and the world.
