Secure Agents: Control Policies in the Harness

#ai #cybersecurity #agents

Alice opens her company's internal HR chat and types: please cancel my contract with vendor X, the one for the Q4 work. The HR chat is built on top of a coding agent her platform team configured for internal workflows. It knows how to look up contracts, ask the procurement system to cancel them, and confirm back what it did. She has used it for months for routine HR things.

This time the agent takes about ten seconds and writes back: sorry, the procurement system isn't responding right now. I have tried three times. Alice waits an hour, asks again, gets the same answer, and files an IT ticket. Her morning becomes an outage report.

The procurement system was responding the entire time. To each of the agent's three attempts it returned the same precise message: this cancellation requires a manager approval because the amount is over ten thousand dollars; please get one and try again. That message never reached the agent. Sitting between the agent and procurement was a policy server doing its job correctly. When it saw the cancellation amount and the missing approval token, it refused the call with HTTP 403 Forbidden, the same status code a gateway, a WAF, or an expired credential can produce for reasons that have nothing to do with policy. The agent has no way to tell a policy refusal from an infrastructure fault when both arrive as a bare 403. It did what any agent does with a flaky tool: retried, gave up, apologised.

I keep finding this failure shape in production agent stacks, and the more I look at it the more I think the architecture producing it is structurally wrong rather than a tactical bug. The published reference designs all share it. InfoQ's agent gateway article, Red Hat's Envoy-and-OPA MCP gateway, and Cerbos's MCP write-up all describe the same pattern: a policy engine sitting behind a network hop, returning HTTP status codes the agent has to guess at. The policy server is in the right place to make the decision and in the wrong place to communicate it.

The fix is not a different policy engine, a better error code, or a smarter retry strategy. The fix is to move the decision into the same process as the agent's reasoning loop, where the framework already runs a piece of user-supplied code between the moment the language model picks a tool and the moment that tool actually executes. Claude Code calls that code a PreToolUse hook. Pi calls it a tool_call handler. OpenAI's Agents SDK calls it a tool guardrail. The three names point at the same thing: a function the framework hands the agent's planned action, where you can let it through, modify it, or block it with a reason the framework returns to the model as if it were the tool's own response.

Most teams use these hooks today for logging and safety checks. The argument of this piece is that they are also the right home for authorisation itself, a position now also showing up in the academic literature. A 2026 survey on agent harness engineering from UIUC, Meta, and Stanford names lifecycle hooks as control points that "turn tool use from a raw model-selected action into a monitored transition in the agent's execution loop."
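The allow, modify, or block contract described above can be sketched in a few lines. This is a framework-neutral illustration, not the API of Claude Code, Pi, or the OpenAI Agents SDK: the type names, `runWithHook`, `approvalHook`, and the `value_usd` and `approved` input fields are all hypothetical.

```typescript
// Hypothetical types; none of these names come from a real framework.
type ToolCall = { toolName: string; input: Record<string, unknown> };

type HookResult =
  | { kind: "allow" }
  | { kind: "modify"; input: Record<string, unknown> }
  | { kind: "block"; reason: string };

type PreToolHook = (call: ToolCall) => HookResult;

// What the model receives as the tool's response.
type ToolResult = { ok: boolean; content: string };

function runWithHook(
  call: ToolCall,
  hook: PreToolHook,
  execute: (call: ToolCall) => ToolResult,
): ToolResult {
  const verdict = hook(call);
  if (verdict.kind === "block") {
    // The reason takes the place of tool output, so the model can read it.
    return { ok: false, content: verdict.reason };
  }
  if (verdict.kind === "modify") {
    return execute({ toolName: call.toolName, input: verdict.input });
  }
  return execute(call);
}

// Example hook: block large cancellations that lack an approval flag.
// The threshold mirrors the ten-thousand-dollar rule in Alice's story.
const approvalHook: PreToolHook = (call) => {
  if (call.toolName !== "cancel_contract") return { kind: "allow" };
  const amount = Number(call.input["value_usd"] ?? 0);
  if (amount >= 10000 && call.input["approved"] !== true) {
    return {
      kind: "block",
      reason: "manager approval required for cancellations of $10,000 or more",
    };
  }
  return { kind: "allow" };
};

const blocked = runWithHook(
  { toolName: "cancel_contract", input: { contract_id: "c-1", value_usd: 25000 } },
  approvalHook,
  () => ({ ok: true, content: "cancelled" }),
);
console.log(blocked.content);
```

Compared with a bare 403, the blocked result carries a sentence the model can act on, for example by asking the user to obtain the approval rather than retrying.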

To check whether the architecture holds when you actually build it, I wrote the same six-clause Cedar policy into two of these frameworks: Pi, via the @cedar-policy/cedar-wasm WASM bindings, and the Claude Agent SDK, via the community cedarpy Python binding. The policy file is the single source of truth; the six fixture cases are the test grid; the verdicts come out identical in both. The full code is linked below. The previous piece in this series, The Agentic Last Mile, was about closing the credential boundary in the agent stack; this one is about closing the decision boundary one layer up. What still does not get fixed: intent verification stays an open problem, and the model-side API key trap from that earlier piece stays unsolved.

tl;dr

Put the policy decision inside the agent framework's pre-tool hook. A 403 from a network sidecar looks like a system glitch to the model; a hook returns a real reason the model can read and adapt to.
Cedar fits because it embeds in-process via WASM bindings and the Rust crate, denies by default and lets any forbid override a permit, and ships an SMT analyser that catches overly permissive policies before they deploy.
The same six-clause Cedar policy file drives Pi's tool_call handler and the Claude Agent SDK's PreToolUse hook. Full code in the repository, byte-identical policy file across both, six fixture cases with six identical verdicts.
The hook pairs with the credential broker from The Agentic Last Mile and an inference-path proxy on the model leg. Three independent voices ship the same loop shape: AWS in AgentCore, Phil Windley in OpenClaw, Anthropic in Claude Code's hooks reference.
The hook covers OWASP LLM06 Excessive Agency. It does not solve intent verification, and the model-side SDK trap from the earlier piece stays open.

The scenario, and the Cedar policy

Back to Alice's request. She asked the HR agent to cancel the contract with vendor X. The agent forms a cancel_contract tool call with the contract ID. Six clauses must hold for the call to proceed. The agent must share a tenant with the resource and with the inbound user context. The agent must act on behalf of an authenticated user. The user must hold the HR or procurement role. The contract must be in an active state. The cancellation must be in business hours unless an emergency has been declared. Any cancellation above ten thousand dollars must carry an out-of-band CIBA-style approval token.

The six clauses translate into one permit and two forbids in a single file:

@id("permit_hr_cancel_contract")
permit (
  principal,
  action == Action::"CancelContract",
  resource is Contract
) when {
  principal is Agent
  && principal.tenant == resource.tenant
  && principal.tenant == context.user_tenant
  && principal.delegated_for == context.user_sub
  && (context.user_roles.contains("hr")
      || context.user_roles.contains("procurement"))
  && resource.status == "active"
  && (context.business_hours || context.emergency_declared)
  && (resource.value_usd < 10000 || context.high_stakes_approved)
};

@id("forbid_cross_tenant")
forbid (principal, action == Action::"CancelContract", resource is Contract)
when { principal is Agent && principal.tenant != resource.tenant };

@id("forbid_after_hours_without_emergency")
forbid (principal, action == Action::"CancelContract", resource is Contract)
when { !context.business_hours && !context.emergency_declared };

Three rules, all named with @id annotations so deny reasons come back as human-readable identifiers rather than auto-generated policy0. The tenant check appears twice on purpose. Once between the principal and the resource, which catches the case where an agent registered for tenant A tries to cancel a tenant B contract. Once between the principal and the inbound user context, which catches a misconfiguration in the other direction: the broker minted a token claiming tenant A for an agent registered as tenant B. The second case is one the Cedar analyser finds before runtime; the analyser shows up again in a few sections.

The actions Cedar names are business actions, not API actions, per Cedar's own best-practices guide. The agent's MCP tool list calls the verb cancel_contract. The Cedar action is Action::"CancelContract". These line up because they evolve together. The tool list is the action list in two different syntaxes.
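One way to keep the two syntaxes in step is to derive the Cedar action from the tool name by convention. The sketch below assumes every tool name is snake_case and every action id is the PascalCase equivalent; the helper names are hypothetical and do not come from the repository.

```typescript
// Hypothetical helper: converts an MCP-style snake_case tool name into the
// PascalCase id used for the matching Cedar action.
function toolNameToActionId(toolName: string): string {
  return toolName
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// Renders the action in Cedar's entity-literal syntax.
function toCedarActionUid(toolName: string): string {
  return 'Action::"' + toolNameToActionId(toolName) + '"';
}

console.log(toCedarActionUid("cancel_contract")); // prints Action::"CancelContract"
```

A convention like this only works while the tool list and the policy file are versioned together; an explicit lookup table is the safer choice once the names start to diverge.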

The policy inside Pi

The Pi extension is one TypeScript file, around thirty lines of meaningful code, that wires the Cedar file into a tool_call handler. The structure mirrors Pi's canonical safety-hook sample, with the regex match replaced by a Cedar evaluation:

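import { readFileSync } from "node:fs";
import { isAuthorized, policySetTextToParts, policyToJson }
  from "@cedar-policy/cedar-wasm/nodejs";

// POLICY_PATH, SCHEMA_PATH, and the helpers parseNamedPolicies, loadContract,
// and buildAuthorizationCall are elided here; see the repository.
const POLICY_TEXT = readFileSync(POLICY_PATH, "utf8");
const SCHEMA_TEXT = readFileSync(SCHEMA_PATH, "utf8");
const NAMED_POLICIES = parseNamedPolicies(POLICY_TEXT);

export function createCedarHook(options: CedarHookOptions) {
  return function cedarHook(pi: ExtensionAPI): void {
    pi.on("tool_call", async (event) => {
      // Only the cancellation tool is policy-gated; other tools pass through.
      if (event.toolName !== "cancel_contract") return;
      const contract = await loadContract(event.input.contract_id);
      const context = await pi.session.policyContext();
      const call = buildAuthorizationCall(/* ... */);
      const answer = isAuthorized(call);
      // Fail closed: if Cedar could not evaluate the request, block the call.
      if (answer.type === "failure") {
        return { block: true, reason: "cedar-hook: deny (policy evaluation failed)" };
      }
      if (answer.response.decision === "deny") {
        // On a deny, diagnostics.reason lists the forbid policies that matched.
        const matched = answer.response.diagnostics.reason;
        return {
          block: true,
          reason: matched.length > 0
            ? `cedar-hook: deny (matched ${matched.join(", ")})`
            : "cedar-hook: deny (no permit matched)",
        };
      }
    });
  };
}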

The Cedar engine initialises once at extension load, not per call. The WASM bundle loads in milliseconds; per-call evaluation is microseconds. The pi.session.policyContext() helper packages the OIDC subject, the user's tenant and roles, the device posture, the business-hours flag, and the high-stakes-approval status into the Cedar context shape. A sidecar would have had to be told all of this over the wire; the harness has it locally because the harness runs in the agent process. Cedar's diagnostics include the matched policy IDs, and those become the reason string the LLM reads.
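The listing elides the body of buildAuthorizationCall, so the following is only a guess at the shape of the data it assembles, limited to the attributes the policy file actually reads. The interface names and `buildRequest` are hypothetical, the entity layout is modelled on Cedar's JSON entity format (uid, attrs, parents), and treating `delegated_for` as a plain string is an assumption.

```typescript
// Context attributes referenced by the policy file.
interface PolicyContext {
  user_sub: string;
  user_tenant: string;
  user_roles: string[];
  business_hours: boolean;
  emergency_declared: boolean;
  high_stakes_approved: boolean;
}

// Hypothetical local records for the agent and the contract.
interface AgentInfo { id: string; tenant: string; delegated_for: string }
interface ContractInfo { id: string; tenant: string; status: string; value_usd: number }

type EntityUid = { type: string; id: string };
type Entity = { uid: EntityUid; attrs: Record<string, unknown>; parents: EntityUid[] };

interface AuthorizationRequest {
  principal: EntityUid;
  action: EntityUid;
  resource: EntityUid;
  context: PolicyContext;
  entities: Entity[];
}

// Assembles everything the policy needs from data already in the agent process.
function buildRequest(
  agent: AgentInfo,
  contract: ContractInfo,
  context: PolicyContext,
): AuthorizationRequest {
  const principal: EntityUid = { type: "Agent", id: agent.id };
  const resource: EntityUid = { type: "Contract", id: contract.id };
  return {
    principal,
    action: { type: "Action", id: "CancelContract" },
    resource,
    context,
    entities: [
      {
        uid: principal,
        attrs: { tenant: agent.tenant, delegated_for: agent.delegated_for },
        parents: [],
      },
      {
        uid: resource,
        attrs: { tenant: contract.tenant, status: contract.status, value_usd: contract.value_usd },
        parents: [],
      },
    ],
  };
}

const request = buildRequest(
  { id: "hr-agent", tenant: "acme", delegated_for: "alice" },
  { id: "c-42", tenant: "acme", status: "active", value_usd: 5000 },
  {
    user_sub: "alice",
    user_tenant: "acme",
    user_roles: ["hr"],
    business_hours: true,
    emergency_declared: false,
    high_stakes_approved: false,
  },
);
console.log(JSON.stringify(request.action));
```

Nothing in this object crosses a network boundary, which is the point of the paragraph above: the hook reads session state directly instead of having it serialised to a sidecar.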

Running the six-case fixture against this extension produces verdicts a reader can reproduce with npm test after cloning the repository:

{"case":"permit","decision":"allow","reason":null}
{"case":"deny-by-tenant","decision":"deny","reason":"cedar-hook: deny (matched forbid_cross_tenant)"}
{"case":"deny-by-role","decision":"deny","reason":"cedar-hook: deny (no permit matched)"}
{"case":"deny-by-high-stakes","decision":"deny","reason":"cedar-hook: deny (no permit matched)"}
{"case":"deny-by-state","decision":"deny","reason":"cedar-hook: deny (no permit matched)"}
{"case":"deny-by-after-hours","decision":"deny","reason":"cedar-hook: deny (matched forbid_after_hours_without_emergency)"}

Two deny cases match a forbid rule by name. Three show "no permit matched", which is the deny-by-default semantics doing real work: no explicit forbid triggered, but no permit fired either, so the call is refused.
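To make the split between the two kinds of deny concrete, here is a hand-written TypeScript mirror of the three rules and of Cedar's decision rule (any matching forbid wins; otherwise at least one permit must match). It is not Cedar and does not use cedar-wasm. It skips the `principal is Agent` type checks, and the fixture values are hypothetical, chosen only to trigger each case.

```typescript
type Req = {
  principalTenant: string; delegatedFor: string;
  resourceTenant: string; status: string; valueUsd: number;
  userTenant: string; userSub: string; userRoles: string[];
  businessHours: boolean; emergencyDeclared: boolean; highStakesApproved: boolean;
};

type Rule = { id: string; effect: "permit" | "forbid"; when: (r: Req) => boolean };

const hasRole = (r: Req, role: string): boolean => r.userRoles.indexOf(role) !== -1;

// Hand translation of the policy file; rule ids match the @id annotations.
const RULES: Rule[] = [
  {
    id: "permit_hr_cancel_contract",
    effect: "permit",
    when: (r) =>
      r.principalTenant === r.resourceTenant &&
      r.principalTenant === r.userTenant &&
      r.delegatedFor === r.userSub &&
      (hasRole(r, "hr") || hasRole(r, "procurement")) &&
      r.status === "active" &&
      (r.businessHours || r.emergencyDeclared) &&
      (r.valueUsd < 10000 || r.highStakesApproved),
  },
  {
    id: "forbid_cross_tenant",
    effect: "forbid",
    when: (r) => r.principalTenant !== r.resourceTenant,
  },
  {
    id: "forbid_after_hours_without_emergency",
    effect: "forbid",
    when: (r) => !r.businessHours && !r.emergencyDeclared,
  },
];

function decide(r: Req): { decision: "allow" | "deny"; reason: string | null } {
  const matched = RULES.filter((rule) => rule.when(r));
  const forbids = matched.filter((rule) => rule.effect === "forbid").map((rule) => rule.id);
  // Any matching forbid overrides every permit.
  if (forbids.length > 0) {
    return { decision: "deny", reason: "cedar-hook: deny (matched " + forbids.join(", ") + ")" };
  }
  // Deny by default: without a matching permit the call is refused.
  if (!matched.some((rule) => rule.effect === "permit")) {
    return { decision: "deny", reason: "cedar-hook: deny (no permit matched)" };
  }
  return { decision: "allow", reason: null };
}

// Hypothetical baseline that satisfies every clause of the permit.
const base: Req = {
  principalTenant: "acme", delegatedFor: "alice",
  resourceTenant: "acme", status: "active", valueUsd: 5000,
  userTenant: "acme", userSub: "alice", userRoles: ["hr"],
  businessHours: true, emergencyDeclared: false, highStakesApproved: false,
};

const FIXTURES: Array<{ name: string; req: Req }> = [
  { name: "permit", req: base },
  { name: "deny-by-tenant", req: { ...base, resourceTenant: "globex" } },
  { name: "deny-by-role", req: { ...base, userRoles: ["engineering"] } },
  { name: "deny-by-high-stakes", req: { ...base, valueUsd: 25000 } },
  { name: "deny-by-state", req: { ...base, status: "cancelled" } },
  { name: "deny-by-after-hours", req: { ...base, businessHours: false } },
];

const results = FIXTURES.map((f) => ({ case: f.name, ...decide(f.req) }));
results.forEach((line) => console.log(JSON.stringify(line)));
```

Under these assumed fixtures the mirror yields the same six verdicts as the listing above: one allow, two denies that name a forbid, and three denies with no permit matched.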