Oluwajuwon Omotayo

Posted on Jun 22

I Built a Policy Enforcement Layer for Vercel's Eve Agent Framework. Here's What I Learned About AI Governance the Hard Way.

#ai #javascript #opensource #typescript

TL;DR

Vercel's Eve (2.2k stars, actively developed) is a filesystem-first framework for building durable backend agents — but it ships with no governance layer
I built eve-policy, an open-source policy enforcement library that wires into Eve's needsApproval and execute lifecycle with four-tier semantics: deny, escalate, audit, allow
The most interesting design challenge: Eve's needsApproval is synchronous and session-less, while execute is async with full session context — which forced a two-phase evaluation model
Built-in profiles include OWASP Agentic Top 10 (2026) and a financial services baseline covering USD/NGN/KES CTR thresholds
Fail-closed by default: no matching rule means deny, not allow
Repo: github.com/kingztech2019/eve-policy · npm: npmjs.com/package/eve-policy

Why Eve needs a governance layer

Vercel's Eve is a framework for building durable backend agents — filesystem-first, TypeScript-native, deeply integrated with Vercel's infrastructure. You define an agent as a directory of files: instructions.md, tools/, skills/, channels/, subagents/. It compiles to inspectable artifacts, exposes a stable HTTP protocol with sessionId and continuationToken, and manages the full lifecycle of agent runs.

It is a genuinely well-designed framework. Eve thinks carefully about durability, composability, and the operational reality of running agents in production.

What it does not ship with is a governance layer.

That is not a criticism. Frameworks generally do not dictate compliance policy — that is a product concern, not a framework concern. But it creates a gap that anyone building a real AI agent with Eve will eventually hit.

If your agent can call a transfer_funds tool, who decides whether a ₦6.5 million transfer requires human approval? If your agent has a subagent that can write to a data store, what prevents that subagent from calling tools it should not? If your agent handles BVN data, how do you ensure it never appears in an audit log?

These are governance questions. They do not belong scattered across individual tool implementations — that creates drift. They do not belong inside the agent loop — that couples your business rules to the framework. They belong in a dedicated layer that sits between Eve's lifecycle events and your tools' implementations.

That is what eve-policy is.

The design challenge: Eve's two-phase lifecycle

The most technically interesting part of building eve-policy was figuring out how to fit governance into Eve's lifecycle correctly.

Eve exposes two lifecycle hooks on every tool:

needsApproval(ctx) — called synchronously before the turn completes, to decide whether to pause execution and wait for human approval. The context at this point is minimal: you have the tool name and the tool input, but no session yet. This hook must return a boolean synchronously — you cannot await anything here.

execute(input, ctx) — called asynchronously when the tool actually runs. By this point, the full session context is available: session.id, session.auth, session.parent (whether the caller is a subagent), and the complete turn history.

This split forced a two-phase evaluation model in eve-policy.

Phase 1 (approval time): Input-based controls fire in needsApproval. Does the amount exceed a reporting threshold? Does the input contain a card PAN pattern? Does the tool name match a blocked pattern? These questions can be answered from the tool input alone — no session needed, evaluation stays synchronous.

Phase 2 (execution time): Session-aware controls fire in execute. Is the caller a subagent? What is the principal type? Is this an already-approved tool? These questions require ToolContext.session, which is only available in execute.

The same policy is evaluated twice, but with a different context each time. For a call that comes from a subagent, a rule like "escalate financial writes from subagents" will see isSubagentCall() return false in needsApproval (no session yet) and true in execute (session available). The first evaluation handles the approval decision; the second handles the execution decision.

Here is what that looks like in the evaluation flow:

Eve lifecycle                    eve-policy evaluation
─────────────────────────────────────────────────────

  turn starts
       │
       ▼
  needsApproval(ctx)  ──────►  evaluatePolicy(policy, approvalCtx)
       │                            input-based rules only:
       │                            amountExceeds, fieldMatches,
       │                            toolNameIs, ...
       │  escalate? → true
       │  otherwise → false
       │
       │  [human approves if needsApproval=true]
       │
       ▼
  execute(input, ctx) ──────►  evaluatePolicy(policy, executeCtx)
       │                            session-aware rules now available:
       │                            isSubagentCall(), principalTypeIs(),
       │                            alreadyApproved(), ...
       │
       ├── deny?     → throw PolicyDenialError  (tool never runs)
       ├── escalate? → run tool, write audit entry
       ├── audit?    → run tool, write audit entry
       └── allow?    → run tool

The key insight: governance decisions on the hot path must never block on I/O. All rule match predicates are synchronous by design. Async work (audit logging) runs fire-and-forget after the decision is made.
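The two-phase flow can be modelled without Eve or eve-policy at all. The sketch below is an illustration under stated assumptions, not the library's implementation: `Session.parentId`, `Ctx`, `Rule`, `evaluate`, and the two wrapper functions are hypothetical names, and the real hook signatures may differ. It shows the same rule list being evaluated once without a session and once with it, and the audit write being scheduled rather than awaited.

```typescript
// Hypothetical model of the two-phase flow. None of these names are Eve or eve-policy APIs.
type Effect = "deny" | "escalate" | "audit" | "allow";

interface Session {
  id: string;
  parentId?: string; // assumed marker: set when the caller is a subagent
}

interface Ctx {
  toolName: string;
  toolInput: Record<string, unknown>;
  session?: Session; // absent at approval time, present at execution time
}

interface Rule {
  name: string;
  effect: Effect;
  match: (ctx: Ctx) => boolean; // synchronous by construction
}

// First match wins; no match falls through to deny.
function evaluate(rules: Rule[], ctx: Ctx): { effect: Effect; rule: string } {
  for (const rule of rules) {
    if (rule.match(ctx)) return { effect: rule.effect, rule: rule.name };
  }
  return { effect: "deny", rule: "default" };
}

const rules: Rule[] = [
  {
    name: "large-amount",
    effect: "escalate",
    match: (c) => {
      const amount = c.toolInput["amount"];
      return typeof amount === "number" && amount > 10_000;
    },
  },
  { name: "subagent-write", effect: "escalate", match: (c) => c.session?.parentId !== undefined },
  { name: "all-transfers", effect: "audit", match: (c) => c.toolName === "transfer_funds" },
];

const auditLog: string[] = [];

// Phase 1: input only, returns a boolean without awaiting anything.
function needsApproval(toolName: string, toolInput: Record<string, unknown>): boolean {
  return evaluate(rules, { toolName, toolInput }).effect === "escalate";
}

// Phase 2: the same rules, now evaluated with the session attached.
async function execute(
  toolName: string,
  toolInput: Record<string, unknown>,
  session: Session,
  run: () => Promise<string>,
): Promise<string> {
  const decision = evaluate(rules, { toolName, toolInput, session });
  if (decision.effect === "deny") throw new Error(`denied by rule "${decision.rule}"`);
  if (decision.effect !== "allow") {
    // Fire-and-forget: the audit write is scheduled, not awaited.
    void Promise.resolve().then(() => auditLog.push(`${decision.effect}:${decision.rule}`));
  }
  return run();
}
```

With a small transfer from a subagent, `needsApproval` returns false because the session-aware rule cannot match yet, while the evaluation inside `execute` matches `subagent-write`.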

Installation and quick start

pnpm add eve-policy
# or
npm install eve-policy

Requirements: Node.js >= 18, Eve >= 0.10.0

import { definePolicy, withNamedPolicy } from "eve-policy";
import { deny, escalate, audit, amountExceeds, isSubagentCall, toolNameIs, and } from "eve-policy/rules";
import { FileAuditLogger } from "eve-policy/audit";

// Define your policy
const transferPolicy = definePolicy({
  name: "transfer-controls",
  version: "1.0.0",
  rules: [
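    deny("no-self-transfer",
      // Require a defined source: two missing fields would otherwise compare equal
      ctx => ctx.toolInput.source !== undefined &&
             ctx.toolInput.source === ctx.toolInput.destination,
      "Source and destination accounts cannot be the same"),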

    escalate("ctr-threshold",
      amountExceeds("amount", 10_000),
      "Exceeds $10,000 CTR threshold — compliance review required",
      { riskLevel: "critical", owaspRisks: ["ASI02"] }),

    escalate("subagent-financial-write",
      and(isSubagentCall(), toolNameIs("transfer_funds")),
      "Financial writes from subagents require human approval",
      { riskLevel: "critical", owaspRisks: ["ASI03"] }),

    audit("all-transfers",
      toolNameIs("transfer_funds"),
      "All fund transfers logged for compliance"),
  ],
});

// Wrap your existing Eve tool (transferTool, defined elsewhere): one line, drop-in replacement
const safeTool = withNamedPolicy(
  "transfer_funds",
  transferTool,
  transferPolicy,
  { auditLogger: new FileAuditLogger("/var/log/agent-audit.jsonl") }
);

// Use it in your agent exactly as before
// Eve calls needsApproval and execute automatically

withNamedPolicy returns a new ToolDefinition — same shape, same interface, fully compatible with Eve. The governance layer is invisible to the rest of the system.
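To make the rule predicates used in the quick start less opaque, here is a minimal sketch of how synchronous predicates and combinators of this kind can be written. It is a re-implementation for illustration only: the function names mirror the ones imported above, but the bodies, the `PolicyCtx` shape, and the extra `or` and `not` helpers are assumptions rather than eve-policy's actual code.

```typescript
// Illustrative re-implementation. eve-policy's real predicates may be built differently.
interface PolicyCtx {
  toolName: string;
  toolInput: Record<string, unknown>;
}

type Predicate = (ctx: PolicyCtx) => boolean;

// Matches when the named input field is a finite number strictly above the limit.
function amountExceeds(field: string, limit: number): Predicate {
  return (ctx) => {
    const value = ctx.toolInput[field];
    return typeof value === "number" && Number.isFinite(value) && value > limit;
  };
}

// Matches when the tool name equals any of the given names.
function toolNameIs(...names: string[]): Predicate {
  return (ctx) => names.indexOf(ctx.toolName) !== -1;
}

// Matches when a string field satisfies the pattern (pass a non-global RegExp).
function fieldMatches(field: string, pattern: RegExp): Predicate {
  return (ctx) => {
    const value = ctx.toolInput[field];
    return typeof value === "string" && pattern.test(value);
  };
}

// Combinators compose predicates without leaving synchronous code.
const and = (...ps: Predicate[]): Predicate => (ctx) => ps.every((p) => p(ctx));
const or = (...ps: Predicate[]): Predicate => (ctx) => ps.some((p) => p(ctx));
const not = (p: Predicate): Predicate => (ctx) => !p(ctx);

const largeTransfer = and(toolNameIs("transfer_funds"), amountExceeds("amount", 10_000));
```

Because every predicate is a plain function from context to boolean, the whole rule set can run inside a hook that is not allowed to await.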

The four effects

Every rule has an effect: deny, escalate, audit, or allow.

| Effect | `needsApproval` | `execute` | Audit |
| --- | --- | --- | --- |
| `deny` | `false` | throws `PolicyDenialError` | always |
| `escalate` | `true` | runs tool if approved | yes |
| `audit` | `false` | runs tool | yes |
| `allow` | `false` | runs tool | opt-in |

Rules are evaluated in declaration order. First match wins. When no rule matches, defaultEffect applies — and the default is "deny", not "allow". A governance layer with no matching rules should not silently permit execution.

If you want allow-by-default with specific restrictions, set defaultEffect: "allow" explicitly. This forces you to make the intention clear rather than discovering it by accident.
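The effect semantics and the fail-closed default can be captured in a few lines. This is a sketch under assumptions: `Policy`, `Outcome`, and `decide` are hypothetical names, and rules here match on the tool name only to keep the example short.

```typescript
// Hypothetical model of the effect semantics; not eve-policy's source.
type Effect = "deny" | "escalate" | "audit" | "allow";

interface Rule {
  name: string;
  effect: Effect;
  match: (toolName: string) => boolean;
}

interface Policy {
  rules: Rule[];
  defaultEffect?: Effect; // omitted means "deny"
}

interface Outcome {
  effect: Effect;
  rule: string;
  needsApproval: boolean; // what the approval hook would return
  mayRunTool: boolean;    // false only for deny (escalate still requires approval first)
}

function decide(policy: Policy, toolName: string): Outcome {
  // Declaration order, first match wins.
  const hit = policy.rules.find((r) => r.match(toolName));
  const effect = hit ? hit.effect : policy.defaultEffect ?? "deny";
  return {
    effect,
    rule: hit ? hit.name : "default",
    needsApproval: effect === "escalate",
    mayRunTool: effect !== "deny",
  };
}

const strict: Policy = {
  rules: [
    { name: "block-shell", effect: "deny", match: (t) => t.startsWith("shell_") },
    { name: "log-reads", effect: "audit", match: (t) => t.startsWith("get_") },
  ],
};

// Same rules, but the fallthrough is explicitly opened up.
const permissive: Policy = { ...strict, defaultEffect: "allow" };
```

A tool that matches no rule is denied under `strict` and allowed under `permissive`, and the only difference between the two is the explicitly declared default.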

Composing policies

compose() is how production policies are built. It flattens rules from multiple component policies and applies deny-wins semantics for the default effect.

import { compose } from "eve-policy";
import { owaspTop10Policy, financialBaselinePolicy } from "eve-policy/profiles";

const myAgentPolicy = compose(
  owaspTop10Policy,        // 20 rules covering ASI01–ASI10
  financialBaselinePolicy, // CTR thresholds, sanctions, card data, KYC
  myDomainPolicy,          // your own business rules
);

// compose() guarantees:
// - Rules are evaluated in order across all component policies
// - First match wins within a single evaluation
// - defaultEffect is the STRICTEST across all composed policies
// - If any component's defaultEffect is "deny", the composed defaultEffect is "deny"

One important constraint: avoid always() catch-all rules inside composable policies. If a component policy catches every call with always(), it prevents later policies' specific rules from ever firing. Use defaultEffect for fallthrough behaviour instead.
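A small model makes both the deny-wins default and the catch-all problem concrete. The code below is a sketch, not the library's compose(): the strictness ranking (allow < audit < escalate < deny), the `Policy` shape, and the local `always` helper are assumptions made for illustration.

```typescript
// Hypothetical sketch of compose(); the strictness ranking below is an assumption.
type Effect = "deny" | "escalate" | "audit" | "allow";

interface Rule {
  name: string;
  effect: Effect;
  match: (toolName: string) => boolean;
}

interface Policy {
  name: string;
  rules: Rule[];
  defaultEffect: Effect;
}

const STRICTNESS: Record<Effect, number> = { allow: 0, audit: 1, escalate: 2, deny: 3 };

function compose(...policies: Policy[]): Policy {
  const rules: Rule[] = [];
  // With nothing to compose, stay fail-closed.
  let defaultEffect: Effect = policies.length === 0 ? "deny" : "allow";
  for (const policy of policies) {
    rules.push(...policy.rules); // preserve declaration order across components
    if (STRICTNESS[policy.defaultEffect] > STRICTNESS[defaultEffect]) {
      defaultEffect = policy.defaultEffect;
    }
  }
  return { name: policies.map((p) => p.name).join("+"), rules, defaultEffect };
}

function evaluate(policy: Policy, toolName: string): { effect: Effect; rule: string } {
  for (const rule of policy.rules) {
    if (rule.match(toolName)) return { effect: rule.effect, rule: rule.name };
  }
  return { effect: policy.defaultEffect, rule: "default" };
}

const always = (): ((toolName: string) => boolean) => () => true;

const broad: Policy = {
  name: "broad",
  defaultEffect: "allow",
  rules: [{ name: "audit-all", effect: "audit", match: always() }],
};

const specific: Policy = {
  name: "specific",
  defaultEffect: "deny",
  rules: [{ name: "no-shell", effect: "deny", match: (t) => t === "shell_exec" }],
};
```

In this model, `compose(broad, specific)` audits `shell_exec` instead of denying it, because the catch-all in `broad` matches first. Reversing the order, or replacing the catch-all with a default effect, restores the deny.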

Built-in profiles

OWASP Agentic Top 10 (2026)

owaspTop10Policy provides coverage for all ten risks in the OWASP Top 10 for Agentic Applications:

ASI01 (Goal Hijack): Deny shell tools, prompt injection patterns, goal-modification language
ASI02 (Tool Misuse): Deny unapproved file writes and card PAN in input; escalate network calls
ASI03 (Identity Abuse): Escalate privileged tools invoked from subagents
ASI04 (Supply Chain): Escalate runtime package installs and dynamic tool registration
ASI05 (Code Execution): Deny eval()/exec() patterns; escalate code runner tools
ASI06 (Context Poisoning): Escalate memory writes; audit memory reads
ASI07 (Inter-Agent Comms): Escalate cross-agent and cross-session invocations
ASI08 (Cascading Failures): Escalate bulk operations exceeding 100 items
ASI09 (Trust Exploitation): Deny impersonation language; escalate approval-bypass patterns
ASI10 (Rogue Agents): Deny self-replication and self-deployment; escalate autonomous scheduling
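As an example of what an input-based control such as the card PAN check under ASI02 can involve, here is a minimal sketch that finds digit runs and validates them with the Luhn checksum. This is one possible approach, assumed for illustration; the profile may rely on a simpler pattern match, and `luhnValid` and `containsPan` are hypothetical names.

```typescript
// Hypothetical PAN detector: digit-run regex plus a Luhn checksum.
const PAN_CANDIDATE = /\d(?:[ -]?\d){12,18}/g; // 13 to 19 digits, optional single separators

// Standard Luhn check: double every second digit from the right, sum, test mod 10.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Walks strings, numbers, arrays, and nested objects looking for a Luhn-valid digit run.
function containsPan(value: unknown): boolean {
  if (typeof value === "number") return containsPan(String(value));
  if (typeof value === "string") {
    const candidates: string[] = value.match(PAN_CANDIDATE) ?? [];
    return candidates.some((c) => luhnValid(c.replace(/\D/g, "")));
  }
  if (Array.isArray(value)) return value.some((item) => containsPan(item));
  if (value !== null && typeof value === "object") {
    return Object.values(value as Record<string, unknown>).some((item) => containsPan(item));
  }
  return false;
}
```

The checksum step cuts down false positives on order numbers and similar identifiers, at the cost of missing malformed card numbers. Whether that trade-off is acceptable is a policy decision.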

Financial services baseline

financialBaselinePolicy covers jurisdiction-agnostic financial controls:

Deny self-transfers, negative amounts, sanctioned countries (OFAC/FATF list), card PANs, CVV in input
Escalate USD/NGN/KES CTR thresholds, international wire transfers, KYC bypass language, subagent financial writes
Audit customer PII access, moderate-value transactions, report generation
Allow explicit read-only tools (get_balance, get_exchange_rate, check_kyc_status)
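A per-currency reporting threshold check could look like the sketch below. Everything here is an assumption for illustration: `checkCtr`, `shouldEscalate`, and the threshold map are hypothetical, the USD figure is the one used in the quick start, the NGN figure is a placeholder, and no KES figure is given because the article does not state one. Treat the numbers as examples, not as regulatory guidance.

```typescript
// Hypothetical per-currency threshold check. Figures are placeholders, not regulatory advice.
const exampleThresholds: ReadonlyMap<string, number> = new Map([
  ["USD", 10_000],    // figure used in the quick start
  ["NGN", 5_000_000], // illustrative assumption
  // KES intentionally omitted: supply the figure your compliance team confirms
]);

type CtrResult = "below" | "exceeds" | "unknown";

function checkCtr(
  input: Record<string, unknown>,
  thresholds: ReadonlyMap<string, number>,
): CtrResult {
  const amount = input["amount"];
  const currency = input["currency"];
  if (typeof amount !== "number" || typeof currency !== "string") return "unknown";
  const limit = thresholds.get(currency.toUpperCase());
  if (limit === undefined) return "unknown";
  return amount > limit ? "exceeds" : "below";
}

// Fail closed: anything not provably below the threshold escalates.
function shouldEscalate(
  input: Record<string, unknown>,
  thresholds: ReadonlyMap<string, number>,
): boolean {
  return checkCtr(input, thresholds) !== "below";
}
```

Escalating on an unknown currency mirrors the fail-closed default elsewhere in the article: a missing table entry leads to a human review rather than a silent pass.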

The audit trail

Every deny, escalate, and audit decision writes a structured entry to your configured logger. The schema:

interface AuditEntry {
  id: string;           // UUID v4
  timestamp: string;    // ISO 8601
  toolName: string;
  effect: "deny" | "escalate" | "audit" | "allow";
  ruleName: string;     // which rule fired (or "default")
  policyName: string;
  reason: string;
  riskLevel: "critical" | "high" | "medium" | "low";
  owaspRisks: string[]; // e.g. ["ASI01", "ASI03"]
  sessionId?: string;
  principalId?: string;
  sanitizedInput: Record<string, unknown>; // secrets redacted automatically
  evaluationMs: number;
  outcome: "denied" | "escalated" | "executed" | "approved";
}

Automatic sanitization replaces the values of keys matching password, token, api_key, cvv, secret, ssn, nin, pin with [REDACTED]. Strings over 500 characters are truncated. The audit log itself should never become a liability.
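A sanitizer with this behaviour can be sketched in a few lines. The details below are assumptions, since the article does not specify them: keys are compared by exact name, case-insensitively; nested objects and arrays are walked recursively; and long strings are cut to 500 characters with no marker appended. `sanitize` and `sanitizeValue` are hypothetical names.

```typescript
// Hypothetical sanitizer; the matching and truncation details are assumptions.
const SENSITIVE_KEYS = new Set(["password", "token", "api_key", "cvv", "secret", "ssn", "nin", "pin"]);
const MAX_STRING_LENGTH = 500;

// Truncates long strings and recurses into arrays and plain objects.
function sanitizeValue(value: unknown): unknown {
  if (typeof value === "string") {
    return value.length > MAX_STRING_LENGTH ? value.slice(0, MAX_STRING_LENGTH) : value;
  }
  if (Array.isArray(value)) return value.map((item) => sanitizeValue(item));
  if (value !== null && typeof value === "object") {
    return sanitize(value as Record<string, unknown>);
  }
  return value;
}

// Returns a copy of the input with sensitive values replaced; the original is not mutated.
function sanitize(input: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : sanitizeValue(value);
  }
  return out;
}
```

Exact key matching avoids redacting unrelated fields such as `shipping_address` (which contains "pin"), but it will miss variants like `userPassword`. A production sanitizer needs a deliberate choice between the two.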

Four backends ship out of the box: ConsoleAuditLogger, FileAuditLogger, MultiAuditLogger, and InMemoryAuditLogger (for tests).

Testing without a live runtime

evaluatePolicy runs the policy evaluation synchronously without needing an Eve runtime. Combined with InMemoryAuditLogger, you can write comprehensive policy tests with standard unit test tooling.

import { evaluatePolicy } from "eve-policy";
import { InMemoryAuditLogger } from "eve-policy/audit";
import type { PolicyContext } from "eve-policy";

function ctx(toolName: string, input: Record<string, unknown> = {}): PolicyContext {
  return { toolName, toolInput: input, approvedTools: new Set() };
}

// transferPolicy is the policy defined in the quick start above.
describe("transferPolicy", () => {
  it("escalates transfers above the policy's 10,000 threshold", () => {
    const decision = evaluatePolicy(
      transferPolicy,
      ctx("transfer_funds", {
        source: "acc-001",
        destination: "acc-002",
        amount: 6_000_000,
        currency: "NGN",
      })
    );
    expect(decision.effect).toBe("escalate");
    expect(decision.owaspRisks).toContain("ASI02");
  });

  it("denies self-transfers", () => {
    const decision = evaluatePolicy(
      transferPolicy,
      ctx("transfer_funds", { source: "acc-001", destination: "acc-001", amount: 100 })
    );
    expect(decision.effect).toBe("deny");
  });

  // transferPolicy has no allow rules, so an unmatched tool hits the fail-closed default.
  it("denies tools with no matching rule", () => {
    const decision = evaluatePolicy(transferPolicy, ctx("get_balance", { account: "acc-001" }));
    expect(decision.effect).toBe("deny");
  });
});

No network calls. No Eve process. Fast and deterministic.

OWASP coverage reporting

uncoveredOwaspRisks() tells you which ASI risks your policy does not address. Use it as a CI gate:

import { uncoveredOwaspRisks, owaspCoverageReport } from "eve-policy";

const gaps = uncoveredOwaspRisks(myPolicy);
if (gaps.length > 0) {
  throw new Error(`Policy missing OWASP coverage: ${gaps.join(", ")}`);
}

This is the kind of thing that prevents "we have a governance layer" from drifting into "we have a governance layer with an unnoticed gap in ASI07."
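One plausible way to compute such a gap list is to collect the owaspRisks tags from every rule and subtract them from the ten ASI identifiers. The sketch below assumes exactly that; `findUncoveredRisks` and `TaggedRule` are hypothetical names, and the library's own function may weigh coverage differently (for example by effect or risk level).

```typescript
// Hypothetical coverage check: assumes coverage is derived from each rule's owaspRisks tags.
const ASI_RISKS: readonly string[] = [
  "ASI01", "ASI02", "ASI03", "ASI04", "ASI05",
  "ASI06", "ASI07", "ASI08", "ASI09", "ASI10",
];

interface TaggedRule {
  name: string;
  owaspRisks?: string[];
}

// Returns the ASI identifiers that no rule claims to address, in canonical order.
function findUncoveredRisks(rules: TaggedRule[]): string[] {
  const covered = new Set<string>();
  for (const rule of rules) {
    for (const risk of rule.owaspRisks ?? []) covered.add(risk);
  }
  return ASI_RISKS.filter((risk) => !covered.has(risk));
}

const sampleRules: TaggedRule[] = [
  { name: "ctr-threshold", owaspRisks: ["ASI02"] },
  { name: "subagent-financial-write", owaspRisks: ["ASI03"] },
  { name: "all-transfers" }, // untagged rules contribute no coverage
];
```

A tag only records intent. A check like this confirms that someone thought about each risk, not that the rule behind the tag is effective.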

What is coming next

The jurisdiction-specific policy packs — NDPA 2023, CBN AML/CFT, NFIU reporting, POPIA, Kenya DPA — are being developed as part of a separate package called @comply54/adapter-eve, which will provide pre-composed African regulatory profiles for direct use with eve-policy. That package is under active development and not yet released.

The underlying policy corpus (in OPA/Rego format) is already open source as agt-policies-nigeria, which was contributed upstream to microsoft/agent-governance-toolkit earlier this month. The @comply54/adapter-eve package will port those same frameworks into eve-policy's TypeScript rule format.

Discussion

Two questions I am genuinely curious about from people building with Eve or similar agent frameworks:

First — how are you currently handling governance for tools that need human approval? Are you implementing needsApproval directly in each tool, centralising it somewhere, or leaving it for later?

Second — the fail-closed default (defaultEffect: "deny") felt like the obvious choice to me, but I have seen arguments for fail-open defaults in development environments. What is your team's approach?

Drop your thoughts in the comments. And if eve-policy is useful, a star on the repo helps others find it.
