TL;DR
- Vercel's Eve (2.2k stars, actively developed) is a filesystem-first framework for building durable backend agents — but it ships with no governance layer
- I built eve-policy, an open source policy enforcement library that wires into Eve's needsApproval and execute lifecycle with four-tier semantics: deny, escalate, audit, allow
- The most interesting design challenge: Eve's needsApproval is synchronous and session-less, while execute is async with full session context, which forced a two-phase evaluation model
- Built-in profiles include OWASP Agentic Top 10 (2026) and a financial services baseline covering USD/NGN/KES CTR thresholds
- Fail-closed by default: no matching rule means deny, not allow
- Repo: github.com/kingztech2019/eve-policy · npm: npmjs.com/package/eve-policy
Why Eve needs a governance layer
Vercel's Eve is a framework for building durable backend agents — filesystem-first, TypeScript-native, deeply integrated with Vercel's infrastructure. You define an agent as a directory of files: instructions.md, tools/, skills/, channels/, subagents/. It compiles to inspectable artifacts, exposes a stable HTTP protocol with sessionId and continuationToken, and manages the full lifecycle of agent runs.
It is a genuinely well-designed framework. Eve thinks carefully about durability, composability, and the operational reality of running agents in production.
What it does not ship with is a governance layer.
That is not a criticism. Frameworks generally do not dictate compliance policy — that is a product concern, not a framework concern. But it creates a gap that anyone building a real AI agent with Eve will eventually hit.
If your agent can call a transfer_funds tool, who decides whether a ₦6.5 million transfer requires human approval? If your agent has a subagent that can write to a data store, what prevents that subagent from calling tools it should not? If your agent handles BVN data, how do you ensure it never appears in an audit log?
These are governance questions. They do not belong scattered across individual tool implementations — that creates drift. They do not belong inside the agent loop — that couples your business rules to the framework. They belong in a dedicated layer that sits between Eve's lifecycle events and your tools' implementations.
That is what eve-policy is.
The design challenge: Eve's two-phase lifecycle
The most technically interesting part of building eve-policy was figuring out how to fit governance into Eve's lifecycle correctly.
Eve exposes two lifecycle hooks on every tool:
needsApproval(ctx) — called synchronously before the turn completes, to decide whether to pause execution and wait for human approval. The context at this point is minimal: you have the tool name and the tool input, but no session yet. This hook must return a boolean synchronously — you cannot await anything here.
execute(input, ctx) — called asynchronously when the tool actually runs. By this point, the full session context is available: session.id, session.auth, session.parent (whether the caller is a subagent), and the complete turn history.
This split forced a two-phase evaluation model in eve-policy.
Phase 1 (approval time): Input-based controls fire in needsApproval. Does the amount exceed a reporting threshold? Does the input contain a card PAN pattern? Does the tool name match a blocked pattern? These questions can be answered from the tool input alone — no session needed, evaluation stays synchronous.
Phase 2 (execution time): Session-aware controls fire in execute. Is the caller a subagent? What is the principal type? Is this an already-approved tool? These questions require ToolContext.session, which is only available in execute.
The same policy is evaluated twice, but with a different context each time. For a call that comes from a subagent, a rule like "escalate financial writes from subagents" sees isSubagentCall() return false in needsApproval (no session yet) and true in execute (session available). The first evaluation handles the approval decision; the second handles the execution decision.
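To make the two contexts concrete, here is a minimal self-contained sketch. The types and predicate bodies are my own stand-ins, not eve-policy's actual definitions; in particular, treating session.parent as an optional string is an assumption.

```typescript
// Hypothetical shapes for illustration; eve-policy's real types may differ.
interface SessionInfo {
  id: string;
  parent?: string; // assumed: set when the caller is a subagent
}

interface PolicyContext {
  toolName: string;
  toolInput: Record<string, unknown>;
  session?: SessionInfo; // absent in needsApproval, present in execute
}

type Predicate = (ctx: PolicyContext) => boolean;

// Session-aware predicate: without a session it can never match.
const isSubagentCall = (): Predicate => (ctx) =>
  ctx.session?.parent !== undefined;

// Input-based predicate: needs only the tool input.
const amountExceeds = (field: string, limit: number): Predicate => (ctx) => {
  const value = ctx.toolInput[field];
  return typeof value === "number" && value > limit;
};

const input = { amount: 500 };

// Phase 1: the context available inside needsApproval has no session.
const approvalCtx: PolicyContext = { toolName: "transfer_funds", toolInput: input };

// Phase 2: the context available inside execute carries the session.
const executeCtx: PolicyContext = {
  toolName: "transfer_funds",
  toolInput: input,
  session: { id: "sess-1", parent: "sess-0" },
};

const phase1Subagent = isSubagentCall()(approvalCtx);
const phase2Subagent = isSubagentCall()(executeCtx);
const phase1Amount = amountExceeds("amount", 10_000)(approvalCtx);
```

The same predicate gives different answers in the two phases only because the context differs, which is why input-based rules are the ones that can drive the approval pause.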
Here is what that looks like in the evaluation flow:
Eve lifecycle                  eve-policy evaluation
─────────────────────────────────────────────────────
turn starts
    │
    ▼
needsApproval(ctx)  ──────►  evaluatePolicy(policy, approvalCtx)
    │                          input-based rules only:
    │                            amountExceeds, fieldMatches,
    │                            toolNameIs, ...
    │                          escalate? → true
    │                          otherwise → false
    │
    │   [human approves if needsApproval=true]
    │
    ▼
execute(input, ctx) ──────►  evaluatePolicy(policy, executeCtx)
    │                          session-aware rules now available:
    │                            isSubagentCall(), principalTypeIs(),
    │                            alreadyApproved(), ...
    │
    ├── deny?     → throw PolicyDenialError (tool never runs)
    ├── escalate? → run tool, write audit entry
    ├── audit?    → run tool, write audit entry
    └── allow?    → run tool
The key insight: governance decisions on the hot path must never block on I/O. All rule match predicates are synchronous by design. Async work (audit logging) runs fire-and-forget after the decision is made.
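The pattern can be sketched in a few lines. This is an illustration with made-up names (guardedExecute, decide, auditFailures), not the library's internals: the decision is computed synchronously, the audit write is started but never awaited, and a failed write is counted instead of thrown.

```typescript
// Hypothetical names throughout: a sketch of the pattern, not eve-policy's code.
type Effect = "deny" | "escalate" | "audit" | "allow";

interface Decision {
  effect: Effect;
  reason: string;
}

interface AuditLogger {
  write(entry: { toolName: string; effect: Effect; reason: string }): Promise<void>;
}

class PolicyDenialError extends Error {}

// Counts failed audit writes so they stay visible without blocking the tool call.
let auditFailures = 0;

// Synchronous decision: pure computation over the input, no I/O.
function decide(input: { amount: number }): Decision {
  if (input.amount < 0) return { effect: "deny", reason: "negative amount" };
  if (input.amount > 10_000) return { effect: "escalate", reason: "over threshold" };
  return { effect: "audit", reason: "all transfers logged" };
}

function guardedExecute<T>(
  toolName: string,
  input: { amount: number },
  run: () => T,
  logger: AuditLogger,
): T {
  const decision = decide(input); // synchronous, on the hot path

  // Fire-and-forget: start the write but never await it.
  void logger
    .write({ toolName, effect: decision.effect, reason: decision.reason })
    .catch(() => {
      auditFailures += 1;
    });

  if (decision.effect === "deny") throw new PolicyDenialError(decision.reason);
  return run();
}
```

The trade-off is worth stating: fire-and-forget keeps latency flat, but a crashed process can lose in-flight entries. If your compliance regime requires the audit record to exist before the action happens, you would await the write for those effects instead.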
Installation and quick start
pnpm add eve-policy
# or
npm install eve-policy
Requirements: Node.js >= 18, Eve >= 0.10.0
import { definePolicy, withNamedPolicy } from "eve-policy";
import {
  deny, escalate, audit,
  amountExceeds, isSubagentCall, toolNameIs, and,
} from "eve-policy/rules";
import { FileAuditLogger } from "eve-policy/audit";

// Define your policy
const transferPolicy = definePolicy({
  name: "transfer-controls",
  version: "1.0.0",
  rules: [
    // Require source to be present: otherwise an input missing both fields
    // would match, because undefined === undefined.
    deny("no-self-transfer",
      ctx => ctx.toolInput.source !== undefined &&
        ctx.toolInput.source === ctx.toolInput.destination,
      "Source and destination accounts cannot be the same"),

    escalate("ctr-threshold",
      amountExceeds("amount", 10_000),
      "Exceeds $10,000 CTR threshold: compliance review required",
      { riskLevel: "critical", owaspRisks: ["ASI02"] }),

    escalate("subagent-financial-write",
      and(isSubagentCall(), toolNameIs("transfer_funds")),
      "Financial writes from subagents require human approval",
      { riskLevel: "critical", owaspRisks: ["ASI03"] }),

    audit("all-transfers",
      toolNameIs("transfer_funds"),
      "All fund transfers logged for compliance"),
  ],
});

// Wrap your Eve tool: a one-line, drop-in replacement.
// transferTool is your existing Eve tool definition.
const safeTool = withNamedPolicy(
  "transfer_funds",
  transferTool,
  transferPolicy,
  { auditLogger: new FileAuditLogger("/var/log/agent-audit.jsonl") }
);

// Use it in your agent exactly as before
// Eve calls needsApproval and execute automatically
withNamedPolicy returns a new ToolDefinition — same shape, same interface, fully compatible with Eve. The governance layer is invisible to the rest of the system.
The four effects
Every rule has an effect: deny, escalate, audit, or allow.
| Effect | needsApproval | execute | Audit |
|---|---|---|---|
| deny | false | throws PolicyDenialError | always |
| escalate | true | runs tool if approved | yes |
| audit | false | runs tool | yes |
| allow | false | runs tool | opt-in |
Rules are evaluated in declaration order. First match wins. When no rule matches, defaultEffect applies — and the default is "deny", not "allow". A governance layer with no matching rules should not silently permit execution.
If you want allow-by-default with specific restrictions, set defaultEffect: "allow" explicitly. This forces you to make the intention clear rather than discovering it by accident.
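First-match-wins with a fail-closed default is simple enough to sketch in full. The evaluate function and types below are illustrative stand-ins, not eve-policy's implementation; the example also shows how a broad rule declared too early shadows a more specific one.

```typescript
// A minimal sketch with hypothetical types, not eve-policy's implementation.
type Effect = "deny" | "escalate" | "audit" | "allow";

interface Ctx {
  toolName: string;
  toolInput: Record<string, unknown>;
}

interface Rule {
  name: string;
  effect: Effect;
  match: (ctx: Ctx) => boolean;
}

interface Policy {
  rules: Rule[];
  defaultEffect?: Effect; // omitted means fail-closed
}

function evaluate(policy: Policy, ctx: Ctx): { effect: Effect; ruleName: string } {
  // Declaration order matters: the first matching rule decides.
  for (const rule of policy.rules) {
    if (rule.match(ctx)) return { effect: rule.effect, ruleName: rule.name };
  }
  // No rule matched: fall back to the default, which is "deny" unless overridden.
  return { effect: policy.defaultEffect ?? "deny", ruleName: "default" };
}

const policy: Policy = {
  rules: [
    { name: "audit-transfers", effect: "audit", match: (c) => c.toolName === "transfer_funds" },
    // Shadowed: the rule above already matches every transfer_funds call.
    {
      name: "big-transfers",
      effect: "escalate",
      match: (c) => c.toolName === "transfer_funds" && Number(c.toolInput["amount"]) > 10_000,
    },
  ],
};

const transfer = evaluate(policy, { toolName: "transfer_funds", toolInput: { amount: 50_000 } });
const unmatched = evaluate(policy, { toolName: "delete_account", toolInput: {} });
const permissive = evaluate(
  { ...policy, defaultEffect: "allow" },
  { toolName: "delete_account", toolInput: {} },
);
```

Here the 50,000 transfer is only audited, because the general rule comes first. Declaring specific rules before general ones, as the quick start policy does, avoids that.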
Composing policies
compose() is how production policies are built. It flattens rules from multiple component policies and applies deny-wins semantics for the default effect.
import { compose } from "eve-policy";
import { owaspTop10Policy, financialBaselinePolicy } from "eve-policy/profiles";
const myAgentPolicy = compose(
  owaspTop10Policy,        // 20 rules covering ASI01–ASI10
  financialBaselinePolicy, // CTR thresholds, sanctions, card data, KYC
  myDomainPolicy,          // your own business rules
);
// compose() guarantees:
// - Rules evaluated in order across all component policies
// - First match wins within a single evaluation
// - defaultEffect is the STRICTEST across all composed policies
// - If any component is deny, composed policy is deny
One important constraint: avoid always() catch-all rules inside composable policies. If a component policy catches every call with always(), it prevents later policies' specific rules from ever firing. Use defaultEffect for fallthrough behaviour instead.
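A rough sketch of those composition semantics, and of the catch-all problem, looks like this. composeSketch and the strictness ordering are my assumptions for illustration; the real compose() may be implemented differently.

```typescript
// Illustrative only: hypothetical types and a hypothetical composeSketch().
type Effect = "deny" | "escalate" | "audit" | "allow";

interface Ctx { toolName: string }
interface Rule { name: string; effect: Effect; match: (ctx: Ctx) => boolean }
interface Policy { name: string; rules: Rule[]; defaultEffect: Effect }

// Lower index = stricter. This ordering is an assumption based on the four tiers.
const STRICTNESS: Effect[] = ["deny", "escalate", "audit", "allow"];

function composeSketch(...policies: Policy[]): Policy {
  return {
    name: policies.map((p) => p.name).join("+"),
    // Flatten rules, preserving declaration order across components.
    rules: policies.reduce<Rule[]>((all, p) => all.concat(p.rules), []),
    // The strictest default across all components wins.
    defaultEffect: policies.reduce<Effect>(
      (acc, p) =>
        STRICTNESS.indexOf(p.defaultEffect) < STRICTNESS.indexOf(acc) ? p.defaultEffect : acc,
      "allow",
    ),
  };
}

function firstMatch(policy: Policy, ctx: Ctx): string {
  for (const rule of policy.rules) {
    if (rule.match(ctx)) return rule.name;
  }
  return "default:" + policy.defaultEffect;
}

const catchAll: Policy = {
  name: "logging",
  defaultEffect: "allow",
  // An always()-style rule: matches every call.
  rules: [{ name: "log-everything", effect: "audit", match: () => true }],
};

const shellGuard: Policy = {
  name: "shell-guard",
  defaultEffect: "deny",
  rules: [{ name: "no-shell", effect: "deny", match: (c) => c.toolName === "run_shell" }],
};

// The catch-all comes first, so the deny rule never gets a chance to fire.
const shadowed = composeSketch(catchAll, shellGuard);
// Without the catch-all rule, the deny rule fires and the default handles the rest.
const safe = composeSketch(shellGuard, { ...catchAll, rules: [] });
```

In the shadowed composition a run_shell call is merely audited, which is exactly the failure mode the constraint above warns about.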
Built-in profiles
OWASP Agentic Top 10 (2026)
owaspTop10Policy provides coverage for all ten risks in the OWASP Top 10 for Agentic Applications:
- ASI01 (Goal Hijack): Deny shell tools, prompt injection patterns, goal-modification language
- ASI02 (Tool Misuse): Deny unapproved file writes and card PAN in input; escalate network calls
- ASI03 (Identity Abuse): Escalate privileged tools invoked from subagents
- ASI04 (Supply Chain): Escalate runtime package installs and dynamic tool registration
- ASI05 (Code Execution): Deny eval()/exec() patterns; escalate code runner tools
- ASI06 (Context Poisoning): Escalate memory writes; audit memory reads
- ASI07 (Inter-Agent Comms): Escalate cross-agent and cross-session invocations
- ASI08 (Cascading Failures): Escalate bulk operations exceeding 100 items
- ASI09 (Trust Exploitation): Deny impersonation language; escalate approval-bypass patterns
- ASI10 (Rogue Agents): Deny self-replication and self-deployment; escalate autonomous scheduling
Financial services baseline
financialBaselinePolicy covers jurisdiction-agnostic financial controls:
- Deny self-transfers, negative amounts, sanctioned countries (OFAC/FATF list), card PANs, CVV in input
- Escalate USD/NGN/KES CTR thresholds, international wire transfers, KYC bypass language, subagent financial writes
- Audit customer PII access, moderate-value transactions, report generation
- Allow explicit read-only tools (get_balance, get_exchange_rate, check_kyc_status)
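To give a feel for what a currency-aware threshold rule involves, here is a hedged sketch of a predicate. exceedsCtrThreshold and the CTR_THRESHOLDS table are hypothetical, and the figures are placeholders for illustration rather than the profile's actual values or regulatory guidance.

```typescript
// Placeholder figures for illustration only; not the profile's real values.
// KES is left out because no figure is given in this post.
const CTR_THRESHOLDS: Record<string, number> = {
  USD: 10_000,
  NGN: 5_000_000,
};

interface Ctx {
  toolName: string;
  toolInput: Record<string, unknown>;
}

// Hypothetical predicate: matches when the amount exceeds the threshold
// configured for the transfer's currency.
function exceedsCtrThreshold(amountField = "amount", currencyField = "currency") {
  return (ctx: Ctx): boolean => {
    const amount = ctx.toolInput[amountField];
    const currency = ctx.toolInput[currencyField];
    if (typeof amount !== "number" || typeof currency !== "string") return false;
    const limit = CTR_THRESHOLDS[currency.toUpperCase()];
    return limit !== undefined && amount > limit;
  };
}

const overNgn = exceedsCtrThreshold()({
  toolName: "transfer_funds",
  toolInput: { amount: 6_500_000, currency: "NGN" },
});
const underUsd = exceedsCtrThreshold()({
  toolName: "transfer_funds",
  toolInput: { amount: 9_999, currency: "usd" },
});
const unknownCurrency = exceedsCtrThreshold()({
  toolName: "transfer_funds",
  toolInput: { amount: 1_000_000_000, currency: "XYZ" },
});
```

Note that an unrecognised currency does not match in this sketch, so the call falls through to later rules or the default. With a fail-closed default that is safe; with an allow default you would probably want a separate rule that escalates unknown currencies.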
The audit trail
Every deny, escalate, and audit decision writes a structured entry to your configured logger. The schema:
interface AuditEntry {
  id: string;                 // UUID v4
  timestamp: string;          // ISO 8601
  toolName: string;
  effect: "deny" | "escalate" | "audit" | "allow";
  ruleName: string;           // which rule fired (or "default")
  policyName: string;
  reason: string;
  riskLevel: "critical" | "high" | "medium" | "low";
  owaspRisks: string[];       // e.g. ["ASI01", "ASI03"]
  sessionId?: string;
  principalId?: string;
  sanitizedInput: Record<string, unknown>; // secrets redacted automatically
  evaluationMs: number;
  outcome: "denied" | "escalated" | "executed" | "approved";
}
Automatic sanitization replaces the values of keys matching password, token, api_key, cvv, secret, ssn, nin, or pin with [REDACTED]. Strings over 500 characters are truncated. The audit log itself should never become a liability.
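For readers who want to see the shape of such a sanitizer, here is a minimal sketch. It is not the library's code: the sanitize function is hypothetical, and the key-matching strategy (whole segments separated by _, . or -) is my assumption, chosen so that a key like shipping is not redacted just because it contains "pin".

```typescript
// Key names come from the text above; the matching strategy is an assumption.
const SENSITIVE_KEYS = ["password", "token", "api_key", "cvv", "secret", "ssn", "nin", "pin"];

// Match a sensitive name as a whole key or as a segment delimited by _, . or -.
const SENSITIVE_PATTERN = new RegExp(
  "(^|[_.-])(" + SENSITIVE_KEYS.join("|") + ")($|[_.-])",
  "i",
);

const MAX_STRING_LENGTH = 500;

function sanitize(value: unknown, key = ""): unknown {
  // Redact by key name, whatever the value's type.
  if (SENSITIVE_PATTERN.test(key)) return "[REDACTED]";

  if (typeof value === "string") {
    return value.length > MAX_STRING_LENGTH ? value.slice(0, MAX_STRING_LENGTH) : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitize(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(source)) {
      out[k] = sanitize(source[k], k); // recurse with the child's key
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

One limitation of this particular sketch: camelCase keys such as apiKey are not caught, so a real implementation would need to normalise key names first.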
Four backends ship out of the box: ConsoleAuditLogger, FileAuditLogger, MultiAuditLogger, and InMemoryAuditLogger (for tests).
Testing without a live runtime
evaluatePolicy runs the policy evaluation synchronously without needing an Eve runtime. Combined with InMemoryAuditLogger, you can write comprehensive policy tests with standard unit test tooling.
import { evaluatePolicy } from "eve-policy";
import { InMemoryAuditLogger } from "eve-policy/audit";
import type { PolicyContext } from "eve-policy";
// transferPolicy is the policy defined in the quick start above;
// describe/it/expect come from your test runner (Jest or Vitest globals).

function ctx(toolName: string, input: Record<string, unknown> = {}): PolicyContext {
  return { toolName, toolInput: input, approvedTools: new Set() };
}

describe("transferPolicy", () => {
  it("escalates transfers above the 10,000 CTR threshold", () => {
    const decision = evaluatePolicy(
      transferPolicy,
      ctx("transfer_funds", { source: "acc-001", destination: "acc-002", amount: 6_000_000 })
    );
    expect(decision.effect).toBe("escalate");
    expect(decision.owaspRisks).toContain("ASI02");
  });

  it("denies self-transfers", () => {
    const decision = evaluatePolicy(
      transferPolicy,
      ctx("transfer_funds", { source: "acc-001", destination: "acc-001", amount: 100 })
    );
    expect(decision.effect).toBe("deny");
  });

  it("denies tools the policy does not explicitly allow", () => {
    // transferPolicy has no allow rule for get_balance, so the call ends in deny.
    const decision = evaluatePolicy(transferPolicy, ctx("get_balance"));
    expect(decision.effect).toBe("deny");
  });
});
No network calls. No Eve process. Fast and deterministic.
OWASP coverage reporting
uncoveredOwaspRisks() tells you which ASI risks your policy does not address. Use it as a CI gate:
import { uncoveredOwaspRisks, owaspCoverageReport } from "eve-policy";
const gaps = uncoveredOwaspRisks(myPolicy);
if (gaps.length > 0) {
  throw new Error(`Policy missing OWASP coverage: ${gaps.join(", ")}`);
}
This is the kind of thing that prevents "we have a governance layer" from drifting into "we have a governance layer with an unnoticed gap in ASI07."
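Conceptually, a coverage check like this only needs the owaspRisks tags on each rule. The sketch below is a guess at the idea with hypothetical names (uncoveredRisks, ALL_ASI_RISKS), not the library's implementation.

```typescript
// Hypothetical sketch: compute which ASI risk IDs no rule is tagged with.
const ALL_ASI_RISKS = [
  "ASI01", "ASI02", "ASI03", "ASI04", "ASI05",
  "ASI06", "ASI07", "ASI08", "ASI09", "ASI10",
];

interface TaggedRule {
  name: string;
  owaspRisks?: string[];
}

interface TaggedPolicy {
  rules: TaggedRule[];
}

function uncoveredRisks(policy: TaggedPolicy): string[] {
  const covered = new Set<string>();
  for (const rule of policy.rules) {
    for (const risk of rule.owaspRisks ?? []) {
      covered.add(risk);
    }
  }
  // Preserve the canonical ASI01..ASI10 order in the result.
  return ALL_ASI_RISKS.filter((risk) => !covered.has(risk));
}

const samplePolicy: TaggedPolicy = {
  rules: [
    { name: "ctr-threshold", owaspRisks: ["ASI02"] },
    { name: "subagent-financial-write", owaspRisks: ["ASI03"] },
    { name: "all-transfers" }, // untagged rules contribute nothing
  ],
};

const gapsInSample = uncoveredRisks(samplePolicy);
```

A tag-based check of this kind confirms that a rule claims to address a risk, not that the rule is effective, so it complements policy tests rather than replacing them.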
What is coming next
The jurisdiction-specific policy packs — NDPA 2023, CBN AML/CFT, NFIU reporting, POPIA, Kenya DPA — are being developed as part of a separate package called @comply54/adapter-eve, which will provide pre-composed African regulatory profiles for direct use with eve-policy. That package is under active development and not yet released.
The underlying policy corpus (in OPA/Rego format) is already open source as agt-policies-nigeria, which was contributed upstream to microsoft/agent-governance-toolkit earlier this month. The @comply54/adapter-eve package will port those same frameworks into eve-policy's TypeScript rule format.
Discussion
Two questions I am genuinely curious about from people building with Eve or similar agent frameworks:
First — how are you currently handling governance for tools that need human approval? Are you implementing needsApproval directly in each tool, centralising it somewhere, or leaving it for later?
Second — the fail-closed default (defaultEffect: "deny") felt like the obvious choice to me, but I have seen arguments for fail-open defaults in development environments. What is your team's approach?
Drop your thoughts in the comments. And if eve-policy is useful, a star on the repo helps others find it.