Mykola Kondratiuk

Posted on May 20

I Read the Devenex Launch Yesterday - Here's the Policy File Your Agent Repo Is Still Missing

#ai #devops #vibecoding #security

I spent an hour reading the Devenex launch yesterday and the only sentence I keep coming back to is "execution control plane." That phrase is doing a lot of work.

It says: enforcement is a product now. Every agent request gets policy-evaluated, identity-bound, and recorded as evidence before anything runs.

It does not say: the policy itself exists.

Six products ship enforcement. Zero ship the policy.

Look at what shipped this month. Devenex launched May 19 as the first execution control plane. Antigravity 2.0 hardened Git policies at Google I/O Day 2. Notion's External Agent API went GA with workspace-scoped guardrails. Claude has had tool-use limits since launch. OpenAI has function-call constraints. Salesforce Agentforce has action approvals.

Six products. Different vendors. Different layers. All shipping enforcement.

The artifact they all need to enforce against is the same shape. None of them ship it. That artifact is your problem, and it lives in your repo, not theirs.

I started calling it the policy file.

What goes in the policy file

Four sections. I've been writing it this way for a while; the launches this week made me realize it's the same shape across every enforcement product whose docs I've read. The shape doesn't depend on the vendor.

Action classes

The agent's universe of possible actions, broken into named classes: read, write, send-external, transact, escalate, spawn-subagent. Each class is a category the policy file attaches constraints to. The act of writing the list is the point. The default in every deployment doc I've seen is implicit: the agent can do anything inside its tool set. Naming classes is how you refuse that default.

A sketch in YAML, covering four of the six classes:

action_classes:
  read:
    sources: [crm.contacts, crm.opportunities]
  write:
    targets: [crm.opportunities.notes]
  send_external:
    channels: [email, slack-dm]
  transact:
    instruments: [stripe.refund]

That's not a real schema. It's the shape your real schema settles into after the third review.
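
One way to read the sketch above is as a default-deny lookup: a request either lands in a named class with a listed resource, or it is refused. A minimal Python sketch, assuming the YAML has already been loaded into a dict. The names `ACTION_CLASSES` and `is_allowed` are hypothetical, not any vendor's API.

```python
# Hypothetical in-memory mirror of the action_classes sketch above.
ACTION_CLASSES = {
    "read": {"sources": ["crm.contacts", "crm.opportunities"]},
    "write": {"targets": ["crm.opportunities.notes"]},
    "send_external": {"channels": ["email", "slack-dm"]},
    "transact": {"instruments": ["stripe.refund"]},
}


def is_allowed(action_class: str, resource: str) -> bool:
    """Default-deny: allow only a named class acting on a listed resource."""
    spec = ACTION_CLASSES.get(action_class)
    if spec is None:
        # Class was never named in the policy, so it is refused.
        return False
    return any(resource in allowed for allowed in spec.values())
```

Under these assumptions, `is_allowed("read", "crm.contacts")` passes, while `is_allowed("write", "crm.contacts")` and any call with an unnamed class such as `spawn_subagent` are refused.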

Blast radius caps

A number per class. Not a vague guardrail, a number the enforcement layer can compare against at request time.

caps:
  write.records_per_run: 50
  send_external.recipients_per_session: 10
  transact.usd_per_run: 500
  spawn_subagent.depth: 2

The contrast: the deployment doc says "the agent has access to the CRM." The policy file says "the agent's write class is capped at fifty records per run." The second is a sentence Devenex can check. A sentence Antigravity can check. A sentence Claude tool-use can check.
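
What "compare against at request time" could look like, as a rough Python sketch. It assumes the enforcement layer keeps a running total per cap key; `CAPS` mirrors the YAML above, and `within_cap` is a hypothetical helper, not a function from any of the products named here.

```python
# Hypothetical in-memory mirror of the caps sketch above.
CAPS = {
    "write.records_per_run": 50,
    "send_external.recipients_per_session": 10,
    "transact.usd_per_run": 500,
    "spawn_subagent.depth": 2,
}


def within_cap(cap_key: str, used_so_far: float, requested: float) -> bool:
    """True if the request keeps the running total at or under the cap.

    An unknown cap key is refused rather than treated as unlimited.
    """
    limit = CAPS.get(cap_key)
    if limit is None:
        return False
    return used_so_far + requested <= limit
```

With forty records already written in a run, a request for ten more passes and a request for eleven does not. Refusing unknown keys is a design choice in this sketch: a cap nobody wrote should not behave like no cap at all.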

Escalation triggers

The other half of the allowlist. When the agent hits a class not in its policy, or a cap it's about to exceed, what fires? Named human. Named channel. Named SLA.

escalation:
  - class: write
    trigger: cap_exceeded
    route: "#agent-ops"
    owner: "@owner-of-record"
    sla_hours: 4
  - class: transact
    trigger: any
    route: "#finance-approvals"
    owner: "@treasury-lead"
    sla_hours: 1

The deployment doc has "agent owner" once on page one. The policy file has an escalation route per class.
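
A per-class route is only useful if something can look it up when a trigger fires. A minimal Python sketch of that lookup, assuming the rules are loaded as a list of dicts shaped like the YAML above; `find_route` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical in-memory mirror of the escalation sketch above.
ESCALATION = [
    {"class": "write", "trigger": "cap_exceeded", "route": "#agent-ops",
     "owner": "@owner-of-record", "sla_hours": 4},
    {"class": "transact", "trigger": "any", "route": "#finance-approvals",
     "owner": "@treasury-lead", "sla_hours": 1},
]


def find_route(action_class: str, event: str) -> Optional[dict]:
    """Return the first rule matching the class and event, or None.

    A rule whose trigger is "any" matches every event for its class.
    """
    for rule in ESCALATION:
        if rule["class"] == action_class and rule["trigger"] in (event, "any"):
            return rule
    return None
```

A `None` result means the policy has a gap: a class with no named human behind it. In this sketch the caller is expected to fail closed when that happens, not to proceed.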

Evidence schema

What the agent has to log so a human can audit the run afterward. Structured output. The action class invoked. The tool calls. The identity the agent acted as. The policy version. The escalation path if any.

evidence:
  required_fields:
    - run_id
    - policy_version
    - action_class
    - tool_calls
    - acting_identity
    - escalation_record
  format: jsonl
  retention_days: 365

Without an evidence schema, you can't answer "did the agent follow the policy?" after the fact. A policy you can't audit was unenforceable from the start.
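
Wiring this in can be as small as refusing to write a log line that is missing a required field. A Python sketch under that assumption, using only the standard library; `to_evidence_line` is a hypothetical helper and the field list mirrors the YAML above.

```python
import json

# Mirrors evidence.required_fields from the sketch above.
REQUIRED_FIELDS = [
    "run_id",
    "policy_version",
    "action_class",
    "tool_calls",
    "acting_identity",
    "escalation_record",
]


def to_evidence_line(record: dict) -> str:
    """Serialize one evidence record as a single JSONL line.

    Raises ValueError if any required field is absent. A field may be
    present with a None value, e.g. escalation_record when nothing escalated.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"evidence record missing fields: {missing}")
    return json.dumps(record, sort_keys=True)
```

The key has to be present even when nothing escalated, so "no escalation" is recorded explicitly and cannot be confused with a field someone forgot to log.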

A specific moment that made this concrete

I was reading through a deployment doc for an agent recently. Clean prose. Listed the APIs. Listed the data sources. Useful agent.

No section for what happens when it tries to write five thousand records. No section for what happens when it tries to send to two hundred recipients. No section for what happens when it transacts above a cap, because nobody had written the cap.

The deployment doc wasn't wrong. It was answering the wrong question. It answered "what does the agent do?" The policy file answers "what is the agent allowed to do, and what fires if any of that breaks?"

Different artifact. Different reviewer. Different file.

The clean split: enforcement vs. authoring

Devenex et al. ship enforcement. That half is done. The other half - authoring - isn't a product, and I don't think it can be one. Authoring is the codification of your team's actual judgment about what the agent should be allowed to do. That judgment is cross-functional: engineering knows the runtime, security knows the threat model, legal knows the constraint, finance knows the cap.

It's not "PM lobs a doc over the wall." The PM convenes the call, drafts the file, opens the PR. Engineering reviews it the same way it reviews a Terraform plan. Security reviews it the same way it reviews IAM. The policy ships in the same PR as the agent.

That's policy-as-code, the shape devs already know from infra. The new thing isn't the shape; it's the artifact existing for AI agents at all.

What I'd do this week if I were shipping an agent

Open a policy.yaml in the agent repo. Stub the four sections. Pin one number per class even if it's a wild guess. Wire the evidence schema into the agent's logging path. Put it in the same PR as the next prompt change.
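
The stub can be checked mechanically in the same PR. A rough Python sketch of such a check, assuming the policy file has been parsed into a dict and that cap keys follow the `class.metric` naming used above; `policy_gaps` is a hypothetical helper.

```python
from typing import List

REQUIRED_SECTIONS = ("action_classes", "caps", "escalation", "evidence")


def policy_gaps(policy: dict) -> List[str]:
    """List missing sections and action classes that have no cap pinned."""
    gaps = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in policy]
    # Cap keys look like "write.records_per_run"; the prefix is the class name.
    capped = {key.split(".", 1)[0] for key in policy.get("caps", {})}
    for name in policy.get("action_classes", {}):
        if name not in capped:
            gaps.append(f"no cap for class: {name}")
    return gaps
```

An empty list means the four sections exist and every named class has at least one number attached. It says nothing about whether the numbers are good ones; that part is still the review.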

The enforcement layer your platform vendor ships is checking against something. If nobody wrote the something, the enforcement is checking against silence.

What's the section your agent repo is missing first - blast radius caps, or the evidence schema?

Top comments (5)

Mykola Kondratiuk • May 20

honestly, the cleanest pushback on this piece is that "policy as code" works for the boring action classes (read, write, send) but completely punts on the spawn-subagent dimension, which is where the actual loss-of-control risk lives. capping subagent depth at 2 is a number i picked because i didn’t have a better one. anyone wired this up against a real recursion budget yet?

VoltageGPU • May 27

It's interesting how much the policy file shapes the boundaries of what the agent can and can't do—it's essentially the control plane for trust. In GPU infrastructure, we often face similar challenges when defining isolation boundaries, especially when running multiple workloads on shared hardware. Tools like VoltageGPU help there, but the core idea is the same: policy defines security.

Mykola Kondratiuk • May 29

"control plane for trust" is the framing I hadn't landed on. maps the policy question onto infrastructure language devs already have intuitions for. the gap I'd add is enforcement distance - GPU isolation enforces at the hypervisor, agent policy enforces wherever you implement it, which is often code the agent could theoretically touch. hardware enforcement has an edge there.

AudioProducer.ai • May 21

The split between enforcement products and the policy artifact maps directly to what I run into shipping AudioProducer.ai's marketing worker: every vendor I post through (Medium, dev.to, Quora, Reddit) ships some shape of platform-side enforcement, but the policy of what this brand will and won't publish has to live in our repo or it doesn't exist. We landed on the same four sections you sketched, just renamed for content-publishing: action classes (publish-article, engage, publish-newsletter), per-class caps (newsletter is always-deferred regardless of any auto-publish flag; Reddit capped at 8-12 comments/week distributed across at least four subs), escalation routes (awaiting: reason + for-anton: tag for human pickup), and evidence (per-run worker-results section + a dated log file). What surprised me was how much load the pre-execution refusal check carries: the worker reads the policy on every task pickup and trigger phrases ("DM", "pitch to", "cold list") flip the task to status: deferred before any draft is composed. The platform vendor would never have caught that; only the repo policy does. Strong agreement that the artifact existing at all is the new thing.

Mykola Kondratiuk • May 21

the always-deferred newsletter cap is exactly the kind of hard floor that does not belong in the configurable section — once you make it a toggle, someone toggles it. separating absolute constraints from per-class limits is the part most policy sketches skip entirely, and it sounds like you landed on that the hard way.