Bobai Kato for Ota

Originally published at ota.run

AGENTS.md Is Not Enough for Safe AI Agent Execution

Overview

AGENTS.md is useful.

It gives AI coding agents a place to find repo-specific guidance:

  • how to behave
  • what conventions matter
  • what areas need extra caution
  • what kinds of changes should trigger review

That is a meaningful improvement over sending an agent into a repo with no instructions at all.

But AGENTS.md is not enough.

It can tell an agent to be careful.

It cannot, by itself, make execution safe, verification trustworthy, or review inspectable.

For that, a repository needs more than instructions.

It needs:

  • declared safe commands
  • a canonical verification path
  • receipts that show what actually ran

That is the difference between agent guidance and execution governance.

Instructions Help. They Do Not Govern Execution.

An instruction file is still prose.

That means it can express intent, but it does not by itself determine what actually runs.

For example, AGENTS.md can say:

  • run the right checks before handoff
  • avoid destructive commands
  • do not edit generated files
  • ask before touching infrastructure

Those are good rules.

But notice what they leave unresolved:

  • which checks are the right ones
  • which commands are actually safe
  • which paths are protected structurally versus only suggested
  • what should count as evidence that verification happened
  • how to tell whether a failure came from code, setup, or drift

That is where many agent workflows still break down.

The agent may follow the spirit of the instructions and still take the wrong execution path.

Safe Commands Need To Be Explicit

One of the biggest gaps in agent-oriented repos is that they often declare guidance without declaring a safe command surface.

The repo may tell the agent:

"Run tests before you finish."

But that still leaves a dangerous amount of interpretation.

Which task is safe?

Is it:

  • npm test
  • pnpm test
  • make check
  • docker compose run test
  • a narrower unit-test path
  • the CI workflow itself

And if several exist, which one is canonical for a routine code change?

The repo should not force the agent to infer that from scattered hints.

It should declare safe commands explicitly.

That means giving the repo a machine-readable answer to questions like:

  • what tasks exist
  • which ones are agent-safe
  • what they depend on
  • what runtime mode they use
  • what they are expected to verify

That is much stronger than asking an agent to "be careful" around shell commands it still has to interpret.
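
As a rough illustration of what "machine-readable" could mean here, the sketch below models a task declaration in Python. It is not ota's schema: the `Task` fields, the `mode` label, and every command other than `pnpm test` are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One declared task. Field names are illustrative, not a real schema."""
    name: str
    command: tuple[str, ...]          # executable plus arguments, never a shell string
    agent_safe: bool = False          # may an agent run this without asking?
    depends_on: tuple[str, ...] = ()  # tasks that must run first
    mode: str = "native"              # hypothetical runtime mode label
    verifies: str = ""                # what a passing run is expected to show


# Hypothetical contract for a small pnpm project.
TASKS = {
    t.name: t
    for t in (
        Task("setup", ("pnpm", "install"), agent_safe=True, verifies="dependencies installed"),
        Task("lint", ("pnpm", "lint"), agent_safe=True, depends_on=("setup",), verifies="style rules"),
        Task("test", ("pnpm", "test"), agent_safe=True, depends_on=("setup",), verifies="unit tests pass"),
        Task("deploy", ("pnpm", "run", "deploy"), depends_on=("test",), verifies="release published"),
    )
}


def agent_safe_tasks(tasks: dict[str, Task]) -> list[str]:
    """Names of tasks an agent may run, in sorted order."""
    return sorted(name for name, task in tasks.items() if task.agent_safe)


def resolve(tasks: dict[str, Task], name: str) -> Task:
    """Return the declared task, refusing unknown or non-agent-safe names."""
    task = tasks.get(name)
    if task is None:
        raise KeyError(f"task {name!r} is not declared")
    if not task.agent_safe:
        raise PermissionError(f"task {name!r} is declared but not agent-safe")
    return task
```

With a structure like this, "which tasks are agent-safe" becomes a lookup rather than an inference, and a runner can refuse anything that `resolve` rejects.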

Verification Needs To Be A Path, Not A Suggestion

The second gap is verification.

Many repos still treat verification like a recommendation rather than a declared path.

An instruction file might say:

"Make sure everything still works before handoff."

That sounds fine, but it is too loose for reliable agent execution.

A trustworthy repo should be able to say something more concrete:

  • this is the setup path
  • this is the finite verification workflow
  • these tasks are safe to run for routine work
  • this heavier path exists, but it is not the default

That is the difference between advice and governance.

Without a declared verification path, the agent may:

  • pass a narrow local check and miss the real gate
  • run a destructive path unnecessarily
  • skip a required service-backed test lane
  • choose the wrong runtime mode
  • report success without proving repo readiness
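
One way to picture a declared path is as a run order computed from declarations instead of chosen by the agent. The sketch below is a minimal version of that idea, assuming a hypothetical mapping from task name to its dependencies; the task names and the `VERIFY_AFTER_CHANGES` list are illustrative only.

```python
def plan(tasks: dict[str, tuple[str, ...]], targets: list[str]) -> list[str]:
    """Expand targets into a run order where every dependency comes first.

    `tasks` maps a task name to the names it depends on.
    """
    order: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return
        if name not in tasks:
            raise KeyError(f"task {name!r} is not declared")
        if name in visiting:
            raise ValueError(f"dependency cycle through {name!r}")
        visiting.add(name)
        for dep in tasks[name]:
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for target in targets:
        visit(target)
    return order


# Hypothetical declarations: name -> depends_on.
DEPENDS_ON = {
    "setup": (),
    "lint": ("setup",),
    "typecheck": ("setup",),
    "test": ("setup",),
    "e2e": ("setup", "test"),  # heavier lane, declared but not the default
}

# The canonical path for routine changes, declared once instead of inferred.
VERIFY_AFTER_CHANGES = ["lint", "typecheck", "test"]

print(plan(DEPENDS_ON, VERIFY_AFTER_CHANGES))
# ['setup', 'lint', 'typecheck', 'test']
```

The heavier `e2e` lane stays available, but it only runs when someone asks for it by name. Setup is pulled in automatically because the declaration says so.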

Receipts Are The Missing Trust Layer

Even explicit commands and verification paths are still weaker than they should be if nothing records what actually ran.

This is where receipts matter.

A verification receipt is the difference between:

  • "the agent says it ran the checks"

and:

  • "the repo can show which task ran, under which contract, in which mode, with what outcome"

That is the trust boundary most agent workflows still lack.

Receipts help answer questions like:

  • what contract or workflow was selected
  • what task actually executed
  • what backend or runtime mode was used
  • whether setup ran first
  • whether readiness was reached
  • what evidence existed when the run failed

Without receipts, review still depends too heavily on:

  • agent narration
  • terminal screenshots
  • CI guesswork
  • someone remembering what the command probably was

With receipts, verification becomes inspectable.
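
To make the idea concrete, here is a minimal sketch of a runner that produces a receipt as a side effect of running a command. It does not reflect the format of any real tool: the function name, every receipt field, and the use of a contract hash are assumptions for illustration.

```python
import hashlib
import json
import subprocess
import sys
import time


def run_with_receipt(task: str, command: list[str], contract_text: str, mode: str = "native") -> dict:
    """Run one declared command and return a receipt describing what happened.

    The receipt is built by the runner from facts it observed, not from
    anything the agent reports. All field names here are illustrative.
    """
    started = time.time()
    proc = subprocess.run(command, capture_output=True, text=True)
    finished = time.time()
    return {
        "task": task,
        "command": command,
        "mode": mode,
        # Hash of the contract text ties the run to the exact declaration used.
        "contract_sha256": hashlib.sha256(contract_text.encode("utf-8")).hexdigest(),
        "started_at": started,
        "duration_s": round(finished - started, 3),
        "exit_code": proc.returncode,
        "outcome": "passed" if proc.returncode == 0 else "failed",
        # Keep the tail of stderr so a failed run still leaves evidence.
        "stderr_tail": proc.stderr[-2000:],
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere Python does.
    receipt = run_with_receipt(
        task="test",
        command=[sys.executable, "-c", "print('checks ran')"],
        contract_text="tasks:\n  test: ...\n",
    )
    print(json.dumps(receipt, indent=2))
```

The design point is where the record comes from: the exit code, timing, and contract hash are captured by the process that launched the command, so a reviewer does not have to rely on a description written after the fact.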

What A Better Repo Looks Like

A stronger repository keeps these layers distinct:

  • AGENTS.md for human-written behavioral guidance
  • a contract surface for tasks, workflows, safe commands, and boundaries
  • receipts for execution evidence

For example, AGENTS.md carries the behavioral guidance:

- Prefer small diffs.
- Do not edit generated files manually.
- Escalate before changing deployment or billing flows.
- Use the declared verification path before handoff.

A contract file such as ota.yaml declares the safe tasks, their commands, and the verification workflow:

agent:
  safe_tasks:
    - lint
    - typecheck
    - test
  verify_after_changes:
    - test

tasks:
  test:
    command:
      exe: pnpm
      args: [test]
    depends_on:
      - setup

workflows:
  verify:
    setup:
      task: setup
    run:
      task: test

And then the execution layer should be able to produce evidence rather than only output:

ota run test --json
ota receipt --json --archive

The exact tool does not matter as much as the structure:

  • instructions
  • safe commands
  • verification path
  • receipt

That is the minimum shape of trustworthy agent execution.

Why This Matters More Now

This structure was already useful when agents mostly suggested edits.

It becomes much more important when agents are expected to:

  • choose commands
  • prepare environments
  • run checks
  • interpret failures
  • decide whether work is complete

At that point, the problem is no longer just "does the agent have instructions?"

The problem is whether the repo can expose:

  • a safe execution surface
  • a deterministic verification path
  • evidence that the declared path actually ran

That is a higher bar than AGENTS.md alone can satisfy.
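
The third item, evidence that the declared path actually ran, can be checked mechanically once receipts exist. The sketch below assumes a hypothetical receipt shape with just a `task` name and an `outcome` string; real receipts would carry more, and the field names are not taken from any tool.

```python
def check_receipts(declared: list[str], receipts: list[dict]) -> list[str]:
    """Compare the declared verification path with what receipts say ran.

    Returns a list of human-readable problems; an empty list means the
    declared path ran, in order, and every step passed.
    """
    problems: list[str] = []
    ran = [r["task"] for r in receipts]
    if ran != declared:
        problems.append(f"declared {declared} but receipts show {ran}")
    for r in receipts:
        if r["outcome"] != "passed":
            problems.append(f"task {r['task']!r} finished with outcome {r['outcome']!r}")
    return problems


# Hypothetical declared path and two sets of receipts.
DECLARED = ["setup", "test"]

complete = [
    {"task": "setup", "outcome": "passed"},
    {"task": "test", "outcome": "passed"},
]
skipped_setup = [
    {"task": "test", "outcome": "passed"},
]

print(check_receipts(DECLARED, complete))       # []
print(check_receipts(DECLARED, skipped_setup))  # one problem: setup never ran
```

A check like this is what lets a reviewer distinguish "tests passed" from "the declared path ran and passed".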

This Is The Stronger Split

The boundary between instructions and contracts is covered in a separate post.

This post is narrower.

Its claim is not just that AGENTS.md and ota.yaml do different jobs.

Its claim is that even a good instruction file is still not enough unless the repo also declares:

  • which commands are safe
  • which verification path is canonical
  • what receipt counts as evidence

Bottom Line

AGENTS.md is a good start.

But repo instructions alone do not make agent execution safe, reviewable, or trustworthy.

To get there, repositories also need:

  • explicit safe commands
  • declared verification workflows
  • receipts that preserve execution evidence

That is how you move from:

  • "the agent had guidance"

to:

  • "the repo had governed execution"

Original Post: https://ota.run/blog/agents-md-is-not-enough-for-safe-ai-agent-execution

Top comments (1)

Vinicius Pereira

Strong agree on the core, and imo it's the same reason prompt-based guardrails fail: an instruction the agent can choose to ignore isn't a control, it's a suggestion w/ good intentions. Prose can't constrain execution, only the harness can. The model proposes, the code disposes.

Two places I'd push it further though. Declared safe commands only bite if the runner actually enforces the allowlist, i.e. the agent can run the declared tasks and literally nothing else. If it can still shell out to anything and you're just hoping it prefers the safe list, ota.yaml is AGENTS.md w/ nicer syntax, the teeth are in the runner refusing whatever isn't declared, not in the declaration itself. Same logic for receipts: a receipt is only worth something if the harness emits it, not the agent. If the agent writes its own receipt you're back to narration w/ extra steps, since the thing you're verifying is also the thing producing the evidence. Trust comes from the evidence being generated outside the agent's control. Nail those two and the instructions / contract / receipt split is genuinely the right shape.