If your team uses AI coding agents on real code, you've probably hit this:
The rules your agents are supposed to follow — what tools they're allowed to use, who can call whom, when something should escalate — live in scattered prompts, READMEs, and Notion docs. Nothing in your CI pipeline fails when reality drifts from those documents.
So I made a small thing.
What it does
It turns the rules into YAML that lives in your repo:
# .agent-ops/registry/tool-acl.yaml
backend-builder:
  tools:
    - repo_read
    - repo_write_backend
    - run_backend_tests

security-reviewer:
  tools:
    - repo_read
    - dependency_scan
  blocked_tools:
    - direct_email_send
    - production_delete
A Python validator fails CI when:
- An agent declares a tool the ACL doesn't grant
- An agent calls another agent not in the call graph
- A sensitive action (email send, deploy, external post) is missing required evidence fields
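The first of those checks is simple to picture. Here is a minimal sketch of what an ACL check like that could look like, with the parsed YAML stood in by plain dicts; the function name and structure are my illustration, not the repo's actual code:

```python
# Hypothetical sketch of the tool-ACL check: compare the tools an agent
# declares against the grants in tool-acl.yaml (shown here pre-parsed).

acl = {
    "backend-builder": {"tools": ["repo_read", "repo_write_backend", "run_backend_tests"]},
    "security-reviewer": {"tools": ["repo_read", "dependency_scan"]},
}

def undeclared_tools(agent: str, declared: list[str]) -> list[str]:
    """Return tools the agent declares but the ACL does not grant."""
    granted = set(acl.get(agent, {}).get("tools", []))
    return [t for t in declared if t not in granted]

# CI-style usage: a non-empty result means the build should fail.
violations = undeclared_tools("backend-builder", ["repo_read", "production_delete"])
assert violations == ["production_delete"]
```

The call-graph and evidence-field checks follow the same shape: a set lookup against the registry, fail loudly on any mismatch.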
There's also a ~100-line Python module you can import into your own agent runner to enforce the same rules at runtime, before tools execute:
from agent_ops_guard import AgentOpsGuard
guard = AgentOpsGuard(".")
guard.assert_tool_allowed("backend-builder", "repo_read")
guard.assert_call_allowed("orchestrator", "backend-builder")
If a check fails it raises PolicyDenied and your runner blocks the action before it happens.
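One way to wire that into a runner is to gate every tool dispatch on the guard. This is a sketch only: `StubGuard` below stands in for the real `AgentOpsGuard` (which reads the YAML registry), and `dispatch` is a hypothetical helper, not part of the repo:

```python
# Sketch of guard-gated tool dispatch. StubGuard is a stand-in for
# AgentOpsGuard; the real class loads its ACL from .agent-ops/registry/.

class PolicyDenied(Exception):
    pass

class StubGuard:
    def __init__(self, acl: dict[str, set[str]]):
        self.acl = acl

    def assert_tool_allowed(self, agent: str, tool: str) -> None:
        if tool not in self.acl.get(agent, set()):
            raise PolicyDenied(f"{agent} may not use {tool}")

def dispatch(guard, agent: str, tool: str, run):
    # Enforce the contract before the tool runs, not after.
    guard.assert_tool_allowed(agent, tool)
    return run()

guard = StubGuard({"backend-builder": {"repo_read"}})
dispatch(guard, "backend-builder", "repo_read", lambda: "ok")  # allowed
try:
    dispatch(guard, "backend-builder", "direct_email_send", lambda: "sent")
except PolicyDenied:
    pass  # runner blocks the action before any side effect happens
```

The point of the pattern: the same YAML that CI validates is the source of truth at runtime, so the two can't drift from each other.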
Try it in 5 minutes
git clone https://github.com/RPSingh1990/agent-contract-tests
cd agent-contract-tests
python3 scripts/agent_ops_validate.py --strict
That runs the validator against the included example. Pure Python, no dependencies.
To drop the starter files into your own repo:
python3 scripts/agent_ops_init.py --target /path/to/your-repo
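To make the contract actually fail builds, the validator needs to run in CI. Assuming GitHub Actions, a workflow could look like this; the file path, workflow name, and job name are illustrative, not shipped by the repo:

```yaml
# Hypothetical .github/workflows/agent-ops.yml
name: agent-ops
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python3 scripts/agent_ops_validate.py --strict
```

Since the validator is pure Python with no dependencies, there's no install step to add.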
What this isn't
Being honest about scope:
- Not an agent framework — AutoGen, CrewAI, and LangGraph already do orchestration
- Not a sandbox — these are repo-level contracts, not process isolation
- Not an LLM eval suite — Promptfoo, DeepEval, and Inspect already do that
It's a small contract-test layer, narrow on purpose.
What I'm looking for
Single author, MIT, days old. The questions I actually have:
- Does the YAML shape match how teams structure agent rules in practice, or is the abstraction wrong?
- Are any of the checks process for process's sake rather than real guardrails?
- What drift have you seen that this wouldn't catch?
Repo: https://github.com/RPSingh1990/agent-contract-tests
Issues and PRs welcome. So is "this is wrong because..."