If your team uses AI coding agents on real code, you've probably hit this:
The rules your agents are supposed to follow — what tools they're allowed to use, who can call whom, when something should escalate — live in scattered prompts, READMEs, and Notion docs. Nothing in your CI pipeline fails when reality drifts from those documents.
So I made a small thing.
What it does
It turns the rules into YAML that lives in your repo:
# .agent-ops/registry/tool-acl.yaml
backend-builder:
  tools:
    - repo_read
    - repo_write_backend
    - run_backend_tests

security-reviewer:
  tools:
    - repo_read
    - dependency_scan
  blocked_tools:
    - direct_email_send
    - production_delete
A Python validator fails CI when:
- An agent declares a tool the ACL doesn't grant
- An agent calls another agent not in the call graph
- A sensitive action (email send, deploy, external post) is missing required evidence fields
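The first of those checks is simple to picture. Here is a minimal sketch of what an ACL check like that could look like, with the parsed YAML stood in by plain dicts; the function name and structure are my illustration, not the repo's actual code:

```python
# Hypothetical sketch of the tool-ACL check: compare the tools an agent
# declares against the grants in tool-acl.yaml (shown here pre-parsed).

acl = {
    "backend-builder": {"tools": ["repo_read", "repo_write_backend", "run_backend_tests"]},
    "security-reviewer": {"tools": ["repo_read", "dependency_scan"]},
}

def undeclared_tools(agent: str, declared: list[str]) -> list[str]:
    """Return tools the agent declares but the ACL does not grant."""
    granted = set(acl.get(agent, {}).get("tools", []))
    return [t for t in declared if t not in granted]

# CI-style usage: a non-empty result means the build should fail.
violations = undeclared_tools("backend-builder", ["repo_read", "production_delete"])
assert violations == ["production_delete"]
```

The call-graph and evidence-field checks follow the same shape: a set lookup against the registry, fail loudly on any mismatch.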
There's also a ~100-line Python module you can import into your own agent runner to enforce the same rules at runtime, before tools execute:
from agent_ops_guard import AgentOpsGuard
guard = AgentOpsGuard(".")
guard.assert_tool_allowed("backend-builder", "repo_read")
guard.assert_call_allowed("orchestrator", "backend-builder")
If a check fails it raises PolicyDenied and your runner blocks the action before it happens.
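One way to wire that into a runner is to gate every tool dispatch on the guard. This is a sketch only: `StubGuard` below stands in for the real `AgentOpsGuard` (which reads the YAML registry), and `dispatch` is a hypothetical helper, not part of the repo:

```python
# Sketch of guard-gated tool dispatch. StubGuard is a stand-in for
# AgentOpsGuard; the real class loads its ACL from .agent-ops/registry/.

class PolicyDenied(Exception):
    pass

class StubGuard:
    def __init__(self, acl: dict[str, set[str]]):
        self.acl = acl

    def assert_tool_allowed(self, agent: str, tool: str) -> None:
        if tool not in self.acl.get(agent, set()):
            raise PolicyDenied(f"{agent} may not use {tool}")

def dispatch(guard, agent: str, tool: str, run):
    # Enforce the contract before the tool runs, not after.
    guard.assert_tool_allowed(agent, tool)
    return run()

guard = StubGuard({"backend-builder": {"repo_read"}})
dispatch(guard, "backend-builder", "repo_read", lambda: "ok")  # allowed
try:
    dispatch(guard, "backend-builder", "direct_email_send", lambda: "sent")
except PolicyDenied:
    pass  # runner blocks the action before any side effect happens
```

The point of the pattern: the same YAML that CI validates is the source of truth at runtime, so the two can't drift from each other.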
Try it in 5 minutes
git clone https://github.com/RPSingh1990/agent-contract-tests
cd agent-contract-tests
python3 scripts/agent_ops_validate.py --strict
That runs the validator against the included example. Pure Python, no dependencies.
To drop the starter files into your own repo:
python3 scripts/agent_ops_init.py --target /path/to/your-repo
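To make the contract actually fail builds, the validator needs to run in CI. Assuming GitHub Actions, a workflow could look like this; the file path, workflow name, and job name are illustrative, not shipped by the repo:

```yaml
# Hypothetical .github/workflows/agent-ops.yml
name: agent-ops
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python3 scripts/agent_ops_validate.py --strict
```

Since the validator is pure Python with no dependencies, there's no install step to add.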
What this isn't
Being honest about scope:
- Not an agent framework — AutoGen, CrewAI, and LangGraph already do orchestration
- Not a sandbox — these are repo-level contracts, not process isolation
- Not an LLM eval suite — Promptfoo, DeepEval, and Inspect already do that
It's a small contract-test layer, narrow on purpose.
What I'm looking for
Single author, MIT, days old. The questions I actually have:
- Does the YAML shape match how teams structure agent rules in practice, or is the abstraction wrong?
- Are any of the checks process for process's sake rather than real guardrails?
- What drift have you seen that this wouldn't catch?
Repo: https://github.com/RPSingh1990/agent-contract-tests
Issues and PRs welcome. So is "this is wrong because..."