I was demoing my AI agent to colleagues. The agent had access to Git tooling, and my carefully crafted system prompt was clear: create a branch, open a PR, never push directly to the repo.
The agent pushed directly to main.
I tried rewording the prompt. I tried being more explicit. I tried few-shot examples. The agent pushed to main again — because when an LLM decides something is "the fastest way to help," your prompt is a suggestion it can override.
I had no way to block that tool call. No mechanism between "the LLM decided to do this" and "the tool executed." I needed something at that boundary — deterministic, not probabilistic. Something the LLM couldn't talk its way past.
My first attempt was hardcoded Python — regex patterns matching against bash command strings, wired into the SDK's hook system. It worked, but the patterns were buried in code, untestable without spinning up the agent, and impossible for anyone outside my team to review or modify.
So I built Edictum to turn that approach into declarative, testable, framework-agnostic contracts.
What it does
Edictum sits between your agent and its tools. When your agent decides to call a tool, Edictum evaluates the call against YAML contracts before it executes. If the contract says deny, the call never happens. The LLM never gets a chance to argue.
```yaml
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: git-safety-policy
defaults:
  mode: enforce
contracts:
  - id: block-push-to-main
    type: pre
    tool: Bash
    when:
      args.command: { matches: 'git\s+push\s+.*main' }
    then:
      effect: deny
      message: "Direct push to main blocked. Use a branch."
  - id: block-force-push
    type: pre
    tool: Bash
    when:
      args.command: { matches: 'git\s+push\s+.*(-f|--force)' }
    then:
      effect: deny
      message: "Force push is not allowed."
```
The agent's tool was Bash. The "args" were a raw command string. The contract matches against that string — same patterns you'd write in a firewall rule. The denial is deterministic. No probability. No LLM judgment call. The contract either passes or it doesn't.
Note: The YAML above is a complete, loadable contract bundle. Edictum uses a Kubernetes-style format with `apiVersion`, `kind`, and `metadata` headers. Every contract needs a unique `id`. See the YAML reference for the full schema.
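The `matches` patterns are ordinary regular expressions, so you can sanity-check them outside the agent with plain Python before loading the bundle. This is just an illustration of the patterns shown above, not part of Edictum's API:

```python
import re

# The same patterns used in the contracts above.
PUSH_MAIN = r'git\s+push\s+.*main'
FORCE_PUSH = r'git\s+push\s+.*(-f|--force)'

assert re.search(PUSH_MAIN, "git push origin main")           # would be denied
assert not re.search(PUSH_MAIN, "git push origin feature-x")  # would be allowed
assert re.search(FORCE_PUSH, "git push --force origin dev")   # would be denied
```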
What it is NOT
The AI safety landscape is confusing right now, so I want to be direct:
Not prompt guardrails. Edictum doesn't scan prompts for jailbreaks or filter LLM outputs for toxicity. Tools like NeMo Guardrails, Lakera Guard, and Guardrails AI do that well. Edictum operates at a different layer — it governs what the agent does, not what it says. That said, an interesting side effect: during testing, jailbreak prompts that convinced the LLM to attempt dangerous tool calls were still denied by contracts. The contracts don't care what the LLM thinks — they evaluate the tool call itself. Not our focus, but the screwdriver works as a hammer sometimes.
Not a framework. You still need LangChain, OpenAI Agents SDK, CrewAI, or whatever you're building with. Edictum plugs into your existing framework through thin adapters (~200 lines each inside the library). Your integration code is typically 3-5 lines.
Not an LLM-in-the-loop. Every evaluation is pure Python. No API calls. No inference. The pipeline runs in ~55μs per tool call.
Before and after: LangChain
Without Edictum — your agent reads whatever it wants:
```python
from langchain.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def read_file(path: str) -> str:
    """Read a file from the filesystem."""
    return open(path).read()  # nothing stops path="/app/.env"

agent = create_react_agent(model, [read_file])
result = agent.invoke({"messages": [("user", "Read the .env file")]})
# Agent reads .env, returns your API keys to the LLM context
```
With Edictum — dangerous calls are denied before execution:
```python
from langchain.tools import tool
from langgraph.prebuilt import ToolNode, create_react_agent
from edictum import Edictum, Principal
from edictum.adapters.langchain import LangChainAdapter

@tool
def read_file(path: str) -> str:
    """Read a file from the filesystem."""
    return open(path).read()

guard = Edictum.from_yaml("contracts.yaml")
adapter = LangChainAdapter(guard, principal=Principal(role="analyst"))
wrapper = adapter.as_tool_wrapper()

tool_node = ToolNode(tools=[read_file], wrap_tool_call=wrapper)
agent = create_react_agent(model, tool_node)

result = agent.invoke({"messages": [("user", "Read the .env file")]})
# ✗ DENIED read_file path=/app/.env [block-sensitive-reads]
# Agent receives denial message, adapts, asks user what file they need
```
What you get: a structured AuditEvent for every tool call — who tried what, when, which contract fired, what the verdict was. Your agent's tool usage becomes an auditable trail, not a black box.
How it actually works
The pipeline evaluates tool calls in a fixed order:
1. Attempt limits — has this tool been called too many times?
2. Before-hooks — custom Python callbacks
3. Preconditions — YAML contracts checked against tool name + args + principal
4. Session contracts — cross-call limits (e.g. max 50 tool calls per conversation)
5. Execution limits — per-tool execution caps
6. Execution — the actual tool call happens
7. Postconditions — validate the output (did it contain an SSN?)
8. Audit event — structured record of everything that happened
Every step is deterministic. The LLM is not consulted.
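The fixed-order idea can be sketched in a few lines of plain Python. This is illustrative only — the names (`Denied`, `governed_call`) are hypothetical and Edictum's internals are more involved — but it shows why no step involves inference:

```python
# Illustrative sketch of a deterministic governance pipeline.
# Every check is a pure function of the call; no LLM anywhere on the path.
class Denied(Exception):
    pass

def governed_call(tool_name, args, pre_checks, post_checks, execute):
    for check in pre_checks:             # limits, hooks, preconditions...
        reason = check(tool_name, args)  # returns a denial message, or None
        if reason is not None:
            raise Denied(reason)         # the tool call never happens
    result = execute(tool_name, args)    # the actual tool call
    for check in post_checks:            # postconditions on the output
        reason = check(result)
        if reason is not None:
            raise Denied(reason)
    return result
```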
The piece that matters for production: principals
Contracts can reference who's making the request:
```yaml
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: pharma-clinical-agent
defaults:
  mode: enforce
contracts:
  - id: restrict-patient-data
    type: pre
    tool: query_patients
    when:
      not:
        principal.role: { in: [pharmacovigilance, admin] }
    then:
      effect: deny
      message: "Role {principal.role} cannot access patient records"
```
Your application creates the principal:
```python
from edictum import Principal

principal = Principal(
    role="researcher",
    claims={"department": "oncology"},
    ticket_ref="JIRA-456",
)
```
Today the library trusts what your application passes. There's an open question about whether identity verification should live in the library or stay in the application layer — both approaches have tradeoffs. For now, the design gives you principal-aware policies without prescribing how you verify identity. The roadmap includes server-side JWT/OIDC verification for teams that want the trust boundary inside Edictum rather than outside it.
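For example, the mapping from a verified token to a principal can live in a small app-layer function. The claim names below (`role`, `dept`) are assumptions about your IdP, and token verification itself (e.g. via PyJWT) is out of scope for this sketch — the point is that Edictum just receives whatever `Principal` you construct:

```python
# App-layer sketch: turn already-verified JWT claims into keyword arguments
# for Edictum's Principal. Claim names are assumptions about your IdP.
def principal_kwargs(verified_claims: dict) -> dict:
    return {
        "role": verified_claims.get("role", "anonymous"),
        "claims": {"department": verified_claims.get("dept")},
    }

# principal = Principal(**principal_kwargs(verified_claims))
```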
Observe mode
This comes from my background in networking. When you configure new firewall rules, you don't apply them blindly to production. You put them in monitor mode, watch the traffic, verify the rules match what you expect, then flip to enforce.
Same idea:
```python
guard = Edictum.from_yaml("contracts.yaml", mode="observe")
```
In observe mode, violations are logged but calls proceed normally. You see what would be denied without breaking anything. Run for a week, review the audit trail, fix false positives, flip to enforce. Zero-risk policy deployment.
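When the audit sink is a `.jsonl` file, reviewing a week of observe-mode traffic can be a few lines of stdlib Python. The field names here (`verdict`, `contract_id`) are assumptions for illustration — match them to your actual audit output:

```python
import json
from collections import Counter

def would_be_denials(audit_path: str) -> Counter:
    """Tally observe-mode denials per contract from a JSON-lines audit log."""
    hits = Counter()
    with open(audit_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("verdict") == "deny":
                hits[event.get("contract_id", "unknown")] += 1
    return hits
```

Contracts with unexpectedly high counts are your false-positive candidates; fix those before flipping to enforce.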
CLI: contracts as testable artifacts
```shell
# Validate YAML syntax and schema
edictum validate contracts.yaml

# Run precondition test cases
edictum test contracts.yaml --cases tests.yaml
```
Put edictum test in CI. Your security policies become versioned, tested, reviewable artifacts — not buried in prompt templates.
Note: `edictum test` evaluates preconditions against your test cases. For full end-to-end testing including postconditions and session limits, use the Python API directly.
What it doesn't do (yet)
I want to be honest about where the edges are:
- Single-process only. Session counters live in-memory. If you have multiple agent instances, each tracks its own counters independently. A central policy server is planned but not built.
- No PII detection built-in. The protocol is defined (v0.6.0) — you can plug in your own detector. Built-in regex and Presidio-based detectors are coming.
- No production sinks beyond file. Audit events go to stdout or `.jsonl` files. Webhook, Splunk, and Datadog sinks are planned.
- OpenTelemetry is early. Span instrumentation exists but isn't battle-tested in production yet. It's opt-in and a no-op if the OTel SDK isn't installed.
- No hot-reload. Contracts are loaded at startup. Changing them requires a restart.
The roadmap shows what's planned and when.
The landscape — where Edictum fits
| Tool | What it does | Layer |
|---|---|---|
| NeMo Guardrails | Programmable dialog flows, content safety, jailbreak detection | Prompt/response |
| Guardrails AI | Output validation, schema enforcement, hallucination detection | LLM output |
| Lakera Guard | Prompt injection detection, PII scanning | Input/output proxy |
| LlamaGuard | Safety classification of conversations | Content classification |
| Edictum | Contract enforcement on tool calls — preconditions, postconditions, session limits, audit | Tool execution boundary |
These tools are complementary, not competing. You can run Lakera on prompts AND Edictum on tool calls. Different layers, different threats.
Performance
The governance pipeline adds ~55μs per tool call. That's measured, not estimated. For context, a typical LLM API call takes 500ms-3s. Edictum's overhead is invisible.
Zero runtime dependencies in core. YAML parsing, adapters, CLI, and OTel are optional extras — install only what you need.
Try it
```shell
pip install edictum
```
GitHub: github.com/acartag7/edictum
Docs: docs.edictum.dev
If you're deploying agents that touch production systems — files, databases, APIs, infrastructure — I'd genuinely like to hear how you're handling the gap between "the LLM decided to call a tool" and "the tool executed." That's the layer Edictum was built for.
Edictum is MIT licensed. Built during recovery from liver surgery because apparently I can't sit still. Feedback, issues, and PRs welcome.