DEV Community

NAOL ZEWUDU
Don’t Trust Your Agents. Trust Your Boundary: a runtime authorization layer for LLM tool calls.

If your “agent” can do anything real, like issue refunds, change production data, update infrastructure, or email customers, you have crossed a line. The hard part is no longer whether the model can suggest the right action. The hard part is whether you can prove that a high-stakes action was authorized before it executed. Code: https://github.com/lemnk/Sudo-agent

That’s the trust gap in agentic systems:

The model’s reasoning is probabilistic.

The side effect is deterministic and irreversible.

Most teams try to close that gap with prompt guardrails, ad‑hoc allowlists, or IAM roles that become over‑provisioned the moment the agent needs to do more than one thing. Those tools help, but they don’t give you a single, strict, auditable boundary between “intent” and “execution”.

SudoAgent’s mission is deliberately narrow: provide a runtime authorization layer for tool and function calls, with deterministic control and provable accountability, by enforcing the rule:

Do not execute this action unless governance can be proven.

Why “logging” isn’t enough
Security teams already log things. But most logs are mutable operational artifacts, not evidence.

If you rely on a normal app log or a database audit table, you’re implicitly trusting the machine that wrote it. If the host is compromised, a determined attacker can often alter or delete those records after the fact.

Security research has treated this as a core problem for decades: how to keep audit records on an untrusted machine while making past entries hard to change without detection. Schneier and Kelsey’s classic work on secure audit logs frames this directly: you want prior log entries to be impossible to undetectably modify or destroy once written. (schneier.com)

So SudoAgent treats evidence as a first‑class artifact:

write a decision record before execution (fail‑closed)

write an outcome record after execution (best‑effort)

make the ledger tamper‑evident (hash chaining + canonicalization)

optionally add signatures for authenticity

This is also directionally aligned with where regulation and governance are going. The EU AI Act explicitly requires high‑risk AI systems to support automatic recording of events (logs), and to keep logs for an appropriate period (at least six months in Article 19’s summary). (ai-act-service-desk.ec.europa.eu)
NIST’s AI RMF is voluntary, but it sets a common language for trustworthy AI risk management and operationalization across governance functions. (nist.gov)
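To make the tamper-evident idea concrete, here is a minimal hash-chaining sketch. It assumes canonical JSON (sorted keys, compact separators) and borrows the `entry_hash`/`prev_entry_hash` field names from the ledger format described below; the real ledger records far more per entry, so treat this as an illustration, not SudoAgent's implementation.

```python
import hashlib
import json

def canonical(obj: dict) -> bytes:
    # Canonical JSON: sorted keys, no extra whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def append_entry(ledger: list, entry: dict) -> dict:
    # Chain each entry to its predecessor. The hash covers the entry body
    # including prev_entry_hash, so reordering or editing breaks the chain.
    prev = ledger[-1]["entry_hash"] if ledger else "0" * 64  # assumed genesis marker
    body = dict(entry, prev_entry_hash=prev)
    body["entry_hash"] = hashlib.sha256(canonical(body)).hexdigest()
    ledger.append(body)
    return body
```

Because each `entry_hash` commits to the previous one, any modification, deletion, insertion, or reordering changes some hash and is detectable.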

The core idea: a deterministic boundary
SudoAgent is not an agent framework. It doesn’t orchestrate multi‑step plans, rank tools, or “make agents smarter.” It is a boundary around the last step that matters: the call that causes side effects.

At runtime, for each tool/function call, SudoAgent does:

1. Redact first
Build a Context from redacted args/kwargs (to avoid secret leakage and to keep evidence safe to share).

2. Deterministic policy decision
A Policy returns one of:

ALLOW

DENY

REQUIRE_APPROVAL

3. Optional approval
If approval is required, SudoAgent pauses synchronously until approved/denied. Approvals can be interactive (dev) or injected (Slack/UI/HTTP). Approval is bound to the exact decision using hashes.

4. (Optional) budgets
Rate/spend limits that fail closed. If budgets can’t be evaluated safely, deny.

5. Decision evidence (fail-closed)
Write a decision entry to the tamper-evident ledger before execution. If this fails, the action does not run.

6. Execute

7. Outcome evidence (best-effort)
Write an outcome entry after execution; failures do not change the function result/exception.
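The steps above can be sketched as a decorator. All names here (`guard`, `Denied`, the policy callable) are assumptions for illustration, budgets (step 4) are omitted for brevity, and this is the shape of the flow rather than SudoAgent's actual API.

```python
import functools

class Denied(Exception):
    pass

def guard(policy, ledger, approver=None, redact=lambda kw: kw):
    """Illustrative guard decorator (hypothetical names, not SudoAgent's API)."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(**kwargs):
            ctx = redact(dict(kwargs))            # 1. redact first
            effect = policy(ctx)                  # 2. deterministic decision
            if effect == "REQUIRE_APPROVAL":      # 3. optional approval
                effect = "ALLOW" if approver and approver(ctx) else "DENY"
            if effect != "ALLOW":
                ledger.append({"event": "decision", "effect": "deny", "context": ctx})
                raise Denied(fn.__name__)
            # 5. decision evidence, fail-closed: written before execution;
            #    if this append raises, the action never runs.
            ledger.append({"event": "decision", "effect": "allow", "context": ctx})
            result = fn(**kwargs)                 # 6. execute
            try:                                  # 7. outcome evidence, best-effort
                ledger.append({"event": "outcome", "context": ctx})
            except Exception:
                pass
            return result
        return inner
    return wrap
```

Note the asymmetry: the decision write blocks execution, while the outcome write never alters the function's result or exception.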

This is a governance pipeline, not a prompt pipeline

Policy-as-code is necessary but not sufficient
Policy engines like Open Policy Agent (OPA) exist because policy is too important to bury inside application business logic. OPA’s Rego is purpose‑built for policy evaluation over structured data. (openpolicyagent.org)
Similarly, AWS open‑sourced Cedar with a high‑assurance focus, including formal modeling and proofs about correctness properties. (aws.amazon.com)

SudoAgent is compatible with this worldview, but aims at a different layer:

OPA/Cedar answer: “Should this be allowed?”

SudoAgent answers: “Can we prove the allow/deny process happened before execution, and can we verify the evidence later?”

The “later” matters. Audit and incident response are downstream consumers of your system, and they need more than “trust me.”

The evidence model: tamper-evident ledger

SudoAgent’s v2 ledger is an append‑only chain of canonical JSON objects:

Each entry contains entry_hash.

Each entry also contains prev_entry_hash.

If someone modifies, deletes, inserts, or reorders entries, verification fails.

This is the same family of idea behind transparency logs (e.g., Certificate Transparency): append‑only data structures designed to be publicly auditable. (datatracker.ietf.org)

SudoAgent uses straightforward hash chaining (not a Merkle tree) because v2 optimizes for:

simplicity

inspectability

correctness

single-host deployments

The ledger can be verified at any time using sudoagent verify, which checks:

canonical JSON shape

schema/ledger versions

hash chain continuity

decision/outcome reference integrity

optional signature validity

Worked example: high-value refund
Imagine this rule: “Refunds above $500 require approval.”

In SudoAgent terms:

the Policy sees a redacted Context

if amount > 500, REQUIRE_APPROVAL

approval is bound to this exact decision

decision record is written before the refund executes

A simplified ledger decision entry looks like:

```json
{
  "schema_version": "2.0",
  "ledger_version": "2.0",
  "created_at": "2026-01-26T10:00:00.000000Z",
  "event": "decision",
  "request_id": "req-123",
  "action": "payments.refund_user",
  "agent_id": "payments:refund-bot:prod-01",
  "decision": {
    "effect": "allow",
    "reason": "approval granted",
    "reason_code": "POLICY_REQUIRE_APPROVAL_HIGH_VALUE",
    "policy_id": "RefundPolicy:v2",
    "policy_hash": "…",
    "decision_hash": "…"
  },
  "approval": {
    "binding": {
      "request_id": "req-123",
      "policy_hash": "…",
      "decision_hash": "…"
    },
    "approved": true,
    "approver_id": "alice@example.com"
  },
  "prev_entry_hash": "…",
  "entry_hash": "…"
}
```

Then the outcome entry references the same decision_hash and request_id. If someone tries to “mix and match” an outcome to a different decision, verification fails.

This is the key security property: the approval is not “approve a refund.” It’s “approve this refund with these redacted parameters under this policy hash.”
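That binding property can be sketched as a hash over the redacted parameters plus the policy hash; `decision_hash` and `approval_matches` here are hypothetical helpers, not SudoAgent's real binding scheme.

```python
import hashlib
import json

def decision_hash(ctx: dict, policy_hash: str) -> str:
    # Bind an approval to the exact redacted parameters and policy version.
    blob = json.dumps({"ctx": ctx, "policy_hash": policy_hash},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def approval_matches(approval: dict, ctx: dict, policy_hash: str) -> bool:
    # An approval granted for a different amount, action, or policy
    # version produces a different hash and is rejected.
    return approval["decision_hash"] == decision_hash(ctx, policy_hash)
```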

Performance: what latency does it add?
SudoAgent adds measurable latency because it writes durable evidence (decision + outcome) to disk.

On a Windows dev machine with local SSDs (measured with a small benchmark):

JSONL ledger: ~45 ms p50, ~73 ms p95

SQLite WAL ledger: ~39 ms p50, ~60 ms p95

Approval path (auto‑approve): ~47 ms p50, ~86 ms p95

This is expected: each guarded call performs two durable writes plus hash chaining.

If you’re guarding high-stakes actions like refunds or infrastructure changes, this tradeoff is usually acceptable. If you need ultra-low latency, don’t guard every tiny tool call. Guard the boundary where side effects happen, and use policy to auto-allow low-risk actions.

Practical knobs:

Prefer the SQLite WAL ledger for multi‑process deployments; it often has lower latency than JSONL.

Keep ledgers on fast local disks (avoid network filesystems).

Disable signing unless you need authenticity proofs.

Guard fewer, higher‑risk actions rather than everything.

Adoption guide: getting value quickly
1) Decide what’s “high-stakes”
Guard actions that can cause irreversible harm:

money movement

production writes

destructive commands

customer communications

Don’t start by guarding “everything.” Start by guarding the boundary that matters.

2) Pick an agent_id convention
Make agent_id stable and human-parsable. A good default:

team:service:instance

examples: payments:refund-bot:prod-01, support:triage:staging

This pays dividends in audits, dashboards, and incident response.

3) Version your policies on purpose
Include versions in policy_id:

good: RefundPolicy:v2

ambiguous: RefundPolicy

Historical evidence remains valid because the ledger stores policy_id and policy_hash at decision time.

4) Choose a ledger backend
JSONL ledger: simple, inspectable, single‑writer

SQLite WAL ledger: better for multi‑process on one host; easier querying

5) Treat approval as an integration boundary
v2 approval is synchronous by design. In production, most teams implement one of:

Slack approver (webhook + timeout)

HTTP approver service (timeout + audit)

UI approver (internal tool)

Timeout is an approver concern:

if approval times out, deny and log

agent retries later with a new request_id
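One way to implement such an approver, as a sketch: a callable that posts the request somewhere (e.g. Slack) and blocks on a queue with a timeout, denying on expiry. All names here (`slack_style_approver`, `ask`) are hypothetical.

```python
import queue
import threading
import uuid

def slack_style_approver(ask, timeout_s: float = 30.0):
    """Hypothetical approver factory: 'ask' posts the request out of band
    and is handed a callback to deliver the human's answer."""
    def approve(ctx: dict) -> bool:
        answers = queue.Queue(maxsize=1)
        request_id = str(uuid.uuid4())      # retries must use a fresh id
        ask(request_id, ctx, answers.put)   # fire-and-forget
        try:
            return answers.get(timeout=timeout_s)
        except queue.Empty:
            return False                    # timeout => deny (fail closed)
    return approve
```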

6) Operationalize verification
Verification is not just a dev tool. Treat it like a health check:

run sudoagent verify on a schedule

alert on failures

export/filter/search ledger entries during incidents

7) Understand what SudoAgent does not protect
SudoAgent is not a sandbox and doesn’t prevent side effects inside the guarded function. It protects the integrity of governance and evidence; it does not protect against host compromise.

Why this is the right primitive
The core move is simple, and it’s the right abstraction:

Stop trying to “trust the agent.” Trust a deterministic boundary with verifiable evidence.

The result is a system where you can answer hard questions with proof:

“Why did this refund run?”

“Under what policy version?”

“Who approved it?”

“Can we detect tampering in the audit trail?”

“Can we export a receipt for compliance?”

That’s the difference between “we logged it” and “we can prove it.”

References (selected)
EU AI Act (Regulation (EU) 2024/1689): record‑keeping and log retention summaries (Articles 12 and 19). (ai-act-service-desk.ec.europa.eu)

NIST AI RMF 1.0 and playbook (trustworthy AI risk management). (nist.gov)

Schneier & Kelsey (1999): secure audit logs as tamper‑evident records on untrusted machines. (schneier.com)

Certificate Transparency (RFC 6962) and CT explainer: append‑only transparency logs and auditability. (datatracker.ietf.org)

Open Policy Agent (Rego): policy language designed for structured authorization decisions. (openpolicyagent.org)

AWS Cedar: open‑source policy language with formal modeling emphasis. (aws.amazon.com)
