Most AI governance conversations stop at "we log everything." That is observability, not governance. Observability tells you what happened after the fact. Governance stops the bad thing before it executes.
Today we shipped two features that make that distinction concrete: a YAML policy engine and a SQLite audit brain. Here is what they do and why they matter.
## The Problem
We run 13 AI agents in production. Each agent has different permissions, different risk levels, and different access needs. A bookkeeper agent should never call external APIs. A legal counsel agent should have stricter escalation rules than a content writer. A translator does not need write access to financial files.
Hardcoding these rules inside the governance server works at 3 agents. It breaks at 13. It is impossible at 100.
## YAML Policy Engine
Every agent now has a policy file:
```
/governance/policies/
  global.yaml          # defaults for everyone
  roles/
    bookkeeper.yaml    # role-level rules
    legal.yaml
    coach.yaml
    ...                # 10 roles total
  agents/
    harry.yaml         # agent-specific overrides
    aram.yaml
    tamara.yaml
    ...                # 13 agents total
```
Resolution is deterministic: global defaults, then role policy, then agent override. Agent-level wins.
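The layered resolution can be sketched as a recursive dictionary merge. This is a minimal sketch of the semantics described above, not the actual NS Governor code; the function names and the exact merge rules (nested dicts merge, everything else is overwritten) are assumptions.

```python
def merge_policies(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override wins on conflict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policies(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # scalars and lists: override replaces outright
    return merged

def resolve_policy(global_p: dict, role_p: dict, agent_p: dict) -> dict:
    """Global defaults, then role policy, then agent override. Agent wins."""
    return merge_policies(merge_policies(global_p, role_p), agent_p)
```

So a role can tighten `escalation.level` to `strict` while the agent file only overrides `kill_threshold`, and the resolved policy carries both.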
A policy file looks like this:
```yaml
version: "1.0"
agent_id: harry-bookkeeper
role: bookkeeper

tools:
  allowed:
    - Bash
    - Read
    - Write
    - calculator
  denied:
    - WebFetch
    - ExternalAPI
    - email-send

escalation:
  level: strict
  kill_threshold: 2
  high_risk_kill: 1

runtime:
  max_runtime_minutes: 30
  max_cost_usd: 1.00
```
When Harry tries to call WebFetch, the governor checks his policy and blocks it. No code change required. Edit the YAML, reload, done.
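The pre-tool check implied here is small. A hedged sketch, assuming a resolved policy dict in the shape of the YAML above (the function name and the deny-wins rule are illustrative, not the actual governor API):

```python
def is_tool_allowed(policy: dict, tool: str) -> bool:
    """Return True if `tool` may run under `policy`."""
    tools = policy.get("tools", {})
    if tool in tools.get("denied", []):
        return False   # an explicit deny always wins
    allowed = tools.get("allowed")
    if allowed is None:
        return True    # no allow-list configured: allow by default
    return tool in allowed
```

With Harry's policy loaded, `is_tool_allowed(policy, "WebFetch")` returns `False` and the call never executes.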
This is what governance looks like when you need to scale: configuration, not code.
## SQLite Audit Brain
Every decision the governor makes now writes to a persistent SQLite database. Not a log file. A queryable, indexable, exportable database.
Schema:
- `audit_events`: every allow/deny decision, with agent ID, session, tool, policy rule, risk category, and escalation level
- `agents`: registered agents with last-seen timestamps
- `heartbeats`: agent health state
- `killswitch_events`: every forced termination, with reason
- `policy_versions`: which policy was active when
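The core table fits in a few lines of SQLite DDL. A sketch based on the columns named above; the id/timestamp columns and the index are assumptions about the real module:

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS audit_events (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    ts               TEXT DEFAULT (datetime('now')),
    agent_id         TEXT NOT NULL,
    session_id       TEXT,
    tool             TEXT,
    policy_rule      TEXT,
    risk_category    TEXT,
    escalation_level TEXT,
    allowed          INTEGER NOT NULL  -- 1 = allowed, 0 = denied
);
CREATE INDEX IF NOT EXISTS idx_audit_agent ON audit_events (agent_id, allowed);
"""

def init_db(path: str = "audit.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(DDL)
    return conn
```

Because it is a real database rather than a log file, the same rows back both the query API and compliance exports.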
You can query it:
```
GET /ns/audit/query?agent_id=harry&allowed=0
GET /ns/audit/stats
GET /ns/audit/export?format=json
```
The stats endpoint right now returns real data:
```json
{
  "total_events": 77,
  "allowed": 69,
  "denied": 8,
  "kill_events": 1,
  "by_category": [
    {"category": "filesystem.delete", "count": 6},
    {"category": "policy.tool_denied", "count": 1}
  ]
}
```
That is not a demo. That is production data from a live system.
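Numbers like these fall out of plain SQL aggregates over the `audit_events` table. A hedged sketch (column names assumed to match the schema described above; the real stats endpoint may compute more):

```python
import sqlite3

def audit_stats(conn: sqlite3.Connection) -> dict:
    # SQLite boolean expressions evaluate to 0/1, so SUM counts matches.
    total, allowed, denied = conn.execute(
        "SELECT COUNT(*), SUM(allowed = 1), SUM(allowed = 0) FROM audit_events"
    ).fetchone()
    by_category = [
        {"category": cat, "count": n}
        for cat, n in conn.execute(
            "SELECT risk_category, COUNT(*) FROM audit_events "
            "WHERE allowed = 0 GROUP BY risk_category ORDER BY COUNT(*) DESC"
        )
    ]
    return {"total_events": total, "allowed": allowed or 0,
            "denied": denied or 0, "by_category": by_category}
```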
## Why This Combination Matters
YAML policies without audit trails are unenforceable. You changed a policy, but can you prove what was active when an incident happened? No.
Audit trails without externalized policies are inflexible. Every change requires a code deploy. At scale, that is a bottleneck that becomes a liability.
Together, they give you:
- Configurable governance (YAML)
- Persistent proof (SQLite)
- Queryable history (API endpoints)
- No code changes for policy updates
If you are deploying AI agents in a regulated industry - banking, healthcare, government, legal - this is the minimum viable governance layer. Anything less is a compliance gap waiting to be discovered.
## The Stack
- NS Governor v2.1 (Python, 749 lines)
- SQLite audit module (347 lines)
- YAML policy engine (212 lines)
- 24 policy files
- 3 Claude Code hooks (pre-tool, post-tool, stop)
- Running on a $48/month VPS
Open source: npmjs.com/package/mcp-nervous-system
Arthur Palyan builds AI governance infrastructure at Levels Of Self. The Nervous System MCP is production-deployed and listed in the Anthropic MCP directory.