I work in business process automation, and as I explored AI agents (reading
the research, following the tooling, watching how teams were starting to
deploy them), two topics kept surfacing in almost every serious
conversation: governance and liability.
Who is accountable when an agent makes a consequential decision? How do
you prove what context it had? What happens when it acts on data it
shouldn't have accessed? How do you satisfy a regulator who wants to see
the record? I'm not a veteran agent engineer — I came to these questions
from the automation and process side, developing my understanding of how
AI agents actually work in production and where the real friction is. But
the governance questions didn't require deep hands-on experience to
recognise. They were showing up everywhere: in the research, in the
compliance conversations, in the gap between what the tooling offered and
what real accountability would actually require.
I found those questions genuinely interesting. I also found that the
existing tooling didn't have good answers for them. Observability
platforms record outputs. Memory stores optimise for recall. Neither was
designed around the question that kept coming up: what can you actually
prove, to whom, and how?
Aevum is my attempt at a best current answer. Not a final one — the
field is moving fast and the right architecture will keep evolving. But
a principled one, built around the properties that governance and
liability actually require: consent as a precondition, tamper-evident
audit as a structural property, and deterministic replay as the mechanism
that turns a log into evidence.
## What Aevum is
Aevum is an open-source context kernel for AI agents. It sits between
your agent and the data it accesses. Every read and write is
policy-governed. Every decision is recorded in a tamper-evident sigchain.
Any past session can be replayed deterministically.
It is not a memory store. It is not an observability platform. It is the
governance and auditability layer underneath both.
```python
from aevum.core import Engine
from aevum.core.consent.models import ConsentGrant

engine = Engine()
engine.add_consent_grant(ConsentGrant(
    grant_id="g1",
    subject_id="user-42",
    grantee_id="billing-agent",
    operations=["ingest", "query"],
    purpose="billing-inquiry",
    classification_max=1,
    granted_at="2026-01-01T00:00:00Z",
    expires_at="2030-01-01T00:00:00Z",
))

result = engine.ingest(
    data={"invoice_id": "INV-001", "amount": 1500.00},
    provenance={
        "source_id": "billing-system",
        "chain_of_custody": ["billing-system"],
        "classification": 1,
    },
    purpose="billing-inquiry",
    subject_id="user-42",
    actor="billing-agent",
)

print(result.audit_id)           # urn:aevum:audit:0196...
print(result.status)             # ok
print(engine.verify_sigchain())  # True
```
No consent grant means no operation. Not a warning — an error, every
time, at the kernel level. The five absolute barriers (crisis detection,
classification ceiling, consent, audit immutability, provenance) are
hardcoded. They cannot be disabled by configuration, policy, or
administrator override.
## The replay distinction
LangSmith's "replay" re-runs a trace against a new model version. That
is re-execution. LangGraph Time Travel restores a checkpoint. That is
state recovery. Neither produces a replayable audit artifact.
Aevum's replay reads from the immutable provenance graph — not the live
knowledge graph. Two calls to `engine.replay` with the same `audit_id`
return identical data regardless of how much time has passed or how the
live graph has changed. That guarantee is what makes it useful as
compliance evidence, not just a debugging tool.
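Expressed as code, the guarantee looks something like this (the keyword argument is my inference from the `audit_id` returned by the ingest call earlier, not a documented signature):

```python
# Replay reads from the immutable provenance graph, so two calls with the
# same audit_id must return identical data, no matter what has happened
# to the live knowledge graph since the original session.
first = engine.replay(audit_id=result.audit_id)
# ... the live graph mutates arbitrarily in the meantime ...
second = engine.replay(audit_id=result.audit_id)
assert first == second  # identical, regardless of elapsed time
```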
## Why governance questions are becoming engineering questions
EU AI Act Article 12 enforcement begins August 2, 2026. High-risk AI
systems must support automatic, tamper-evident recording of events. The
regulation does not specify a format — but tamper-evident hash-chaining
is the implementation that simultaneously satisfies Article 12, Article
15 (accuracy and robustness), ISO/IEC 42001, and SOC 2 PI1.2.
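If hash-chaining is unfamiliar, this minimal, self-contained sketch shows the underlying mechanism. It is illustrative only, not Aevum's actual sigchain format (which additionally signs entries with Ed25519):

```python
import hashlib
import json

GENESIS = "0" * 64

def append_entry(chain, event):
    """Append an event whose hash commits to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha3_256((prev_hash + payload).encode()).hexdigest()
    chain.append({"event": event, "prev_hash": prev_hash, "hash": entry_hash})

def verify(chain):
    """Recompute every link; any edit to a past entry breaks verification."""
    prev_hash = GENESIS
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha3_256((prev_hash + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

chain = []
append_entry(chain, {"op": "ingest", "actor": "billing-agent"})
append_entry(chain, {"op": "query", "actor": "billing-agent"})
print(verify(chain))                      # True: the chain is intact
chain[0]["event"]["actor"] = "attacker"   # tamper with a past entry
print(verify(chain))                      # False: every later link breaks
```

Because each entry commits to the hash of the one before it, rewriting any past entry invalidates every hash that follows — that is what makes the log tamper-evident rather than merely append-only.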
OWASP's Top 10 for Agentic AI Applications (December 2025) classifies
memory and context poisoning (ASI06) as a top risk. Aevum's consent
enforcement addresses this structurally — a poisoned entry cannot be
written without a valid consent grant for the actor, subject, and
purpose. The barrier fires at the kernel level before the model sees
the data.
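As a sketch of the poisoning scenario (assuming the same failure behaviour as in the denial example earlier), a write attempt from an actor that holds no grant never reaches the graph:

```python
# A compromised component tries to plant a poisoned entry under an actor
# id that no ConsentGrant names. The barrier fires before the write, and
# before any model ever sees the data.
try:
    engine.ingest(
        data={"invoice_id": "INV-001", "amount": 999999.99},  # poisoned
        provenance={
            "source_id": "unknown-plugin",
            "chain_of_custody": ["unknown-plugin"],
            "classification": 1,
        },
        purpose="billing-inquiry",
        subject_id="user-42",
        actor="compromised-plugin",  # no grant names this actor
    )
except Exception as err:
    print(f"blocked: {err}")
```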
These aren't future concerns. The governance questions that kept coming
up in the research are now arriving as engineering requirements with
deadlines attached.
## What's in v0.3.0
- Five governed functions: `ingest`, `query`, `review`, `commit`, `replay`
- Five absolute barriers, hardcoded in `barriers.py`
- Ed25519 sigchain + SHA3-256 hash chaining
- Cedar in-process policy + OPA HTTP sidecar support
- MCP integration via `aevum-mcp`
- Agent autonomy levels L1–L5 (DeepMind taxonomy), enforceable by policy
- A2A task format compatibility
- 280 tests, mypy strict, ruff clean
- Apache-2.0, no telemetry, runs fully offline
```bash
pip install aevum-core
```
Documentation: https://aevum.build/?utm_source=devto&utm_medium=post
GitHub: https://github.com/aevum-labs/aevum
## What Aevum is not
It is not a memory store — pair it with Mem0, Zep, or your own store.
It is not an observability platform — it exports to OpenTelemetry.
It is not a compliance report generator — it produces the evidence,
your compliance team interprets it.
This is a best current answer, not a final one. The concepts behind the
replay/observability distinction are at
https://aevum.build/concepts/replay-vs-observability/ and the Article 12
implementation guide is at https://aevum.build/concepts/audit-trails/.
Feedback welcome — especially from anyone working through the same
governance questions from a different angle.