DEV Community

t49qnsx7qt-kpanks

what eu ai act article 12 actually requires from your agent logs (and why most stacks fail it)

the eu ai act enforcement date is august 2. that's 66 days from now. most teams building on agent infrastructure have read the compliance guides and moved on. the problem is reading a guide isn't the same as passing an audit.

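article 12 is the logging requirement everyone summarizes as "keep audit logs." but the actual text is more specific. article 12(1) says high-risk systems must "technically allow for the automatic recording of events (logs) over the lifetime of the system." article 12(2) then names three categories of events those logs have to cover: situations where the system might present risk, data for post-market monitoring, and data for operational monitoring. the article doesn't use the word "tamper-resistant," but a log an auditor can't trust doesn't deliver the traceability the article is asking for.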

the word "automatic" is doing a lot of work in that sentence.


what "automatic" actually means

most observability stacks today are opt-in. you add a logger call, you instrument a span, you pipe to splunk. nothing wrong with that for debugging. but opt-in logging doesn't satisfy article 12(2) for a high-risk ai system, because the regulation requires the system to log the events where it might present risk, not just the events the developer remembered to instrument.

the difference:

```python
# what most teams ship
logger.info("agent completed task", task_id=task.id, result=result)

# what article 12(2) requires
# automatic recording at every decision point that could present risk
# timestamped, tamper-resistant, bound to the agent identity
# survives agent restarts, retries, and multi-hop chains
```

the second pattern requires logging infrastructure that's baked into the runtime, not bolted on after. you can't retrofit it before august 2.
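one way to picture "baked into the runtime" is a wrapper that records every tool call whether or not the tool's author added a logger line. the sketch below is a minimal illustration under assumptions: `audited`, `AUDIT_LOG`, and `approve_invoice` are hypothetical names, and the in-memory list stands in for durable, append-only storage.

```python
import functools
import time
from typing import Any, Callable

# hypothetical in-memory sink; a real runtime would write to durable,
# append-only storage instead
AUDIT_LOG: list = []


def audited(agent_id: str) -> Callable:
    """Wrap a tool so every call is recorded, including calls that raise."""
    def decorator(tool: Callable) -> Callable:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            event = {
                "ts": time.time(),
                "agent_id": agent_id,
                "tool": tool.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
            }
            try:
                result = tool(*args, **kwargs)
                event["outcome"] = "ok"
                event["result"] = repr(result)
                return result
            except Exception as exc:
                event["outcome"] = "error"
                event["error"] = repr(exc)
                raise
            finally:
                # runs on both the success and the failure path
                AUDIT_LOG.append(event)
        return wrapper
    return decorator


@audited(agent_id="buying-agent-1")
def approve_invoice(vendor: str, amount: float) -> bool:
    # toy decision rule, for illustration only
    return amount <= 5000
```

the tool body contains no logging code at all. the record exists because the call went through the wrapper, which is the property opt-in instrumentation can't give you.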


the three categories teams miss

help net security's breakdown of article 12(2)'s three categories is accurate, but the implementation implications aren't obvious until you're building toward them.

category 1 — situations presenting risk: this isn't "log errors." it means capturing decision rationale at points where the agent had discretion. for a buying agent: why did it approve a $4,000 vendor invoice? for a data-retrieval agent: why did it query this record and not that one? the log has to answer the auditor's "why" — not just the developer's "what."

category 2 — post-market monitoring data: your agent is in production. a regulatory event occurs six months from now. the log from that day needs to be intact, unmodified, and attributable to the specific model version and configuration running at that time. that's not syslog. that's provenance.

category 3 — operational monitoring data: this one most teams already have covered because it maps to normal observability (latency, error rates, throughput). the trap is treating it as the only category and calling the logging "done."
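a quick way to see the "only category 3" trap is to tag each event type your agent emits with one of the three categories and check which categories end up empty. this is a rough sketch, not a compliance tool: the `Category` enum, the `EMITTED_EVENTS` inventory, and the event names are all made up for illustration.

```python
from enum import Enum


class Category(Enum):
    RISK = "situations presenting risk"
    POST_MARKET = "post-market monitoring"
    OPERATIONAL = "operational monitoring"


# hypothetical inventory: each event type the agent emits, tagged by hand
EMITTED_EVENTS = {
    "llm_latency_ms": Category.OPERATIONAL,
    "tool_error_rate": Category.OPERATIONAL,
    "requests_per_minute": Category.OPERATIONAL,
}


def missing_categories(inventory: dict) -> list:
    """Return the categories that no emitted event covers, in enum order."""
    covered = set(inventory.values())
    return [category for category in Category if category not in covered]


gaps = missing_categories(EMITTED_EVENTS)
for category in gaps:
    print(f"no events cover: {category.value}")
```

a stack that only emits latency, error rate, and throughput shows up here with two empty categories, even though its dashboards look complete.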


what tamper-resistant means in practice

tamper-resistant is the hardest requirement to satisfy on a deadline. it's not the same as "write to s3" or "pipe to a siem." tamper-resistant means an auditor can verify, without trusting you, that a log entry hasn't been modified since it was written.

the standard approach is a hash chain: each log entry includes the hash of the previous entry, so modifying any entry invalidates all subsequent hashes. this is the same mechanism that makes blockchains hard to alter — but you don't need a blockchain to implement it.
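a hash chain fits in a few lines of standard-library python. the sketch below is a minimal illustration, not any particular product's implementation: `GENESIS`, `entry_hash`, `append`, and `first_invalid` are hypothetical names, and the chain lives in a plain list rather than append-only storage.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Add an entry whose hash commits to everything written before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    })


def first_invalid(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, else None."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash:
            return index
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return index
        prev_hash = entry["hash"]
    return None


log = []
append(log, {"event": "invoice_received", "amount": 4000})
append(log, {"event": "invoice_approved", "amount": 4000})
append(log, {"event": "payment_sent", "amount": 4000})
```

one caveat: a chain on its own only proves internal consistency. someone with write access to the whole log can edit an entry and recompute every hash after it. for an auditor to verify "without trusting you," the latest hash has to be recorded periodically somewhere you can't rewrite, such as a customer's system or a third-party timestamping service.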

gridstamp ships this as a fleet-level stamping primitive. every agent decision gets a sha-256 stamp, chained to the prior event, written to an append-only log with 3ms p99 latency under fleet simulation at 14.55M operations. 221 tests, production-ready.

the point isn't to pitch the product — the point is that this is a solved problem. the implementation exists. the question is whether your stack has it before august 2.


the audit trail your compliance team will actually ask for

when an auditor arrives (or when your enterprise customer's legal team runs a pre-procurement review), the question isn't "do you have logs." the question is: "show me every decision this agent made on date X, in context, with the model version, the input, the output, and evidence that the log hasn't been modified."

that requires four things to be true simultaneously:

  1. agent identity is bound to the log — not just a session id, but the specific model, version, and configuration
  2. decision rationale is captured at the event level — not reconstructable from adjacent logs, but explicit
  3. the chain is unbroken — no gaps, no restarts that reset the counter
  4. the log is immutable after write — append-only, hash-chained, not deletable by the application layer

most teams building agents today have 1 and maybe 3. items 2 and 4 are where audit reviews fail.
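the four properties map fairly directly onto a record shape. here is one possible sketch, with every field name an assumption rather than anything the regulation prescribes: a frozen dataclass carrying identity, rationale, sequence number, and chain link, plus a check for gaps in the sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields can't be reassigned after creation
class DecisionRecord:
    seq: int             # 3. monotonically increasing, persisted across restarts
    timestamp: str       # ISO 8601, UTC
    model: str           # 1. identity: model name
    model_version: str   # 1. identity: exact version
    config_hash: str     # 1. identity: hash of prompts, tools, parameters
    input_text: str
    output_text: str
    rationale: str       # 2. explicit reason, stored with the event itself
    prev_hash: str       # 4. link to the previous record in the hash chain


def sequence_gaps(records: list) -> list:
    """Return sequence numbers missing between the lowest and highest seen."""
    seen = {record.seq for record in records}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

`frozen=True` only guards against accidental mutation inside the process. immutability after write (item 4) still has to come from the storage layer and the hash chain, not from the python object.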


the practical path before august 2

if you're deploying agents in the eu and need to hit article 12(2) compliance, the sequence that's worked:

  1. audit what you're currently logging — list every log.info / span / event your agent emits. map it to article 12(2)'s three categories. find the gaps.
  2. add decision-point logging at the runtime level — not in application code. in the layer that wraps every llm call or tool invocation. this is the only way to catch risk-presenting decisions you didn't anticipate.
  3. implement hash chaining before you go to production — retrofitting tamper-resistance to an existing log stream is painful. build it in at the start.
  4. test with a simulated audit — have someone ask "show me every decision agent X made on day Y that touched financial data." if you can't answer in under 10 minutes from the logs alone, you're not compliant.
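the simulated audit in step 4 boils down to a query you should be able to run against the logs alone. a rough sketch of that query, assuming hypothetical records with `agent_id`, `ts`, `action`, and `tags` fields (your own schema will differ):

```python
from datetime import date, datetime, timezone

# hypothetical decision records; field names and values are illustrative only
RECORDS = [
    {"agent_id": "agent-x", "ts": "2025-03-14T09:12:00+00:00",
     "action": "approve_invoice", "tags": ["financial"]},
    {"agent_id": "agent-x", "ts": "2025-03-14T11:40:00+00:00",
     "action": "summarize_ticket", "tags": ["support"]},
    {"agent_id": "agent-y", "ts": "2025-03-14T12:05:00+00:00",
     "action": "issue_refund", "tags": ["financial"]},
    {"agent_id": "agent-x", "ts": "2025-03-15T08:00:00+00:00",
     "action": "approve_invoice", "tags": ["financial"]},
]


def decisions_for(records: list, agent_id: str, day: date, tag: str) -> list:
    """Return one agent's decisions on a given UTC day that carry a tag."""
    matches = []
    for record in records:
        when = datetime.fromisoformat(record["ts"]).astimezone(timezone.utc)
        if (record["agent_id"] == agent_id
                and when.date() == day
                and tag in record["tags"]):
            matches.append((when, record))
    matches.sort(key=lambda pair: pair[0])  # oldest first
    return [record for _, record in matches]


found = decisions_for(RECORDS, "agent-x", date(2025, 3, 14), "financial")
for record in found:
    print(record["ts"], record["action"])
```

if answering the question needs anything beyond a filter like this, such as joining against application databases or asking the engineer who wrote the agent, that's the gap the simulated audit is meant to surface.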

the bizsuite ai-audit delivers a 48-hour compliance gap assessment against article 12(2) requirements. $997 flat. you get a written report of what your stack covers, what it misses, and a remediation path with specific implementation steps. https://getbizsuite.com/ai-audit

if you're 66 days out and haven't done that gap assessment, the time to do it is now — not in july.
