The gap EU AI Act Article 12 leaves open

David Julian Rizo Lopez — Thu, 02 Jul 2026 22:13:30 +0000

It's late 2026. An AI system at a European lender rejected a consumer's loan application back in April. The applicant has complained, and a supervisor is now asking a simple question:

"Show me what the system decided and on what basis, and prove this record hasn't been altered since it was created."

You reach for your logs. And you find the uncomfortable truth: you have plenty of logs, and none of them answer the question.

The logs you have are the wrong logs

Most AI deployments produce two kinds of logs. Application logs record API calls and latency. Inference logs record the model's input and output tokens. Both are useful for debugging. Neither captures the thing a regulator, an auditor, or a court actually asks about: the decision — who decided, under what authority, on what basis, and what happened next.

EU AI Act Article 12 requires high-risk AI systems to technically allow for the automatic recording of events (logs) over the system's lifetime. Credit scoring, hiring, risk assessment and pricing in life and health insurance, and eligibility decisions for public benefits fall under Annex III, and the high-risk obligations are landing across 2026 and 2027. Under Article 26(6), deployers must retain the logs under their control for at least six months, or longer where other Union or national law requires it; in finance, the logs fold into existing record-keeping law.

Here is the gap. Article 12 is precise about the objective and almost silent on the mechanics. It does not specify a format. It does not say who is responsible for integrity. And it does not use the word tamper-evident. But consider what a log is worth as evidence if it can be silently edited after the fact. If the same system that made the decision also writes — and can rewrite — the record of it, then the record is just a story the system tells about itself. Its evidentiary value is close to zero.

A log says "trust me, this is what happened." Evidence says "you don't have to trust me — here's the proof." That difference is the whole problem, and it is under-served.

What a decision record looks like when it's built to be proven

There is a well-understood way to close this gap, borrowed from how the web audits its certificates (Certificate Transparency, RFC 6962) and from transparency logs in general. Keep the records append-only. Hash each complete record into a Merkle tree, with the record's position in the log bound inside the hashed leaf, so nothing can be reordered or rewritten silently. Then let anyone verify any single record against a small published root, without trusting the party that produced it. Change one byte of one record and the recomputed hashes no longer match the root. Tampering becomes detectable by anyone who holds that root.
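To make the mechanics concrete, here is a minimal sketch of the RFC 6962 Merkle construction in plain Python. It is not the ODS reference code: the function names (`merkle_root`, `inclusion_path`, `verify_inclusion`) are illustrative, and records are treated as opaque bytes rather than ODS records.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 domain separation: leaves are prefixed with 0x00.
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes are prefixed with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()

def split_point(n: int) -> int:
    # Largest power of two strictly less than n (requires n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k

def merkle_root(records: list) -> bytes:
    if not records:
        return hashlib.sha256(b"").digest()
    if len(records) == 1:
        return leaf_hash(records[0])
    k = split_point(len(records))
    return node_hash(merkle_root(records[:k]), merkle_root(records[k:]))

def inclusion_path(index: int, records: list) -> list:
    # Sibling hashes from the leaf up to the root (deepest first).
    if len(records) == 1:
        return []
    k = split_point(len(records))
    if index < k:
        return inclusion_path(index, records[:k]) + [merkle_root(records[k:])]
    return inclusion_path(index - k, records[k:]) + [merkle_root(records[:k])]

def _root_from_path(leaf: bytes, index: int, size: int, path: list) -> bytes:
    if size == 1:
        if path:
            raise ValueError("path too long")
        return leaf
    if not path:
        raise ValueError("path too short")
    k = split_point(size)
    if index < k:
        return node_hash(_root_from_path(leaf, index, k, path[:-1]), path[-1])
    return node_hash(path[-1], _root_from_path(leaf, index - k, size - k, path[:-1]))

def verify_inclusion(record: bytes, index: int, size: int, path: list, root: bytes) -> bool:
    # The verifier needs only the record, its position, the path, and the root.
    try:
        return _root_from_path(leaf_hash(record), index, size, path) == root
    except ValueError:
        return False

log = [b"record-0", b"record-1", b"record-2"]
root = merkle_root(log)
path = inclusion_path(1, log)
print(verify_inclusion(b"record-1", 1, len(log), path, root))
print(verify_inclusion(b"record-X", 1, len(log), path, root))
```

The verifier never sees the other records, only a handful of sibling hashes, which is why the published root can stay small while the log grows.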

The Open Decision Standard (ODS) applies exactly this to decisions. Below is the real output of a ~250-line, dependency-free reference demo. It records a credit-scoring rejection, anchors it, proves it — and then tampers with it.

[1] DECISION recorded
    record_id : loan-2026-04-19-0098
    authority : policy_hash 58f7a207e5617d78… (which rules governed)
    rationale : DTI 0.58 exceeds 0.45 ceiling; thin file; 2 recent delinquencies.
    action    : REJECT (score 0.31 < 0.5)
    seq #1  merkle_leaf 5932912a087cba70… (SHA-256 over the canonical stored record)

[2] OUTCOME recorded (linked to the decision, append-only)
    parent_id : loan-2026-04-19-0098
    seq #2  (append-only; its sequence_number fixes its position in the Merkle log)

[3] CHECKPOINT — Merkle root over all records (the anchor a regulator keeps)
    tree_size   : 2
    merkle_root : f930c641be518ffde1b3e56553ac2c05a76ec7a674f162044d2da21b7ed68aab

[4] INCLUSION PROOF for the decision record
    verifies against the checkpoint root: True  <- provable, not asserted

(The hash values differ on each run — records are timestamped — so what reproduces is the behaviour, not the exact digits: the proof verifies.)

Notice what the decision record holds that an inference log never does: the authority the decision ran under (a hash of the governing policy), the rationale, the action, and — appended right after it, linked by parent_id — the outcome. Each record's store-assigned sequence number lives inside the hashed leaf, so its position in the log is part of what gets proven. That is the accountability layer, captured as a record instead of reconstructed from fragments after the fact.
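A rough sketch of how such a record could be turned into a leaf, under stated assumptions: the field names follow the demo output where it shows them (`record_id`, `policy_hash`, `parent_id`, `sequence_number`), while `record_type`, `outcome`, the policy text, the outcome's record id, and the sorted-key JSON canonicalization are hypothetical stand-ins. The ODS schema may define these differently, for example with a stricter canonicalization such as RFC 8785.

```python
import hashlib
import json

def canonical_bytes(record: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def merkle_leaf(record: dict) -> str:
    # RFC 6962-style leaf: SHA-256 over 0x00 followed by the canonical record.
    return hashlib.sha256(b"\x00" + canonical_bytes(record)).hexdigest()

# Hypothetical policy text; the record stores only its hash.
policy_text = "DTI ceiling 0.45; minimum score 0.5"

decision = {
    "record_id": "loan-2026-04-19-0098",
    "record_type": "decision",
    "sequence_number": 1,  # position in the log, hashed with the record
    "policy_hash": hashlib.sha256(policy_text.encode("utf-8")).hexdigest(),
    "rationale": "DTI 0.58 exceeds 0.45 ceiling; thin file; 2 recent delinquencies.",
    "action": "REJECT",
}

outcome = {
    "record_id": "loan-2026-04-19-0098-outcome",  # hypothetical id
    "record_type": "outcome",
    "parent_id": decision["record_id"],  # links the outcome to its decision
    "sequence_number": 2,
    "outcome": "applicant notified",  # hypothetical value
}

print(merkle_leaf(decision)[:16], merkle_leaf(outcome)[:16])
```

Because the sequence number sits inside the hashed bytes, moving a record to a different position changes its leaf even if every other field is untouched.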

Now the part that matters. Someone edits the stored decision — quietly flips the rejection to an approval:

[5] Now someone quietly edits the stored decision: REJECT -> APPROVE
    leaf recomputed over the edited stored record
    integrity check       : FAIL — edit changed the leaf; proof no longer matches the root
    inclusion proof now   : FAILS against the regulator-held root

There is no separate integrity field to forge: the canonical stored record is the leaf pre-image. The edit is mathematically visible. Anyone holding the checkpoint root (the regulator, the auditor, the applicant's lawyer) can detect it from the record and its inclusion proof alone, without trusting the operator and without access to the rest of the records. The record became evidence.
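The check behind step [5] can be sketched for the two-record case in a few lines. This is an illustration, not the demo's code: the trimmed-down record fields, the sorted-key JSON canonicalization, and the helper name `verify_decision` are assumptions. With two leaves, the inclusion proof for the decision is just the outcome's leaf hash.

```python
import hashlib
import json

def leaf(record: dict) -> bytes:
    # Assumed canonical form: sorted-key compact JSON, prefixed with 0x00.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"\x00" + canonical).digest()

def node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

decision = {"record_id": "loan-2026-04-19-0098", "sequence_number": 1, "action": "REJECT"}
outcome = {"parent_id": "loan-2026-04-19-0098", "sequence_number": 2}

# The regulator keeps only this root; the proof is the sibling leaf hash.
checkpoint_root = node(leaf(decision), leaf(outcome))
proof = [leaf(outcome)]

def verify_decision(record: dict, proof: list, root: bytes) -> bool:
    # Recompute the leaf from whatever is stored now and walk up to the root.
    return node(leaf(record), proof[0]) == root

ok_before = verify_decision(decision, proof, checkpoint_root)

# Someone quietly edits the stored decision.
tampered = dict(decision, action="APPROVE")
ok_after = verify_decision(tampered, proof, checkpoint_root)

print(ok_before, ok_after)  # True False
```

The operator cannot repair the proof after the edit, because doing so would require a different root than the one the verifier already holds.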

Where this sits

The landscape is moving fast, and ODS is deliberately not trying to own all of it. The IETF and the Linux Foundation are building the agent-identity and delegation-provenance layers — which agent acted, under whose authorization. ODS is the layer above that: the verifiable record of what was decided, on what basis, with what outcome. It is designed to interoperate with that emerging stack, not to replace it. Article 12 told everyone to keep records; it did not tell them how to make those records worth anything when someone finally asks. That is the gap ODS is built to fill.

Run it, then tell me where it's wrong

ODS is an open standard (Apache 2.0). The reference demo is intentionally tiny and has zero dependencies — Python 3.9+, stdlib only — so you can read every line and confirm there is no magic: just canonical records, domain-separated Merkle leaves (RFC 6962), and inclusion proofs applied to the events that carry accountability:

git clone https://github.com/ODS-Foundation/ods-specification
cd ods-specification/examples/quickstart
python orpi_demo.py

The full schema, reference validator, and a conformance suite that certifies implementations against the standard — with honest, scope-qualified verdicts, so it never claims to prove more than it does — live in the same repository.

If you work in AI governance, compliance, or RegTech, or if you build or buy high-risk AI in the EU, I would value your honest feedback. Run the demo and tell me where the model breaks, what prior art I'm missing, or where the threat model is wrong. That critique is more useful to the standard right now than agreement.

DEV Community: David Julian Rizo Lopez
