An open-source scanner just tested AI agent code against the EU AI Act — 97% failed
When an open-source EU AI Act scanner ran against real agent code, 97% of projects came back non-compliant. That number showed up in a Show HN thread this week and the comments went quiet in a particular way — not disbelief, but recognition.
The August 2 deadline is 62 days out. Most teams building agents in production have been assuming compliance is a documentation problem. The scanner result says it's a code problem.
What the scanner actually checks
The tool parses agent code against the requirements in Articles 12–15 of the Act, which apply to the high-risk systems listed in Annex III. The specific gaps it flags most often:
No automatic event logging. Article 12 requires high-risk AI systems to technically allow for the automatic recording of events (logs) over their lifetime. Read strictly, that makes logging a first-class output of the system, not a developer-added wrapper or an optional observability flag. Most agent frameworks treat logging as a side concern. The scanner identifies this by checking whether logging is baked into the agent execution path or bolted on externally.
No human oversight hook. Article 14 requires that authorized persons can intervene, override, or stop the system. An agent that runs a payment, sends an email, or modifies a record without any programmatic intervention point fails this check. "We'd just restart the pod" is not an intervention mechanism that survives an audit.
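One way to give operators a programmatic intervention point is a stop switch that the agent loop checks before every action. The sketch below is an illustration, not the scanner's check or any framework's API: `OversightSwitch`, `AgentStopped`, and `run_steps` are hypothetical names, and in a real deployment `stop()` would be wired to something an authorized person can reach, such as an admin endpoint.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List


class AgentStopped(Exception):
    """Raised when an authorized operator has halted the agent."""


@dataclass
class OversightSwitch:
    # Thread-safe flag: an operator thread can set it while the agent runs.
    _stop: threading.Event = field(default_factory=threading.Event)
    reason: str = ""

    def stop(self, operator: str, reason: str) -> None:
        # Record who stopped the agent and why, then raise the flag.
        self.reason = f"stopped by {operator}: {reason}"
        self._stop.set()

    def check(self) -> None:
        # Called before every action; refuses to proceed once stopped.
        if self._stop.is_set():
            raise AgentStopped(self.reason)


def run_steps(steps: List[Callable[[], str]], switch: OversightSwitch) -> List[str]:
    """Run agent steps in order, checking the switch before each one."""
    results: List[str] = []
    for step in steps:
        switch.check()
        results.append(step())
    return results
```

Because the check runs before each step rather than once at startup, a stop issued mid-run blocks the next action instead of waiting for the run to finish.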
No immutable retention guarantee. Logs must be kept for at least six months (Articles 19 and 26), and an audit trail only holds up if it can't be quietly altered. Writing to a database table where rows can be deleted, or to a log aggregator with a default 30-day retention, doesn't satisfy this. The scanner checks whether the codebase has any configuration or code path that makes log deletion possible.
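Retention settings stop deletion; they don't tell you whether a record was edited before it landed. A common complement is a hash chain, where each entry includes the hash of the previous one, so altering or dropping an entry breaks every link after it. This is a minimal sketch assuming JSON-serializable events, not a description of what the scanner looks for; `HashChainLog` and `verify` are illustrative names.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, no spaces) so an event always hashes identically.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class HashChainLog:
    """Append-only in-memory log; each entry commits to the hash of the one before."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, event: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _digest(prev, event)
        self.entries.append({"event": event, "prev": prev, "hash": entry_hash})
        return entry_hash


def verify(entries: List[Dict[str, Any]]) -> bool:
    """Recompute the chain. An edited, reordered, or removed (non-final) entry fails."""
    prev = GENESIS
    for entry in entries:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A chain alone can't detect truncation of the newest entries; periodically writing the latest hash somewhere separate, for example alongside the locked log objects, closes that gap.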
Undocumented tool call parameters. When an agent calls an external API, the audit trail needs the call, its arguments, and the response; without them, the traceability that Article 12's logs are meant to provide isn't there. This matters because the regulator's question after an incident isn't "what did the agent do" but "what did the agent have access to when it decided what to do."
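Concretely, a per-call record might capture not only the tool, arguments, and response but also what the agent could reach at decision time. The field names below are assumptions for illustration; the Act doesn't prescribe a schema, and `ToolCallRecord` is a hypothetical class.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ToolCallRecord:
    # Illustrative schema only; the Act does not mandate these field names.
    agent_id: str
    tool: str
    arguments: Dict[str, Any]
    response: Any
    # What the agent could see and do at decision time, not just what it did.
    available_tools: List[str]
    context_refs: List[str]
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Sorted keys keep serialized records stable and easy to diff or hash.
        return json.dumps(asdict(self), sort_keys=True)
```

Recording `available_tools` and `context_refs` is what lets you answer the "what did it have access to" question later, rather than only replaying the call itself.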
Why 97% makes sense
If you've spent time in agent codebases, this number doesn't feel wrong. Most agent infrastructure was designed to solve a delivery problem — get the agent to produce the right output. Audit trail, governance hooks, and structured logging are compliance requirements, and most teams haven't been under compliance pressure until now.
The shift that August 2 represents isn't technical. The Act has been public since 2024; what changes on August 2 is enforcement. On that date, the European Commission's powers to investigate and fine GPAI model providers take effect, and the obligations for Annex III high-risk systems begin to apply, enforced by national market surveillance authorities.
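The maximum fine depends on the violation. Prohibited practices carry up to 7% of worldwide annual turnover or €35M, whichever is higher; breaches of the high-risk obligations carry up to 3% or €15M. For SMEs and startups, the lower of the two figures applies, so an agent startup with €5M in turnover is looking at up to €150K for a high-risk violation. For an enterprise running agent workflows at scale, 3% of global turnover is material.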
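Each tier's cap in Article 99 is a fixed euro amount or a share of worldwide annual turnover: the higher of the two applies to large undertakings, the lower to SMEs and start-ups. A small helper makes that arithmetic explicit; the function and constant names are hypothetical, and amounts are in euros because the Act's fixed amounts are.

```python
from typing import Tuple

# (share of worldwide annual turnover, fixed cap in EUR) per Article 99 tier.
PROHIBITED_PRACTICES: Tuple[float, float] = (0.07, 35_000_000)
HIGH_RISK_OBLIGATIONS: Tuple[float, float] = (0.03, 15_000_000)


def max_fine_eur(turnover_eur: float, pct: float, fixed_eur: float, is_sme: bool) -> float:
    """Upper bound of a fine for one tier.

    Large undertakings: whichever of the two caps is higher.
    SMEs and start-ups: whichever is lower.
    """
    by_turnover = turnover_eur * pct
    return min(by_turnover, fixed_eur) if is_sme else max(by_turnover, fixed_eur)
```

For example, `max_fine_eur(5_000_000, *HIGH_RISK_OBLIGATIONS, is_sme=True)` gives €150,000, while a company with €1B in turnover faces a €30M cap on the same tier.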
What compliant looks like in code
Three changes close most of what the scanner flags:
1. Log at the execution layer, not the application layer. Put your logging in the framework or runner, not in each agent's task logic. That way it's automatic by default and not dependent on every developer remembering to call log.record().
2. Gate every external action through a policy evaluation step. Before the agent calls an API, modifies a record, or initiates a payment, there should be a step that evaluates whether this action is authorized. That step's inputs, rule applied, and result go into the audit log. This is also where human-approval gates live for high-risk actions.
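3. Write to append-only storage with at least six months of retention. The simplest implementation is S3 with Object Lock in Compliance mode and a default retention period of six months or more; in Compliance mode, no user, including the root account, can delete a locked object version or shorten its retention until the period expires. The point is that deleting audit records should be impossible, or at minimum a deliberate, logged, authorized action, never just a DELETE call.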
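With boto3, default Compliance-mode retention for a bucket is set through the S3 client's `put_object_lock_configuration`. The sketch builds that configuration and applies it through an injected client so the logic can be checked without AWS credentials; the helper names and bucket name are hypothetical, and 184 days is used because the longest six-calendar-month span is 184 days.

```python
from typing import Any, Dict

# Longest six-calendar-month span is 184 days (e.g. Jul 1 to Dec 31),
# so anything shorter could undershoot a six-month floor.
MIN_RETENTION_DAYS = 184


def object_lock_config(days: int = MIN_RETENTION_DAYS) -> Dict[str, Any]:
    """Build the ObjectLockConfiguration payload with Compliance-mode default retention."""
    if days < MIN_RETENTION_DAYS:
        raise ValueError(f"retention of {days} days is below the six-month floor")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": days}},
    }


def lock_audit_bucket(s3_client: Any, bucket: str, days: int = MIN_RETENTION_DAYS) -> None:
    """Apply default retention to a bucket that has versioning and Object Lock enabled.

    s3_client is expected to behave like boto3.client("s3"); it is injected
    so this function can be exercised with a stand-in.
    """
    s3_client.put_object_lock_configuration(
        Bucket=bucket, ObjectLockConfiguration=object_lock_config(days)
    )
```

In production you would pass `boto3.client("s3")` as `s3_client`. Try it on a throwaway bucket first: once an object version is locked in Compliance mode, nobody can delete it or shorten its retention until the period runs out.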
None of this requires rebuilding your agent framework. It requires treating the execution context, the policy evaluation result, and the output as a first-class record, not an afterthought.
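The first two changes and the "first-class record" idea can live in one place: a runner that every tool call must pass through, which evaluates policy, routes high-risk actions to a human approver, and writes one record per call covering inputs, the rule applied, and the outcome. This is a sketch under assumptions: `AuditedRunner`, `evaluate`, the rule names, and the 1,000 payment threshold are hypothetical placeholders, not a particular framework's or policy engine's API, and `sink` stands in for whatever append-only writer you use.

```python
import json
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Decision:
    allowed: bool
    rule: str                  # which rule produced the decision, for the audit log
    needs_human: bool = False  # route to the human-approval gate


def evaluate(action: str, params: Dict[str, Any]) -> Decision:
    # Placeholder policy: thresholds and action names are illustrative only.
    if action == "payment":
        if params.get("amount", 0) > 1000:
            return Decision(False, "payment_over_limit", needs_human=True)
        return Decision(True, "payment_under_limit")
    if action in {"send_email", "update_record"}:
        return Decision(True, "low_risk_allowlist")
    return Decision(False, "default_deny")


class AuditedRunner:
    """Every tool call goes through run_tool, so the policy gate and logging
    apply whether or not the agent's task code remembers them."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        sink: Callable[[str], None],
        approver: Optional[Callable[[str, Dict[str, Any]], Optional[str]]] = None,
    ) -> None:
        self._tools = tools
        self._sink = sink          # e.g. an append-only writer
        self._approver = approver  # returns an operator id, or None to refuse

    def run_tool(self, agent_id: str, action: str, **params: Any) -> Any:
        decision = evaluate(action, params)
        approved_by: Optional[str] = None
        if decision.needs_human and self._approver is not None:
            approved_by = self._approver(action, params)
        allowed = decision.allowed or approved_by is not None
        record: Dict[str, Any] = {
            "agent_id": agent_id, "action": action, "params": params,
            "rule": decision.rule, "allowed": allowed, "approved_by": approved_by,
            "started_at": time.time(),
        }
        try:
            if not allowed:
                raise PermissionError(f"{action} blocked by rule {decision.rule}")
            result = self._tools[action](**params)
            record.update(status="ok", response=result)
            return result
        except Exception as exc:
            record.update(status="error" if allowed else "blocked", error=repr(exc))
            raise
        finally:
            # One record per call, written even when the call is refused or fails.
            record["finished_at"] = time.time()
            self._sink(json.dumps(record, sort_keys=True, default=str))
```

Writing the record in a `finally` block means refused and failed calls leave a trace too, which is usually where an incident review starts. Unknown actions fall through to `default_deny`, so adding a tool forces someone to write a rule for it.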
The 62-day math
Teams that started this work in Q1 are in reasonable shape. Teams that haven't started are looking at roughly two months to:
- Audit every agent they have in production against Annex III
- Identify which qualify as high-risk
- Implement structured logging at the execution layer
- Configure tamper-resistant retention
- Document the system architecture for the conformity assessment
That's tight for a single engineering team running other priorities. It's impossible if you're also discovering the requirements from scratch.
The BizSuite AI-Audit is a 2-hour working call that surfaces exactly the gaps the scanner finds, with a prioritized remediation plan delivered in 48 hours. It costs $997. If you're already in the 97% and need to move fast, it's the fastest way to know what you're actually dealing with before you start patching code.
The scanner found the problem. The question is what you do with 62 days left.
— jeremiah
https://getbizsuite.com/ai-audit.html