We Scanned 100,000 AI Agent Files for EU AI Act Compliance. 90% Failed.
Over the past few months, AIR Blackbox has scanned 100,000 Python files containing AI agent code. The results aren't great.
90% of scanned code fails at least one of the six technical checks mapped to the EU AI Act.
The deadline is August 2026. Most teams aren't close to ready.
What We Checked
AIR Blackbox maps the EU AI Act's technical requirements to six concrete code-level checks:
| Article | What It Requires | What We Scan For |
|---|---|---|
| Art. 9 | Risk management | Risk assessment patterns, error handling |
| Art. 10 | Data governance | Data validation, input/output logging |
| Art. 11 | Technical documentation | Docstrings, model cards, architecture docs |
| Art. 12 | Record-keeping | Audit logging, decision trails |
| Art. 14 | Human oversight | Override mechanisms, escalation paths |
| Art. 15 | Robustness | Error recovery, fallback handling, input validation |
These aren't legal opinions. They're technical checks — like a linter for AI governance.
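To make the "linter" analogy concrete, here is a toy version of what a code-level check might look like. This is an illustrative sketch only — `has_audit_logging` is a made-up function, and the real AIR Blackbox rules are certainly more involved than one AST walk:

```python
import ast

def has_audit_logging(source: str) -> bool:
    """Toy Article 12 check: does the code call a logger at all?

    Illustrative only -- a stand-in for the scanner's real rules.
    A file that only uses print() would fail this check.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Look for attribute calls like logging.info(...) / logger.error(...)
        if isinstance(node, ast.Attribute) and node.attr in {"info", "warning", "error", "audit"}:
            return True
    return False
```

A check like this is cheap, deterministic, and runs offline — which is why pattern-based scanning scales to 100K files.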
What the Data Shows
Across 100K files scanned:
- 90% fail at least one article check
- Article 12 (Record-keeping) is the most common failure — almost nobody has tamper-evident audit trails
- Article 14 (Human oversight) is second — most agent code has no override mechanism or escalation path
- Article 11 (Technical docs) is third — agents ship without model cards or architecture documentation
- Articles 9 and 15 (Risk management and Robustness) are where teams do best, likely because error handling is already standard practice
The pattern is clear: developers write solid error handling but skip governance infrastructure entirely. Audit trails, human-in-the-loop controls, and documentation are treated as afterthoughts.
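For the Article 14 gap specifically, the missing piece is usually small. A minimal sketch of a human-in-the-loop gate might look like the following — `OversightGate` and its threshold are hypothetical names for illustration, not the AIR Blackbox API:

```python
from dataclasses import dataclass, field

@dataclass
class OversightGate:
    """Hypothetical Article 14 pattern: high-impact actions are queued
    for a human reviewer instead of executing automatically."""
    threshold: float = 0.8            # assumption: risk scored in [0, 1]
    pending: list = field(default_factory=list)

    def submit(self, action: str, risk_score: float) -> str:
        if risk_score >= self.threshold:
            self.pending.append(action)   # escalate to a human
            return "escalated"
        return "executed"

    def approve(self, action: str) -> str:
        # The human override path the scanner looks for evidence of
        self.pending.remove(action)
        return "executed"
```

The point is not this exact design — it's that most agent codebases have no equivalent of `pending` or `approve` at all.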
The Gap Between "Works" and "Audit-Ready"
Most AI agent code does what it's supposed to do. It calls an LLM, processes the response, takes an action. It handles errors. It retries on failure.
But "works in production" and "audit-ready for the EU AI Act" are two different standards. The Act requires you to prove your system has governance — not just functionality.
That means:
- Every LLM call logged with tamper-evident hashing (not just `print()` statements)
- A human can intervene and override any automated decision
- Technical documentation exists before deployment, not after an audit request
- Data governance isn't just "we validated inputs" — it's a documented pipeline
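"Tamper-evident" has a precise meaning here: each log entry's MAC covers the previous entry, so deleting or editing a record breaks the chain. A minimal sketch, using Python's standard `hmac` library — the `AuditLog` class and hard-coded key are illustrative assumptions, not AIR Blackbox's implementation:

```python
import hashlib
import hmac
import json

SECRET = b"replace-with-a-managed-key"  # assumption: fetched from a secrets manager in practice

class AuditLog:
    """Minimal hash-chained audit trail. Each entry's HMAC-SHA256 covers
    the previous entry's MAC, so any edit or deletion breaks verification."""

    def __init__(self):
        self.entries = []
        self._prev_mac = b""

    def record(self, event: dict) -> dict:
        payload = json.dumps(event, sort_keys=True).encode()
        mac = hmac.new(SECRET, self._prev_mac + payload, hashlib.sha256).hexdigest()
        entry = {"event": event, "mac": mac}
        self.entries.append(entry)
        self._prev_mac = mac.encode()
        return entry

    def verify(self) -> bool:
        prev = b""
        for e in self.entries:
            payload = json.dumps(e["event"], sort_keys=True).encode()
            if hmac.new(SECRET, prev + payload, hashlib.sha256).hexdigest() != e["mac"]:
                return False
            prev = e["mac"].encode()
        return True
```

This is roughly twenty lines of infrastructure, which is why "almost nobody has it" is a gap of priorities, not difficulty.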
How to Check Your Own Code
```shell
pip install air-compliance
air-compliance scan your_project/
```
10 seconds. Six checks. Runs entirely on your machine — no code leaves your environment.
You'll get a report showing which articles pass, which fail, and specific findings with line numbers.
Going From "Fail" to "Pass"
Scanning is step one. Fixing is step two.
AIR Blackbox includes drop-in trust layers for the major agent frameworks:
```shell
pip install air-langchain-trust   # LangChain
pip install trust-crewai          # CrewAI
pip install trust-autogen         # AutoGen
pip install trust-openai-agents   # OpenAI Agents SDK
pip install air-rag-trust         # RAG pipelines
```
One import. Audit chain built in. HMAC-SHA256 tamper-evident logging. Human oversight hooks wired up.
Here's what adding a trust layer to a LangChain agent looks like:
```python
from air_langchain_trust import TrustableChain

# Wrap your existing chain — everything else stays the same
chain = TrustableChain(your_existing_chain)
result = chain.invoke({"input": "your prompt"})
# Now every call is logged, hashed, and auditable
```
What This Doesn't Do
To be direct about limitations:
- This is not legal compliance. It's technical readiness. You still need legal review.
- Passing all six checks means your code has the technical patterns the Act requires. It doesn't mean a regulator has signed off.
- The scanner checks code patterns, not runtime behavior. A well-structured codebase that crashes in production still has governance gaps.
Think of it as: audit-ready, not audit-certified.
August 2026 Is Coming
The EU AI Act's technical requirements take effect August 2026. Every company deploying AI systems in the EU — or serving EU users — needs to comply.
100,000 files scanned. 90% aren't ready.
If you want to see where your code stands:
- Scan your code: `pip install air-compliance && air-compliance scan .`
- Try the demo: airblackbox.ai
- Browse the source: github.com/airblackbox
Everything is open source. Apache 2.0.
AIR Blackbox is an open-source toolkit for AI agent compliance scanning, trust layers, and audit-ready logging. Built for developers who'd rather fix governance gaps now than explain them to a regulator later.