DEV Community

Jason Shotwell

We Scanned 100,000 AI Agent Files for EU AI Act Compliance. 90% Failed.

Over the past few months, AIR Blackbox has scanned 100,000 Python files containing AI agent code. The results aren't great.

90% of scanned code fails at least one of the six technical checks mapped to the EU AI Act.

The deadline is August 2026. Most teams aren't close to ready.

What We Checked

AIR Blackbox maps the EU AI Act's technical requirements to six concrete code-level checks:

| Article | What It Requires | What We Scan For |
| --- | --- | --- |
| Art. 9 | Risk management | Risk assessment patterns, error handling |
| Art. 10 | Data governance | Data validation, input/output logging |
| Art. 11 | Technical documentation | Docstrings, model cards, architecture docs |
| Art. 12 | Record-keeping | Audit logging, decision trails |
| Art. 14 | Human oversight | Override mechanisms, escalation paths |
| Art. 15 | Robustness | Error recovery, fallback handling, input validation |

These aren't legal opinions. They're technical checks — like a linter for AI governance.
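To make the "linter for AI governance" idea concrete, here is a minimal sketch of what a static check in this spirit could look like. This is my own illustration, not AIR Blackbox's actual implementation: an AST pass that flags except handlers whose entire body is `pass`, since silently swallowed errors leave no audit trail.

```python
import ast

def find_silent_excepts(source: str) -> list[int]:
    """Return line numbers of except blocks that swallow errors silently.

    A crude stand-in for a record-keeping check: an except handler whose
    body is a single `pass` discards the error with no audit trail.
    """
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                findings.append(node.lineno)
    return findings

sample = """
def call_llm(prompt):
    try:
        return client.complete(prompt)
    except Exception:
        pass
"""
print(find_silent_excepts(sample))  # line numbers of silent except handlers
```

A real scanner needs many such rules plus heuristics for what counts as an "agent file," but each individual check can stay this mechanical.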

What the Data Shows

Across 100K files scanned:

  • 90% fail at least one article check
  • Article 12 (Record-keeping) is the most common failure — almost nobody has tamper-evident audit trails
  • Article 14 (Human oversight) is second — most agent code has no override mechanism or escalation path
  • Article 11 (Technical docs) is third — agents ship without model cards or architecture documentation
  • Articles 9 and 15 (Risk management and Robustness) are where teams do best, likely because error handling is already standard practice

The pattern is clear: developers write solid error handling but skip governance infrastructure entirely. Audit trails, human-in-the-loop controls, and documentation are treated as afterthoughts.

The Gap Between "Works" and "Audit-Ready"

Most AI agent code does what it's supposed to do. It calls an LLM, processes the response, takes an action. It handles errors. It retries on failure.

But "works in production" and "audit-ready for the EU AI Act" are two different standards. The Act requires you to prove your system has governance — not just functionality.

That means:

  • Every LLM call logged with tamper-evident hashing (not just print() statements)
  • A human can intervene and override any automated decision
  • Technical documentation exists before deployment, not after an audit request
  • Data governance isn't just "we validated inputs" — it's a documented pipeline
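The tamper-evident point is worth a sketch. One standard construction (my illustration of the general technique, not necessarily what the AIR Blackbox trust layers do internally) is an HMAC hash chain: each entry's tag covers its payload plus the previous entry's tag, so editing or deleting any record invalidates every tag after it.

```python
import hmac
import hashlib
import json

SECRET = b"audit-signing-key"  # in practice: loaded from a KMS, never hardcoded

def append_entry(log: list, payload: dict) -> None:
    """Append a log entry whose tag chains to the previous entry's tag."""
    prev_tag = log[-1]["tag"] if log else "genesis"
    message = prev_tag + json.dumps(payload, sort_keys=True)
    tag = hmac.new(SECRET, message.encode(), hashlib.sha256).hexdigest()
    log.append({"payload": payload, "tag": tag})

def verify_chain(log: list) -> bool:
    """Recompute every tag in order; any tampering breaks the chain."""
    prev_tag = "genesis"
    for entry in log:
        message = prev_tag + json.dumps(entry["payload"], sort_keys=True)
        expected = hmac.new(SECRET, message.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev_tag = entry["tag"]
    return True

log = []
append_entry(log, {"call": "llm.complete", "prompt_hash": "abc123"})
append_entry(log, {"call": "tool.search", "query": "q"})
print(verify_chain(log))                  # True
log[0]["payload"]["call"] = "edited"
print(verify_chain(log))                  # False: the chain detects the edit
```

Contrast this with `print()` logging, where a deleted or rewritten line is indistinguishable from one that never existed.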

How to Check Your Own Code

```shell
pip install air-compliance
air-compliance scan your_project/
```

10 seconds. Six checks. Runs entirely on your machine — no code leaves your environment.

You'll get a report showing which articles pass, which fail, and specific findings with line numbers.

Going From "Fail" to "Pass"

Scanning is step one. Fixing is step two.

AIR Blackbox includes drop-in trust layers for the major agent frameworks:

```shell
pip install air-langchain-trust    # LangChain
pip install trust-crewai           # CrewAI
pip install trust-autogen          # AutoGen
pip install trust-openai-agents    # OpenAI Agents SDK
pip install air-rag-trust          # RAG pipelines
```

One import. Audit chain built in. HMAC-SHA256 tamper-evident logging. Human oversight hooks wired up.

Here's what adding a trust layer to a LangChain agent looks like:

```python
from air_langchain_trust import TrustableChain

# Wrap your existing chain — everything else stays the same
chain = TrustableChain(your_existing_chain)
result = chain.invoke({"input": "your prompt"})

# Now every call is logged, hashed, and auditable
```
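The human-oversight side (Article 14) follows the same wrapping idea. Here is a hypothetical sketch of an escalation gate, using names I've invented for illustration rather than the actual trust-layer API: actions above a risk threshold are parked in a queue until a human approves them.

```python
from typing import Callable

class OversightGate:
    """Block high-risk actions until a human approves them.

    Hypothetical illustration of an Article 14-style override hook;
    the real trust layers expose their own interfaces.
    """

    def __init__(self, risk_fn: Callable[[dict], float], threshold: float = 0.7):
        self.risk_fn = risk_fn
        self.threshold = threshold
        self.pending: list = []

    def submit(self, action: dict) -> str:
        if self.risk_fn(action) >= self.threshold:
            self.pending.append(action)   # escalate to a human review queue
            return "escalated"
        return "executed"                 # low risk: proceed automatically

    def approve(self, action: dict) -> str:
        self.pending.remove(action)       # human signed off on the action
        return "executed"

gate = OversightGate(risk_fn=lambda a: 0.9 if a["type"] == "payment" else 0.1)
print(gate.submit({"type": "lookup"}))                   # executed
print(gate.submit({"type": "payment", "amount": 500}))   # escalated
```

The point isn't this particular class; it's that an auditor can see a code path where automation stops and a human decides.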

What This Doesn't Do

To be direct about limitations:

  • This is not legal compliance. It's technical readiness. You still need legal review.
  • Passing all six checks means your code has the technical patterns the Act requires. It doesn't mean a regulator has signed off.
  • The scanner checks code patterns, not runtime behavior. A well-structured codebase that crashes in production still has governance gaps.

Think of it as: audit-ready, not audit-certified.

August 2026 Is Coming

The EU AI Act's technical requirements take effect August 2026. Every company deploying AI systems in the EU — or serving EU users — needs to comply.

100,000 files scanned. 90% aren't ready.

If you want to see where your code stands:

Everything is open source. Apache 2.0.


AIR Blackbox is an open-source toolkit for AI agent compliance scanning, trust layers, and audit-ready logging. Built for developers who'd rather fix governance gaps now than explain them to a regulator later.

Top comments (1)

klement Gunndu

The Article 12 audit logging gap is the finding that hits hardest — most agent frameworks treat logging as observability but not as a compliance artifact. The August 2026 deadline is real and this is the first scan I've seen that quantifies how far behind the ecosystem actually is.