DEV Community

Jason Shotwell

We Scanned 100,000 AI Agent Files for EU AI Act Compliance. 90% Failed.

Over the past few months, AIR Blackbox has scanned 100,000 Python files containing AI agent code. The results aren't great.

90% of scanned code fails at least one of the six technical checks mapped to the EU AI Act.

The deadline is August 2026. Most teams aren't close to ready.

What We Checked

AIR Blackbox maps the EU AI Act's technical requirements to six concrete code-level checks:

| Article | What It Requires | What We Scan For |
| --- | --- | --- |
| Art. 9 | Risk management | Risk assessment patterns, error handling |
| Art. 10 | Data governance | Data validation, input/output logging |
| Art. 11 | Technical documentation | Docstrings, model cards, architecture docs |
| Art. 12 | Record-keeping | Audit logging, decision trails |
| Art. 14 | Human oversight | Override mechanisms, escalation paths |
| Art. 15 | Robustness | Error recovery, fallback handling, input validation |

These aren't legal opinions. They're technical checks — like a linter for AI governance.
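To make the "linter for AI governance" idea concrete, here is a minimal sketch of what a static check in this spirit could look like. This is my own illustration, not AIR Blackbox's actual implementation: an AST pass that flags except handlers whose entire body is `pass`, since silently swallowed errors leave no audit trail.

```python
import ast

def find_silent_excepts(source: str) -> list[int]:
    """Return line numbers of except blocks that swallow errors silently.

    A crude stand-in for a record-keeping check: an except handler whose
    body is a single `pass` discards the error with no audit trail.
    """
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                findings.append(node.lineno)
    return findings

sample = """
def call_llm(prompt):
    try:
        return client.complete(prompt)
    except Exception:
        pass
"""
print(find_silent_excepts(sample))  # line numbers of silent except handlers
```

A real scanner needs many such rules plus heuristics for what counts as an "agent file," but each individual check can stay this mechanical.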

What the Data Shows

Across 100K files scanned:

  • 90% fail at least one article check
  • Article 12 (Record-keeping) is the most common failure — almost nobody has tamper-evident audit trails
  • Article 14 (Human oversight) is second — most agent code has no override mechanism or escalation path
  • Article 11 (Technical docs) is third — agents ship without model cards or architecture documentation
  • Articles 9 and 15 (Risk management and Robustness) are where teams do best, likely because error handling is already standard practice

The pattern is clear: developers write solid error handling but skip governance infrastructure entirely. Audit trails, human-in-the-loop controls, and documentation are treated as afterthoughts.

The Gap Between "Works" and "Audit-Ready"

Most AI agent code does what it's supposed to do. It calls an LLM, processes the response, takes an action. It handles errors. It retries on failure.

But "works in production" and "audit-ready for the EU AI Act" are two different standards. The Act requires you to prove your system has governance — not just functionality.

That means:

  • Every LLM call logged with tamper-evident hashing (not just print() statements)
  • A human can intervene and override any automated decision
  • Technical documentation exists before deployment, not after an audit request
  • Data governance isn't just "we validated inputs" — it's a documented pipeline
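The tamper-evident point is worth a sketch. One standard construction (my illustration of the general technique, not necessarily what the AIR Blackbox trust layers do internally) is an HMAC hash chain: each entry's tag covers its payload plus the previous entry's tag, so editing or deleting any record invalidates every tag after it.

```python
import hmac
import hashlib
import json

SECRET = b"audit-signing-key"  # in practice: loaded from a KMS, never hardcoded

def append_entry(log: list, payload: dict) -> None:
    """Append a log entry whose tag chains to the previous entry's tag."""
    prev_tag = log[-1]["tag"] if log else "genesis"
    message = prev_tag + json.dumps(payload, sort_keys=True)
    tag = hmac.new(SECRET, message.encode(), hashlib.sha256).hexdigest()
    log.append({"payload": payload, "tag": tag})

def verify_chain(log: list) -> bool:
    """Recompute every tag in order; any tampering breaks the chain."""
    prev_tag = "genesis"
    for entry in log:
        message = prev_tag + json.dumps(entry["payload"], sort_keys=True)
        expected = hmac.new(SECRET, message.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev_tag = entry["tag"]
    return True

log = []
append_entry(log, {"call": "llm.complete", "prompt_hash": "abc123"})
append_entry(log, {"call": "tool.search", "query": "q"})
print(verify_chain(log))                  # True
log[0]["payload"]["call"] = "edited"
print(verify_chain(log))                  # False: the chain detects the edit
```

Contrast this with `print()` logging, where a deleted or rewritten line is indistinguishable from one that never existed.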

How to Check Your Own Code

```shell
pip install air-compliance
air-compliance scan your_project/
```

10 seconds. Six checks. Runs entirely on your machine — no code leaves your environment.

You'll get a report showing which articles pass, which fail, and specific findings with line numbers.

Going From "Fail" to "Pass"

Scanning is step one. Fixing is step two.

AIR Blackbox includes drop-in trust layers for the major agent frameworks:

```shell
pip install air-langchain-trust    # LangChain
pip install trust-crewai           # CrewAI
pip install trust-autogen          # AutoGen
pip install trust-openai-agents    # OpenAI Agents SDK
pip install air-rag-trust          # RAG pipelines
```

One import. Audit chain built in. HMAC-SHA256 tamper-evident logging. Human oversight hooks wired up.

Here's what adding a trust layer to a LangChain agent looks like:

```python
from air_langchain_trust import TrustableChain

# Wrap your existing chain — everything else stays the same
chain = TrustableChain(your_existing_chain)
result = chain.invoke({"input": "your prompt"})

# Now every call is logged, hashed, and auditable
```
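The human-oversight side (Article 14) follows the same wrapping idea. Here is a hypothetical sketch of an escalation gate, using names I've invented for illustration rather than the actual trust-layer API: actions above a risk threshold are parked in a queue until a human approves them.

```python
from typing import Callable

class OversightGate:
    """Block high-risk actions until a human approves them.

    Hypothetical illustration of an Article 14-style override hook;
    the real trust layers expose their own interfaces.
    """

    def __init__(self, risk_fn: Callable[[dict], float], threshold: float = 0.7):
        self.risk_fn = risk_fn
        self.threshold = threshold
        self.pending: list = []

    def submit(self, action: dict) -> str:
        if self.risk_fn(action) >= self.threshold:
            self.pending.append(action)   # escalate to a human review queue
            return "escalated"
        return "executed"                 # low risk: proceed automatically

    def approve(self, action: dict) -> str:
        self.pending.remove(action)       # human signed off on the action
        return "executed"

gate = OversightGate(risk_fn=lambda a: 0.9 if a["type"] == "payment" else 0.1)
print(gate.submit({"type": "lookup"}))                   # executed
print(gate.submit({"type": "payment", "amount": 500}))   # escalated
```

The point isn't this particular class; it's that an auditor can see a code path where automation stops and a human decides.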

What This Doesn't Do

To be direct about limitations:

  • This is not legal compliance. It's technical readiness. You still need legal review.
  • Passing all six checks means your code has the technical patterns the Act requires. It doesn't mean a regulator has signed off.
  • The scanner checks code patterns, not runtime behavior. A well-structured codebase that crashes in production still has governance gaps.

Think of it as: audit-ready, not audit-certified.

August 2026 Is Coming

The EU AI Act's technical requirements take effect August 2026. Every company deploying AI systems in the EU — or serving EU users — needs to comply.

100,000 files scanned. 90% aren't ready.

If you want to see where your code stands:

Everything is open source. Apache 2.0.


AIR Blackbox is an open-source toolkit for AI agent compliance scanning, trust layers, and audit-ready logging. Built for developers who'd rather fix governance gaps now than explain them to a regulator later.

Top comments (1)

klement Gunndu

The Article 12 audit logging gap is the finding that hits hardest — most agent frameworks treat logging as observability but not as a compliance artifact. The August 2026 deadline is real and this is the first scan I've seen that quantifies how far behind the ecosystem actually is.