We scanned LangChain agents, CrewAI workflows, AutoGen conversations, and RAG pipelines for EU AI Act compliance. Out of the box, none of them pass. Not even close.
The August 2, 2026 deadline for high-risk AI systems is now less than five months away. Fines run up to €35 million or 7% of global annual turnover, whichever is higher. And yet most Python AI projects have zero compliance infrastructure.
I built AIR Blackbox to fix that. But I also wanted to know: what else is out there? So I dug into every open-source EU AI Act compliance tool I could find and compared them head-to-head.
What the EU AI Act Actually Requires in Your Code
The EU AI Act isn't just paperwork. Articles 9 through 15 impose specific technical requirements on high-risk AI systems:
- Article 9 — Risk management (documented risk assessment processes)
- Article 10 — Data governance (training data quality controls)
- Article 11 — Technical documentation (auditor-verifiable docs)
- Article 12 — Record-keeping (automatic logging of system behavior)
- Article 13 — Transparency (instructions for use, information to deployers)
- Article 14 — Human oversight (kill-switch, intervention mechanisms)
- Article 15 — Robustness (accuracy testing, cybersecurity measures)
These translate directly to code: logging, audit trails, bias detection, documentation generation, and human-in-the-loop checkpoints.
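To make the record-keeping piece concrete, here's a minimal Article 12-style decorator in plain Python. This is an illustration of the pattern, not any tool's actual API:

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_act.article12")

def record_keeping(fn):
    """Article 12-style record-keeping: log every call with timestamp, inputs, output."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        log.info(json.dumps({
            "ts": time.time(),
            "fn": fn.__name__,
            "inputs": repr((args, kwargs)),
            "output": repr(result),
        }))
        return result
    return wrapper

@record_keeping
def classify(text: str) -> str:
    # stand-in for a real model call
    return "low-risk" if "hello" in text else "review"

classify("hello world")  # emits one JSON log line, returns "low-risk"
```

In a real system you'd write these records to append-only storage rather than a plain logger, but the shape is the same: every model invocation leaves a timestamped trace.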
The Tools I Compared
I found six open-source projects specifically targeting EU AI Act compliance:
1. AIR Blackbox (what I built)
```shell
pip install air-compliance-checker
air-compliance scan .
```
10 seconds to first scan. 7 PyPI packages. Trust layers for LangChain, CrewAI, AutoGen, OpenAI SDK, and RAG pipelines. Fine-tuned local LLM for contextual analysis. HMAC-SHA256 tamper-evident audit chains.
The architecture is different from everything else: instead of just scanning and reporting, the trust layers are runtime compliance components. They hook into your framework's callback system and create a continuous audit trail as your agents run in production.
Everything runs locally. No API keys. No cloud. Your code never leaves your machine.
GitHub: github.com/airblackbox/gateway
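Conceptually, a runtime trust layer is an observer that the framework's callback system feeds, accumulating an audit trail as agents act. Here's a stdlib sketch of that shape (the class and event names below are hypothetical, not AIR Blackbox's actual API):

```python
import time
from dataclasses import dataclass, field

@dataclass
class AuditEvent:
    ts: float      # when the event happened
    kind: str      # e.g. "tool_start", "tool_end", "llm_end"
    payload: dict  # framework-specific details

@dataclass
class TrustLayer:
    """Hypothetical runtime hook: a framework's callback system calls
    on_event() for each agent action, building a continuous trail."""
    trail: list = field(default_factory=list)

    def on_event(self, kind: str, payload: dict) -> None:
        self.trail.append(AuditEvent(time.time(), kind, payload))

layer = TrustLayer()
layer.on_event("tool_start", {"tool": "search", "input": "EU AI Act deadlines"})
layer.on_event("tool_end", {"tool": "search", "output_len": 2048})
```

Wiring this into LangChain would mean implementing its callback handler interface and forwarding each hook to `on_event`; the point is that compliance evidence is generated while the agent runs, not reconstructed afterward.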
2. Systima Comply
```shell
npm install @systima/comply
```
TypeScript-based CLI with a GitHub Action (systima-ai/comply@v1). Supports 37+ frameworks. Strong CI/CD integration for JavaScript/TypeScript teams.
The gap: no dedicated Python agent framework support, no audit trails, no fine-tuned model. It scans your code but doesn't understand LangChain's callback system or CrewAI's delegation patterns.
3. ArkForge MCP EU AI Act Scanner
MCP server that runs inside Claude Desktop, Cursor, or any MCP-compatible client. Python-native, lightweight, single dependency.
The gap: MCP-only. No CLI, no GitHub Action, no CI/CD integration. Great inside your editor, but it can't run in your deployment pipeline.
4. EuConform
Risk classification and bias detection. 100% offline, GDPR-by-design, WCAG 2.2 AA accessible. Strongest bias testing of any tool here.
The gap: no framework integrations, no audit chains, no documentation generation.
5. COMPL-AI
Evaluation framework for generative AI models (not application code). Benchmarking suites that test models against EU AI Act requirements. Different category — useful for model eval, not code scanning.
6. ARQNXS Compliance Checker
Questionnaire-based assessment. You answer questions, it generates a report. Similar to the EU Commission's own compliance checker. Not a code scanner.
The Comparison Table
| Feature | AIR Blackbox | Systima | ArkForge | EuConform |
|---|---|---|---|---|
| Language | Python | TypeScript | Python | Python |
| CLI scanner | Yes | Yes | No (MCP only) | Yes |
| GitHub Action | Yes | Yes | No | No |
| Framework trust layers | 5 frameworks | None | None | None |
| Fine-tuned LLM | Yes (local) | No | No | No |
| Audit trail | HMAC-SHA256 | No | No | No |
| Runs offline | Yes | Yes | Yes | Yes |
| Bias detection | Yes | No | No | Yes |
| GDPR scanning | Yes | No | No | Partial |
| PyPI packages | 7 | 0 | 0 | 1 |
What I Learned Building This
Three things surprised me:
1. Nobody else does framework-specific compliance. Every scanner does generic code analysis. None of them understand how LangChain callbacks work, how CrewAI agents delegate, or how AutoGen's conversation patterns create compliance gaps. This is the biggest gap in the space.
2. Rule-based scanning isn't enough. Pattern matching catches the obvious stuff — missing logging, no error handlers. But understanding whether your Article 12 implementation actually satisfies the requirement? That takes contextual analysis. That's why we fine-tuned a local LLM on thousands of compliance scenarios.
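For a taste of what pure pattern matching can do, here's a sketch using Python's `ast` module to flag silently swallowed exceptions, the kind of obvious record-keeping red flag rules catch easily. Judging whether a codebase's logging actually satisfies Article 12 is exactly what this approach can't do:

```python
import ast

SOURCE = """
def call_model(prompt):
    try:
        return api.complete(prompt)
    except Exception:
        pass
"""

def find_silent_excepts(src: str) -> list[int]:
    """Return line numbers of except handlers whose body is just `pass`
    (errors swallowed with no record kept)."""
    findings = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                findings.append(node.lineno)
    return findings

print(find_silent_excepts(SOURCE))  # → [5]
```

Rules like this make a fast first pass; the contextual judgment on top is where the fine-tuned model comes in.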
3. Audit trails matter more than scan results. A scan report says "you passed at this point in time." An HMAC-SHA256 audit chain says "here is cryptographic proof of every compliance check, every agent action, and every human oversight intervention, and it hasn't been tampered with." When an auditor asks for evidence, the second one wins.
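For the curious, here's one way such a chain can work, using only the standard library (a sketch of the idea, not necessarily AIR Blackbox's wire format): each entry's MAC covers both the event and the previous entry's MAC, so editing any past entry breaks every MAC after it.

```python
import hashlib
import hmac
import json

KEY = b"audit-signing-key"  # in practice, a per-deployment secret

def append_entry(chain: list, event: dict) -> None:
    """Append an event; its MAC covers the event plus the previous MAC."""
    prev = chain[-1]["mac"] if chain else "genesis"
    msg = (prev + json.dumps(event, sort_keys=True)).encode()
    chain.append({"event": event,
                  "mac": hmac.new(KEY, msg, hashlib.sha256).hexdigest()})

def verify(chain: list) -> bool:
    """Recompute every MAC; any tampered entry invalidates the chain."""
    prev = "genesis"
    for entry in chain:
        msg = (prev + json.dumps(entry["event"], sort_keys=True)).encode()
        expected = hmac.new(KEY, msg, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(entry["mac"], expected):
            return False
        prev = entry["mac"]
    return True

chain = []
append_entry(chain, {"check": "article_12", "status": "pass"})
append_entry(chain, {"check": "article_14", "status": "pass"})
assert verify(chain)

chain[0]["event"]["status"] = "fail"  # tamper with history
assert not verify(chain)
```

Anyone holding the key can re-verify the whole trail in one pass, which is precisely the kind of evidence an auditor can check rather than trust.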
Try It
```shell
# Install and scan in 10 seconds
pip install air-compliance-checker
air-compliance scan .

# Add a framework trust layer
pip install air-langchain-trust
```
No configuration. No API keys. No account.
- GitHub: github.com/airblackbox/gateway
- Website: airblackbox.ai
- Demo: airblackbox.ai/demo
- Full comparison: airblackbox.ai/blog/eu-ai-act-compliance-tools-compared
What's Next
We're expanding framework support to Anthropic Agent SDK and Pydantic AI, growing the training dataset for the fine-tuned model, and publishing the HMAC-SHA256 audit chain spec as an open standard.
August 2026 is coming. Your agents need to be ready. Star the repo if this is useful — PRs welcome.