August 2, 2026 — that's 29 days from today.
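On that date, the European Commission's enforcement powers over the EU AI Act's General-Purpose AI (GPAI) rules begin to apply, including the power to fine providers. The obligations themselves have applied since August 2, 2025, and the Article 55 duties discussed here fall on providers of GPAI models with systemic risk, not on every provider and deployer in the European market.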
If you're responsible for AI compliance at a company that uses or provides LLM-based services, you've probably already started your documentation package: model cards, training data disclosures, energy consumption reports, and systemic risk assessments.
But most compliance preparations have a gap, and it sits in the one area the regulation explicitly covers yet most technical teams haven't addressed: verifying model outputs at runtime.
What Article 55 Actually Requires
Article 55 of the EU AI Act (Chapter V, Section 3) sets out the obligations for providers of GPAI models with systemic risk. Three requirements are directly relevant to runtime AI systems:
1. Output-Level Assessment (Art. 55(1)(a))
"Identify and assess systemic risks at the Union level, including... possible negative effects on the protection of fundamental rights, health and safety..."
This isn't a one-time evaluation. The regulation requires ongoing assessment of systemic risks arising from model outputs. For LLM-based systems, this means verifying that outputs don't contain hallucinated legal advice, fabricated medical information, or misleading financial data — in production, at runtime.
2. Hallucination Evaluation (Art. 55(1)(b) — Code of Practice)
The GPAI Code of Practice, published April 30, 2026, explicitly requires:
"Appropriate measures to assess and mitigate the risk of hallucinations in the output of general-purpose AI models."
This is the first major regulation to explicitly name "hallucination" as a compliance risk. And it demands runtime mitigation — not just pre-deployment red-teaming, but ongoing verification in production.
3. Incident Reporting (Art. 73)
"Providers of general-purpose AI models shall report any serious incident to the national supervisory authority."
"Serious incident" includes output failures that cause harm. Without runtime verification, you can't detect output failures systematically. You're relying on user reports — which means you'll report incidents late, and regulators will notice.
The Architecture Gap
Most AI compliance architectures today look like this:
| Pre-deployment | Production |
|---|---|
| Model evaluation (MMLU, etc.) | ❌ No runtime verification |
| Red-teaming (Promptfoo, etc.) | ❌ No output quality monitoring |
| Risk assessment documentation | Basic latency/uptime monitoring |
| Training data disclosures | ❌ No semantic drift detection |
The compliance documentation is solid. The production verification is missing.
What Article 55 actually requires is a continuous verification loop:
Every production LLM call →
1. Execute (get response from provider)
2. VERIFY (structural + semantic validation)
3. Log (verifiable audit trail)
4. Report (systemic risk + incident data)
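The four steps above can be sketched as a thin wrapper around any provider call. This is a minimal illustration, not a reference implementation: `CallRecord`, `verify`, `verified_call`, and `report` are hypothetical names, and the single emptiness check stands in for real structural and semantic validation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CallRecord:
    """One entry in the audit log (hypothetical schema)."""
    prompt: str
    response: str
    passed: bool
    failures: list[str] = field(default_factory=list)


def verify(response: str) -> list[str]:
    """Return the names of failed checks. Only one placeholder check here."""
    failures = []
    if not response.strip():
        failures.append("structure:empty")
    return failures


def verified_call(prompt: str,
                  provider: Callable[[str], str],
                  audit_log: list[CallRecord]) -> CallRecord:
    response = provider(prompt)                      # 1. Execute
    failures = verify(response)                      # 2. Verify
    record = CallRecord(prompt, response, not failures, failures)
    audit_log.append(record)                         # 3. Log
    return record


def report(audit_log: list[CallRecord]) -> dict:
    """4. Report: aggregate the log into simple failure statistics."""
    total = len(audit_log)
    failed = sum(1 for r in audit_log if not r.passed)
    return {
        "total": total,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
    }
```

In this sketch `provider` is any callable that takes a prompt and returns text, so the same wrapper can sit in front of different vendors' clients.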
The Architecture That Satisfies Article 55
Here's the production architecture that meets both the letter and spirit of Article 55's output verification requirements:
Layer 1: Contract Validation (Base Level)
Every LLM response must pass structural verification before it reaches the user. This is not optional under Article 55.
| Check | What It Verifies | Article 55 Relevance |
|---|---|---|
| Structure | Response has valid format, non-empty content | Basic safety — empty/truncated responses can mislead users |
| Schema | Required fields exist, correct types | Ensures output completeness per documented spec |
| Latency | Response time within documented SLA | Operational reliability for critical deployments |
| Cost | Token usage within expected range | Prevents unexpected consumption that degrades service |
| Identity | Model field matches what was requested | Prevents silent model substitution — critical for documentation accuracy |
| Integrity | Semantic quality passes threshold | Directly maps to Art. 55's hallucination assessment requirement |
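To make the table concrete, here is a rough sketch of the six checks as one function. The `LLMResponse` and `Contract` types and all their field names are assumptions made for illustration. The `integrity_score` is assumed to come from a separate semantic evaluator that this sketch does not implement.

```python
from dataclasses import dataclass


@dataclass
class LLMResponse:
    """Normalized provider response (hypothetical shape)."""
    content: str
    model: str
    latency_ms: float
    total_tokens: int
    fields: dict            # parsed structured output, if any
    integrity_score: float  # assumed output of a separate semantic evaluator


@dataclass
class Contract:
    """Documented expectations for one call path (hypothetical shape)."""
    requested_model: str
    required_fields: dict   # field name -> expected Python type
    max_latency_ms: float
    max_tokens: int
    min_integrity: float


def validate(resp: LLMResponse, contract: Contract) -> dict[str, bool]:
    """Return pass/fail for each of the six dimensions in the table."""
    return {
        "structure": bool(resp.content.strip()),
        "schema": all(
            isinstance(resp.fields.get(name), expected)
            for name, expected in contract.required_fields.items()
        ),
        "latency": resp.latency_ms <= contract.max_latency_ms,
        "cost": resp.total_tokens <= contract.max_tokens,
        "identity": resp.model == contract.requested_model,
        "integrity": resp.integrity_score >= contract.min_integrity,
    }
```

Returning a per-dimension dictionary rather than a single boolean keeps the failing dimension available for the audit trail and for aggregate monitoring.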
Layer 2: Verifiable Audit Trail (Article 73)
Article 73 requires incident reporting. You can't report what you didn't log. A verified audit trail requires:
Every API call: Request → Response → Validation Result → Signed Receipt
Every failure: What failed → Which dimension → Which provider → Timestamp
Every recovery: What action taken → Result → New validation → Signed Receipt
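The post does not specify what a "signed receipt" looks like, so the following is only one plausible reading: an HMAC-SHA256 signature over a canonical JSON encoding of the record, using Python's standard library. The function names and record fields are hypothetical. Because HMAC is symmetric, you might prefer an asymmetric signature if third parties need to verify receipts without holding the key.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialize deterministically so the same record always signs the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(record: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to an audit record."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(receipt["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])
```

Any later change to a logged record, such as flipping a failed check to passed, makes `verify_receipt` return `False`.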
Layer 3: Systemic Risk Monitoring (Article 55(1)(a))
Systemic risk assessment requires aggregate data from runtime:
- Drift rates across models and providers
- Failure patterns by dimension (structural vs. semantic vs. latency)
- Provider-level reliability distributions
- Cross-model consistency scores
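As a rough illustration of the aggregate view, the sketch below computes two of the listed metrics from per-call validation results. The input shape, a list of `(provider, failed_dimensions)` tuples, is an assumption, and drift and cross-model consistency are left out because they need a semantic comparison method that the post does not define.

```python
from collections import Counter


def failure_rates_by_provider(results):
    """results: iterable of (provider, failed_dimensions) tuples.

    Returns the share of calls per provider with at least one failed dimension.
    """
    totals = Counter()
    failed = Counter()
    for provider, dims in results:
        totals[provider] += 1
        if dims:
            failed[provider] += 1
    return {provider: failed[provider] / totals[provider] for provider in totals}


def failures_by_dimension(results):
    """Count how often each validation dimension failed across all calls."""
    counts = Counter()
    for _, dims in results:
        counts.update(dims)
    return dict(counts)
```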
Why Most Compliance Teams Miss This
Three reasons:
1. Documentation mindset: Most GDPR-era compliance teams focus on documents, not systems. Article 55 requires runtime compliance, not paper compliance.
2. Verification doesn't exist as a product category: There's no "output verification vendor" in most procurement catalogs. The closest thing is Promptfoo (red-teaming, pre-deployment) or Patronus AI (evaluation, not runtime).
3. Technical complexity: Implementing six-dimension contract validation requires changes to the inference pipeline, not just configuration.
The One Tool That Closes This Gap
Correctover is currently the only production-ready runtime verification SDK for LLM outputs. It implements the full 6-dimension contract validation stack described above — plus the τ (tau) framework for transition-sufficiency verification (Required(τ) ⊆ Supported(τ)).
```python
# Article 55-compliant LLM call in 3 lines
from correctover import CorrectoverEngine

prompt = "Summarize the attached contract."  # example input

engine = CorrectoverEngine(providers=["openai", "anthropic"])
result = engine.run(prompt)  # Every response validated against 6 dimensions
```
The SDK is embedded (no proxy, no data interception), which means:
- Data stays in your process — no third-party access to your users' inputs (GDPR Article 28 compliance)
- Verification runs on every call — not sampled, not periodic
- Audit trail is automatic — every validation result is logged
What You Should Do Before August 2
Week 1: Audit Your Current Architecture
- Map every LLM call path in production
- Identify where output verification happens (if anywhere)
- Document the gap between current verification and Article 55 requirements
Week 2: Implement Contract Validation
- Add structural/schema/latency/cost/identity/integrity checks to your inference pipeline
- Test with your existing provider configurations
- Measure overhead (Correctover's validation adds ~22μs P50 — negligible)
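If you want to measure validation overhead on your own pipeline rather than rely on a vendor figure, a small timing harness is enough to start. This is a sketch using only the standard library. `measure_overhead_us` is a hypothetical helper, and results will vary with hardware, payload size, and the checks you run.

```python
import statistics
import time


def measure_overhead_us(validator, sample, runs=1000):
    """Time validator(sample) repeatedly and return P50/P99 in microseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        validator(sample)
        timings.append((time.perf_counter_ns() - start) / 1_000)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    percentiles = statistics.quantiles(timings, n=100)
    return {"p50": statistics.median(timings), "p99": percentiles[98]}
```

Run it against a representative response from each provider configuration, and compare P99 as well as P50, since tail latency is what users notice.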
Week 3: Build Your Audit Trail
- Ensure every verification result is logged with:
  - Timestamp (precise to ms)
  - Provider + model identity
  - Validation dimensions passed/failed
  - Recovery action taken (if any)
- Match logging schema to Article 73 incident reporting requirements
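One way to pin down the logging schema is a small record type that serializes to one JSON line per verification. The field names below mirror the list above but are otherwise assumptions. The post does not state which fields an incident report must contain, so check them against your own reporting template.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now_ms() -> str:
    """ISO 8601 UTC timestamp with millisecond precision."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")


@dataclass
class VerificationRecord:
    """One log entry per verified call (hypothetical schema)."""
    provider: str
    model: str
    passed: list[str]
    failed: list[str]
    recovery_action: Optional[str] = None
    timestamp: str = field(default_factory=_utc_now_ms)

    def to_json_line(self) -> str:
        """Serialize with sorted keys so log lines are stable and diffable."""
        return json.dumps(asdict(self), sort_keys=True)
```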
Week 4: Go Live + Document
- Enable runtime verification in production
- Document your compliance architecture
- Reference Article 55 mapping in your GPAI compliance documentation
The Bottom Line
EU AI Act Article 55 doesn't just require you to think about output safety. It requires you to demonstrate it — continuously, in production, with verifiable evidence.
Documentation without runtime verification is not compliance. It's documentation.
"Failover verifies. Correctover verifies."
Related: How an autogen Engineer Used the τ Framework to Find a $50 Production Bug — real-world production validation of runtime verification
Correctover可瑞沃 — Enterprise AI Reliability Infrastructure. Runtime verification for production AI systems. Article 55-compliant output validation as an embedded SDK. GitHub | correctover.com | pip install correctover