correctover

Posted on Jul 4

EU AI Act Article 55: Why Output Verification Is the Missing Piece in Your Compliance Architecture

#ai #compliance #governance #security

August 2, 2026 — that's 29 days from today.

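On that date, the European Commission's powers to enforce the EU AI Act's General-Purpose AI (GPAI) rules begin to apply, including fines. The GPAI obligations themselves have applied since August 2, 2025, so from this date compliance gaps carry direct regulatory exposure for providers of general-purpose AI models in the European market.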

If you're responsible for AI compliance at a company that uses or provides LLM-based services, you've probably already started your documentation package: model cards, training data disclosures, energy consumption reports, and systemic risk assessments.

But there's a gap in most compliance preparations, and it sits in the one area the regulation explicitly addresses but most technical teams have not.

What Article 55 Actually Requires

Article 55 of the EU AI Act (Chapter V, Section 3) sets obligations for providers of GPAI models with systemic risk. Three requirements are directly relevant to runtime AI systems:

1. Output-Level Assessment (Art. 55(1)(a))

"Identify and assess systemic risks at the Union level, including... possible negative effects on the protection of fundamental rights, health and safety..."

This isn't a one-time evaluation. The regulation requires ongoing assessment of systemic risks arising from model outputs. For LLM-based systems, this means verifying that outputs don't contain hallucinated legal advice, fabricated medical information, or misleading financial data — in production, at runtime.

2. Hallucination Evaluation (Art. 55(1)(b) — Code of Practice)

The GPAI Code of Practice, published April 30, 2026, explicitly requires:

"Appropriate measures to assess and mitigate the risk of hallucinations in the output of general-purpose AI models."

This is the first major regulation to explicitly name "hallucination" as a compliance risk. And it demands runtime mitigation — not just pre-deployment red-teaming, but ongoing verification in production.

3. Incident Reporting (Art. 73)

"Providers of general-purpose AI models shall report any serious incident to the national supervisory authority."

"Serious incident" includes output failures that cause harm. Without runtime verification, you can't detect output failures systematically. You're relying on user reports — which means you'll report incidents late, and regulators will notice.

The Architecture Gap

Most AI compliance architectures today look like this:

Pre-deployment:                    Production:
  • Model evaluation (MMLU, etc.)    ❌ No runtime verification
  • Red-teaming (Promptfoo, etc.)    ❌ No output quality monitoring
  • Risk assessment documentation     • Basic latency/uptime monitoring
  • Training data disclosures        ❌ No semantic drift detection

The compliance documentation is solid. The production verification is missing.

What Article 55 actually requires is a continuous verification loop:

Every production LLM call →
  1. Execute (get response from provider)
  2. VERIFY (structural + semantic validation)
  3. Log (verifiable audit trail)
  4. Report (systemic risk + incident data)
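The four steps above can be sketched in plain Python. This is a minimal illustration, not Correctover's API: `handle_call`, `stub_provider`, and `non_empty` are hypothetical names, and the provider is stubbed so the example runs offline.

```python
import json
import time

def handle_call(prompt, call_provider, verify, audit_log):
    """Run one LLM call through execute -> verify -> log -> report."""
    response = call_provider(prompt)              # 1. Execute
    failures = verify(response)                   # 2. Verify
    audit_log.append(json.dumps({                 # 3. Log
        "ts_ms": int(time.time() * 1000),
        "model": response.get("model"),
        "failures": failures,
    }))
    # 4. Report: here we only flag the call; a real system would feed
    # flagged calls into its risk and incident reporting process.
    return {"response": response, "needs_report": bool(failures)}

def stub_provider(prompt):
    """Hypothetical stand-in for a real provider SDK call."""
    return {"content": f"echo: {prompt}", "model": "model-a"}

def non_empty(response):
    """Simplest possible verifier: fail if the content is empty."""
    content = str(response.get("content") or "")
    return [] if content.strip() else ["structure"]

audit_log = []
outcome = handle_call("hello", stub_provider, non_empty, audit_log)
```

The verifier is passed in as a function so that the checks can grow without touching the loop itself.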

The Architecture That Satisfies Article 55

Here's the production architecture that meets both the letter and spirit of Article 55's output verification requirements:

Layer 1: Contract Validation (Base Level)

Every LLM response must pass structural verification before it reaches the user. This is not optional under Article 55.

| Check | What It Verifies | Article 55 Relevance |
|---|---|---|
| Structure | Response has valid format, non-empty content | Basic safety: empty or truncated responses can mislead users |
| Schema | Required fields exist, correct types | Ensures output completeness per documented spec |
| Latency | Response time within documented SLA | Operational reliability for critical deployments |
| Cost | Token usage within expected range | Prevents unexpected consumption that degrades service |
| Identity | Model field matches what was requested | Prevents silent model substitution, which is critical for documentation accuracy |
| Integrity | Semantic quality passes threshold | Directly maps to Art. 55's hallucination assessment requirement |
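To make the six checks concrete, here is a rough sketch of what a contract validator could look like. It is an illustration under assumptions, not Correctover's implementation: the `Contract` fields, the response keys (`content`, `model`, `total_tokens`), and the externally supplied `integrity_score` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    requested_model: str
    required_fields: dict      # field name -> expected Python type
    max_latency_ms: float
    max_tokens: int
    min_integrity: float       # threshold for a semantic score in [0, 1]

def validate(response, latency_ms, integrity_score, contract):
    """Return {dimension: passed} for the six checks in the table."""
    content = response.get("content")
    return {
        "structure": isinstance(content, str) and bool(content.strip()),
        "schema": all(
            isinstance(response.get(name), expected)
            for name, expected in contract.required_fields.items()
        ),
        "latency": latency_ms <= contract.max_latency_ms,
        "cost": response.get("total_tokens", 0) <= contract.max_tokens,
        "identity": response.get("model") == contract.requested_model,
        # Assumption: the semantic score comes from a separate grader.
        "integrity": integrity_score >= contract.min_integrity,
    }

contract = Contract("model-a", {"content": str, "total_tokens": int}, 2000.0, 1024, 0.8)
response = {"content": "Paris", "model": "model-b", "total_tokens": 12}
result = validate(response, latency_ms=350.0, integrity_score=0.93, contract=contract)
failed = [name for name, ok in result.items() if not ok]
```

In this example the provider answered with `model-b` when `model-a` was requested, so only the identity check fails. Returning a per-dimension result (instead of a single boolean) keeps the failure reason available for the audit trail.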

Layer 2: Verifiable Audit Trail (Article 73)

Article 73 requires incident reporting. You can't report what you didn't log. A verified audit trail requires:

Every API call:      Request → Response → Validation Result → Signed Receipt
Every failure:       What failed → Which dimension → Which provider → Timestamp
Every recovery:      What action taken → Result → New validation → Signed Receipt
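One way to produce a "signed receipt" is an HMAC over a canonical JSON form of the record. The sketch below uses only the Python standard library; `sign_receipt` and `verify_receipt` are hypothetical helper names, and the hard-coded key is for illustration only. HMAC is symmetric, so a deployment that needs third parties to check receipts would likely want asymmetric signatures instead.

```python
import hashlib
import hmac
import json

def _canonical(record):
    """Serialize deterministically so the same record always signs the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_receipt(record, key):
    """Return a copy of the record with an HMAC-SHA256 signature attached."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": signature}

def verify_receipt(receipt, key):
    """Recompute the signature and compare in constant time."""
    record = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(receipt.get("signature", "")))

key = b"demo-key-not-for-production"  # assumption: load from a secret store
receipt = sign_receipt(
    {"provider": "openai", "failed": ["latency"], "ts_ms": 1751630400000}, key
)
tampered = {**receipt, "failed": []}
```

Here `verify_receipt(receipt, key)` succeeds, while the tampered copy fails because its contents no longer match the signature.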

Layer 3: Systemic Risk Monitoring (Article 55(1)(a))

Systemic risk assessment requires aggregate data from runtime:

Drift rates across models and providers
Failure patterns by dimension (structural vs. semantic vs. latency)
Provider-level reliability distributions
Cross-model consistency scores
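As a small illustration of the aggregation step, the sketch below computes failure rates per provider and dimension from audit records. The record shape (`provider`, `failed`) is an assumption for the example, and drift and cross-model consistency would need additional inputs that are not shown here.

```python
from collections import Counter, defaultdict

def failure_rates(records):
    """Per-provider failure rate for each dimension that failed at least once."""
    calls = Counter()
    failures = defaultdict(Counter)
    for rec in records:
        calls[rec["provider"]] += 1
        for dimension in rec["failed"]:
            failures[rec["provider"]][dimension] += 1
    return {
        provider: {dim: count / total for dim, count in failures[provider].items()}
        for provider, total in calls.items()
    }

records = [
    {"provider": "openai", "failed": []},
    {"provider": "openai", "failed": ["latency"]},
    {"provider": "anthropic", "failed": ["integrity", "latency"]},
    {"provider": "openai", "failed": ["latency"]},
    {"provider": "openai", "failed": []},
]
rates = failure_rates(records)
```

With these sample records, two of the four `openai` calls failed the latency check, so `rates["openai"]["latency"]` is 0.5.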

Why Most Compliance Teams Miss This

Three reasons:

Documentation mindset — Most GDPR-era compliance teams focus on documents, not systems. Article 55 requires runtime compliance, not paper compliance.
Verification doesn't exist as a product category — There's no "output verification vendor" in most procurement catalogs. The closest thing is Promptfoo (red-teaming, pre-deployment) or Patronus AI (evaluation, not runtime).
Technical complexity — Implementing 6-dimension contract validation requires changes to the inference pipeline, not just configuration.

The One Tool That Closes This Gap

Correctover is currently the only production-ready runtime verification SDK for LLM outputs. It implements the full 6-dimension contract validation stack described above — plus the τ (tau) framework for transition-sufficiency verification (Required(τ) ⊆ Supported(τ)).

# Article 55-compliant LLM call in 3 lines
from correctover import CorrectoverEngine

prompt = "Summarize the termination clause in this contract."  # example input
engine = CorrectoverEngine(providers=["openai", "anthropic"])
result = engine.run(prompt)  # Every response validated against 6 dimensions

The SDK is embedded (no proxy, no data interception), which means:

Data stays in your process: no third party receives your users' inputs, so the verification layer adds no new data processor to account for under GDPR Article 28
Verification runs on every call — not sampled, not periodic
Audit trail is automatic — every validation result is logged

What You Should Do Before August 2

Week 1: Audit Your Current Architecture

Map every LLM call path in production
Identify where output verification happens (if anywhere)
Document the gap between current verification and Article 55 requirements
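For Python codebases, a first pass at mapping call paths can be automated by scanning for imports of LLM SDKs. The sketch below is one possible starting point: `find_llm_imports` and the `LLM_PACKAGES` set are hypothetical, and it only finds direct imports, not raw HTTP calls or calls made through internal wrappers.

```python
import ast

LLM_PACKAGES = {"openai", "anthropic"}  # assumption: extend with the SDKs you use

def find_llm_imports(source, filename="<string>"):
    """Return sorted (line number, package) pairs for known LLM SDK imports."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in LLM_PACKAGES:
                hits.append((node.lineno, root))
    return sorted(hits)

sample = (
    "import os\n"
    "import openai\n"
    "from anthropic import Anthropic\n"
    "from openai.types import Completion\n"
)
hits = find_llm_imports(sample)
```

Running this over every `.py` file in a repository (for example with `pathlib.Path.rglob`) gives a list of modules to review by hand for where, if anywhere, outputs are verified.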

Week 2: Implement Contract Validation

Add structural/schema/latency/cost/identity/integrity checks to your inference pipeline
Test with your existing provider configurations
Measure overhead (Correctover's validation adds ~22μs P50 — negligible)
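To check the overhead figure against your own pipeline, a simple median timing harness is usually enough. This is a generic sketch using the standard library; `p50_overhead_us` is a hypothetical helper, and `cheap_check` is a placeholder for whatever validator you actually run.

```python
import statistics
import time

def p50_overhead_us(validate, sample, runs=1000):
    """Median wall-clock time of validate(sample), in microseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        validate(sample)
        timings.append((time.perf_counter_ns() - start) / 1000)
    return statistics.median(timings)

def cheap_check(response):
    """Placeholder validator: non-empty string content."""
    content = response.get("content")
    return isinstance(content, str) and bool(content.strip())

p50 = p50_overhead_us(cheap_check, {"content": "Paris"}, runs=200)
```

Measure with realistic response sizes and on production-like hardware, since the result depends heavily on both.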

Week 3: Build Your Audit Trail

Ensure every verification result is logged with:
- Timestamp (precise to ms)
- Provider + model identity
- Validation dimensions passed/failed
- Recovery action taken (if any)
Match logging schema to Article 73 incident reporting requirements
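The fields listed above map naturally onto a small record type written as JSON lines. The sketch below is one possible shape, not a schema prescribed by the regulation or by Correctover; `VerificationRecord` and its field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

def _utc_now_ms():
    """UTC timestamp in ISO 8601 with millisecond precision."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")

@dataclass
class VerificationRecord:
    provider: str
    model: str
    passed: list
    failed: list
    recovery_action: Optional[str] = None
    timestamp: str = field(default_factory=_utc_now_ms)

    def to_json_line(self):
        """One JSON object per line, suitable for append-only log files."""
        return json.dumps(asdict(self), sort_keys=True)

record = VerificationRecord(
    provider="openai",
    model="model-a",
    passed=["structure", "schema", "cost", "identity", "integrity"],
    failed=["latency"],
    recovery_action="retry",
)
line = record.to_json_line()
```

Keeping `recovery_action` optional lets the same record type cover both clean calls and calls that needed a retry or fallback.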

Week 4: Go Live + Document

Enable runtime verification in production
Document your compliance architecture
Reference Article 55 mapping in your GPAI compliance documentation

The Bottom Line

EU AI Act Article 55 doesn't just require you to think about output safety. It requires you to demonstrate it — continuously, in production, with verifiable evidence.

Documentation without runtime verification is not compliance. It's documentation.

"Failover verifies. Correctover verifies."

Related: How an AutoGen Engineer Used the τ Framework to Find a $50 Production Bug (real-world production validation of runtime verification)

Correctover可瑞沃 — Enterprise AI Reliability Infrastructure. Runtime verification for production AI systems. Article 55-compliant output validation as an embedded SDK. GitHub | correctover.com | pip install correctover
