DEV Community

correctover

6-Dimensional Contract Validation: Why Your LLM API Needs More Than Status Code Checks

Your API returns 200 OK. Your monitoring dashboard is green. Everything looks fine.

Except the response is JSON with a completely wrong schema. Or the latency just tripled. Or the model silently switched from GPT-4 to GPT-3.5-turbo and nobody noticed.

Status code monitoring is not reliability monitoring. It's the bare minimum that tells you the server answered — not that it answered correctly.

The 6 Dimensions That Actually Matter

After running 20,000+ real LLM API calls through our reliability engine at Correctover, we identified six independent dimensions where things can go wrong — and each requires its own validation strategy.

1. Structure Validation

Does the response parse as valid JSON? Does it have the expected top-level keys?

from correctover import Contract

contract = Contract(
    structure={
        "type": "object",
        "required": ["choices", "usage"],
        "properties": {
            "choices": {"type": "array", "minItems": 1},
            "usage": {"type": "object"}
        }
    }
)

This catches truncated responses, encoding errors, and format regressions. In our test data, 2.3% of "successful" responses had structural issues.
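To make the check concrete without depending on any library, here is a minimal standard-library sketch of the same idea. The function name `check_structure` and its error strings are illustrative assumptions, not part of Correctover's API.

```python
import json

REQUIRED_KEYS = ("choices", "usage")  # mirrors the required keys in the contract above


def check_structure(raw_body: str) -> list[str]:
    """Return a list of structural problems; an empty list means the body passed."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        # Truncated bodies and encoding damage usually surface here.
        return [f"body is not valid JSON: {exc.msg}"]

    if not isinstance(payload, dict):
        return ["top-level value is not an object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    if "choices" in payload:
        choices = payload["choices"]
        if not isinstance(choices, list) or len(choices) < 1:
            problems.append("choices must be a non-empty array")
    if "usage" in payload and not isinstance(payload["usage"], dict):
        problems.append("usage must be an object")
    return problems
```

Returning a list of problems rather than a boolean keeps the reason for a rejection available for logging.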

2. Schema Validation

Even if the structure is correct, does the data conform to your expected schema?

  • Are choices[i].message.content values strings, not null?
  • Is usage.total_tokens a positive integer?
  • Are the model identifiers valid?

Schema validation catches the "technically valid JSON but semantically wrong" class of errors — the most insidious kind.
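The three questions above can be sketched as field-level checks on an already-parsed payload. This is a hand-rolled illustration that assumes an OpenAI-style chat completion shape. `check_schema` and `allowed_models` are hypothetical names.

```python
def check_schema(payload: dict, allowed_models: set[str]) -> list[str]:
    """Field-level checks on a parsed chat completion payload."""
    problems = []

    choices = payload.get("choices")
    if not isinstance(choices, list):
        choices = []  # the structure check is responsible for reporting this
    for index, choice in enumerate(choices):
        message = choice.get("message") if isinstance(choice, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, str):
            problems.append(f"choices[{index}].message.content is not a string")

    usage = payload.get("usage")
    total = usage.get("total_tokens") if isinstance(usage, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(total, int) or isinstance(total, bool) or total <= 0:
        problems.append("usage.total_tokens is not a positive integer")

    model = payload.get("model")
    if not isinstance(model, str) or model not in allowed_models:
        problems.append(f"unexpected model identifier: {model!r}")

    return problems
```

If your application uses tool calls, a null `content` can be legitimate, so that rule may need to be relaxed for those requests.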

3. Latency Validation

A response that takes 30 seconds when your SLA is 2 seconds is a failed response, regardless of HTTP status.

Our data shows latency spikes are often the first warning sign of provider degradation — before errors appear.
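A latency check can be as small as timing the call and comparing the result to a budget. The sketch below assumes a zero-argument callable `send` that performs the request. Both that name and `timed_call` are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed_call(send: Callable[[], T], budget_seconds: float) -> tuple[T, float, bool]:
    """Run `send` and return (result, elapsed_seconds, within_budget)."""
    start = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
    result = send()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_seconds


# Usage with a stand-in for the real API call and the 2-second SLA from the text.
result, elapsed, within_budget = timed_call(lambda: "stub response", budget_seconds=2.0)
```

This only classifies a response after it arrives. Cutting off a slow request still requires a timeout on the HTTP client itself.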

4. Cost Validation

Did this response cost what you expected? Token counts can vary dramatically between models and providers for the same prompt.

  • Token count anomalies indicate model drift
  • Unexpected cost spikes hurt your budget
  • Token counting discrepancies between providers are real
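One way to act on these points is to turn the reported token usage into an estimated cost and compare it with a baseline. The prices, the model name, and the 50% tolerance below are placeholder assumptions. Substitute your provider's real rates and a threshold that fits your traffic.

```python
# Hypothetical prices in USD per 1,000 tokens; not real provider rates.
PRICE_PER_1K = {"example-model": {"prompt": 0.03, "completion": 0.06}}


def estimate_cost(model: str, usage: dict) -> float:
    """Estimate the cost of one call from its reported token usage."""
    rates = PRICE_PER_1K[model]
    return (
        usage["prompt_tokens"] / 1000 * rates["prompt"]
        + usage["completion_tokens"] / 1000 * rates["completion"]
    )


def cost_is_anomalous(cost: float, baseline: float, tolerance: float = 0.5) -> bool:
    """Flag a call whose cost deviates from the baseline by more than `tolerance`."""
    return abs(cost - baseline) > tolerance * baseline
```

The baseline would typically be a rolling average for the same prompt template, so that a change in token counts stands out even when each individual call is cheap.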

5. Identity Validation

This is the most critical dimension that almost nobody checks.

Is the model you called the model that responded? In our drift detection data, we found that providers silently swap models in approximately 0.7% of production calls. This means:

  • You pay for GPT-4 but get GPT-3.5 responses
  • Your carefully tuned prompts produce different outputs
  • Your quality assurance is undermined silently
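At its simplest, an identity check compares the model you requested with the `model` field the response reports. The sketch below uses an explicit allow-list because an alias such as `gpt-4` can legitimately resolve to a dated snapshot. The allow-list contents and the name `check_identity` are assumptions for illustration.

```python
from typing import Optional

# Hypothetical allow-list: for each requested alias, the reported identifiers you accept.
ACCEPTED_MODELS = {
    "gpt-4": {"gpt-4", "gpt-4-0613"},
}


def check_identity(requested: str, payload: dict) -> Optional[str]:
    """Return a problem description, or None if the reported model is acceptable."""
    reported = payload.get("model")
    accepted = ACCEPTED_MODELS.get(requested, {requested})
    if not isinstance(reported, str) or reported not in accepted:
        return f"requested {requested!r} but response reports {reported!r}"
    return None
```

This relies on the response's self-reported `model` field, so it cannot detect a swap that the provider does not report. Catching that case needs behavioral signals beyond this sketch.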

6. Integrity Validation

Is the response internally consistent? Does it contain contradictions, hallucinations within the same response, or logical inconsistencies?

While full semantic validation is an open research problem, protocol-level integrity checks can catch:

  • Empty or placeholder content in structured outputs
  • Contradictory metadata and content
  • Response length anomalies suggesting truncation or padding
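The three protocol-level checks listed above can be approximated with a few comparisons between content and metadata. This sketch assumes an OpenAI-style payload with `finish_reason` and `usage` fields. The placeholder list and the name `check_integrity` are illustrative.

```python
PLACEHOLDERS = {"", "todo", "n/a", "null", "lorem ipsum"}  # illustrative; extend as needed


def check_integrity(payload: dict) -> list[str]:
    """Cheap consistency checks between a response's content and its metadata."""
    problems = []
    choice = payload["choices"][0]
    content = (choice.get("message", {}).get("content") or "").strip()

    # Empty or placeholder content.
    if content.lower() in PLACEHOLDERS:
        problems.append("content is empty or a placeholder")

    # Truncation: the provider reports that it stopped at the token limit.
    if choice.get("finish_reason") == "length":
        problems.append("finish_reason is 'length', output was likely truncated")

    # Contradictory metadata: the token counts should add up.
    usage = payload.get("usage", {})
    parts = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    if parts != usage.get("total_tokens", 0):
        problems.append("usage token counts do not add up")

    return problems
```

None of these checks looks at meaning. They only catch responses that contradict themselves at the protocol level.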

Why All Six Dimensions Matter Together

Each dimension catches a different class of failure:

Dimension   Catches                    Miss Rate if Omitted
Structure   Malformed responses        2.3%
Schema      Semantically invalid data  3.1%
Latency     Degraded performance       4.7%
Cost        Token anomalies, drift     1.8%
Identity    Silent model swaps         0.7%
Integrity   Internal inconsistencies   1.9%

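If you only check HTTP status codes, you miss every one of these. Taken together, and assuming no response fails in more than one dimension, that is up to 14.5% of responses that look successful.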
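How the per-dimension rates combine depends on how much the failures overlap. The short calculation below uses only the figures from the table. The two combination rules, a plain sum and an independence assumption, are added here for illustration and are not claims from the dataset.

```python
import math

# Per-dimension rates from the table, as fractions.
rates = {
    "structure": 0.023,
    "schema": 0.031,
    "latency": 0.047,
    "cost": 0.018,
    "identity": 0.007,
    "integrity": 0.019,
}

# If no response ever fails in two dimensions at once, the rates simply add up.
upper_bound = sum(rates.values())

# If failures in different dimensions are statistically independent,
# the share of responses with at least one failure is 1 - P(no failure).
independent = 1 - math.prod(1 - rate for rate in rates.values())

print(f"sum of rates:         {upper_bound:.1%}")   # 14.5%
print(f"assuming independence: {independent:.1%}")  # 13.7%
```

The plain sum is an upper bound. Any overlap between dimensions, such as a truncated response that is also slow, lowers the combined figure.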

The Validation Performance Question

"But won't six-dimensional validation slow down my API calls?"

No. With Correctover's MAPE-K decision engine:

  • P50 validation overhead: 22 microseconds
  • P99 validation overhead: 99 microseconds
  • Total overhead: less than 0.01% of request time

This is not a trade-off. You get reliability without performance sacrifice.
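Overhead figures like these depend on hardware and on what the validator does, so it is worth measuring your own. The helper below is a generic timing sketch using the standard library. `measure_overhead_us` is a hypothetical name and does not reproduce how the numbers above were obtained.

```python
import statistics
import time
from typing import Callable


def measure_overhead_us(validate: Callable[[dict], object], payload: dict, runs: int = 10_000) -> dict:
    """Time `validate(payload)` repeatedly and return P50 and P99 in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        validate(payload)
        samples.append((time.perf_counter_ns() - start) / 1_000)  # ns to microseconds
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points between percentiles
    return {"p50": cuts[49], "p99": cuts[98]}
```

Pass it any validator, for example a function that runs all of your contract checks on a representative payload, and compare the result with your typical request time.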

Implementation: 3 Lines to 6-Dimensional Reliability

from correctover import Correctover, Contract

client = Correctover(
    api_key="your-openai-key",  # BYOK - your key, direct connection
    contract=Contract.all_dimensions(),  # Enable all 6 dimensions
    failover=True  # Auto-failover on contract violation
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
# If any dimension fails, automatic failover kicks in
# You always get a validated response

What Failover Actually Means

Here is the key insight: Failover is not Correctover.

A simple failover switches to another provider when one fails. But it does not verify that the new provider's response is any better. You might fail over from one broken response to another.

Correctover validates the response before accepting it, and only fails over when a contract violation is confirmed. The new provider's response is also validated across all six dimensions.

Provider A → Validate (6D) → Contract Violated → Failover → Provider B → Validate (6D) → Contract Met → Return

Not:

Provider A → Timeout → Failover → Provider B → Return (unchecked)
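The difference between the two flows can be sketched as a loop that validates every candidate response before returning it. This is a simplified illustration, not Correctover's implementation. The providers are assumed to be plain callables that take a prompt and return a parsed response, and `validate` is assumed to return a list of violations.

```python
from typing import Callable, Sequence


def call_with_validated_failover(
    prompt: str,
    providers: Sequence[Callable[[str], dict]],
    validate: Callable[[dict], list],
) -> dict:
    """Try providers in order and return the first response with no violations."""
    failures = []
    for index, provider in enumerate(providers):
        try:
            response = provider(prompt)
        except Exception as exc:  # transport errors, timeouts, and similar
            failures.append((index, [f"call failed: {exc}"]))
            continue

        violations = validate(response)
        if not violations:
            return response  # contract met, safe to return
        failures.append((index, violations))  # contract violated, try the next provider

    raise RuntimeError(f"no provider met the contract: {failures}")
```

The response from the fallback provider goes through the same `validate` call as the first one, which is the point of the validated flow. When every provider fails, this sketch raises instead of returning an unchecked response.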

The Data Behind It

Our 20,000+ call reliability dataset revealed:

  • 303 unique failure types classified across 6 dimensions
  • 87 built-in self-healing rules covering common failure patterns
  • L3 Failover end-to-end: 949ms (including validation)
  • Zero false positives at the contract validation layer

These are not theoretical numbers. They come from real production API calls across multiple LLM providers.

Start Using It Today

pip install correctover

Documentation | PyPI


This is the fifth article in the LLM Reliability series. Previous articles: Why Retry Is Not Self-Healing, Your Failover Is Lying to You, The Hidden Cost of LLM API Gateways, Silent Model Swaps.