DEV Community

correctover

LLM Failover vs Verified Failover: Why Switching APIs Is Not Enough

When an LLM API provider goes down, most tools switch to a backup. That's failover. But the backup might return a broken response — and you'd never know.

Correctover (pip install correctover) introduced the concept of verified failover: validate every response from a backup provider before accepting it.

The Problem With Standard Failover

Standard failover detects a provider outage and routes to the next provider. But "outage" is just one failure mode. Consider:

  • Truncation: OpenAI returns 500 tokens instead of 2000. HTTP 200, but the user sees only a quarter of the response.
  • Schema drift: Provider A returns {"content": [...]} but Provider B returns {"text": "..."}. Your parser breaks.
  • Cost spike: Failover from GPT-4o ($2.50 per 1M input tokens) to Claude Opus ($15 per 1M input tokens). The request works, but the input bill is 6x.
  • Format inconsistency: JSON output requested, but backup returns markdown. Downstream pipeline chokes.

In every case, failover "worked" — you got a response. But the response violated your contract. This is the silent failure problem.
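Several of these failure modes can be detected mechanically from the response body. The following is a minimal sketch, not Correctover's implementation: it assumes an OpenAI-style chat completion payload (where a `finish_reason` of `"length"` means the output was cut off at the token limit), and `find_silent_failures` is a hypothetical helper name.

```python
import json


def find_silent_failures(response: dict, expect_json: bool = True) -> list[str]:
    """Return contract violations found in a response that arrived with HTTP 200.

    Hypothetical helper. Assumes an OpenAI-style payload in which
    response["choices"][0] holds "finish_reason" and "message" -> "content".
    """
    choices = response.get("choices")
    if not choices:
        # Schema drift: the expected top-level structure is missing.
        return ["schema: no 'choices' field"]

    choice = choices[0]
    content = (choice.get("message") or {}).get("content")
    if not isinstance(content, str):
        return ["schema: message content is not a string"]

    problems = []

    # Truncation: the model stopped because it hit the token limit.
    if choice.get("finish_reason") == "length":
        problems.append("truncation: finish_reason is 'length'")

    # Format inconsistency: JSON was requested but something else came back.
    if expect_json:
        try:
            json.loads(content)
        except json.JSONDecodeError:
            problems.append("format: content is not valid JSON")

    return problems


ok = {"choices": [{"finish_reason": "stop",
                   "message": {"content": '{"port": 8080}'}}]}
cut_off = {"choices": [{"finish_reason": "length",
                        "message": {"content": '{"port": 80'}}]}
drifted = {"text": "..."}

print(find_silent_failures(ok))       # []
print(find_silent_failures(cut_off))  # truncation and format problems
print(find_silent_failures(drifted))  # ["schema: no 'choices' field"]
```

All three payloads would pass a check that only looks at the HTTP status code.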

What Verified Failover Does Differently

Verified failover adds a validation step between provider response and application delivery:

Standard Failover:
Provider A fails → route to Provider B → deliver response

Verified Failover (Correctover):
Provider A fails → route to Provider B → validate against the 6-dimension contract → accept or roll back
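The two flows differ only in the validation step, which a short generic sketch can make concrete. Everything here is an assumption for illustration: providers are plain callables, the validator is a simple JSON check, and "rollback" is interpreted as rejecting the response and trying the next provider. Correctover's own behavior may differ.

```python
import json
from typing import Callable, Iterable, Tuple

# A provider is a (name, callable) pair; the callable returns text or raises.
Provider = Tuple[str, Callable[[str], str]]


def is_valid_json(text: str) -> bool:
    """Stand-in contract check: accept only parseable JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def verified_failover(prompt: str, providers: Iterable[Provider],
                      is_valid: Callable[[str], bool]) -> Tuple[str, str]:
    """Try providers in order; accept only a response that passes validation."""
    rejected = []
    for name, call in providers:
        try:
            response = call(prompt)
        except Exception as exc:
            # Classic failover case: the provider itself failed.
            rejected.append((name, f"error: {exc}"))
            continue
        if is_valid(response):
            return name, response
        # Verified failover case: the provider answered, but badly.
        rejected.append((name, "contract violation"))
    raise RuntimeError(f"no provider returned a valid response: {rejected}")


def primary(prompt: str) -> str:
    raise ConnectionError("provider down")


def backup_markdown(prompt: str) -> str:
    return "# Config\n- port: 8080"


def backup_json(prompt: str) -> str:
    return '{"port": 8080}'


providers = [("primary", primary),
             ("backup_markdown", backup_markdown),
             ("backup_json", backup_json)]
name, text = verified_failover("Write a JSON config for a web server",
                               providers, is_valid_json)
print(name, text)  # backup_json {"port": 8080}
```

Standard failover would have stopped at `backup_markdown`, since that call succeeded at the transport level.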

The 6-Dimension Contract

| Dimension | Why It Matters |
|---|---|
| Schema | Prevents parser crashes from structural mismatches |
| Latency | Avoids swapping a fast provider for a slow one |
| Cost | Prevents budget blowouts from expensive backup providers |
| Completeness | Catches truncation and partial responses |
| Identity | Ensures the right provider served the response |
| Integrity | Detects corrupted or malformed responses |
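One way to picture the six dimensions is as six independent checks over a single response record. The sketch below is an illustration under stated assumptions, not Correctover's internals: the `Response` and `Contract` classes, their field names, and the specific rule chosen for each dimension are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record of one provider response."""
    provider: str
    content: str
    finish_reason: str
    latency_ms: float
    cost_usd: float


@dataclass
class Contract:
    """Hypothetical six-dimension contract; each rule is an assumption."""
    allowed_providers: set
    max_latency_ms: float
    max_cost_usd: float
    require_json: bool = True

    def violations(self, r: Response) -> list[str]:
        found = []
        if r.provider not in self.allowed_providers:
            found.append("identity")      # served by an unexpected provider
        if r.latency_ms > self.max_latency_ms:
            found.append("latency")       # slower than the budget allows
        if r.cost_usd > self.max_cost_usd:
            found.append("cost")          # more expensive than the budget allows
        if r.finish_reason != "stop":
            found.append("completeness")  # e.g. cut off at the token limit
        if not r.content.strip():
            found.append("integrity")     # empty or whitespace-only body
        elif self.require_json:
            try:
                json.loads(r.content)
            except json.JSONDecodeError:
                found.append("schema")    # not the structure the parser expects
        return found


contract = Contract(allowed_providers={"openai", "anthropic"},
                    max_latency_ms=10000, max_cost_usd=0.01)

good = Response("anthropic", '{"port": 8080}', "stop", 850.0, 0.004)
bad = Response("anthropic", "# Web server config", "length", 12000.0, 0.03)

print(contract.violations(good))  # []
print(contract.violations(bad))   # ['latency', 'cost', 'completeness', 'schema']
```

Returning the list of violated dimensions, rather than a single boolean, makes it possible to log why a backup response was rejected.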

Real-World Example

from correctover import NeuralReliabilityEngine

engine = NeuralReliabilityEngine()

# This call will automatically fail over if the primary provider fails
# But unlike standard failover, it validates the backup response first
response = engine.chat_completion(
    messages=[{"role": "user", "content": "Write a JSON config for a web server"}],
    providers=["openai", "anthropic"],
    contract={
        "require_json": True,           # Reject non-JSON responses
        "max_latency_ms": 10000,        # Reject slow responses
        "min_completion_ratio": 0.8     # Reject truncated responses
    }
)

# If Anthropic returns markdown instead of JSON,
# Correctover rejects it and falls back
print(response["choices"][0]["message"]["content"])

Why This Matters for Production

In production, LLM calls are part of pipelines — agents, data processors, customer-facing features. A silent failure from a backup provider can:

  • Corrupt a database with malformed data
  • Send customers incorrect information
  • Break downstream automation
  • Waste money on overpriced backup providers

Verified failover catches these cases before they reach your application.

Correctover vs Traditional Gateways

| Aspect | Gateway Failover | Correctover Verified Failover |
|---|---|---|
| Detection | HTTP status code | 6-dimension contract |
| Validation | None | Schema + Latency + Cost + Completeness + Identity + Integrity |
| Rollback | Not supported | Automatic on contract failure |
| Latency overhead | 5-50 ms (proxy hop) | 22 µs (in-process) |
| Deployment | Proxy server | Embedded SDK |
| Pricing | Per-token markup | BYOK, zero markup |

Getting Started

pip install correctover

Then wrap your existing OpenAI/Anthropic calls with NeuralReliabilityEngine for automatic verified failover.

Key Takeaway

Failover switches providers. Correctover verifies the switch worked. In the era of multi-provider LLM architectures, that distinction is the difference between "the system is up" and "the system is correct."


Website: correctover.com | Documentation: correctover.com/llms.txt | PyPI: pip install correctover
