DEV Community

correctover

Why Retry Is Not Self-Healing: A Technical Deep Dive for LLM APIs

Many LLM API wrappers claim to be "self-healing." What most of them actually do is retry the same request or switch to another provider on error.

That's not self-healing. That's hope-driven development.

The Retry Fallacy

Here's what retry solves:

# Retry logic
if response.status_code == 429:  # Rate limited
    wait_and_retry()

Here's what retry doesn't solve:

  • The response was truncated but returned 200 OK
  • The response has the right schema but semantically wrong content
  • The backup provider is also degraded (just slower, not down)
  • The cost per token just doubled and nobody noticed

Retrying a broken pipe doesn't fix the water. It just sends more water down the same broken pipe.
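To make the gap concrete, here is a minimal sketch of a status-code retry wrapper. The `Response` class, `call_with_retry`, and the simulated provider are hypothetical names invented for illustration, not part of any real SDK. The `"length"` finish reason follows the convention some chat APIs use for output cut off at the token limit.

```python
import time
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical, simplified stand-in for a provider response."""
    status_code: int
    text: str
    finish_reason: str  # "stop" = complete, "length" = cut off at the token limit


def call_with_retry(call, max_attempts=3, base_delay_s=0.5):
    """Retry on 429 with exponential backoff; return anything else as-is."""
    response = None
    for attempt in range(max_attempts):
        response = call()
        if response.status_code != 429:
            return response  # a 200 is accepted without looking at the body
        if attempt < max_attempts - 1:
            time.sleep(base_delay_s * (2 ** attempt))
    return response


# Simulated provider: rate-limited once, then a truncated answer with 200 OK.
replies = iter([
    Response(429, "", ""),
    Response(200, "The capital of France is", "length"),
])

result = call_with_retry(lambda: next(replies), base_delay_s=0)
print(result.status_code, result.finish_reason)  # prints "200 length"
```

The retry handled the rate limit exactly as designed, and the truncated answer still reached the caller because nothing inspected it.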

What Real Self-Healing Looks Like

Self-healing requires three capabilities that retry alone cannot provide:

1. Contract Validation

Before accepting any response, verify it meets your contract:

contract = {
    "status": {"max_errors": 0},           # No HTTP errors
    "schema": {"type": "object", "required": ["answer"]},  # Structure check
    "completeness": {"finish_reason": "stop"},  # No truncation
    "latency": {"max_ms": 5000},            # Performance bound
    "cost": {"max_per_1k_tokens": 0.03},    # Cost ceiling
    "drift": {"max_semantic_delta": 0.15},   # Cross-provider consistency
}

Each dimension is independently configurable. A response that fails any single check triggers failover.
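As a rough sketch of how such a contract could be checked, the function below walks the single-response dimensions and reports which ones failed. Everything here is an assumption made for illustration: `ProviderResponse`, its field names, and `contract_violations` are hypothetical and do not describe Correctover's internals. The drift dimension is left out because it needs a second response to compare against.

```python
import json
from dataclasses import dataclass


@dataclass
class ProviderResponse:
    """Hypothetical normalized response; field names are assumptions."""
    status_code: int
    body: str  # raw model output, expected to be a JSON object
    finish_reason: str
    latency_ms: float
    cost_per_1k_tokens: float


def contract_violations(response, contract):
    """Return the list of violated dimensions; an empty list means PASS."""
    violations = []
    if "status" in contract and response.status_code >= 400:
        violations.append("status")
    if "schema" in contract:
        try:
            data = json.loads(response.body)
        except json.JSONDecodeError:
            data = None
        required = contract["schema"].get("required", [])
        if not isinstance(data, dict) or any(key not in data for key in required):
            violations.append("schema")
    if "completeness" in contract:
        if response.finish_reason != contract["completeness"]["finish_reason"]:
            violations.append("completeness")
    if "latency" in contract and response.latency_ms > contract["latency"]["max_ms"]:
        violations.append("latency")
    if "cost" in contract:
        if response.cost_per_1k_tokens > contract["cost"]["max_per_1k_tokens"]:
            violations.append("cost")
    return violations


contract = {
    "status": {"max_errors": 0},
    "schema": {"type": "object", "required": ["answer"]},
    "completeness": {"finish_reason": "stop"},
    "latency": {"max_ms": 5000},
    "cost": {"max_per_1k_tokens": 0.03},
}

truncated = ProviderResponse(200, '{"answer": "Par', "length", 850.0, 0.01)
complete = ProviderResponse(200, '{"answer": "Paris"}', "stop", 850.0, 0.01)

print(contract_violations(truncated, contract))  # ['schema', 'completeness']
print(contract_violations(complete, contract))   # []
```

Returning the failed dimensions rather than a bare boolean makes it easier to log why a failover happened.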

2. Verified Failover

When a contract violation triggers failover:

# Standard failover (naive)
def naive_failover(prompt):
    return call_provider_b(prompt)  # Hope for the best

# Verified failover (Correctover)
def verified_failover(prompt, contract, providers):
    provider_b_response = call_provider_b(prompt)
    if validate_contract(provider_b_response, contract):
        return provider_b_response
    # Try provider C, or fall back to cached valid response
    return next_verified_response(prompt, contract, providers)

You never serve an unverified response to your users.
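One way the "try the next provider, or fall back to a cached valid response" step might look is sketched below. The extra `validate` and `cached` parameters, the stub providers, and the `is_complete` check are assumptions added so the example runs on its own; they are not Correctover's actual signature.

```python
def next_verified_response(prompt, contract, providers, validate, cached=None):
    """Return the first response that passes `validate`, else the cached one.

    `providers` is an ordered list of callables that take a prompt, and
    `validate` returns True when a response satisfies the contract.
    Both are hypothetical interfaces chosen for this sketch.
    """
    for call_provider in providers:
        try:
            response = call_provider(prompt)
        except Exception:
            continue  # a provider that raises counts as a failed attempt
        if validate(response, contract):
            return response
    if cached is not None:
        return cached  # last known response that passed validation
    raise RuntimeError("no provider returned a response that satisfies the contract")


# Stub providers for illustration only.
def provider_b(prompt):
    return {"answer": "Par", "finish_reason": "length"}  # truncated


def provider_c(prompt):
    return {"answer": "Paris", "finish_reason": "stop"}


def is_complete(response, contract):
    return response["finish_reason"] == contract["completeness"]["finish_reason"]


contract = {"completeness": {"finish_reason": "stop"}}
result = next_verified_response(
    "What is the capital of France?", contract, [provider_b, provider_c], is_complete
)
print(result["answer"])  # prints "Paris"
```

Raising when every provider fails and no cached response exists keeps the guarantee explicit: the caller gets a verified response or an error, never an unverified one.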

3. Drift Detection

The same prompt to different providers often returns semantically different results:

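| Provider  | Response        | Status | Verdict           |
|-----------|-----------------|--------|-------------------|
| OpenAI    | "Paris"         | 200 OK | ✅ Correct        |
| Anthropic | "France"        | 200 OK | ⚠️ Drift detected |
| Google    | "Paris, France" | 200 OK | ✅ Correct        |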

Standard failover would accept all three. Correctover flags the drift and selects the verified response.
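The sketch below shows the shape of a drift check using the responses from the table. It swaps in a crude lexical measure (one minus the overlap coefficient of the word sets) where a real system would more plausibly compare embeddings, so treat `semantic_delta` and `drifted` as hypothetical stand-ins rather than how Correctover scores drift.

```python
import re


def tokens(text):
    """Lowercased word set; a crude lexical proxy for meaning."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def semantic_delta(reference, candidate):
    """1 - overlap coefficient: 0.0 when one token set contains the other."""
    a, b = tokens(reference), tokens(candidate)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / min(len(a), len(b))


def drifted(reference, candidate, max_semantic_delta=0.15):
    """True when the candidate differs from the reference by more than the bound."""
    return semantic_delta(reference, candidate) > max_semantic_delta


reference = "Paris"  # response already accepted from the primary provider
for provider, answer in [("Anthropic", "France"), ("Google", "Paris, France")]:
    print(provider, drifted(reference, answer))
# Anthropic True
# Google False
```

A lexical measure like this misses paraphrases and synonyms, which is why it only illustrates where the threshold plugs in.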

The Architecture

Request → [Provider A]
              ↓
         [Contract Validator]
         ↓ ↓ ↓ ↓ ↓ ↓
         Status | Schema | Complete | Latency | Cost | Drift
              ↓
         [PASS] → Return to App
         [FAIL] → [Provider B] → [Contract Validator] → ...

Every response passes through 6 validation checkpoints before reaching your application.

P50 Overhead: 22µs

Contract validation adds 22 microseconds at P50. For context:

  • A single LLM API call: 500-5000ms
  • Network round-trip: 1-50ms
  • Correctover validation: 0.022ms

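Even against the fastest call in that range (500 ms), validation is roughly 22,700x faster than the API call it's protecting; against a 5,000 ms call, it is roughly 227,000x faster.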

BYOK: Your Keys, Your Connection

Correctover never sees your API keys or responses:

  • You provide your own API keys
  • Calls go directly from your infrastructure to providers
  • Correctover validates locally, no proxy involved
  • Zero token markup, zero data logging

This isn't a gateway. It's a local reliability runtime.

Get Started

Install the package:

pip install correctover

Then create an engine and send a prompt:

import asyncio
import os

from correctover import CorrectoverEngine

engine = CorrectoverEngine.create({
    "providers": [
        {"name": "openai", "api_key": os.environ["OPENAI_API_KEY"], "model": "gpt-4o"},
        {"name": "anthropic", "api_key": os.environ["ANTHROPIC_API_KEY"], "model": "claude-sonnet-4-20250514"},
    ],
    "contract": {
        "max_latency_ms": 5000,
        "require_complete_response": True,
    }
})

async def main():
    # engine.chat is awaited, so it has to run inside a coroutine
    result = await engine.chat("Your prompt here")
    print(result)

asyncio.run(main())

Correctover — The Correct Version of Failover

Because failover switches. Correctover verifies.
