correctover

Posted on Jun 25

Why Retry Is Not Self-Healing: A Technical Deep Dive for LLM APIs

#llm #architecture #ai #python

Many LLM API wrappers claim to be "self-healing." What they actually do is retry the same request or switch to another provider when an error comes back.

That's not self-healing. That's hope-driven development.

The Retry Fallacy

Here's what retry solves:

# Retry logic
if response.status_code == 429:  # Rate limited
    wait_and_retry()

Here's what retry doesn't solve:

The response was truncated but returned 200 OK
The response has the right schema but semantically wrong content
The backup provider is also degraded (just slower, not down)
The cost per token just doubled and nobody noticed

Retrying doesn't fix a broken pipe. It just sends more water down the same pipe.
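To make the first item in that list concrete, here is a minimal sketch of a truncated response that a status-only check lets through. `FakeResponse` and both check functions are hypothetical names for illustration, and `finish_reason == "length"` assumes the OpenAI-style convention for a response cut off at the token limit.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a provider response."""
    status_code: int
    finish_reason: str
    text: str


def status_only_accepts(response: FakeResponse) -> bool:
    """Retry-style check: anything that is not an HTTP error is accepted."""
    return response.status_code == 200


def completeness_accepts(response: FakeResponse) -> bool:
    """Also require that the model stopped on its own, not at a token limit."""
    return response.status_code == 200 and response.finish_reason == "stop"


# A response that was cut off mid-JSON but still came back as 200 OK.
truncated = FakeResponse(status_code=200, finish_reason="length", text='{"answer": "Par')

print(status_only_accepts(truncated))   # the retry logic sees nothing wrong
print(completeness_accepts(truncated))  # the completeness check rejects it
```

A retry loop keyed on `status_code` never fires here, because from its point of view nothing failed.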

What Real Self-Healing Looks Like

Self-healing requires three capabilities that retry alone cannot provide:

1. Contract Validation

Before accepting any response, verify it meets your contract:

contract = {
    "status": {"max_errors": 0},           # No HTTP errors
    "schema": {"type": "object", "required": ["answer"]},  # Structure check
    "completeness": {"finish_reason": "stop"},  # No truncation
    "latency": {"max_ms": 5000},            # Performance bound
    "cost": {"max_per_1k_tokens": 0.03},    # Cost ceiling
    "drift": {"max_semantic_delta": 0.15},   # Cross-provider consistency
}

Each dimension is independently configurable. Failing any single check triggers failover.
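As a rough sketch of how a contract like this could be evaluated, the function below checks four of the six dimensions (status, schema, completeness, latency) and returns the names of the ones that failed. This is not Correctover's implementation: `LLMResponse` and `contract_violations` are hypothetical names, and the cost and drift checks are left out because they need pricing data and a second response to compare against.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LLMResponse:
    """Hypothetical normalized view of a provider response."""
    status_code: int
    body: dict[str, Any]
    finish_reason: str
    latency_ms: float


def contract_violations(response: LLMResponse, contract: dict[str, dict[str, Any]]) -> list[str]:
    """Return the names of the contract dimensions the response fails."""
    violations = []

    # Status: any 4xx/5xx counts as an HTTP error.
    if "status" in contract and response.status_code >= 400:
        violations.append("status")

    # Schema: only the "required" keys are checked in this sketch.
    schema = contract.get("schema")
    if schema is not None:
        if any(key not in response.body for key in schema.get("required", [])):
            violations.append("schema")

    # Completeness: the finish reason must match the expected one.
    completeness = contract.get("completeness")
    if completeness is not None and response.finish_reason != completeness["finish_reason"]:
        violations.append("completeness")

    # Latency: the response must arrive within the configured bound.
    latency = contract.get("latency")
    if latency is not None and response.latency_ms > latency["max_ms"]:
        violations.append("latency")

    return violations
```

Returning the list of failed dimensions, rather than a bare boolean, keeps each check independent and tells you why a failover happened.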

2. Verified Failover

When a contract violation triggers failover:

# Standard failover (naive)
provider_b_response = call_provider_b(prompt)
return provider_b_response  # Hope for the best

# Verified failover (Correctover)
provider_b_response = call_provider_b(prompt)
if validate_contract(provider_b_response, contract):
    return provider_b_response
else:
    # Try provider C, or fall back to cached valid response
    return next_verified_response(prompt, contract, providers)

You never serve an unverified response to your users.
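One way the "try the next provider, then fall back to cache" step could look is sketched below. The function name, the callable-based provider list, and the `cached` argument are all assumptions made for illustration, not the library's actual signature.

```python
from typing import Callable, Optional

Provider = Callable[[str], dict]   # takes a prompt, returns a response dict
Validator = Callable[[dict], bool]  # True if the response meets the contract


def first_verified_response(
    prompt: str,
    providers: list[Provider],
    is_valid: Validator,
    cached: Optional[dict] = None,
) -> Optional[dict]:
    """Return the first response that passes validation, else the cached one."""
    for call in providers:
        try:
            response = call(prompt)
        except Exception:
            # In this sketch a provider that raises is treated like a failed check.
            continue
        if is_valid(response):
            return response
    # No provider produced a valid response: fall back to the last known-good
    # answer, which is None if nothing was cached.
    return cached
```

Every response goes through the same `is_valid` check regardless of which provider produced it, so the backup never gets a free pass.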

3. Drift Detection

The same prompt to different providers often returns semantically different results:

Provider	Response	Status	Verdict
OpenAI	"Paris"	200 OK	✅ Correct
Anthropic	"France"	200 OK	⚠️ Drift detected
Google	"Paris, France"	200 OK	✅ Correct

Standard failover would accept all three. Correctover flags the drift and selects the verified response.
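To show the shape of a drift check, here is a deliberately crude sketch that scores two responses by bag-of-words cosine distance. This is a toy stand-in, not Correctover's drift metric: `semantic_delta` is a hypothetical name, and a real check would more likely compare embeddings, since a purely lexical measure also penalizes harmless rewordings such as "Paris" versus "Paris, France".

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def semantic_delta(a: str, b: str) -> float:
    """Cosine distance between token counts: 0.0 is identical, 1.0 is disjoint."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(count * vb[token] for token, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    if norm == 0:
        return 1.0  # treat an empty response as maximally different
    return 1.0 - dot / norm


reference = "Paris"
for candidate in ["Paris", "France", "Paris, France"]:
    print(candidate, round(semantic_delta(reference, candidate), 3))
```

With this toy metric, "France" scores as fully disjoint from the reference, while "Paris, France" lands in between. Whether an in-between score counts as drift depends on the threshold and on how good the similarity measure is.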

The Architecture

Request → [Provider A]
              ↓
         [Contract Validator]
         ↓ ↓ ↓ ↓ ↓ ↓
         Status | Schema | Complete | Latency | Cost | Drift
              ↓
         [PASS] → Return to App
         [FAIL] → [Provider B] → [Contract Validator] → ...

Every response passes through all six validation checks before reaching your application.

P50 Overhead: 22µs

Contract validation adds 22 microseconds at P50. For context:

A single LLM API call: 500-5000ms
Network round-trip: 1-50ms
Correctover validation: 0.022ms

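Even against the fastest call in that range (500ms), validation is roughly 22,000x faster than the API call it's protecting. Against a 5000ms call, the gap is closer to 227,000x.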
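If you want to check an overhead figure like this on your own hardware, a median over many timed runs is a reasonable starting point. The helper below is a generic sketch using only the standard library; `p50_overhead_us` is a hypothetical name, and `check` stands for whatever validation function you are measuring.

```python
import statistics
import time
from typing import Callable


def p50_overhead_us(check: Callable[[], object], runs: int = 10_000) -> float:
    """Time `check` repeatedly and return the median duration in microseconds."""
    samples_ns = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        check()
        samples_ns.append(time.perf_counter_ns() - start)
    # The median of the samples is the P50.
    return statistics.median(samples_ns) / 1_000


# Example: time a trivial stand-in for a validation call.
print(p50_overhead_us(lambda: "answer" in {"answer": "Paris"}))
```

The number you get will vary with the machine, the Python version, and what the check actually does, so treat any single figure as a measurement, not a guarantee.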

BYOK: Your Keys, Your Connection

Correctover never sees your API keys or responses:

You provide your own API keys
Calls go directly from your infrastructure to providers
Correctover validates locally, no proxy involved
Zero token markup, zero data logging

This isn't a gateway. It's a local reliability runtime.

Get Started

import asyncio
import os

from correctover import CorrectoverEngine

engine = CorrectoverEngine.create({
    "providers": [
        {"name": "openai", "api_key": os.environ["OPENAI_API_KEY"], "model": "gpt-4o"},
        {"name": "anthropic", "api_key": os.environ["ANTHROPIC_API_KEY"], "model": "claude-sonnet-4-20250514"},
    ],
    "contract": {
        "max_latency_ms": 5000,
        "require_complete_response": True,
    }
})


async def main():
    # engine.chat is awaited, so it has to run inside a coroutine
    result = await engine.chat("Your prompt here")
    print(result)


asyncio.run(main())

pip install correctover

Correctover — The Correct Version of Failover

Because failover switches. Correctover verifies.

DEV Community
