Why Retry Is Not Self-Healing: A Technical Deep Dive for LLM APIs
Many LLM API wrappers claim to be "self-healing." What most of them actually do is retry the same request or switch to another provider on error.
That's not self-healing. That's hope-driven development.
The Retry Fallacy
Here's what retry solves:
# Retry logic
if response.status_code == 429:  # Rate limited
    wait_and_retry()
Here's what retry doesn't solve:
- The response was truncated but returned 200 OK
- The response has the right schema but semantically wrong content
- The backup provider is also degraded (just slower, not down)
- The cost per token just doubled and nobody noticed
Retrying a broken pipe doesn't fix the water. It just sends more water down the same broken pipe.
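To make the first bullet concrete, here is a minimal sketch of a truncated response slipping past status-based retry logic. The response shape (a plain dict with `status_code`, `finish_reason`, and `text`) and both helper names are assumptions for illustration. The `"length"` finish reason is how OpenAI-style chat APIs mark an answer that was cut off by the token limit.

```python
# Hypothetical response shape: a plain dict, not any particular SDK's object.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def should_retry(response: dict) -> bool:
    """Status-based retry: the only signal it looks at is the HTTP code."""
    return response["status_code"] in RETRYABLE_STATUS


def is_truncated(response: dict) -> bool:
    """Content-level check: anything other than a clean "stop" is suspect."""
    return response.get("finish_reason") != "stop"


truncated = {
    "status_code": 200,
    "finish_reason": "length",  # cut off by the token limit
    "text": '{"answer": "The capital of Fr',
}

print(should_retry(truncated))  # False: retry logic sees nothing wrong
print(is_truncated(truncated))  # True: the payload is incomplete
```

The retry path never fires because nothing about the transport failed. Only a check on the content itself catches the problem.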
What Real Self-Healing Looks Like
Self-healing requires three capabilities that retry alone cannot provide:
1. Contract Validation
Before accepting any response, verify it meets your contract:
contract = {
    "status": {"max_errors": 0},                            # No HTTP errors
    "schema": {"type": "object", "required": ["answer"]},  # Structure check
    "completeness": {"finish_reason": "stop"},              # No truncation
    "latency": {"max_ms": 5000},                            # Performance bound
    "cost": {"max_per_1k_tokens": 0.03},                    # Cost ceiling
    "drift": {"max_semantic_delta": 0.15},                  # Cross-provider consistency
}
Each dimension is independently configurable. Failing any single check triggers failover.
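One way the per-dimension checks might be wired together is sketched below. This is not Correctover's implementation: the `Response` dataclass, its field names, and the `violations` helper are hypothetical, and the drift value is assumed to be computed elsewhere and passed in.

```python
import json
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical normalized response; field names are illustrative."""
    status_code: int
    body: str              # raw text returned by the model
    finish_reason: str
    latency_ms: float
    cost_per_1k_tokens: float
    semantic_delta: float  # assumed to come from a separate drift check


def violations(resp: Response, contract: dict) -> list[str]:
    """Return the name of every contract dimension the response fails."""
    failed = []
    if resp.status_code >= 400:  # assumption: any 4xx/5xx counts as an error
        failed.append("status")
    try:
        payload = json.loads(resp.body)
    except json.JSONDecodeError:
        payload = None
    required = contract["schema"]["required"]
    if not isinstance(payload, dict) or any(k not in payload for k in required):
        failed.append("schema")
    if resp.finish_reason != contract["completeness"]["finish_reason"]:
        failed.append("completeness")
    if resp.latency_ms > contract["latency"]["max_ms"]:
        failed.append("latency")
    if resp.cost_per_1k_tokens > contract["cost"]["max_per_1k_tokens"]:
        failed.append("cost")
    if resp.semantic_delta > contract["drift"]["max_semantic_delta"]:
        failed.append("drift")
    return failed


contract = {
    "status": {"max_errors": 0},
    "schema": {"type": "object", "required": ["answer"]},
    "completeness": {"finish_reason": "stop"},
    "latency": {"max_ms": 5000},
    "cost": {"max_per_1k_tokens": 0.03},
    "drift": {"max_semantic_delta": 0.15},
}

ok = Response(200, '{"answer": "Paris"}', "stop", 820.0, 0.01, 0.02)
cut = Response(200, '{"answer": "Par', "length", 820.0, 0.01, 0.02)

print(violations(ok, contract))   # []
print(violations(cut, contract))  # ['schema', 'completeness']
```

Returning the list of failed dimensions, rather than a single boolean, makes it easier to log why a response was rejected before failing over.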
2. Verified Failover
When a contract violation triggers failover:
# Standard failover (naive)
provider_b_response = call_provider_b(prompt)
return provider_b_response  # Hope for the best

# Verified failover (Correctover)
provider_b_response = call_provider_b(prompt)
if validate_contract(provider_b_response, contract):
    return provider_b_response
else:
    # Try provider C, or fall back to cached valid response
    return next_verified_response(prompt, contract, providers)
You never serve an unverified response to your users.
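The `next_verified_response` call in the snippet above is left undefined, so here is one possible shape for it. Everything in this sketch is an assumption: providers are modeled as plain callables, `validate` is passed in as an extra parameter rather than imported, and the cache is a simple dict keyed by prompt.

```python
from typing import Callable, Optional

Provider = Callable[[str], dict]          # hypothetical: prompt -> response dict
Validator = Callable[[dict, dict], bool]  # hypothetical: (response, contract) -> ok?


def next_verified_response(
    prompt: str,
    contract: dict,
    providers: list[Provider],
    validate: Validator,
    cache: Optional[dict] = None,
) -> dict:
    """Try each provider in order and return the first response that passes."""
    for call in providers:
        try:
            response = call(prompt)
        except Exception:
            continue  # a provider that raises is treated like a failed check
        if validate(response, contract):
            if cache is not None:
                cache[prompt] = response  # remember the last known-good answer
            return response
    if cache is not None and prompt in cache:
        return cache[prompt]  # fall back to a previously verified response
    raise RuntimeError("no provider returned a response that satisfies the contract")


# Toy providers and validator, for demonstration only.
def provider_b(prompt: str) -> dict:
    return {"text": "Par", "finish_reason": "length"}


def provider_c(prompt: str) -> dict:
    return {"text": "Paris", "finish_reason": "stop"}


def complete(response: dict, contract: dict) -> bool:
    return response["finish_reason"] == contract["finish_reason"]


result = next_verified_response(
    "Capital of France?", {"finish_reason": "stop"}, [provider_b, provider_c], complete
)
print(result["text"])  # Paris
```

Raising when every provider fails and no cached answer exists keeps the guarantee intact: the caller gets an explicit error instead of an unverified response.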
3. Drift Detection
The same prompt to different providers often returns semantically different results:
| Provider | Response | Status | Verdict |
|---|---|---|---|
| OpenAI | "Paris" | 200 OK | ✅ Correct |
| Anthropic | "France" | 200 OK | ⚠️ Drift detected |
| (unnamed provider) | "Paris, France" | 200 OK | ✅ Correct |
Standard failover would accept all three. Correctover flags the drift and selects the verified response.
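The post does not say how the semantic delta is computed; a production check would more likely rely on embeddings or a judge model. As a stand-in, the sketch below uses a crude lexical overlap measure so the example runs with no dependencies. The function names, and the choice of the first provider's answer as the reference, are assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude proxy for meaning."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_delta(reference: str, candidate: str) -> float:
    """0.0 when one answer's words contain the other's, 1.0 when they share none."""
    a, b = tokens(reference), tokens(candidate)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / min(len(a), len(b))


MAX_DELTA = 0.15  # same threshold as the contract's max_semantic_delta
reference = "Paris"

for answer in ["Paris", "France", "Paris, France"]:
    delta = lexical_delta(reference, answer)
    verdict = "ok" if delta <= MAX_DELTA else "drift"
    print(f"{answer!r}: delta={delta:.2f} -> {verdict}")
```

With "Paris" as the reference, this flags "France" and accepts "Paris, France", matching the verdicts in the table. A word-overlap measure would miss paraphrases and negations, which is why a real drift check needs a genuinely semantic comparison.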
The Architecture
Request → [Provider A]
               ↓
      [Contract Validator]
   ↓       ↓        ↓         ↓       ↓      ↓
Status | Schema | Complete | Latency | Cost | Drift
               ↓
   [PASS] → Return to App
   [FAIL] → [Provider B] → [Contract Validator] → ...
Every response passes through 6 validation checkpoints before reaching your application.
P50 Overhead: 22µs
Contract validation adds 22 microseconds at P50. For context:
- A single LLM API call: 500-5000ms
- Network round-trip: 1-50ms
- Correctover validation: 0.022ms
Even against the fastest API call in that range (500ms), validation is roughly 22,000x faster than the call it's protecting.
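The 22µs figure is the author's measurement. To check the overhead of your own checks, a rough approach is to time the validator in isolation and take the median. The `validate` function here is a trivial stand-in, not Correctover's validator, so the number it prints says nothing about the real product.

```python
import statistics
import time


def validate(response: dict) -> bool:
    """Stand-in validator: two cheap field checks."""
    return response.get("finish_reason") == "stop" and "answer" in response


def p50_overhead_us(fn, arg, runs: int = 10_000) -> float:
    """Median wall-clock time of fn(arg), in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn(arg)
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples) / 1_000


sample = {"finish_reason": "stop", "answer": "Paris"}
print(f"P50 validation overhead: {p50_overhead_us(validate, sample):.3f} µs")
```

The median is used rather than the mean because occasional scheduler or garbage-collection pauses would otherwise skew the result.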
BYOK: Your Keys, Your Connection
Correctover never sees your API keys or responses:
- You provide your own API keys
- Calls go directly from your infrastructure to providers
- Correctover validates locally, no proxy involved
- Zero token markup, zero data logging
This isn't a gateway. It's a local reliability runtime.
Get Started
import asyncio
import os

from correctover import CorrectoverEngine

engine = CorrectoverEngine.create({
    "providers": [
        {"name": "openai", "api_key": os.environ["OPENAI_API_KEY"], "model": "gpt-4o"},
        {"name": "anthropic", "api_key": os.environ["ANTHROPIC_API_KEY"], "model": "claude-sonnet-4-20250514"},
    ],
    "contract": {
        "max_latency_ms": 5000,
        "require_complete_response": True,
    },
})

async def main():
    # await is only valid inside a coroutine, so the call is wrapped here
    result = await engine.chat("Your prompt here")
    print(result)

asyncio.run(main())
pip install correctover
Correctover — The Correct Version of Failover
Because failover switches. Correctover verifies.