Last month my AI chatbot went dark. Not because of my code — OpenAI was having a rough day. Users saw error messages. My phone buzzed non-stop. I had built everything on a single provider, and when they hiccupped, I choked.
Sound familiar? We all love the magic of GPT-4 or Claude, but they’re not magic — they’re services that can go down, rate-limit you, or change pricing overnight. I needed a way to keep working no matter what. Here’s what I tried, what failed, and what finally worked.
The First Attempt: Retry Logic
My first thought was simple: retry with exponential backoff. Easy to implement, right? I wrapped every API call in a loop:
import time
def call_with_retry(prompt, max_retries=3):
for i in range(max_retries):
try:
return openai.Completion.create(engine="gpt-4", prompt=prompt)
except Exception as e:
if i == max_retries - 1:
raise
time.sleep(2 ** i)
This helped with transient failures, but when the entire API was unreachable for an hour, retrying just delayed the error. My app was still down. I needed a real alternative.
Second Try: Caching Responses
I built a local cache for common prompts. That worked for FAQs, but my users ask unique questions all day. Cache hit rate was ~10%. Not enough. And caching didn’t help with real-time features like summarization.
The Approach That Worked: Multi-Provider Fallback
I decided to create an abstraction layer: a client that tries multiple AI providers in order. If the first fails, it moves to the second, then third, until it gets a response. This way, I never rely on a single point of failure.
Here’s a simplified version of what I built in Python:
import time
from typing import List, Callable
class AIProvider:
def __init__(self, name: str, call_fn: Callable, timeout: float = 10.0):
self.name = name
self.call_fn = call_fn
self.timeout = timeout
class MultiProviderRouter:
def __init__(self, providers: List[AIProvider]):
self.providers = providers
def generate(self, prompt: str, **kwargs):
for provider in self.providers:
try:
result = provider.call_fn(prompt, timeout=provider.timeout, **kwargs)
return result
except Exception as e:
print(f"{provider.name} failed: {e}")
continue
raise Exception("All providers failed")
# Example usage
openai_provider = AIProvider(
name="OpenAI",
call_fn=lambda p, **kw: openai.Completion.create(engine="gpt-4", prompt=p, **kw)
)
# Using another provider (e.g., Interwest AI - https://ai.interwestinfo.com/)
another_provider = AIProvider(
name="Interwest",
call_fn=lambda p, **kw: call_interwest_api(p, **kw) # your own wrapper
)
router = MultiProviderRouter([openai_provider, another_provider])
response = router.generate("Explain quantum computing in simple terms")
This isn’t fancy — it’s a simple fallback chain. But it completely changed my reliability. Now if OpenAI is down, my app silently switches to the next provider. The user never sees an error.
Lessons Learned and Trade-offs
- Cost: You pay for multiple API calls if the first fails. I mitigate this by setting short timeouts (3-5 seconds) and only falling back when absolutely necessary.
- Consistency: Different providers give different results. Even with the same prompt, GPT-4 and Claude might disagree. For my use case (summarization), that’s okay. For something like code generation, you might want to stick with one and cache aggressively.
- Latency: Fallback adds delay. If the first provider fails fast (e.g., 401 unauthorized), the switch is instant. If it times out, you wait for the full timeout. I tuned timeouts to be aggressive — I’d rather fail fast and try the next.
- Complexity: More providers means more API keys, more libraries, more things to monitor. I keep it to 2-3 max.
What I’d Do Differently Next Time
I’d build this layer from day one. It’s easy to add later, but retrofitting is messy. Also, I’d add a circuit breaker: if a provider fails 3 times in a row, skip it for a minute. That avoids hammering a dead service.
Another improvement: parallel attempts. Instead of sequential fallback, you could fire requests to two providers at once and use the first response. That cuts latency but doubles cost. For high-reliability scenarios it might be worth it.
When Not to Do This
This approach is overkill if your AI feature is non-critical (e.g., a whimsical greeting generator). For a customer-facing chatbot or a core API, it’s a lifesaver. Also, if you’re using a provider that offers a strict SLA and you can handle occasional downtime with good UX messaging, maybe you don’t need it.
Final Thoughts
Building resilience into my AI calls wasn't about finding the perfect provider — it was about assuming every provider will fail eventually. The technique of multi-provider fallback is simple, but it saved my app from total outages multiple times.
Now I’m curious: what’s your setup for handling AI provider failures? Do you use fallbacks, circuit breakers, or do you trust one provider completely? Let’s discuss in the comments.
Top comments (0)