zhongqiyue

Posted on Jun 16

Why I Stopped Relying on a Single AI Provider (and Built a Fallback System)

#python #ai #api #webdev

It started as a typical side project: a real-time chatbot for a community forum. I figured I'd just throw OpenAI's API at it and be done. Simple, right? Wrong.

About three weeks into development, I got that dreaded 5xx error during a peak hour. My entire chatbot went dark. Users were annoyed, and I was scrambling. That was the day I realized I couldn't trust a single provider for anything production-ish. Here's how I built a fallback system that saved my weekends.

The Problem: Single Point of Failure

I was using the GPT-4 API for chat completions. Most of the time it worked great, but every few days I'd hit rate limits, see intermittent timeouts, or, worse, sit through a complete outage for a few hours. I could have just paid for a higher tier or used Azure, but that felt like throwing money at the symptom.

I also tried a few other providers. Each had its own quirks: different authentication, different model names, different request/response formats. Writing switch-case spaghetti was making my code ugly and fragile. I needed something that would:

Abstract away provider differences
Automatically retry with an alternative provider on failure
Cache responses where it made sense
Not lock me into any single vendor

What I Tried First (and Why It Failed)

My first attempt was a simple Python function that tried one provider, and if it failed, tried another with a few try/except blocks. It worked, but it was manual. I had to hardcode each provider's API call. When I added a third provider, the function grew into a tangled mess. Error handling was inconsistent, and I wasn't handling rate limits properly (some providers return 429, some use headers, some just drop the connection).

I also tried using a third-party library that claimed to unify all AI APIs. It was too opinionated, broke with a newer provider's API change, and had a dependency I didn't trust. I needed something more transparent and controllable.

The Approach: Provider Abstraction + Fallback Router

I settled on a simple design: define a common interface for sending a prompt and getting a response, then implement that interface for each provider. A router class then tries the providers in order, backing off after a failure and falling through to the next one.

Let me show you the core pieces.

1. The Base Class

from abc import ABC, abstractmethod

class AIProvider(ABC):
    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str:
        """Send a prompt and return the generated text."""
        pass

    @abstractmethod
    def health_check(self) -> bool:
        """Check if the provider is reachable."""
        pass
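
One practical benefit of the abstract base class is that an incomplete provider fails when it is constructed rather than halfway through a request. Here is a minimal sketch of that behavior; `HalfFinishedProvider`, `EchoProvider`, and `can_instantiate` are made-up names for illustration, and the interface is repeated only so the snippet runs on its own.

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Same interface as above, repeated so the snippet is self-contained."""

    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str: ...

    @abstractmethod
    def health_check(self) -> bool: ...


class HalfFinishedProvider(AIProvider):
    """Hypothetical provider that forgot to implement health_check()."""

    def generate(self, prompt: str, **kwargs) -> str:
        return "hello"


class EchoProvider(AIProvider):
    """Hypothetical provider that implements the full interface."""

    def generate(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"

    def health_check(self) -> bool:
        return True


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if the ABC machinery rejects it."""
    try:
        cls()
        return True
    except TypeError:
        # Raised when an abstract method is still unimplemented.
        return False
```

With this in place, a typo in a method name shows up at startup instead of during an outage, which is exactly when the fallback path runs for the first time.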

2. Concrete Implementations

Here's one for OpenAI:

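from openai import OpenAI  # requires the openai package, version 1.0 or later

class OpenAIProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "gpt-4"):
        self.api_key = api_key
        self.model = model
        # A per-instance client avoids mutating module-level state.
        self.client = OpenAI(api_key=api_key)

    def generate(self, prompt: str, **kwargs) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        # content can be None, so guard before stripping.
        return (response.choices[0].message.content or "").strip()

    def health_check(self) -> bool:
        try:
            self.client.models.list()
            return True
        except Exception:
            # Catch Exception rather than using a bare except, so that
            # KeyboardInterrupt and SystemExit still propagate.
            return False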

And one for a hypothetical alternative provider (like the one at ai.interwestinfo.com):

import requests

class InterWestProvider(AIProvider):
    BASE_URL = "https://api.interwestinfo.com/v1"

    def __init__(self, api_key: str, model: str = "iw-gpt-4"):
        self.api_key = api_key
        self.model = model

    def generate(self, prompt: str, **kwargs) -> str:
        # Note: Endpoint structure may differ; adjust as needed.
        resp = requests.post(
            f"{self.BASE_URL}/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"model": self.model, "prompt": prompt, **kwargs},
            timeout=30
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"]

    def health_check(self) -> bool:
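        try:
            resp = requests.get(f"{self.BASE_URL}/health", timeout=5)
            return resp.status_code == 200
        except requests.RequestException:
            # Covers connection errors and timeouts without swallowing
            # KeyboardInterrupt the way a bare except would.
            return False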

3. The Fallback Router

Now the fun part: a router that tries providers in order, with exponential backoff and caching.

import time

class FallbackAI:
    def __init__(self, providers: list[AIProvider], cache_size: int = 100):
        self.providers = providers
        self._cache = {}
        self.cache_size = cache_size

    def generate(self, prompt: str, use_cache: bool = True, **kwargs) -> str:
        # Key on the prompt and the kwargs so that, for example, two
        # different temperatures do not share a cache entry.
        cache_key = (prompt, repr(sorted(kwargs.items())))
        if use_cache and cache_key in self._cache:
            return self._cache[cache_key]

        last_error = None
        for i, provider in enumerate(self.providers):
            name = type(provider).__name__
            try:
                if not provider.health_check():
                    last_error = RuntimeError(f"{name} failed its health check")
                    continue
                response = provider.generate(prompt, **kwargs)
            except Exception as e:
                last_error = e
                # Only wait if there is another provider left to try.
                if i < len(self.providers) - 1:
                    wait = 2 ** i  # exponential backoff
                    print(f"Provider {name} failed: {e}. Trying the next provider in {wait}s...")
                    time.sleep(wait)
                continue

            if use_cache:
                self._cache[cache_key] = response
                # Simple FIFO eviction: drop the oldest inserted entry.
                if len(self._cache) > self.cache_size:
                    self._cache.pop(next(iter(self._cache)))
            return response

        raise RuntimeError(f"All providers failed. Last error: {last_error}")
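
The router's dict cache never expires entries, so a cached answer lives until it is evicted by size. If stale answers matter, a small in-process cache with a time-to-live is one option. The sketch below is my assumption, not part of the router above: `TTLCache`, its `ttl_seconds` parameter, and the injectable `clock` are hypothetical names.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLCache:
    """Small in-process cache with per-entry expiry and LRU eviction."""

    def __init__(self, max_size: int = 100, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used
```

Passing a fake `clock` in tests lets you advance time by hand instead of sleeping, which keeps cache tests fast and deterministic.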

4. Putting It All Together

openai_provider = OpenAIProvider(api_key="sk-...")
interwest_provider = InterWestProvider(api_key="iw-...")

ai = FallbackAI(providers=[openai_provider, interwest_provider])

result = ai.generate("Explain quantum computing like I'm 5.")
print(result)
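
To watch the fallback happen without API keys or network access, stub providers can stand in for the real ones. This is a sketch: `AlwaysDownProvider`, `StubProvider`, and the trimmed `first_success` helper are illustrative names, and the helper deliberately leaves out the backoff and caching that `FallbackAI` has.

```python
class AlwaysDownProvider:
    """Hypothetical stub: passes its health check but fails every request."""

    def health_check(self) -> bool:
        return True

    def generate(self, prompt: str, **kwargs) -> str:
        raise ConnectionError("simulated outage")


class StubProvider:
    """Hypothetical stub: always healthy, returns a canned answer."""

    def __init__(self, name: str):
        self.name = name

    def health_check(self) -> bool:
        return True

    def generate(self, prompt: str, **kwargs) -> str:
        return f"[{self.name}] answer to: {prompt}"


def first_success(providers, prompt: str, **kwargs) -> str:
    """Trimmed-down router: try providers in order, return the first answer."""
    last_error = None
    for provider in providers:
        try:
            if not provider.health_check():
                continue
            return provider.generate(prompt, **kwargs)
        except Exception as exc:
            last_error = exc  # remember why this provider failed, then move on
    raise RuntimeError(f"All providers failed. Last error: {last_error}")


result = first_success([AlwaysDownProvider(), StubProvider("backup")], "ping")
print(result)  # [backup] answer to: ping
```

Because the stubs expose the same two methods as `AIProvider`, they can also be handed to `FallbackAI` in unit tests.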

Lessons Learned and Trade-offs

This system isn't perfect. Here are the main trade-offs and rough edges I ran into:

Caching is tricky: I used a simple dict that evicts the oldest inserted entry (FIFO, not true LRU). For production, use Redis with a TTL. Also, be careful about caching dynamic prompts (e.g., ones containing timestamps): they almost never repeat, so they fill the cache without ever producing a hit.
Health checks are not enough: A provider might pass its health check but still fail on a real completion. I've since added a limited number of retries per provider, after which it is marked as temporarily broken; that logic isn't shown in the router above.
Rate limiting: Providers signal rate limits in different ways. I haven't implemented adaptive backoff per provider yet; parsing the Retry-After header is the obvious next step.
Cost tracking: I didn't include cost calculations. If you use this, add logging of tokens used and estimated cost.
Asynchronous support: The above code is synchronous. For high throughput, you'd want asyncio and concurrent fallback attempts.
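
Two of the points above, honoring Retry-After and marking a provider as temporarily broken, fit together naturally. The following is a rough sketch under my own assumptions rather than the code I run: `parse_retry_after`, `CooldownTracker`, and the `max_failures` and `default_cooldown` parameters are hypothetical names. The header itself can be either a number of seconds or an HTTP date, so the parser handles both.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value into seconds to wait.

    Returns None when the header is missing or cannot be parsed.
    """
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


class CooldownTracker:
    """Tracks which providers are temporarily benched after failures."""

    def __init__(self, max_failures: int = 3, default_cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.default_cooldown = default_cooldown
        self._clock = clock
        self._failures = {}       # provider name -> consecutive failures
        self._benched_until = {}  # provider name -> clock value when usable again

    def record_failure(self, name: str, retry_after: Optional[float] = None) -> None:
        self._failures[name] = self._failures.get(name, 0) + 1
        # Bench immediately if the provider said how long to wait,
        # otherwise only after max_failures consecutive failures.
        if retry_after is not None or self._failures[name] >= self.max_failures:
            cooldown = retry_after if retry_after is not None else self.default_cooldown
            self._benched_until[name] = self._clock() + cooldown
            self._failures[name] = 0

    def record_success(self, name: str) -> None:
        self._failures[name] = 0

    def is_available(self, name: str) -> bool:
        until = self._benched_until.get(name)
        return until is None or self._clock() >= until
```

In a router, `is_available` would replace or complement the health check before each attempt, and `record_failure` would be called from the except branch with whatever `parse_retry_after` extracted from the response headers.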

When NOT to Use This Approach

If you only need one provider and it's reliable enough, don't over-engineer.
If your prompt sizes are huge and caching is wasteful, skip the cache.
If you need real-time responses (e.g., streaming), this pattern adds latency from retries. You might want to call all providers in parallel and pick the first successful response.
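
The "call everyone in parallel" idea from the last point can be sketched with asyncio. This is an illustration under assumptions, not something the router above does: `first_successful` is a hypothetical helper, and the three coroutines below are stand-ins for real provider calls.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def first_successful(
    calls: Sequence[Callable[[], Awaitable[str]]],
    timeout: float = 30.0,
) -> str:
    """Start every call at once and return the first result that succeeds."""
    tasks = [asyncio.ensure_future(call()) for call in calls]
    pending = set(tasks)
    errors = []
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            winner = None
            for task in done:
                exc = task.exception()
                if exc is not None:
                    errors.append(exc)  # failed: keep waiting on the others
                elif winner is None:
                    winner = task
            if winner is not None:
                return winner.result()
        raise RuntimeError(f"No provider succeeded. Errors: {errors}")
    finally:
        for task in tasks:
            task.cancel()  # no-op for finished tasks; stops the slower ones


async def slow_ok() -> str:
    await asyncio.sleep(0.05)
    return "slow provider"


async def fast_fail() -> str:
    raise ConnectionError("simulated outage")


async def fast_ok() -> str:
    await asyncio.sleep(0.01)
    return "fast provider"


result = asyncio.run(first_successful([slow_ok, fast_fail, fast_ok]))
print(result)  # fast provider
```

The synchronous providers from earlier could be wrapped with `asyncio.to_thread`, though a thread that is already running cannot be cancelled, so a slow request still runs to completion in the background. Keep in mind that racing providers means every one of them is called on every request, which trades cost for latency.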

What I'd Do Differently Next Time

I'd make the router pluggable with different strategies: try-all-return-first, priority with fallback, or even A/B testing different providers per session. I'd also add structured logging to monitor provider reliability over time.
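One way the pluggable part might look is to treat a strategy as a function that decides the order in which providers are tried. This is only a sketch of the idea: `priority_order`, `ab_by_session`, and `route` are hypothetical names, and providers are represented by plain strings to keep the example short.

```python
import hashlib
from typing import Callable, Optional, Sequence

# A strategy takes the configured providers and an optional session id and
# returns the order in which the router should try them.
Strategy = Callable[[Sequence[str], Optional[str]], list]


def priority_order(providers: Sequence[str],
                   session_id: Optional[str] = None) -> list:
    """Always try providers in their configured order."""
    return list(providers)


def ab_by_session(providers: Sequence[str],
                  session_id: Optional[str] = None) -> list:
    """Pick a stable primary provider per session, keep the rest as fallbacks."""
    if not providers or session_id is None:
        return list(providers)
    # hashlib gives the same bucket across processes, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(providers)
    return list(providers[start:]) + list(providers[:start])


def route(providers: Sequence[str], session_id: Optional[str],
          strategy: Strategy = priority_order) -> list:
    """Return the attempt order chosen by the given strategy."""
    return strategy(providers, session_id)


order = route(["openai", "interwest"], "session-42", strategy=ab_by_session)
print(order)
```

The same functions work unchanged on a list of provider objects, since they only slice and reorder the sequence. A try-all-return-first strategy does not fit this shape, because it changes how calls are made rather than their order, so it would need its own code path.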

Wrapping Up

This fallback system has saved me during outages at least three times now. It's simple, transparent, and doesn't tie me to any one vendor. Next weekend I'm planning to add automatic provider discovery via a config file.

What about you? How do you handle provider reliability in your AI-powered projects? Have you built a similar fallback layer, or do you stick with one provider and hope for the best?

DEV Community