I run a small side project that builds a daily digest of technical news. Every morning at 6 AM, a cron job fetches the latest articles from a few RSS feeds, summarizes them with an AI model, and emails me a neat one-pager. It worked beautifully for three months. Then, at 3:14 AM on a Tuesday, my phone buzzed with a PagerDuty alert: the pipeline had been failing for 45 minutes.
The root cause? The free-tier API I was using had rate-limited me without warning. That’s when I learned that relying on a single AI endpoint is like building a house on a single concrete block—it can hold, until it cracks.
The Problem: Single Points of Failure
My original code was embarrassingly simple:
import openai

# Legacy interface: openai.ChatCompletion was removed in openai>=1.0.
def summarize(text):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize this: {text}"}],
        max_tokens=200,
    )
    return response.choices[0].message.content
One function, one API key, one model. It was fast, cheap (with the free credits), and dead simple. But that simplicity is exactly what broke: if the API key expired or I hit the rate limit, I had zero fallback.
What I Tried (That Didn't Work)
First, I threw in a simple retry with exponential backoff:
from tenacity import retry, stop_after_attempt, wait_exponential

# Up to 3 attempts, waiting between 2 and 10 seconds with exponential backoff.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def summarize(text):
    # same openai api call
    ...
That helped with transient network errors, but when the API returned a 429 for an hour, three retries just meant three failures. I added a longer cooldown and even a circuit breaker library, but the fundamental issue remained: if OpenAI was down, I was down.
Next, I tried switching providers when one failed. I cobbled together a messy chain of nested try/except blocks:
def summarize(text):
    try:
        return openai_summarize(text)
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        try:
            return huggingface_summarize(text)
        except Exception:
            return cohere_summarize(text)
This worked for a while, but each provider had its own API, authentication, and response format. The exception handling became a nightmare, and I was duplicating logic everywhere. Plus, I had to manage multiple accounts and billing.
What Actually Worked: An Abstraction Layer
The solution was to build a thin abstraction layer that treated every AI provider as a resource with the same interface. I defined a simple Summarizer base class:
from abc import ABC, abstractmethod

class Summarizer(ABC):
    @abstractmethod
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        pass
Then I implemented concrete subclasses for each provider. Here's the OpenAI version:
import openai

class OpenAISummarizer(Summarizer):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"):
        openai.api_key = api_key
        self.model = model

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": f"Summarize: {text}"}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content
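One caveat: the snippet above uses the `openai.ChatCompletion` interface, which was removed in version 1.0 of the `openai` Python package. On a newer SDK, the same class might look roughly like the sketch below. The class name `OpenAIV1Summarizer` and the optional `client` parameter are hypothetical, added only for illustration (the latter lets you pass in a fake client in tests), and the `Summarizer` base class is repeated so the block runs on its own.

```python
from abc import ABC, abstractmethod


class Summarizer(ABC):
    @abstractmethod
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        ...


class OpenAIV1Summarizer(Summarizer):
    """Sketch for openai>=1.0; the name and `client` parameter are illustrative."""

    def __init__(self, api_key: str = "", model: str = "gpt-3.5-turbo", client=None):
        if client is None:
            # Imported lazily so the class can be exercised with a fake client.
            from openai import OpenAI
            client = OpenAI(api_key=api_key)
        self.client = client
        self.model = model

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        # The v1 SDK exposes chat completions on a client instance.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": f"Summarize: {text}"}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content
```

Keeping the client on the instance, rather than setting a module-level API key, also means two summarizers with different keys won't step on each other.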
And a local fallback using Hugging Face’s transformers library (because sometimes you have no internet):
from transformers import pipeline

class HuggingFaceSummarizer(Summarizer):
    def __init__(self, model_name: str = "facebook/bart-large-cnn"):
        self.pipeline = pipeline("summarization", model=model_name)

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        result = self.pipeline(text, max_length=max_tokens, min_length=30, do_sample=False)
        return result[0]['summary_text']
Notice that both classes implement the same summarize method. Now the orchestration logic becomes clean:
from typing import List

class ResilientSummarizer:
    def __init__(self, summarizers: List[Summarizer]):
        self.summarizers = summarizers

    def summarize(self, text: str) -> str:
        for summarizer in self.summarizers:
            try:
                return summarizer.summarize(text)
            except Exception as e:
                print(f"{summarizer.__class__.__name__} failed: {e}")
                continue
        raise RuntimeError("All summarizers failed")
Putting a local model last in the list (a tiny one like distilbart works too) means that even if all cloud APIs are down, I can still get a (slower, less accurate) summary.
The Setup
Here's how I initialize it:
from config import OPENAI_KEY, COHERE_KEY  # your secrets

summarizers = [
    OpenAISummarizer(api_key=OPENAI_KEY),
    # CohereSummarizer(api_key=COHERE_KEY),  # pip install cohere
    HuggingFaceSummarizer(model_name="facebook/bart-large-cnn"),
]
# Example: you could also use a service like https://ai.interwestinfo.com/ with a custom API wrapper

resilient = ResilientSummarizer(summarizers)
summary = resilient.summarize("Long article text here...")
I set COHERE_KEY to a dummy value and disabled it for now; I only turn it on if needed. The failover is implicit: if the first summarizer fails, the loop moves on to the next one. That isn't a true circuit breaker, though. A real one would disable a summarizer for a time after a configurable number of failures, so it stops being called during an outage, but that's an optimization.
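The "disable a summarizer for a time" idea could be sketched as follows. This is a minimal illustration, not the code running in the project: the class name `CooldownSummarizer` and the parameters `max_failures`, `cooldown_seconds`, and `clock` are all assumptions, and it tracks providers by their position in the list.

```python
import time
from typing import Callable, Dict, List


class CooldownSummarizer:
    """Failover plus a simple per-provider cooldown (a basic circuit breaker)."""

    def __init__(self, summarizers: List, max_failures: int = 3,
                 cooldown_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.summarizers = summarizers
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so tests don't have to sleep
        self.failures: Dict[int, int] = {}          # index -> consecutive failures
        self.disabled_until: Dict[int, float] = {}  # index -> time to re-enable

    def summarize(self, text: str) -> str:
        for i, summarizer in enumerate(self.summarizers):
            if self.clock() < self.disabled_until.get(i, float("-inf")):
                continue  # still cooling down, skip without calling it
            try:
                result = summarizer.summarize(text)
            except Exception:
                self.failures[i] = self.failures.get(i, 0) + 1
                if self.failures[i] >= self.max_failures:
                    # Too many consecutive failures: bench this provider.
                    self.disabled_until[i] = self.clock() + self.cooldown_seconds
                    self.failures[i] = 0
                continue
            self.failures[i] = 0  # a success resets the failure count
            return result
        raise RuntimeError("All summarizers failed or are cooling down")
```

With this in place, a provider that returns 429s for an hour gets skipped after a few failed calls instead of being tried (and timing out) on every article.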
Lessons Learned (and Trade-offs)
- Complexity vs. Reliability: My solution is more code, but the upstream APIs rarely change, and the base class costs almost nothing to maintain because I only add new providers when I need them.
- Local models are slow: The Hugging Face fallback takes 10 seconds per summary on my laptop. That's fine for a batch job but not for real-time use.
- Cost management: Free-tier APIs eventually rotate or expire. Having a local fallback saved me from waking up at 3 AM again—but local models need CPU/GPU.
- Testing is harder: I now have to mock multiple providers. I ended up writing a MockSummarizer for unit tests.
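A mock along those lines might look like the sketch below. The constructor arguments (`result`, `fail`) and the `calls` counter are assumptions for illustration, not the actual test helper, and `Summarizer` and `ResilientSummarizer` are repeated in condensed form so the block runs on its own.

```python
from abc import ABC, abstractmethod
from typing import List


class Summarizer(ABC):
    @abstractmethod
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        ...


class ResilientSummarizer:
    # Condensed copy of the orchestrator from earlier in the post.
    def __init__(self, summarizers: List[Summarizer]):
        self.summarizers = summarizers

    def summarize(self, text: str) -> str:
        for summarizer in self.summarizers:
            try:
                return summarizer.summarize(text)
            except Exception:
                continue
        raise RuntimeError("All summarizers failed")


class MockSummarizer(Summarizer):
    """Hypothetical test double: returns a canned summary or raises on demand."""

    def __init__(self, result: str = "mock summary", fail: bool = False):
        self.result = result
        self.fail = fail
        self.calls = 0

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        self.calls += 1
        if self.fail:
            raise ConnectionError("simulated provider outage")
        return self.result


def test_falls_back_to_second_provider():
    primary = MockSummarizer(fail=True)
    backup = MockSummarizer(result="from backup")
    assert ResilientSummarizer([primary, backup]).summarize("text") == "from backup"
    assert primary.calls == 1 and backup.calls == 1


def test_raises_when_everything_fails():
    resilient = ResilientSummarizer([MockSummarizer(fail=True)])
    try:
        resilient.summarize("text")
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError")
```

Because the mock never touches the network, tests like these run in milliseconds and can simulate outages that are hard to reproduce against a real provider.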
What I'd Do Differently Next Time
If I started over, I'd:
- Use a configuration file (YAML or JSON) to define the order and settings of summarizers, so I can reorder or disable them without changing code.
- Add health checks that ping each provider once a minute and skip unhealthy ones.
- Store a cache of summaries to avoid hitting the API for the same article twice.
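As one example, the caching idea could be sketched as a thin wrapper around any object with a `summarize(text)` method. Everything here is hypothetical: the class name `CachedSummarizer`, the `summaries.json` default path, and the choice to key the cache on a SHA-256 hash of the article text are assumptions, not something the current setup does.

```python
import hashlib
import json
from pathlib import Path


class CachedSummarizer:
    """Wraps any object with summarize(text) and caches results on disk."""

    def __init__(self, inner, cache_path: str = "summaries.json"):
        self.inner = inner
        self.cache_path = Path(cache_path)
        self.cache = {}
        if self.cache_path.exists():
            # Reload summaries saved by earlier runs.
            self.cache = json.loads(self.cache_path.read_text(encoding="utf-8"))

    def summarize(self, text: str) -> str:
        # Identical article text always maps to the same key.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.inner.summarize(text)
            self.cache_path.write_text(json.dumps(self.cache), encoding="utf-8")
        return self.cache[key]
```

Wrapping the resilient summarizer this way means an article that shows up in two feeds, or a job that gets re-run after a crash, costs only one API call.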
But for a side project, the current setup has been running for six months without a single 3 AM wake-up call. That's a win.
What about you? Have you ever had an API dependency fail spectacularly? What fallback strategies do you use in your AI-powered projects? I'm curious to hear how other devs handle this.