Why My AI Summary Pipeline Broke at 3 AM (and How I Fixed It)

#ai #python #webdev #tutorial

I run a small side project that builds a daily digest of technical news. Every morning at 6 AM, a cron job fetches the latest articles from a few RSS feeds, summarizes them with an AI model, and emails me a neat one-pager. It worked beautifully for three months. Then, at 3:14 AM on a Tuesday, my phone buzzed with a PagerDuty alert: the pipeline had been failing for 45 minutes.

The root cause? The free-tier API I was using had rate-limited me without warning. That’s when I learned that relying on a single AI endpoint is like building a house on a single concrete block: it holds until it cracks.

The Problem: Single Points of Failure

My original code was embarrassingly simple:

import openai  # legacy (pre-1.0) SDK interface; openai>=1.0 removed openai.ChatCompletion

def summarize(text):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize this: {text}"}],
        max_tokens=200
    )
    return response.choices[0].message.content

One function, one API key, one model. It was fast, cheap (with the free credits), and dead simple. But that simplicity is exactly what broke. If the API key expired or I hit the rate limit, I had zero fallback.

What I Tried (That Didn't Work)

First, I threw in a simple retry with exponential backoff:

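import openai
from tenacity import retry, stop_after_attempt, wait_exponential

# Up to 3 attempts in total, waiting between 2 and 10 seconds after each failure.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def summarize(text):
    # Same OpenAI call as before; tenacity re-runs it whenever it raises.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize this: {text}"}],
        max_tokens=200
    )
    return response.choices[0].message.content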

That helped with transient network errors, but when the API returned a 429 for an hour, it made no difference: three attempts with at most 20 seconds of waiting between them just meant three failures. I added a longer cooldown and even a circuit breaker library, but the fundamental issue remained: if OpenAI was down, I was down.
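To make the limits of that retry policy concrete, here is a minimal standard-library sketch of the same idea: three attempts with exponential waits clamped between 2 and 10 seconds. The helper name `retry_with_backoff` and its `sleep` parameter are made up for this illustration and are not tenacity's API; tenacity's exact wait schedule may differ from the one shown.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    min_wait: float = 2.0,
    max_wait: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Illustrative helper (not tenacity's API): retry func with clamped waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential wait (2, 4, 8, ...) clamped to [min_wait, max_wait].
            sleep(max(min_wait, min(2.0 ** attempt, max_wait)))
    raise AssertionError("unreachable")
```

With three attempts there are only two waits, each capped at 10 seconds, so the whole retry budget is spent in well under a minute. A rate limit that lasts an hour outlives it easily.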

Next, I tried switching providers when one failed. I cobbled together a messy chain of nested try/except blocks:

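def summarize(text):
    # Each provider has its own hand-written wrapper and its own try block.
    try:
        return openai_summarize(text)
    except Exception:  # a bare "except:" would also swallow Ctrl+C
        try:
            return huggingface_summarize(text)
        except Exception:
            return cohere_summarize(text)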

This worked for a while, but each provider had its own API, authentication, and response format. The exception handling became a nightmare, and I was duplicating logic everywhere. Plus, I had to manage multiple accounts and billing.

What Actually Worked: An Abstraction Layer

The solution was to build a thin abstraction layer that treated every AI provider as a resource with the same interface. I defined a simple Summarizer base class:

from abc import ABC, abstractmethod

class Summarizer(ABC):
    @abstractmethod
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        pass

Then I implemented concrete subclasses for each provider. Here's the OpenAI version:

import openai

class OpenAISummarizer(Summarizer):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"):
        openai.api_key = api_key
        self.model = model

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": f"Summarize: {text}"}],
            max_tokens=max_tokens
        )
        return response.choices[0].message.content

And a local fallback using Hugging Face’s transformers library (because sometimes you have no internet):

from transformers import pipeline

class HuggingFaceSummarizer(Summarizer):
    def __init__(self, model_name: str = "facebook/bart-large-cnn"):
        self.pipeline = pipeline("summarization", model=model_name)

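    def summarize(self, text: str, max_tokens: int = 200) -> str:
        # truncation=True clips articles longer than the model's input limit
        # instead of letting the call fail on them.
        result = self.pipeline(
            text, max_length=max_tokens, min_length=30, do_sample=False, truncation=True
        )
        return result[0]['summary_text']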

Notice that both classes implement the same summarize method. Now the orchestration logic becomes clean:

from typing import List

class ResilientSummarizer:
    def __init__(self, summarizers: List[Summarizer]):
        self.summarizers = summarizers

    def summarize(self, text: str) -> str:
        for summarizer in self.summarizers:
            try:
                return summarizer.summarize(text)
            except Exception as e:
                print(f"{summarizer.__class__.__name__} failed: {e}")
                continue
        raise RuntimeError("All summarizers failed")

I also added a local fallback that uses a tiny model (like distilbart) so that even if all cloud APIs are down, I can still get a (slower, less accurate) summary.
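One possible refinement, sketched here under my own assumptions rather than taken from the pipeline itself: the fallback chain can implement the Summarizer interface too, so it passes max_tokens through and can be nested or swapped in anywhere a single summarizer is expected. The names `FallbackSummarizer`, `AlwaysDown`, and `FirstWords` are hypothetical; the two stub classes only stand in for real providers.

```python
import logging
from abc import ABC, abstractmethod
from typing import List

logger = logging.getLogger(__name__)


class Summarizer(ABC):
    @abstractmethod
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        ...


class FallbackSummarizer(Summarizer):
    """Tries each summarizer in order; usable wherever a Summarizer is."""

    def __init__(self, summarizers: List[Summarizer]):
        self.summarizers = summarizers

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        errors = []
        for summarizer in self.summarizers:
            name = type(summarizer).__name__
            try:
                return summarizer.summarize(text, max_tokens=max_tokens)
            except Exception as exc:
                logger.warning("%s failed: %s", name, exc)
                errors.append(f"{name}: {exc}")
        # Keep every provider's error so the final failure is debuggable.
        raise RuntimeError("All summarizers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real providers, used only to show the ordering.
class AlwaysDown(Summarizer):
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        raise ConnectionError("provider unavailable")


class FirstWords(Summarizer):
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        return " ".join(text.split()[:max_tokens])


chain = FallbackSummarizer([AlwaysDown(), FirstWords()])
print(chain.summarize("one two three four", max_tokens=2))  # prints "one two"
```

Using logging instead of print also means the failure messages end up wherever the cron job's logs go.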

The Setup

Here's how I initialize it:

from config import OPENAI_KEY, COHERE_KEY  # your secrets

summarizers = [
    OpenAISummarizer(api_key=OPENAI_KEY),
    # CohereSummarizer(api_key=COHERE_KEY),  # pip install cohere
    HuggingFaceSummarizer(model_name="facebook/bart-large-cnn"),
]

# Example: you could also use a service like https://ai.interwestinfo.com/ with a custom API wrapper

resilient = ResilientSummarizer(summarizers)
summary = resilient.summarize("Long article text here...")

I set COHERE_KEY to a dummy value and left that summarizer commented out for now; I only turn it on if needed. There is no real circuit breaker here, just ordered fallback: if the first summarizer fails, the loop moves on to the next one. A proper breaker would also disable a summarizer for a while after a configurable number of failures, but that's an optimization.
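The "disable a summarizer for a time" idea could look roughly like the wrapper below. This is a sketch, not code from the pipeline: `CircuitBreaker`, `CircuitOpenError`, `max_failures`, `cooldown_s`, and `clock` are all names invented for the example, and it assumes the wrapped object exposes the same summarize(text, max_tokens) method as the classes above.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a provider is being skipped during its cooldown."""


class CircuitBreaker:
    """Hypothetical wrapper around any object with a summarize() method."""

    def __init__(self, inner, max_failures: int = 3, cooldown_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.inner = inner
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        # Time the breaker tripped, or None while it is closed.
        self.opened_at: Optional[float] = None

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("provider is cooling down")
            self.opened_at = None  # cooldown over: allow one trial call
        try:
            result = self.inner.summarize(text, max_tokens=max_tokens)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0  # any success resets the count
        return result
```

Because the fallback loop catches every Exception, a CircuitOpenError simply makes it skip to the next summarizer without waiting on a provider that is known to be failing.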

Lessons Learned (and Trade-offs)

Complexity vs. reliability: My solution is more code, but the provider APIs it wraps rarely change. The base class costs almost nothing to maintain because I only add new providers when I need them.
Local models are slow: The Hugging Face fallback takes about 10 seconds per summary on my laptop. That's fine for a batch job but not for real-time use.
Cost management: Free-tier keys eventually rotate or expire. Having a local fallback saved me from waking up at 3 AM again, but local models need CPU/GPU.
Testing is harder: I now have to mock multiple providers. I ended up writing a MockSummarizer for unit tests.
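For the testing point, one possible shape for such a mock is sketched below. It is an illustration rather than the actual test code from the project: the constructor arguments `result` and `error`, the `calls` list, and the `first_success` helper are assumptions, and `first_success` just inlines the same loop as ResilientSummarizer.summarize so the example runs on its own.

```python
import unittest
from typing import List, Optional


class MockSummarizer:
    """Test double with the same summarize() signature as the real classes."""

    def __init__(self, result: str = "mock summary",
                 error: Optional[Exception] = None):
        self.result = result
        self.error = error
        self.calls: List[str] = []  # texts this mock was asked to summarize

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        self.calls.append(text)
        if self.error is not None:
            raise self.error
        return self.result


def first_success(summarizers, text: str) -> str:
    """Same loop as ResilientSummarizer.summarize, inlined for this example."""
    for summarizer in summarizers:
        try:
            return summarizer.summarize(text)
        except Exception:
            continue
    raise RuntimeError("All summarizers failed")


class FallbackTest(unittest.TestCase):
    def test_falls_back_when_first_provider_fails(self):
        broken = MockSummarizer(error=TimeoutError("simulated outage"))
        healthy = MockSummarizer(result="short version")
        self.assertEqual(first_success([broken, healthy], "article"), "short version")
        self.assertEqual(broken.calls, ["article"])

    def test_raises_when_everything_fails(self):
        broken = MockSummarizer(error=TimeoutError("simulated outage"))
        with self.assertRaises(RuntimeError):
            first_success([broken], "article")
```

Saved in a test file, this runs with `python -m unittest` and never touches a network or an API key.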

What I'd Do Differently Next Time

If I started over, I'd:

Use a configuration file (YAML or JSON) to define the order and settings of summarizers, so I can reorder or disable them without changing code.
Add health checks that ping each provider once a minute and skip unhealthy ones.
Store a cache of summaries to avoid hitting the API for the same article twice.
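Two of these ideas, the config-driven ordering and the cache, are small enough to sketch together. Everything here is hypothetical: the JSON key names (`summarizers`, `type`, `enabled`, `options`), the `build_chain` and `CachingSummarizer` helpers, and the `StubSummarizer` that stands in for the real provider classes so the example runs offline.

```python
import hashlib
import json
from typing import Callable, Dict, List


class CachingSummarizer:
    """Wraps any object with summarize(); caches by a hash of the input."""

    def __init__(self, inner):
        self.inner = inner
        self._cache: Dict[str, str] = {}  # in-memory only; lost on exit

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        key = hashlib.sha256(f"{max_tokens}:{text}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.inner.summarize(text, max_tokens=max_tokens)
        return self._cache[key]


def build_chain(config_json: str,
                factories: Dict[str, Callable[..., object]]) -> List[object]:
    """Instantiate enabled summarizers in the order the config lists them."""
    chain = []
    for entry in json.loads(config_json)["summarizers"]:
        if entry.get("enabled", True):
            chain.append(factories[entry["type"]](**entry.get("options", {})))
    return chain


class StubSummarizer:
    """Stand-in for a real provider class so the example runs offline."""

    def __init__(self, **options):
        self.options = options
        self.calls = 0

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        self.calls += 1
        return text[:max_tokens]


# Hypothetical config: the key names are made up for this example.
CONFIG = """
{"summarizers": [
  {"type": "openai", "enabled": false, "options": {"model": "gpt-3.5-turbo"}},
  {"type": "local", "options": {"model_name": "facebook/bart-large-cnn"}}
]}
"""

chain = build_chain(CONFIG, {"openai": StubSummarizer, "local": StubSummarizer})
cached = CachingSummarizer(chain[0])
cached.summarize("same article")
cached.summarize("same article")  # served from the cache; the stub runs once
```

In a real setup the factories dict would map to classes like OpenAISummarizer and HuggingFaceSummarizer, and a daily cron job would need the cache persisted to disk (a JSON file or SQLite, for instance) to survive between runs.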

But for a side project, the current setup has been running for six months without a single 3 AM wake-up call. That's a win.

What about you? Have you ever had an API dependency fail spectacularly? What fallback strategies do you use in your AI-powered projects? I'm curious to hear how other devs handle this.