DEV Community

zhongqiyue
zhongqiyue

Posted on

Why My AI Summary Pipeline Broke at 3 AM (and How I Fixed It)

I run a small side project that builds a daily digest of technical news. Every morning at 6 AM, a cron job fetches the latest articles from a few RSS feeds, summarizes them with an AI model, and emails me a neat one-pager. It worked beautifully for three months. Then, at 3:14 AM on a Tuesday, my phone buzzed with a PagerDuty alert: the pipeline had been failing for 45 minutes.

The root cause? The free-tier API I was using had rate-limited me without warning. That’s when I learned that relying on a single AI endpoint is like building a house on a single concrete block—it can hold, until it cracks.

The Problem: Single Points of Failure

My original code was embarrassingly simple:

import openai

def summarize(text):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize this: {text}"}],
        max_tokens=200
    )
    return response.choices[0].message.content
Enter fullscreen mode Exit fullscreen mode

One function, one API key, one model. It was fast, cheap (with the free credits), and dead simple. But that simplicity is exactly what broke. The API key expired, I hit the rate limit, and I had zero fallback.

What I Tried (That Didn't Work)

First, I threw in a simple retry with exponential backoff:

import time
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def summarize(text):
    # same openai api call
Enter fullscreen mode Exit fullscreen mode

That helped with transient network errors, but when the API returned a 429 for an hour, three retries just meant three failures. I added a longer cooldown and even a circuit breaker library, but the fundamental issue remained: if OpenAI was down, I was down.

Next, I tried switching providers when one failed. I cobbled together a messy if-elif chain:

def summarize(text):
    try:
        return openai_summarize(text)
    except:
        try:
            return huggingface_summarize(text)
        except:
            return cohere_summarize(text)
Enter fullscreen mode Exit fullscreen mode

This worked for a while, but each provider had its own API, authentication, and response format. The exception handling became a nightmare, and I was duplicating logic everywhere. Plus, I had to manage multiple accounts and billing.

What Actually Worked: An Abstraction Layer

The solution was to build a thin abstraction layer that treated every AI provider as a resource with the same interface. I defined a simple Summarizer base class:

from abc import ABC, abstractmethod

class Summarizer(ABC):
    @abstractmethod
    def summarize(self, text: str, max_tokens: int = 200) -> str:
        pass
Enter fullscreen mode Exit fullscreen mode

Then I implemented concrete subclasses for each provider. Here's the OpenAI version:

import openai

class OpenAISummarizer(Summarizer):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"):
        openai.api_key = api_key
        self.model = model

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": f"Summarize: {text}"}],
            max_tokens=max_tokens
        )
        return response.choices[0].message.content
Enter fullscreen mode Exit fullscreen mode

And a local fallback using Hugging Face’s transformers library (because sometimes you have no internet):

from transformers import pipeline

class HuggingFaceSummarizer(Summarizer):
    def __init__(self, model_name: str = "facebook/bart-large-cnn"):
        self.pipeline = pipeline("summarization", model=model_name)

    def summarize(self, text: str, max_tokens: int = 200) -> str:
        result = self.pipeline(text, max_length=max_tokens, min_length=30, do_sample=False)
        return result[0]['summary_text']
Enter fullscreen mode Exit fullscreen mode

Notice that both classes implement the same summarize method. Now the orchestration logic becomes clean:

from typing import List

class ResilientSummarizer:
    def __init__(self, summarizers: List[Summarizer]):
        self.summarizers = summarizers

    def summarize(self, text: str) -> str:
        for summarizer in self.summarizers:
            try:
                return summarizer.summarize(text)
            except Exception as e:
                print(f"{summarizer.__class__.__name__} failed: {e}")
                continue
        raise RuntimeError("All summarizers failed")
Enter fullscreen mode Exit fullscreen mode

I also added a local fallback that uses a tiny model (like distilbart) so that even if all cloud APIs are down, I can still get a (slower, less accurate) summary.

The Setup

Here's how I initialize it:

from config import OPENAI_KEY, COHERE_KEY  # your secrets

summarizers = [
    OpenAISummarizer(api_key=OPENAI_KEY),
    # CohereSummarizer(api_key=COHERE_KEY),  # pip install cohere
    HuggingFaceSummarizer(model_name="facebook/bart-large-cnn"),
]

# Example: you could also use a service like https://ai.interwestinfo.com/ with a custom API wrapper

resilient = ResilientSummarizer(summarizers)
summary = resilient.summarize("Long article text here...")
Enter fullscreen mode Exit fullscreen mode

I set COHERE_KEY to a dummy value and disabled it for now—I only turn it on if needed. The circuit breaker is implicit: if the first summarizer fails, it moves on. After a configurable number of failures, I could also disable a summarizer for a time, but that's an optimization.

Lessons Learned (and Trade-offs)

  • Complexity vs. Reliability: My solution is more code, but the upstream APIs change rarely. The base class costs nothing to maintain because I only add new providers when I need them.
  • Local models are slow: The Hugging Face fallback takes 10 seconds per summary on my laptop. That's fine for a batch job but not for real-time use.
  • Cost management: Free-tier APIs eventually rotate or expire. Having a local fallback saved me from waking up at 3 AM again—but local models need CPU/GPU.
  • Testing is harder: I now have to mock multiple providers. I ended up writing a MockSummarizer for unit tests.

What I'd Do Differently Next Time

If I started over, I'd:

  • Use a configuration file (YAML or JSON) to define the order and settings of summarizers, so I can reorder or disable them without changing code.
  • Add health checks that ping each provider once a minute and skip unhealthy ones.
  • Store a cache of summaries to avoid hitting the API for the same article twice.

But for a side project, the current setup has been running for six months without a single 3 AM wake-up call. That's a win.


What about you? Have you ever had an API dependency fail spectacularly? What fallback strategies do you use in your AI-powered projects? I'm curious to hear how other devs handle this.

Top comments (0)