I Built an Auto-Retry System for My AI Agents Because They Were Failing at 2AM

If you run AI agents in production, you know the pain: you wake up to a dozen failed tasks, all because of a transient API blip or a rate limit hit once at midnight.

I built an automatic retry system with exponential backoff that wraps any agent function. Now failures that used to wake me up resolve themselves.

Here's the pattern:

import time
import functools
from typing import Callable, TypeVar

T = TypeVar('T')

def auto_retry(
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    exceptions: tuple = (Exception,)
):
    """Retry the wrapped function with exponential backoff, capped at max_delay."""
    # Guard against max_attempts < 1, which would otherwise never call fn at all
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_attempts - 1:
                        # Out of attempts: log and re-raise the last error
                        print(f"All {max_attempts} attempts failed for {fn.__name__}")
                        raise
                    # Delay doubles each attempt (base, 2x, 4x, ...) up to max_delay
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
                    time.sleep(delay)
        return wrapper
    return decorator

# Usage
@auto_retry(max_attempts=5, base_delay=3.0)
def call_llm_api(prompt: str):
    # Your API call here (`llm` is a placeholder for your own client object)
    response = llm.invoke(prompt)
    return response
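
To get a feel for how long a call can stall before it finally gives up, here is a small sketch that lists the waits between attempts. The helper name backoff_schedule is made up for this illustration; it just repeats the delay formula from the decorator.

```python
def backoff_schedule(max_attempts, base_delay, max_delay=60.0):
    """Delays slept between attempts (one fewer than max_attempts)."""
    # Hypothetical helper: same formula as the decorator, no sleeping involved
    return [min(base_delay * (2 ** attempt), max_delay)
            for attempt in range(max_attempts - 1)]

# Same settings as the usage example above
delays = backoff_schedule(max_attempts=5, base_delay=3.0)
print(delays)       # [3.0, 6.0, 12.0, 24.0]
print(sum(delays))  # 45.0
```

So with those settings, a call that fails every time spends 45 seconds sleeping (plus the time of the five calls themselves) before the exception surfaces.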

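The key insight: use exponential backoff, and add jitter on top if you want to be fancy. The decorator above doesn't include jitter, but it's a small change: import random and add random.uniform(0, delay * 0.1) to the delay before sleeping. That way a fleet of agents doesn't retry at the exact same instant, which avoids thundering herd problems when an API comes back online.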
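
Here is a minimal sketch of what a jittered delay could look like. The function name backoff_delay and the jitter_fraction parameter are my own for this example, not part of the decorator above; you would call it in place of the delay = min(...) line.

```python
import random

def backoff_delay(attempt, base_delay=2.0, max_delay=60.0, jitter_fraction=0.1):
    """Capped exponential delay for a zero-based attempt, plus random jitter."""
    # Hypothetical helper: exponential part is identical to the decorator's formula
    delay = min(base_delay * (2 ** attempt), max_delay)
    # Add up to jitter_fraction (10% by default) of extra wait on top
    return delay + random.uniform(0, delay * jitter_fraction)

# Each run prints slightly different waits because of the jitter
for attempt in range(4):
    print(f"attempt {attempt + 1}: sleep {backoff_delay(attempt):.2f}s")
```

Note that the jitter is added after the cap, so the longest wait can be a bit above max_delay (up to 66s with the defaults). If you need a hard ceiling, apply min(..., max_delay) once more at the end.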

This pattern saved me from countless 2AM pages. The agent just... retries, up to max_attempts times, and only raises if every attempt fails.

Full catalog of my AI agent tools at https://thebookmaster.zo.space/bolt/market
