DEV Community

Nebula


How to Add Retry Logic to LLM Calls in 5 Min

Your OpenAI call fails at 2am. Rate limit. Your script crashes, the entire pipeline stops, and the data you processed for the last hour is gone.

If you're wrapping LLM calls in try/except with time.sleep(), you're doing it the hard way. Here's the fix in one decorator.

The Code

import openai
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type, before_sleep_log
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(
    retry=retry_if_exception_type((
        openai.RateLimitError,
        openai.APITimeoutError,
        openai.APIConnectionError,
    )),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(3),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
def call_llm(prompt: str) -> str:
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


result = call_llm("Explain retry logic in one sentence.")
print(result)

Install the dependency:

pip install tenacity openai

Run it. If OpenAI returns a rate limit error, the call backs off exponentially -- at least 2 seconds between attempts, capped at 30 -- and stops after 3 attempts in total. If the final attempt still fails, tenacity raises a RetryError wrapping the last exception (pass reraise=True to the decorator if you want the original exception instead).

How It Works

retry_if_exception_type tells tenacity which errors to retry. We target three specific OpenAI exceptions:

  • RateLimitError (429) -- you hit the token or request limit
  • APITimeoutError -- the request took too long
  • APIConnectionError -- network issues between you and OpenAI

All other errors (like AuthenticationError or BadRequestError) raise immediately. You don't want to retry a bad API key three times.

wait_exponential(multiplier=1, min=2, max=30) sets the backoff schedule: the delay doubles with each retry, never drops below 2 seconds, and caps at 30 seconds. This is critical for rate limits -- hammering the API with instant retries makes the problem worse.
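The clamping is easy to sanity-check by hand. Here's a minimal sketch of the schedule, assuming tenacity's documented formula -- the multiplier times a power of two, clamped into [min, max] (the exact attempt indexing can vary by tenacity version):

```python
def backoff_schedule(multiplier: float, min_s: float, max_s: float, retries: int):
    """Sketch of an exponential backoff schedule like tenacity's
    wait_exponential: multiplier * 2**n, clamped into [min_s, max_s]."""
    waits = []
    for n in range(retries):
        raw = multiplier * (2 ** n)
        waits.append(max(min_s, min(raw, max_s)))
    return waits

# Floor of 2s kicks in early, the 30s cap kicks in late.
print(backoff_schedule(1, 2, 30, 6))  # [2, 2, 4, 8, 16, 30]
```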

stop_after_attempt(3) caps the total attempts -- the first call plus two retries. Three attempts is the sweet spot for most LLM calls; if the error survives longer than that, it usually isn't transient.

before_sleep_log logs a warning before each retry so you know exactly when and why retries happen. No silent failures.

What You're Replacing

Here's what most codebases have instead:

# Don't do this
import time

import openai

client = openai.OpenAI()

def call_llm_bad(prompt: str) -> str:
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception:
            time.sleep(2)
    raise Exception("LLM call failed after 3 attempts")

This catches every exception (including auth errors that will never self-resolve), uses a fixed sleep with no backoff, and gives you zero visibility into what failed. The tenacity version is shorter and handles all of this correctly.

Quick Customizations

Retry more aggressively for batch jobs:

@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_exponential(multiplier=2, min=4, max=60),
    stop=stop_after_attempt(5),
)

Catch the failure and fall back once all retries are exhausted:

from tenacity import RetryError

try:
    result = call_llm("Your prompt here")
except RetryError:
    result = "Fallback: LLM unavailable. Using cached response."

The pattern works with any LLM provider -- swap openai.RateLimitError for the equivalent exception from Anthropic, Google, or your provider's SDK.

Next Steps

Retry logic is one piece of production-ready LLM code. For the full picture of what breaks in production agents and how to fix each failure mode, check out 5 AI Agent Failures in Production.

If you're building agents that chain multiple LLM calls and tool actions, Nebula handles retry and fallback logic automatically for every tool call -- no decorators needed.

Part of the AI Agent Quick Tips series.
