DEV Community

Daniel Dong

How to Build AI API Fallback Logic (Never Fail on Model Errors)

Your AI feature is live. Suddenly, your primary model starts failing.

❌ Rate limited
❌ Timeout
❌ 500 error

What happens to your users?

The solution is fallback logic: automatically switch to a backup model when the primary fails.


The Problem

# What happens when this fails?
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_input}]
)
# If this times out, your app crashes 💥

The Solution: Fallback Chain
With AIBridge, you can define a fallback chain: try Model A; if it fails, try Model B, then Model C.

from openai import OpenAI

client = OpenAI(
    api_key="mb_your_key",
    base_url="https://aibridge-api.com/v1"
)

# Define fallback chain
FALLBACK_CHAIN = [
    "deepseek-v4-pro",    # Primary: Best quality
    "qwen3-235b-a22b",   # Fallback 1: Still high quality
    "glm-4-plus",        # Fallback 2: Reliable alternative
    "deepseek-v4-flash"  # Fallback 3: Fast & cheap
]

def call_with_fallback(messages, fallback_chain=FALLBACK_CHAIN):
    """Try models in order until one succeeds."""
    last_error = None

    for model in fallback_chain:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=10  # Prevent hanging
            )
            print(f"✅ Success with {model}")
            return response

        except Exception as e:
            print(f"❌ {model} failed: {e}")
            last_error = e
            continue  # Try next model

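    # All models failed (or the chain was empty)
    if last_error is None:
        raise ValueError("fallback_chain must contain at least one model")
    raise last_error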

# Usage
response = call_with_fallback([
    {"role": "user", "content": "Explain quantum computing"}
])
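The chain above falls back on every exception, including ones a different model is unlikely to fix (a malformed request, a bad API key). A minimal sketch of a stricter variant follows, which only falls back on the failures listed at the top of this post: rate limits, timeouts, and 5xx errors. The names `is_transient`, `call_with_selective_fallback`, and the `call_model` callable are hypothetical, and the status list is an assumption. The sketch relies on the error exposing a `status_code` attribute, as `openai.APIStatusError` does.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Type

logger = logging.getLogger("ai_fallback")

# Assumption: these HTTP statuses signal a transient, provider-side problem.
TRANSIENT_STATUS = {408, 429, 500, 502, 503, 504}

Messages = List[Dict[str, str]]
ErrorTypes = Tuple[Type[BaseException], ...]


def is_transient(
    error: Exception,
    transient_types: ErrorTypes = (TimeoutError, ConnectionError),
) -> bool:
    """Return True when trying another model might help."""
    if isinstance(error, transient_types):
        return True
    # Errors without a status_code attribute yield None, which is not in the set.
    return getattr(error, "status_code", None) in TRANSIENT_STATUS


def call_with_selective_fallback(
    call_model: Callable[[str, Messages], Any],  # hypothetical: (model, messages) -> response
    models: Sequence[str],
    messages: Messages,
    transient_types: ErrorTypes = (TimeoutError, ConnectionError),
) -> Any:
    """Try models in order, but only move on after a transient failure."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return call_model(model, messages)
        except Exception as error:
            if not is_transient(error, transient_types):
                raise  # e.g. 400 or 401: the next model would likely fail the same way
            logger.warning("%s failed with a transient error: %s", model, error)
            last_error = error
    if last_error is None:
        raise ValueError("models must contain at least one entry")
    raise last_error
```

With the client from this post, `call_model` could be `lambda model, messages: client.chat.completions.create(model=model, messages=messages, timeout=10)`, and `transient_types` could be `(openai.APITimeoutError, openai.APIConnectionError)` so that SDK-level timeouts also trigger a fallback.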

Advanced: Smart Retry with Exponential Backoff
For production, add retry logic with exponential backoff:

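from tenacity import retry, stop_after_attempt, wait_exponential

# Reuses the `client` defined in the fallback example above.
@retry(
    stop=stop_after_attempt(3),                          # Give up after 3 attempts
    wait=wait_exponential(multiplier=1, min=4, max=10),  # Wait 4 to 10 seconds between attempts
    reraise=True                                         # Raise the original error, not tenacity.RetryError
)
def call_ai_with_retry(model, messages):
    return client.chat.completions.create(
        model=model,
        messages=messages,
        timeout=10
    )

# Retries up to 3 times before the last error propagates
messages = [{"role": "user", "content": "Explain quantum computing"}]
response = call_ai_with_retry("deepseek-v4-pro", messages)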
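Retries and fallbacks work best together: retry the current model a few times, then move down the chain. The sketch below combines the two using a hand-rolled backoff loop instead of tenacity, so it has no dependencies. The names `backoff_delays` and `call_with_retry_and_fallback`, the `call_model` callable, and the default delays are all hypothetical choices for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Sequence

Messages = List[Dict[str, str]]


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 10.0) -> List[float]:
    """Delays between attempts: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts - 1)]


def call_with_retry_and_fallback(
    call_model: Callable[[str, Messages], Any],  # hypothetical: (model, messages) -> response
    models: Sequence[str],
    messages: Messages,
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> Any:
    """Retry each model with exponential backoff, then fall back to the next."""
    last_error: Optional[Exception] = None
    delays = backoff_delays(attempts)
    for model in models:
        for attempt in range(attempts):
            try:
                return call_model(model, messages)
            except Exception as error:
                last_error = error
                # Sleep only if this model has attempts left.
                if attempt < len(delays):
                    sleep(delays[attempt])
    if last_error is None:
        raise ValueError("models must contain at least one entry")
    raise last_error
```

Keeping the per-model attempt count low matters here: with four models, three attempts each, and a 10-second timeout, the worst case already takes a couple of minutes before the caller sees an error.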

AIBridge Advantage
With direct APIs, fallback means:
❌ Multiple API clients
❌ Multiple base URLs
❌ Different error formats

With AIBridge:
✅ One client
✅ One base URL
✅ Same error format for all models
✅ Switch models instantly

Production Checklist
✅ Fallback chain (primary + at least 2 backups)
✅ Retry logic (exponential backoff)
✅ Timeout handling (prevent hanging)
✅ Error logging (know which model failed)
✅ Graceful degradation (return cached response if all models fail)
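The last two checklist items are not covered by the examples above, so here is one way they might look. This is a minimal sketch, assuming an in-memory dictionary as the cache and exact-match lookup on the message list. `CachedFallback` is a hypothetical name, and a production setup would more likely use a shared store with expiry.

```python
import hashlib
import json
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("ai_fallback")

Messages = List[Dict[str, str]]


class CachedFallback:
    """Wrap a fallback chain and serve the last good answer when it fails."""

    def __init__(self, call_chain: Callable[[Messages], Any]) -> None:
        self._call_chain = call_chain      # e.g. call_with_fallback from this post
        self._cache: Dict[str, Any] = {}   # assumption: in-memory, no expiry

    @staticmethod
    def _key(messages: Messages) -> str:
        # Stable hash of the conversation so identical requests share an entry.
        payload = json.dumps(messages, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ask(self, messages: Messages) -> Any:
        key = self._key(messages)
        try:
            answer = self._call_chain(messages)
        except Exception as error:
            logger.warning("All models failed (%s); checking cache", error)
            if key in self._cache:
                return self._cache[key]
            raise  # Nothing cached for this request, so surface the error
        self._cache[key] = answer
        return answer
```

Usage would be `service = CachedFallback(call_with_fallback)` followed by `service.ask(messages)`. A cached answer may be stale, so this suits FAQ-style prompts better than requests that depend on fresh data.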

Try it: https://aibridge-api.com

5M free tokens. Build resilient AI features. 🛡️

