How to Build AI API Fallback Logic (Never Fail on Model Errors)

#ai #api #llm #programming

Your AI feature is live. Suddenly, your primary model starts failing.

❌ Rate limited
❌ Timeout
❌ 500 error

What happens to your users?

The solution: Fallback logic — automatically switch to a backup model when the primary fails.

The Problem

# What happens when this fails?
import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_input}]
)
# If this times out or is rate limited, the exception propagates and the request fails 💥

The Solution: Fallback Chain
With AIBridge, you can define a fallback chain — try Model A, if it fails, try Model B, then Model C.

from openai import APIError, OpenAI

client = OpenAI(
    api_key="mb_your_key",
    base_url="https://aibridge-api.com/v1"
)

# Define fallback chain
FALLBACK_CHAIN = [
    "deepseek-v4-pro",    # Primary: Best quality
    "qwen3-235b-a22b",   # Fallback 1: Still high quality
    "glm-4-plus",        # Fallback 2: Reliable alternative
    "deepseek-v4-flash"  # Fallback 3: Fast & cheap
]

def call_with_fallback(messages, fallback_chain=FALLBACK_CHAIN):
    """Try models in order until one succeeds."""
    if not fallback_chain:
        raise ValueError("fallback_chain must contain at least one model")

    last_error = None

    for model in fallback_chain:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=10  # Seconds; prevents a hung request from blocking the chain
            )
            print(f"✅ Success with {model}")
            return response

        except APIError as e:
            # APIError is the SDK's base error (timeouts, connection errors,
            # rate limits, 5xx). Programming errors such as TypeError still
            # surface immediately instead of being swallowed.
            print(f"❌ {model} failed: {e}")
            last_error = e
            continue  # Try next model

    # All models failed: surface the most recent error
    raise last_error

# Usage
response = call_with_fallback([
    {"role": "user", "content": "Explain quantum computing"}
])

Advanced: Smart Retry with Exponential Backoff
For production, add retry logic with exponential backoff:

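from openai import APIConnectionError, InternalServerError, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# Note: the OpenAI SDK client also retries some errors on its own
# (max_retries, default 2). Pass max_retries=0 when creating the client
# if tenacity should own all retries.
@retry(
    # Retry only transient failures; APITimeoutError is a subclass of APIConnectionError
    retry=retry_if_exception_type((APIConnectionError, InternalServerError, RateLimitError)),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    reraise=True,  # Raise the original error instead of tenacity.RetryError
)
def call_ai_with_retry(model, messages):
    return client.chat.completions.create(
        model=model,
        messages=messages,
        timeout=10
    )

# Now automatically retries on transient failures
messages = [{"role": "user", "content": "Explain quantum computing"}]
response = call_ai_with_retry("deepseek-v4-pro", messages)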
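
Retry and fallback also compose: retry the same model a few times with backoff, and only then move down the chain. Here is a minimal, dependency-free sketch of that idea. The names are assumptions for illustration: `call_model(model, messages)` stands in for whatever wraps `client.chat.completions.create`, and `TransientError` is a hypothetical marker for retryable failures (rate limit, timeout, 5xx). The `sleep` parameter is injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable, List, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limit, timeout, 5xx)."""


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 10.0) -> List[float]:
    """Delays between attempts: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(max(attempts - 1, 0))]


def call_with_retry_and_fallback(
    call_model: Callable[[str, list], object],
    messages: list,
    models: Sequence[str],
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry each model with exponential backoff, then fall back to the next one."""
    if not models or attempts < 1:
        raise ValueError("need at least one model and one attempt")

    last_error = None
    delays = backoff_delays(attempts)

    for model in models:
        for attempt in range(attempts):
            try:
                return call_model(model, messages)
            except TransientError as exc:
                last_error = exc
                if attempt < attempts - 1:
                    sleep(delays[attempt])  # Wait before retrying the same model
        # Retries exhausted for this model: move on to the next one

    # Every model failed on every attempt
    raise last_error
```

In a real integration, `call_model` would call the client and translate the SDK's transient exceptions into `TransientError`. Non-transient errors (for example an invalid API key) are deliberately not caught, since retrying or switching models would not fix them.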

AIBridge Advantage
With direct APIs, fallback means:
❌ Multiple API clients
❌ Multiple base URLs
❌ Different error formats

With AIBridge:
✅ One client
✅ One base URL
✅ Same error format for all models
✅ Switch models instantly

Production Checklist
✅ Fallback chain (primary + at least 2 backups)
✅ Retry logic (exponential backoff)
✅ Timeout handling (prevent hanging)
✅ Error logging (know which model failed)
✅ Graceful degradation (return cached response if all models fail)
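
The last two checklist items (error logging and graceful degradation) can be sketched as a thin wrapper around the fallback call. This is one possible shape, not a prescribed implementation: `DegradingCaller` and `cache_key` are hypothetical names, the cache is a plain in-memory dict with no expiry or size limit, and `call` is assumed to be any function that takes `messages` and returns the answer text.

```python
import hashlib
import json
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("ai_fallback")


def cache_key(messages: list) -> str:
    """Stable key for a conversation: hash of its canonical JSON form."""
    payload = json.dumps(messages, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DegradingCaller:
    """Hypothetical wrapper: serve the last good answer when every model fails."""

    def __init__(self, call: Callable[[list], str], default: Optional[str] = None):
        self._call = call        # e.g. a function that runs the fallback chain
        self._default = default  # Returned when there is no cached answer either
        self._cache: Dict[str, str] = {}

    def __call__(self, messages: list) -> str:
        key = cache_key(messages)
        try:
            answer = self._call(messages)
        except Exception as exc:  # Last line of defence: the whole chain failed
            logger.error("all models failed: %s", exc)
            if key in self._cache:
                return self._cache[key]  # Stale but useful
            if self._default is not None:
                return self._default     # Friendly placeholder message
            raise                        # Nothing to degrade to
        self._cache[key] = answer
        return answer
```

A production version would likely bound the cache and expire old entries, and might tell the user that the answer is cached.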

Try it: https://aibridge-api.com

5M free tokens. Build resilient AI features. 🛡️

DEV Community
