The discourse around who controls AI's future got loud again this week. But while pundits debate trust and governance, I'm staring at a very concrete problem in my codebase: my entire application is hardwired to a single AI provider's API.
If they change pricing tomorrow, deprecate a model, or go down for six hours (again), I'm cooked. And if you've built anything with LLM APIs in the last two years, you probably are too.
Let's fix that.
The Root Cause: Tight Coupling to a Single Provider
Here's what most AI integration code looks like in the wild:
```python
# This is everywhere. This is the problem.
import openai

def summarize(text: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
        temperature=0.3,
        max_tokens=500,
    )
    return response.choices[0].message.content
```
Every function that touches AI is married to one provider's SDK, model names, response shapes, and quirks. When you have 40 of these scattered across your codebase, switching providers isn't a weekend task — it's a rewrite.
The real issue isn't that you picked the wrong provider. It's that you let implementation details leak into your business logic. Classic dependency inversion violation, just wearing a new hat.
Step 1: Define Your Own Interface
The fix starts where it always starts in software — with an abstraction boundary. Define what your app needs from an LLM, not what any specific provider offers.
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMResponse:
    """Your app's response shape. You own this."""
    content: str
    model: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

@dataclass
class LLMMessage:
    role: str  # "system", "user", "assistant"
    content: str

class LLMProvider(ABC):
    @abstractmethod
    def complete(
        self,
        messages: list[LLMMessage],
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
    ) -> LLMResponse:
        ...

    @abstractmethod
    def is_available(self) -> bool:
        """Health check, useful for fallback logic."""
        ...
```
Nothing revolutionary. But now your business logic depends on your types, not theirs.
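To see the payoff, here's what a business-logic function looks like written against that interface. The `FakeProvider` below is a hypothetical stand-in for illustration, and the Step 1 type definitions are repeated in minimal form so the snippet runs on its own:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

# Minimal repeats of the Step 1 types so this snippet is self-contained.
@dataclass
class LLMResponse:
    content: str
    model: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

@dataclass
class LLMMessage:
    role: str
    content: str

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, messages: list[LLMMessage],
                 temperature: float = 0.7,
                 max_tokens: Optional[int] = None) -> LLMResponse:
        ...

    @abstractmethod
    def is_available(self) -> bool:
        ...

def summarize(llm: LLMProvider, text: str) -> str:
    """Business logic: knows nothing about any vendor SDK."""
    response = llm.complete(
        [LLMMessage(role="user", content=f"Summarize: {text}")],
        temperature=0.3,
        max_tokens=500,
    )
    return response.content

class FakeProvider(LLMProvider):
    """Canned responses for unit tests; no API key, no network."""
    def complete(self, messages, temperature=0.7, max_tokens=None):
        return LLMResponse(content="(fake summary)", model="fake",
                           input_tokens=0, output_tokens=0,
                           finish_reason="stop")

    def is_available(self) -> bool:
        return True

print(summarize(FakeProvider(), "a very long article"))  # → (fake summary)
```

The same `summarize` works unchanged against any real adapter you write later, which is the whole point.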
Step 2: Implement Provider Adapters
Each provider becomes a thin adapter that translates between your interface and their SDK.
```python
from typing import Optional

import openai
import anthropic

class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = "gpt-4o", api_key: Optional[str] = None):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

    def complete(self, messages, temperature=0.7, max_tokens=None):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": m.role, "content": m.content} for m in messages],
            temperature=temperature,
            max_tokens=max_tokens,
        )
        choice = response.choices[0]
        return LLMResponse(
            content=choice.message.content,
            model=response.model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
            finish_reason=choice.finish_reason,
        )

    def is_available(self) -> bool:
        try:
            self.client.models.list()
            return True
        except Exception:
            return False

class AnthropicProvider(LLMProvider):
    def __init__(self, model: str = "claude-sonnet-4-6", api_key: Optional[str] = None):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def complete(self, messages, temperature=0.7, max_tokens=None):
        # Anthropic handles system messages differently
        system = None
        chat_messages = []
        for m in messages:
            if m.role == "system":
                system = m.content
            else:
                chat_messages.append({"role": m.role, "content": m.content})
        kwargs = {
            "model": self.model,
            "messages": chat_messages,
            "temperature": temperature,
            "max_tokens": max_tokens or 1024,  # required by Anthropic's API
        }
        if system:
            kwargs["system"] = system
        response = self.client.messages.create(**kwargs)
        return LLMResponse(
            content=response.content[0].text,
            model=response.model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            finish_reason=response.stop_reason,
        )

    def is_available(self) -> bool:
        try:
            self.client.messages.create(
                model=self.model, max_tokens=1,
                messages=[{"role": "user", "content": "hi"}],
            )
            return True
        except Exception:
            return False
```
Notice how each adapter handles provider-specific quirks (like Anthropic's separate system message parameter) without leaking those details upward.
Step 3: Add Fallback Logic
Now that you have a clean interface, building resilience is almost trivial:
```python
import logging

logger = logging.getLogger(__name__)

class FallbackProvider(LLMProvider):
    """Tries providers in order. First healthy one wins."""

    def __init__(self, providers: list[LLMProvider]):
        self.providers = providers

    def complete(self, messages, temperature=0.7, max_tokens=None):
        errors = []
        for provider in self.providers:
            try:
                return provider.complete(messages, temperature, max_tokens)
            except Exception as e:
                logger.warning(f"{provider.__class__.__name__} failed: {e}")
                errors.append((provider.__class__.__name__, e))
        raise RuntimeError(
            f"All providers failed: {[(n, str(e)) for n, e in errors]}"
        )

    def is_available(self) -> bool:
        return any(p.is_available() for p in self.providers)

# Wire it up
llm = FallbackProvider([
    AnthropicProvider(model="claude-sonnet-4-6"),
    OpenAIProvider(model="gpt-4o"),
])
```
Your business code now calls `llm.complete()` and genuinely does not care who answers. Provider goes down? Next one picks up. Want to add a local Ollama instance as a last resort? Write a 30-line adapter and append it to the list.
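For illustration, that local adapter could look something like the sketch below. It assumes Ollama's default HTTP endpoint (`http://localhost:11434/api/chat`) and its non-streaming JSON response fields, so check the field names against the Ollama docs for your installed version. Only the standard library is used, and minimal stand-ins for the Step 1 types are included so the snippet runs alone:

```python
import json
import urllib.request
from dataclasses import dataclass

# Minimal stand-ins for the Step 1 types; in the real codebase this
# class would subclass LLMProvider instead.
@dataclass
class LLMMessage:
    role: str
    content: str

@dataclass
class LLMResponse:
    content: str
    model: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

class OllamaProvider:
    """Talks to a local Ollama server over its HTTP API. The payload and
    response field names below are assumptions based on Ollama's /api/chat
    endpoint; verify them against the docs for your version."""

    def __init__(self, model: str = "llama3.2",
                 base_url: str = "http://localhost:11434"):
        self.model = model
        self.base_url = base_url

    def complete(self, messages, temperature=0.7, max_tokens=None):
        payload = {
            "model": self.model,
            "messages": [{"role": m.role, "content": m.content} for m in messages],
            "stream": False,
            "options": {"temperature": temperature},
        }
        if max_tokens is not None:
            payload["options"]["num_predict"] = max_tokens
        req = urllib.request.Request(
            f"{self.base_url}/api/chat",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=120) as resp:
            data = json.load(resp)
        return LLMResponse(
            content=data["message"]["content"],
            model=data.get("model", self.model),
            input_tokens=data.get("prompt_eval_count", 0),
            output_tokens=data.get("eval_count", 0),
            finish_reason=data.get("done_reason", "stop"),
        )

    def is_available(self) -> bool:
        try:
            # /api/tags lists installed models; cheap liveness probe
            urllib.request.urlopen(f"{self.base_url}/api/tags", timeout=2)
            return True
        except Exception:
            return False
```

Because it satisfies the same interface, appending `OllamaProvider()` to the `FallbackProvider` list is all the wiring it needs.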
Step 4: Make It Configurable
Hardcoding provider order defeats the purpose. Pull it from config:
```python
import os
import json

PROVIDER_REGISTRY = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
}

def build_provider_from_config() -> LLMProvider:
    # LLM_PROVIDERS='[{"name": "anthropic", "model": "claude-sonnet-4-6"}, ...]'
    config = json.loads(os.environ.get("LLM_PROVIDERS", "[]"))
    providers = []
    for entry in config:
        cls = PROVIDER_REGISTRY.get(entry["name"])
        if cls:
            # Use the configured model, or fall back to the adapter's default
            kwargs = {"model": entry["model"]} if "model" in entry else {}
            providers.append(cls(**kwargs))
    if not providers:
        raise ValueError("No LLM providers configured")
    return FallbackProvider(providers) if len(providers) > 1 else providers[0]
```
Now switching your primary provider is an environment variable change, not a code change. Your deploy pipeline can handle that.
What About Streaming and Tool Use?
Fair question. The basic complete() interface won't cover every use case. You'll probably want to add:
- `stream()` for streaming responses (yields chunks)
- `complete_with_tools()` for function calling
- `embed()` if you're doing RAG
Each one follows the same pattern: define the interface in your terms, then adapt. The tool-calling schemas vary wildly between providers, so that adapter will be the thickest. But it's still better than having provider-specific tool definitions scattered across your app.
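As a sketch of that pattern, here's one way the streaming extension could be declared. The method name and shape are suggestions, not an established API, and a toy implementation is included so the snippet is runnable:

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional

class StreamingLLM(ABC):
    """Illustrative extension of the Step 1 interface; names are suggestions."""

    @abstractmethod
    def stream(self, messages, temperature: float = 0.7,
               max_tokens: Optional[int] = None) -> Iterator[str]:
        """Yield text chunks as they arrive, instead of one full response."""
        ...

class EchoLLM(StreamingLLM):
    """Toy implementation: replays the last message word by word."""
    def stream(self, messages, temperature=0.7, max_tokens=None):
        for word in messages[-1]["content"].split():
            yield word + " "

chunks = list(EchoLLM().stream([{"role": "user", "content": "hello there"}]))
print("".join(chunks))  # → "hello there "
```

A real adapter would translate each provider's streaming events into these plain-text chunks, keeping the vendor-specific event types behind the boundary.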
Prevention: How to Not End Up Here Again
- Treat AI APIs like databases. You wouldn't scatter raw SQL across your codebase (I hope). Same principle applies. Keep the integration at the boundary.
- Pin your adapter tests to real responses. Record actual API responses and use them as fixtures. When a provider changes their response shape, your adapter tests catch it before production does.
- Track costs per provider. Add token counting to your `LLMResponse` (we already did) and log it. You'll want this data when negotiating pricing or deciding fallback order.
- Run local models in dev. Ollama with a small model behind the same interface means your tests don't burn API credits and your dev workflow survives an internet outage.
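The fixture idea from the list above can be as lightweight as replaying a recorded payload through the adapter's parsing logic. One way to make that testable is pulling the parsing out of `OpenAIProvider.complete` into a pure function (a refactor of this article's adapter, sketched here with the fixture mimicking the SDK's attribute-style response objects):

```python
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class LLMResponse:
    content: str
    model: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

def parse_openai_response(response) -> LLMResponse:
    """The parsing half of OpenAIProvider.complete, extracted so it can
    be exercised against recorded fixtures without a network call."""
    choice = response.choices[0]
    return LLMResponse(
        content=choice.message.content,
        model=response.model,
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        finish_reason=choice.finish_reason,
    )

# A recorded response frozen as a fixture. SimpleNamespace mimics the
# attribute access the real SDK objects use.
fixture = SimpleNamespace(
    model="gpt-4o-2024-08-06",
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=7),
    choices=[SimpleNamespace(
        message=SimpleNamespace(content="Recorded answer."),
        finish_reason="stop",
    )],
)

def test_openai_adapter_parsing():
    parsed = parse_openai_response(fixture)
    assert parsed.content == "Recorded answer."
    assert parsed.input_tokens == 12
    assert parsed.finish_reason == "stop"

test_openai_adapter_parsing()
```

When a provider renames or restructures a field, the recorded fixture stops matching and the test fails in CI, not in production.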
The Bigger Picture
The conversation about who controls AI's future is important. But as a developer, the most productive thing you can do today isn't debating governance structures — it's making sure your code doesn't have a single point of failure controlled by any one entity.
This pattern took me about a day to implement across a medium-sized project. The fallback logic has already saved me twice during provider outages. And the next time pricing changes or a better model drops, I'll be adding an adapter — not rewriting my app.
That's the kind of trust model I'm comfortable with: trust the interface I control, and let the implementations compete.