DEV Community

Alan West

How to Stop Your AI Provider From Holding Your App Hostage

The discourse around who controls AI's future got loud again this week. But while pundits debate trust and governance, I'm staring at a very concrete problem in my codebase: my entire application is hardwired to a single AI provider's API.

If they change pricing tomorrow, deprecate a model, or go down for six hours (again), I'm cooked. And if you've built anything with LLM APIs in the last two years, you probably are too.

Let's fix that.

The Root Cause: Tight Coupling to a Single Provider

Here's what most AI integration code looks like in the wild:

# This is everywhere. This is the problem.
import openai

def summarize(text: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
        temperature=0.3,
        max_tokens=500
    )
    return response.choices[0].message.content

Every function that touches AI is married to one provider's SDK, model names, response shapes, and quirks. When you have 40 of these scattered across your codebase, switching providers isn't a weekend task — it's a rewrite.

The real issue isn't that you picked the wrong provider. It's that you let implementation details leak into your business logic. Classic dependency inversion violation, just wearing a new hat.

Step 1: Define Your Own Interface

The fix starts where it always starts in software — with an abstraction boundary. Define what your app needs from an LLM, not what any specific provider offers.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMResponse:
    """Your app's response shape. You own this."""
    content: str
    model: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

@dataclass
class LLMMessage:
    role: str  # "system", "user", "assistant"
    content: str

class LLMProvider(ABC):
    @abstractmethod
    def complete(
        self,
        messages: list[LLMMessage],
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
    ) -> LLMResponse:
        pass

    @abstractmethod
    def is_available(self) -> bool:
        """Health check — useful for fallback logic"""
        pass

Nothing revolutionary. But now your business logic depends on your types, not theirs.
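
To make the payoff concrete, here's the earlier summarize() rewritten against the interface. The stub provider and the trimmed-down ABC below are mine, for illustration (the real interface above also has is_available()); the point is that business logic becomes trivially testable without an API key.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMMessage:
    role: str
    content: str

@dataclass
class LLMResponse:
    content: str
    model: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, messages: list[LLMMessage], temperature: float = 0.7,
                 max_tokens: Optional[int] = None) -> LLMResponse: ...

# Business logic now depends only on the abstraction.
def summarize(llm: LLMProvider, text: str) -> str:
    response = llm.complete(
        [LLMMessage(role="user", content=f"Summarize: {text}")],
        temperature=0.3,
        max_tokens=500,
    )
    return response.content

# A stub provider stands in for any real one -- no network, no credentials.
class StubProvider(LLMProvider):
    def complete(self, messages, temperature=0.7, max_tokens=None):
        return LLMResponse(content="stub summary", model="stub",
                           input_tokens=1, output_tokens=1, finish_reason="stop")

print(summarize(StubProvider(), "long article text"))  # stub summary
```

The same function runs against OpenAI, Anthropic, or a test double, and nothing in it mentions any of them.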

Step 2: Implement Provider Adapters

Each provider becomes a thin adapter that translates between your interface and their SDK.

import openai
import anthropic

class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = "gpt-4o", api_key: str | None = None):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

    def complete(self, messages, temperature=0.7, max_tokens=None):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": m.role, "content": m.content} for m in messages],
            temperature=temperature,
            max_tokens=max_tokens,
        )
        choice = response.choices[0]
        return LLMResponse(
            content=choice.message.content,
            model=response.model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
            finish_reason=choice.finish_reason,
        )

    def is_available(self) -> bool:
        try:
            self.client.models.list()
            return True
        except Exception:
            return False


class AnthropicProvider(LLMProvider):
    def __init__(self, model: str = "claude-sonnet-4-6", api_key: str | None = None):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def complete(self, messages, temperature=0.7, max_tokens=None):
        # Anthropic handles system messages differently
        system = None
        chat_messages = []
        for m in messages:
            if m.role == "system":
                system = m.content
            else:
                chat_messages.append({"role": m.role, "content": m.content})

        kwargs = {"model": self.model, "messages": chat_messages,
                  "temperature": temperature, "max_tokens": max_tokens or 1024}
        if system:
            kwargs["system"] = system

        response = self.client.messages.create(**kwargs)
        return LLMResponse(
            content=response.content[0].text,
            model=response.model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            finish_reason=response.stop_reason,
        )

    def is_available(self) -> bool:
        try:
            self.client.messages.create(
                model=self.model, max_tokens=1,
                messages=[{"role": "user", "content": "hi"}]
            )
            return True
        except Exception:
            return False

Notice how each adapter handles provider-specific quirks (like Anthropic's separate system message parameter) without leaking those details upward.

Step 3: Add Fallback Logic

Now that you have a clean interface, building resilience is almost trivial:

import logging

logger = logging.getLogger(__name__)

class FallbackProvider(LLMProvider):
    """Tries providers in order. First healthy one wins."""

    def __init__(self, providers: list[LLMProvider]):
        self.providers = providers

    def complete(self, messages, temperature=0.7, max_tokens=None):
        errors = []
        for provider in self.providers:
            try:
                return provider.complete(messages, temperature, max_tokens)
            except Exception as e:
                logger.warning(f"{provider.__class__.__name__} failed: {e}")
                errors.append((provider.__class__.__name__, e))

        raise RuntimeError(
            f"All providers failed: {[(n, str(e)) for n, e in errors]}"
        )

    def is_available(self) -> bool:
        return any(p.is_available() for p in self.providers)


# Wire it up
llm = FallbackProvider([
    AnthropicProvider(model="claude-sonnet-4-6"),
    OpenAIProvider(model="gpt-4o"),
])

Your business code now calls llm.complete() and genuinely does not care who answers. Provider goes down? Next one picks up. Want to add a local Ollama instance as a last resort? Write a 30-line adapter and append it to the list.

Step 4: Make It Configurable

Hardcoding provider order defeats the purpose. Pull it from config:

import os
import json

PROVIDER_REGISTRY = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
}

def build_provider_from_config() -> LLMProvider:
    # LLM_PROVIDERS='[{"name": "anthropic", "model": "claude-sonnet-4-6"}, ...]'
    config = json.loads(os.environ.get("LLM_PROVIDERS", "[]"))
    providers = []
    for entry in config:
        cls = PROVIDER_REGISTRY.get(entry["name"])
        if cls:
            # Pass the model only if configured; otherwise let the
            # adapter's own default apply.
            kwargs = {"model": entry["model"]} if "model" in entry else {}
            providers.append(cls(**kwargs))

    if not providers:
        raise ValueError("No LLM providers configured")

    return FallbackProvider(providers) if len(providers) > 1 else providers[0]

Now switching your primary provider is an environment variable change, not a code change. Your deploy pipeline can handle that.
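
To sanity-check the wiring, here's the env-var format in action with stub classes standing in for the real adapters (the stub names are mine, purely for illustration):

```python
import json
import os

# Illustrative stand-ins for the real adapter classes.
class StubAnthropic:
    def __init__(self, model="claude-sonnet-4-6"):
        self.model = model

class StubOpenAI:
    def __init__(self, model="gpt-4o"):
        self.model = model

REGISTRY = {"anthropic": StubAnthropic, "openai": StubOpenAI}

os.environ["LLM_PROVIDERS"] = json.dumps([
    {"name": "anthropic", "model": "claude-sonnet-4-6"},
    {"name": "openai"},  # no model key: adapter default applies
])

config = json.loads(os.environ["LLM_PROVIDERS"])
providers = [
    REGISTRY[e["name"]](**({"model": e["model"]} if "model" in e else {}))
    for e in config if e["name"] in REGISTRY
]

print([p.model for p in providers])  # ['claude-sonnet-4-6', 'gpt-4o']
```

Reorder the JSON array and you've reordered your failover chain, no deploy required beyond an env change.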

What About Streaming and Tool Use?

Fair question. The basic complete() interface won't cover every use case. You'll probably want to add:

  • stream() for streaming responses (yields chunks)
  • complete_with_tools() for function calling
  • embed() if you're doing RAG

Each one follows the same pattern: define the interface in your terms, then adapt. The tool-calling schemas vary wildly between providers, so that adapter will be the thickest. But it's still better than having provider-specific tool definitions scattered across your app.
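
A streaming extension could follow the same shape. This is a sketch of the contract in my own terms, not the article's final API; a fake implementation shows what the caller side looks like:

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional

class StreamingLLMProvider(ABC):
    """Same pattern as complete(), chunk by chunk."""

    @abstractmethod
    def stream(self, messages, temperature: float = 0.7,
               max_tokens: Optional[int] = None) -> Iterator[str]:
        """Yield text chunks as they arrive; the caller joins or renders them."""
        ...

# A fake implementation demonstrates the caller-side contract without a network.
class FakeStreamer(StreamingLLMProvider):
    def stream(self, messages, temperature=0.7, max_tokens=None):
        for chunk in ["Hello", ", ", "world"]:
            yield chunk

text = "".join(FakeStreamer().stream([]))
print(text)  # Hello, world
```

Each real adapter translates its provider's streaming events into plain text chunks behind this boundary, so UI code never sees provider-specific event formats.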

Prevention: How to Not End Up Here Again

  • Treat AI APIs like databases. You wouldn't scatter raw SQL across your codebase (I hope). Same principle applies. Keep the integration at the boundary.
  • Pin your adapter tests to real responses. Record actual API responses and use them as fixtures. When a provider changes their response shape, your adapter tests catch it before production does.
  • Track costs per provider. Add token counting to your LLMResponse (we already did) and log it. You'll want this data when negotiating pricing or deciding fallback order.
  • Run local models in dev. Ollama with a small model behind the same interface means your tests don't burn API credits and your dev workflow survives an internet outage.
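
The fixture-pinning idea from the list above, sketched. The fixture below mimics an OpenAI-style chat completion shape; in practice you'd record a real response rather than trusting my field names, and re-record it periodically:

```python
from dataclasses import dataclass

@dataclass
class LLMResponse:
    content: str
    model: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

# Recorded once from a real API call, then checked into the repo as a fixture.
# Field names mimic OpenAI's chat completion response shape (an assumption here).
FIXTURE = {
    "model": "gpt-4o-2024-08-06",
    "choices": [{"message": {"content": "A summary."}, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 42, "completion_tokens": 7},
}

def parse_openai(raw: dict) -> LLMResponse:
    """The adapter's mapping logic, isolated so a fixture can exercise it."""
    choice = raw["choices"][0]
    return LLMResponse(
        content=choice["message"]["content"],
        model=raw["model"],
        input_tokens=raw["usage"]["prompt_tokens"],
        output_tokens=raw["usage"]["completion_tokens"],
        finish_reason=choice["finish_reason"],
    )

def test_openai_adapter_mapping():
    r = parse_openai(FIXTURE)
    assert r.content == "A summary."
    assert r.input_tokens == 42
    assert r.finish_reason == "stop"

test_openai_adapter_mapping()
```

When a provider silently renames a usage field, this test fails in CI instead of your adapter failing in production.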

The Bigger Picture

The conversation about who controls AI's future is important. But as a developer, the most productive thing you can do today isn't debating governance structures — it's making sure your code doesn't have a single point of failure controlled by any one entity.

This pattern took me about a day to implement across a medium-sized project. The fallback logic has already saved me twice during provider outages. And the next time pricing changes or a better model drops, I'll be adding an adapter — not rewriting my app.

That's the kind of trust model I'm comfortable with: trust the interface I control, and let the implementations compete.

Top comments (1)

Ali Muwwakkil

An often overlooked tactic when architecting provider-agnostic LLM integrations is using agent-based architectures. In our experience, building modular agents allows for dynamic switching between providers without service disruption. This approach not only enhances reliability but also simplifies the integration of fallback logic since each agent can independently handle failover protocols. It's a game-changer for maintaining uptime and flexibility in your app. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)