I got tired of switching AI SDKs every time I wanted to try a new model

#python #ai #webdev #api

A few months ago I was building a personal project that needed to generate structured data from natural language. I started with OpenAI's GPT-4 because, well, everyone does. The code worked, the responses were great, and I thought I was done. Then Anthropic released Claude 3, and the benchmarks looked promising. I wanted to try it—just swap one model for another to compare quality and cost.

That turned into an entire weekend of refactoring.

Different SDKs. Different authentication. Different response objects. Even the way you handle streaming (or don't) changed completely. By the end I had a messy pile of if provider == "openai": ... elif provider == "anthropic": ... blocks that made me feel like I'd written JavaScript in 2014.

I knew I couldn't be the only one dealing with this. Every week there's a new model or a new API. The idea of being locked into one provider felt both brittle and inefficient. So I set out to build a thin abstraction that would let me swap AI providers without rewriting my entire codebase.

What I tried first (and why it didn't work)

My first instinct was to just use environment variables and conditionally import the right SDK. Something like this:

import os

provider = os.getenv("AI_PROVIDER", "openai")

if provider == "openai":
    from openai import OpenAI
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
elif provider == "anthropic":
    from anthropic import Anthropic
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

This worked... until I needed to call the API. The method signatures were completely different:

# OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

The parameter names mostly line up: both calls take model and messages, and both accept max_tokens (Anthropic requires it, OpenAI treats it as optional). The real pain is the response format: OpenAI returns response.choices[0].message.content, Anthropic returns response.content[0].text. Streaming is even more divergent.
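
To make the response-shape difference concrete, here's a minimal sketch of a helper that pulls the text out of either shape. It uses SimpleNamespace stand-ins instead of real SDK objects so it runs without API keys. The name extract_text is my own invention for illustration and is not part of either SDK.

```python
from types import SimpleNamespace


def extract_text(provider: str, response) -> str:
    """Return the generated text from a provider-specific response object."""
    if provider == "openai":
        # OpenAI chat completions: text lives on the first choice's message.
        return response.choices[0].message.content
    if provider == "anthropic":
        # Anthropic messages: text lives on the first content block.
        return response.content[0].text
    raise ValueError(f"Unknown provider: {provider}")


# Stand-in objects that mimic only the attributes used above.
fake_openai = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello from OpenAI"))]
)
fake_anthropic = SimpleNamespace(content=[SimpleNamespace(text="Hello from Claude")])

print(extract_text("openai", fake_openai))
print(extract_text("anthropic", fake_anthropic))
```

This only covers plain text replies. A reply whose first content block is not text would need extra handling.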

I quickly realized that conditionally importing the client wasn't enough. I needed a unified interface.

What eventually worked: a generic AI client interface

I created a simple abstract base class that defines a standard way to send a prompt and get a response. Then I wrote one concrete implementation per provider. The rest of my code only ever talks to the abstract class.

Here's a stripped-down version (I removed error handling and streaming for clarity, but the same pattern applies):

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class AIResponse:
    content: str
    model: str
    usage: dict | None = None

class AIProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> AIResponse:
        pass

Then for OpenAI:

import openai

class OpenAIProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "gpt-4"):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

    def complete(self, prompt: str, **kwargs) -> AIResponse:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return AIResponse(
            content=response.choices[0].message.content,
            model=response.model,
            usage=response.usage.model_dump() if response.usage else None
        )

And for Anthropic:

import anthropic

class AnthropicProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "claude-3-haiku-20240307"):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def complete(self, prompt: str, **kwargs) -> AIResponse:
        # Anthropic requires max_tokens; we default to a reasonable value if not provided
        max_tokens = kwargs.pop("max_tokens", 1024)
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return AIResponse(
            content=response.content[0].text,
            model=response.model,
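            # Anthropic reports token counts as input_tokens / output_tokens
            usage={
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
            },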
        )

Now I can use a factory function to pick the right provider at startup:

import os

def create_provider(provider_name: str, api_key: str, model: str | None = None) -> AIProvider:
    if provider_name == "openai":
        return OpenAIProvider(api_key, model or "gpt-4")
    elif provider_name == "anthropic":
        return AnthropicProvider(api_key, model or "claude-3-haiku-20240307")
    # Add more as needed
    else:
        raise ValueError(f"Unknown provider: {provider_name}")

# Usage
provider = create_provider("anthropic", os.getenv("ANTHROPIC_API_KEY"))
response = provider.complete("Tell me a joke about Python.")
print(response.content)

That's it. My application code never touches openai or anthropic directly. If I want to try a new provider tomorrow, I just write a new class and add one line to create_provider.
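
As a rough illustration of how small "a new provider" can be, here's a self-contained sketch with an offline provider that just echoes the prompt, which is also handy in unit tests. EchoProvider and the PROVIDERS registry are hypothetical names I made up for this example, and the registry is a variation on the if/elif factory, not what my project actually uses.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional, Type


@dataclass
class AIResponse:
    content: str
    model: str
    usage: Optional[dict] = None


class AIProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> AIResponse:
        ...


class EchoProvider(AIProvider):
    """Hypothetical offline provider that echoes the prompt back."""

    def __init__(self, api_key: str = "", model: str = "echo-1"):
        self.model = model  # api_key is accepted but unused

    def complete(self, prompt: str, **kwargs) -> AIResponse:
        return AIResponse(content=f"echo: {prompt}", model=self.model)


# Hypothetical registry: adding a provider means adding one entry here.
PROVIDERS: Dict[str, Type[AIProvider]] = {"echo": EchoProvider}


def create_provider(provider_name: str, api_key: str = "", **kwargs) -> AIProvider:
    try:
        provider_cls = PROVIDERS[provider_name]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider_name}") from None
    return provider_cls(api_key, **kwargs)


provider = create_provider("echo")
print(provider.complete("Tell me a joke about Python.").content)
```

The dictionary lookup does the same job as the if/elif chain. It just keeps the list of providers in one place.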

But wait—this isn't perfect

Let me be honest about the limitations. Not all models support the same features. OpenAI has function calling, Anthropic has tool use (similar but not identical). Streaming APIs differ wildly. Token limits vary. System prompts are passed differently (OpenAI takes a system-role message, Anthropic takes a top-level system parameter), and some providers don't support them at all. If you try to abstract everything into a single interface, you either end up with a leaky abstraction or you have to support only the lowest common denominator.
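
System prompts are a good concrete case of that divergence. Here's a minimal sketch of a function that builds the request arguments for each provider. build_request is a hypothetical helper, and the max_tokens default of 1024 is arbitrary. It only builds dictionaries, so it runs without either SDK installed.

```python
from typing import Optional


def build_request(provider: str, prompt: str, system: Optional[str] = None) -> dict:
    """Build provider-specific keyword arguments for a single-turn request."""
    messages = [{"role": "user", "content": prompt}]
    if provider == "openai":
        # OpenAI: the system prompt is another message at the front of the list.
        if system:
            messages.insert(0, {"role": "system", "content": system})
        return {"messages": messages}
    if provider == "anthropic":
        # Anthropic: the system prompt is a top-level parameter, and
        # max_tokens is required (1024 is an arbitrary default here).
        request = {"messages": messages, "max_tokens": 1024}
        if system:
            request["system"] = system
        return request
    raise ValueError(f"Unknown provider: {provider}")


print(build_request("openai", "Hello", system="Be brief."))
print(build_request("anthropic", "Hello", system="Be brief."))
```

The resulting dictionary could be unpacked into client.chat.completions.create or client.messages.create along with the model name.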

My approach works fine for simple text generation tasks (chat, summarization, classification). But if you rely on advanced features like structured outputs with JSON mode or vision, you'll need to handle those separately—maybe by adding optional methods to the base class that providers can implement or raise NotImplementedError.
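
Here's a sketch of what the optional-method idea could look like, using streaming as the optional feature. To keep it short, complete returns a plain string instead of an AIResponse, and both providers are hypothetical stand-ins that make no network calls.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class AIProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...

    def stream(self, prompt: str) -> Iterator[str]:
        """Optional capability; providers override this if they support it."""
        raise NotImplementedError(f"{type(self).__name__} does not support streaming")


class BasicProvider(AIProvider):
    """Hypothetical provider that only supports plain completion."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class StreamingProvider(BasicProvider):
    """Hypothetical provider that also yields its output word by word."""

    def stream(self, prompt: str) -> Iterator[str]:
        for word in self.complete(prompt).split():
            yield word


def stream_or_fallback(provider: AIProvider, prompt: str) -> List[str]:
    """Use streaming when available, otherwise fall back to one full chunk."""
    try:
        return list(provider.stream(prompt))
    except NotImplementedError:
        return [provider.complete(prompt)]


print(stream_or_fallback(BasicProvider(), "hi there"))
print(stream_or_fallback(StreamingProvider(), "hi there"))
```

The caller decides what to do when a feature is missing, so the base class stays small and each provider only implements what it actually supports.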

Also, there's a cost side. Different providers charge differently, and you might want to route requests to the cheapest model for a given task. That's a whole other layer of complexity.
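
I haven't built cost routing, but a first pass could be as simple as the sketch below. Every name and number in it is a placeholder: the providers, the models, and especially the prices are invented for illustration and are not real pricing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelPrice:
    provider: str
    model: str
    input_per_mtok: float   # USD per million input tokens (placeholder)
    output_per_mtok: float  # USD per million output tokens (placeholder)


# Placeholder numbers for illustration only; NOT real pricing.
PRICES = [
    ModelPrice("provider-a", "small-model", 0.50, 1.50),
    ModelPrice("provider-b", "small-model", 0.25, 1.25),
    ModelPrice("provider-a", "large-model", 10.00, 30.00),
]


def estimate_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def cheapest(prices: List[ModelPrice], input_tokens: int, output_tokens: int) -> ModelPrice:
    """Pick the entry with the lowest estimated cost for this request size."""
    return min(prices, key=lambda p: estimate_cost(p, input_tokens, output_tokens))


choice = cheapest(PRICES, input_tokens=1000, output_tokens=500)
print(choice.provider, choice.model)
```

A real router would also need to filter by quality or capability before comparing price, which is where the extra complexity comes from.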

What I'd do differently next time

I'd look for existing libraries that solve this problem. There are some good ones out there, like litellm or even langchain (though langchain can be heavy). The product I found while researching—something called Interwest AI (https://ai.interwestinfo.com/)—actually provides a unified API for multiple models, which would have saved me the weekend of writing provider classes. But building it myself taught me how each SDK really works, which was valuable.

If I were starting fresh today, I'd probably use a lightweight wrapper library that normalizes the API, but still keep my own abstract class around in case I need to add a custom provider that the library doesn't support.

Lessons learned

Abstract early, but not too early. I should have built this abstraction as soon as a second provider showed up, not after I had three if/elif chains.
Define your use case first. If you only need simple text completion, the abstraction is easy. If you need every advanced feature, maybe just pick one provider and stick with it.
Configuration over code. Use environment variables or a config file to pick the provider at deploy time, rather than hard-coding it in the source.
Test with real API calls. Mocking is fine for unit tests, but the subtle differences between providers only show up when you hit their actual endpoints.
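
For the configuration point, here's a minimal sketch of reading the provider choice from the environment. AI_PROVIDER, OPENAI_API_KEY and ANTHROPIC_API_KEY are the variable names from the snippets earlier in this post. AI_MODEL and the load_provider_config function are hypothetical additions for this example.

```python
import os
from typing import Mapping, Optional, Tuple

# Environment variable that holds each provider's API key.
API_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def load_provider_config(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str, Optional[str]]:
    """Return (provider_name, api_key, model) from environment-style settings."""
    name = env.get("AI_PROVIDER", "openai")
    if name not in API_KEY_VARS:
        raise ValueError(f"Unknown provider: {name}")
    key_var = API_KEY_VARS[name]
    api_key = env.get(key_var)
    if not api_key:
        # Fail at startup instead of on the first API call.
        raise RuntimeError(f"{key_var} is not set")
    # AI_MODEL is a hypothetical, optional override; None means "use the default".
    return name, api_key, env.get("AI_MODEL")


# Passing a plain dict keeps the example independent of the real environment.
print(load_provider_config({"AI_PROVIDER": "anthropic", "ANTHROPIC_API_KEY": "test-key"}))
```

The returned tuple lines up with the arguments of create_provider, so the two can be wired together at startup.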

This pattern has saved me hours every time I explore a new model. My side project now has three providers configured, and I can switch between them with a single environment variable change.

What does your setup look like? Are you using a wrapper library, rolling your own, or just committing to one provider? I'd love to hear what works (or doesn't) for you.