A few months ago, I was deep into building a chatbot that needed to summarize user conversations. I started with OpenAI's API because, let's be honest, that's the default for most of us. Everything worked fine until my team decided to support other models to cut cost and latency. Suddenly, I had to rewrite half my codebase to handle different API formats, authentication methods, and streaming behaviors.
I thought, "There has to be a better way." So I built a generic AI client that abstracts away the provider-specific details. In this article, I'll walk you through the approach that saved me weeks of maintenance—and how you can do the same.
The problem: API lock-in
When I first started, my code looked like this:
```python
import openai

# Legacy pre-1.0 openai SDK style; openai>=1.0 removed openai.ChatCompletion.
def summarize(text):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize: {text}"}]
    )
    return response.choices[0].message.content
```
Simple, clean, and completely tied to OpenAI. Then we wanted to try Anthropic's Claude. I copied the function, changed the import, tweaked the request format, and ended up with two nearly identical functions. When we added a third provider (a smaller, cheaper API), I had three. Every time the API changed, I had to update all of them.
What didn't work: if-else hell
My first attempt was a single function with a provider parameter:
```python
def summarize(text, provider="openai"):
    if provider == "openai":
        ...  # openai code
    elif provider == "anthropic":
        ...  # anthropic code
    elif provider == "interwest":
        ...  # interwest code
    else:
        raise ValueError(f"Unknown provider: {provider}")
```
It worked, but it was ugly. Adding a new provider meant editing the function, and the function grew longer than a CVS receipt. Testing was a nightmare. I knew there had to be a cleaner pattern.
The solution: an abstract base client
I decided to create a generic interface that each provider would implement. Here's the core idea:
```python
from abc import ABC, abstractmethod
from typing import AsyncIterator


class AIProvider(ABC):
    @abstractmethod
    async def chat(self, messages: list[dict], **kwargs) -> str:
        """Send a chat completion request and return the response text."""

    @abstractmethod
    def chat_stream(self, messages: list[dict], **kwargs) -> AsyncIterator[str]:
        """Stream the response token by token.

        Declared with plain `def`: subclasses implement it as an async
        generator (`async def` + `yield`), which returns an AsyncIterator
        directly instead of a coroutine that has to be awaited.
        """
```
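To see what the base class buys you, here is a small offline sketch. EchoProvider and HalfDoneProvider are hypothetical classes made up for illustration, and the abstract chat_stream is declared with a plain def, since calling an async generator function returns an async iterator with nothing to await. Python refuses to instantiate a subclass that skips an abstract method, so a half-finished provider fails at construction time rather than on first use.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator


class AIProvider(ABC):
    @abstractmethod
    async def chat(self, messages: list[dict], **kwargs) -> str:
        """Send a chat completion request and return the response text."""

    @abstractmethod
    def chat_stream(self, messages: list[dict], **kwargs) -> AsyncIterator[str]:
        """Stream the response token by token."""


class EchoProvider(AIProvider):
    """Hypothetical offline provider that echoes the last message back."""

    async def chat(self, messages, **kwargs):
        return messages[-1]["content"]

    async def chat_stream(self, messages, **kwargs):
        # async def + yield makes this an async generator, which satisfies
        # the AsyncIterator[str] contract without being awaited.
        for word in messages[-1]["content"].split():
            yield word


class HalfDoneProvider(AIProvider):
    """Hypothetical provider that forgot to implement chat_stream."""

    async def chat(self, messages, **kwargs):
        return ""


async def main() -> tuple[str, list[str]]:
    provider = EchoProvider()
    msgs = [{"role": "user", "content": "hello generic client"}]
    text = await provider.chat(msgs)
    tokens = [token async for token in provider.chat_stream(msgs)]
    return text, tokens


try:
    HalfDoneProvider()
except TypeError as exc:
    # Abstract methods are enforced when the class is instantiated.
    print("refused:", exc)

print(asyncio.run(main()))  # → ('hello generic client', ['hello', 'generic', 'client'])
```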
Then I implemented concrete providers. Here's one for OpenAI:
```python
import openai


class OpenAIProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "gpt-3.5-turbo"):
        # openai>=1.0 async client
        self.client = openai.AsyncOpenAI(api_key=api_key)
        self.model = model

    async def chat(self, messages, **kwargs):
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            **kwargs
        )
        return response.choices[0].message.content

    async def chat_stream(self, messages, **kwargs):
        stream = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            stream=True,
            **kwargs
        )
        async for chunk in stream:
            # Some chunks (e.g. a final usage chunk) carry no choices at all.
            if chunk.choices and chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
```
And another for a smaller provider (let's call it Interwest, whose API I found at https://ai.interwestinfo.com/):
```python
import json

import httpx


class InterwestProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "default"):
        self.api_key = api_key
        self.model = model
        self.base_url = "https://ai.interwestinfo.com/api/v1"
        # httpx defaults to a 5-second timeout, which long completions can exceed.
        self.timeout = httpx.Timeout(60.0)

    async def chat(self, messages, **kwargs):
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            response = await client.post(
                f"{self.base_url}/chat",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"model": self.model, "messages": messages, **kwargs}
            )
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]

    async def chat_stream(self, messages, **kwargs):
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            async with client.stream(
                "POST",
                f"{self.base_url}/chat",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"model": self.model, "messages": messages, "stream": True, **kwargs}
            ) as response:
                response.raise_for_status()
                async for line in response.aiter_lines():
                    if not line.startswith("data: "):
                        continue
                    payload = line[6:]
                    if payload == "[DONE]":
                        break
                    # Each SSE payload is a JSON chunk, not raw text. This assumes the
                    # same OpenAI-style shape as the non-streaming response above.
                    delta = json.loads(payload)["choices"][0].get("delta", {})
                    if delta.get("content"):
                        yield delta["content"]
```
Now the client itself stays simple, because it only talks to the interface:
```python
class AIClient:
    def __init__(self, provider: AIProvider):
        self.provider = provider

    async def summarize(self, text: str) -> str:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Summarize this: {text}"}
        ]
        return await self.provider.chat(messages)
```
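A streaming variant of summarize could look roughly like this. FakeStreamingProvider and summarize_stream are hypothetical names; the fake stands in for any provider so the sketch runs without network access or API keys.

```python
import asyncio
from typing import AsyncIterator


class FakeStreamingProvider:
    """Hypothetical stand-in for any AIProvider; streams a canned reply."""

    async def chat(self, messages, **kwargs) -> str:
        return "".join([t async for t in self.chat_stream(messages, **kwargs)])

    async def chat_stream(self, messages, **kwargs) -> AsyncIterator[str]:
        for token in ["The ", "user ", "asked ", "about ", "billing."]:
            await asyncio.sleep(0)  # yield control, as a real network read would
            yield token


class AIClient:
    def __init__(self, provider):
        self.provider = provider

    def _summary_messages(self, text: str) -> list[dict]:
        return [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Summarize this: {text}"},
        ]

    async def summarize(self, text: str) -> str:
        return await self.provider.chat(self._summary_messages(text))

    async def summarize_stream(self, text: str) -> AsyncIterator[str]:
        # Re-yield tokens as they arrive so a UI can render them incrementally.
        async for token in self.provider.chat_stream(self._summary_messages(text)):
            yield token


async def demo() -> str:
    client = AIClient(FakeStreamingProvider())
    parts = []
    async for token in client.summarize_stream("a long conversation"):
        parts.append(token)
    return "".join(parts)


print(asyncio.run(demo()))  # → The user asked about billing.
```

Because the client only calls chat_stream, the same loop works unchanged whether the provider underneath is OpenAIProvider or InterwestProvider.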
To switch providers, I just change the initialization:
```python
# Use OpenAI
client = AIClient(OpenAIProvider(api_key="sk-..."))

# Or Interwest
client = AIClient(InterwestProvider(api_key="iw-..."))
```
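If you'd rather pick the provider from configuration than edit code, a small registry is one option. This is a sketch: StubProvider stands in for the real provider classes, and the environment variable names and config shape are assumptions for illustration.

```python
import os
from typing import Callable, Mapping, Optional


class StubProvider:
    """Hypothetical placeholder for OpenAIProvider, InterwestProvider, etc."""

    def __init__(self, name: str, api_key: str):
        self.name = name
        self.api_key = api_key


# Provider name -> (factory taking an API key, env var that holds the key).
# The env var names are assumptions for illustration.
PROVIDERS: dict[str, tuple[Callable[[str], object], str]] = {
    "openai": (lambda key: StubProvider("openai", key), "OPENAI_API_KEY"),
    "interwest": (lambda key: StubProvider("interwest", key), "INTERWEST_API_KEY"),
}


def provider_from_config(config: dict, env: Optional[Mapping[str, str]] = None):
    """Build a provider from a config such as {"provider": "interwest"}."""
    env = os.environ if env is None else env
    name = config.get("provider", "openai")
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider {name!r}; expected one of {sorted(PROVIDERS)}")
    factory, key_var = PROVIDERS[name]
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"set {key_var} to use the {name!r} provider")
    return factory(api_key)


fake_env = {"OPENAI_API_KEY": "sk-test", "INTERWEST_API_KEY": "iw-test"}
provider = provider_from_config({"provider": "interwest"}, env=fake_env)
print(provider.name, provider.api_key)  # → interwest iw-test
```

In real code each lambda would construct OpenAIProvider or InterwestProvider, and the result goes into AIClient exactly as before.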
Lessons learned and trade-offs
This approach works great when you have a relatively simple interface—like chat completions. But it breaks down if providers have wildly different capabilities (e.g., function calling, image generation). In those cases, you either extend the interface or use provider-specific code for those features.
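Extending the interface doesn't have to mean forcing every provider to implement everything. One option, sketched here with hypothetical names (ToolCallingProvider, chat_with_tools), is a separate optional base class that only capable providers inherit, with the caller feature-detecting via isinstance before using it.

```python
import asyncio
from abc import ABC, abstractmethod


class ToolCallingProvider(ABC):
    """Hypothetical optional capability: only providers with tool support implement it."""

    @abstractmethod
    async def chat_with_tools(self, messages: list[dict], tools: list[dict]) -> dict:
        """Return the model's chosen tool call."""


class BasicProvider:
    async def chat(self, messages, **kwargs) -> str:
        return "plain answer"


class ToolProvider(BasicProvider, ToolCallingProvider):
    async def chat_with_tools(self, messages, tools):
        # Pretend the model chose the first tool; a real provider would call its API.
        return {"tool": tools[0]["name"], "arguments": {}}


async def answer(provider, messages, tools):
    # Feature-detect instead of assuming every provider supports tools.
    if tools and isinstance(provider, ToolCallingProvider):
        return await provider.chat_with_tools(messages, tools)
    return await provider.chat(messages)


msgs = [{"role": "user", "content": "What's the weather?"}]
tools = [{"name": "get_weather"}]
print(asyncio.run(answer(BasicProvider(), msgs, tools)))  # → plain answer
print(asyncio.run(answer(ToolProvider(), msgs, tools)))   # → {'tool': 'get_weather', 'arguments': {}}
```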
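Also, streaming implementations vary. Some providers send Server-Sent Events (lines prefixed with "data: "), others stream newline-delimited JSON over a chunked HTTP response. The abstraction hides this from callers, but each provider still has to handle edge cases like malformed or empty chunks.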
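Here is a rough sketch of parsing both streaming styles with one function. It assumes each chunk has the OpenAI-style choices[0].delta.content shape in both formats, which real providers don't guarantee (NDJSON APIs in particular often use their own schema), and it silently skips chunks it can't parse.

```python
import json
from typing import Iterable, Iterator, Optional


def _delta_text(payload: str) -> Optional[str]:
    """Pull the text delta out of one JSON chunk, or None if it's malformed or empty."""
    try:
        chunk = json.loads(payload)
        return chunk["choices"][0]["delta"].get("content") or None
    except (json.JSONDecodeError, KeyError, IndexError, TypeError, AttributeError):
        return None  # skip malformed chunks instead of crashing the stream


def parse_stream(lines: Iterable[str], fmt: str = "sse") -> Iterator[str]:
    """Yield text tokens from a streamed response body, one line at a time.

    fmt="sse":    lines look like 'data: {...}' and end with 'data: [DONE]'.
    fmt="ndjson": every non-empty line is a JSON object.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines separate SSE events
        if fmt == "sse":
            if not line.startswith("data:"):
                continue  # ignore 'event:', 'id:', and ': comment' lines
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
        else:
            payload = line
        text = _delta_text(payload)
        if text is not None:
            yield text


sse_body = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    ": keep-alive comment",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: {not json",
    "data: [DONE]",
]
print("".join(parse_stream(sse_body)))  # → Hello
```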
Another trade-off: you lose some provider-specific optimizations. For example, OpenAI's API allows you to set response_format to JSON, while others might not. If you need that, you might need to add optional parameters to the interface.
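One way to add such an optional parameter is a provider-neutral flag that each provider translates into its own request fields. In this sketch json_mode, openai_request_kwargs, plain_request_kwargs, and UnsupportedFeatureError are hypothetical names; response_format={"type": "json_object"} is the field OpenAI's Chat Completions API uses for JSON mode. A provider without the feature raises instead of quietly returning free-form text.

```python
class UnsupportedFeatureError(Exception):
    """Raised when a provider can't honour an optional interface parameter."""


def openai_request_kwargs(json_mode: bool = False, **kwargs) -> dict:
    # Translate the neutral flag into OpenAI's JSON mode field.
    if json_mode:
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs


def plain_request_kwargs(json_mode: bool = False, **kwargs) -> dict:
    # Hypothetical provider with no JSON mode: fail loudly instead of
    # sending a request whose output the caller can't parse.
    if json_mode:
        raise UnsupportedFeatureError("json_mode is not supported by this provider")
    return kwargs


print(openai_request_kwargs(json_mode=True, temperature=0))
# → {'temperature': 0, 'response_format': {'type': 'json_object'}}
```

Each provider's chat() would call its own translator before building the request, so callers only ever pass json_mode=True.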
When NOT to use this pattern
- If you're only ever going to use one provider (but do you really know that?)
- If the APIs are so different that the abstraction becomes leaky (e.g., comparing text generation vs. image generation)
- If you need maximum performance and can't afford the overhead of a wrapper
For most chat-based applications, though, this pattern has saved me countless hours. I can now test each provider in isolation, swap them out with a config change, and even run A/B tests between models.
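Testing in isolation mostly comes down to a fake provider that records what it was asked. The sketch below (all class and function names are hypothetical, and AIClient mirrors the client defined earlier) also shows one way to run an A/B test: hash the user ID so each user consistently lands on the same provider.

```python
import asyncio
import hashlib


class RecordingProvider:
    """Hypothetical test double: records requests and returns a canned reply."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls: list[list[dict]] = []

    async def chat(self, messages, **kwargs) -> str:
        self.calls.append(messages)
        return self.reply

    async def chat_stream(self, messages, **kwargs):
        self.calls.append(messages)
        yield self.reply


class AIClient:
    def __init__(self, provider):
        self.provider = provider

    async def summarize(self, text: str) -> str:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Summarize this: {text}"},
        ]
        return await self.provider.chat(messages)


def ab_bucket(user_id: str, treatment_share: float = 0.5) -> str:
    """Stable A/B assignment: the same user always lands in the same bucket."""
    digest = hashlib.sha256(user_id.encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "B" if fraction < treatment_share else "A"


# Unit test without network access or API keys.
fake = RecordingProvider("short summary")
result = asyncio.run(AIClient(fake).summarize("a long chat log"))
assert result == "short summary"
assert fake.calls[0][-1]["content"] == "Summarize this: a long chat log"

# A/B routing: pick a provider per user.
providers = {"A": RecordingProvider("from A"), "B": RecordingProvider("from B")}
client = AIClient(providers[ab_bucket("user-42")])
```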
What I'd do differently next time
I'd start with this abstraction from day one. Even if you think you'll only use one provider, the cost of adding the interface is low, and the payoff when you need to switch is huge.
Also, I'd add more comprehensive error handling and retries at the provider level, rather than at the client level. Each provider has different rate limits and error codes.
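A sketch of provider-level retries as a wrapper. RetryableError, RetryingProvider, and FlakyProvider are hypothetical names, and the sketch assumes each concrete provider translates its own rate-limit and 5xx errors into RetryableError, since those codes differ per provider.

```python
import asyncio
import random


class RetryableError(Exception):
    """Hypothetical: providers raise this for rate limits and transient 5xx errors."""


class RetryingProvider:
    """Wraps a provider's chat() with exponential backoff on RetryableError."""

    def __init__(self, inner, max_attempts: int = 3, base_delay: float = 0.5):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.inner = inner
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    async def chat(self, messages, **kwargs) -> str:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return await self.inner.chat(messages, **kwargs)
            except RetryableError:
                if attempt == self.max_attempts:
                    raise  # out of attempts: surface the last error
                # Delay doubles each attempt (base_delay * 2**n), scaled by
                # random jitter so many clients don't retry in lockstep.
                delay = self.base_delay * 2 ** (attempt - 1)
                await asyncio.sleep(delay * random.uniform(0.5, 1.0))
        raise AssertionError("unreachable")  # loop always returns or raises


class FlakyProvider:
    """Test double that fails the first `failures` calls, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    async def chat(self, messages, **kwargs) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RetryableError("429 Too Many Requests")
        return "ok"


flaky = FlakyProvider(failures=2)
print(asyncio.run(RetryingProvider(flaky, base_delay=0.01).chat([])))  # → ok
print(flaky.calls)  # → 3
```

Only chat is wrapped here. Retrying chat_stream is trickier: once tokens have reached the caller, a blind retry would repeat them, so a streaming retry is usually only safe before the first token arrives.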
Finally, I'd consider using a library like litellm, which already does this. Building it yourself, though, teaches you the patterns and gives you full control.
Your turn
What does your setup look like? Do you use a generic client, or do you just pick one provider and stick with it? I'm curious how others handle the multi-provider mess.