The first time you add a second LLM provider to a codebase, it feels manageable. By the third, you've got three SDKs, three auth schemes, three slightly different message shapes, three retry policies, and three places a model deprecation can break you. The "use the best model for each job" advice is correct, but the integration tax is real, and it compounds.
Here's the pattern I keep coming back to: put everything behind one OpenAI-compatible endpoint and stop maintaining three clients.
The actual cost of multi-SDK glue
It's not the happy path that hurts — it's everything around it:
- Auth drift. OpenAI wants a Bearer key, Anthropic wants x-api-key + a version header, Google wants its own scheme. Three secrets, three rotation schedules.
- Response-shape divergence. choices[0].message.content vs. a content block array vs. candidates[].content.parts[]. Every call site that reads a response needs a per-provider branch.
- Streaming differences. SSE framing and event names differ enough that your stream parser grows a switch statement.
- Failure-mode sprawl. Rate-limit codes, timeout behavior, and error bodies all differ. Your retry/backoff logic forks per provider.
None of it is hard. All of it is maintenance — the kind that quietly slows a small team down.
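To make the response-shape point concrete, here's a minimal sketch of the branch that call sites tend to accumulate. It works on plain dictionaries shaped like the three formats named above. The function name `extract_text` and the provider labels are my own placeholders, and real SDK objects carry more fields than this handles.

```python
def extract_text(provider: str, response: dict) -> str:
    """Pull the assistant text out of a raw response dict, per provider."""
    if provider == "openai":
        # Chat Completions shape: choices[0].message.content
        return response["choices"][0]["message"]["content"]
    if provider == "anthropic":
        # Messages shape: a list of content blocks; keep only the text blocks
        return "".join(
            block["text"]
            for block in response["content"]
            if block.get("type") == "text"
        )
    if provider == "google":
        # Gemini shape: candidates[].content.parts[]
        parts = response["candidates"][0]["content"]["parts"]
        return "".join(part.get("text", "") for part in parts)
    raise ValueError(f"unknown provider: {provider}")
```

Every new provider adds another branch here, and a similar one in the streaming and error-handling paths.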
The OpenAI-compatible shim
Most providers can be normalized to the OpenAI Chat Completions contract, because the ecosystem already standardized on it. So you point the official OpenAI SDK at a different base_url:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-gateway/v1",
    api_key="YOUR_GATEWAY_KEY",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # or claude-*, gemini-*, etc.
    messages=[{"role": "user", "content": "Summarize this PR in two lines."}],
)
print(resp.choices[0].message.content)
```
Same call site, same response shape, one key. To switch the underlying model you change one string. Your retry logic, your streaming parser, your logging — all written once, against one contract.
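As a rough illustration of "written once", here's a sketch of a single retry wrapper with exponential backoff and jitter. The helper name `with_retries` and its defaults are assumptions for this example, not something the SDK or any gateway provides. It takes any zero-argument callable, so it doesn't depend on a particular client.

```python
import random
import time


def with_retries(call, retryable=(Exception,), max_attempts=4,
                 base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying on the given exception types with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Base delay doubles each attempt (0.5s, 1s, 2s, ...), plus jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

With the OpenAI Python SDK you would likely pass something like `retryable=(openai.RateLimitError, openai.APITimeoutError)` and wrap the `client.chat.completions.create(...)` call in a lambda. The SDK client also accepts a `max_retries` argument, which may be enough on its own. Either way, there is one policy to tune rather than three.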
What you actually gain
- One auth + one secret to rotate, not three.
- One response shape at every call site — no per-provider branching.
- No lock-in. Because you code against the OpenAI contract (an open de-facto standard), moving off any single provider, or off the gateway itself, is a base_url change, not a rewrite. That cuts both ways, and that's the point.
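One way to keep that switch a pure configuration change is to read the endpoint from the environment instead of hard-coding it. This is a sketch only: the variable names `LLM_BASE_URL` and `LLM_API_KEY` and the helper `client_settings` are hypothetical, and the default URL is the same placeholder used earlier.

```python
import os


def client_settings(env=os.environ) -> dict:
    """Build keyword arguments for OpenAI(...) from environment variables."""
    return {
        # Placeholder default; point this at a gateway or directly at a provider
        "base_url": env.get("LLM_BASE_URL", "https://your-gateway/v1"),
        "api_key": env.get("LLM_API_KEY", ""),
    }


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_settings())
```

Moving between gateways, or to any OpenAI-compatible endpoint, then means changing two environment variables and leaving the call sites alone.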
What you give up (the honest part)
A normalization layer can't be a free lunch:
- Provider-specific features get flattened. Anthropic's prompt caching, Google's grounding, OpenAI's structured-output modes — anything that doesn't map cleanly to the common contract is either unavailable or exposed through non-standard fields you'll have to special-case anyway.
- You add a hop. One more network segment and one more thing that can be down. For latency-sensitive paths, measure it.
- You inherit the shim's coverage gaps. If the gateway hasn't mapped a parameter you need, you're blocked until it does. Pick one whose mapping is transparent and documented.
The trade is: lose access to the long tail of provider-specific knobs, gain a dramatically smaller integration surface. For most app teams — who use maybe 5% of any provider's surface — that's a good deal. For teams leaning hard on one provider's exclusive features, it isn't.
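On the "measure it" point: a small timing harness is usually enough to see what the extra hop costs. The sketch below is an assumption-level example; `measure_latency` is a made-up helper that times any zero-argument callable and reports the median and worst case.

```python
import statistics
import time


def measure_latency(call, runs=20):
    """Return (median, max) wall-clock seconds over `runs` invocations."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), max(samples)
```

Run it once against a client pointed at the provider directly and once against a client pointed at the gateway, with the same model and prompt, and compare the two medians. Model latency varies a lot between calls, so use enough runs that the difference isn't noise.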
Takeaway
If your codebase has grown a per-provider branch at every LLM call site, collapsing to one OpenAI-compatible base_url removes a whole category of maintenance, at the cost of the provider-specific long tail. I built this into a small gateway called Modelis (one key for GPT/Claude/Gemini, optional auto routing, free tier) at modelishub.com, but the shim pattern works with anything OpenAI-compatible. I'm curious which provider-specific feature you'd refuse to give up, since that's usually the deciding factor.