One base_url for GPT, Claude, and Gemini: cutting three SDKs down to one

#llm #api #openai #ai

The first time you add a second LLM provider to a codebase, it feels manageable. By the third, you've got three SDKs, three auth schemes, three slightly different messages shapes, three retry policies, and three places a model deprecation can break you. The "use the best model for each job" advice is correct — but the integration tax is real, and it compounds.

Here's the pattern I keep coming back to: put everything behind one OpenAI-compatible endpoint and stop maintaining three clients.

The actual cost of multi-SDK glue

It's not the happy path that hurts — it's everything around it:

Auth drift. OpenAI wants a Bearer key, Anthropic wants x-api-key + a version header, Google wants its own scheme. Three secrets, three rotation schedules.
Response-shape divergence. choices[0].message.content vs. a content block array vs. candidates[].content.parts[]. Every call site that reads a response needs a per-provider branch.
Streaming differences. SSE framing and event names differ enough that your stream parser grows a switch statement.
Failure-mode sprawl. Rate-limit codes, timeout behavior, and error bodies all differ. Your retry/backoff logic forks per provider.

None of it is hard. All of it is maintenance — the kind that quietly slows a small team down.

The OpenAI-compatible shim

Most providers can be normalized to the OpenAI Chat Completions contract, because the ecosystem already standardized on it. So you point the official OpenAI SDK at a different base_url:

from openai import OpenAI

client = OpenAI(
    base_url="https://your-gateway/v1",
    api_key="YOUR_GATEWAY_KEY",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",          # or claude-*, gemini-*, etc.
    messages=[{"role": "user", "content": "Summarize this PR in two lines."}],
)
print(resp.choices[0].message.content)

Same call site, same response shape, one key. To switch the underlying model you change one string. Your retry logic, your streaming parser, your logging — all written once, against one contract.

What you actually gain

One auth + one secret to rotate, not three.
One response shape at every call site — no per-provider branching.
No lock-in. Because you code against the OpenAI contract (an open de-facto standard), moving off any single provider — or off the gateway itself — is a base_url change, not a rewrite. That cuts both ways, and that's the point.

What you give up (the honest part)

A normalization layer can't be a free lunch:

Provider-specific features get flattened. Anthropic's prompt caching, Google's grounding, OpenAI's structured-output modes — anything that doesn't map cleanly to the common contract is either unavailable or exposed through non-standard fields you'll have to special-case anyway.
You add a hop. One more network segment and one more thing that can be down. For latency-sensitive paths, measure it.
You inherit the shim's coverage gaps. If the gateway hasn't mapped a parameter you need, you're blocked until it does. Pick one whose mapping is transparent and documented.

The trade is: lose access to the long tail of provider-specific knobs, gain a dramatically smaller integration surface. For most app teams — who use maybe 5% of any provider's surface — that's a good deal. For teams leaning hard on one provider's exclusive features, it isn't.

Takeaway

If your codebase has grown a per-provider branch at every LLM call site, collapsing to one OpenAI-compatible base_url removes a whole category of maintenance — at the cost of the provider-specific long tail. I build this into a small gateway called Modelis (one key for GPT/Claude/Gemini, optional auto routing, free tier) at modelishub.com, but the shim pattern works with anything OpenAI-compatible. Curious which provider-specific feature you'd refuse to give up — that's usually the deciding factor.