DEV Community

chenxiao5580-cmd


One base_url for GPT, Claude, and Gemini: cutting three SDKs down to one

The first time you add a second LLM provider to a codebase, it feels manageable. By the third, you've got three SDKs, three auth schemes, three slightly different message shapes, three retry policies, and three places where a model deprecation can break you. The "use the best model for each job" advice is correct, but the integration tax is real, and it compounds.

Here's the pattern I keep coming back to: put everything behind one OpenAI-compatible endpoint and stop maintaining three clients.

The actual cost of multi-SDK glue

It's not the happy path that hurts — it's everything around it:

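  • Auth drift. OpenAI wants a Bearer token in the Authorization header, Anthropic wants x-api-key plus an anthropic-version header, and Google's Gemini API uses its own x-goog-api-key header (or a key query parameter). Three secrets, three rotation schedules.
  • Response-shape divergence. OpenAI returns choices[0].message.content, Anthropic returns an array of content blocks, and Gemini returns candidates[].content.parts[]. Every call site that reads a response needs a per-provider branch.
  • Streaming differences. SSE framing and event names differ enough (OpenAI sends unnamed data: chunks terminated by data: [DONE], while Anthropic sends named events such as content_block_delta) that your stream parser grows a switch statement.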
  • Failure-mode sprawl. Rate-limit codes, timeout behavior, and error bodies all differ. Your retry/backoff logic forks per provider.
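To make the response-shape point concrete, here is roughly what that per-provider branch looks like when each raw response body is handled as a plain dict. This is a minimal sketch: `extract_text` and the provider strings are hypothetical, the field paths follow the shapes listed above, and real responses carry more (tool calls, non-text parts, safety metadata) than it handles.

```python
from typing import Any


def extract_text(provider: str, response: dict[str, Any]) -> str:
    """Pull the assistant's text out of a raw response body.

    Hypothetical helper with one branch per provider: exactly the kind of
    code that a single OpenAI-compatible endpoint removes.
    """
    if provider == "openai":
        # Chat Completions shape: choices[0].message.content
        return response["choices"][0]["message"]["content"]
    if provider == "anthropic":
        # Messages shape: a list of content blocks; keep only the text blocks
        return "".join(
            block["text"] for block in response["content"] if block.get("type") == "text"
        )
    if provider == "gemini":
        # generateContent shape: candidates[].content.parts[]
        parts = response["candidates"][0]["content"]["parts"]
        return "".join(part.get("text", "") for part in parts)
    raise ValueError(f"unknown provider: {provider!r}")


print(extract_text("openai", {"choices": [{"message": {"content": "hi"}}]}))  # prints: hi
```

Every new provider adds another branch here, and the same fan-out repeats in streaming, error handling, and retries.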

None of it is hard. All of it is maintenance — the kind that quietly slows a small team down.

The OpenAI-compatible shim

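Most providers can be normalized to the OpenAI Chat Completions contract, because much of the ecosystem has already standardized on it. A gateway translates each request into the provider's native API and translates the response back, so you can point the official OpenAI SDK at a different base_url: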

from openai import OpenAI

client = OpenAI(
    base_url="https://your-gateway/v1",
    api_key="YOUR_GATEWAY_KEY",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",          # or a claude-* / gemini-* model ID, as your gateway names it
    messages=[{"role": "user", "content": "Summarize this PR in two lines."}],
)
print(resp.choices[0].message.content)

Same call site, same response shape, one key. To switch the underlying model you change one string. Your retry logic, your streaming parser, your logging — all written once, against one contract.
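Because every provider's failures now come back through one client, one retry policy can cover all of them. A minimal sketch, assuming transient failures surface as a single exception type: `TransientError` and `with_retries` are hypothetical names. With the real OpenAI Python SDK you would catch its own exception classes instead (for example `openai.RateLimitError` and `openai.APITimeoutError`), and note that the client already retries a few times by default through its `max_retries` option.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for retryable failures (rate limits, timeouts, 5xx)."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke `call`, retrying TransientError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except TransientError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: re-raise the last error
            # Full jitter: wait a random time up to base_delay * 2**(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

In application code this would wrap something like `lambda: client.chat.completions.create(...)`. The injectable `sleep` parameter exists only so the backoff can be tested without actually waiting.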

What you actually gain

  • One auth + one secret to rotate, not three.
  • One response shape at every call site — no per-provider branching.
  • No lock-in. Because you code against the OpenAI contract (a de facto standard that many providers and gateways implement), moving off any single provider, or off the gateway itself, is a base_url and model-name change rather than a rewrite. That cuts both ways, and that's the point.
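One way to keep that switch a configuration change rather than a code change is to read the endpoint, key, and model from the environment. A sketch with hypothetical variable names (`LLM_BASE_URL`, `LLM_API_KEY`, `LLM_MODEL`); the keys returned by `client_kwargs` match the `OpenAI(base_url=..., api_key=...)` arguments in the earlier snippet.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMConfig:
    base_url: str
    api_key: str
    model: str

    def client_kwargs(self) -> dict[str, str]:
        # Keyword arguments for openai.OpenAI(...)
        return {"base_url": self.base_url, "api_key": self.api_key}


def load_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read endpoint, key, and model from (hypothetically named) environment variables."""
    return LLMConfig(
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        api_key=env["LLM_API_KEY"],  # fail loudly if the key is missing
        model=env.get("LLM_MODEL", "gpt-4o-mini"),
    )


# Usage (needs the openai package and a reachable endpoint):
# cfg = load_config()
# client = OpenAI(**cfg.client_kwargs())
# client.chat.completions.create(model=cfg.model, messages=[...])
```

Pointing the same code at a gateway, or back at a provider directly, is then an edit to the deployment's environment rather than to any call site.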

What you give up (the honest part)

A normalization layer can't be a free lunch:

  • Provider-specific features get flattened. Anthropic's prompt caching, Google's grounding, OpenAI's structured-output modes — anything that doesn't map cleanly to the common contract is either unavailable or exposed through non-standard fields you'll have to special-case anyway.
  • You add a hop. One more network segment and one more thing that can be down. For latency-sensitive paths, measure it.
  • You inherit the shim's coverage gaps. If the gateway hasn't mapped a parameter you need, you're blocked until it does. Pick one whose mapping is transparent and documented.
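For the "measure it" advice, a small harness that times any zero-argument call and reports median and p95 latency is usually enough: run it with the same prompt once against a provider directly and once through the gateway, then compare. A sketch; `measure_latency` is a hypothetical helper, and the percentile comes from `statistics.quantiles` (Python 3.8+).

```python
import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 20) -> dict[str, float]:
    """Time `call` `runs` times; return median and p95 latency in milliseconds."""
    if runs < 2:
        raise ValueError("need at least 2 runs to estimate percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed range.
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[-1]
    return {"median_ms": statistics.median(samples), "p95_ms": p95}


# Usage (hypothetical clients, same prompt for both):
# direct = measure_latency(lambda: direct_client.chat.completions.create(...))
# via_gateway = measure_latency(lambda: gateway_client.chat.completions.create(...))
```

Comparing p95 rather than only the median matters here, because an extra hop tends to show up in the tail first.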

The trade is: lose access to the long tail of provider-specific knobs, gain a dramatically smaller integration surface. For most app teams — who use maybe 5% of any provider's surface — that's a good deal. For teams leaning hard on one provider's exclusive features, it isn't.

Takeaway

If your codebase has grown a per-provider branch at every LLM call site, collapsing to one OpenAI-compatible base_url removes a whole category of maintenance — at the cost of the provider-specific long tail. I build this into a small gateway called Modelis (one key for GPT/Claude/Gemini, optional auto routing, free tier) at modelishub.com, but the shim pattern works with anything OpenAI-compatible. Curious which provider-specific feature you'd refuse to give up — that's usually the deciding factor.
