Swap OpenAI for DeepSeek without rewriting a single line of code

Last month I added Claude to a project that was already using GPT-4o. Two SDKs, two error formats, two retry strategies. By the time I finished I had wrapped both in my own abstraction — a tiny LLM gateway, badly written, that I now had to maintain.

Then I noticed something I should have noticed earlier: most of the new providers expose an OpenAI-compatible endpoint. DeepSeek, Mistral, Together, Fireworks — they all speak the same wire format. You don't need a new SDK. You need a new base_url.

This post is the 5-minute version of that realization, with the tradeoffs I learned the hard way.

The "before" code

Standard OpenAI Python:

from openai import OpenAI

client = OpenAI(api_key="sk-...")

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this PR diff..."}],
)
print(resp.choices[0].message.content)

The "after" code

from openai import OpenAI

client = OpenAI(
    api_key="th-...",
    base_url="https://jiatoken.com/v1",   # gateway
)

# Same call, different model
resp = client.chat.completions.create(
    model="deepseek-chat",                # DeepSeek-V3
    messages=[{"role": "user", "content": "Summarize this PR diff..."}],
)

That's it. Two lines changed. The rest of your code — streaming handlers, tool calls, retry logic — keeps working because the response shape is identical.
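
For instance, a streaming handler you already have needs no changes. A minimal sketch against the same client as above:

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize this PR diff..."}],
    stream=True,
)
for chunk in resp:
    if chunk.choices:  # some backends send a trailing chunk with an empty choices list
        print(chunk.choices[0].delta.content or "", end="", flush=True)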

Why this works

The OpenAI Python SDK is just a typed HTTP client. It POSTs JSON to {base_url}/chat/completions. Anything that responds with the same JSON shape is, from the SDK's point of view, OpenAI.
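
You can see this by skipping the SDK entirely. Here's a rough hand-rolled equivalent with requests, using the gateway URL from the example above:

import requests

resp = requests.post(
    "https://jiatoken.com/v1/chat/completions",  # {base_url}/chat/completions
    headers={"Authorization": "Bearer th-..."},
    json={
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Say hi"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])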

Providers exploit this compatibility to different degrees:

  • DeepSeek ships its own OpenAI-compatible endpoint at api.deepseek.com/v1. You can point the SDK there directly.
  • Anthropic does not: Claude's Messages API has its own request and response format, so you need a translation layer.
  • Gemini has both: a native API and a Vertex-side OpenAI shim.

A multi-model gateway (LiteLLM, OpenRouter, TokenHub, your own) collapses these into one endpoint. One key, one base_url, every model behind it.
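
Once everything sits behind one endpoint, model choice is just a string. A sketch; the model names here are illustrative, use whatever your gateway's catalog actually exposes:

for model in ["gpt-4o", "deepseek-chat", "mistral-large-latest"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "One-line summary: what is HTTP?"}],
    )
    print(f"{model}: {resp.choices[0].message.content[:80]}")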

What I actually save

For the workload I just migrated (~3M input tokens / 1M output per day, mostly summarization):

Model         Input $/1M tok   Output $/1M tok   Daily cost
GPT-4o        $2.50            $10.00            $17.50
Claude 3.5    $3.00            $15.00            $24.00
DeepSeek-V3   $0.07            $0.28             $0.49
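
The daily figures are just the workload times the rates: for GPT-4o, 3 × $2.50 + 1 × $10.00 = $17.50; for DeepSeek-V3, 3 × $0.07 + 1 × $0.28 = $0.49.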

DeepSeek isn't a drop-in quality replacement for everything — GPT-4o still wins on instruction following in my evals — but for the 80% of calls that are "summarize this", "extract these fields", "rewrite in tone X", it's fine and ~35× cheaper.

The annoying parts

A few things don't carry over cleanly through OpenAI compatibility:

  1. Tool-calling JSON shape. Most providers match OpenAI's tool_calls structure now, but older open-source models dump tool calls inside the content string instead. Always test with your actual prompts before flipping production; the smoke test after this list covers it.
  2. Vision. OpenAI uses image_url content parts; some providers want raw base64. A gateway should normalize this for you, but verify before you assume.
  3. Streaming with usage stats. OpenAI added stream_options={"include_usage": True} to return token counts on the final SSE chunk. Not every backend forwards it; the smoke test checks this too.
  4. Rate limits. You're now subject to the gateway's requests-per-minute cap, which may be lower than the provider's direct limits.
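
Before flipping production, I run a quick check against the candidate model. This is a minimal sketch built on the client from earlier, not a real harness; the get_weather tool is a made-up example. It covers gotchas 1 and 3:

def smoke_test(client, model: str) -> None:
    # Gotcha 1: does this backend return structured tool calls,
    # or does it dump JSON into the content string?
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for the test only
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    )
    msg = resp.choices[0].message
    if msg.tool_calls:
        print("tool calls: structured, OK")
    else:
        print("tool calls: missing, raw content was:", (msg.content or "")[:120])

    # Gotcha 3: does the final SSE chunk carry token usage?
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi"}],
        stream=True,
        stream_options={"include_usage": True},
    )
    usage = None
    for chunk in stream:
        if chunk.usage:  # only present on the final chunk, if forwarded at all
            usage = chunk.usage
    print("stream usage:", usage or "not forwarded")

smoke_test(client, "deepseek-chat")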

When NOT to use a gateway

  • You only ever call one provider. Direct SDK is one less moving part.
  • You need provider-specific features (Anthropic's prompt caching, OpenAI's Realtime API, Gemini's long context). Gateways usually lag behind native features by weeks.
  • You're in a regulated environment that requires data plane control. Most gateways are SaaS.

For everything else — especially side projects and prototypes where the model you "want" changes every two weeks — a gateway pays for itself in saved switching cost.
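
The cheap way to bank that saving is to never hard-code the model name. A trivial pattern; the env var names are my own convention, nothing standard:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["LLM_API_KEY"],
    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
)
MODEL = os.environ.get("LLM_MODEL", "gpt-4o")
# Switching providers is now a config change at deploy time, not a code change.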

TL;DR

  client = OpenAI(
      api_key="...",
+     base_url="https://your-gateway/v1",
  )
  client.chat.completions.create(
-     model="gpt-4o",
+     model="deepseek-chat",
      ...
  )

If you want to skip running your own LiteLLM, TokenHub hosts a pre-configured gateway with 40+ models behind one key. Otherwise, LiteLLM self-hosted is the standard answer.
