The Problem
I was paying for 5+ different AI subscriptions: OpenAI, Anthropic, Google, etc. Each with separate API keys, billing dashboards, and SDK quirks.
When DeepSeek-V3 dropped at ~$0.28 per million output tokens (vs GPT-4o at $10), I wanted to switch, but swapping SDKs across multiple projects was too much friction.
So I built TokenHub — an OpenAI-compatible gateway that routes to 40+ AI models with a single API key.
How It Works
It's a drop-in replacement: keep the OpenAI SDK and just change `base_url` and `api_key`:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-tokenhub-key",
    base_url="https://jiatoken.com/v1",
)

# Use any of 40+ models — DeepSeek, MiniMax, Claude, GPT, Gemini, Llama, etc.
response = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "Explain async/await in Python"}],
)

print(response.choices[0].message.content)
```
That's it. The same code works with:
- gpt-4o
- claude-sonnet-4-6
- gemini-2.5-pro
- deepseek-v3 / deepseek-r1
- minimax-text-01
- llama-3.3-70b
- ...and more
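Streaming works the same way: pass `stream=True` and iterate the chunks. The actual `client.chat.completions.create(..., stream=True)` call needs a live key, so this sketch only shows the chunk-assembly side (`collect_stream` is my own helper name, not part of any SDK):

```python
def collect_stream(chunks):
    """Assemble the full reply text from streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Each streamed chunk carries an incremental delta; the content
        # field is None on role/finish chunks, so skip those.
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# With a real client:
#   stream = client.chat.completions.create(
#       model="deepseek-v3", messages=[...], stream=True)
#   print(collect_stream(stream))
```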
Real Pricing Comparison
Per million tokens (input / output):
| Model | Provider | Input | Output |
|---|---|---|---|
| GPT-4o | OpenAI | $2.50 | $10.00 |
| GPT-4o mini | OpenAI | $0.15 | $0.60 |
| DeepSeek-V3 | TokenHub | $0.07 | $0.28 |
| DeepSeek-R1 | TokenHub | $0.14 | $0.55 |
| MiniMax-Text-01 | TokenHub | $0.10 | $0.40 |
For high-volume workloads (RAG, agents, batch summarization), DeepSeek-V3 is ~35x cheaper than GPT-4o for output tokens.
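The 35x figure is just the ratio of the output prices in the table; a quick sanity check with a made-up 50M-token monthly workload:

```python
# Output prices per million tokens, from the table above (USD).
GPT4O_OUT = 10.00
DEEPSEEK_V3_OUT = 0.28

# Hypothetical workload: 50M output tokens/month of batch summarization.
tokens_millions = 50
gpt4o_cost = tokens_millions * GPT4O_OUT           # 500.0
deepseek_cost = tokens_millions * DEEPSEEK_V3_OUT  # ~14.0

print(f"ratio: {GPT4O_OUT / DEEPSEEK_V3_OUT:.1f}x")             # ratio: 35.7x
print(f"monthly saving: ${gpt4o_cost - deepseek_cost:.2f}")     # monthly saving: $486.00
```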
When to Use Which Model
A quick mental model from my own usage:
- Cheap & good enough → DeepSeek-V3 (most general tasks)
- Reasoning → DeepSeek-R1 (CoT-style tasks)
- Long context → MiniMax-Text-01 (200K+ tokens)
- Frontier capability → GPT-4o or Claude (still worth it for hard problems)
- Code → Claude Sonnet 4.6 or DeepSeek-V3
The win is being able to A/B test across models without rewriting code.
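Since the request shape never changes, an A/B test is just a loop over model names. A sketch (the helper name and model list are mine; `client` is the gateway-pointed client from earlier):

```python
def compare_models(client, models, prompt):
    """Send the same prompt to each model and collect the replies."""
    results = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = response.choices[0].message.content
    return results

# e.g. compare_models(client, ["deepseek-v3", "gpt-4o-mini"], "Summarize: ...")
```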
Why I Open-Sourced the Routing Logic
(Note: TokenHub itself is hosted, but the routing pattern is straightforward.)
The hardest part wasn't the proxy — it was:
- Normalizing function-calling formats across providers
- Handling streaming differences (SSE format quirks)
- Counting tokens before each request for accurate billing
If you're building something similar, the OpenAI spec is the de facto standard. Most providers either match it or have OpenAI-compatible endpoints already.
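To make the streaming point concrete: providers all speak SSE, but the framing differs (keep-alive comments, the `data: [DONE]` sentinel, blank-line separators). A minimal parser for OpenAI-style `data:` lines, as a sketch; a real gateway also has to handle multi-line events and reconnects:

```python
import json

def parse_sse_lines(lines):
    """Yield parsed JSON payloads from OpenAI-style SSE lines.

    Skips keep-alive comments (lines starting with ':'), blank
    separators, and stops at the '[DONE]' sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith(":"):
            continue
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
            yield json.loads(payload)
```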
Try It
If you're tired of juggling AI subscriptions:
- 👉 https://jiatoken.com
- Free credits to start
- Pay-as-you-go, no monthly commitment
- Compatible with OpenAI SDK out of the box
I'd love feedback — especially on which models you'd want added, or pricing pain points.
What's your current setup? Are you using a single provider or juggling multiple?