DEV Community

TokenHub
TokenHub

Posted on • Originally published at llm-api.vynexcloud.com

Call GPT-5.5, Claude Opus 4.8, and Gemini 3.1 from one OpenAI-compatible endpoint

If you build with LLMs, you probably juggle multiple SDKs, API keys, and bills — one for OpenAI, one for Anthropic, one for Google. And every time you want to A/B-test a different model, you rewrite integration code.

There's a simpler pattern: one OpenAI-compatible endpoint that proxies every frontier model. You keep the official openai SDK, change the base_url, and switch models by changing the model parameter.

The idea

Vynex API exposes a single /v1 endpoint that speaks the OpenAI Chat Completions contract. Behind it: GPT, Claude, Gemini, DeepSeek, Qwen, and GLM. One key, one bill.

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://llm-api.vynexcloud.com/v1",
)

# GPT-5.5
client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)

# Same call, different model — Claude Opus 4.8
client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)

# Gemini 3.1 Pro
client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)
Enter fullscreen mode Exit fullscreen mode

No new client library. No per-provider abstraction layer. The /v1/chat/completions contract is preserved, so streaming, tool calling, and structured output all work the way you expect.

Why this matters in practice

1. A/B-testing models becomes a one-liner. Want to compare GPT-5.5 vs Claude Opus 4.8 on your prompt? Change one string. No separate SDK init, no different auth header format.

2. Fallback chains. If one model is rate-limited or down, fall back to another family without touching your call shape:

models = ["gpt-5.5", "claude-opus-4-8", "gemini-3.1-pro-preview"]
for m in models:
    try:
        return client.chat.completions.create(model=m, messages=msgs)
    except Exception:
        continue
Enter fullscreen mode Exit fullscreen mode

3. Streaming works everywhere:

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=msgs,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
Enter fullscreen mode Exit fullscreen mode

4. Tool/function calling is identical to OpenAI's schema.

Cost transparency

Per-token pricing (input/output per 1M tokens), no seat fees:

Model Input $/1M Output $/1M
gpt-5.4-mini $0.25 $1.50
deepseek/deepseek-v3.2 $0.23 $0.34
claude-haiku-4-5-20251001 $1.00 $5.00
gpt-5.5 $1.25 $7.50
claude-opus-4-8 $5.00 $25.00

Full list on the pricing page. The cheap open models (deepseek-v3.2 at ~$0.23/M) are useful for high-volume classification and extraction where you don't need a frontier model.

cURL (no SDK needed)

curl https://llm-api.vynexcloud.com/v1/chat/completions \
  -H "Authorization: Bearer $VYNEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Enter fullscreen mode Exit fullscreen mode

Who is this for

  • Teams that want one integration instead of six
  • Anyone building model-eval or fallback tooling
  • Developers in regions where direct access to some providers is restricted
  • People who want transparent per-token billing without negotiating enterprise contracts

Getting started

  1. Create a key at llm-api.vynexcloud.com/register
  2. Point your OpenAI client at https://llm-api.vynexcloud.com/v1
  3. Call any model by name

SDK examples (Python, Node.js, cURL) are in the public repo.


This is a factual walkthrough of the API I use. Prices are per-token and listed on the pricing page; verify them there before relying on the numbers above.

Top comments (0)