Call GPT-5.5, Claude Opus 4.8, and Gemini 3.1 from one OpenAI-compatible endpoint

#ai #api #openai #llm

If you build with LLMs, you probably juggle multiple SDKs, API keys, and bills — one for OpenAI, one for Anthropic, one for Google. And every time you want to A/B-test a different model, you rewrite integration code.

There's a simpler pattern: one OpenAI-compatible endpoint that proxies every frontier model. You keep the official openai SDK, change the base_url, and switch models by changing the model parameter.

The idea

Vynex API exposes a single /v1 endpoint that speaks the OpenAI Chat Completions contract. Behind it: GPT, Claude, Gemini, DeepSeek, Qwen, and GLM. One key, one bill.

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://llm-api.vynexcloud.com/v1",
)

# GPT-5.5
client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)

# Same call, different model — Claude Opus 4.8
client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)

# Gemini 3.1 Pro
client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)

No new client library. No per-provider abstraction layer. The /v1/chat/completions contract is preserved, so streaming, tool calling, and structured output all work the way you expect.

Why this matters in practice

1. A/B-testing models becomes a one-liner. Want to compare GPT-5.5 vs Claude Opus 4.8 on your prompt? Change one string. No separate SDK init, no different auth header format.

2. Fallback chains. If one model is rate-limited or down, fall back to another family without touching your call shape:

models = ["gpt-5.5", "claude-opus-4-8", "gemini-3.1-pro-preview"]
for m in models:
    try:
        return client.chat.completions.create(model=m, messages=msgs)
    except Exception:
        continue

3. Streaming works everywhere:

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=msgs,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

4. Tool/function calling is identical to OpenAI's schema.

Cost transparency

Per-token pricing (input/output per 1M tokens), no seat fees:

Model	Input $/1M	Output $/1M
`gpt-5.4-mini`	$0.25	$1.50
`deepseek/deepseek-v3.2`	$0.23	$0.34
`claude-haiku-4-5-20251001`	$1.00	$5.00
`gpt-5.5`	$1.25	$7.50
`claude-opus-4-8`	$5.00	$25.00

Full list on the pricing page. The cheap open models (deepseek-v3.2 at ~$0.23/M) are useful for high-volume classification and extraction where you don't need a frontier model.

cURL (no SDK needed)

curl https://llm-api.vynexcloud.com/v1/chat/completions \
  -H "Authorization: Bearer $VYNEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Who is this for

Teams that want one integration instead of six
Anyone building model-eval or fallback tooling
Developers in regions where direct access to some providers is restricted
People who want transparent per-token billing without negotiating enterprise contracts

Getting started

Create a key at llm-api.vynexcloud.com/register
Point your OpenAI client at https://llm-api.vynexcloud.com/v1
Call any model by name

SDK examples (Python, Node.js, cURL) are in the public repo.

This is a factual walkthrough of the API I use. Prices are per-token and listed on the pricing page; verify them there before relying on the numbers above.