If you build with LLMs, you probably juggle multiple SDKs, API keys, and bills — one for OpenAI, one for Anthropic, one for Google. And every time you want to A/B-test a different model, you rewrite integration code.
There's a simpler pattern: one OpenAI-compatible endpoint that proxies every frontier model. You keep the official openai SDK, change the base_url, and switch models by changing the model parameter.
The idea
Vynex API exposes a single /v1 endpoint that speaks the OpenAI Chat Completions contract. Behind it: GPT, Claude, Gemini, DeepSeek, Qwen, and GLM. One key, one bill.
from openai import OpenAI
client = OpenAI(
api_key="sk-...",
base_url="https://llm-api.vynexcloud.com/v1",
)
# GPT-5.5
client.chat.completions.create(
model="gpt-5.5",
messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)
# Same call, different model — Claude Opus 4.8
client.chat.completions.create(
model="claude-opus-4-8",
messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)
# Gemini 3.1 Pro
client.chat.completions.create(
model="gemini-3.1-pro-preview",
messages=[{"role": "user", "content": "Explain backpressure in 2 lines"}],
)
No new client library. No per-provider abstraction layer. The /v1/chat/completions contract is preserved, so streaming, tool calling, and structured output all work the way you expect.
Why this matters in practice
1. A/B-testing models becomes a one-liner. Want to compare GPT-5.5 vs Claude Opus 4.8 on your prompt? Change one string. No separate SDK init, no different auth header format.
2. Fallback chains. If one model is rate-limited or down, fall back to another family without touching your call shape:
models = ["gpt-5.5", "claude-opus-4-8", "gemini-3.1-pro-preview"]
for m in models:
try:
return client.chat.completions.create(model=m, messages=msgs)
except Exception:
continue
3. Streaming works everywhere:
stream = client.chat.completions.create(
model="claude-sonnet-4-6",
messages=msgs,
stream=True,
)
for chunk in stream:
print(chunk.choices[0].delta.content or "", end="")
4. Tool/function calling is identical to OpenAI's schema.
Cost transparency
Per-token pricing (input/output per 1M tokens), no seat fees:
| Model | Input $/1M | Output $/1M |
|---|---|---|
gpt-5.4-mini |
$0.25 | $1.50 |
deepseek/deepseek-v3.2 |
$0.23 | $0.34 |
claude-haiku-4-5-20251001 |
$1.00 | $5.00 |
gpt-5.5 |
$1.25 | $7.50 |
claude-opus-4-8 |
$5.00 | $25.00 |
Full list on the pricing page. The cheap open models (deepseek-v3.2 at ~$0.23/M) are useful for high-volume classification and extraction where you don't need a frontier model.
cURL (no SDK needed)
curl https://llm-api.vynexcloud.com/v1/chat/completions \
-H "Authorization: Bearer $VYNEX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4",
"messages": [{"role": "user", "content": "Hello"}]
}'
Who is this for
- Teams that want one integration instead of six
- Anyone building model-eval or fallback tooling
- Developers in regions where direct access to some providers is restricted
- People who want transparent per-token billing without negotiating enterprise contracts
Getting started
- Create a key at llm-api.vynexcloud.com/register
- Point your OpenAI client at
https://llm-api.vynexcloud.com/v1 - Call any model by name
SDK examples (Python, Node.js, cURL) are in the public repo.
This is a factual walkthrough of the API I use. Prices are per-token and listed on the pricing page; verify them there before relying on the numbers above.
Top comments (0)