ChrisL

Why we built an AI gateway with three native API formats, not just OpenAI-compatible

If you've worked with multiple LLM providers in the past year,
you've probably reached for a gateway like OpenRouter, LiteLLM,
or Portkey. They solve a real problem: one API key, one bill,
drop-in access to dozens of models.

But almost every gateway in this space shares one design
choice: normalize everything to OpenAI-compatible format.
That's the lingua franca of LLM APIs — pick it as the common
denominator and everyone can use it.

We made a different choice. We built OpenModel with three
native API surfaces in parallel:

POST /v1/responses                       OpenAI Responses API
POST /v1/messages                        Anthropic Messages API
POST /v1beta/models/{model}:generate     Gemini generateContent

This post is about why we made that call, what it enables,
and what it costs.

The OpenAI-compatible default

When a gateway is OpenAI-compatible only, every request —
no matter what model it's targeting — goes through OpenAI's
request/response shape. If you call Claude, the gateway
translates your OpenAI-style request into Anthropic's shape
upstream, then translates the response back to OpenAI format
for you.

This works fine if you're already using the OpenAI SDK. But
if you've built on the Anthropic SDK (Claude Code, agent
frameworks that speak the Anthropic Messages API natively) or
the Google GenAI SDK, you have two bad options:

  1. Rewrite your code to OpenAI shape (one-time but invasive).
  2. Add another translation layer on top (Anthropic shape → OpenAI shape on the way to the gateway, then OpenAI shape → Anthropic shape on the way back).

Both lose fidelity on the parts of the protocols that don't
round-trip cleanly:

  • Tool use. Anthropic returns content blocks with explicit tool_use and text types in a structured array. OpenAI returns a flat tool_calls field on the message. The structural information in one shape doesn't survive translation to the other (the sketch after this list shows the two shapes side by side).
  • Vision. Anthropic and Gemini accept image content blocks inline. OpenAI uses image_url URLs. Round-tripping isn't lossless.
  • Streaming. OpenAI streams a flat sequence of data: JSON lines. Anthropic uses named SSE events (message_start, content_block_delta, message_stop). Gemini sends chunks with candidates arrays. You can normalize them, but you lose the original event structure that SDKs rely on.
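
To make the tool-use mismatch concrete, here's roughly the same assistant turn in both shapes. Values are illustrative; the Anthropic shape is from the Messages API, the OpenAI one from Chat Completions:

# The same assistant turn, two shapes (values illustrative).

# Anthropic Messages: tool use is a typed block in an ordered
# content array, interleaved with text blocks.
anthropic_message = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Checking the weather."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Berlin"}},
    ],
}

# OpenAI Chat Completions: tool calls sit in a flat field beside
# the text, with arguments as a JSON string rather than an object.
openai_message = {
    "role": "assistant",
    "content": "Checking the weather.",
    "tool_calls": [
        {"id": "call_01", "type": "function",
         "function": {"name": "get_weather",
                      "arguments": '{"city": "Berlin"}'}},
    ],
}

The ordering between text and tool blocks that Anthropic's array encodes is exactly the kind of structure the flat representation drops.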

What "three native surfaces" buys you

In OpenModel, each format gets its own endpoint, and each
endpoint expects its native request shape. The routing
decision is made by the model name, not the endpoint.

That last part is the interesting bit. The model name decides
what runs — the endpoint decides what shape your code sees.

So you can do this:

# Anthropic SDK calling GPT-5.5
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.openmodel.ai",
    api_key="om-..."
)

response = client.messages.create(
    model="gpt-5.5",        # ← not a Claude model
    max_tokens=1024,
    messages=[{"role": "user", "content": "hi"}]
)

The Anthropic-format request hits /v1/messages. The gateway
sees model="gpt-5.5", translates the request to OpenAI
Responses shape, calls GPT-5.5, and translates the response
back into Anthropic Messages format. Your client code never
sees the underlying provider.

Same thing in reverse:

# OpenAI SDK calling Claude Opus 4.7
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openmodel.ai/v1",
    api_key="om-..."
)

response = client.responses.create(
    model="claude-opus-4-7",   # ← not an OpenAI model
    input="hi"
)

OpenAI Responses shape goes in, Claude Opus 4.7 runs, OpenAI
Responses shape comes out.
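
The third surface works the same way. A minimal sketch with the google-genai SDK, using its standard http_options base_url override (the model choice here is illustrative):

# Gemini SDK calling DeepSeek (sketch)
from google import genai

client = genai.Client(
    api_key="om-...",
    http_options={"base_url": "https://api.openmodel.ai"},
)

response = client.models.generate_content(
    model="deepseek-v4-pro",   # ← not a Google model
    contents="hi",
)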

What this means in practice: if you've built a Claude Code
workflow and want to delegate one subtask to GPT-5.5 because
it's better at that thing, you don't rewrite anything. You
change the model field. Same code, same SDK, same response
shape downstream.

The hard parts

This design isn't a free win. A few things we ran into:

Translation correctness becomes most of the test surface.
Every pair of formats has its own edge cases. OpenAI tool
calls don't map perfectly to Anthropic tool use. Anthropic
content blocks don't map perfectly to Gemini parts. Most of
our test suite is just round-trip correctness — same
semantic input, expected semantic output across all pairs.
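
A flavor of what those tests look like; the translate helpers here are hypothetical stand-ins for the real translation layer, not our actual internals:

# Hypothetical round-trip test. anthropic_to_openai and
# openai_to_anthropic are stand-ins for the translation layer.

def test_tool_use_round_trips():
    original = {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "Checking the weather."},
            {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
             "input": {"city": "Berlin"}},
        ],
    }
    # Anthropic -> OpenAI -> Anthropic must preserve semantics even
    # when IDs or field order get normalized along the way.
    result = openai_to_anthropic(anthropic_to_openai(original))
    tool_block = next(b for b in result["content"] if b["type"] == "tool_use")
    assert tool_block["name"] == "get_weather"
    assert tool_block["input"] == {"city": "Berlin"}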

Streaming format translation is the worst of it. Each
format has its own SSE event structure. When the upstream is
OpenAI but the client expects Anthropic's named events, we
have to synthesize message_start, content_block_delta,
message_stop events from a stream that was never structured
that way. Tool use inside a stream is particularly nasty
because each format encodes the start/middle/end of a tool
call differently.
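
A stripped-down version of that synthesis for the text-only case; the real thing also emits message_delta (stop reason, usage) and handles tool-call deltas, which is where most of the pain lives:

# Simplified synthesis of Anthropic-style named SSE events from a
# stream of plain text deltas pulled off an upstream OpenAI stream.
import json

def sse(event, data):
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def to_anthropic_events(text_deltas):
    yield sse("message_start",
              {"type": "message_start",
               "message": {"role": "assistant", "content": []}})
    yield sse("content_block_start",
              {"type": "content_block_start", "index": 0,
               "content_block": {"type": "text", "text": ""}})
    for fragment in text_deltas:
        yield sse("content_block_delta",
                  {"type": "content_block_delta", "index": 0,
                   "delta": {"type": "text_delta", "text": fragment}})
    yield sse("content_block_stop",
              {"type": "content_block_stop", "index": 0})
    yield sse("message_stop", {"type": "message_stop"})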

Two-phase rate limiting. Per-user RPM and TPM limits,
plus per-channel limits at the upstream level. RPM is easy
(increment counter, check window). TPM requires a pre-request
token estimate (we use a tokenizer; fall back to length / 4
heuristic when unavailable), reservation against the budget,
and post-response reconciliation against the real token count.
Approximate in the moment, accurate over time.
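
In sketch form, the reserve/reconcile loop looks something like this (in-process counter for illustration; the real thing needs shared state like Redis):

# Sketch of reserve-then-reconcile TPM accounting. Single window,
# in-process, no concurrency handling.
import time

class TpmBudget:
    def __init__(self, limit_per_minute):
        self.limit = limit_per_minute
        self.used = 0
        self.window_start = time.monotonic()

    def _roll_window(self):
        if time.monotonic() - self.window_start >= 60:
            self.used, self.window_start = 0, time.monotonic()

    def reserve(self, estimated_tokens):
        self._roll_window()
        if self.used + estimated_tokens > self.limit:
            raise RuntimeError("TPM budget exceeded")
        self.used += estimated_tokens
        return estimated_tokens

    def reconcile(self, reserved, actual_tokens):
        # Swap the estimate for real usage once the response lands:
        # approximate in the moment, accurate over time.
        self.used += actual_tokens - reserved

def estimate_tokens(text):
    # The length / 4 fallback from above, for when no tokenizer is
    # available for the target model.
    return max(1, len(text) // 4)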

Error format mapping. When the gateway has to return its
own error (rate limit exceeded, invalid key, etc.), the error
has to come back in the format your SDK expects. So a rate
limit error sent through /v1/messages looks like Anthropic's
error shape ({"type": "error", "error": {"type": "rate_limit_error", ...}}),
not OpenAI's. Without this, every SDK's built-in error
handling breaks.
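
Roughly like this; the Anthropic envelope is the documented one quoted above, and the exact type/code strings for the other two surfaces are illustrative:

# Per-surface rate limit error envelopes (sketch).
def rate_limit_error(surface):
    if surface == "anthropic":          # /v1/messages
        return {"type": "error",
                "error": {"type": "rate_limit_error",
                          "message": "Rate limit exceeded."}}
    if surface == "gemini":             # /v1beta/models/{model}:generate
        return {"error": {"code": 429,
                          "status": "RESOURCE_EXHAUSTED",
                          "message": "Rate limit exceeded."}}
    # OpenAI shape for /v1/responses
    return {"error": {"type": "rate_limit_error",
                      "code": "rate_limit_exceeded",
                      "message": "Rate limit exceeded."}}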

When this design isn't the right call

To be clear: OpenAI-compatible gateways aren't wrong. For
many teams, they're the right call.

Use an OpenAI-compatible gateway when:

  • You're already on the OpenAI SDK and don't expect to change.
  • You want maximum model breadth, including long-tail open-source.
  • Slight fidelity loss in tool use or streaming details doesn't matter for your workload.

Use multi-native (like OpenModel) when:

  • You've built on Anthropic SDK or Google GenAI SDK and want to keep them native.
  • You want to use one SDK to call models from another provider family.
  • Tool use, vision, or streaming event structure matters for your code.

Both designs solve real problems. Just different ones.

Try it

OpenModel is in early access right now — $10 in free credits,
no card required. Currently routing gpt-5.5, claude-opus-4-7,
gemini-2.5-pro, deepseek-v4-pro, and deepseek-v4-flash.

Credits expire in 7 days and accounts get wiped before public
launch in about 4-6 weeks. The wipe is intentional — we want
feedback on the routing, rate limit, and API design before
we lock anything down.

Questions and pushback welcome. The cross-format translation
layer (especially streaming and tool use) has more nuance
than fits in one post; drop a question if you want a
follow-up.
