<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ChrisL</title>
    <description>The latest articles on DEV Community by ChrisL (@chrisl_8197).</description>
    <link>https://dev.to/chrisl_8197</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3934939%2Ff220b1d7-22ca-4947-b784-728e7dc050d8.png</url>
      <title>DEV Community: ChrisL</title>
      <link>https://dev.to/chrisl_8197</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chrisl_8197"/>
    <language>en</language>
    <item>
      <title>Why we built an AI gateway with three native API formats, not just OpenAI-compatible</title>
      <dc:creator>ChrisL</dc:creator>
      <pubDate>Sun, 17 May 2026 13:28:17 +0000</pubDate>
      <link>https://dev.to/chrisl_8197/why-we-built-an-ai-gateway-with-three-native-api-formats-not-just-openai-compatible-45ah</link>
      <guid>https://dev.to/chrisl_8197/why-we-built-an-ai-gateway-with-three-native-api-formats-not-just-openai-compatible-45ah</guid>
      <description>&lt;p&gt;If you've worked with multiple LLM providers in the past year, &lt;br&gt;
you've probably reached for a gateway like OpenRouter, LiteLLM, &lt;br&gt;
or Portkey. They solve a real problem: one API key, one bill, &lt;br&gt;
drop-in access to dozens of models.&lt;/p&gt;

&lt;p&gt;But almost every gateway in this space shares one design &lt;br&gt;
choice: normalize everything to OpenAI-compatible format. &lt;br&gt;
That's the lingua franca of LLM APIs — pick it as the common &lt;br&gt;
denominator and everyone can use it.&lt;/p&gt;

&lt;p&gt;We made a different choice. We built OpenModel with three &lt;br&gt;
native API surfaces in parallel:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /v1/responses                     OpenAI Responses API
POST /v1/messages                      Anthropic Messages API
POST /v1beta/models/{model}:generate   Gemini generateContent
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This post is about why we made that call, what it enables, &lt;br&gt;
and what it costs.&lt;/p&gt;
&lt;h2&gt;The OpenAI-compatible default&lt;/h2&gt;

&lt;p&gt;When a gateway is OpenAI-compatible only, every request — &lt;br&gt;
no matter what model it's targeting — goes through OpenAI's &lt;br&gt;
request/response shape. If you call Claude, the gateway &lt;br&gt;
translates your OpenAI-style request into Anthropic's shape &lt;br&gt;
upstream, then translates the response back to OpenAI format &lt;br&gt;
for you.&lt;/p&gt;

&lt;p&gt;This works fine if you're already using the OpenAI SDK. But &lt;br&gt;
if you've built on the Anthropic SDK (Claude Code, agent &lt;br&gt;
frameworks that speak the Anthropic Messages API natively) or &lt;br&gt;
the Google GenAI SDK, you have two bad options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rewrite your code to the OpenAI shape (one-time but invasive).&lt;/li&gt;
&lt;li&gt;Add another translation layer on top (Anthropic shape → 
OpenAI shape on the way out, then OpenAI shape → Anthropic 
shape on the way back).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both lose fidelity on the parts of the protocols that don't &lt;br&gt;
round-trip cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool use.&lt;/strong&gt; Anthropic returns content blocks with explicit 
&lt;code&gt;tool_use&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; types in a structured array. OpenAI 
returns a flat &lt;code&gt;tool_calls&lt;/code&gt; field on the message. The 
structural information in one shape doesn't survive a 
translation to the other (a side-by-side sketch follows this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision.&lt;/strong&gt; Anthropic and Gemini accept image content blocks 
inline. OpenAI uses &lt;code&gt;image_url&lt;/code&gt; URLs. Round-tripping isn't 
lossless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming.&lt;/strong&gt; OpenAI streams flat &lt;code&gt;data:&lt;/code&gt; JSON lines. 
Anthropic uses named SSE events (&lt;code&gt;message_start&lt;/code&gt;, 
&lt;code&gt;content_block_delta&lt;/code&gt;, &lt;code&gt;message_stop&lt;/code&gt;). Gemini sends chunks 
with &lt;code&gt;candidates&lt;/code&gt; arrays. You can normalize them, but you 
lose the original event structure that SDKs rely on.&lt;/li&gt;
&lt;/ul&gt;
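
&lt;p&gt;To make the tool-use mismatch concrete, here's a rough &lt;br&gt;
side-by-side of the two shapes as Python dicts. Field names &lt;br&gt;
follow the public Anthropic and OpenAI docs; the values are &lt;br&gt;
made up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Anthropic: tool use is a typed block inside an ordered content
# array, so text and tool calls keep their relative order.
anthropic_message = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Berlin"}},
    ],
}

# OpenAI: tool calls live in a separate field, and the arguments
# arrive as a JSON string rather than a parsed object.
openai_message = {
    "role": "assistant",
    "content": "Let me check the weather.",
    "tool_calls": [
        {"id": "call_01", "type": "function",
         "function": {"name": "get_weather",
                      "arguments": "{\"city\": \"Berlin\"}"}},
    ],
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Flattening the first into the second drops the ordering of &lt;br&gt;
blocks; going back the other way, you have to guess it.&lt;/p&gt;
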
&lt;h2&gt;What "three native surfaces" buys you&lt;/h2&gt;

&lt;p&gt;In OpenModel, each format gets its own endpoint, and each &lt;br&gt;
endpoint expects its native request shape. The routing &lt;br&gt;
decision is made by &lt;strong&gt;the model name&lt;/strong&gt;, not the endpoint.&lt;/p&gt;

&lt;p&gt;That last part is the interesting bit. The model name decides &lt;br&gt;
what runs — the endpoint decides what shape your code sees.&lt;/p&gt;

&lt;p&gt;So you can do this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Anthropic SDK calling GPT-5.5
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openmodel.ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;om-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# ← not a Claude model
&lt;/span&gt;    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Anthropic-format request hits &lt;code&gt;/v1/messages&lt;/code&gt;. The gateway &lt;br&gt;
sees &lt;code&gt;model="gpt-5.5"&lt;/code&gt;, translates the request to OpenAI &lt;br&gt;
Responses shape, calls GPT-5.5, and translates the response &lt;br&gt;
back into Anthropic Messages format. Your client code never &lt;br&gt;
sees the underlying provider.&lt;/p&gt;

&lt;p&gt;Same thing in reverse:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# OpenAI SDK calling Claude Opus 4.7
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openmodel.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;om-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# ← not an OpenAI model
&lt;/span&gt;    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OpenAI Responses shape goes in, Claude Opus 4.7 runs, OpenAI &lt;br&gt;
Responses shape comes out.&lt;/p&gt;

&lt;p&gt;What this means in practice: if you've built a Claude Code &lt;br&gt;
workflow and want to delegate one subtask to GPT-5.5 because &lt;br&gt;
it's better at that thing, you don't rewrite anything. You &lt;br&gt;
change the &lt;code&gt;model&lt;/code&gt; field. Same code, same SDK, same response &lt;br&gt;
shape downstream.&lt;/p&gt;

&lt;h2&gt;The hard parts&lt;/h2&gt;

&lt;p&gt;This design isn't a free win. A few things we ran into:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Translation correctness becomes most of the test surface.&lt;/strong&gt; &lt;br&gt;
Every pair of formats has its own edge cases. OpenAI tool &lt;br&gt;
calls don't map perfectly to Anthropic tool use. Anthropic &lt;br&gt;
content blocks don't map perfectly to Gemini parts. Most of &lt;br&gt;
our test suite is just round-trip correctness — same &lt;br&gt;
semantic input, expected semantic output across all pairs.&lt;/p&gt;
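
&lt;p&gt;As an illustration (not our actual suite), a round-trip test &lt;br&gt;
looks roughly like this. &lt;code&gt;to_openai&lt;/code&gt; and &lt;code&gt;to_anthropic&lt;/code&gt; are &lt;br&gt;
hypothetical stand-ins for the gateway's real translators:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical round-trip check: Anthropic -&gt; OpenAI -&gt; Anthropic.
# Compare the fields that carry meaning, not byte-for-byte equality
# (ids and ordering may legitimately be rewritten in transit).
def test_tool_use_round_trip():
    original = {
        "role": "assistant",
        "content": [
            {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
             "input": {"city": "Berlin"}},
        ],
    }
    recovered = to_anthropic(to_openai(original))
    block = recovered["content"][0]
    assert block["type"] == "tool_use"
    assert block["name"] == "get_weather"
    assert block["input"] == {"city": "Berlin"}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;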

&lt;p&gt;&lt;strong&gt;Streaming format translation is the worst of it.&lt;/strong&gt; Each &lt;br&gt;
format has its own SSE event structure. When the upstream is &lt;br&gt;
OpenAI but the client expects Anthropic's named events, we &lt;br&gt;
have to synthesize &lt;code&gt;message_start&lt;/code&gt;, &lt;code&gt;content_block_delta&lt;/code&gt;, &lt;br&gt;
&lt;code&gt;message_stop&lt;/code&gt; events from a stream that was never structured &lt;br&gt;
that way. Tool use inside a stream is particularly nasty &lt;br&gt;
because each format encodes the start/middle/end of a tool &lt;br&gt;
call differently.&lt;/p&gt;
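
&lt;p&gt;A stripped-down sketch of that synthesis, assuming the input &lt;br&gt;
is an iterator of plain text deltas already parsed out of the &lt;br&gt;
OpenAI stream (tool calls, stop reasons, and usage omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Wrap a stream of OpenAI-style text deltas in Anthropic's named
# SSE events. Real code also has to synthesize tool_use blocks,
# message_delta events, and usage accounting.
def to_anthropic_events(openai_text_deltas):
    yield "message_start", {"message": {"role": "assistant", "content": []}}
    yield "content_block_start", {
        "index": 0, "content_block": {"type": "text", "text": ""}}
    for text in openai_text_deltas:
        yield "content_block_delta", {
            "index": 0, "delta": {"type": "text_delta", "text": text}}
    yield "content_block_stop", {"index": 0}
    yield "message_stop", {}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;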

&lt;p&gt;&lt;strong&gt;Two-phase rate limiting.&lt;/strong&gt; We enforce per-user RPM &lt;br&gt;
(requests per minute) and TPM (tokens per minute) limits, plus &lt;br&gt;
per-channel limits at the upstream level. RPM is easy &lt;br&gt;
(increment a counter, check the window). TPM requires a &lt;br&gt;
pre-request token estimate (we use a tokenizer and fall back to &lt;br&gt;
a &lt;code&gt;length / 4&lt;/code&gt; heuristic when one isn't available), a &lt;br&gt;
reservation against the budget, and post-response reconciliation &lt;br&gt;
against the real token count. Approximate in the moment, &lt;br&gt;
accurate over time.&lt;/p&gt;
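
&lt;p&gt;In sketch form, with &lt;code&gt;estimate_tokens&lt;/code&gt;, &lt;code&gt;budget&lt;/code&gt;, and &lt;br&gt;
&lt;code&gt;call_upstream&lt;/code&gt; as placeholders rather than our actual &lt;br&gt;
implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Two-phase TPM accounting: reserve an estimate up front, settle
# against the provider's reported usage afterwards.
def handle_request(user, request):
    estimate = estimate_tokens(request)           # tokenizer-based estimate
    if estimate is None:
        estimate = len(request.body) // 4         # length/4 fallback heuristic
    reservation = budget.reserve(user, estimate)  # phase 1: pre-request
    if reservation is None:
        raise RateLimitExceeded()
    response = call_upstream(request)
    # phase 2: reconcile against the real token count
    budget.reconcile(reservation, response.usage.total_tokens)
    return response
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;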

&lt;p&gt;&lt;strong&gt;Error format mapping.&lt;/strong&gt; When the gateway has to return its &lt;br&gt;
own error (rate limit exceeded, invalid key, etc.), the error &lt;br&gt;
has to come back in the format your SDK expects. So a rate &lt;br&gt;
limit error sent through &lt;code&gt;/v1/messages&lt;/code&gt; looks like Anthropic's &lt;br&gt;
error shape (&lt;code&gt;{"type": "error", "error": {"type": "rate_limit_error", ...}}&lt;/code&gt;), &lt;br&gt;
not OpenAI's. Without this, every SDK's built-in error &lt;br&gt;
handling breaks.&lt;/p&gt;
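
&lt;p&gt;A sketch of that dispatch (the shapes follow each provider's &lt;br&gt;
documented error format; the function itself is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Return the same internal error in whichever shape the calling
# endpoint's SDK expects.
def format_rate_limit_error(endpoint, message):
    if endpoint == "/v1/messages":            # Anthropic shape
        return {"type": "error",
                "error": {"type": "rate_limit_error", "message": message}}
    if endpoint.startswith("/v1beta/"):       # Gemini shape
        return {"error": {"code": 429, "status": "RESOURCE_EXHAUSTED",
                          "message": message}}
    return {"error": {"type": "rate_limit_error",  # OpenAI shape
                      "message": message, "code": "rate_limit_exceeded"}}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;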

&lt;h2&gt;When this design isn't the right call&lt;/h2&gt;

&lt;p&gt;To be clear: OpenAI-compatible gateways aren't wrong. For &lt;br&gt;
many teams, they're the right call.&lt;/p&gt;

&lt;p&gt;Use an OpenAI-compatible gateway when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're already on the OpenAI SDK and don't expect to change.&lt;/li&gt;
&lt;li&gt;You want maximum model breadth, including long-tail open-source.&lt;/li&gt;
&lt;li&gt;Slight fidelity loss in tool use or streaming details doesn't 
matter for your workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use multi-native (like OpenModel) when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You've built on the Anthropic SDK or the Google GenAI SDK 
and want to keep them native.&lt;/li&gt;
&lt;li&gt;You want to use one SDK to call models from another provider 
family.&lt;/li&gt;
&lt;li&gt;Tool use, vision, or streaming event structure matters for 
your code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both designs solve real problems. Just different ones.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;OpenModel is in early access right now — $10 in free credits, &lt;br&gt;
no card required. Currently routing gpt-5.5, claude-opus-4-7, &lt;br&gt;
gemini-2.5-pro, deepseek-v4-pro, and deepseek-v4-flash.&lt;/p&gt;

&lt;p&gt;Credits expire in 7 days and accounts get wiped before public &lt;br&gt;
launch in about 4-6 weeks. The wipe is intentional — we want &lt;br&gt;
feedback on the routing, rate limit, and API design before &lt;br&gt;
we lock anything down.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try it: &lt;a href="https://openmodel.ai" rel="noopener noreferrer"&gt;openmodel.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://docs.openmodel.ai" rel="noopener noreferrer"&gt;docs.openmodel.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Updates on X: &lt;a href="https://twitter.com/openmodel_" rel="noopener noreferrer"&gt;@openmodel_&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Questions and pushback welcome. The cross-format translation &lt;br&gt;
layer (especially streaming and tool use) has more nuance than &lt;br&gt;
fits in one post — drop a question if you want a follow-up.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>api</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
