<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jack Liu</title>
    <description>The latest articles on DEV Community by Jack Liu (@jack_liu_2026).</description>
    <link>https://dev.to/jack_liu_2026</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3988282%2Fdcbf32d6-4359-4da4-ba7e-51f2a50b75a5.png</url>
      <title>DEV Community: Jack Liu</title>
      <link>https://dev.to/jack_liu_2026</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jack_liu_2026"/>
    <language>en</language>
    <item>
      <title>How to switch from OpenAI API to a multi-model API gateway in 3 minutes</title>
      <dc:creator>Jack Liu</dc:creator>
      <pubDate>Wed, 17 Jun 2026 08:58:45 +0000</pubDate>
      <link>https://dev.to/jack_liu_2026/how-to-switch-from-openai-api-to-a-multi-model-api-gateway-in-3-minutes-3fph</link>
      <guid>https://dev.to/jack_liu_2026/how-to-switch-from-openai-api-to-a-multi-model-api-gateway-in-3-minutes-3fph</guid>
      <description>&lt;p&gt;Most AI apps start with one model provider.&lt;/p&gt;

&lt;p&gt;That is usually the right choice. It keeps the first version simple: one SDK, one API key, one billing page, one set of model names.&lt;/p&gt;

&lt;p&gt;But once the product grows a little, teams often want to compare models across a few dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;quality for harder reasoning tasks&lt;/li&gt;
&lt;li&gt;latency for user-facing flows&lt;/li&gt;
&lt;li&gt;cost for high-volume requests&lt;/li&gt;
&lt;li&gt;long-context behavior&lt;/li&gt;
&lt;li&gt;fallback when one model is slow or unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, wiring every provider separately can get annoying. You may end up with different SDKs, different auth patterns, different model names, different dashboards, and different billing flows.&lt;/p&gt;

&lt;p&gt;One practical option is to use an OpenAI-compatible gateway. The application still talks to an OpenAI-style API, but the gateway lets you route requests to multiple model families.&lt;/p&gt;

&lt;p&gt;I am on the TokenBay team, so the example below uses TokenBay. The broader pattern applies to any OpenAI-compatible gateway.&lt;/p&gt;

&lt;h2&gt;Before: using OpenAI directly&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the tradeoffs of using an LLM API gateway.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;After: using an OpenAI-compatible gateway&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.tokenbay.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TOKENBAY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.4-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the tradeoffs of using an LLM API gateway.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



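&lt;p&gt;The main differences are the &lt;code&gt;base_url&lt;/code&gt; and the API key, plus a model name that the gateway recognizes (here &lt;code&gt;gpt-5.4-mini&lt;/code&gt; instead of &lt;code&gt;gpt-4o-mini&lt;/code&gt;). The rest of the code keeps the familiar OpenAI client shape.&lt;/p&gt;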

&lt;h2&gt;Trying another model&lt;/h2&gt;

&lt;p&gt;Once your app uses an OpenAI-compatible endpoint, you can test another supported model by changing configuration instead of rewriting provider-specific integration code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the tradeoffs of using an LLM API gateway.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a real application, you would usually keep the model name in environment variables or application config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://api.tokenbay.com/v1
&lt;span class="nv"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gpt-5.4-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes it easier to compare model behavior without changing business logic.&lt;/p&gt;
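
&lt;p&gt;As a rough sketch of how an app might read those settings, the snippet below loads them into a small config object and builds the client from it. &lt;code&gt;LLM_API_KEY&lt;/code&gt;, &lt;code&gt;LLMConfig&lt;/code&gt;, &lt;code&gt;load_llm_config&lt;/code&gt;, and &lt;code&gt;make_client&lt;/code&gt; are names I made up for illustration; only &lt;code&gt;LLM_BASE_URL&lt;/code&gt; and &lt;code&gt;LLM_MODEL&lt;/code&gt; come from the example above.&lt;/p&gt;

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    """Settings that decide which endpoint and model the app talks to."""
    base_url: str
    model: str
    api_key: str


def load_llm_config(env=None):
    """Build an LLMConfig from environment-style variables.

    LLM_API_KEY is an assumed variable name, not something the gateway requires.
    """
    env = os.environ if env is None else env
    required = ("LLM_BASE_URL", "LLM_MODEL", "LLM_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing LLM settings: " + ", ".join(missing))
    return LLMConfig(
        base_url=env["LLM_BASE_URL"],
        model=env["LLM_MODEL"],
        api_key=env["LLM_API_KEY"],
    )


def make_client(config):
    """Create an OpenAI SDK client pointed at the configured endpoint."""
    # Imported lazily so that config loading works even without the SDK installed.
    from openai import OpenAI

    return OpenAI(base_url=config.base_url, api_key=config.api_key)
```

&lt;p&gt;With this shape, request code can pass &lt;code&gt;config.model&lt;/code&gt; to &lt;code&gt;client.chat.completions.create&lt;/code&gt;, so switching models becomes a deployment setting rather than a code change.&lt;/p&gt;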

&lt;h2&gt;When this pattern is useful&lt;/h2&gt;

&lt;p&gt;This can be useful if you are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;building an AI SaaS product and want to test cost/quality tradeoffs&lt;/li&gt;
&lt;li&gt;building agents that use different models for planning, tool use, classification, and fallback&lt;/li&gt;
&lt;li&gt;building internal tools where different projects need separate API keys and usage tracking&lt;/li&gt;
&lt;li&gt;prototyping with multiple providers before choosing a long-term default&lt;/li&gt;
&lt;li&gt;trying to avoid provider-specific code in your first version&lt;/li&gt;
&lt;/ul&gt;
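
&lt;p&gt;To make the agent and fallback cases a bit more concrete, here is a minimal sketch of per-task model selection with an ordered fallback list. The task names, the mapping, and the &lt;code&gt;run_task&lt;/code&gt; and &lt;code&gt;call_model&lt;/code&gt; names are hypothetical; the model names simply reuse the ones from the examples above, and a real setup would depend on what your gateway supports.&lt;/p&gt;

```python
# Hypothetical task-to-model mapping; each list is tried in order.
MODEL_BY_TASK = {
    "planning": ["claude-sonnet-4.6", "gpt-5.4-mini"],
    "classification": ["gpt-5.4-mini"],
}
DEFAULT_MODELS = ["gpt-5.4-mini"]


def run_task(task, prompt, call_model):
    """Try each model configured for the task and return (model, text).

    call_model(model, prompt) is any callable that returns text or raises
    on failure, for example a thin wrapper around the OpenAI SDK client.
    """
    errors = []
    for model in MODEL_BY_TASK.get(task, DEFAULT_MODELS):
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose in this sketch; narrow it in real code
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))
```

&lt;p&gt;Because every model sits behind the same OpenAI-style endpoint, &lt;code&gt;call_model&lt;/code&gt; can stay a single function that only varies the &lt;code&gt;model&lt;/code&gt; argument.&lt;/p&gt;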

&lt;h2&gt;When direct provider integration may still be better&lt;/h2&gt;

&lt;p&gt;A gateway is not always the right choice.&lt;/p&gt;

&lt;p&gt;Direct provider integration may be better if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you need provider-specific beta features immediately&lt;/li&gt;
&lt;li&gt;you have strict procurement or compliance requirements&lt;/li&gt;
&lt;li&gt;you already have negotiated enterprise contracts with each model provider&lt;/li&gt;
&lt;li&gt;you want the fewest possible moving parts in the request path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tradeoff is convenience and unified billing versus another layer in the stack.&lt;/p&gt;

&lt;h2&gt;How I would compare gateways&lt;/h2&gt;

&lt;p&gt;There are several products in this category now, including OpenRouter-style model marketplaces, self-hosted options such as LiteLLM, and production gateway products focused on routing, observability, or governance.&lt;/p&gt;

&lt;p&gt;I would compare them on practical criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model coverage for the models you actually use&lt;/li&gt;
&lt;li&gt;pricing clarity&lt;/li&gt;
&lt;li&gt;OpenAI SDK compatibility&lt;/li&gt;
&lt;li&gt;latency and streaming behavior&lt;/li&gt;
&lt;li&gt;usage logs and project-level cost visibility&lt;/li&gt;
&lt;li&gt;API key limits and safety controls&lt;/li&gt;
&lt;li&gt;privacy/data policy&lt;/li&gt;
&lt;li&gt;whether you need hosted convenience or self-hosted control&lt;/li&gt;
&lt;/ul&gt;
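
&lt;p&gt;For the latency criterion, one simple way to get a first impression is to send the same prompt to each candidate model and time the round trip. The helper below is only a sketch: &lt;code&gt;compare_latency&lt;/code&gt; and &lt;code&gt;call_model&lt;/code&gt; are names invented for this post, and a single request per model is a noisy signal, so treat the numbers as a starting point rather than a benchmark.&lt;/p&gt;

```python
import time


def compare_latency(models, prompt, call_model, clock=time.perf_counter):
    """Send the same prompt to each model and record wall-clock seconds.

    call_model(model, prompt) is any callable that performs the request.
    Failures are recorded instead of raised so one bad model does not
    hide the results for the others.
    """
    results = {}
    for model in models:
        start = clock()
        try:
            call_model(model, prompt)
            results[model] = {"ok": True, "seconds": clock() - start}
        except Exception as exc:
            results[model] = {
                "ok": False,
                "seconds": clock() - start,
                "error": str(exc),
            }
    return results
```

&lt;p&gt;Running this several times, at different times of day and with streaming both on and off, would give a fairer picture than one pass.&lt;/p&gt;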

&lt;h2&gt;Things to evaluate before using any gateway&lt;/h2&gt;

&lt;p&gt;Before using a gateway in production, I would check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which models are actually available&lt;/li&gt;
&lt;li&gt;how pricing is displayed&lt;/li&gt;
&lt;li&gt;whether request/usage logs are clear enough&lt;/li&gt;
&lt;li&gt;whether API keys can be limited per project&lt;/li&gt;
&lt;li&gt;what data and privacy policy is published&lt;/li&gt;
&lt;li&gt;what happens when a model errors or times out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those questions matter more than the integration code itself.&lt;/p&gt;

&lt;h2&gt;TokenBay example&lt;/h2&gt;

&lt;p&gt;TokenBay is an OpenAI-compatible API gateway for accessing models such as GPT, Claude, Gemini, DeepSeek, and others through one endpoint and API key. It also includes pay-as-you-go billing, API key management, usage logs, and per-key limits.&lt;/p&gt;

&lt;p&gt;If you are building an AI app and want to test one OpenAI-compatible endpoint for multiple model families, here is the link:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tokenbay.com/?utm_source=devto&amp;amp;utm_medium=community_content&amp;amp;utm_campaign=week1_free_content" rel="noopener noreferrer"&gt;https://www.tokenbay.com/?utm_source=devto&amp;amp;utm_medium=community_content&amp;amp;utm_campaign=week1_free_content&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Current launch offer on the homepage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;15% off most models&lt;/li&gt;
&lt;li&gt;500 free credits&lt;/li&gt;
&lt;li&gt;invite a friend, get 200 credits each&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That should make it easier to run a small quickstart test before committing to real usage.&lt;/p&gt;

&lt;p&gt;I would especially love feedback from builders on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what would make you trust or not trust a gateway like this&lt;/li&gt;
&lt;li&gt;whether unified billing actually matters to you&lt;/li&gt;
&lt;li&gt;how you currently compare model cost and quality&lt;/li&gt;
&lt;li&gt;what is missing from the docs or onboarding&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>python</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
