<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chinallmapi</title>
    <description>The latest articles on DEV Community by Chinallmapi (@chinallmapi).</description>
    <link>https://dev.to/chinallmapi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904381%2F51f8c181-3747-41d0-837c-09064d25b1ce.png</url>
      <title>DEV Community: Chinallmapi</title>
      <link>https://dev.to/chinallmapi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chinallmapi"/>
    <language>en</language>
    <item>
      <title>How to use one OpenAI-compatible gateway for chat, responses, embeddings, rerank, image, and audio APIs</title>
      <dc:creator>Chinallmapi</dc:creator>
      <pubDate>Wed, 29 Apr 2026 13:48:21 +0000</pubDate>
      <link>https://dev.to/chinallmapi/how-to-use-one-openai-compatible-gateway-for-chat-responses-embeddings-rerank-image-and-audio-4431</link>
      <guid>https://dev.to/chinallmapi/how-to-use-one-openai-compatible-gateway-for-chat-responses-embeddings-rerank-image-and-audio-4431</guid>
      <description>&lt;p&gt;If you're building an AI-powered app today, you're probably juggling multiple model providers. OpenAI for GPT. DeepSeek for cost savings. A Chinese model for specific tasks. Maybe Anthropic for Claude.&lt;/p&gt;

&lt;p&gt;Each provider has its own SDK, its own auth flow, its own quirks. That's not just annoying—it's fragile. Switching models means rewriting code. Adding a new provider means more maintenance burden.&lt;/p&gt;

&lt;p&gt;There's a cleaner approach: &lt;strong&gt;one gateway that speaks OpenAI's protocol, but routes to multiple backends.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't about replacing your provider. It's about abstracting the integration layer so you can swap, compare, and combine models without touching your application code.&lt;/p&gt;

&lt;p&gt;Let's walk through what this looks like in practice, using ChinaLLM as a concrete example of a publicly documented gateway.&lt;/p&gt;




&lt;h2&gt;Why OpenAI-compatible matters more than another SDK&lt;/h2&gt;

&lt;p&gt;The OpenAI API format has become a de facto standard. Most AI tools—LangChain, AutoGen, custom agents—expect an OpenAI-style interface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base URL: &lt;a href="https://api.openai.com/v1" rel="noopener noreferrer"&gt;https://api.openai.com/v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Auth: Bearer token in the header&lt;/li&gt;
&lt;li&gt;Endpoints: /v1/chat/completions, /v1/embeddings, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you switch to another provider, you either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rewrite your integration code&lt;/li&gt;
&lt;li&gt;Use a provider-specific SDK (and lock yourself in)&lt;/li&gt;
&lt;li&gt;Find a gateway that translates everything into the format you already know&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The third option is increasingly viable. A gateway that exposes OpenAI-compatible endpoints but routes to multiple backends gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Portability:&lt;/strong&gt; change providers without code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comparison:&lt;/strong&gt; test different models side-by-side with the same API call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization:&lt;/strong&gt; route to cheaper models when quality differences don't matter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simpler stack:&lt;/strong&gt; one auth flow, one SDK, one set of error handling patterns&lt;/li&gt;
&lt;/ul&gt;
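&lt;p&gt;That portability claim is concrete: keep provider details in configuration, and call sites never change. A minimal sketch — the provider table and model IDs below are illustrative assumptions, not an official catalog:&lt;/p&gt;

```python
# Sketch: provider portability via configuration, not code changes.
# Base URLs and model IDs here are illustrative assumptions.

PROVIDERS = {
    "openai":   {"base_url": "https://api.openai.com/v1",  "model": "gpt-5.4"},
    "chinallm": {"base_url": "https://chinallmapi.com/v1", "model": "deepseek-v4-flash"},
}

def client_config(provider, api_key):
    """Kwargs for openai.OpenAI(), plus a default model for that provider."""
    p = PROVIDERS[provider]
    return {"api_key": api_key, "base_url": p["base_url"]}, p["model"]

# Switching backends is now a one-line config change:
cfg, model = client_config("chinallm", "your-key")
print(model)  # deepseek-v4-flash
```

&lt;p&gt;The point is that the application code only ever sees the returned config; swapping providers is a data change, not a code change.&lt;/p&gt;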

&lt;p&gt;This isn't theoretical. ChinaLLM, for instance, exposes exactly this kind of gateway—publicly documented, with known endpoints and pricing.&lt;/p&gt;




&lt;h2&gt;What ChinaLLM publicly exposes today&lt;/h2&gt;

&lt;p&gt;ChinaLLM is an OpenAI-compatible API gateway that routes to both OpenAI models and China-native providers (DeepSeek, Alibaba coding plans, GLM, ZAI).&lt;/p&gt;

&lt;p&gt;Public documentation shows the following endpoints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core chat:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/v1/chat/completions — standard OpenAI chat format&lt;/li&gt;
&lt;li&gt;/v1/responses — OpenAI Responses API format&lt;/li&gt;
&lt;li&gt;/v1/responses/compact — compacted responses for lower token usage&lt;/li&gt;
&lt;li&gt;/v1/messages — Anthropic-style messages (Claude protocol)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Discovery and embeddings:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/v1beta/models — list available models&lt;/li&gt;
&lt;li&gt;/v1/embeddings — text embeddings&lt;/li&gt;
&lt;li&gt;/v1/rerank — reranking for search/RAG pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Image:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/v1/images/generations — generate images from text&lt;/li&gt;
&lt;li&gt;/v1/images/edits — edit existing images&lt;/li&gt;
&lt;li&gt;/v1/images/variations — create variations of an image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Audio:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/v1/audio/speech — text-to-speech&lt;/li&gt;
&lt;li&gt;/v1/audio/transcriptions — speech-to-text&lt;/li&gt;
&lt;li&gt;/v1/audio/translations — translate audio to English text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All endpoints use OpenAI-compatible request/response formats. The same SDK you use for OpenAI works here—just change the base URL.&lt;/p&gt;
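&lt;p&gt;"Just change the base URL" works because every endpoint shares one request shape: a Bearer token and a path under the same root. A sketch of that shape — the helper is illustrative, not part of any SDK:&lt;/p&gt;

```python
# Sketch: the one auth/URL pattern shared by all gateway endpoints.

BASE_URL = "https://chinallmapi.com/v1"

def request_parts(path, api_key):
    """Return the URL and headers for any endpoint under the gateway."""
    url = BASE_URL.rstrip("/") + "/" + path.lstrip("/")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return url, headers

url, headers = request_parts("chat/completions", "your-chinallm-key")
print(url)  # https://chinallmapi.com/v1/chat/completions
```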




&lt;h2&gt;Getting a token and setting a base URL&lt;/h2&gt;

&lt;p&gt;The setup is minimal:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get an API key from &lt;a href="https://chinallmapi.com" rel="noopener noreferrer"&gt;ChinaLLM&lt;/a&gt; (signup process is standard)&lt;/li&gt;
&lt;li&gt;Set your base URL to &lt;a href="https://chinallmapi.com/v1" rel="noopener noreferrer"&gt;https://chinallmapi.com/v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use your existing OpenAI SDK or HTTP client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No new dependencies. No new auth patterns.&lt;/p&gt;

&lt;p&gt;For complete code examples, see the &lt;a href="https://github.com/Chinallmapi/chinallm-openai-compatible-examples" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example with OpenAI Python SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-chinallm-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://chinallmapi.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Now use it exactly like you would with OpenAI
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-v4-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same SDK. Same method signatures. Different backend.&lt;/p&gt;




&lt;h2&gt;First request with /v1/chat/completions&lt;/h2&gt;

&lt;p&gt;Let's make a real request. We'll use a cost-efficient model from the public pricing list: deepseek-v4-flash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-chinallm-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://chinallmapi.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-v4-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful coding assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python function to merge two sorted lists.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns a response in standard OpenAI format.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;you didn't change any code to switch from OpenAI to DeepSeek.&lt;/strong&gt; You just changed the model name.&lt;/p&gt;
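&lt;p&gt;That property makes side-by-side comparison a loop rather than a rewrite. A sketch, with the request function injected so the comparison logic stays provider-agnostic — in practice you would wire &lt;code&gt;ask&lt;/code&gt; to &lt;code&gt;client.chat.completions.create&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch: run the same prompt against several models via one gateway.
# ask(model, prompt) -> str is injected; in practice it would wrap
# client.chat.completions.create(...).choices[0].message.content.

def compare_models(ask, models, prompt):
    """Return {model_name: answer} for the same prompt."""
    return {m: ask(m, prompt) for m in models}

# Stubbed here so the sketch runs without network access:
fake_ask = lambda m, p: f"[{m}] answer to: {p}"
results = compare_models(fake_ask, ["deepseek-v4-flash", "gpt-5.4"], "ping")
print(results["gpt-5.4"])  # [gpt-5.4] answer to: ping
```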




&lt;h2&gt;Expanding beyond chat (responses / embeddings / rerank)&lt;/h2&gt;

&lt;p&gt;Chat is the obvious use case. But a unified gateway becomes more valuable when you need multiple capabilities in the same app.&lt;/p&gt;

&lt;h3&gt;Responses API&lt;/h3&gt;

&lt;p&gt;The Responses API (/v1/responses) is useful when you want structured outputs with built-in reasoning traces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze this customer feedback and extract sentiment, topic, and action items.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return a JSON object with sentiment, topic, and action_items fields.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Embeddings&lt;/h3&gt;

&lt;p&gt;For RAG or semantic search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the best practices for API design?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Embedding dimension: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Rerank&lt;/h3&gt;

&lt;p&gt;When you have multiple candidate documents and need to rank them by relevance to a query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://chinallmapi.com/v1/rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the refund policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Our refund policy allows returns within 30 days...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Shipping takes 5-7 business days...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;We accept PayPal and credit cards...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each capability uses the same auth pattern, same base URL, familiar request formats. No separate SDKs for embeddings vs. chat vs. rerank.&lt;/p&gt;
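&lt;p&gt;The image and audio endpoints listed earlier follow the same pattern. A sketch of the request payloads, assuming they mirror OpenAI's field names — check the gateway docs before relying on this, and note the TTS model ID and voice are illustrative placeholders:&lt;/p&gt;

```python
# Sketch: payload builders for the image and audio endpoints. Field
# names assume OpenAI-compatible formats; model IDs are illustrative.
# Pass each (path, body) pair to requests.post with your auth header.

def image_generation_payload(prompt, model="gpt-image-2", size="1024x1024"):
    """Body for /v1/images/generations (text-to-image)."""
    return "/v1/images/generations", {"model": model, "prompt": prompt, "size": size}

def tts_payload(text, model="tts-model", voice="alloy"):
    """Body for /v1/audio/speech (text-to-speech)."""
    return "/v1/audio/speech", {"model": model, "input": text, "voice": voice}

path, body = image_generation_payload("a lighthouse at dusk")
print(path)  # /v1/images/generations
```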




&lt;h2&gt;Public pricing and model discovery&lt;/h2&gt;

&lt;p&gt;ChinaLLM's pricing page shows transparent model costs with group-specific multipliers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Group multipliers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CodingPlan (Alibaba coding plans): 1.1x&lt;/li&gt;
&lt;li&gt;DeepSeek: 1.05x&lt;/li&gt;
&lt;li&gt;GLM: 1.05x&lt;/li&gt;
&lt;li&gt;OpenAI: 1.3x&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DeepSeek models cost roughly 5% more than base DeepSeek pricing&lt;/li&gt;
&lt;li&gt;OpenAI models cost roughly 30% more than base OpenAI pricing&lt;/li&gt;
&lt;li&gt;The gateway adds a margin, but you get unified access and simpler integration&lt;/li&gt;
&lt;/ul&gt;
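&lt;p&gt;The arithmetic is simple enough to sanity-check in a few lines — the base price below is a made-up example, not a quoted rate:&lt;/p&gt;

```python
# Sketch: effective per-token cost under a group multiplier.
# The base price is illustrative, not quoted gateway pricing.

MULTIPLIERS = {"CodingPlan": 1.10, "DeepSeek": 1.05, "GLM": 1.05, "OpenAI": 1.30}

def effective_price(base_price_per_mtok, group):
    """Gateway price = the provider's base price times the group multiplier."""
    return base_price_per_mtok * MULTIPLIERS[group]

# A hypothetical $0.28 per million input tokens routed via DeepSeek:
print(round(effective_price(0.28, "DeepSeek"), 3))  # 0.294
```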

&lt;p&gt;&lt;strong&gt;Visible models (partial list from public docs):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gpt-5.4, gpt-5.5 (OpenAI)&lt;/li&gt;
&lt;li&gt;gpt-image-2 (OpenAI image model)&lt;/li&gt;
&lt;li&gt;deepseek-v4-flash, deepseek-v4-pro (DeepSeek)&lt;/li&gt;
&lt;li&gt;glm-4.7 (GLM/Zhipu)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To discover all available models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns the current model catalog—useful when new models are added without announcement.&lt;/p&gt;




&lt;h2&gt;When this approach is useful&lt;/h2&gt;

&lt;p&gt;A unified gateway isn't for everyone. But it's particularly valuable when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You're comparing models.&lt;/strong&gt; You want to test GPT vs. DeepSeek vs. GLM on the same task without rewriting integration code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You're optimizing costs.&lt;/strong&gt; You want to route simple queries to cheaper models and complex ones to premium models—all through one API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You're building multi-modal apps.&lt;/strong&gt; You need chat + embeddings + images + audio in one stack, and don't want separate auth flows for each.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You're building tooling.&lt;/strong&gt; You want your framework to support multiple providers out of the box, without hardcoding provider-specific logic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You're hedging provider risk.&lt;/strong&gt; You want the option to switch providers quickly if pricing changes, service quality drops, or better options emerge.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The gateway approach abstracts away provider differences. You still need to know which model to use for which task—but you don't need separate code for each provider.&lt;/p&gt;
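&lt;p&gt;The cost-optimization case usually starts as a small routing function in front of the gateway. A naive sketch — the heuristic and model names are illustrative; real routing would weigh task type, context length, and measured quality:&lt;/p&gt;

```python
# Sketch: naive cost-aware routing in front of one gateway.
# The heuristic and model IDs are illustrative assumptions.

CHEAP, PREMIUM = "deepseek-v4-flash", "gpt-5.4"
HARD_HINTS = ("prove", "refactor", "architecture", "debug")

def pick_model(prompt, max_cheap_words=40):
    """Short, simple prompts go to the cheap model; long or hard ones go premium."""
    hard = any(hint in prompt.lower() for hint in HARD_HINTS)
    return PREMIUM if hard or len(prompt.split()) > max_cheap_words else CHEAP

print(pick_model("What's the capital of France?"))         # deepseek-v4-flash
print(pick_model("Refactor this module for testability"))  # gpt-5.4
```

&lt;p&gt;Because both models sit behind the same endpoint, the router returns only a model name — the request code stays identical either way.&lt;/p&gt;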




&lt;h2&gt;Final takeaway&lt;/h2&gt;

&lt;p&gt;One OpenAI-compatible gateway. Multiple backends. Same SDK. Same auth. Same request formats.&lt;/p&gt;

&lt;p&gt;This isn't about replacing your provider. It's about making your integration layer more portable, more testable, and more resilient to provider changes.&lt;/p&gt;

&lt;p&gt;ChinaLLM is one concrete implementation of this pattern—publicly documented, with transparent pricing and a clear model catalog. If you're evaluating this approach, it's a useful reference point.&lt;/p&gt;

&lt;p&gt;The bigger idea: &lt;strong&gt;stop writing provider-specific integration code.&lt;/strong&gt; Write to a standard interface, and let the gateway handle the routing.&lt;/p&gt;

</description>
      <category>api</category>
    </item>
  </channel>
</rss>
