Chinallmapi

How to use one OpenAI-compatible gateway for chat, responses, embeddings, rerank, image, and audio APIs

If you're building an AI-powered app today, you're probably juggling multiple model providers. OpenAI for GPT. DeepSeek for cost savings. A Chinese model for specific tasks. Maybe Anthropic for Claude.

Each provider has its own SDK, its own auth flow, its own quirks. That's not just annoying—it's fragile. Switching models means rewriting code. Adding a new provider means more maintenance burden.

There's a cleaner approach: one gateway that speaks OpenAI's protocol, but routes to multiple backends.

This isn't about replacing your provider. It's about abstracting the integration layer so you can swap, compare, and combine models without touching your application code.

Let's walk through what this looks like in practice, using ChinaLLM as a concrete example of a publicly documented gateway.


Why OpenAI-compatible matters more than another SDK

The OpenAI API format has become a de facto standard. Most AI tools—LangChain, AutoGen, custom agents—expect an OpenAI-style interface: a client pointed at a base URL, calling chat.completions.create with a model name and a list of messages. A minimal sketch of that shape (the endpoint URL and model name here are placeholders):
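import openai

# Any OpenAI-compatible endpoint looks the same from the client's side:
# a base URL, an API key, and chat.completions.create.
client = openai.OpenAI(
    api_key="your-key",
    base_url="https://api.example.com/v1",  # placeholder endpoint
)

response = client.chat.completions.create(
    model="some-model",  # the only provider-specific part
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)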

If you switch to another provider, you either:

  • Rewrite your integration code
  • Use a provider-specific SDK (and lock yourself in)
  • Find a gateway that translates everything into the format you already know

The third option is increasingly viable. A gateway that exposes OpenAI-compatible endpoints but routes to multiple backends gives you:

  • Portability: change providers without code changes
  • Comparison: test different models side-by-side with the same API call
  • Cost optimization: route to cheaper models when quality differences don't matter
  • Simpler stack: one auth flow, one SDK, one set of error handling patterns

This isn't theoretical. ChinaLLM, for instance, exposes exactly this kind of gateway—publicly documented, with known endpoints and pricing.


What ChinaLLM publicly exposes today

ChinaLLM is an OpenAI-compatible API gateway that routes to both OpenAI models and China-native providers (DeepSeek, Alibaba coding plans, GLM, ZAI).

Public documentation shows the following endpoints:

Core chat:

  • /v1/chat/completions — standard OpenAI chat format
  • /v1/responses — OpenAI Responses API format
  • /v1/responses/compact — compacted responses for lower token usage
  • /v1/messages — Anthropic-style messages (Claude protocol)

Discovery and embeddings:

  • /v1beta/models — list available models
  • /v1/embeddings — text embeddings
  • /v1/rerank — reranking for search/RAG pipelines

Image:

  • /v1/images/generations — generate images from text
  • /v1/images/edits — edit existing images
  • /v1/images/variations — create variations of an image

Audio:

  • /v1/audio/speech — text-to-speech
  • /v1/audio/transcriptions — speech-to-text
  • /v1/audio/translations — translate audio to English text

All endpoints use OpenAI-compatible request/response formats. The same SDK you use for OpenAI works here—just change the base URL.


Getting a token and setting a base URL

The setup is minimal:

  1. Get an API key from ChinaLLM (signup process is standard)
  2. Set your base URL to https://chinallmapi.com/v1
  3. Use your existing OpenAI SDK or HTTP client

No new dependencies. No new auth patterns.

For complete code examples, see the GitHub repo.

Example with OpenAI Python SDK:

import openai

client = openai.OpenAI(
    api_key="your-chinallm-key",
    base_url="https://chinallmapi.com/v1"
)

# Now use it exactly like you would with OpenAI
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "What's the capital of France?"}]
)

print(response.choices[0].message.content)

Same SDK. Same method signatures. Different backend.


First request with /v1/chat/completions

Let's make a real request. We'll use a cost-efficient model from the public pricing list: deepseek-v4-flash.

import openai

client = openai.OpenAI(
    api_key="your-chinallm-key",
    base_url="https://chinallmapi.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
    temperature=0.7
)

print(response.choices[0].message.content)

This returns a response in standard OpenAI format.
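Streaming works through the same call, assuming the gateway passes through OpenAI's chunked streaming format (typical for an OpenAI-compatible gateway, but worth verifying):

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain vector databases in one paragraph."}],
    stream=True,  # deltas arrive incrementally as OpenAI-style chunks
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)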

The key insight: you didn't change any code to switch from OpenAI to DeepSeek. You just changed the model name.
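That also makes side-by-side comparison trivial: the same call in a loop over model names. The names below come from the public pricing list; availability is worth confirming via the models endpoint first.

prompt = [{"role": "user", "content": "Summarize the CAP theorem in two sentences."}]

for model in ["deepseek-v4-flash", "glm-4.7", "gpt-5.4"]:
    response = client.chat.completions.create(model=model, messages=prompt)
    print(f"--- {model} ---")
    print(response.choices[0].message.content)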


Expanding beyond chat (responses / embeddings / rerank)

Chat is the obvious use case. But a unified gateway becomes more valuable when you need multiple capabilities in the same app.

Responses API

The Responses API (/v1/responses) is useful when you want structured outputs with built-in reasoning traces:

response = client.responses.create(
    model="gpt-5.4",
    input="Analyze this customer feedback and extract sentiment, topic, and action items.",
    instructions="Return a JSON object with sentiment, topic, and action_items fields."
)

# output_text is the SDK's convenience accessor for the response's text;
# response.output holds the full list of output items (reasoning, tool calls).
print(response.output_text)

Embeddings

For RAG or semantic search:

embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="What are the best practices for API design?"
)

vector = embedding.data[0].embedding
print(f"Embedding dimension: {len(vector)}")
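To put those vectors to work, here is a toy semantic-search check using cosine similarity, in plain Python, reusing the client and the vector from above:

import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

docs = ["REST API design guidelines", "A recipe for chocolate cake"]
doc_vectors = [
    client.embeddings.create(model="text-embedding-3-small", input=d).data[0].embedding
    for d in docs
]

# The API-design document should score noticeably higher against the query vector.
for doc, vec in zip(docs, doc_vectors):
    print(f"{cosine(vector, vec):.3f}  {doc}")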

Rerank

When you have multiple candidate documents and need to rank them by relevance to a query:

import requests

api_key = "your-chinallm-key"
documents = [
    "Our refund policy allows returns within 30 days...",
    "Shipping takes 5-7 business days...",
    "We accept PayPal and credit cards..."
]

response = requests.post(
    "https://chinallmapi.com/v1/rerank",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "rerank-model",  # substitute a rerank model from the catalog
        "query": "What is the refund policy?",
        "documents": documents
    }
)

print(response.json()["results"])
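The shape of results varies between implementations. Assuming a Cohere-style array of {index, relevance_score} objects (worth confirming against the gateway's docs), reordering the candidates looks like this:

results = response.json()["results"]

# Highest relevance first; each result points back into the original documents list.
ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
for r in ranked:
    print(f'{r["relevance_score"]:.3f}  {documents[r["index"]]}')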

Each capability uses the same auth pattern, same base URL, familiar request formats. No separate SDKs for embeddings vs. chat vs. rerank.
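The same pattern extends to the image endpoints. A hedged sketch using gpt-image-2 from the public model list, assuming the standard OpenAI images response format (a base64 payload in data[0].b64_json):

import base64

image = client.images.generate(
    model="gpt-image-2",
    prompt="A minimalist diagram of an API gateway routing to three model backends",
)

# Decode the base64 payload and save it locally.
with open("gateway.png", "wb") as f:
    f.write(base64.b64decode(image.data[0].b64_json))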


Public pricing and model discovery

ChinaLLM's pricing page shows transparent model costs with group-specific multipliers:

Group multipliers:

  • CodingPlan (Alibaba coding plans): 1.1x
  • DeepSeek: 1.05x
  • GLM: 1.05x
  • OpenAI: 1.3x

This means:

  • DeepSeek models cost roughly 5% more than base DeepSeek pricing
  • OpenAI models cost roughly 30% more than base OpenAI pricing
  • The gateway adds a margin, but you get unified access and simpler integration

For example, a hypothetical model priced at $1.00 per million tokens at the source would cost $1.05 per million through the 1.05x DeepSeek group, or $1.30 through the 1.3x OpenAI group.

Visible models (partial list from public docs):

  • gpt-5.4, gpt-5.5 (OpenAI)
  • gpt-image-2 (OpenAI image model)
  • deepseek-v4-flash, deepseek-v4-pro (DeepSeek)
  • glm-4.7 (GLM/Zhipu)

To discover all available models:

models = client.models.list()
for model in models.data:
    print(model.id)

This returns the current model catalog—useful when new models are added without announcement.


When this approach is useful

A unified gateway isn't for everyone. But it's particularly valuable when:

  1. You're comparing models. You want to test GPT vs. DeepSeek vs. GLM on the same task without rewriting integration code.

  2. You're optimizing costs. You want to route simple queries to cheaper models and complex ones to premium models, all through one API (see the routing sketch at the end of this section).

  3. You're building multi-modal apps. You need chat + embeddings + images + audio in one stack, and don't want separate auth flows for each.

  4. You're building tooling. You want your framework to support multiple providers out of the box, without hardcoding provider-specific logic.

  5. You're hedging provider risk. You want the option to switch providers quickly if pricing changes, service quality drops, or better options emerge.

The gateway approach abstracts away provider differences. You still need to know which model to use for which task—but you don't need separate code for each provider.
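As a concrete sketch of that division of labor, here is minimal cost-aware routing through the single gateway client. The length heuristic and the model pairing are illustrative assumptions, not a recommendation:

def pick_model(prompt: str) -> str:
    # Toy heuristic: short prompts go to the cheap model, long ones to the premium one.
    # Real routing would classify by task difficulty, not prompt length.
    return "deepseek-v4-flash" if len(prompt) < 500 else "gpt-5.4"

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(prompt),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("Suggest a name for a caching library."))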


Final takeaway

One OpenAI-compatible gateway. Multiple backends. Same SDK. Same auth. Same request formats.

This isn't about replacing your provider. It's about making your integration layer more portable, more testable, and more resilient to provider changes.

ChinaLLM is one concrete implementation of this pattern—publicly documented, with transparent pricing and a clear model catalog. If you're evaluating this approach, it's a useful reference point.

The bigger idea: stop writing provider-specific integration code. Write to a standard interface, and let the gateway handle the routing.
