ChinaWHAPI Team

Posted on May 22

Switching from OpenAI API to Chinese LLM APIs: A Practical Guide

Introduction

If you're currently using OpenAI's API in production and considering adding Chinese LLMs like DeepSeek, Qwen, or Kimi for cost savings or better performance on Chinese language tasks, this guide is for you.

The good news: you don't need to rewrite your entire application. With an OpenAI-compatible gateway, migration can be as simple as changing the base URL, the API key, and the model name.

Why Migrate to Chinese LLMs?

Cost Savings

Chinese LLMs offer dramatically lower pricing:

Provider	Model	Input (USD per 1M tokens)	Output (USD per 1M tokens)	vs GPT-4
OpenAI	GPT-4	$10.00	$30.00	-
DeepSeek	V3	$0.27	$1.10	~30x cheaper
Alibaba	Qwen Plus	$0.80	$2.00	~12x cheaper
Moonshot	Kimi	$0.50	$1.50	~20x cheaper

For high-volume applications, this can mean thousands of dollars saved monthly.
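To make that concrete, here is a rough back-of-the-envelope calculation using the per-million-token prices from the table above. The monthly volumes (500M input tokens, 100M output tokens) are made-up numbers for illustration, not measurements, and `monthly_cost` is a hypothetical helper.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "GPT-4": (10.00, 30.00),
    "DeepSeek V3": (0.27, 1.10),
    "Qwen Plus": (0.80, 2.00),
    "Kimi": (0.50, 1.50),
}


def monthly_cost(model: str, input_millions: float, output_millions: float) -> float:
    """Return the monthly bill in USD for token volumes given in millions."""
    input_price, output_price = PRICES[model]
    return input_millions * input_price + output_millions * output_price


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 500, 100):,.2f}")
```

On these assumed volumes the bill is roughly $8,000 on GPT-4 versus about $245 on DeepSeek V3. Plug in your own volumes before drawing conclusions.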

Performance on Chinese Tasks

Chinese LLMs often outperform Western models on:

Chinese language understanding
China-specific knowledge
Asian cultural context
Compliance with local regulations

Model Diversity

Access to 200+ specialized models for different use cases:

Code generation (DeepSeek Coder)
Mathematical reasoning (Qwen Math)
Long context (Kimi 128K)
Fast inference (GLM Flash)

Migration Approaches

Option 1: Direct Provider Integration (Not Recommended)

You could integrate each provider directly:

import requests

# DeepSeek
response = requests.post(
    "https://api.deepseek.com/chat/completions",
    headers={"Authorization": "Bearer DEEPSEEK_KEY"},
    json={"model": "deepseek-chat", "messages": [...]}
)

# Qwen  
response = requests.post(
    "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation",
    headers={"Authorization": "Bearer QWEN_KEY"},
    json={"model": "qwen-plus", "input": {...}}
)

Problems:

Different API formats for each provider
Multiple API keys to manage
Different authentication methods
Hard to switch models dynamically

Option 2: OpenAI-Compatible Gateway (Recommended)

Use a gateway that provides OpenAI-compatible access to all Chinese LLMs:

from openai import OpenAI

# Before (OpenAI)
client = OpenAI(api_key="OPENAI_KEY")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# After (Chinese LLM via gateway)
client = OpenAI(
    base_url="https://api.chinawhapi.com/v1",
    api_key="CHINAWHAPI_KEY"
)
response = client.chat.completions.create(
    model="deepseek-chat",  # or "qwen-plus", "moonshot-v1-128k", etc.
    messages=[{"role": "user", "content": "Hello"}]
)

Benefits:

Same SDK you already use
One API key for all models
Easy model switching
Consistent error handling
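Because the model is just a string in the request, "easy model switching" can be as small as a lookup table. The sketch below is illustrative only: the model IDs are the ones used in this guide, but the task labels, the pairing of tasks to models, and the `pick_model` helper are assumptions.

```python
# Task-to-model routing table. The task labels and the pairing itself are
# illustrative assumptions; adjust them after testing on your own workload.
MODEL_FOR_TASK = {
    "general": "deepseek-chat",
    "chinese_writing": "qwen-plus",
    "long_document": "moonshot-v1-128k",
}
DEFAULT_MODEL = "deepseek-chat"


def pick_model(task: str) -> str:
    """Return the model ID for a task label, falling back to the default."""
    return MODEL_FOR_TASK.get(task, DEFAULT_MODEL)


print(pick_model("long_document"))
print(pick_model("unknown_task"))
```

The returned string can be passed straight to `client.chat.completions.create(model=...)`.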

Step-by-Step Migration Guide

Step 1: Sign Up for Gateway Access

Create an account at ChinaWHAPI
Get your API key from the dashboard
Add 200K free credits (no credit card required)

Step 2: Update Your Code

Python Example:

# config.py
import os

BASE_URL = "https://api.chinawhapi.com/v1"
# Read the key from the environment instead of hard-coding it
API_KEY = os.environ["CHINAWHAPI_KEY"]

# llm_client.py
from openai import OpenAI

from config import API_KEY, BASE_URL

def get_client():
    # Point the standard OpenAI SDK at the gateway
    return OpenAI(
        base_url=BASE_URL,
        api_key=API_KEY
    )

def chat(model: str, messages: list) -> str:
    """Send one chat request and return the assistant's text."""
    client = get_client()
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    return response.choices[0].message.content

# Usage
response = chat("deepseek-chat", [{"role": "user", "content": "Hello!"}])
print(response)

Node.js Example:

// config.js
module.exports = {
  baseUrl: "https://api.chinawhapi.com/v1",
  apiKey: process.env.CHINAWHAPI_KEY
};

// llmClient.js
const OpenAI = require('openai');
const config = require('./config');

function getClient() {
  // Point the standard OpenAI SDK at the gateway
  return new OpenAI({
    baseURL: config.baseUrl,
    apiKey: config.apiKey
  });
}

async function chat(model, messages) {
  const client = getClient();
  const response = await client.chat.completions.create({
    model: model,
    messages: messages
  });
  return response.choices[0].message.content;
}

// Usage (CommonJS has no top-level await, so wrap the call)
async function main() {
  const response = await chat("qwen-plus", [{ role: "user", content: "Hello!" }]);
  console.log(response);
}

main().catch(console.error);

Step 3: Test with Different Models

Create a simple test script to compare models:

models_to_test = [
    "deepseek-chat",      # DeepSeek V3
    "qwen-plus",          # Alibaba Qwen
    "moonshot-v1-128k",   # Kimi
    "glm-4",              # Zhipu GLM
]

for model in models_to_test:
    try:
        response = chat(model, [{"role": "user", "content": "Hello!"}])
        print(f"{model}: {response[:50]}...")
    except Exception as e:
        print(f"{model}: Error - {e}")

Step 4: Implement Model Fallback

For production reliability, implement automatic fallback:

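def chat_with_fallback(messages, preferred_models=("deepseek-chat", "qwen-plus")):
    """Try each model in order and return the first successful reply."""
    last_error = None
    for model in preferred_models:
        try:
            return chat(model, messages)
        except Exception as e:  # broad on purpose: any failure moves on to the next model
            print(f"Model {model} failed: {e}")
            last_error = e
    raise RuntimeError("All models failed") from last_error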

Step 5: Monitor Costs and Performance

Track usage across models:

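import time

# estimate_tokens, calculate_cost, and log_usage are placeholders for
# helpers you supply; they are not part of the OpenAI SDK.
def chat_with_tracking(model, messages):
    start_time = time.perf_counter()  # monotonic clock, suited to measuring durations

    response = chat(model, messages)

    duration = time.perf_counter() - start_time
    # Input tokens plus output tokens
    tokens_used = estimate_tokens(messages) + estimate_tokens([response])
    cost = calculate_cost(model, tokens_used)

    log_usage(model, tokens_used, cost, duration)

    return response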
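The tracking function above relies on three helpers that are left undefined. One possible shape for them is sketched below. Everything here is an assumption: the 4-characters-per-token heuristic is crude (and likely off for Chinese text), the price map reuses the output prices from the pricing table as a pessimistic upper bound for all tokens, and the logger name is arbitrary. If the gateway returns a `usage` object on responses the way OpenAI does, reading exact counts from it would be more accurate than estimating.

```python
import logging

logger = logging.getLogger("llm_usage")

# USD per 1M tokens. These are the output prices from the pricing table,
# applied to all tokens as a deliberately pessimistic upper bound (assumption).
PRICE_PER_MILLION = {"deepseek-chat": 1.10, "qwen-plus": 2.00}


def estimate_tokens(messages) -> int:
    """Crude estimate: about 4 characters per token (assumption)."""
    total_chars = 0
    for item in messages:
        # Accept both {"role": ..., "content": ...} dicts and plain strings.
        if isinstance(item, dict):
            text = item.get("content") or ""
        else:
            text = str(item)
        total_chars += len(text)
    return max(1, total_chars // 4)


def calculate_cost(model: str, tokens_used: int) -> float:
    """Upper-bound cost in USD; unknown models are priced at 0.0."""
    return tokens_used / 1_000_000 * PRICE_PER_MILLION.get(model, 0.0)


def log_usage(model: str, tokens_used: int, cost: float, duration: float) -> None:
    """Write one usage record to the standard logging system."""
    logger.info(
        "model=%s tokens=%d cost=$%.6f duration=%.2fs",
        model, tokens_used, cost, duration,
    )
```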

Common Migration Issues

Issue 1: Rate Limits

Problem: Requests fail with HTTP 429 (Too Many Requests) errors

Solution:

Check your rate limits in the dashboard
Implement exponential backoff
Use multiple models as fallback

import time
from openai import RateLimitError

def chat_with_retry(model, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return chat(model, messages)
        except RateLimitError:
            # Skip the wait after the final attempt; there is nothing left to retry
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between attempts
    raise RuntimeError("Max retries exceeded")
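When many workers hit a rate limit at the same moment, fixed waits make them all retry in lockstep. A common refinement is to randomize each wait ("jitter") and cap it. The sketch below is one way to do that and is not tied to the OpenAI SDK: `backoff_delays`, `call_with_backoff`, and the `is_retryable` callback are hypothetical names, and the base and cap values are arbitrary.

```python
import random
import time


def backoff_delays(max_retries: int = 3, base: float = 1.0, cap: float = 30.0):
    """Yield one randomized wait time (seconds) per retry, capped at `cap`."""
    for attempt in range(max_retries):
        # Full jitter: pick uniformly between 0 and the exponential ceiling.
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_backoff(func, is_retryable, max_retries: int = 3, sleep=time.sleep):
    """Call func(); on a retryable error, wait and retry up to max_retries times."""
    delays = backoff_delays(max_retries)
    while True:
        try:
            return func()
        except Exception as exc:
            delay = next(delays, None)
            # Give up when retries are exhausted or the error is not retryable.
            if delay is None or not is_retryable(exc):
                raise
            sleep(delay)
```

With the helpers from this guide, a call might look like `call_with_backoff(lambda: chat("deepseek-chat", messages), lambda e: isinstance(e, RateLimitError))`.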

Issue 2: Model Availability

Problem: Some models are temporarily unavailable

Solution:

Maintain a list of backup models
Implement health checks
Use the gateway's automatic routing
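A health check can be as simple as sending a tiny probe prompt to each backup model and keeping the ones that answer. This is a minimal sketch: `healthy_models` is a hypothetical helper, and `chat_fn` is assumed to have the same signature as the `chat(model, messages)` function from Step 2. Each probe is a real request and costs tokens, so it is better run on a schedule than before every call.

```python
def healthy_models(chat_fn, models, probe="ping"):
    """Return the subset of `models` that answer a tiny probe request."""
    healthy = []
    for model in models:
        try:
            reply = chat_fn(model, [{"role": "user", "content": probe}])
        except Exception as exc:
            print(f"{model}: unhealthy ({exc})")
            continue
        if reply:  # treat an empty reply as a failure too
            healthy.append(model)
    return healthy
```

The result can be fed directly into `chat_with_fallback` as its `preferred_models` argument.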

Issue 3: Response Format Differences

Problem: Slight variations in response format

Solution:

Standardize parsing logic
Don't rely on model-specific behaviors
Test thoroughly before production
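One way to standardize parsing is to funnel every response through a single function that tolerates missing or null fields. The sketch below works on a plain dict in the chat-completion shape; `extract_text` is a hypothetical helper, and the choice to return an empty string rather than raise is an assumption you may want to change. With the v1 Python SDK, `response.model_dump()` should give such a dict.

```python
def extract_text(response: dict) -> str:
    """Pull the assistant text out of a chat-completion-shaped dict.

    Returns "" instead of raising when a field is missing or null.
    """
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    content = message.get("content")
    # Some models pad replies with whitespace; normalize it here, once.
    return content.strip() if isinstance(content, str) else ""
```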

Production Checklist

Before going live:

[ ] Test all critical workflows with target models
[ ] Set up monitoring and alerting
[ ] Configure rate limiting and quotas
[ ] Implement fallback mechanisms
[ ] Document model selection criteria
[ ] Train team on new API endpoints
[ ] Update CI/CD pipelines if needed
[ ] Plan rollback strategy
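For the rollback item, one low-effort approach is to make the provider a configuration value so that reverting is a config change rather than a code change. In this sketch, `LLM_PROVIDER` and `OPENAI_API_KEY` are assumed environment variable names, `load_settings` is a hypothetical helper, and the default models per provider are placeholders.

```python
import os

# Provider settings. The OpenAI URL is the SDK's default endpoint; the gateway
# URL and model IDs are the ones used in this guide. The environment variable
# names are assumptions.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "key_env": "OPENAI_API_KEY",
        "model": "gpt-4",
    },
    "gateway": {
        "base_url": "https://api.chinawhapi.com/v1",
        "key_env": "CHINAWHAPI_KEY",
        "model": "deepseek-chat",
    },
}


def load_settings(env=os.environ) -> dict:
    """Resolve provider settings from the environment, defaulting to OpenAI."""
    name = env.get("LLM_PROVIDER", "openai")
    if name not in PROVIDERS:
        raise ValueError(f"Unknown LLM_PROVIDER: {name!r}")
    provider = PROVIDERS[name]
    return {
        "base_url": provider["base_url"],
        "api_key": env.get(provider["key_env"], ""),
        "model": provider["model"],
    }
```

The returned `base_url` and `api_key` go into the `OpenAI(...)` constructor. Rolling back then means setting `LLM_PROVIDER=openai` and restarting.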

Cost Optimization Tips

Use cheaper models for simple tasks: Not every query needs the most powerful model
Implement caching: Cache frequent queries to reduce API calls
Batch requests: Process multiple queries together when possible
Monitor token usage: Track input/output ratios and optimize prompts
Choose the right context length: Don't use 128K context for short conversations

Conclusion

Migrating from OpenAI to Chinese LLMs doesn't have to be painful. With an OpenAI-compatible gateway:

Migration time: Hours instead of weeks
Code changes: 2-3 lines typically
Cost savings: 10-30x reduction possible
Model access: 200+ models instantly available

Start with non-critical workloads, test thoroughly, then gradually expand usage. The combination of cost savings and model diversity makes this migration worthwhile for most AI applications.
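"Gradually expand usage" can be implemented as a percentage rollout: hash a stable identifier into a bucket from 0 to 99 and send only the buckets below a threshold to the new provider. The sketch below is one way to do it; `use_new_provider` is a hypothetical helper, and using a user ID as the key is an assumption.

```python
import hashlib


def use_new_provider(user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in or out of the rollout group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return bucket < rollout_percent
```

Because the bucket depends only on the ID, a given user stays on the same side as long as the percentage does not change, and raising the percentage only ever adds users to the new group.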

Questions about migration? Leave them in the comments!
