DEV Community

ChinaWHAPI Team

Switching from OpenAI API to Chinese LLM APIs: A Practical Guide

Introduction

If you're currently using OpenAI's API in production and are considering adding Chinese LLMs like DeepSeek, Qwen, or Kimi for cost savings or better performance on Chinese-language tasks, this guide is for you.

The good news: you don't need to rewrite your entire application. With an OpenAI-compatible gateway, migration can be as simple as changing two lines of code.

Why Migrate to Chinese LLMs?

Cost Savings

Chinese LLMs offer dramatically lower pricing:

Provider   Model       Input (USD per 1M tokens)   Output (USD per 1M tokens)   vs GPT-4
OpenAI     GPT-4       $10.00                      $30.00                       -
DeepSeek   V3          $0.27                       $1.10                        ~30x cheaper
Alibaba    Qwen Plus   $0.80                       $2.00                        ~12x cheaper
Moonshot   Kimi        $0.50                       $1.50                        ~20x cheaper

For high-volume applications, this can mean thousands of dollars saved monthly.
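
To make the table concrete, here is a rough monthly estimate using the prices listed above. The workload (500M input tokens and 100M output tokens per month) is a made-up assumption for illustration, and provider prices change, so check current pricing before relying on these numbers.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "GPT-4": (10.00, 30.00),
    "DeepSeek V3": (0.27, 1.10),
    "Qwen Plus": (0.80, 2.00),
    "Kimi": (0.50, 1.50),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated monthly cost in USD for a given token volume."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price

# Hypothetical workload: 500M input tokens and 100M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 500_000_000, 100_000_000):,.2f}")
```

Under these assumptions GPT-4 comes to about $8,000 per month, against about $245 for DeepSeek V3, $600 for Qwen Plus, and $400 for Kimi.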

Performance on Chinese Tasks

Chinese LLMs often outperform Western models on:

  • Chinese language understanding
  • China-specific knowledge
  • Asian cultural context
  • Compliance with local regulations

Model Diversity

A gateway gives you access to 200+ specialized models for different use cases:

  • Code generation (DeepSeek Coder)
  • Mathematical reasoning (Qwen Math)
  • Long context (Kimi 128K)
  • Fast inference (GLM Flash)
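
One way to take advantage of this variety is a small routing table that maps a task type to a model. The sketch below is only illustrative: the task names are hypothetical, and the model identifiers are assumptions that may not match what your gateway exposes, so check its model list for the exact names.

```python
# Hypothetical task-to-model routing table. The identifiers are assumptions;
# check your gateway's model list for the exact names it accepts.
MODEL_BY_TASK = {
    "code": "deepseek-coder",
    "math": "qwen-math-plus",
    "long_context": "moonshot-v1-128k",
    "fast": "glm-4-flash",
}
DEFAULT_MODEL = "deepseek-chat"

def pick_model(task: str) -> str:
    """Return the model configured for a task, or a general default if unknown."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)

print(pick_model("code"))
print(pick_model("summarize"))  # unknown task, falls back to DEFAULT_MODEL
```

Keeping the mapping in one place means a model swap is a one-line change rather than a search through the codebase.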

Migration Approaches

Option 1: Direct Provider Integration (Not Recommended)

You could integrate each provider directly:

import requests

# DeepSeek: OpenAI-style chat completions payload
response = requests.post(
    "https://api.deepseek.com/chat/completions",
    headers={"Authorization": "Bearer DEEPSEEK_KEY"},
    json={"model": "deepseek-chat", "messages": [...]}
)

# Qwen (DashScope native API): different URL and request schema
response = requests.post(
    "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation",
    headers={"Authorization": "Bearer QWEN_KEY"},
    json={"model": "qwen-plus", "input": {...}}
)

Problems:

  • Different API formats for each provider
  • Multiple API keys to manage
  • Different authentication methods
  • Hard to switch models dynamically

Option 2: OpenAI-Compatible Gateway (Recommended)

Use a gateway that provides OpenAI-compatible access to all Chinese LLMs:

from openai import OpenAI

# Before (OpenAI)
client = OpenAI(api_key="OPENAI_KEY")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# After (Chinese LLM via gateway)
client = OpenAI(
    base_url="https://api.chinawhapi.com/v1",
    api_key="CHINAWHAPI_KEY"
)
response = client.chat.completions.create(
    model="deepseek-chat",  # or "qwen-plus", "moonshot-v1-128k", etc.
    messages=[{"role": "user", "content": "Hello"}]
)

Benefits:

  • Same SDK you already use
  • One API key for all models
  • Easy model switching
  • Consistent error handling

Step-by-Step Migration Guide

Step 1: Sign Up for Gateway Access

  1. Create an account at ChinaWHAPI
  2. Get your API key from the dashboard
  3. Claim the 200K free credits (no credit card required)

Step 2: Update Your Code

Python Example:

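# config.py
BASE_URL = "https://api.chinawhapi.com/v1"
API_KEY = "your_chinawhapi_key"  # placeholder; load from an environment variable in production

# llm_client.py
from openai import OpenAI

from config import API_KEY, BASE_URL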

def get_client():
    return OpenAI(
        base_url=BASE_URL,
        api_key=API_KEY
    )

def chat(model: str, messages: list):
    client = get_client()
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    return response.choices[0].message.content

# Usage
response = chat("deepseek-chat", [{"role": "user", "content": "Hello!"}])
print(response)

Node.js Example:

// config.js
module.exports = {
  baseUrl: "https://api.chinawhapi.com/v1",
  apiKey: process.env.CHINAWHAPI_KEY
};

// llmClient.js
const OpenAI = require('openai');
const config = require('./config');

function getClient() {
  return new OpenAI({
    baseURL: config.baseUrl,
    apiKey: config.apiKey
  });
}

async function chat(model, messages) {
  const client = getClient();
  const response = await client.chat.completions.create({
    model: model,
    messages: messages
  });
  return response.choices[0].message.content;
}

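// Usage (CommonJS has no top-level await, so wrap the call in an async function)
async function main() {
  const response = await chat("qwen-plus", [{role: "user", content: "Hello!"}]);
  console.log(response);
}

main().catch(console.error);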

Step 3: Test with Different Models

Create a simple test script to compare models:

models_to_test = [
    "deepseek-chat",      # DeepSeek V3
    "qwen-plus",          # Alibaba Qwen
    "moonshot-v1-128k",   # Kimi
    "glm-4",              # Zhipu GLM
]

for model in models_to_test:
    try:
        response = chat(model, [{"role": "user", "content": "Hello!"}])
        print(f"{model}: {response[:50]}...")
    except Exception as e:
        print(f"{model}: Error - {e}")

Step 4: Implement Model Fallback

For production reliability, implement automatic fallback:

def chat_with_fallback(messages, preferred_models=("deepseek-chat", "qwen-plus")):
    last_error = None
    for model in preferred_models:
        try:
            return chat(model, messages)
        except Exception as e:
            # Log the failure and move on to the next model in the list
            print(f"Model {model} failed: {e}")
            last_error = e
    raise RuntimeError("All models failed") from last_error

Step 5: Monitor Costs and Performance

Track usage across models:

import time

# estimate_tokens, calculate_cost, and log_usage are application-specific
# helpers that you supply; they are not part of the OpenAI SDK.
def chat_with_tracking(model, messages):
    start_time = time.perf_counter()

    response = chat(model, messages)

    duration = time.perf_counter() - start_time
    tokens_used = estimate_tokens(messages) + estimate_tokens([response])
    cost = calculate_cost(model, tokens_used)

    log_usage(model, tokens_used, cost, duration)

    return response
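
The function above works from token estimates. OpenAI's chat completion responses include a usage object with exact token counts, and an OpenAI-compatible gateway will typically return the same field, though that is an assumption worth verifying against a real response. The sketch below reads those counts directly. The tracked_chat name is hypothetical, the prices come from the table earlier in this guide, and mapping them to these model IDs is also an assumption.

```python
import time

# USD per 1M tokens (input, output). Values come from the pricing table earlier
# in this guide; mapping them to these model IDs is an assumption.
PRICE_PER_MILLION = {
    "deepseek-chat": (0.27, 1.10),
    "qwen-plus": (0.80, 2.00),
}

def tracked_chat(client, model, messages):
    """Call the chat endpoint and return (text, stats) using reported token counts."""
    start = time.perf_counter()
    response = client.chat.completions.create(model=model, messages=messages)
    duration = time.perf_counter() - start

    usage = response.usage  # assumed present, as in OpenAI's response format
    input_price, output_price = PRICE_PER_MILLION[model]
    cost = (usage.prompt_tokens * input_price
            + usage.completion_tokens * output_price) / 1_000_000

    stats = {
        "model": model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "cost_usd": cost,
        "seconds": duration,
    }
    return response.choices[0].message.content, stats
```

Passing the client in as an argument keeps the function easy to test with a stub and lets you reuse a single client instead of creating one per call.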

Common Migration Issues

Issue 1: Rate Limits

Problem: Getting 429 errors

Solution:

  • Check rate limits in dashboard
  • Implement exponential backoff
  • Use multiple models as fallback

import time
from openai import RateLimitError

def chat_with_retry(model, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return chat(model, messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                break  # no point sleeping after the final attempt
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("Max retries exceeded")

Issue 2: Model Availability

Problem: Some models temporarily unavailable

Solution:

  • Maintain a list of backup models
  • Implement health checks
  • Use the gateway's automatic routing
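
A health check can be as simple as sending a tiny request to each backup model and keeping the ones that answer. This is a minimal sketch: healthy_models is a hypothetical helper, and the check callable is whatever probe you choose, for example a one-word prompt sent through the chat function from Step 2.

```python
def healthy_models(check, candidates):
    """Return the candidates for which check(model) completes without raising."""
    available = []
    for model in candidates:
        try:
            check(model)
        except Exception as exc:  # broad on purpose: any failure marks the model unhealthy
            print(f"{model} unavailable: {exc}")
        else:
            available.append(model)
    return available

# Example probe, assuming the chat() helper from Step 2 is in scope:
# backups = healthy_models(
#     lambda m: chat(m, [{"role": "user", "content": "ping"}]),
#     ["deepseek-chat", "qwen-plus", "glm-4"],
# )
```

Each probe is a billable request, so run the check on a schedule rather than before every call.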

Issue 3: Response Format Differences

Problem: Slight variations in response format

Solution:

  • Standardize parsing logic
  • Don't rely on model-specific behaviors
  • Test thoroughly before production
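
Standardized parsing can mean routing every response through one defensive helper, so that a missing field does not raise deep inside application code. The sketch below assumes OpenAI-style response objects. Here extract_text is a hypothetical name, and returning an empty string for missing content is a design choice, not a gateway requirement.

```python
def extract_text(response) -> str:
    """Return the assistant text from an OpenAI-style response, or "" if absent."""
    choices = getattr(response, "choices", None) or []
    if not choices:
        return ""
    message = getattr(choices[0], "message", None)
    content = getattr(message, "content", None)
    # Some responses carry no text content; treat that as empty.
    return content.strip() if isinstance(content, str) else ""
```

Callers can then decide in one place how to treat an empty result, for example by retrying or falling back to another model.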

Production Checklist

Before going live:

  • [ ] Test all critical workflows with target models
  • [ ] Set up monitoring and alerting
  • [ ] Configure rate limiting and quotas
  • [ ] Implement fallback mechanisms
  • [ ] Document model selection criteria
  • [ ] Train team on new API endpoints
  • [ ] Update CI/CD pipelines if needed
  • [ ] Plan rollback strategy

Cost Optimization Tips

  1. Use cheaper models for simple tasks: Not every query needs the most powerful model
  2. Implement caching: Cache frequent queries to reduce API calls
  3. Batch requests: Process multiple queries together when possible
  4. Monitor token usage: Track input/output ratios and optimize prompts
  5. Choose the right context length: Don't use 128K context for short conversations
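
The caching tip can be sketched as an in-memory cache keyed on the model and the exact messages. This is a minimal illustration under stated assumptions: cached_chat and cache_key are hypothetical names, the cache has no expiry or size limit, and it only makes sense when an identical prompt should return an identical answer.

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(model: str, messages: list) -> str:
    """Build a stable key from the model name and the full message list."""
    payload = json.dumps({"model": model, "messages": messages},
                         sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_chat(chat_fn, model: str, messages: list) -> str:
    """Return a cached answer when available; otherwise call chat_fn and store the result."""
    key = cache_key(model, messages)
    if key not in _cache:
        _cache[key] = chat_fn(model, messages)
    return _cache[key]

# Example, assuming the chat() helper from Step 2 is in scope:
# answer = cached_chat(chat, "deepseek-chat", [{"role": "user", "content": "Hello!"}])
```

For a multi-process deployment you would likely swap the dictionary for a shared store with a time-to-live.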

Conclusion

Migrating from OpenAI to Chinese LLMs doesn't have to be painful. With an OpenAI-compatible gateway:

  • Migration time: Hours instead of weeks
  • Code changes: 2-3 lines typically
  • Cost savings: 10-30x reduction possible
  • Model access: 200+ models instantly available

Start with non-critical workloads, test thoroughly, then gradually expand usage. The combination of cost savings and model diversity makes this migration worthwhile for most AI applications.
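
One way to expand usage gradually is a percentage rollout that sends a fixed share of users to the gateway and leaves the rest on the existing OpenAI path. The sketch below is an assumption about how you might structure this, not a gateway feature. The use_gateway name is hypothetical, and the user ID can be any stable identifier in your application.

```python
import hashlib

def use_gateway(user_id: str, rollout_percent: int) -> bool:
    """Deterministically assign a user to the gateway for rollout_percent of traffic."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent

# The same user always lands in the same bucket, so raising the percentage
# only adds users and never flips existing ones back and forth.
print(use_gateway("user-123", 100))
```

Setting the percentage back to 0 doubles as a simple rollback switch.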


Questions about migration? Leave them in the comments!
