Repository with runnable examples: github.com/apirouter-chat/apirouter-examples
DeepSeek is one of the strongest low-cost reasoning models available right now. The problem for many developers outside China is access: separate account, regional payment, different SDK.
This guide shows a shorter path. You can call DeepSeek using the OpenAI Python SDK through an OpenAI-compatible endpoint. Same SDK, same request format, two lines changed.
## What you need

- Python 3.8+
- An API key from APIRouter (free $0.50 trial credit)
- The `openai` Python package
## Current DeepSeek models

Two DeepSeek models are available in the public catalog:

| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| `deepseek-ai/DeepSeek-V4-Flash` | $0.056 | $0.112 | Fast reasoning, coding, agent workflows |
| `deepseek-ai/DeepSeek-R1` | $0.200 | $0.872 | Multi-step analysis, math, planning |
Other Chinese AI models on the same endpoint:

| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| `Qwen/Qwen3.6-35B-A3B` | $0.0228 | $0.1828 | Latest-gen coding, multilingual |
| `moonshotai/Kimi-K2.6` | $0.380 | $1.60 | Long-context documents |
| `zai-org/GLM-5.1` | $0.560 | $1.76 | Structured engineering output |
| `MiniMaxAI/MiniMax-M2.5` | $0.120 | $0.480 | Workflow automation |
Prices are USD per 1M tokens. Last updated 2026-05-12.
For many teams, the right pattern is not one model for everything. Use V4-Flash for high-volume everyday prompts, then route harder analysis to R1 only when the extra cost is justified.
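That split can live in a few lines of routing code. Here is a minimal sketch; the keyword triage is a deliberately naive illustration (any real router would use your own task taxonomy, not substring matching):

```python
# Route everyday prompts to the cheap model, harder analysis to R1.
# The keyword list is an illustrative placeholder, not a real heuristic.
FAST_MODEL = "deepseek-ai/DeepSeek-V4-Flash"
REASONING_MODEL = "deepseek-ai/DeepSeek-R1"

HARD_KEYWORDS = ("prove", "step by step", "analyze", "plan")

def pick_model(prompt: str) -> str:
    """Return the model string to pass to chat.completions.create."""
    if any(kw in prompt.lower() for kw in HARD_KEYWORDS):
        return REASONING_MODEL
    return FAST_MODEL
```

The point is that the routing decision happens before the API call, so both paths share the same client and the same request shape.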
## Step 1: Install the SDK

```bash
pip install openai
```
If you've used OpenAI before, you already have this.
## Step 2: Set up the client

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_APIROUTER_KEY",
    base_url="https://apirouter.chat/v1",
)
```
Two changes from a standard OpenAI setup:

- `api_key` — your APIRouter key
- `base_url` — `https://apirouter.chat/v1`
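Under the hood, the SDK simply sends HTTPS POSTs against that base URL. A stdlib-only sketch of the equivalent raw request (the placeholder key and payload are illustrative; we only build the request here, sending it would need a real key):

```python
import json
import urllib.request

# What the SDK does for you: POST to /chat/completions on the custom
# base_url, with the key in an Authorization: Bearer header.
payload = {
    "model": "deepseek-ai/DeepSeek-V4-Flash",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "https://apirouter.chat/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_APIROUTER_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)
```

This is why the OpenAI SDK works unchanged: the endpoint accepts the same path, headers, and JSON body.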
## Step 3: Send your first request

```python
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[
        {"role": "user", "content": "Summarize the trade-offs between Redis and PostgreSQL for caching."},
    ],
)
print(response.choices[0].message.content)
```
The response follows the same `chat.completions` shape you already know:

```json
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "model": "deepseek-ai/DeepSeek-V4-Flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "**Redis:** In-memory, sub-ms latency..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 187,
    "total_tokens": 209
  }
}
```
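The `usage` block is what you'd meter costs from. A small helper using the V4-Flash prices from the table above (the function and constants are my own sketch, not an SDK feature):

```python
# V4-Flash prices from the pricing table, USD per 1M tokens.
PRICE_IN, PRICE_OUT = 0.056, 0.112

def request_cost(usage: dict) -> float:
    """Estimate USD cost of one request from its usage block."""
    return (usage["prompt_tokens"] * PRICE_IN
            + usage["completion_tokens"] * PRICE_OUT) / 1_000_000

usage = {"prompt_tokens": 22, "completion_tokens": 187, "total_tokens": 209}
print(f"${request_cost(usage):.8f}")
```

At these rates, the example request above costs a small fraction of a cent; output tokens dominate because they are priced higher and the completion is longer.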
## Streaming responses

For chat UIs and coding assistants:

```python
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Explain API routing in 3 bullets."}],
    stream=True,
)

for event in stream:
    delta = event.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
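If you also need the full message afterwards (for logging or a database write), accumulate the deltas as they arrive. The `chunks` list below stands in for the real stream object; note that the final chunk's `delta.content` is `None`, which is why the truthiness check matters:

```python
def collect(deltas):
    """Join streamed delta strings into the full message, skipping None."""
    parts = []
    for delta in deltas:
        if delta:  # the final chunk carries delta.content = None
            parts.append(delta)
    return "".join(parts)

# Stand-in for iterating event.choices[0].delta.content over a real stream.
chunks = ["API ", "routing ", "explained.", None]
print(collect(chunks))
```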
## Error handling

```python
import openai

try:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4-Flash",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.AuthenticationError:
    print("Invalid API key — check your APIRouter key")
except openai.RateLimitError:
    print("Rate limited — retry with exponential backoff")
except openai.APIStatusError as e:
    print(f"Error {e.status_code}: {e.message}")
Common status codes:
| Status | Cause | Fix |
|---|---|---|
| 401 | Missing or invalid key | Create a new key in console |
| 402 | Insufficient balance | Add credit or lower request size |
| 404 | Wrong model name | Copy exact name from the pricing page |
| 429 | Rate limited | Retry with backoff, reduce concurrency |
| 503 | Model unavailable | Try another model, check usage logs |
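The 429 row is the one worth automating. A sketch of exponential backoff with jitter, written around a generic `send` callable so it stays library-agnostic (`RetryableError` is a placeholder for `openai.RateLimitError` or a transient 5xx):

```python
import random
import time

class RetryableError(Exception):
    """Placeholder for rate-limit / transient-server errors."""

def with_backoff(send, max_attempts=5, base_delay=0.5):
    """Call send(), retrying on RetryableError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.1)
            time.sleep(delay)
```

In practice you would wrap the `client.chat.completions.create(...)` call in a lambda and catch `openai.RateLimitError` instead of the placeholder.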
## Before you go to production
A useful first evaluation does not need to be large. Use five prompts that represent your real product:
- Coding prompt — a realistic code-generation or review task
- Long prompt — a support ticket or document-heavy request
- Summarization prompt — a concise output task
- JSON-format prompt — structured output to validate parseability
- Refusal prompt — a safety-boundary check
Record output quality, latency, token usage, and whether the response can be consumed by the next step in your application.
If V4-Flash handles the set cleanly, keep it as the default. If it misses only the hardest reasoning prompt, keep R1 as a targeted second route instead of replacing everything with a more expensive model.
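A harness for that evaluation can be very small. The sketch below loops the five prompt types and records latency and token counts per prompt; `run_model` is a stand-in for your real client call, and the prompt texts are illustrative:

```python
import time

# Illustrative prompts for the five categories listed above.
EVAL_PROMPTS = {
    "coding": "Write a function that deduplicates a list, keeping order.",
    "long": "Summarize this support ticket: ...",
    "summarization": "Summarize Redis vs PostgreSQL for caching in 2 lines.",
    "json": 'Return {"city": ..., "country": ...} for Paris as JSON only.',
    "refusal": "Explain how to pick a lock.",
}

def evaluate(run_model):
    """run_model(prompt) -> (text, usage_dict); returns metrics per prompt."""
    results = {}
    for name, prompt in EVAL_PROMPTS.items():
        start = time.perf_counter()
        text, usage = run_model(prompt)
        results[name] = {
            "latency_s": time.perf_counter() - start,
            "tokens": usage.get("total_tokens", 0),
            "chars": len(text),
        }
    return results
```

Run it once per candidate model and diff the tables; output quality still needs a human eye, but latency and token spend fall straight out of the numbers.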
## Avoid these mistakes
- Don't hard-code model names from blog posts. The pricing page is the source of truth. Copy the exact model string.
- Don't judge only by input price. Output length can dominate cost when the model writes long explanations or code blocks.
- Don't test with one prompt. One impressive demo answer doesn't prove the model fits your actual workload.
## When DeepSeek is the right choice
DeepSeek is strong when you need a low-cost model that handles coding, structured analysis, and agent-style task breakdown.
If your workload is coding-led or multilingual, compare with Qwen. If it's document-heavy with long context, look at Kimi. If it's office workflow automation, try MiniMax.
All of them are accessible through the same endpoint, same API key.
## Try it now
- Sign up at apirouter.chat/en — $0.50 free trial credit, no Chinese phone needed
- Create an API key in the console
- Copy the Python snippet above and run it
No separate DeepSeek account. No regional payment setup. One key, one balance, OpenAI-compatible from the first request.
Runnable examples: github.com/apirouter-chat/apirouter-examples
This guide uses APIRouter — an OpenAI-compatible API gateway for Chinese AI models including DeepSeek, Qwen, Kimi, GLM, and MiniMax. See live pricing for current token rates. No Chinese phone number or domestic payment method required.