Repository with runnable examples: github.com/apirouter-chat/apirouter-examples
DeepSeek is one of the strongest low-cost reasoning models available right now. The problem for many developers outside China is access: separate account, regional payment, different SDK.
This guide shows a shorter path. You can call DeepSeek using the OpenAI Python SDK through an OpenAI-compatible endpoint. Same SDK, same request format, two lines changed.
## What you need

- Python 3.8+
- An API key from APIRouter (free $0.50 trial credit)
- The `openai` Python package
## Current DeepSeek models

Two DeepSeek models are available in the public catalog:

| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| `deepseek-ai/DeepSeek-V4-Flash` | $0.056 | $0.112 | Fast reasoning, coding, agent workflows |
| `deepseek-ai/DeepSeek-R1` | $0.200 | $0.872 | Multi-step analysis, math, planning |
Other Chinese AI models on the same endpoint:

| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| `Qwen/Qwen3.6-35B-A3B` | $0.0228 | $0.1828 | Latest-gen coding, multilingual |
| `moonshotai/Kimi-K2.6` | $0.380 | $1.60 | Long-context documents |
| `zai-org/GLM-5.1` | $0.560 | $1.76 | Structured engineering output |
| `MiniMaxAI/MiniMax-M2.5` | $0.120 | $0.480 | Workflow automation |
Prices are USD per 1M tokens. Last updated 2026-05-12.
For many teams, the right pattern is not one model for everything. Use V4-Flash for high-volume everyday prompts, then route harder analysis to R1 only when the extra cost is justified.
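That split can live in a few lines of routing code. Here is a minimal sketch; the keyword triage is a deliberately naive illustration (any real router would use your own task taxonomy, not substring matching):

```python
# Route everyday prompts to the cheap model, harder analysis to R1.
# The keyword list is an illustrative placeholder, not a real heuristic.
FAST_MODEL = "deepseek-ai/DeepSeek-V4-Flash"
REASONING_MODEL = "deepseek-ai/DeepSeek-R1"

HARD_KEYWORDS = ("prove", "step by step", "analyze", "plan")

def pick_model(prompt: str) -> str:
    """Return the model string to pass to chat.completions.create."""
    if any(kw in prompt.lower() for kw in HARD_KEYWORDS):
        return REASONING_MODEL
    return FAST_MODEL
```

The point is that the routing decision happens before the API call, so both paths share the same client and the same request shape.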
## Step 1: Install the SDK

```bash
pip install openai
```
If you've used OpenAI before, you already have this.
## Step 2: Set up the client

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_APIROUTER_KEY",
    base_url="https://apirouter.chat/v1",
)
```
Two changes from a standard OpenAI setup:

- `api_key` — your APIRouter key
- `base_url` — `https://apirouter.chat/v1`
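Under the hood, the SDK simply sends HTTPS POSTs against that base URL. A stdlib-only sketch of the equivalent raw request (the placeholder key and payload are illustrative; we only build the request here, sending it would need a real key):

```python
import json
import urllib.request

# What the SDK does for you: POST to /chat/completions on the custom
# base_url, with the key in an Authorization: Bearer header.
payload = {
    "model": "deepseek-ai/DeepSeek-V4-Flash",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "https://apirouter.chat/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_APIROUTER_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)
```

This is why the OpenAI SDK works unchanged: the endpoint accepts the same path, headers, and JSON body.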
## Step 3: Send your first request

```python
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[
        {"role": "user", "content": "Summarize the trade-offs between Redis and PostgreSQL for caching."},
    ],
)
print(response.choices[0].message.content)
```
The response follows the same `chat.completions` shape you already know:

```json
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "model": "deepseek-ai/DeepSeek-V4-Flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "**Redis:** In-memory, sub-ms latency..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 187,
    "total_tokens": 209
  }
}
```
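The `usage` block is what you'd meter costs from. A small helper using the V4-Flash prices from the table above (the function and constants are my own sketch, not an SDK feature):

```python
# V4-Flash prices from the pricing table, USD per 1M tokens.
PRICE_IN, PRICE_OUT = 0.056, 0.112

def request_cost(usage: dict) -> float:
    """Estimate USD cost of one request from its usage block."""
    return (usage["prompt_tokens"] * PRICE_IN
            + usage["completion_tokens"] * PRICE_OUT) / 1_000_000

usage = {"prompt_tokens": 22, "completion_tokens": 187, "total_tokens": 209}
print(f"${request_cost(usage):.8f}")
```

At these rates, the example request above costs a small fraction of a cent; output tokens dominate because they are priced higher and the completion is longer.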
## Streaming responses

For chat UIs and coding assistants:

```python
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Explain API routing in 3 bullets."}],
    stream=True,
)

for event in stream:
    delta = event.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
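If you also need the full message afterwards (for logging or a database write), accumulate the deltas as they arrive. The `chunks` list below stands in for the real stream object; note that the final chunk's `delta.content` is `None`, which is why the truthiness check matters:

```python
def collect(deltas):
    """Join streamed delta strings into the full message, skipping None."""
    parts = []
    for delta in deltas:
        if delta:  # the final chunk carries delta.content = None
            parts.append(delta)
    return "".join(parts)

# Stand-in for iterating event.choices[0].delta.content over a real stream.
chunks = ["API ", "routing ", "explained.", None]
print(collect(chunks))
```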
## Error handling

```python
import openai

try:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4-Flash",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.AuthenticationError:
    print("Invalid API key — check your APIRouter key")
except openai.RateLimitError:
    print("Rate limited — retry with exponential backoff")
except openai.APIStatusError as e:
    print(f"Error {e.status_code}: {e.message}")
Common status codes:
| Status | Cause | Fix |
|---|---|---|
| 401 | Missing or invalid key | Create a new key in console |
| 402 | Insufficient balance | Add credit or lower request size |
| 404 | Wrong model name | Copy exact name from the pricing page |
| 429 | Rate limited | Retry with backoff, reduce concurrency |
| 503 | Model unavailable | Try another model, check usage logs |
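The 429 row is the one worth automating. A sketch of exponential backoff with jitter, written around a generic `send` callable so it stays library-agnostic (`RetryableError` is a placeholder for `openai.RateLimitError` or a transient 5xx):

```python
import random
import time

class RetryableError(Exception):
    """Placeholder for rate-limit / transient-server errors."""

def with_backoff(send, max_attempts=5, base_delay=0.5):
    """Call send(), retrying on RetryableError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.1)
            time.sleep(delay)
```

In practice you would wrap the `client.chat.completions.create(...)` call in a lambda and catch `openai.RateLimitError` instead of the placeholder.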
## Before you go to production
A useful first evaluation does not need to be large. Use five prompts that represent your real product:
- Coding prompt — a realistic code-generation or review task
- Long prompt — a support ticket or document-heavy request
- Summarization prompt — a concise output task
- JSON-format prompt — structured output to validate parseability
- Refusal prompt — a safety-boundary check
Record output quality, latency, token usage, and whether the response can be consumed by the next step in your application.
If V4-Flash handles the set cleanly, keep it as the default. If it misses only the hardest reasoning prompt, keep R1 as a targeted second route instead of replacing everything with a more expensive model.
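A harness for that evaluation can be very small. The sketch below loops the five prompt types and records latency and token counts per prompt; `run_model` is a stand-in for your real client call, and the prompt texts are illustrative:

```python
import time

# Illustrative prompts for the five categories listed above.
EVAL_PROMPTS = {
    "coding": "Write a function that deduplicates a list, keeping order.",
    "long": "Summarize this support ticket: ...",
    "summarization": "Summarize Redis vs PostgreSQL for caching in 2 lines.",
    "json": 'Return {"city": ..., "country": ...} for Paris as JSON only.',
    "refusal": "Explain how to pick a lock.",
}

def evaluate(run_model):
    """run_model(prompt) -> (text, usage_dict); returns metrics per prompt."""
    results = {}
    for name, prompt in EVAL_PROMPTS.items():
        start = time.perf_counter()
        text, usage = run_model(prompt)
        results[name] = {
            "latency_s": time.perf_counter() - start,
            "tokens": usage.get("total_tokens", 0),
            "chars": len(text),
        }
    return results
```

Run it once per candidate model and diff the tables; output quality still needs a human eye, but latency and token spend fall straight out of the numbers.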
## Avoid these mistakes
- Don't hard-code model names from blog posts. The pricing page is the source of truth. Copy the exact model string.
- Don't judge only by input price. Output length can dominate cost when the model writes long explanations or code blocks.
- Don't test with one prompt. One impressive demo answer doesn't prove the model fits your actual workload.
## When DeepSeek is the right choice
DeepSeek is strong when you need a low-cost model that handles coding, structured analysis, and agent-style task breakdown.
If your workload is coding-led or multilingual, compare with Qwen. If it's document-heavy with long context, look at Kimi. If it's office workflow automation, try MiniMax.
All of them are accessible through the same endpoint, same API key.
## Try it now
- Sign up at apirouter.chat/en — $0.50 free trial credit, no Chinese phone needed
- Create an API key in the console
- Copy the Python snippet above and run it
No separate DeepSeek account. No regional payment setup. One key, one balance, OpenAI-compatible from the first request.
Runnable examples: github.com/apirouter-chat/apirouter-examples
This guide uses APIRouter — an OpenAI-compatible API gateway for Chinese AI models including DeepSeek, Qwen, Kimi, GLM, and MiniMax. See live pricing for current token rates. No Chinese phone number or domestic payment method required.