DeepSeek V4 launched on April 23, 2026, offering four checkpoints, a live API, and open weights (MIT license) on Hugging Face. You can access it instantly, make production API calls, or self-host for on-premise deployment. This guide covers all three options with actionable steps, tradeoffs, and a production-ready prompt workflow.
If you need an overview, start with what is DeepSeek V4. For API integration, read the DeepSeek V4 API guide. For zero-cost usage, see how to use DeepSeek V4 for free. When you’re ready to test, download Apidog to pre-build your API collection.
TL;DR
- Fastest: chat.deepseek.com — Free chat UI, V4-Pro by default, three reasoning modes.
- Production: Use `https://api.deepseek.com/v1/chat/completions` with `deepseek-v4-pro` or `deepseek-v4-flash`.
- Self-hosted: Pull the weights from Hugging Face and run the `/inference` code.
- Choose Non-Think for fast routing/classification, Think High for code/analysis, Think Max for accuracy-critical tasks.
- Recommended sampling: `temperature=1.0`, `top_p=1.0`.
- Use Apidog as your API client. The OpenAI-compatible format allows easy replay across DeepSeek, OpenAI, and Anthropic.
Pick the right path for your workload
Choose the integration that fits your needs:
| Path | Cost | Setup time | Best for |
|---|---|---|---|
| chat.deepseek.com | Free | 30 seconds | Quick tests, ad-hoc work |
| DeepSeek API | Per-token billing | 5 minutes | Production, agents, batch jobs |
| Self-hosted V4-Flash | Hardware cost only | A few hours | On-prem compliance, offline inference |
| Self-hosted V4-Pro | Cluster cost only | A day | Research, custom fine-tunes |
| OpenRouter / aggregator | Per-token billing | 2 minutes | Multi-provider fallback |
Path 1: Use V4 in the web chat
For the quickest evaluation, use the official chat interface:
- Go to chat.deepseek.com.
- Sign in (email, Google, or WeChat).
- V4-Pro is the default. Toggle between Non-Think, Think High, and Think Max at the top.
- Enter your prompt and send.
Web chat supports file uploads, web search, and full 1M-token context. Rate limits are per account; heavy use may slow responses.
Best for: error trace diagnosis, summarizing large PDFs, quick benchmark tests. Not suitable for automation or repeatable workflows.
Path 2: Use the DeepSeek API
The API is OpenAI-compatible and ready for production. Model IDs (`deepseek-v4-pro`, `deepseek-v4-flash`) are stable.
Get an API key
- Sign up at platform.deepseek.com.
- Add a payment method (top-ups from $2).
- Create an API key under API Keys and copy it (you won’t see it again).
Set the key in your environment:
```bash
export DEEPSEEK_API_KEY="sk-..."
```
Minimum viable request (curl)
```bash
curl https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "user", "content": "Refactor this Python function to async. Reply with code only."}
    ],
    "thinking_mode": "thinking"
  }'
```
Switch `deepseek-v4-pro` to `deepseek-v4-flash` for lower cost. Use non-thinking mode for faster outputs.
Python client (OpenAI SDK)
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a concise senior engineer."},
        {"role": "user", "content": "Explain the CSA+HCA hybrid attention stack."},
    ],
    extra_body={"thinking_mode": "thinking_max"},
    temperature=1.0,
    top_p=1.0,
)
print(response.choices[0].message.content)
```
Any OpenAI-compatible library (LangChain, LlamaIndex, DSPy) works once you change the base URL.
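For example, a minimal LangChain sketch (assuming the `langchain-openai` package is installed; the prompt is illustrative):

```python
import os

from langchain_openai import ChatOpenAI

# Point LangChain's OpenAI-compatible chat model at the DeepSeek endpoint.
llm = ChatOpenAI(
    model="deepseek-v4-flash",
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
    temperature=1.0,
)

print(llm.invoke("Summarize the tradeoffs of MoE models in two sentences.").content)
```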
Node client (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a fizzbuzz in Rust." }],
  temperature: 1.0,
  top_p: 1.0,
});

console.log(response.choices[0].message.content);
```
See the DeepSeek V4 API guide for endpoint details, parameters, and error handling.
Path 3: Iterate with Apidog
For repeated API calls, Apidog streamlines your workflow and helps manage credits.
- Download Apidog for your OS: Mac, Windows, Linux.
- Create a new API project. Add a POST request to `https://api.deepseek.com/v1/chat/completions`.
- Add `Authorization: Bearer {{DEEPSEEK_API_KEY}}` as a header. Store the key in environment variables.
- Paste your JSON request body and save. Replay or tweak with a click.
- Use the response viewer to compare outputs (e.g., Non-Think vs Think Max modes).
You can store multiple requests (OpenAI, Claude, DeepSeek) side by side for easy A/B testing and billing visibility. To migrate, simply change the base URL in your existing GPT-5.5 API collection.
Path 4: Self-host V4-Flash
The MIT license allows full self-hosting. This is ideal for compliance, air-gapped environments, or custom economics.
Hardware requirements
- V4-Flash (13B active, 284B total): 2–4 H100/H200/MI300X GPUs (FP8). Quantized INT4 fits on a single 80GB card for small batches.
- V4-Pro (49B active, 1.6T total): Requires 16–32 H100s for production inference.
Download model weights
```bash
# Install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"

# (Optional) Log in to reduce rate limits
huggingface-cli login

# Download the V4-Flash weights
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash \
  --local-dir ./models/deepseek-v4-flash \
  --local-dir-use-symlinks False
```
V4-Flash is ~500GB at FP8; V4-Pro is several TB.
Run inference (vLLM)
```bash
pip install "vllm>=0.9.0"

vllm serve deepseek-ai/DeepSeek-V4-Flash \
  --tensor-parallel-size 4 \
  --max-model-len 1048576 \
  --dtype auto
```
Once running, point OpenAI-compatible clients to http://localhost:8000/v1. You can use the same Apidog collection with the new base URL.
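For example, the Python client from Path 2 needs only a different base URL; a sketch (vLLM ignores the API key unless you launch it with `--api-key`):

```python
from openai import OpenAI

# Same OpenAI-compatible client, now pointed at the local vLLM server.
local = OpenAI(api_key="not-needed", base_url="http://localhost:8000/v1")

response = local.chat.completions.create(
    # vLLM registers the model under the name it was launched with
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Say hello from localhost."}],
)
print(response.choices[0].message.content)
```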
Prompting V4 effectively
DeepSeek V4’s prompt handling differs from GPT-5.5 or Claude. For best results:
- Set `thinking_mode` explicitly for each task.
- Use system prompts for persona only, not task instructions. Place the main task in the user message.
- For code tasks, provide a test harness or failing test case to increase solution accuracy.
For long-context prompts, keep the most relevant info at the beginning and end. V4’s attention is optimized, but recency and primacy effects remain.
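Putting those rules together, a request might look like this (a sketch reusing the `client` from Path 2; the buggy function and failing test are illustrative):

```python
# Persona in the system prompt; task, code, and a failing test in the user message.
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a concise senior Python engineer."},
        {
            "role": "user",
            "content": (
                "Fix this function so the test passes. Reply with code only.\n\n"
                "def median(xs):\n"
                "    return xs[len(xs) // 2]\n\n"
                "# Failing test:\n"
                "assert median([1, 2, 3, 4]) == 2.5"
            ),
        },
    ],
    extra_body={"thinking_mode": "thinking"},  # set the mode explicitly per task
    temperature=1.0,
    top_p=1.0,
)
```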
Cost control
To prevent overspending, apply these safeguards:
- Default to V4-Flash. Use V4-Pro only for proven quality needs.
- Default to Non-Think. Use Think High or Think Max as required.
- Set `max_tokens`. The 1M context is a limit, not a target; most responses fit in 2,000 tokens. (See the sketch below.)
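A cost-conscious default, sketched with the `client` from Path 2 (the classification prompt is illustrative):

```python
# Cheap defaults: Flash model, capped output, no thinking-mode override.
response = client.chat.completions.create(
    model="deepseek-v4-flash",  # upgrade to deepseek-v4-pro only for proven quality needs
    messages=[{"role": "user", "content": "Classify this ticket: 'App crashes on login.'"}],
    max_tokens=2000,  # cap output; the 1M context is a limit, not a target
    temperature=1.0,
    top_p=1.0,
)
```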
In Apidog, use environment variables for DEEPSEEK_API_KEY to separate test and production billing. Apidog also records token counts per response to help spot runaway prompts.
Migrating from DeepSeek V3 or other models
- From `deepseek-chat` / `deepseek-reasoner`: Change the model ID to `deepseek-v4-pro` or `deepseek-v4-flash`. The old IDs are deprecated on July 24, 2026.
- From OpenAI GPT-5.x: Change the base URL to `https://api.deepseek.com/v1` and swap the model ID. The request shape is otherwise unchanged. See the GPT-5.5 API guide for reference.
- From Anthropic Claude: Use `https://api.deepseek.com/anthropic` for the Anthropic message format, or convert to the OpenAI format for the main endpoint.
FAQ
Do I need a paid account? Web chat is free. API access requires a minimum $2 top-up. See how to use DeepSeek V4 for free for no-cost options.
Which variant should I use? Start with V4-Flash in Non-Think mode, measure quality, and only upgrade if necessary.
Can I run V4 on a MacBook? V4-Flash runs on M3/M4 Max with 128GB RAM (INT4, slow). V4-Pro requires much more. For laptops, use the API or web chat.
Does V4 support function calling? Yes. The OpenAI-compatible endpoint accepts the standard `tools` array and returns `tool_calls` in responses. The Anthropic endpoint uses its native schema.
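A minimal function-calling sketch with the `client` from Path 2 (the `get_weather` tool is a made-up example):

```python
# Standard OpenAI-style tool definition; the model returns tool_calls to invoke it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "What's the weather in Osaka?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```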
How do I stream responses? Set `stream: true` in your request body. Responses are OpenAI-compatible SSE streams; any compatible library works out of the box.
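For example, with the Python client from Path 2:

```python
# stream=True yields deltas as they arrive instead of one final response.
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about rate limits."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```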
Are there rate limits? Hosted API has per-tier limits (api-docs.deepseek.com). Self-hosted: limited only by hardware.

