DeepSeek V4: The Dev-Focused Guide to the 1.6T Open Model
DeepSeek released V4 on April 23, 2026—a major Mixture-of-Experts (MoE) family upgrade. Four checkpoints dropped, led by DeepSeek-V4-Pro (1.6T parameters, MIT license, 1M-token context). V4-Flash is the smaller sibling (284B parameters, same context, open weights). Benchmarks put V4-Pro ahead of Claude Opus 4.6 for code, close behind GPT-5.4 xHigh on MMLU-Pro.
If you’re choosing between Claude, GPT-5.5, Qwen, or DeepSeek V4, this guide covers what’s new, architectural changes from V3.2, implementation details, and how to run it right now.
For hands-on integration, see the DeepSeek V4 API guide, free-access guide, and the full usage walkthrough. The API mirrors OpenAI’s request shape, so you can pre-build collections in Apidog before you have an API key.
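If you want to sketch that request shape before your key arrives, it looks roughly like this. This is a minimal illustration assuming the standard OpenAI-style /chat/completions path and field names; confirm the exact details against the DeepSeek V4 API guide.

```python
# Minimal sketch of the OpenAI-compatible request shape.
# The /chat/completions path and field names are assumed from OpenAI's
# convention; verify against the DeepSeek V4 API guide.
import requests

resp = requests.post(
    "https://api.deepseek.com/chat/completions",
    headers={"Authorization": "Bearer YOUR_DEEPSEEK_API_KEY"},
    json={
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": "Summarize this changelog."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```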
TL;DR
- DeepSeek V4: Mixture-of-Experts, released April 23, 2026, MIT license.
- Four checkpoints: V4-Pro, V4-Pro-Base, V4-Flash, V4-Flash-Base.
- V4-Pro: 1.6T total, 49B active params; V4-Flash: 284B total, 13B active.
- Both: 1M-token context, three reasoning modes (Non-Think, Think High, Think Max).
- Headline scores (Pro): LiveCodeBench 93.5, Codeforces 3206, MMLU-Pro 87.5.
- API live at api.deepseek.com (model IDs: deepseek-v4-pro, deepseek-v4-flash). Weights on Hugging Face and ModelScope.
What DeepSeek V4 Actually Is
V4 succeeds the V3 and V3.2 lines, keeping the MoE architecture but changing model shape. V4-Pro activates 49B of 1.6T parameters per token—so inference is closer to a 50B dense model. Full technical details: DeepSeek V4 model card.
Checkpoints
- DeepSeek-V4-Pro: 1.6T total, 49B active, 1M context. This is the main production API model.
- DeepSeek-V4-Pro-Base: Pre-trained only, for custom fine-tunes.
- DeepSeek-V4-Flash: 284B total, 13B active, 1M context. For latency/local deploy on 2–3 H100s.
- DeepSeek-V4-Flash-Base: Pre-trained, for research/fine-tune.
All checkpoints: MIT license. You can download, mirror, fine-tune, and deploy with no license fee.
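If you want the weights locally, standard Hugging Face tooling is enough. A minimal sketch follows, assuming the repo ID uses DeepSeek's usual deepseek-ai/DeepSeek-V4-Flash naming; check the actual model card before running.

```python
# Sketch: download the V4-Flash weights with huggingface_hub.
# The repo_id is an assumption based on earlier DeepSeek naming; confirm it
# on Hugging Face before downloading (the full checkpoint is hundreds of GB).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical repo ID
    local_dir="./deepseek-v4-flash",
)
print(f"Weights saved to {local_dir}")
```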
What Changed from V3.2
V4 improves on code and reasoning benchmarks by rewriting the attention stack and training pipeline.
| Capability | V3.2 | V4-Pro |
|---|---|---|
| Total parameters | 685B | 1.6T |
| Active parameters | 37B | 49B |
| Context window | 128K | 1M |
| Inference FLOPs (1M context) | baseline | 27% of V3.2 |
| KV cache (1M context) | baseline | 10% of V3.2 |
| Precision | FP8 | FP4 + FP8 mixed |
| License | DeepSeek License | MIT |
| Reasoning modes | single | three |
Key drivers:
- Hybrid attention stack: Combines Compressed Sparse Attention and Heavily Compressed Attention for efficient, long-context inference, shrinking KV cache to 10% and FLOPs to 27% of V3.2 at 1M tokens.
- Manifold-Constrained Hyper-Connections: Stabilizes gradients for deep stacking.
- Muon optimizer: Faster convergence and better handling of large gradient norms.
Training corpus: 32T+ tokens; post-training: two-stage pipeline (domain experts, then on-policy distillation).
Benchmarks That Matter
V4-Pro is top-tier on code and knowledge tasks, with some gaps in long-context retrieval.
V4-Flash delivers:
- MMLU-Pro: 86.2
- GPQA Diamond: 88.1
- LiveCodeBench: 91.6
- Codeforces: 3052
- SWE-bench Verified: 79.0
See DeepSeek V4-Flash card for full tables.
Summary:
- V4-Pro leads on code and factual recall, but Gemini 3.1 Pro leads MMLU-Pro; Claude Opus is stronger for 1M-token retrieval.
- For coding, agentic tasks, and complex analysis, V4-Pro is competitive. For “needle-in-a-haystack” retrieval, Claude is better.
Three Reasoning Modes
Each V4 checkpoint exposes three modes, controlled by the thinking_mode parameter (API) or a script flag:
- Non-Think: Fast, no reasoning tokens. Use for classification, routing, or summaries.
- Think High: Default for complex tasks. The model generates reasoning traces and plans actions.
- Think Max: Longer reasoning, self-critique, recommended for 384K+ context. Highest accuracy, highest token cost.
Sampling settings: temperature=1.0, top_p=1.0 across all modes.
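Here is a minimal sketch of selecting a mode over the OpenAI-compatible endpoint. The thinking_mode name and the recommended sampling values come from this section; routing the field through extra_body and the exact mode strings are assumptions, so check the API guide for the real parameter placement.

```python
# Sketch: pick a reasoning mode per request. thinking_mode is the parameter
# named in this guide; sending it via extra_body and the mode value strings
# are assumptions about the wire format.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Plan a refactor of our auth module."}],
    temperature=1.0,  # recommended sampling settings for all modes
    top_p=1.0,
    extra_body={"thinking_mode": "think_high"},  # "non_think" | "think_high" | "think_max" (assumed strings)
)
print(resp.choices[0].message.content)
```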
Architecture in Plain English
Three architectural choices drive V4's efficiency:
- Hybrid attention: Most layers use Compressed Sparse Attention (full attention on "important" tokens, compression elsewhere); some use Heavily Compressed Attention (close to linear cost). This enables efficient scaling to 1M tokens (a toy sketch of the pattern follows this list).
- Manifold-Constrained Hyper-Connections: Residuals are constrained to stable manifolds, enabling deeper stacking without gradient instability.
- Muon optimizer: Replaces AdamW, better for MoE gradients and faster convergence.
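To make the hybrid-attention idea concrete, here is a toy NumPy illustration of the "full attention on important tokens, compression elsewhere" pattern. This is not DeepSeek's implementation: the scoring rule and mean-pooling are invented for clarity, and the real Compressed Sparse Attention is far more sophisticated.

```python
# Toy illustration (not DeepSeek's code): attend fully to the `keep`
# highest-scoring tokens and compress the remainder into one mean-pooled
# summary token, so attention cost grows with `keep`, not sequence length.
import numpy as np

def toy_compressed_sparse_attention(q, k, v, keep=4):
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (1, T) raw attention scores
    top = np.argsort(scores[0])[-keep:]              # indices of "important" tokens
    rest = np.setdiff1d(np.arange(k.shape[0]), top)  # everything else gets compressed
    k_small = np.vstack([k[top], k[rest].mean(0, keepdims=True)])
    v_small = np.vstack([v[top], v[rest].mean(0, keepdims=True)])
    w = np.exp(q @ k_small.T / np.sqrt(q.shape[-1]))
    w /= w.sum(-1, keepdims=True)
    return w @ v_small                               # (1, d) attention output

np.random.seed(0)
q, k, v = np.random.randn(1, 8), np.random.randn(64, 8), np.random.randn(64, 8)
print(toy_compressed_sparse_attention(q, k, v).shape)  # (1, 8)
```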
Availability Today
All four checkpoints and the API are live as of April 24, 2026.
| Surface | Access |
|---|---|
| chat.deepseek.com | Free web chat, V4-Pro default, login required |
| DeepSeek API | Live at api.deepseek.com; model IDs deepseek-v4-pro, deepseek-v4-flash |
| Hugging Face weights | V4-Pro, V4-Flash, both MIT |
| ModelScope | Mirrored weights for users in China |
| OpenRouter and aggregators | Expected within days; typical DeepSeek launch pattern |
| Legacy IDs deepseek-chat / deepseek-reasoner | Deprecated July 24, 2026 |
Migration note: if you're using deepseek-chat in production, migrate to deepseek-v4-pro or deepseek-v4-flash within three months.
How It Compares to GPT-5.5 and Claude
- Cost: V4-Pro/Flash are open weights (MIT). GPT-5.5 and Claude Opus are closed. Self-hosting V4 is cheaper at scale.
- Coding: V4-Pro (LiveCodeBench 93.5, Codeforces 3206) beats GPT-5.5 and Claude on code.
- Knowledge: Gemini 3.1 Pro leads MMLU-Pro. V4-Pro/GPT-5.5 tie at 87.5.
- Long-context retrieval: Claude Opus leads MRCR 1M. For deep retrieval, Claude is safer.
- License: MIT lets you embed V4-Pro in your product freely.
What to Build With It
Good fits for DeepSeek V4:
- Agentic coding loops: Multi-file debugging, repo refactoring, test fixes. Use with Apidog for API inspection and prompt tuning.
- Long-document reasoning: 1M tokens handles large repos, contracts, research corpora. Use Think High mode.
- Self-hosted AI products: V4-Flash is the first open-weights model with frontier quality for on-prem.
- Research/fine-tuning: Base checkpoints let you train specialist models with your own data.
Not ideal for: high-volume classification, embedding retrieval, or short-prompt chat (older DeepSeek models are cheaper for those).
Pricing in One Line
As of writing, V4 API pricing is not final. V3.2 was ~$0.28/million input tokens, $0.42/million output tokens. Expect V4-Flash at a similar rate, V4-Pro at a premium. Closed competitors charge $5–$15/million input tokens. For updates, see the DeepSeek pricing page.
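Until official V4 pricing lands, you can rough out costs with the V3.2 rates above as placeholders. The sketch below is just arithmetic on those placeholder numbers; swap in the real rates once the pricing page is updated.

```python
# Back-of-envelope cost estimate using V3.2 rates as stand-ins for V4-Flash
# pricing (placeholder numbers; replace with the published V4 rates).
INPUT_PER_M = 0.28   # USD per million input tokens (V3.2 rate)
OUTPUT_PER_M = 0.42  # USD per million output tokens (V3.2 rate)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: an agentic coding session with 200K tokens of repo context and 20K tokens of output.
print(f"${estimate_cost(200_000, 20_000):.3f}")  # ≈ $0.064
```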
How to Test V4 Today
Three ways to get started (fastest first):
- Web chat: Go to chat.deepseek.com and sign in. V4-Pro is the default; toggle Think High in the UI. Free, no card required.
- API: Get an API key, point your client at https://api.deepseek.com, and set "model": "deepseek-v4-pro". The request shape is OpenAI-compatible, so you can swap the base URL in any OpenAI client (see the sketch after this list). Full guide: DeepSeek V4 API guide.
- Local weights: Download from Hugging Face or ModelScope. V4-Flash runs on 2–4 H100s; V4-Pro requires a cluster. Inference code is in /inference of the repo.
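Here is the base-URL swap from the API step in practice: a minimal sketch using the official OpenAI Python SDK with streaming, assuming only that the endpoint stays wire-compatible with OpenAI's chat completions API.

```python
# Sketch: reuse any OpenAI SDK client by swapping the base URL; streamed output.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Explain this stack trace: ..."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. the final usage chunk) can arrive without content.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```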
For a prompt-iteration workflow using Apidog, see how to use DeepSeek V4. For zero-cost usage, see how to use DeepSeek V4 for free. Download Apidog and pre-build your collection; the OpenAI-compatible format means one request template covers DeepSeek, OpenAI, and other compatible APIs.
FAQ
Is DeepSeek V4 really open source?
Yes. All checkpoints are MIT licensed for commercial use, modification, and redistribution.
Do I need a GPU cluster to run V4-Flash?
For full precision: 2–4 H100s/H200s for V4-Flash. Less if quantized. V4-Pro needs a full cluster. To test without hardware, use the API or chat.deepseek.com.
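For a rough sense of where the 2–4 card figure comes from, here is back-of-envelope math on weight memory alone. It ignores KV cache, activations, and runtime overhead, and assumes roughly one byte per parameter at FP8 and half that at FP4, so treat it as an estimate.

```python
# Rough weight-memory estimate for V4-Flash (284B total parameters).
# Ignores KV cache and runtime overhead; bytes-per-parameter values are approximations.
PARAMS = 284e9
H100_GB = 80

for label, bytes_per_param in [("FP8", 1.0), ("FP4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{label}: ~{gb:.0f} GB of weights -> ~{gb / H100_GB:.1f} H100s minimum")
# FP8: ~284 GB -> ~3.6 H100s minimum
# FP4: ~142 GB -> ~1.8 H100s minimum
```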
When does V4 hit the DeepSeek API?
Live as of April 23, 2026. Model IDs: deepseek-v4-pro, deepseek-v4-flash. Old IDs (deepseek-chat, deepseek-reasoner) deprecated July 24, 2026.
How does V4 compare to Kimi and Qwen?
V4-Pro posts higher LiveCodeBench and Codeforces numbers than Kimi K2 and Qwen 3 Max. All are open-weights MoE models. Choose based on the best benchmark for your use case.
Can I fine-tune V4 on my data?
Yes, use the Base checkpoints and a standard SFT pipeline. MIT license permits commercial redistribution.
Will V4 work with my existing OpenAI-compatible tooling?
Yes. The API accepts both OpenAI and Anthropic request formats:
- https://api.deepseek.com (OpenAI format)
- https://api.deepseek.com/anthropic (Anthropic format)
Most OpenAI clients work with just a base-URL change; a hedged sketch of the Anthropic route follows. For a parallel pattern, see the GPT-5.5 API walkthrough.
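A sketch of that Anthropic-format route, assuming the standard Anthropic Python SDK honors a custom base_url and that DeepSeek model IDs are accepted unchanged; treat it as unverified until you have tested it against the live endpoint.

```python
# Sketch: hitting the Anthropic-format route with the Anthropic SDK.
# Assumes the SDK's base_url override and DeepSeek's model IDs work unchanged.
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com/anthropic",
)

msg = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Review this function for race conditions."}],
)
print(msg.content[0].text)
```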


