DeepSeek-V4 Preview: Entering the Era of Accessible Million-Token Context
Today, we are officially launching and open-sourcing the preview release of DeepSeek-V4, our new model family.
DeepSeek-V4 supports an ultra-long 1M-token context window and delivers leading performance, both in China and across the open-source ecosystem, in agent capabilities, world knowledge, and reasoning. The model family is available in two sizes.
Starting today, you can visit chat.deepseek.com or use the official DeepSeek app to chat with the latest DeepSeek-V4 models and explore the new experience enabled by 1M-context memory.
The API service has also been updated. To call the new models, simply set the model name to deepseek-v4-pro or deepseek-v4-flash.
DeepSeek-V4-Pro: Performance Comparable to Top Closed-Source Models
Significantly Improved Agent Capabilities
Compared with the previous generation, DeepSeek-V4-Pro delivers a substantial improvement in agent capabilities.
In agentic coding evaluations, V4-Pro has reached the strongest level currently available among open-source models. It also performs well across other agent-related benchmarks.
DeepSeek-V4 is now used internally as the company's agentic coding model. According to internal evaluation feedback, its user experience is better than Sonnet 4.5's, and its delivery quality in non-thinking mode is close to Opus 4.6's; it still trails Opus 4.6 in thinking mode.
Rich World Knowledge
In world knowledge evaluations, DeepSeek-V4-Pro significantly outperforms other open-source models and is only slightly behind the top closed-source model, Gemini-Pro-3.1.
World-Class Reasoning Performance
Across evaluations in mathematics, STEM, and competitive programming, DeepSeek-V4-Pro surpasses all open-source models with public benchmark results to date, achieving performance comparable to the world’s leading closed-source models.
DeepSeek-V4-Flash: A Faster and More Cost-Efficient Option
Compared with DeepSeek-V4-Pro, DeepSeek-V4-Flash is slightly weaker in world knowledge, but demonstrates similar reasoning capabilities.
Because it has fewer parameters and lower activation requirements, V4-Flash can provide faster and more economical API service.
In agent evaluations, DeepSeek-V4-Flash performs on par with DeepSeek-V4-Pro on simple tasks, but still shows a gap on more difficult tasks.
Architectural Innovation and Highly Efficient Long Context
DeepSeek-V4 introduces a new attention mechanism that compresses along the token dimension. Combined with DeepSeek Sparse Attention (DSA), it achieves globally leading long-context capability while substantially reducing compute and memory requirements compared with conventional attention.
Starting now, 1M context will become the standard configuration for all official DeepSeek services.
Targeted Optimization for Agent Workloads
DeepSeek-V4 has been adapted and optimized for mainstream agent products such as Claude Code, OpenClaw, OpenCode, and CodeBuddy.
It shows improvements across code tasks, documentation generation, and related workflows. The figure below shows an example of a presentation slide generated by V4-Pro within an agent framework.
API Access
Due to limited access to high-end compute, V4-Pro currently has very limited service throughput. Its pricing is expected to drop significantly in the second half of the year, once Ascend 950 supernodes come online at scale.
The DeepSeek API now supports both V4-Pro and V4-Flash, with compatibility for the OpenAI Chat Completions API and the Anthropic API.
The base_url remains unchanged. To access the new models, set the model parameter to one of the following:
deepseek-v4-pro
deepseek-v4-flash
Both V4-Pro and V4-Flash support a maximum context length of 1M tokens. Both models support non-thinking mode and thinking mode.
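As a minimal sketch, switching to the new models only requires changing the model name in a standard Chat Completions request; the base_url stays the same. The helper below is hypothetical and simply assembles the request body, leaving the choice of client (OpenAI SDK, raw HTTP, etc.) to you:

```python
# Hypothetical sketch: build an OpenAI-style Chat Completions request
# body for the DeepSeek API. Only the "model" field changes from the
# previous generation; the base_url is unchanged.
import json

def build_request(model: str, prompt: str) -> dict:
    """Assemble a minimal Chat Completions request body."""
    return {
        "model": model,  # "deepseek-v4-pro" or "deepseek-v4-flash"
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("deepseek-v4-flash", "Summarize this repository.")
print(json.dumps(req, indent=2))
```

The same body can be POSTed to the Chat Completions endpoint or passed through any OpenAI-compatible client.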
In thinking mode, the reasoning_effort parameter can be used to set the reasoning intensity:
high
max
For complex agent scenarios, we recommend using thinking mode and setting the reasoning intensity to max.
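As a sketch, the reasoning_effort parameter can be attached to the request body in thinking mode; the parameter name and its two values ("high", "max") come from this announcement, while the validation helper below is purely illustrative:

```python
# Illustrative helper: attach the reasoning_effort parameter (thinking
# mode only) to a Chat Completions request body. The two documented
# values are "high" and "max"; "max" is recommended for complex agent
# scenarios.
ALLOWED_EFFORT = ("high", "max")

def with_reasoning_effort(body: dict, effort: str) -> dict:
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORT}")
    out = dict(body)  # leave the original body untouched
    out["reasoning_effort"] = effort
    return out

base = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Plan a multi-step refactor."}],
}
req = with_reasoning_effort(base, "max")
print(req["reasoning_effort"])  # -> max
```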
For model invocation and parameter configuration, please refer to the API documentation:
https://api-docs.deepseek.com/zh-cn/guides/thinking_mode
Please note that the two legacy API model names, deepseek-chat and deepseek-reasoner, will be discontinued in three months, on July 24, 2026.
During the transition period, these two model names will point to the following modes:
deepseek-chat -> deepseek-v4-flash, non-thinking mode
deepseek-reasoner -> deepseek-v4-flash, thinking mode
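The transition-period aliasing above can be sketched as a simple lookup: the two legacy names resolve to V4-Flash in different modes until they are discontinued on July 24, 2026, while the new names pass through unchanged. The resolver function below is hypothetical, for illustration only:

```python
# Illustrative sketch of the transition-period aliasing: legacy model
# names map to (actual model, mode) until their discontinuation.
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", "non-thinking"),
    "deepseek-reasoner": ("deepseek-v4-flash", "thinking"),
}

def resolve_model(name: str):
    """Map a requested model name to (actual model, mode)."""
    if name in LEGACY_ALIASES:
        return LEGACY_ALIASES[name]
    return (name, None)  # new model names pass through unchanged

print(resolve_model("deepseek-reasoner"))  # -> ('deepseek-v4-flash', 'thinking')
```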
Open Weights and Local Deployment
DeepSeek-V4 model weights are available at:
https://huggingface.co/collections/deepseek-ai/deepseek-v4
https://modelscope.cn/collections/deepseek-ai/DeepSeek-V4
The DeepSeek-V4 technical report is available here:
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
Closing Thoughts
“Do not be tempted by praise, do not fear criticism. Follow the right path, and hold yourself upright.”
Thank you to every user for your trust and support. Your recognition, suggestions, and expectations are what drive us to keep exploring and improving. They also remind us to stay true to our original mission and remain focused on continuous innovation.
We will continue to follow a long-termist approach, move forward steadily through experimentation and reflection, and keep working toward the goal of AGI.




