DEV Community

Marcus Rowe

Posted on • Originally published at techsifted.com

DeepSeek V4-Pro and V4-Flash: What Just Launched and How It Compares

DeepSeek just dropped preview versions of V4-Pro and V4-Flash, and I've been digging through the docs, benchmarks, and API pricing since the announcement hit this morning. Short version: it's real, it's good, and the price-to-performance story is — again — the most interesting thing about it.

But let me be clear upfront: this isn't an R1 moment. Not even close.


What Is DeepSeek, and Why Does It Keep Showing Up?

If you missed the original disruption: DeepSeek is a Hangzhou-based AI lab that shocked the industry in early 2025 by releasing R1, an open-weight reasoning model that matched or beat GPT-4o on several benchmarks at a fraction of the development cost. The release triggered a stock selloff among U.S. semiconductor companies, because it suggested the AI arms race might be cheaper to fight than everyone assumed.

Since then, they've kept pushing. V3 followed R1 with stronger general capabilities. And now V4.

DeepSeek runs lean by Western lab standards. They're not burning $100M compute runs. They're finding efficiency gains through architecture, and then passing the pricing difference to developers. That's the pattern. V4 continues it.


Two Models, Two Use Cases

DeepSeek launched two distinct versions today.

V4-Pro is their flagship. 1.6 trillion total parameters, 49 billion active at any given moment — classic mixture-of-experts architecture where only a slice of the model fires per token. It's the largest open-weight model currently available. The focus areas are agentic coding, STEM reasoning, and long-document tasks.

V4-Flash is the workhorse. 284 billion total parameters, 13 billion active. Faster, cheaper, and — according to DeepSeek — its reasoning accuracy "closely matches" V4-Pro on many tasks. Take that claim with some skepticism, but if accurate, Flash becomes a very interesting option for high-volume workloads.

Both share the same headline specs: a 1 million token context window (that's a full codebase or a long technical manual in a single prompt), support for thinking and non-thinking modes (their reasoning toggle), and compatibility with both the OpenAI ChatCompletions API and Anthropic's API format. That last part is practical. If you're already on either provider, swapping in DeepSeek is basically a one-line config change.
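To make the "one-line config change" concrete, here's a minimal sketch of a ChatCompletions request payload pointed at DeepSeek's OpenAI-compatible endpoint. The `deepseek-v4-flash` model identifier is my placeholder assumption, not a confirmed name — check DeepSeek's model list before using it.

```python
import json

# The same ChatCompletions payload works against either provider; only the
# endpoint (and API key) change.
payload = {
    "model": "deepseek-v4-flash",  # placeholder; confirm against DeepSeek's docs
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}

OPENAI_URL = "https://api.openai.com/v1/chat/completions"
DEEPSEEK_URL = "https://api.deepseek.com/chat/completions"

# Swapping providers really is a one-line change:
url = DEEPSEEK_URL
body = json.dumps(payload)
```

The same applies if you're on the official `openai` Python client — pass `base_url="https://api.deepseek.com"` when constructing the client and leave the rest of your call sites alone.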

The big technical move they're highlighting is what they call Hybrid Attention Architecture paired with DeepSeek Sparse Attention (DSA). In plain terms: it cuts memory requirements for long-context inference dramatically — 9.5x to 13.7x less KV cache memory depending on configuration. For actually running 1M-token workloads at scale, that's not a benchmark number. It's what makes the economics work.
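To see why KV cache memory is the bottleneck, here's a back-of-envelope sizing sketch. Every architecture number below (layer count, KV heads, head dimension, fp16 storage) is an illustrative assumption — DeepSeek hasn't published V4's exact attention configuration — but the order of magnitude is the point.

```python
# Rough KV-cache sizing for a long-context request. All architecture
# numbers are illustrative assumptions, not V4's real config.
def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    # 2x for keys and values; one entry per layer, per KV head, per token
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem / 1e9

dense = kv_cache_gb(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)
print(f"dense KV cache for 1M tokens: {dense:.0f} GB")
print(f"with ~10x sparse-attention savings: {dense / 10:.0f} GB")
```

Under these assumptions a single dense 1M-token request needs a few hundred gigabytes of KV cache — multiple GPUs' worth of memory for one user. Divide that by roughly ten and the same request fits on far less hardware, which is exactly why the claimed reduction matters more for serving costs than for leaderboard scores.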

One thing neither model does: images, audio, or video. Text-only. That gap is real and worth noting when you're comparing against multimodal-first competitors.


What the Benchmarks Actually Show

DeepSeek's own launch framing is unusually honest: they say V4 "almost closed the gap" with frontier models. "Almost" is doing some heavy lifting there.

The independent read: V4 trails GPT-5.4 and Gemini 3.1 Pro by roughly three to six months on the development curve. On coding benchmarks and STEM reasoning, it's competitive with some GPT-5.4 variants on specific tasks. On knowledge and factual benchmarks, the gap is wider.

Where it wins clearly: it beats every other open-weight model. If you're in the open-source world comparing against Llama, Mistral, or older Qwen releases, V4-Pro is the new ceiling.

The standard benchmark caveats apply. DeepSeek V3 posted strong benchmark results and was genuinely useful in production — but also had real-world rough edges that didn't show up in the tables. V4 is a preview release. Give it a few weeks before making hard infrastructure decisions based on launch-day numbers.


Pricing — This Is the Part That Matters

Every DeepSeek launch has a pricing table that makes OpenAI product managers uncomfortable. V4 is no different.

Model               Input (per 1M tokens)   Output (per 1M tokens)
DeepSeek V4-Flash   $0.14                   $0.28
DeepSeek V4-Pro     $1.74                   $3.48
OpenAI GPT-5.4      ~$5.00                  ~$30.00
Anthropic Claude    ~$3.00                  ~$25.00

V4-Pro is roughly one-tenth the output cost of GPT-5.4. V4-Flash works out to about three-hundredths of a cent per thousand output tokens.

For low-volume use, these differences are academic. For high-volume applications — document processing pipelines, coding assistants handling hundreds of API calls per user per day, data extraction at scale — the math changes completely. The spreadsheet looks different when output tokens are $0.28 versus $30.
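Here's that spreadsheet math as a quick sketch, using the listed prices. The workload volumes are made up purely for illustration.

```python
# Monthly cost comparison at the listed per-1M-token prices.
# Workload volumes are invented for illustration.
def monthly_cost(in_tokens_m, out_tokens_m, in_price, out_price):
    # in_tokens_m / out_tokens_m are in millions of tokens
    return in_tokens_m * in_price + out_tokens_m * out_price

# Example pipeline: 5B input tokens, 1B output tokens per month
workload = dict(in_tokens_m=5_000, out_tokens_m=1_000)

flash = monthly_cost(**workload, in_price=0.14, out_price=0.28)
gpt = monthly_cost(**workload, in_price=5.00, out_price=30.00)
print(f"V4-Flash: ${flash:,.0f}/mo   GPT-5.4: ${gpt:,.0f}/mo   ratio: {gpt / flash:.0f}x")
```

At that (hypothetical) volume, the same pipeline costs under a thousand dollars a month on Flash versus tens of thousands on GPT-5.4 — the kind of gap that turns a cost line item into an architecture decision.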

DeepSeek has also signaled further price reductions later in 2026 as Huawei scales production of the Ascend 950 chips powering V4's inference. The chip partnership is worth a quick note: DeepSeek built V4 for Huawei's Ascend platform specifically, reducing dependence on Nvidia hardware under U.S. export controls. It's a geopolitical story as much as a technical one.


Who Should Use Which Model

V4-Pro is the right call if you're building developer tooling, running code analysis pipelines, or working with long-document retrieval where context window size matters. The performance gap on coding tasks versus U.S. frontier models is narrower than it is on general knowledge, and the pricing advantage is significant.

V4-Flash makes sense if you're running high-volume text processing, need fast responses at scale, or want to evaluate DeepSeek before committing V4-Pro to production. It's the lower-risk entry point. If the Flash/Pro capability gap is as small as claimed, it's also the more economical default for most use cases.

Stay with GPT-5.4 or Claude if you need multimodal capabilities — images, audio, document vision — or depend heavily on the plugin and tool ecosystems those providers have built. You can compare both options in detail in our ChatGPT review and Claude AI review.

Stick with local open-source runs if you need air-gapped inference. V4 is available as open weights, but V4-Pro at 1.6 trillion parameters requires hardware most teams don't have sitting around. Flash is more tractable but still substantial.

For a broader view of where DeepSeek fits against the full chatbot field, the best AI chatbots roundup for 2026 has the full comparison.


What V4 Still Can't Do

This is the section launch articles usually skip. So let's not.

No multimodal input. V4 is text-only. In a world where GPT-5.4 and Gemini 3.1 Pro handle images, audio, and increasingly video natively, that's a real gap for any application touching real-world data. If you're building anything that processes screenshots, documents with embedded charts, or audio content — DeepSeek isn't your model yet.

It's a preview. DeepSeek is explicit about this. The full release with final training checkpoints isn't here. Performance numbers may shift. API reliability under production load at scale is still being proven. Worth keeping in mind before migrating critical infrastructure.

Knowledge benchmarks lag. On factual recall and world-knowledge tasks, the performance gap versus GPT-5.4 and Gemini 3.1 Pro is more pronounced than on reasoning and code. If you're building research tools or anything that needs reliable world-knowledge grounding, the benchmark difference isn't just noise.

Old model deprecation is coming. If you're running deepseek-chat or deepseek-reasoner in production, note that both get shut off on July 24, 2026. Migration to V4 variants isn't optional — build that into your planning.
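If you want to start that migration without a flag-day cutover, one option is a small routing shim that maps legacy model names to their V4 replacements. The V4 identifiers below are my assumptions — confirm the real names against DeepSeek's docs before shipping this.

```python
# Migration shim: route legacy DeepSeek model names to V4 equivalents
# ahead of the July 24, 2026 shutoff. The V4 identifiers are assumed
# placeholders, not confirmed names.
MODEL_MIGRATION = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-pro",
}

def resolve_model(name: str) -> str:
    """Return the V4 replacement for a deprecated model name, else the name unchanged."""
    return MODEL_MIGRATION.get(name, name)
```

Dropping a function like this in front of your API calls lets you flip individual services to V4 (and roll them back) by editing one dict instead of hunting down every call site.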


Verdict: Is This Another DeepSeek Disruption Moment?

Honestly? No.

V4 is a strong, well-executed step in the direction DeepSeek has been moving since R1. The pricing undercut is real. The 1M context window is useful. The API compatibility story is thoughtful. The Huawei chip partnership is strategically interesting.

But "trails frontier models by 3-6 months" lands very differently than "matches GPT-4o at a fraction of the cost." R1's disruption came from surprise. V4 fits the pattern everyone now expects from DeepSeek — good open-weight models, aggressive pricing, growing capability. The market has priced that story in.

The more interesting disruption moment comes later. When the full V4 release lands, when Huawei's Ascend production scales and pricing drops further, and when multimodal support arrives — that's when the competitive pressure on U.S. frontier models gets acute again.

For now: V4-Flash at $0.28/M output is worth testing immediately for any high-volume developer use case. V4-Pro is worth benchmarking against your actual workload before committing. Neither deserves panic or uncritical hype.

Which is, come to think of it, exactly how you should evaluate most model releases.
