DeepSeek released V4 on April 24, 2026 as an open-weight model under MIT license. The headline numbers reset expectations for what frontier-grade AI costs:
- V4-Pro: 1.6T total params (49B active), 1M context, $1.74/$3.48 per 1M tokens
- V4-Flash: 284B total (13B active), 1M context, $0.14/$0.28 per 1M tokens
- License: MIT (full commercial + fine-tuning)
- Training hardware: Huawei Ascend 950PR — zero NVIDIA chips
- Weights: HuggingFace
For context, Claude Opus 4.7 sits at $15/$75 per 1M tokens. V4-Pro is roughly 9x cheaper on input and 22x cheaper on output at frontier-adjacent quality. That's not an incremental change; it's a category shift.
The pricing reality check
Let me put this in concrete terms. Imagine a SaaS product processing 100M input tokens and 50M output tokens per month (a moderate usage tier for AI features in a B2B product):
| Model | Monthly cost (100M input tokens) | Monthly cost (50M output tokens) |
|---|---|---|
| Claude Opus 4.7 | $1,500 | $3,750 |
| GPT-5.5 | $500 | $1,500 |
| DeepSeek V4-Pro | $174 | $174 |
| DeepSeek V4-Flash | $14 | $14 |
That's the API price. With open weights, you can self-host V4-Flash (284B total, 13B active) on a single 8x H100 node and amortize the cost further. For a startup running a coding assistant or a document analysis SaaS, this means moving from "AI is the dominant infrastructure cost" to "AI is a rounding error."
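To make the table reproducible, here is a minimal cost calculator using the per-1M-token prices quoted above. The GPT-5.5 prices are back-solved from its table row, so treat those two numbers as assumptions:

```python
# Monthly API cost estimator. Prices are USD per 1M tokens, taken from
# the article; the GPT-5.5 pair is back-solved from its table row.
PRICES = {  # model: (input price, output price) per 1M tokens
    "claude-opus-4.7":   (15.00, 75.00),
    "gpt-5.5":           (5.00, 30.00),   # assumption, inferred from table
    "deepseek-v4-pro":   (1.74, 3.48),
    "deepseek-v4-flash": (0.14, 0.28),
}

def monthly_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Total monthly USD cost for the given token volumes."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for model in PRICES:
    cost = monthly_cost(model, input_tokens=100e6, output_tokens=50e6)
    print(f"{model:>18}: ${cost:,.2f}/month")
# claude-opus-4.7:   $5,250.00/month
# deepseek-v4-flash:    $28.00/month  -> ~190x cheaper blended
```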
What's actually new in the architecture
V4 isn't just V3 scaled up. There are two fundamental architectural changes worth understanding.
1. CSA + HCA hybrid attention
V3.2 used DSA (DeepSeek Sparse Attention). V4 introduces a two-level compression:
- CSA (Compressed Sparse Attention): groups tokens along the sequence dimension to reduce attention FLOPs
- HCA (Heavily Compressed Attention): additionally compresses the token dimension to shrink the KV cache
The result at 1M context length:
- Inference FLOPs: 27% of V3.2 (V4-Flash: 10%)
- KV cache: 10% of V3.2 (V4-Flash: 7%)
So context length grew 8x (128K → 1M) while memory pressure stayed roughly the same. The implication is that future scaling to 10M context shouldn't break the bank — the architecture is built for it.
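That claim is easy to sanity-check with back-of-envelope math. The sketch below reads the 10% figure as a per-token KV-cache reduction; the layer count, head count, head dimension, and dtype are illustrative assumptions, not published V4 specs:

```python
# Back-of-envelope KV-cache sizing. All architecture numbers below are
# illustrative assumptions for the comparison, not published V4 specs.
def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per=1, compress=1.0):
    """KV cache in GB: 2 (K and V) * layers * heads * head_dim per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per * compress
    return tokens * per_token / 1e9

baseline = kv_cache_gb(tokens=128_000, layers=60, kv_heads=8, head_dim=128)
# V4 at 1M context, applying the reported 10%-of-V3.2 per-token cache cost:
v4 = kv_cache_gb(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128,
                 compress=0.10)

print(f"V3.2-style @128K: {baseline:.1f} GB")  # ~15.7 GB
print(f"V4-style   @1M:   {v4:.1f} GB")        # ~12.3 GB
```

Ten percent of the per-token cost times eight times the tokens lands at roughly 80% of the old total, which is why memory pressure stays flat despite the 8x context jump.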
2. Manifold-constrained hyper-connections (mHC)
DeepSeek replaced standard residual connections with mHC. The exact mechanism warrants a closer read of the technical report, but the high-level effect is more stable information flow through very deep networks along with increased expressive capacity. That stability matters when training trillion-parameter models on smaller compute budgets.
Benchmarks
Here's where V4-Pro lands:
| Benchmark | V4-Pro | Notes |
|---|---|---|
| MMLU | 90.1% | ~2pp behind Gemini 3.1 Pro |
| MMLU-Pro | 87.5% | Knowledge-intensive gap remains |
| HumanEval | 90.0% | Top-tier coding |
| SWE-bench Verified | 80.6% | Open-source #1 |
| LiveCodeBench | 93.5% | Best open model |
| GPQA Diamond | 90.1% | Behind Gemini on hard science |
| GDPval (professional) | 1554 pts | Open-source #1, overall #6 |
DeepSeek's own technical report says V4 is "narrowly behind" GPT-5.4 and Gemini 3.1 Pro. They estimate the gap with frontier closed-source at 3-6 months. For coding workloads specifically, V4 is at parity or slightly ahead.
The Huawei angle (this is the bigger story)
V4 was trained entirely on Huawei Ascend 950PR — the first frontier-grade open model trained without any NVIDIA hardware. Last year, Jensen Huang publicly described this scenario as a "disaster." It's now reality.
US export controls on AI chips were designed to slow Chinese AI development. V4 demonstrates that the constraints have been routed around, at least at the model-training level. For anyone making infrastructure decisions, this means:
- NVIDIA's effective monopoly on AI training is over at the prosumer/enterprise tier
- Chinese AI ecosystem is now fully self-sufficient for frontier model development
- Geopolitical assumptions in your AI roadmap need updating if you assumed China would lag 2-3 years
Practical takeaways
If you're building with LLMs in 2026, here's what V4 changes:
For API consumers
- Re-benchmark your stack: Run V4-Flash against your current Claude/GPT workload. For coding, document analysis, and summarization tasks, you'll likely find quality is acceptable at 1-2% of the blended cost (see the table above).
- Consider hybrid routing: V4-Flash for high-volume routine tasks, Claude/GPT for the hardest reasoning (a sketch follows this list). This routing pattern can cut total AI costs by 80%+.
- Re-evaluate margin economics: If your SaaS product was barely viable due to AI costs, V4 might unlock previously infeasible business models.
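Here is a minimal sketch of that router, assuming both providers expose OpenAI-compatible chat endpoints. The model ids and the difficulty heuristic are placeholders to adapt to your stack:

```python
# Hybrid router: cheap model for routine work, frontier model for hard
# reasoning. Model ids and the difficulty heuristic are placeholders.
from openai import OpenAI

cheap = OpenAI(base_url="https://api.deepseek.com", api_key="...")  # V4-Flash
frontier = OpenAI(api_key="...")  # reserved for the hardest requests

HARD_HINTS = ("prove", "multi-step", "architecture review", "root cause")

def looks_hard(prompt: str) -> bool:
    """Crude heuristic; in production, use a classifier tuned on your evals."""
    return len(prompt) > 8_000 or any(h in prompt.lower() for h in HARD_HINTS)

def complete(prompt: str) -> str:
    client, model = (
        (frontier, "gpt-5.5") if looks_hard(prompt)
        else (cheap, "deepseek-v4-flash")  # hypothetical model ids
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```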
For sovereign AI / regulated industries
Open weights + MIT license means:
- Download V4-Flash, deploy on-prem
- Never send sensitive data to external APIs
- Comply with GDPR / HIPAA / industry-specific data residency rules
- Fine-tune on proprietary data without licensing constraints
This is a meaningful unlock for finance, healthcare, legal, and government use cases that have been blocked from frontier AI by data residency requirements.
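As a sketch, on-prem serving could look like the following with vLLM, assuming the architecture is supported there. The HuggingFace repo id and tensor-parallel degree are assumptions; check the actual model card for requirements:

```python
# On-prem serving sketch with vLLM. The repo id and tensor-parallel
# degree are assumptions; consult the actual model card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical HF repo id
    tensor_parallel_size=8,                 # e.g. one 8x H100 node
    max_model_len=1_000_000,                # the 1M context window
)

params = SamplingParams(temperature=0.2, max_tokens=1024)
outputs = llm.generate(["Summarize the attached incident report: ..."], params)
print(outputs[0].outputs[0].text)
```

No tokens ever leave the box, which is the entire point for the regulated use cases above.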
For the 1M context window
You can now load entire codebases, multi-year contract sets, or full project documentation into a single context. This doesn't kill RAG entirely (it's still more efficient at scale), but it does eliminate the need for RAG infrastructure for many lightweight document analysis tasks.
A practical example: instead of building a RAG system over your company's design docs, you can now load all docs (within 1M tokens) into context and ask questions directly. For internal tools, this is often simpler and produces better results.
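A minimal sketch of that pattern, assuming an OpenAI-compatible endpoint; the directory path, model id, and the 4-characters-per-token guard are placeholder assumptions:

```python
# Whole-corpus QA without RAG: concatenate the docs into one prompt.
# The path, model id, and chars-per-token ratio are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

docs = "\n\n---\n\n".join(
    p.read_text() for p in sorted(Path("design_docs").glob("*.md"))
)
# Rough guard: ~4 chars per token, stay safely under the 1M-token window.
assert len(docs) < 4 * 900_000, "corpus too large; fall back to RAG"

resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # hypothetical model id
    messages=[
        {"role": "system", "content": "Answer only from the provided docs."},
        {"role": "user", "content": f"{docs}\n\nQuestion: Why did we choose gRPC?"},
    ],
)
print(resp.choices[0].message.content)
```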
For fine-tuning
MIT license + 1.6T base parameters opens the door to:
- Domain-specific models (legal, medical, financial)
- Language-specific models (e.g., Korean, Japanese, Spanish optimization)
- Task-specific specialists (e.g., contract review, code review agents)
Fine-tunes of V3.2 have already reached IMO gold-medal-level math performance. V4 raises the ceiling further.
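At this scale, parameter-efficient methods are the realistic path for most teams. A minimal LoRA sketch with HuggingFace peft, shown on V4-Flash for tractability; the repo id and target module names are assumptions that vary by architecture:

```python
# LoRA fine-tuning sketch with HuggingFace peft. The repo id and
# target_modules names are assumptions and vary by architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V4-Flash",  # hypothetical HF repo id
    torch_dtype="auto",
    device_map="auto",
)

config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adjust to the actual layer names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a tiny fraction of 284B is trainable
# From here: wrap in transformers.Trainer with your domain dataset.
```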
Closing thought
The AI market shifted yesterday. The closed-source moat narrative — "frontier models will always be 12-18 months ahead of open source" — is now harder to defend. DeepSeek's own admission of a 3-6 month gap, combined with MIT licensing and 1/6 pricing, means the practical advantage of using closed-source frontier models has shrunk to specific use cases.
For most builders, V4 is now the default open option. For sovereign AI deployments, it might be the default option, period.