Skila AI

Posted on • Originally published at news.skila.ai

DeepSeek Just Open-Sourced a Claude-Tier Model. The Pricing Math Breaks Everything.

DeepSeek shipped V4-Pro and V4-Flash on April 24, 2026. Open weights. MIT license. One-million-token context window.

On SWE-bench Verified it scores 80.6%. Claude Opus 4.6 scores 80.8%. The gap is 0.2 points.

Output tokens cost $3.48 per million. Anthropic charges $25 for Opus 4.6 output. OpenAI charges $30 for GPT-5.5 output. That is not a discount. That is a category break.

If you built your AI cost model last month on closed-frontier APIs, it just broke. Here is exactly what DeepSeek shipped, what the benchmarks actually say, and what you should change about your stack this week.

What Actually Shipped

Two models, both released under MIT license on Hugging Face:

  • DeepSeek V4-Pro. 1.6 trillion total parameters. 49 billion active per token via Mixture-of-Experts. Pre-trained on 33 trillion tokens. Context window: 1,048,576 tokens. API pricing: $0.50 per million input, $3.48 per million output.
  • DeepSeek V4-Flash. Smaller, faster, cheaper sibling at $0.14 per million input, $0.28 per million output. Built for high-throughput agentic loops where you do not need Pro-tier reasoning.

Both models ship with open weights. You can download them, run them on your own infrastructure, fine-tune them, and serve them at whatever margin you want.

The Benchmarks Are the Story

  • SWE-bench Verified: 80.6% (Claude Opus 4.6: 80.8%, GPT-5.5: high-70s)
  • Terminal-Bench 2.0: 67.9% (Claude Opus 4.6: 65.4%) — DeepSeek wins
  • LiveCodeBench: 93.5% (Claude Opus 4.6: 88.8%) — DeepSeek wins
  • Codeforces rating: 3,206 — top fraction of 1% of competitive programmers worldwide

On the three benchmarks that matter most for AI coding agents — agentic tasks, terminal operations, and algorithmic coding — DeepSeek either matches or beats the closed-frontier leader. And it does it at 14% of the price.

The Pricing Collapse

| Model | Input $/M | Output $/M | SWE-bench Verified |
|---|---|---|---|
| DeepSeek V4-Pro | $0.50 | $3.48 | 80.6% |
| Claude Opus 4.6 | $15.00 | $25.00 | 80.8% |
| GPT-5.5 | $5.00 | $30.00 | ~78% |
| DeepSeek V4-Flash | $0.14 | $0.28 | — |

If you run an AI coding agent generating 10 million output tokens per day:

  • $250/day on Claude Opus 4.6
  • $300/day on GPT-5.5
  • $34.80/day on DeepSeek V4-Pro

That is a $215–265 daily delta for workloads that benchmark within noise of each other.
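The daily delta is simple arithmetic you can fold into your own cost model. A minimal sketch, using the per-million output prices from the table above (model keys and the 10M-token volume are illustrative, not an API):

```python
# Daily output-token cost at the per-million rates quoted above.
PRICE_PER_M_OUTPUT = {          # USD per million output tokens
    "deepseek-v4-pro": 3.48,
    "claude-opus-4.6": 25.00,
    "gpt-5.5": 30.00,
}

def daily_cost(model: str, output_tokens_per_day: int) -> float:
    """Cost in USD for one day of output tokens on a given model."""
    return PRICE_PER_M_OUTPUT[model] * output_tokens_per_day / 1_000_000

tokens = 10_000_000
for model in PRICE_PER_M_OUTPUT:
    print(f"{model}: ${daily_cost(model, tokens):,.2f}/day")
# deepseek-v4-pro: $34.80/day
# claude-opus-4.6: $250.00/day
# gpt-5.5: $300.00/day
```

Swap in your own token volumes and cache-hit discounts; the shape of the gap does not change.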

The Huawei Chip Story

DeepSeek V4-Pro was trained entirely on Huawei Ascend chips. No Nvidia H100s. No H200s. The full training run — 33 trillion tokens, 1.6 trillion parameters — ran on Chinese-manufactured silicon that US export controls cannot reach.

US policy for three years assumed cutting off Nvidia shipments would cap Chinese frontier AI. That assumption is now empirically false.

The 1M Context Window (Asterisk Required)

Every 1M-context model — Gemini 3.1 Pro, DeepSeek V4-Pro, Claude Opus 4.7 — drops accuracy below 70% on needle-in-a-haystack tasks past 200K tokens. The lost-in-the-middle effect kicks in past 500K, causing the model to forget the middle 40% of the prompt.

Treat the 1M context window as useful for the first 150K–200K tokens. Stuff critical information at the top and bottom of your prompt — never in the middle.
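One way to bake that placement rule into your prompt assembly, sketched in Python (the helper and its argument names are mine, not any SDK's):

```python
def build_prompt(critical: str, bulk_context: str, task: str) -> str:
    """Assemble a long-context prompt that repeats the critical
    instructions at the top and bottom, leaving the bulk material
    (docs, code, transcripts) in the middle where recall is weakest."""
    return "\n\n".join([
        f"IMPORTANT:\n{critical}",   # top: primacy position
        bulk_context,                # middle: the lossy zone on long prompts
        f"REMINDER:\n{critical}",    # bottom: recency position
        f"TASK:\n{task}",
    ])

prompt = build_prompt(
    critical="Only modify files under src/; never touch migrations.",
    bulk_context="<...200K tokens of repository source...>",
    task="Refactor the billing module to use the new tax API.",
)
```

Duplicating a few hundred tokens of instructions is cheap insurance against the lost-in-the-middle effect.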

What This Means for Your Stack

  1. Add a second tier. Run V4-Flash for high-volume low-stakes work. Keep Claude Opus 4.6 for compliance-bound or multi-turn planning tasks.
  2. Self-hosting is back on the table. Open weights mean you can serve V4-Pro at cost on your own GPU cluster.
  3. Frontier pricing is going to move. Anthropic and OpenAI cannot hold $25–30/M output when a benchmark-equivalent open model charges $3.48. Expect price cuts within 90 days.
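The two-tier setup above amounts to a routing rule. A minimal sketch — model names, the token threshold, and defaulting mid-tier work to V4-Pro are my assumptions, not a vendor API:

```python
from dataclasses import dataclass

@dataclass
class AgentTask:
    compliance_bound: bool   # GDPR / SOC 2 / HIPAA-scoped work
    multi_turn: bool         # long-horizon planning loops
    est_output_tokens: int   # expected generation volume

def pick_model(task: AgentTask) -> str:
    """Route high-stakes work to the closed frontier model and
    high-volume, low-stakes work to the cheap open tier."""
    if task.compliance_bound or task.multi_turn:
        return "claude-opus-4.6"        # keep for sensitive / planning work
    if task.est_output_tokens > 100_000:
        return "deepseek-v4-flash"      # bulk agentic loops at $0.28/M
    return "deepseek-v4-pro"            # default coding tier
```

The point is not the threshold; it is that the router is a dozen lines, so there is little excuse not to have one.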

The Catch

  • Data policy: The DeepSeek API routes through Chinese infrastructure, which may not clear GDPR, SOC 2, or HIPAA reviews. Self-hosted weights solve this.
  • Real-world gap: Early community reports show V4-Pro is slightly behind Claude on long-context reasoning and multi-turn planning despite leading on benchmarks.

Verdict

DeepSeek V4-Pro is the most important open-source AI release since Llama 3. It ties the closed-frontier leader on the benchmark that matters most for coding agents. It costs 14% of the price. It runs on chips US export controls cannot stop.

Add V4-Flash to your high-volume tier this week. Evaluate V4-Pro against your critical-path workloads over the next month. Keep Claude Opus 4.6 for compliance-bound work.


Full article with benchmarks, pricing table, and self-hosting notes: news.skila.ai
