DeepSeek V4 lands with 1M context, open weights, and very aggressive API pricing

#ai #localmodels #coding #deeplearning

DeepSeek V4 lands with 1M context, open weights, and very aggressive API pricing

DeepSeek just put a serious marker down for the next round of open-model competition. The company says DeepSeek-V4 Preview is live now in chat, apps, and API, with open weights on Hugging Face and ModelScope.

The short version for builders: this is not just another benchmark post. DeepSeek is shipping a long-context model family with cheap API access, coding/agent claims, and enough compatibility work that teams can test it without rewriting their stack.

What shipped

DeepSeek says V4 comes in two variants:

DeepSeek-V4-Pro: the higher-capability model, aimed at reasoning, world knowledge, and harder agent/coding work.
DeepSeek-V4-Flash: the cheaper/faster option, with similar reasoning claims but weaker on harder agent tasks and world knowledge.

The company’s launch post says the preview is available today through chat.deepseek.com, the official app, and the API. For API users, the new model names are:

deepseek-v4-pro
deepseek-v4-flash

DeepSeek’s API docs list both models with:

1M context length
maximum output up to 384K
JSON output
tool calls
context caching
OpenAI-format base URL at https://api.deepseek.com
Anthropic-format base URL at https://api.deepseek.com/anthropic
thinking and non-thinking modes

Pricing is the part that will get tested very quickly. DeepSeek’s docs currently list:

Model	Input cache hit	Input cache miss	Output
DeepSeek-V4-Flash	$0.0028 / 1M tokens	$0.14 / 1M tokens	$0.28 / 1M tokens
DeepSeek-V4-Pro	$0.003625 / 1M tokens	$0.435 / 1M tokens	$0.87 / 1M tokens

That is extremely cheap if the real-world performance holds up, especially for agent loops and long-context workflows where token burn gets ugly fast.

Open weights, not just hosted API

DeepSeek has also published V4 on Hugging Face. The model pages show MIT licensing and large safetensor releases: roughly 862B total parameters for V4-Pro and 158B for V4-Flash, based on the Hugging Face metadata.

DeepSeek’s own write-up says V4 uses a hybrid attention setup combining Compressed Sparse Attention and Heavily Compressed Attention, plus Manifold-Constrained Hyper-Connections and Muon optimizer training. The claim to watch is long-context efficiency: the model card says V4-Pro needs 27% of the single-token inference FLOPs and 10% of the KV cache compared with DeepSeek-V3.2 at the 1M-token setting.

As always: treat vendor benchmarks as a starting gun, not a verdict.

Why builders should care

There are three practical angles here.

First, long-context agents get cheaper to experiment with. A 1M-token context window plus low cache-hit pricing makes it more realistic to keep big codebases, transcripts, logs, or research corpora in play without immediately blowing up the bill.

Second, API compatibility matters. Supporting both OpenAI-style and Anthropic-style APIs lowers the switching cost. If your app already abstracts providers, V4 should be a weekend experiment rather than a quarter-long migration.

Third, this keeps pressure on the closed-model labs. DeepSeek is explicitly positioning V4-Pro against top closed models and says its internal agentic coding experience is ahead of Sonnet 4.5 while still behind Opus 4.6 thinking mode. That is DeepSeek’s claim, not an independent result — but it is exactly the comparison engineering teams will run.

Caveats

This is a preview, and the big claims need independent testing: coding reliability, tool-call behavior, latency under load, refusal/safety behavior, and whether 1M context stays useful or just technically available.

Also note the model-name transition. DeepSeek’s API docs say the older deepseek-chat and deepseek-reasoner names are being deprecated, with the current aliases pointing to V4-Flash non-thinking and thinking modes during the transition. If you run production workloads on those aliases, pin and test deliberately instead of assuming nothing changed.

Still, this is a proper launch: new model family, open weights, hosted API, long context, agent/coding positioning, and pricing that could change the economics for a lot of AI products.