DeepSeek V4 Pro launched on April 24, 2026. I've been running it on production agents since launch.
Specs
- Total params: 1.6T (MoE)
- Active params: 49B
- Context: 1M tokens (verified)
- Modes: Think / Non-Think dual
- License: MIT
- Pricing: $1.74/1M input, $3.48/1M output
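The two parameter figures above imply the MoE router activates only a small slice of the model per token, which is quick to check:

```python
total = 1.6e12   # total parameters (1.6T)
active = 49e9    # active parameters per token (49B)

# Only a small fraction of the weights run for any given token.
print(f"Active fraction: {active / total:.1%}")  # → Active fraction: 3.1%
```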
API Setup (OpenAI-compatible)
```python
from openai import OpenAI

# NVIDIA NIM exposes an OpenAI-compatible endpoint, so the standard SDK works.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="<NVIDIA_NIM_KEY>",
)

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-v4-pro",
    messages=[...],
)
```
Real-World Performance
- Long-context tasks: the 1M-token window finally makes passing full conversation logs viable at scale
- Thinking mode: 8–15 s latency, much better multi-step planning than V3
- Non-thinking mode: ~2 s, fast enough for content pipelines
- Function calling: noticeably more reliable than V3.2
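Since the endpoint is OpenAI-compatible, function calling uses the standard OpenAI `tools` schema. A minimal sketch of wiring up a tool for an agent; the `get_weather` tool and `make_request_kwargs` helper are invented here for illustration:

```python
# Tool definition in the OpenAI-compatible `tools` format.
# `get_weather` is a made-up example tool, not part of any real API.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

def make_request_kwargs(user_msg: str) -> dict:
    """Build the kwargs you would pass to client.chat.completions.create."""
    return {
        "model": "deepseek-ai/deepseek-v4-pro",
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```

When the model decides to call the tool, the response carries `tool_calls` with JSON arguments; you execute the function and send the result back in a `tool` role message, same as with any OpenAI-compatible model.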
Cost Comparison (per 1M tokens)
| Model | Input | Output |
|---|---|---|
| DeepSeek V4 Pro | $1.74 | $3.48 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
For agent workloads (lots of input, structured output), V4 Pro is the new sweet spot.
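Using the table's prices, here's a rough per-run cost comparison for an input-heavy agent run; the 100K-input / 5K-output token counts are illustrative, not measured:

```python
# Prices per 1M tokens (input, output), from the comparison table above.
PRICES = {
    "DeepSeek V4 Pro": (1.74, 3.48),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for a single run, given token counts."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Illustrative agent run: 100K input tokens, 5K output tokens.
for name in PRICES:
    print(f"{name}: ${run_cost(name, 100_000, 5_000):.4f}")
```

That run costs about $0.19 on V4 Pro versus $0.38 on Sonnet 4.6 and $0.30 on GPT-4o, which is where the input-heavy advantage shows up.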
My agent automation guides updated for V4: https://yanmiay.gumroad.com