Originally published on AI Tech Connect.
The headline number and why it matters

On the SWE-Bench Verified leaderboard, DeepSeek V4 Pro now sits at 80.6%, clear of every other open-weight model and ahead of all the closed-source frontier coders shipping in May 2026. Claude Sonnet 4 sits at roughly 77.2%, GPT-5 at about 74.9%, Gemini 2.5 around 71.8%. On GPQA Diamond, V4 Pro scores 90.1, putting it within striking distance of the top closed-source reasoning models. And it runs on a 1M-token context window. You can download the weights today, point them at your own GPU cluster, and never send a single byte of source code to a third-party API.

That last sentence is the whole article, really. The benchmark lead will move within weeks: open-weight models leapfrog each other constantly, and Llama 4, Qwen 3.5, Gemma 4 and Mistral…