May 2026 Coding Agent Leaderboard: Who's Actually Leading

#machinelearning #research #ai

Originally published on AI Tech Connect.

What the May 2026 board actually says Two months ago the SWE-bench Verified leaderboard was a Claude-vs-OpenAI seesaw with both labs trading places around 80%. As of mid-May 2026, the picture has changed sharply. Anthropic's Claude Mythos Preview sits at the top with a verified pass rate of 93.9% — the first time any model has crossed 90% on the benchmark. OpenAI's GPT-5.5, released on 23 April 2026, is reported at 88.7% per marc0.dev's May snapshot and OpenAI's own materials. Anthropic's previous flagship, the Adaptive variant of Claude Opus 4.7, follows at 87.6%. The middle of the table is where things get interesting for builders. Google's Gemini 3.1 Pro and DeepSeek's V4 Pro Max are tied at 80.6% — one closed and one open-weight, separated by orders of magnitude in licence cost.…

Read the full article on AI Tech Connect →

DEV Community

May 2026 Coding Agent Leaderboard: Who's Actually Leading

Top comments (0)