EPAM's AI/Run Agent Tops SWE-bench Verified at 76.8%

#product #research #ai #machinelearning

Originally published on AI Tech Connect.

What you need to know

A harness, not a model, is on top. EPAM's AI/Run developer agent leads SWE-bench Verified at roughly 76.8% as of mid-June 2026, ahead of Anthropic's roughly 73.2%, per the leaderboard tracked by Epoch AI.

AI/Run is built on frontier base models, not a new one. The gains come from agent engineering: planning, repo-aware retrieval, tool use and iterative test-run-fix loops.

SWE-bench Verified is narrow. It is a 500-task, human-validated subset measuring issue resolution on real Python GitHub repositories, where the agent's patch must pass the repo's own tests. It is not general coding ability.

This is an ownable skill. Indian and UK builders cannot train a frontier model in a garage, but they can absolutely engineer a strong harness, and that is increasingly where…
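To make the "iterative test-run-fix loop" idea concrete, here is a minimal sketch in Python. It is not AI/Run's implementation, which the summary does not describe. The names `fix_loop`, `propose_patch`, `run_tests` and `RunResult` are hypothetical, and the toy "model" and toy "test suite" stand in for a real base model and a real repository's tests.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    passed: bool
    log: str  # failure output that is fed back into the next attempt


def fix_loop(
    issue: str,
    propose_patch: Callable[[str, str], str],  # hypothetical: (issue, feedback) -> patch
    run_tests: Callable[[str], RunResult],     # hypothetical: patch -> test outcome
    max_attempts: int = 3,
) -> Optional[str]:
    """Propose a patch, run the tests, and retry with the failure log as feedback."""
    feedback = ""
    for _ in range(max_attempts):
        patch = propose_patch(issue, feedback)
        result = run_tests(patch)
        if result.passed:
            return patch
        feedback = result.log
    return None  # no passing patch within the attempt budget


# Toy stand-ins (assumptions for illustration only).
def toy_propose(issue: str, feedback: str) -> str:
    # First attempt is buggy; once there is test feedback, the "model" corrects it.
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


def toy_run_tests(patch: str) -> RunResult:
    namespace: dict = {}
    exec(patch, namespace)  # load the patched function into a scratch namespace
    ok = namespace["add"](2, 3) == 5
    return RunResult(ok, "" if ok else "add(2, 3) did not return 5")


patch = fix_loop("add returns the wrong sum", toy_propose, toy_run_tests)
```

In this toy run the first patch fails the test, the failure log is passed back, and the second patch passes. The design point is that success is judged by the tests rather than by the model's own confidence, which mirrors how SWE-bench Verified scores a patch against the repository's own tests.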

Read the full article on AI Tech Connect →
