Vilius

Benchmarking 10 Untested LLMs Tonight — DeepSeek V4, Grok 4.20, GPT-5.5 Pro

Tonight at 23:00 BST we're running fresh benchmarks on 10 LLMs we haven't tested before.

The lineup:

  • DeepSeek V4 Pro & Flash
  • Grok 4.20 & 4.1 Fast
  • GPT-5.5 Pro & GPT-5.4 Pro
  • Xiaomi MiMo V2.5 Pro
  • Google Lyria 3 Pro & Clip
  • inclusionAI Ring 2.6

All 10 models run against the same 10 real-world agent coding tasks. Same methodology, same scoring, same brutal honesty about what breaks.

Results go live immediately after the run — watch benchmarks.workswithagents.dev.

Update to follow once results are in.
