aimodels-fyi

Originally published at aimodels.fyi

AI Math Models Score Under 5% on Olympic Math Proofs, Despite High Answer-Only Scores

This is a Plain English Papers summary of a research paper called AI Math Models Score Under 5% on Olympic Math Proofs, Despite High Answer-Only Scores. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.

Overview

  • Recent LLM benchmarks show impressive performance on math competitions when judged on final answers only
  • New research evaluated full mathematical reasoning abilities on 2025 USAMO problems
  • All tested models performed poorly, scoring under 5% when evaluated on complete solutions
  • Evaluation revealed significant gaps between final-answer performance and rigorous mathematical reasoning
  • Common failure modes identified include training artifacts and inability to construct valid proofs
  • Results suggest current LLMs are inadequate for complex mathematical reasoning tasks
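
To make the gap between the two evaluation styles concrete, here is a minimal sketch in Python. It is not the paper's grading code: the `Attempt` record, the 7-point-per-problem scale, and the sample data are all hypothetical, chosen only to show how a model can look strong on final answers while earning almost no credit for its full solutions.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One model attempt at one problem (hypothetical record layout)."""
    final_answer_correct: bool  # does the stated final answer match the key?
    proof_points: int           # points a grader awarded for the full write-up
    max_points: int             # points available for the problem (assumed 7)


def answer_only_score(attempts):
    """Fraction of attempts whose final answer is correct."""
    return sum(a.final_answer_correct for a in attempts) / len(attempts)


def full_solution_score(attempts):
    """Fraction of available points earned when the whole solution is graded."""
    earned = sum(a.proof_points for a in attempts)
    available = sum(a.max_points for a in attempts)
    return earned / available


# Made-up data for illustration only; these are not results from the paper.
attempts = [
    Attempt(final_answer_correct=True, proof_points=0, max_points=7),
    Attempt(final_answer_correct=True, proof_points=1, max_points=7),
    Attempt(final_answer_correct=True, proof_points=0, max_points=7),
    Attempt(final_answer_correct=False, proof_points=0, max_points=7),
]

print(f"answer-only:   {answer_only_score(attempts):.1%}")
print(f"full solution: {full_solution_score(attempts):.1%}")
```

On this invented data the answer-only metric is high while the full-solution metric stays under 5%, which is the shape of the discrepancy the summary describes.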

Plain English Explanation

AI models have become very good at answering math competition problems, as long as you only look at their final answers. A new paper shows that these models perform far worse than they appear to when their entire reasoning process is evaluated.

Think of it like this: if someone gives ...
