Microsoft Unveils MAI-Thinking-1: 35B Active, 1T Parameters, 97% on AIME 2025

#ai #machinelearning #research #deeplearning

Microsoft's MAI-Thinking-1 hits 97% on AIME 2025 with 35B active params in a 1T MoE model, trained on 30T predominantly human-generated tokens without third-party distillation during pre-training.

Microsoft unveiled MAI-Thinking-1, a 35B active parameter reasoning model scoring 97% on AIME 2025. The model is the first output of what Microsoft calls a 'hill-climbing machine' — a closed-loop pipeline for iteratively improving reasoning models.

Key facts

MAI-Thinking-1: 35B active, 1T total MoE parameters.
97.0% on AIME 2025 math benchmark.
87.7% on LiveCodeBench v6 coding benchmark.
52.8% on SWE-Bench Pro software engineering benchmark.
Trained from scratch on 30T predominantly human-generated tokens.

Microsoft has introduced MAI-Thinking-1, a reasoning model with 35 billion active parameters inside a 1 trillion total parameter mixture-of-experts (MoE) architecture. The model achieves 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro — strong scores for its active parameter count [According to @rohanpaul_ai].

The Hill-Climbing Pipeline

Microsoft frames MAI-Thinking-1 as the first release from a systematic process it calls a 'hill-climbing machine.' This pipeline integrates data generation, training setup, reward design, safety testing, and evaluation into a single iterative loop. The implication: Microsoft plans to release increasingly capable reasoning models by feeding each cycle's outputs back into the next training run.
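
Microsoft has not described how the loop is implemented, so the following is only a generic illustration of the hill-climbing idea the name refers to: propose a change, evaluate it, and keep it only if the score improves. The function names and the toy objective are hypothetical stand-ins, not Microsoft's pipeline.

```python
import random


def hill_climb(initial, propose, evaluate, cycles, seed=0):
    """Generic hill climbing: keep a candidate only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    history = [best_score]  # best score seen after each cycle
    for _ in range(cycles):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


# Hypothetical toy stand-ins: a "model" is one number, and the "benchmark"
# rewards being close to 3.0. A real pipeline would train and evaluate models.
def toy_propose(x, rng):
    return x + rng.uniform(-1.0, 1.0)


def toy_evaluate(x):
    return -(x - 3.0) ** 2


best, best_score, history = hill_climb(0.0, toy_propose, toy_evaluate, cycles=200)
print(f"best candidate: {best:.3f}, score: {best_score:.5f}")
```

Because a candidate is accepted only when it beats the current best, the recorded score never decreases from one cycle to the next. That is the property the "hill-climbing" label implies for successive model releases.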

The base model was trained from scratch on 30 trillion tokens, predominantly human-generated. Microsoft explicitly states it avoided distillation from third-party models during pre-training — a notable claim given the industry's reliance on synthetic data from frontier models.

Performance and Architecture

Microsoft used reinforcement learning to train MAI-Thinking-1 on math reasoning, coding, tool use, helpfulness, and safety. The MoE design activates only 35B parameters per token, keeping per-token inference compute closer to that of a dense 35B model while retaining the representational capacity of a 1T parameter system. All 1T parameters still have to be stored and served, so the memory footprint remains that of the full model.
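
As a rough back-of-the-envelope check on the compute claim, the sketch below works out the active fraction and an approximate per-token forward cost. The parameter counts come from the article. The "about 2 FLOPs per parameter per token" rule is a common approximation and an assumption here; it ignores attention and routing overhead.

```python
ACTIVE_PARAMS = 35 * 10**9  # 35B active parameters (from the article)
TOTAL_PARAMS = 1 * 10**12   # 1T total parameters (from the article)


def active_fraction(active, total):
    """Share of the model's parameters used for each token."""
    return active / total


def approx_forward_flops_per_token(params):
    # Assumption: roughly 2 FLOPs per parameter per token for a forward
    # pass. This is a rule of thumb, not a figure from Microsoft.
    return 2 * params


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = approx_forward_flops_per_token(TOTAL_PARAMS)
ratio = dense_flops / moe_flops

print(f"active fraction: {frac:.1%}")
print(f"approx. forward FLOPs per token (35B active): {moe_flops:.2e}")
print(f"approx. forward FLOPs per token (dense 1T):   {dense_flops:.2e}")
print(f"dense 1T costs about {ratio:.1f}x more compute per token")
```

Under these assumptions, each token touches 3.5% of the parameters, and a hypothetical dense 1T model would need roughly 28.6 times the per-token compute.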

The notable angle is that Microsoft is positioning this as a reproducible process, not a one-off model. If the hill-climbing machine delivers consistent gains per cycle, Microsoft could close the gap with OpenAI and Anthropic on reasoning benchmarks without needing to match their total compute spend per model. In that scenario the pipeline becomes the moat, not the checkpoint.

What to watch

Watch for the next model in Microsoft's hill-climbing pipeline, likely within 6-12 months, and whether scores on AIME and SWE-Bench Pro improve by more than 5 points. Also track how much detail Microsoft publishes about the pipeline architecture itself; if the process stays undocumented, that would suggest Microsoft treats it as a trade secret.

[Updated 03 Jun via simon_willison]

A technical paper accompanying the release [per Simon Willison] reveals that MAI-Thinking-1 was trained on a proprietary web crawl of 1.2 trillion pages, filtered down to 794 billion pages. The filtering used a UT1 block list and a proprietary AI-content detection model to remove adult content, piracy, and AI-generated text. The paper also states that Common Crawl contributed 24.2 billion pages after similar filtering and deduplication. That confirms the model relies on public web data despite Microsoft's claim of "clean and commercially licensed" training material.
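
To put the filtering numbers in proportion, the sketch below computes the retention rate from the reported page counts and shows the general shape of a two-stage filter, with a domain block list followed by an AI-content score threshold. The domains, the threshold, and the `keep_page` helper are hypothetical; the paper's actual block-list handling and detection model are not described here.

```python
from urllib.parse import urlparse

CRAWLED_PAGES = 1_200 * 10**9  # 1.2 trillion pages (from the article)
KEPT_PAGES = 794 * 10**9       # 794 billion pages (from the article)

retention = KEPT_PAGES / CRAWLED_PAGES
removed = CRAWLED_PAGES - KEPT_PAGES
print(f"kept {retention:.1%} of crawled pages, removed {removed:,}")

# Hypothetical stand-ins for a domain block list and an AI-content detector.
BLOCKED_DOMAINS = {"blocked.example"}
AI_SCORE_THRESHOLD = 0.5  # assumed cutoff; higher score = more likely AI text


def keep_page(url, ai_score):
    """Drop pages on blocked domains or with a high AI-content score."""
    host = urlparse(url).hostname or ""
    if host in BLOCKED_DOMAINS:
        return False
    return ai_score < AI_SCORE_THRESHOLD


pages = [
    ("https://ok.example/post", 0.1),       # kept
    ("https://blocked.example/x", 0.0),     # dropped: blocked domain
    ("https://ok.example/generated", 0.9),  # dropped: high AI score
]
kept = [url for url, score in pages if keep_page(url, score)]
print(kept)
```

By the reported counts, the filters removed 406 billion pages, which means about two thirds of the original crawl survived.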

Originally published on gentic.news