Damien Gallagher

Posted on • Originally published at buildrlab.com

MiniMax‑M2.5: “Unlimited” Agentic Coding at $1/hour? Here’s what to take seriously

MiniMax just published a launch post for MiniMax‑M2.5 (Feb 12, 2026) and they’re coming in hot: agentic coding + tool use, big benchmark numbers, and a very blunt message on pricing — run frontier-ish agents without caring about cost.

Source (primary): https://www.minimaxi.com/news/minimax-m25

What MiniMax is claiming

From the announcement:

  • SWE‑bench Verified: 80.2%
  • BrowseComp: 76.3%
  • Multi‑SWE‑bench: 51.3%
  • Faster long-horizon runs: they claim ~37% faster than M2.1 on SWE‑bench Verified end-to-end.
  • Two versions with similar quality but different speed/price:
    • ~100 tokens/sec version: they frame this as “$1 for one hour of continuous work” at 100 TPS
    • ~50 tokens/sec version: even cheaper (they quote $0.3/hour)

They’re also positioning it as an “architect-like” coding agent: planning before coding, doing spec-style decomposition, multi-language coverage, and operating in real environments (not just toy front-end demos).

Why this matters (even if you ignore the hype)

The most interesting part isn’t just the benchmark flex — it’s the product thesis:

1) Agent cost is becoming a first-class feature.
If you can run agents for hours (retries, tool calls, browsing, code search) without watching the meter, it changes what teams are willing to automate.

2) Speed + token efficiency is the real unlock for long-horizon work.
A model that can solve the same task with fewer steps and less “thinking fluff” is effectively more capable in production, even if raw IQ is similar.

3) The competition is shifting from “best model” to “best agent system.”
MiniMax explicitly talks about scaffolds, tool calling, search, and RL training in real environments. That’s the same direction we’re seeing everywhere: reliability comes from the systems around the model.

How I’d sanity-check M2.5 (before switching anything)

If you’re tempted to try it for BuildrLab-style work (Next.js + serverless + AWS), here’s what I’d test first:

  • Spec-first behavior: does it consistently propose a plan, edge cases, and a test strategy before writing code?
  • Patch quality on messy repos: not greenfield demos — a repo with existing conventions, lint rules, CI pipelines.
  • Tool discipline: does it spam tools, or does it use search/tool calls with intent?
  • End-to-end “agent runtime” cost: real tasks with retries, browser steps, and failures — not just single-shot code completion.

BuildrLab take

If MiniMax‑M2.5 is even close to these claims, the obvious business implication is:

  • High-volume agent workflows become cheaper to run → more automation becomes economically viable.
  • The moat for a consultancy/product shop shifts further toward process + scaffolding + guardrails, not model selection.

This is exactly why I’ve been paying attention to things like WebMCP and open RL tooling: agent reliability is turning into a standards + infrastructure game.

