Jovan Chan

Posted on • Originally published at aicoderscope.com

MiMo Code Review 2026: Xiaomi's Open-Source Claude Code Challenger and Whether the 200-Step Benchmark Claims Hold Up

This article was originally published on aicoderscope.com

TL;DR: MiMo Code is Xiaomi's MIT-licensed fork of OpenCode that adds a persistent cross-session memory system and bundles free, limited-time access to a 1.02-trillion-parameter model. Xiaomi's own numbers say it beats Claude Code on three benchmarks, but those scores are self-reported, compare against Claude Code on Sonnet 4.6 (not Opus 4.8), and appear on no independent leaderboard. The memory architecture is the real reason to try it.

| | MiMo Code | Claude Code | OpenCode |
|---|---|---|---|
| Best for | Ultra-long agentic tasks (200+ steps), free frontier-model trial | Anthropic-stack teams wanting the strongest model + polish | BYOK / local-model workflows, full provider freedom |
| Price / Cost | $0 (MIT); free MiMo-V2.5-Pro via "MiMo Auto" for now | $20/mo Pro, $100/mo Max, or API billing | $0 core (BYOK); $10/mo Go tier |
| The catch | Bundled model is cloud-only and time-limited; benchmarks unaudited | Model lock to Anthropic | Configuration overhead; default routes to a third-party model |

Honest take: Try MiMo Code for the memory system and the free 1T-model window, not because it "beats Claude Code" — that claim leans on a benchmark setup Xiaomi designed and graded itself. For day-to-day work where you control the model, Claude Code and OpenCode are still the safer picks.


What MiMo Code actually is

Strip away the launch-day headlines and MiMo Code is a fork of OpenCode, the MIT-licensed terminal coding agent. Xiaomi's MiMo team kept the parts that made OpenCode good — multiple model providers, the TUI, language-server (LSP) integration, MCP support, and the plugin system — and bolted on the features they think win long tasks.

The additions, straight from the repository README:

  • Persistent memory: project memory, checkpoints, notes, and task tracking that survive across sessions
  • Intelligent context reconstruction with explicit token budgeting
  • Subagent orchestration with parallel execution
  • Goal and stop-condition evaluation via an independent judge model
  • Compose mode for specs-driven development (the same idea AWS Kiro built its whole product around — see our Kiro review)
  • Voice input (needs a MiMo login or compatible provider)
  • "Dream & distill" — a background pass that extracts reusable knowledge and turns repeated workflows into shortcuts

Version 0.1.0 was open-sourced on June 10, 2026; v0.1.1 followed on June 15. The source code is MIT-licensed, though the bundled model and hosted service are governed separately by Use Restrictions and Xiaomi's MiMo Terms of Service. Read those before you wire it into anything commercial.
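To make the first two additions in the list above concrete, here is a minimal Python sketch of the general idea: persist notes outside the live context, then rebuild a per-turn context that fits a token budget. This is an illustration only, not MiMo Code's implementation. The `Note` schema, the tag-overlap ranking, and the four-characters-per-token estimate are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List, Set


@dataclass
class Note:
    """One persisted decision or fact (hypothetical schema, not MiMo Code's)."""
    step: int
    text: str
    tags: List[str]


def save_notes(path: Path, notes: List[Note]) -> None:
    # Persist outside the live context window so a later session can reload it.
    path.write_text(json.dumps([asdict(n) for n in notes], indent=2))


def load_notes(path: Path) -> List[Note]:
    return [Note(**raw) for raw in json.loads(path.read_text())]


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def rebuild_context(notes: List[Note], query_tags: Set[str], budget: int) -> List[Note]:
    """Pick relevant notes, most relevant and most recent first, within a token budget."""
    ranked = sorted(
        notes,
        key=lambda n: (len(query_tags & set(n.tags)), n.step),
        reverse=True,
    )
    chosen: List[Note] = []
    used = 0
    for note in ranked:
        if not query_tags & set(note.tags):
            continue  # no tag overlap: irrelevant to this turn
        cost = estimate_tokens(note.text)
        if used + cost <= budget:
            chosen.append(note)
            used += cost
    return chosen
```

A real agent would presumably rank by something richer than tag overlap and count tokens with the model's own tokenizer. The point of the sketch is the shape: state lives on disk, and only the slice that fits the budget goes back into the prompt.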

The benchmark claims — and the asterisks that matter

Here's the comparison Xiaomi published. In the table below, Claude Code ran on Claude Sonnet 4.6; the same-model runs, with MiMo-V2.5-Pro inside both harnesses, are the ones Xiaomi's headline framing relies on.

| Benchmark | MiMo Code | Claude Code (Sonnet 4.6) | Gap |
|---|---|---|---|
| SWE-bench Verified | 82% | 79% | +3 pts |
| SWE-bench Pro | 62% | 55% | +7 pts |
| Terminal-Bench 2 | 73% | 69% | +4 pts |

Xiaomi's headline framing: run the same MiMo-V2.5-Pro inside both harnesses and MiMo Code still comes out roughly five points ahead on each benchmark — their argument that "the agent architecture matters as much as the model." They also ran a double-blind A/B with 576 developers: under 200 execution steps the two tools split about 50/50, but past 200 steps MiMo Code reportedly won more than 65% of head-to-head matchups.

Now the parts the press release skips:

These are self-reported. MiMo Code does not appear on the official Terminal-Bench 2.0 leaderboard or any independent SWE-bench board. Xiaomi designed the test harness, picked the configuration, and graded the runs.

The Claude Code comparison is on Sonnet 4.6, not Opus 4.8. That matters a lot. On the verified Terminal-Bench 2.1 leaderboard, Claude Code paired with Opus 4.8 scores 78.9%, nearly ten points above the 69% Xiaomi attributes to "Claude Code." In other words, Xiaomi benchmarked against Claude Code's mid-tier model and reported it as Claude Code's ceiling.

The independent #1 is something else entirely. On Terminal-Bench 2.0, Codex CLI with GPT-5.5 holds the top spot at 82.0% (83.4% on the 2.1 update) — above MiMo Code's self-reported 73%. If you want the agent that wins audited benchmarks today, the honest answer is still the Codex CLI / Claude Code tier, not MiMo Code.

None of this makes the numbers fake. Vendor benchmarks are a starting point, not a verdict. But "beats Claude Code" is doing a lot of work in the headlines, and the fine print deflates most of it.
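For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the gaps from the figures quoted in this section. It adds no new data, and the variable names are mine. The cross-leaderboard differences mix Terminal-Bench versions (2, 2.0, and 2.1, as quoted), so read them as rough indications rather than like-for-like comparisons.

```python
# Scores quoted in this article, in percent.
# MiMo Code figures are self-reported by Xiaomi.
reported = {
    "SWE-bench Verified": (82.0, 79.0),  # (MiMo Code, Claude Code on Sonnet 4.6)
    "SWE-bench Pro": (62.0, 55.0),
    "Terminal-Bench 2": (73.0, 69.0),
}

gaps = {name: mimo - claude for name, (mimo, claude) in reported.items()}
mean_gap = sum(gaps.values()) / len(gaps)

# Leaderboard figures quoted above for other agent/model pairings.
claude_code_opus = 78.9  # Terminal-Bench 2.1, Claude Code + Opus 4.8
codex_cli_gpt = 82.0     # Terminal-Bench 2.0, Codex CLI + GPT-5.5

mimo_tb, claude_sonnet_tb = reported["Terminal-Bench 2"]
opus_over_sonnet_run = claude_code_opus - claude_sonnet_tb
mimo_vs_opus = mimo_tb - claude_code_opus
mimo_vs_codex = mimo_tb - codex_cli_gpt

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} pts")
print(f"Mean reported gap: {mean_gap:+.1f} pts")
print(f"Opus 4.8 run over Xiaomi's Sonnet 4.6 figure: {opus_over_sonnet_run:+.1f} pts")
print(f"MiMo Code vs Claude Code + Opus 4.8: {mimo_vs_opus:+.1f} pts")
print(f"MiMo Code vs Codex CLI + GPT-5.5: {mimo_vs_codex:+.1f} pts")
```

The sign flips on the last two lines are the whole story: the reported lead holds against the Sonnet 4.6 configuration, not against the best audited pairings.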

Installing it

MiMo Code is terminal-native on Linux, macOS, and WSL. Two install paths:

```bash
# macOS / Linux: official installer
curl -fsSL https://mimo.xiaomi.com/install | bash

# Windows (via WSL) or any npm environment
npm install -g @mimo-ai/cli
```

Confirm it landed:

```console
$ mimo --version
mimo 0.1.1

$ mimo
# launches the TUI; on first run it prompts you to either log in
# to MiMo or configure your own provider (OpenAI-compatible, Anthropic, etc.)
```
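If you want the same check inside a setup script rather than by hand, a minimal Python sketch could look like the one below. It relies only on the `mimo --version` command shown above; the helper name `cli_version` is made up for this example, and the output format of the CLI is not assumed beyond "some text on stdout."

```python
import shutil
import subprocess
from typing import Optional


def cli_version(binary: str = "mimo") -> Optional[str]:
    """Return the stdout of `<binary> --version`, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None  # not installed, or not on PATH for this shell
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = cli_version()
    print(version if version else "mimo not found on PATH")
```

Returning `None` instead of raising keeps the helper easy to drop into a larger bootstrap script, which can then decide whether a missing CLI is fatal.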

The zero-configuration path is MiMo Auto — an anonymous channel that routes to MiMo-V2.5-Pro free of charge "for a limited time." You can start coding without an API key, which is the single biggest reason to try it this month: it's a free trial of a 1T-parameter frontier-class model with no signup friction. When that window closes, you fall back to bring-your-own-key, exactly like upstream OpenCode.

The model under the hood — and why you can't run it locally

The bundled model, MiMo-V2.5-Pro, is a sparse mixture-of-experts design:

| Spec | Value |
|---|---|
| Total parameters | 1.02 trillion |
| Active parameters | 42 billion |
| Context window | 1M tokens |
| Attention | Hybrid SWA + global, 6:1 ratio, 128-token window |
| Pre-training | 27T tokens, FP8 mixed precision |

The 42B active parameters mean inference compute is closer to a 42B dense model — but you still need enough memory to hold the full 1.02T-parameter weights resident. That puts genuine local inference firmly out of reach for any single-workstation setup. There is no 8GB, 24GB, or even 96GB VRAM tier that runs this thing; you're looking at a multi-GPU server. For the overwhelming majority of readers, MiMo-V2.5-Pro is a cloud model you access through MiMo Auto, full stop.
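A back-of-the-envelope calculation shows why. The Python sketch below estimates the memory needed for the weights alone at a few storage widths. The parameter counts come from the spec table; the bytes-per-parameter values are assumptions (the article does not say what precision the hosted model is served at), and KV cache, activations, and runtime overhead are ignored, so real requirements would be higher.

```python
TOTAL_PARAMS = 1.02e12  # from the spec table
ACTIVE_PARAMS = 42e9    # from the spec table

# Assumed storage widths in bytes per parameter; the serving precision is not stated.
BYTES_PER_PARAM = {"FP16/BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Memory for weights alone, in decimal gigabytes (no KV cache, no activations)."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name:>10}: {weight_gb(TOTAL_PARAMS, width):,.0f} GB of weights")

# Share of the model that is active for any one token in the MoE design.
print(f"Active share per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions the weights come to roughly 2,040 GB at 16-bit, 1,020 GB at FP8, and 510 GB at 4-bit. Even the most aggressive case is more than five times a 96GB card, while only about 4% of the parameters do work on any given token.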

If your actual goal is local, private AI coding (the most common request we get), the bundled model isn't the tool for it. Keep MiMo Code's harness if you like it, but point it at a small local model instead, the same way you would with OpenCode. Our OpenCode + Ollama setup guide covers that workflow, and runaihome.com's local-model-by-VRAM guide tells you which quantized coder fits your card. For FOSS-first tooling more broadly, aifoss.dev tracks the open-weight coding stack.

Where it actually shines: long tasks that would lose context

The persistent-memory pitch isn't marketing fluff — it targets a real failure mode. Run any agent through a 150-step refactor and you watch it forget decisions it made an hour earlier: it re-reads files it already understood, re-derives the project's conventions, and occasionally undoes its own work because the relevant context fell out of the window.

I hit exactly this on a long migration with a stock OpenCode setup last month: around step 90 the agent "rediscovered" a config file it had already rewritten and proposed reverting it. MiMo Code's checkpoint-and-notes system is built to stop that — it persists task state and project memory outside the live context window, then reconstructs only what's relevant on each turn. Whether it's worth switching tools for
