This article was originally published on aicoderscope.com
TL;DR: MiMo Code is Xiaomi's MIT-licensed fork of OpenCode that adds a persistent cross-session memory system and bundles free, limited-time access to a 1.02-trillion-parameter model. Xiaomi's own numbers say it beats Claude Code on three benchmarks, but those scores are self-reported, compare against Claude Code on Sonnet 4.6 (not Opus 4.8), and appear on no independent leaderboard. The memory architecture is the real reason to try it.
| MiMo Code | Claude Code | OpenCode | |
|---|---|---|---|
| Best for | Ultra-long agentic tasks (200+ steps), free frontier-model trial | Anthropic-stack teams wanting the strongest model + polish | BYOK / local-model workflows, full provider freedom |
| Price / Cost | $0 (MIT); free MiMo-V2.5-Pro via "MiMo Auto" for now | $20/mo Pro, $100/mo Max, or API billing | $0 core (BYOK); $10/mo Go tier |
| The catch | Bundled model is cloud-only and time-limited; benchmarks unaudited | Model lock to Anthropic | Configuration overhead; default routes to a third-party model |
Honest take: Try MiMo Code for the memory system and the free 1T-model window, not because it "beats Claude Code" — that claim leans on a benchmark setup Xiaomi designed and graded itself. For day-to-day work where you control the model, Claude Code and OpenCode are still the safer picks.
What MiMo Code actually is
Strip away the launch-day headlines and MiMo Code is a fork of OpenCode, the MIT-licensed terminal coding agent. Xiaomi's MiMo team kept the parts that made OpenCode good — multiple model providers, the TUI, language-server (LSP) integration, MCP support, and the plugin system — and bolted on the features they think win long tasks.
The additions, straight from the repository README:
- Persistent memory: project memory, checkpoints, notes, and task tracking that survive across sessions
- Intelligent context reconstruction with explicit token budgeting
- Subagent orchestration with parallel execution
- Goal and stop-condition evaluation via an independent judge model
- Compose mode for specs-driven development (the same idea AWS Kiro built its whole product around — see our Kiro review)
- Voice input (needs a MiMo login or compatible provider)
- "Dream & distill" — a background pass that extracts reusable knowledge and turns repeated workflows into shortcuts
V0.1.0 was open-sourced on June 10, 2026; v0.1.1 followed on June 15. The source code is MIT-licensed, though the bundled model and hosted service carry separate Use Restrictions and Xiaomi's MiMo Terms of Service — read those before you wire it into anything commercial.
The benchmark claims — and the asterisks that matter
Here's the comparison Xiaomi published. Both harnesses were run with the same MiMo-V2.5-Pro model where noted, and Claude Code was run on Claude Sonnet 4.6.
| Benchmark | MiMo Code | Claude Code (Sonnet 4.6) | Gap |
|---|---|---|---|
| SWE-bench Verified | 82% | 79% | +3 pts |
| SWE-bench Pro | 62% | 55% | +7 pts |
| Terminal-Bench 2 | 73% | 69% | +4 pts |
Xiaomi's headline framing: run the same MiMo-V2.5-Pro inside both harnesses and MiMo Code still comes out roughly five points ahead on each benchmark — their argument that "the agent architecture matters as much as the model." They also ran a double-blind A/B with 576 developers: under 200 execution steps the two tools split about 50/50, but past 200 steps MiMo Code reportedly won more than 65% of head-to-head matchups.
Now the parts the press release skips:
These are self-reported. MiMo Code does not appear on the official Terminal-Bench 2.0 leaderboard or any independent SWE-bench board. Xiaomi designed the test harness, picked the configuration, and graded the runs.
The Claude Code comparison is on Sonnet 4.6, not Opus 4.8. That matters a lot. On the verified Terminal-Bench 2.1 leaderboard, Claude Code paired with Opus 4.8 scores 78.9% — nine points above the 69% Xiaomi attributes to "Claude Code." In other words, Xiaomi benchmarked against Claude Code's mid-tier model and reported it as Claude Code's ceiling.
The independent #1 is something else entirely. On Terminal-Bench 2.0, Codex CLI with GPT-5.5 holds the top spot at 82.0% (83.4% on the 2.1 update) — above MiMo Code's self-reported 73%. If you want the agent that wins audited benchmarks today, the honest answer is still the Codex CLI / Claude Code tier, not MiMo Code.
None of this makes the numbers fake. Vendor benchmarks are a starting point, not a verdict. But "beats Claude Code" is doing a lot of work in the headlines, and the fine print deflates most of it.
Installing it
MiMo Code is terminal-native on Linux, macOS, and WSL. Two install paths:
# macOS / Linux — official installer
curl -fsSL https://mimo.xiaomi.com/install | bash
# Windows (via WSL) or any npm environment
npm install -g @mimo-ai/cli
Confirm it landed:
$ mimo --version
mimo 0.1.1
$ mimo
# launches the TUI; on first run it prompts you to either log in
# to MiMo or configure your own provider (OpenAI-compatible, Anthropic, etc.)
The zero-configuration path is MiMo Auto — an anonymous channel that routes to MiMo-V2.5-Pro free of charge "for a limited time." You can start coding without an API key, which is the single biggest reason to try it this month: it's a free trial of a 1T-parameter frontier-class model with no signup friction. When that window closes, you fall back to bring-your-own-key, exactly like upstream OpenCode.
The model under the hood — and why you can't run it locally
The bundled model, MiMo-V2.5-Pro, is a sparse mixture-of-experts design:
| Spec | Value |
|---|---|
| Total parameters | 1.02 trillion |
| Active parameters | 42 billion |
| Context window | 1M tokens |
| Attention | Hybrid SWA + global, 6:1 ratio, 128-token window |
| Pre-training | 27T tokens, FP8 mixed precision |
The 42B active parameters mean inference compute is closer to a 42B dense model — but you still need enough memory to hold the full 1.02T-parameter weights resident. That puts genuine local inference firmly out of reach for any single-workstation setup. There is no 8GB, 24GB, or even 96GB VRAM tier that runs this thing; you're looking at a multi-GPU server. For the overwhelming majority of readers, MiMo-V2.5-Pro is a cloud model you access through MiMo Auto, full stop.
If your actual goal is local, private, AI coding — the most common request we get — MiMo Code isn't the tool. Keep MiMo Code's harness if you like it, but point it at a small local model instead, the same way you would with OpenCode. Our OpenCode + Ollama setup guide covers that workflow, and runaihome.com's local-model-by-VRAM guide tells you which quantized coder fits your card. For FOSS-first tooling more broadly, aifoss.dev tracks the open-weight coding stack.
Where it actually shines: long tasks that would lose context
The persistent-memory pitch isn't marketing fluff — it targets a real failure mode. Run any agent through a 150-step refactor and you watch it forget decisions it made an hour earlier: it re-reads files it already understood, re-derives the project's conventions, and occasionally undoes its own work because the relevant context fell out of the window.
I hit exactly this on a long migration with a stock OpenCode setup last month: around step 90 the agent "rediscovered" a config file it had already rewritten and proposed reverting it. MiMo Code's checkpoint-and-notes system is built to stop that — it persists task state and project memory outside the live context window, then reconstructs only what's relevant on each turn. Whether it's worth switching tools for
Top comments (0)