MiniMax M2.1 achieves 74% on SWE-bench Verified and 88.6% on VIBE with 10B active parameters - a $0.30/1M-token "Digital Employee" for agentic workflows.
Key Statistics
- 230B Total Parameters (MoE)
- 10B Active Parameters
- 197K Context Window
- 88.6% VIBE Benchmark
Key Takeaways
- 10B Active Parameters: 230B MoE architecture with only 10B active per token - among the most efficient SOTA-class models
- 88.6% VIBE Benchmark: 74% SWE-bench Verified and industry-leading scores on full-stack app building
- 90% Cost Reduction: $0.30/1M input tokens - approximately 10% of Claude Sonnet 4.5's price
- Digital Employee: End-to-end office automation beyond just coding - admin, PM, and dev workflows
- Multilingual Excellence: Strong performance in Rust, Java, Go, Kotlin, TypeScript, and other programming languages
- Framework Support: Native compatibility with Claude Code, Cline, Kilo, Roo Code, and BlackBox
Table of Contents
- What Is MiniMax M2.1
- Company Background
- Technical Specifications
- Key Improvements
- Benchmark Performance
- Digital Employee
- Pricing & Access
- Getting Started
- When to Use M2.1
What Is MiniMax M2.1
Breaking: MiniMax M2.1 was released December 23, 2025 - just one day after GLM-4.7. The release of two major Chinese AI models within 24 hours signals accelerating competition in the open-source coding model space.
MiniMax M2.1 represents a fundamental shift in how we think about AI coding assistants. Released December 23, 2025, it's not just another model optimized for chat - it's designed from the ground up to be a "Digital Employee" capable of handling end-to-end workflows in real production environments.
The key innovation is efficiency: M2.1 uses a Mixture-of-Experts (MoE) architecture with 230 billion total parameters but only activates 10 billion per token. This means you get access to the knowledge of a 230B model at the inference cost of a 10B model - making it exceptionally fast and affordable for the rapid-fire cycles of agentic workflows.
The Core Value Proposition
Frontier performance at 10% the cost. MiniMax M2.1 achieves 74% on SWE-bench Verified - competitive with Claude Sonnet 4.5 - while costing approximately $0.30/1M input tokens compared to Claude's $3.00/1M.
This isn't just about saving money. The 10B active parameter footprint means M2.1 is significantly faster for agentic loops - the Plan -> Code -> Run -> Fix cycles that define modern AI-assisted development.
Core Capabilities
- Multilingual Coding: Systematic enhancements in Rust, Java, Go, C++, Kotlin, TypeScript, and more - covering the complete stack from systems to applications.
- Digital Employee: End-to-end office automation: admin tasks, project management, data analysis, and software development workflows.
- Vibe Coding: Improved design comprehension and aesthetic output for web apps, 3D simulations, and native mobile development.
Company Background: MiniMax
MiniMax is part of China's "AI Tigers" - the leading AI startups alongside DeepSeek, Zhipu (Z.ai), Baichuan, and Moonshot/Kimi. Founded in December 2021 and headquartered in Shanghai, MiniMax has rapidly grown to a $4 billion valuation with backing from tech giants and strategic investors.
Company Profile
| Attribute | Value |
|---|---|
| Founded | December 2021 |
| Headquarters | Shanghai, China |
| Valuation | $4 billion |
| Total Funding | $850M+ (since 2023) |
| IPO Target | Hong Kong Q1 2026 |
Key Investors
- Alibaba (Lead)
- Tencent
- MiHoYo
- Hillhouse
- HongShan
- IDG Capital
Notable: MiHoYo (Genshin Impact developer) investment signals gaming/creative AI applications. 70% of revenue comes from overseas markets.
Product Portfolio
| Product | Category | Notes |
|---|---|---|
| Talkie | AI Companion App | 29M MAU, #4 US AI app downloads |
| Hailuo AI | Video Generation | Competing with OpenAI Sora in AI video generation |
| Conch AI | Educational AI | Strong presence in Asian education markets |
| MiniMax Agent | AI Agent Platform | Built on M2.1, primary offering for developers |
IPO Context: M2.1's release comes just days after MiniMax passed the Hong Kong Stock Exchange listing hearing (December 21, 2025). The model launch appears strategically timed to build momentum before their planned Q1 2026 IPO.
Technical Specifications
Architecture Deep Dive
| Specification | M2.1 | M2 (Previous) |
|---|---|---|
| Architecture | Sparse MoE | Sparse MoE |
| Total Parameters | 230B | 230B |
| Active Parameters | 10B per token | 10B per token |
| Context Window | 197K tokens | 128K tokens |
| License | MIT (Open-Source) | MIT (Open-Source) |
| Sparsity Ratio | ~23:1 | ~23:1 |
| Recommended Params | temp: 1.0, top_p: 0.95, top_k: 40 | temp: 1.0, top_p: 0.95 |
Why 10B Active Matters
The 23:1 sparsity ratio is the key to M2.1's efficiency. For every token processed, only 10B of the 230B parameters are activated. This design choice has three major implications:
- Speed: Inference is dramatically faster than dense models of similar capability
- Cost: Lower compute per token translates directly to lower API pricing
- Agentic Loops: Fast sequential calls enable responsive Plan -> Code -> Run -> Fix cycles
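As a back-of-the-envelope sketch, the efficiency claim can be quantified. The figures below are illustrative, using the common approximation that a decoder forward pass costs roughly 2 × (active parameters) FLOPs per token:

```python
# Back-of-the-envelope comparison of a sparse MoE forward pass vs. a
# hypothetical dense model of the same total size. Illustrative only.

TOTAL_PARAMS_B = 230   # total parameters, billions
ACTIVE_PARAMS_B = 10   # parameters activated per token, billions

# Sparsity ratio: fraction of the network consulted per token.
sparsity_ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B

# Per-token compute scales with *active* parameters (~2 * params FLOPs).
def flops_per_token(active_params_b: float) -> float:
    return 2 * active_params_b * 1e9

dense_flops = flops_per_token(TOTAL_PARAMS_B)  # dense 230B model
moe_flops = flops_per_token(ACTIVE_PARAMS_B)   # M2.1-style sparse MoE

print(f"sparsity ratio: {sparsity_ratio:.0f}:1")            # 23:1
print(f"compute saving per token: {dense_flops / moe_flops:.0f}x")  # 23x
```

This is why a 23:1 sparsity ratio translates almost directly into the ~10x pricing gap discussed below: per-token compute tracks the 10B active slice, not the 230B total.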
Key Improvements Over M2
M2 (released October 2025) focused on cost and accessibility. M2.1 shifts focus to real-world complex tasks - particularly usability across more programming languages and office scenarios.
Multi-Language Programming Excellence
Real-world systems are polyglot. M2.1 systematically enhances capabilities across the full development stack:
| Level | Languages |
|---|---|
| Systems Level | Rust, C++, Golang |
| Enterprise | Java, Kotlin |
| Web & Mobile | TypeScript, JavaScript, Objective-C, Swift |
Vibe Coding & Aesthetic Design
M2.1 addresses mobile development - a widely recognized weakness across the industry:
- Native App Mastery: Significantly strengthened Android (Kotlin) and iOS (Swift/Objective-C) development
- Design Comprehension: Improved understanding of layout, typography, and color schemes
- 3D & Simulation: Complex interactions, scientific visualizations, high-quality 3D scenes
Interleaved Thinking Architecture
M2.1 is one of the first open-source models to systematically introduce Interleaved Thinking:
- Composite Instructions: Handles multi-step office workflows with integrated execution
- Concise Outputs: More efficient thought chains, lower token consumption
- Self-Correction: Reads errors, adjusts immediately without explicit prompting
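The self-correction behavior above can be sketched as a minimal Plan -> Code -> Run -> Fix loop. `call_model` is a hypothetical stub standing in for an M2.1 API call, and the error-feedback format is illustrative, not MiniMax's actual protocol:

```python
# Minimal sketch of the Plan -> Code -> Run -> Fix loop that interleaved
# thinking serves: run generated code, feed any error back, retry.

def call_model(prompt: str) -> str:
    """Stub: a real implementation would call the M2.1 chat endpoint."""
    # Simulate self-correction: after seeing the NameError in the prompt,
    # the "model" returns fixed code.
    if "NameError" in prompt:
        return "result = sum([1, 2, 3])"
    return "result = total([1, 2, 3])"  # buggy: `total` is undefined

def run_snippet(code: str) -> tuple[bool, str]:
    """Execute generated code, capturing any exception as feedback."""
    scope: dict = {}
    try:
        exec(code, scope)
        return True, str(scope.get("result"))
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"

prompt = "Sum the list [1, 2, 3] and store it in `result`."
for attempt in range(3):
    code = call_model(prompt)
    ok, feedback = run_snippet(code)
    if ok:
        break
    # Feed the error back so the model can self-correct on the next pass.
    prompt += f"\nPrevious attempt failed with: {feedback}"

print(ok, feedback)  # True 6
```

The faster each iteration of this loop runs, the more the 10B-active design pays off - which is the practical argument for low active parameters in agentic workflows.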
Benchmark Performance
Software Engineering Benchmarks
| Benchmark | M2.1 | Claude Sonnet 4.5 | GLM-4.7 | DeepSeek V3.2 |
|---|---|---|---|---|
| SWE-bench Verified | 74.0% | ~77% | 73.8% | 73.1% |
| SWE-Multilingual | 72.5% | Lower | - | - |
| Multi-SWE-Bench | 49.4% | Lower | - | - |
| AIME 2025 (Math) | 78.3% | - | 95.7% | 93.1% |
VIBE Benchmark: A New Standard
What is VIBE?
Visual & Interactive Benchmark for Execution
MiniMax introduced VIBE to measure what traditional benchmarks miss: the ability to build functional applications "from zero to one." Unlike SWE-bench, which tests bug fixes, VIBE tests full-stack creation.
The key innovation is Agent-as-a-Verifier (AaaV) - an automated assessment in real runtime environments that judges both code correctness AND visual/interactive quality.
| VIBE Subset | M2.1 Score | What It Tests |
|---|---|---|
| VIBE-Web | 91.5% | Frontend development, layouts, interactions |
| VIBE-Android | 89.7% | Native Android app development (Kotlin) |
| VIBE-iOS | Strong | Native iOS app development (Swift) |
| VIBE-Simulation | Strong | 3D rendering, physics, interactive scenes |
| VIBE-Backend | Strong | API development, database integration |
| VIBE Aggregate | 88.6% | Overall full-stack capability |
Framework Generalization
M2.1 was specifically evaluated across multiple coding agent frameworks, demonstrating exceptional stability:
- Claude Code
- Droid (Factory AI)
- Cline
- Kilo Code
- Roo Code
- BlackBox
Also supports context management conventions: Skill.md, Claude.md/agent.md/.cursorrule, and Slash Commands.
Digital Employee Capabilities
The "Digital Employee" is M2.1's signature feature - moving beyond coding assistance to full office automation. It accepts web content in text form and controls mouse clicks and keyboard inputs via text-based commands.
Administration
- Collect equipment requests from Slack
- Search internal servers for pricing
- Calculate budgets and verify limits
- Record inventory changes
Project Management
- Search for blocked issues
- Consult team members for solutions
- Update issue status
- Track project progress
Software Development
- Find Merge Request history
- Identify file modifications
- Notify relevant team members
- Automate code review workflows
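The text-based control channel described above can be sketched as a thin driver that parses model-emitted action commands and dispatches them to mouse/keyboard handlers. The command grammar here is hypothetical, not MiniMax's documented protocol:

```python
# Illustrative driver for a text-based control channel: the model emits
# plain-text action commands, which a thin runtime parses and dispatches.
# The `click/type/press` grammar is hypothetical.
import re

ACTION_RE = re.compile(r"^(click|type|press)\((.*)\)$")

def parse_action(line: str) -> tuple[str, list[str]]:
    """Parse one model-emitted command like `click(120, 340)`."""
    match = ACTION_RE.match(line.strip())
    if not match:
        raise ValueError(f"unrecognized action: {line!r}")
    verb, raw_args = match.groups()
    # Naive argument split; a real driver would need proper quoting rules.
    args = [a.strip().strip('"') for a in raw_args.split(",") if a.strip()]
    return verb, args

# Example transcript: the model drives mouse and keyboard via text.
transcript = [
    'click(120, 340)',
    'type("Q4 equipment budget")',
    'press("Enter")',
]
actions = [parse_action(line) for line in transcript]
print(actions)
```

Representing UI control as text keeps the whole workflow inside the model's native modality, which is what lets a text-only model act as a "Digital Employee."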
Showcase Demonstrations
MiniMax provides interactive demos showing M2.1's capabilities:
| Project | Technology | Highlights |
|---|---|---|
| 3D Christmas Tree | React Three Fiber | 7,000+ instances, gesture interaction, particle animations |
| 3D Lego Sandbox | Three.js | Grid snapping, collision detection, multi-angle rotation |
| Drum Machine | Web Audio API | 16-step sequencer with glitch effects |
| Photographer Portfolio | HTML/CSS | Brutalist typography, asymmetrical layout |
| Android Gravity Sim | Kotlin | Gyroscope-driven, Easter egg reveals |
| iOS Widget | Swift | Interactive Home Screen widget with animations |
| Rust Security Tool | Rust | CLI + TUI Linux audit tool with risk rating |
Pricing & Access
API Pricing Comparison
| Model | Input (per 1M) | Output (per 1M) | Relative Cost |
|---|---|---|---|
| MiniMax M2.1 | $0.30 | $1.20 | ~10% of Claude |
| M2.1 (OpenRouter) | $0.20-0.27 | $1.06-1.10 | Even cheaper |
| GLM-4.7 | $0.60 | $2.20 | ~15% of Claude |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Baseline |
| DeepSeek-V3.2 | $0.27 | $1.10 | ~10% of Claude |
Cost Comparison Example
At Scale: 10,000 API Calls (100K input + 50K output tokens each)
| Model | Cost |
|---|---|
| Claude Sonnet 4.5 | ~$10,500 |
| MiniMax M2.1 | ~$900 |
Annual savings at moderate usage: $100,000+
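The figures above can be reproduced with a few lines of arithmetic, using the published per-1M-token prices:

```python
# Reproducing the cost comparison above: 10,000 calls, each with
# 100K input and 50K output tokens. Prices are USD per 1M tokens.

CALLS = 10_000
INPUT_TOKENS_PER_CALL = 100_000
OUTPUT_TOKENS_PER_CALL = 50_000

def workload_cost(input_price: float, output_price: float) -> float:
    input_m = CALLS * INPUT_TOKENS_PER_CALL / 1e6    # total input, millions
    output_m = CALLS * OUTPUT_TOKENS_PER_CALL / 1e6  # total output, millions
    return input_m * input_price + output_m * output_price

claude = workload_cost(3.00, 15.00)  # Claude Sonnet 4.5
m21 = workload_cost(0.30, 1.20)      # MiniMax M2.1

print(f"Claude Sonnet 4.5: ${claude:,.0f}")  # $10,500
print(f"MiniMax M2.1:      ${m21:,.0f}")     # $900
```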
Access Methods
Hosted API:
- MiniMax Platform (platform.minimax.io)
- OpenRouter (openrouter.ai)
- Fireworks AI (fireworks.ai)
Self-Hosted:
- HuggingFace (MiniMaxAI/MiniMax-M2.1)
- ModelScope (Available)
- Ollama (`ollama pull minimax-m2.1`)
Getting Started
Claude Code Integration
Configure settings.json:

```json
{
  "apiProvider": "openrouter",
  "openRouterApiKey": "your-openrouter-key",
  "apiModelId": "minimax/minimax-m2.1",
  "customInstructions": "Use Interleaved Thinking for complex tasks"
}
```
API Quick Start
Python Example:

```python
import openai

# Point the OpenAI-compatible client at the MiniMax endpoint.
client = openai.OpenAI(
    api_key="your-minimax-api-key",
    base_url="https://api.minimax.io/v1",
)

# Use the recommended sampling parameters (temperature 1.0, top_p 0.95).
response = client.chat.completions.create(
    model="minimax-m2.1",
    messages=[
        {"role": "user", "content": "Build a React component for a todo list"}
    ],
    temperature=1.0,
    top_p=0.95,
)

print(response.choices[0].message.content)
```
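For agentic use, the usual next step is routing model-emitted tool calls to local functions. Whether the MiniMax endpoint exposes OpenAI-style `tools` is an assumption here; the dispatch logic below is endpoint-agnostic and runs offline against a simulated tool call (`run_tests` is a hypothetical local tool):

```python
# Hedged sketch of tool-call dispatch in an agentic loop. The tool-call
# shape mirrors OpenAI-compatible responses; the tool itself is a stub.
import json

def run_tests(path: str) -> str:
    """Hypothetical local tool the agent can invoke."""
    return json.dumps({"path": path, "passed": 12, "failed": 0})

TOOLS = {"run_tests": run_tests}

def dispatch(tool_call: dict) -> str:
    """Route one model-emitted tool call to a local function."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # arguments arrive as JSON text
    return fn(**args)

# Simulated tool call, shaped like an OpenAI-compatible response.
simulated = {"name": "run_tests", "arguments": '{"path": "tests/"}'}
result = json.loads(dispatch(simulated))
print(result)  # {'path': 'tests/', 'passed': 12, 'failed': 0}
```

In a full loop, the tool result would be appended to `messages` as a tool-role message and the model called again - the Plan -> Code -> Run -> Fix cycle described earlier.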
Hardware Requirements for Local Deployment
| Setup | Hardware | Context Support |
|---|---|---|
| Production (Recommended) | 4x H200/H20 or 4x A100/A800 (96GB each) | Up to 400K tokens |
| Extended Production | 8x 144GB GPUs (1.15TB total) | Up to 3M tokens |
| Consumer/Development | 2x RTX 4090 + quantization (AWQ/GPTQ) | Limited, ~14 tok/s at Q6 |
vLLM Recommended: Use vLLM nightly version (after commit cf3eacfe) with tensor-parallel-size 4. TP8 is not supported - use DP+EP for configurations with more than 4 GPUs.
When to Use MiniMax M2.1
Choose M2.1 When
- Multilingual codebase (Rust, Java, Go, Kotlin, TypeScript)
- Cost-sensitive projects needing frontier performance
- Agentic workflows requiring fast sequential calls
- Full-stack app development from scratch
- Office automation beyond just coding
- Using Claude Code, Cline, or Roo Code frameworks
Consider Alternatives When
- Deep mathematical reasoning is critical (use GLM-4.7)
- Extended autonomous research sessions (use Kimi K2)
- LaTeX-heavy documentation projects
- Role-play or character simulation
- Maximum absolute accuracy is required (use Claude)
- Multimodal input/output needed
M2.1 vs GLM-4.7 vs Kimi K2
| Dimension | MiniMax M2.1 | GLM-4.7 | Kimi K2 |
|---|---|---|---|
| Best For | Interactive IDE agents | Math & multi-turn sessions | Extended research |
| Speed | Fastest | Moderate | Slower |
| Active Params | 10B | 32B | - |
| API Pricing | $0.30/1M | $0.60/1M | $0.40/1M |
| Unique Feature | Digital Employee | Preserved Thinking | 200+ tool calls |
Community Endorsements
"We're excited for powerful open-source models like M2.1 that bring frontier performance (and in some cases exceed the frontier) for a wide variety of software development tasks. Developers deserve choice, and M2.1 provides that much needed choice!"
Eno Reyes, Co-Founder, CTO of Factory AI
"Our users have come to rely on MiniMax for frontier-grade coding assistance at a fraction of the cost, and early testing shows M2.1 excelling at everything from architecture and orchestration to code reviews and deployment."
Scott Breitenother, Co-Founder, CEO of Kilo
"M2.1 handles the nuances of complex, multi-step programming tasks with a level of consistency that is rare in this space. By providing high-quality reasoning and context awareness at scale, MiniMax has become a core component of how we help developers."
Robert Rizk, Co-Founder, CEO of BlackBox
"The latest M2.1 release builds on that foundation with meaningful improvements in speed and reliability, performing well across a wider range of languages and frameworks. It's a great choice for high-throughput, agentic coding workflows."
Matt Rubens, Co-Founder, CEO of RooCode
Frequently Asked Questions
What is MiniMax M2.1?
MiniMax M2.1 is an open-source large language model released December 23, 2025, featuring a 230B Mixture-of-Experts (MoE) architecture with only 10B active parameters per token. It's designed for real-world complex tasks including multi-language programming, agentic workflows, and office automation, positioning itself as a 'Digital Employee' rather than just a coding assistant.
Who is MiniMax?
MiniMax is a Shanghai-based AI company founded in December 2021 with a $4 billion valuation. Key investors include Alibaba, Tencent, and MiHoYo. They operate products like Talkie (29M MAU AI companion app), Hailuo AI (video generation), and Conch AI (education). They're planning a Hong Kong IPO in Q1 2026.
What does '10B active parameters' mean?
MiniMax M2.1 uses a Mixture-of-Experts (MoE) architecture where only 10B of its 230B total parameters are activated for each token processed. This provides access to 230B parameters worth of knowledge while only incurring the inference cost of a 10B model, making it exceptionally efficient for agentic workflows requiring many sequential calls.
How does M2.1 compare to Claude Sonnet 4.5?
M2.1 achieves 74% on SWE-bench Verified (Claude ~77%) and outperforms Claude Sonnet 4.5 in multilingual coding scenarios. The key advantage is cost: M2.1 costs approximately 10% of Claude Sonnet 4.5 ($0.30 vs $3.00 per 1M input tokens) while maintaining competitive performance, especially in agentic and tool-use scenarios.
What is the VIBE benchmark?
VIBE (Visual & Interactive Benchmark for Execution) is a new benchmark created by MiniMax that tests full-stack capability to build functional applications 'from zero to one.' It covers Web, Android, iOS, Simulation, and Backend subsets, using an Agent-as-a-Verifier (AaaV) paradigm that judges both code correctness and visual/interactive quality in real runtime environments.
What is the Digital Employee feature?
Digital Employee is M2.1's capability to perform end-to-end office automation tasks. It accepts web content in text form and controls mouse clicks and keyboard inputs via text commands. It handles workflows in administration (equipment requests, budget calculations), project management (issue tracking), and software development (Merge Request queries) autonomously.
How much does MiniMax M2.1 cost?
API pricing is $0.30/1M input tokens and $1.20/1M output tokens - approximately 10% of Claude Sonnet 4.5. MiniMax also offers Coding Plans: Starter ($10/month), Pro ($20/month), and Max ($50/month), providing significant value compared to Claude Code's pricing. OpenRouter offers slightly lower rates at $0.20-0.27/1M input.
Can I run MiniMax M2.1 locally?
Yes, M2.1 weights are available on HuggingFace and ModelScope under MIT license. You can deploy using vLLM (recommended), SGLang, or Ollama. However, the full model requires significant hardware - recommended production setup is 4x H200/H20 or 4x A100/A800 GPUs with 96GB VRAM each. Consumer setups require 2x RTX 4090 minimum with quantization.
What hardware do I need for local deployment?
Production: 4x H200/H20 or 4x A100/A800 GPUs (96GB VRAM each) supports up to 400K tokens context. Extended: 8x 144GB GPUs (1.15TB total) supports up to 3M tokens. Consumer/Development: 2x RTX 4090 minimum with AWQ/GPTQ/experts_int8 quantization. Q6 quantization achieves ~14 tokens/second.
Does M2.1 work with Claude Code?
Yes, M2.1 demonstrates excellent framework generalization. It works consistently with Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox. It also supports context management conventions like Skill.md, Claude.md/agent.md/.cursorrule files, and Slash Commands.
What are MiniMax M2.1's main limitations?
M2.1 is weaker on pure mathematical reasoning compared to GLM-4.7 (78.3% vs 95.7% on AIME 2025). It's not suited for extended autonomous research tasks where models like Kimi K2 Thinking excel. Users report inconsistencies in LaTeX understanding and role-play/character simulation. It's also text-only with no native multimodal capabilities.
How does M2.1 compare to GLM-4.7?
Both released within 24 hours (GLM-4.7 on Dec 22, M2.1 on Dec 23). M2.1 is faster with lower active parameters (10B vs 32B) and 4-7x cheaper on API pricing. GLM-4.7 excels in mathematical reasoning and has Preserved Thinking for multi-turn sessions. M2.1 leads in VIBE benchmark scores and has the Digital Employee feature. Choose M2.1 for speed/cost, GLM-4.7 for math/research.