Richard Gibbons • Originally published at digitalapplied.com

MiniMax M2.1 Guide: Digital Employee for AI Coding

MiniMax M2.1 scores 74% on SWE-bench Verified and 88.6% on VIBE with just 10B active parameters - a $0.30/1M-token "Digital Employee" for agentic workflows.

Key Statistics

  • 230B Total Parameters (MoE)
  • 10B Active Parameters
  • 197K Context Window
  • 88.6% VIBE Benchmark

Key Takeaways

  • 10B Active Parameters: 230B MoE architecture with only 10B active per token - among the most parameter-efficient models at this performance tier
  • 88.6% VIBE Benchmark: 74% SWE-bench Verified and industry-leading scores on full-stack app building
  • 90% Cost Reduction: $0.30/1M input tokens - approximately 10% of Claude Sonnet 4.5's price
  • Digital Employee: End-to-end office automation beyond just coding - admin, PM, and dev workflows
  • Multilingual Excellence: Strong performance across Rust, Java, Go, Kotlin, TypeScript, and other programming languages
  • Framework Support: Native compatibility with Claude Code, Cline, Kilo, Roo Code, and BlackBox

Table of Contents

  1. What Is MiniMax M2.1
  2. Company Background
  3. Technical Specifications
  4. Key Improvements
  5. Benchmark Performance
  6. Digital Employee
  7. Pricing & Access
  8. Getting Started
  9. When to Use M2.1

What Is MiniMax M2.1

Breaking: MiniMax M2.1 released December 23, 2025 - just one day after GLM-4.7. The release of two major Chinese AI models within 24 hours signals accelerating competition in the open-source coding model space.

MiniMax M2.1 represents a fundamental shift in how we think about AI coding assistants. Released December 23, 2025, it's not just another model optimized for chat - it's designed from the ground up to be a "Digital Employee" capable of handling end-to-end workflows in real production environments.

The key innovation is efficiency: M2.1 uses a Mixture-of-Experts (MoE) architecture with 230 billion total parameters but only activates 10 billion per token. This means you get access to the knowledge of a 230B model at the inference cost of a 10B model - making it exceptionally fast and affordable for the rapid-fire cycles of agentic workflows.

The Core Value Proposition

Frontier performance at 10% the cost. MiniMax M2.1 achieves 74% on SWE-bench Verified - competitive with Claude Sonnet 4.5 - while costing approximately $0.30/1M input tokens compared to Claude's $3.00/1M.

This isn't just about saving money. The 10B active parameter footprint means M2.1 is significantly faster for agentic loops - the Plan -> Code -> Run -> Fix cycles that define modern AI-assisted development.

Core Capabilities

  • Multilingual Coding: Systematic enhancements in Rust, Java, Go, C++, Kotlin, TypeScript, and more - covering the complete stack from systems to applications.
  • Digital Employee: End-to-end office automation: admin tasks, project management, data analysis, and software development workflows.
  • Vibe Coding: Improved design comprehension and aesthetic output for web apps, 3D simulations, and native mobile development.

Company Background: MiniMax

MiniMax is part of China's "AI Tigers" - the leading AI startups alongside DeepSeek, Zhipu (Z.ai), Baichuan, and Moonshot/Kimi. Founded in December 2021 and headquartered in Shanghai, MiniMax has rapidly grown to a $4 billion valuation with backing from tech giants and strategic investors.

Company Profile

| Attribute | Value |
| --- | --- |
| Founded | December 2021 |
| Headquarters | Shanghai, China |
| Valuation | $4 billion |
| Total Funding | $850M+ (since 2023) |
| IPO Target | Hong Kong, Q1 2026 |

Key Investors

  • Alibaba (Lead)
  • Tencent
  • MiHoYo
  • Hillhouse
  • HongShan
  • IDG Capital

Notable: Investment from MiHoYo (developer of Genshin Impact) signals gaming and creative AI applications. 70% of MiniMax's revenue comes from overseas markets.

Product Portfolio

| Product | Category | Notes |
| --- | --- | --- |
| Talkie | AI Companion App | 29M MAU, #4 US AI app downloads |
| Hailuo AI | Video Generation | Competing with OpenAI's Sora in AI video generation |
| Conch AI | Educational AI | Strong presence in Asian education markets |
| MiniMax Agent | AI Agent Platform | Built on M2.1, primary offering for developers |

IPO Context: M2.1's release comes just days after MiniMax passed the Hong Kong Stock Exchange listing hearing (December 21, 2025). The model launch appears strategically timed to build momentum before their planned Q1 2026 IPO.


Technical Specifications

Architecture Deep Dive

| Specification | M2.1 | M2 (Previous) |
| --- | --- | --- |
| Architecture | Sparse MoE | Sparse MoE |
| Total Parameters | 230B | 230B |
| Active Parameters | 10B per token | 10B per token |
| Context Window | 197K tokens | 128K tokens |
| License | MIT (Open-Source) | MIT (Open-Source) |
| Sparsity Ratio | ~23:1 | ~23:1 |
| Recommended Params | temp: 1.0, top_p: 0.95, top_k: 40 | temp: 1.0, top_p: 0.95 |
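
The recommended sampling parameters map directly onto any OpenAI-compatible client. Note that top_k is not a standard Chat Completions argument, so with the official openai Python package it has to go through extra_body - a minimal sketch, assuming the hosted MiniMax endpoint and model ID used later in this guide:

```python
from openai import OpenAI

# Assumed endpoint and model ID; see the API Quick Start below.
client = OpenAI(api_key="your-minimax-api-key", base_url="https://api.minimax.io/v1")

response = client.chat.completions.create(
    model="minimax-m2.1",
    messages=[{"role": "user", "content": "Explain the MoE trade-off in two sentences."}],
    temperature=1.0,           # recommended for M2.1
    top_p=0.95,                # recommended for M2.1
    extra_body={"top_k": 40},  # top_k isn't a standard OpenAI parameter; assumes the endpoint accepts it
)
print(response.choices[0].message.content)
```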

Why 10B Active Matters

The 23:1 sparsity ratio is the key to M2.1's efficiency. For every token processed, only 10B of the 230B parameters are activated. This design choice has three major implications (a rough back-of-the-envelope follows the list):

  • Speed: Inference is dramatically faster than dense models of similar capability
  • Cost: Lower compute per token translates directly to lower API pricing
  • Agentic Loops: Fast sequential calls enable responsive Plan -> Code -> Run -> Fix cycles
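
As a rough back-of-the-envelope (ignoring attention layers and shared parameters, which this simplification glosses over):

```python
# Rough sparsity arithmetic for M2.1's MoE design
total_params, active_params = 230e9, 10e9
sparsity = total_params / active_params          # ~23:1
active_fraction = active_params / total_params   # ~4.3% of weights touched per token
print(f"sparsity ≈ {sparsity:.0f}:1, active fraction ≈ {active_fraction:.1%}")
```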

Key Improvements Over M2

M2 (released October 2025) focused on cost and accessibility. M2.1 shifts focus to real-world complex tasks - particularly usability across more programming languages and office scenarios.

Multi-Language Programming Excellence

Real-world systems are polyglot. M2.1 systematically enhances capabilities across the full development stack:

| Level | Languages |
| --- | --- |
| Systems Level | Rust, C++, Golang |
| Enterprise | Java, Kotlin |
| Web & Mobile | TypeScript, JavaScript, Objective-C, Swift |

Vibe Coding & Aesthetic Design

M2.1 addresses the "widely recognized weakness in mobile development" across the industry:

  • Native App Mastery: Significantly strengthened Android (Kotlin) and iOS (Swift/Objective-C) development
  • Design Comprehension: Improved understanding of layout, typography, and color schemes
  • 3D & Simulation: Complex interactions, scientific visualizations, high-quality 3D scenes

Interleaved Thinking Architecture

M2.1 is one of the first open-source models to systematically introduce Interleaved Thinking (a minimal self-correction sketch follows this list):

  • Composite Instructions: Handles multi-step office workflows with integrated execution
  • Concise Outputs: More efficient thought chains, lower token consumption
  • Self-Correction: Reads errors, adjusts immediately without explicit prompting
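
To make the self-correction loop concrete, here is a minimal, hypothetical Plan -> Code -> Run -> Fix harness against an OpenAI-compatible endpoint: the model proposes code, the harness executes it, and any error output is fed straight back for the next attempt. The run_candidate helper and the prompts are illustrative assumptions, not MiniMax's actual interleaved-thinking implementation.

```python
import subprocess
import sys
import tempfile

from openai import OpenAI

client = OpenAI(api_key="your-minimax-api-key", base_url="https://api.minimax.io/v1")

def run_candidate(code: str) -> tuple[bool, str]:
    """Execute generated Python in a subprocess and return (success, stderr). Illustrative only."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stderr

messages = [{"role": "user", "content": "Write a plain Python script (no markdown) that prints the first 10 Fibonacci numbers."}]
for attempt in range(3):  # Plan -> Code -> Run -> Fix, capped at 3 iterations
    reply = client.chat.completions.create(model="minimax-m2.1", messages=messages, temperature=1.0, top_p=0.95)
    code = reply.choices[0].message.content  # assumes plain code; a real harness would strip markdown fences
    ok, stderr = run_candidate(code)
    if ok:
        break
    # Feed the error straight back so the model can self-correct on the next pass
    messages += [{"role": "assistant", "content": code},
                 {"role": "user", "content": f"That failed with:\n{stderr}\nPlease fix it."}]
```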

Benchmark Performance

Software Engineering Benchmarks

| Benchmark | M2.1 | Claude Sonnet 4.5 | GLM-4.7 | DeepSeek V3.2 |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 74.0% | ~77% | 73.8% | 73.1% |
| SWE-Multilingual | 72.5% | Lower | - | - |
| Multi-SWE-Bench | 49.4% | Lower | - | - |
| AIME 2025 (Math) | 78.3% | - | 95.7% | 93.1% |

VIBE Benchmark: A New Standard

What is VIBE?

Visual & Interactive Benchmark for Execution

MiniMax introduced VIBE to measure what traditional benchmarks miss: the ability to build functional applications "from zero to one." Unlike SWE-bench which tests bug fixes, VIBE tests full-stack creation.

The key innovation is Agent-as-a-Verifier (AaaV) - an automated assessment in real runtime environments that judges both code correctness AND visual/interactive quality.

| VIBE Subset | M2.1 Score | What It Tests |
| --- | --- | --- |
| VIBE-Web | 91.5% | Frontend development, layouts, interactions |
| VIBE-Android | 89.7% | Native Android app development (Kotlin) |
| VIBE-iOS | Strong | Native iOS app development (Swift) |
| VIBE-Simulation | Strong | 3D rendering, physics, interactive scenes |
| VIBE-Backend | Strong | API development, database integration |
| VIBE Aggregate | 88.6% | Overall full-stack capability |

Framework Generalization

M2.1 was specifically evaluated across multiple coding agent frameworks, demonstrating exceptional stability:

  • Claude Code
  • Droid (Factory AI)
  • Cline
  • Kilo Code
  • Roo Code
  • BlackBox

Also supports context management conventions: Skill.md, Claude.md/agent.md/.cursorrule, and Slash Commands.


Digital Employee Capabilities

The "Digital Employee" is M2.1's signature feature - moving beyond coding assistance to full office automation. It accepts web content in text form and controls mouse clicks and keyboard inputs via text-based commands.

Administration

  • Collect equipment requests from Slack
  • Search internal servers for pricing
  • Calculate budgets and verify limits
  • Record inventory changes

Project Management

  • Search for blocked issues
  • Consult team members for solutions
  • Update issue status
  • Track project progress

Software Development

  • Find Merge Request history
  • Identify file modifications
  • Notify relevant team members
  • Automate code review workflows
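
As referenced above, here is a hedged sketch of what a Digital Employee-style workflow could look like through standard OpenAI-compatible tool calling. The tool names (search_slack, update_issue) and their schemas are hypothetical placeholders for whatever integrations you expose; MiniMax's actual agent tooling isn't documented here.

```python
from openai import OpenAI

client = OpenAI(api_key="your-minimax-api-key", base_url="https://api.minimax.io/v1")

# Hypothetical tools; wire them to your own Slack / issue-tracker APIs.
tools = [
    {"type": "function", "function": {
        "name": "search_slack",
        "description": "Search Slack channels for messages matching a query.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "update_issue",
        "description": "Update the status of a project-management issue.",
        "parameters": {"type": "object",
                       "properties": {"issue_id": {"type": "string"},
                                      "status": {"type": "string"}},
                       "required": ["issue_id", "status"]}}},
]

response = client.chat.completions.create(
    model="minimax-m2.1",
    messages=[{"role": "user",
               "content": "Find blocked issues mentioned in #engineering this week and mark them as needs-review."}],
    tools=tools,
)
# Inspect the tool calls the model proposes before executing them yourself.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```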

Showcase Demonstrations

MiniMax provides interactive demos showing M2.1's capabilities:

| Project | Technology | Highlights |
| --- | --- | --- |
| 3D Christmas Tree | React Three Fiber | 7,000+ instances, gesture interaction, particle animations |
| 3D Lego Sandbox | Three.js | Grid snapping, collision detection, multi-angle rotation |
| Drum Machine | Web Audio API | 16-step sequencer with glitch effects |
| Photographer Portfolio | HTML/CSS | Brutalist typography, asymmetrical layout |
| Android Gravity Sim | Kotlin | Gyroscope-driven, Easter egg reveals |
| iOS Widget | Swift | Interactive Home Screen widget with animations |
| Rust Security Tool | Rust CLI + TUI | Linux audit tool with risk rating |

Pricing & Access

API Pricing Comparison

| Model | Input (per 1M) | Output (per 1M) | Relative Cost |
| --- | --- | --- | --- |
| MiniMax M2.1 | $0.30 | $1.20 | ~10% of Claude |
| M2.1 (OpenRouter) | $0.20-0.27 | $1.06-1.10 | Even cheaper |
| GLM-4.7 | $0.60 | $2.20 | ~15% of Claude |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Baseline |
| DeepSeek-V3.2 | $0.27 | $1.10 | ~10% of Claude |

Cost Comparison Example

At Scale: 10,000 API Calls (100K input + 50K output tokens each)

| Model | Cost |
| --- | --- |
| Claude Sonnet 4.5 | ~$10,500 |
| MiniMax M2.1 | ~$900 |

Annual savings at moderate usage: $100,000+
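
The at-scale figures above follow directly from the per-token prices - a quick back-of-the-envelope check:

```python
# Per-1M-token prices (USD) from the comparison table above
prices = {
    "Claude Sonnet 4.5": {"input": 3.00, "output": 15.00},
    "MiniMax M2.1": {"input": 0.30, "output": 1.20},
}

calls, input_tokens, output_tokens = 10_000, 100_000, 50_000

for model, p in prices.items():
    per_call = input_tokens / 1e6 * p["input"] + output_tokens / 1e6 * p["output"]
    print(f"{model}: ${per_call * calls:,.0f} total")  # ~$10,500 vs ~$900
```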

Access Methods

Hosted API:

  • MiniMax Platform (platform.minimax.io)
  • OpenRouter (openrouter.ai)
  • Fireworks AI (fireworks.ai)

Self-Hosted:

  • HuggingFace (MiniMaxAI/MiniMax-M2.1)
  • ModelScope (Available)
  • Ollama (ollama pull minimax-m2.1)

Getting Started

Claude Code Integration

Configure settings.json:

```json
{
  "apiProvider": "openrouter",
  "openRouterApiKey": "your-openrouter-key",
  "apiModelId": "minimax/minimax-m2.1",
  "customInstructions": "Use Interleaved Thinking for complex tasks"
}
```

API Quick Start

Python Example:

```python
import openai

client = openai.OpenAI(
    api_key="your-minimax-api-key",
    base_url="https://api.minimax.io/v1"
)

response = client.chat.completions.create(
    model="minimax-m2.1",
    messages=[
        {"role": "user", "content": "Build a React component for a todo list"}
    ],
    temperature=1.0,
    top_p=0.95
)

print(response.choices[0].message.content)
```

Hardware Requirements for Local Deployment

| Setup | Hardware | Context Support |
| --- | --- | --- |
| Production (Recommended) | 4x H200/H20 or 4x A100/A800 (96GB each) | Up to 400K tokens |
| Extended Production | 8x 144GB GPUs (1.15TB total) | Up to 3M tokens |
| Consumer/Development | 2x RTX 4090 + quantization (AWQ/GPTQ) | Limited, ~14 tok/s at Q6 |

vLLM Recommended: Use vLLM nightly version (after commit cf3eacfe) with tensor-parallel-size 4. TP8 is not supported - use DP+EP for configurations with more than 4 GPUs.
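
Once a local vLLM server is running, it exposes an OpenAI-compatible endpoint (port 8000 by default), so the same client code from the Quick Start works against it - a minimal sketch, assuming the model is served under its HuggingFace ID:

```python
from openai import OpenAI

# Point the client at the local vLLM server instead of the hosted API.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",  # served model ID (assumption; match whatever name vLLM reports)
    messages=[{"role": "user", "content": "Write a unit test for a Fibonacci function."}],
    temperature=1.0,
    top_p=0.95,
)
print(response.choices[0].message.content)
```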


When to Use MiniMax M2.1

Choose M2.1 When

  • Multilingual codebase (Rust, Java, Go, Kotlin, TypeScript)
  • Cost-sensitive projects needing frontier performance
  • Agentic workflows requiring fast sequential calls
  • Full-stack app development from scratch
  • Office automation beyond just coding
  • Using Claude Code, Cline, or Roo Code frameworks

Consider Alternatives When

  • Deep mathematical reasoning is critical (use GLM-4.7)
  • Extended autonomous research sessions (use Kimi K2)
  • LaTeX-heavy documentation projects
  • Role-play or character simulation
  • Maximum absolute accuracy is required (use Claude)
  • Multimodal input/output needed

M2.1 vs GLM-4.7 vs Kimi K2

| Dimension | MiniMax M2.1 | GLM-4.7 | Kimi K2 |
| --- | --- | --- | --- |
| Best For | Interactive IDE agents | Math & multi-turn sessions | Extended research |
| Speed | Fastest | Moderate | Slower |
| Active Params | 10B | 32B | - |
| API Pricing | $0.30/1M | $0.60/1M | $0.40/1M |
| Unique Feature | Digital Employee | Preserved Thinking | 200+ tool calls |

Community Endorsements

"We're excited for powerful open-source models like M2.1 that bring frontier performance (and in some cases exceed the frontier) for a wide variety of software development tasks. Developers deserve choice, and M2.1 provides that much needed choice!"

Eno Reyes, Co-Founder, CTO of Factory AI

"Our users have come to rely on MiniMax for frontier-grade coding assistance at a fraction of the cost, and early testing shows M2.1 excelling at everything from architecture and orchestration to code reviews and deployment."

Scott Breitenother, Co-Founder, CEO of Kilo

"M2.1 handles the nuances of complex, multi-step programming tasks with a level of consistency that is rare in this space. By providing high-quality reasoning and context awareness at scale, MiniMax has become a core component of how we help developers."

Robert Rizk, Co-Founder, CEO of BlackBox

"The latest M2.1 release builds on that foundation with meaningful improvements in speed and reliability, performing well across a wider range of languages and frameworks. It's a great choice for high-throughput, agentic coding workflows."

Matt Rubens, Co-Founder, CEO of RooCode


Frequently Asked Questions

What is MiniMax M2.1?

MiniMax M2.1 is an open-source large language model released December 23, 2025, featuring a 230B Mixture-of-Experts (MoE) architecture with only 10B active parameters per token. It's designed for real-world complex tasks including multi-language programming, agentic workflows, and office automation, positioning itself as a 'Digital Employee' rather than just a coding assistant.

Who is MiniMax?

MiniMax is a Shanghai-based AI company founded in December 2021 with a $4 billion valuation. Key investors include Alibaba, Tencent, and MiHoYo. They operate products like Talkie (29M MAU AI companion app), Hailuo AI (video generation), and Conch AI (education). They're planning a Hong Kong IPO in Q1 2026.

What does '10B active parameters' mean?

MiniMax M2.1 uses a Mixture-of-Experts (MoE) architecture where only 10B of its 230B total parameters are activated for each token processed. This provides access to 230B parameters worth of knowledge while only incurring the inference cost of a 10B model, making it exceptionally efficient for agentic workflows requiring many sequential calls.

How does M2.1 compare to Claude Sonnet 4.5?

M2.1 achieves 74% on SWE-bench Verified (Claude ~77%) and outperforms Claude Sonnet 4.5 in multilingual coding scenarios. The key advantage is cost: M2.1 costs approximately 10% of Claude Sonnet 4.5 ($0.30 vs $3.00 per 1M input tokens) while maintaining competitive performance, especially in agentic and tool-use scenarios.

What is the VIBE benchmark?

VIBE (Visual & Interactive Benchmark for Execution) is a new benchmark created by MiniMax that tests full-stack capability to build functional applications 'from zero to one.' It covers Web, Android, iOS, Simulation, and Backend subsets, using an Agent-as-a-Verifier (AaaV) paradigm that judges both code correctness and visual/interactive quality in real runtime environments.

What is the Digital Employee feature?

Digital Employee is M2.1's capability to perform end-to-end office automation tasks. It accepts web content in text form and controls mouse clicks and keyboard inputs via text commands. It handles workflows in administration (equipment requests, budget calculations), project management (issue tracking), and software development (Merge Request queries) autonomously.

How much does MiniMax M2.1 cost?

API pricing is $0.30/1M input tokens and $1.20/1M output tokens - approximately 10% of Claude Sonnet 4.5. MiniMax also offers Coding Plans: Starter ($10/month), Pro ($20/month), and Max ($50/month), providing significant value compared to Claude Code's pricing. OpenRouter offers slightly lower rates at $0.20-0.27/1M input.

Can I run MiniMax M2.1 locally?

Yes, M2.1 weights are available on HuggingFace and ModelScope under MIT license. You can deploy using vLLM (recommended), SGLang, or Ollama. However, the full model requires significant hardware - recommended production setup is 4x H200/H20 or 4x A100/A800 GPUs with 96GB VRAM each. Consumer setups require 2x RTX 4090 minimum with quantization.

What hardware do I need for local deployment?

Production: 4x H200/H20 or 4x A100/A800 GPUs (96GB VRAM each) supports up to 400K tokens context. Extended: 8x 144GB GPUs (1.15TB total) supports up to 3M tokens. Consumer/Development: 2x RTX 4090 minimum with AWQ/GPTQ/experts_int8 quantization. Q6 quantization achieves ~14 tokens/second.

Does M2.1 work with Claude Code?

Yes, M2.1 demonstrates excellent framework generalization. It works consistently with Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox. It also supports context management conventions like Skill.md, Claude.md/agent.md/.cursorrule files, and Slash Commands.

What are MiniMax M2.1's main limitations?

M2.1 is weaker on pure mathematical reasoning compared to GLM-4.7 (78.3% vs 95.7% on AIME 2025). It's not suited for extended autonomous research tasks where models like Kimi K2 Thinking excel. Users report inconsistencies in LaTeX understanding and role-play/character simulation. It's also text-only with no native multimodal capabilities.

How does M2.1 compare to GLM-4.7?

Both released within 24 hours (GLM-4.7 on Dec 22, M2.1 on Dec 23). M2.1 is faster, runs with fewer active parameters (10B vs 32B), and costs roughly half as much on API pricing ($0.30 vs $0.60 per 1M input tokens). GLM-4.7 excels in mathematical reasoning and has Preserved Thinking for multi-turn sessions. M2.1 leads in VIBE benchmark scores and has the Digital Employee feature. Choose M2.1 for speed/cost, GLM-4.7 for math/research.
