MiniMax M2.1 achieves 74% on SWE-bench Verified and 88.6% on VIBE with 10B active parameters - a $0.30/1M-token "Digital Employee" for agentic workflows.
Key Statistics
- 230B Total Parameters (MoE)
- 10B Active Parameters
- 197K Context Window
- 88.6% VIBE Benchmark
Key Takeaways
- 10B Active Parameters: 230B MoE architecture with only 10B active per token - among the most efficient SOTA-class models
- 88.6% VIBE Benchmark: 74% SWE-bench Verified and industry-leading scores on full-stack app building
- 90% Cost Reduction: $0.30/1M input tokens - approximately 10% of Claude Sonnet 4.5's price
- Digital Employee: End-to-end office automation beyond just coding - admin, PM, and dev workflows
- Multilingual Excellence: Strong performance in Rust, Java, Go, Kotlin, TypeScript, and other programming languages
- Framework Support: Native compatibility with Claude Code, Cline, Kilo, Roo Code, and BlackBox
Table of Contents
- What Is MiniMax M2.1
- Company Background
- Technical Specifications
- Key Improvements
- Benchmark Performance
- Digital Employee
- Pricing & Access
- Getting Started
- When to Use M2.1
What Is MiniMax M2.1
Breaking: MiniMax M2.1 was released December 23, 2025 - just one day after GLM-4.7. The release of two major Chinese AI models within 24 hours signals accelerating competition in the open-source coding model space.
MiniMax M2.1 represents a fundamental shift in how we think about AI coding assistants. Released December 23, 2025, it's not just another model optimized for chat - it's designed from the ground up to be a "Digital Employee" capable of handling end-to-end workflows in real production environments.
The key innovation is efficiency: M2.1 uses a Mixture-of-Experts (MoE) architecture with 230 billion total parameters but only activates 10 billion per token. This means you get access to the knowledge of a 230B model at the inference cost of a 10B model - making it exceptionally fast and affordable for the rapid-fire cycles of agentic workflows.
The Core Value Proposition
Frontier performance at 10% the cost. MiniMax M2.1 achieves 74% on SWE-bench Verified - competitive with Claude Sonnet 4.5 - while costing approximately $0.30/1M input tokens compared to Claude's $3.00/1M.
This isn't just about saving money. The 10B active parameter footprint means M2.1 is significantly faster for agentic loops - the Plan -> Code -> Run -> Fix cycles that define modern AI-assisted development.
Core Capabilities
- Multilingual Coding: Systematic enhancements in Rust, Java, Go, C++, Kotlin, TypeScript, and more - covering the complete stack from systems to applications.
- Digital Employee: End-to-end office automation: admin tasks, project management, data analysis, and software development workflows.
- Vibe Coding: Improved design comprehension and aesthetic output for web apps, 3D simulations, and native mobile development.
Company Background: MiniMax
MiniMax is part of China's "AI Tigers" - the leading AI startups alongside DeepSeek, Zhipu (Z.ai), Baichuan, and Moonshot/Kimi. Founded in December 2021 and headquartered in Shanghai, MiniMax has rapidly grown to a $4 billion valuation with backing from tech giants and strategic investors.
Company Profile
| Attribute | Value |
|---|---|
| Founded | December 2021 |
| Headquarters | Shanghai, China |
| Valuation | $4 billion |
| Total Funding | $850M+ (since 2023) |
| IPO Target | Hong Kong Q1 2026 |
Key Investors
- Alibaba (Lead)
- Tencent
- MiHoYo
- Hillhouse
- HongShan
- IDG Capital
Notable: MiHoYo (Genshin Impact developer) investment signals gaming/creative AI applications. 70% of revenue comes from overseas markets.
Product Portfolio
| Product | Category | Notes |
|---|---|---|
| Talkie | AI Companion App | 29M MAU, #4 US AI app downloads |
| Hailuo AI | Video Generation | Competing with OpenAI Sora in AI video generation |
| Conch AI | Educational AI | Strong presence in Asian education markets |
| MiniMax Agent | AI Agent Platform | Built on M2.1, primary offering for developers |
IPO Context: M2.1's release comes just days after MiniMax passed the Hong Kong Stock Exchange listing hearing (December 21, 2025). The model launch appears strategically timed to build momentum before their planned Q1 2026 IPO.
Technical Specifications
Architecture Deep Dive
| Specification | M2.1 | M2 (Previous) |
|---|---|---|
| Architecture | Sparse MoE | Sparse MoE |
| Total Parameters | 230B | 230B |
| Active Parameters | 10B per token | 10B per token |
| Context Window | 197K tokens | 128K tokens |
| License | MIT (Open-Source) | MIT (Open-Source) |
| Sparsity Ratio | ~23:1 | ~23:1 |
| Recommended Params | temp: 1.0, top_p: 0.95, top_k: 40 | temp: 1.0, top_p: 0.95 |
Why 10B Active Matters
The 23:1 sparsity ratio is the key to M2.1's efficiency. For every token processed, only 10B of the 230B parameters are activated. This design choice has three major implications:
- Speed: Inference is dramatically faster than dense models of similar capability
- Cost: Lower compute per token translates directly to lower API pricing
- Agentic Loops: Fast sequential calls enable responsive Plan -> Code -> Run -> Fix cycles
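As a back-of-the-envelope sketch, the efficiency claim can be quantified. The figures below are illustrative, using the common approximation that a decoder forward pass costs roughly 2 × (active parameters) FLOPs per token:

```python
# Back-of-the-envelope comparison of a sparse MoE forward pass vs. a
# hypothetical dense model of the same total size. Illustrative only.

TOTAL_PARAMS_B = 230   # total parameters, billions
ACTIVE_PARAMS_B = 10   # parameters activated per token, billions

# Sparsity ratio: fraction of the network consulted per token.
sparsity_ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B

# Per-token compute scales with *active* parameters (~2 * params FLOPs).
def flops_per_token(active_params_b: float) -> float:
    return 2 * active_params_b * 1e9

dense_flops = flops_per_token(TOTAL_PARAMS_B)  # dense 230B model
moe_flops = flops_per_token(ACTIVE_PARAMS_B)   # M2.1-style sparse MoE

print(f"sparsity ratio: {sparsity_ratio:.0f}:1")            # 23:1
print(f"compute saving per token: {dense_flops / moe_flops:.0f}x")  # 23x
```

This is why a 23:1 sparsity ratio translates almost directly into the ~10x pricing gap discussed below: per-token compute tracks the 10B active slice, not the 230B total.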
Key Improvements Over M2
M2 (released October 2025) focused on cost and accessibility. M2.1 shifts focus to real-world complex tasks - particularly usability across more programming languages and office scenarios.
Multi-Language Programming Excellence
Real-world systems are polyglot. M2.1 systematically enhances capabilities across the full development stack:
| Level | Languages |
|---|---|
| Systems Level | Rust, C++, Golang |
| Enterprise | Java, Kotlin |
| Web & Mobile | TypeScript, JavaScript, Objective-C, Swift |
Vibe Coding & Aesthetic Design
M2.1 addresses mobile development - a widely recognized weakness across the industry:
- Native App Mastery: Significantly strengthened Android (Kotlin) and iOS (Swift/Objective-C) development
- Design Comprehension: Improved understanding of layout, typography, and color schemes
- 3D & Simulation: Complex interactions, scientific visualizations, high-quality 3D scenes
Interleaved Thinking Architecture
M2.1 is one of the first open-source models to systematically introduce Interleaved Thinking:
- Composite Instructions: Handles multi-step office workflows with integrated execution
- Concise Outputs: More efficient thought chains, lower token consumption
- Self-Correction: Reads errors, adjusts immediately without explicit prompting
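The self-correction behavior above can be sketched as a minimal Plan -> Code -> Run -> Fix loop. `call_model` is a hypothetical stub standing in for an M2.1 API call, and the error-feedback format is illustrative, not MiniMax's actual protocol:

```python
# Minimal sketch of the Plan -> Code -> Run -> Fix loop that interleaved
# thinking serves: run generated code, feed any error back, retry.

def call_model(prompt: str) -> str:
    """Stub: a real implementation would call the M2.1 chat endpoint."""
    # Simulate self-correction: after seeing the NameError in the prompt,
    # the "model" returns fixed code.
    if "NameError" in prompt:
        return "result = sum([1, 2, 3])"
    return "result = total([1, 2, 3])"  # buggy: `total` is undefined

def run_snippet(code: str) -> tuple[bool, str]:
    """Execute generated code, capturing any exception as feedback."""
    scope: dict = {}
    try:
        exec(code, scope)
        return True, str(scope.get("result"))
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"

prompt = "Sum the list [1, 2, 3] and store it in `result`."
for attempt in range(3):
    code = call_model(prompt)
    ok, feedback = run_snippet(code)
    if ok:
        break
    # Feed the error back so the model can self-correct on the next pass.
    prompt += f"\nPrevious attempt failed with: {feedback}"

print(ok, feedback)  # True 6
```

The faster each iteration of this loop runs, the more the 10B-active design pays off - which is the practical argument for low active parameters in agentic workflows.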
Benchmark Performance
Software Engineering Benchmarks
| Benchmark | M2.1 | Claude Sonnet 4.5 | GLM-4.7 | DeepSeek V3.2 |
|---|---|---|---|---|
| SWE-bench Verified | 74.0% | ~77% | 73.8% | 73.1% |
| SWE-Multilingual | 72.5% | Lower | - | - |
| Multi-SWE-Bench | 49.4% | Lower | - | - |
| AIME 2025 (Math) | 78.3% | - | 95.7% | 93.1% |
VIBE Benchmark: A New Standard
What is VIBE?
Visual & Interactive Benchmark for Execution
MiniMax introduced VIBE to measure what traditional benchmarks miss: the ability to build functional applications "from zero to one." Unlike SWE-bench, which tests bug fixes, VIBE tests full-stack creation.
The key innovation is Agent-as-a-Verifier (AaaV) - an automated assessment in real runtime environments that judges both code correctness AND visual/interactive quality.
| VIBE Subset | M2.1 Score | What It Tests |
|---|---|---|
| VIBE-Web | 91.5% | Frontend development, layouts, interactions |
| VIBE-Android | 89.7% | Native Android app development (Kotlin) |
| VIBE-iOS | Strong | Native iOS app development (Swift) |
| VIBE-Simulation | Strong | 3D rendering, physics, interactive scenes |
| VIBE-Backend | Strong | API development, database integration |
| VIBE Aggregate | 88.6% | Overall full-stack capability |
Framework Generalization
M2.1 was specifically evaluated across multiple coding agent frameworks, demonstrating exceptional stability:
- Claude Code
- Droid (Factory AI)
- Cline
- Kilo Code
- Roo Code
- BlackBox
Also supports context management conventions: Skill.md, Claude.md/agent.md/.cursorrule, and Slash Commands.
Digital Employee Capabilities
The "Digital Employee" is M2.1's signature feature - moving beyond coding assistance to full office automation. It accepts web content in text form and controls mouse clicks and keyboard inputs via text-based commands.
Administration
- Collect equipment requests from Slack
- Search internal servers for pricing
- Calculate budgets and verify limits
- Record inventory changes
Project Management
- Search for blocked issues
- Consult team members for solutions
- Update issue status
- Track project progress
Software Development
- Find Merge Request history
- Identify file modifications
- Notify relevant team members
- Automate code review workflows
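The text-based control channel described above can be sketched as a thin driver that parses model-emitted action commands and dispatches them to mouse/keyboard handlers. The command grammar here is hypothetical, not MiniMax's documented protocol:

```python
# Illustrative driver for a text-based control channel: the model emits
# plain-text action commands, which a thin runtime parses and dispatches.
# The `click/type/press` grammar is hypothetical.
import re

ACTION_RE = re.compile(r"^(click|type|press)\((.*)\)$")

def parse_action(line: str) -> tuple[str, list[str]]:
    """Parse one model-emitted command like `click(120, 340)`."""
    match = ACTION_RE.match(line.strip())
    if not match:
        raise ValueError(f"unrecognized action: {line!r}")
    verb, raw_args = match.groups()
    # Naive argument split; a real driver would need proper quoting rules.
    args = [a.strip().strip('"') for a in raw_args.split(",") if a.strip()]
    return verb, args

# Example transcript: the model drives mouse and keyboard via text.
transcript = [
    'click(120, 340)',
    'type("Q4 equipment budget")',
    'press("Enter")',
]
actions = [parse_action(line) for line in transcript]
print(actions)
```

Representing UI control as text keeps the whole workflow inside the model's native modality, which is what lets a text-only model act as a "Digital Employee."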
Showcase Demonstrations
MiniMax provides interactive demos showing M2.1's capabilities:
| Project | Technology | Highlights |
|---|---|---|
| 3D Christmas Tree | React Three Fiber | 7,000+ instances, gesture interaction, particle animations |
| 3D Lego Sandbox | Three.js | Grid snapping, collision detection, multi-angle rotation |
| Drum Machine | Web Audio API | 16-step sequencer with glitch effects |
| Photographer Portfolio | HTML/CSS | Brutalist typography, asymmetrical layout |
| Android Gravity Sim | Kotlin | Gyroscope-driven, Easter egg reveals |
| iOS Widget | Swift | Interactive Home Screen widget with animations |
| Rust Security Tool | Rust | CLI + TUI Linux audit tool with risk rating |
Pricing & Access
API Pricing Comparison
| Model | Input (per 1M) | Output (per 1M) | Relative Cost |
|---|---|---|---|
| MiniMax M2.1 | $0.30 | $1.20 | ~10% of Claude |
| M2.1 (OpenRouter) | $0.20-0.27 | $1.06-1.10 | Even cheaper |
| GLM-4.7 | $0.60 | $2.20 | ~15% of Claude |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Baseline |
| DeepSeek-V3.2 | $0.27 | $1.10 | ~10% of Claude |
Cost Comparison Example
At Scale: 10,000 API Calls (100K input + 50K output tokens each)
| Model | Cost |
|---|---|
| Claude Sonnet 4.5 | ~$10,500 |
| MiniMax M2.1 | ~$900 |
Annual savings at moderate usage: $100,000+
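The figures above can be reproduced with a few lines of arithmetic, using the published per-1M-token prices:

```python
# Reproducing the cost comparison above: 10,000 calls, each with
# 100K input and 50K output tokens. Prices are USD per 1M tokens.

CALLS = 10_000
INPUT_TOKENS_PER_CALL = 100_000
OUTPUT_TOKENS_PER_CALL = 50_000

def workload_cost(input_price: float, output_price: float) -> float:
    input_m = CALLS * INPUT_TOKENS_PER_CALL / 1e6    # total input, millions
    output_m = CALLS * OUTPUT_TOKENS_PER_CALL / 1e6  # total output, millions
    return input_m * input_price + output_m * output_price

claude = workload_cost(3.00, 15.00)  # Claude Sonnet 4.5
m21 = workload_cost(0.30, 1.20)      # MiniMax M2.1

print(f"Claude Sonnet 4.5: ${claude:,.0f}")  # $10,500
print(f"MiniMax M2.1:      ${m21:,.0f}")     # $900
```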
Access Methods
Hosted API:
- MiniMax Platform (platform.minimax.io)
- OpenRouter (openrouter.ai)
- Fireworks AI (fireworks.ai)
Self-Hosted:
- HuggingFace (MiniMaxAI/MiniMax-M2.1)
- ModelScope (Available)
- Ollama (`ollama pull minimax-m2.1`)
Getting Started
Claude Code Integration
Configure settings.json:

```json
{
  "apiProvider": "openrouter",
  "openRouterApiKey": "your-openrouter-key",
  "apiModelId": "minimax/minimax-m2.1",
  "customInstructions": "Use Interleaved Thinking for complex tasks"
}
```
API Quick Start
Python Example:

```python
import openai

# Point the OpenAI-compatible client at the MiniMax endpoint.
client = openai.OpenAI(
    api_key="your-minimax-api-key",
    base_url="https://api.minimax.io/v1",
)

# Use the recommended sampling parameters (temperature 1.0, top_p 0.95).
response = client.chat.completions.create(
    model="minimax-m2.1",
    messages=[
        {"role": "user", "content": "Build a React component for a todo list"}
    ],
    temperature=1.0,
    top_p=0.95,
)

print(response.choices[0].message.content)
```
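For agentic use, the usual next step is routing model-emitted tool calls to local functions. Whether the MiniMax endpoint exposes OpenAI-style `tools` is an assumption here; the dispatch logic below is endpoint-agnostic and runs offline against a simulated tool call (`run_tests` is a hypothetical local tool):

```python
# Hedged sketch of tool-call dispatch in an agentic loop. The tool-call
# shape mirrors OpenAI-compatible responses; the tool itself is a stub.
import json

def run_tests(path: str) -> str:
    """Hypothetical local tool the agent can invoke."""
    return json.dumps({"path": path, "passed": 12, "failed": 0})

TOOLS = {"run_tests": run_tests}

def dispatch(tool_call: dict) -> str:
    """Route one model-emitted tool call to a local function."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # arguments arrive as JSON text
    return fn(**args)

# Simulated tool call, shaped like an OpenAI-compatible response.
simulated = {"name": "run_tests", "arguments": '{"path": "tests/"}'}
result = json.loads(dispatch(simulated))
print(result)  # {'path': 'tests/', 'passed': 12, 'failed': 0}
```

In a full loop, the tool result would be appended to `messages` as a tool-role message and the model called again - the Plan -> Code -> Run -> Fix cycle described earlier.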
Hardware Requirements for Local Deployment
| Setup | Hardware | Context Support |
|---|---|---|
| Production (Recommended) | 4x H200/H20 or 4x A100/A800 (96GB each) | Up to 400K tokens |
| Extended Production | 8x 144GB GPUs (1.15TB total) | Up to 3M tokens |
| Consumer/Development | 2x RTX 4090 + quantization (AWQ/GPTQ) | Limited, ~14 tok/s at Q6 |
vLLM Recommended: Use vLLM nightly version (after commit cf3eacfe) with tensor-parallel-size 4. TP8 is not supported - use DP+EP for configurations with more than 4 GPUs.
When to Use MiniMax M2.1
Choose M2.1 When
- Multilingual codebase (Rust, Java, Go, Kotlin, TypeScript)
- Cost-sensitive projects needing frontier performance
- Agentic workflows requiring fast sequential calls
- Full-stack app development from scratch
- Office automation beyond just coding
- Using Claude Code, Cline, or Roo Code frameworks
Consider Alternatives When
- Deep mathematical reasoning is critical (use GLM-4.7)
- Extended autonomous research sessions (use Kimi K2)
- LaTeX-heavy documentation projects
- Role-play or character simulation
- Maximum absolute accuracy is required (use Claude)
- Multimodal input/output needed
M2.1 vs GLM-4.7 vs Kimi K2
| Dimension | MiniMax M2.1 | GLM-4.7 | Kimi K2 |
|---|---|---|---|
| Best For | Interactive IDE agents | Math & multi-turn sessions | Extended research |
| Speed | Fastest | Moderate | Slower |
| Active Params | 10B | 32B | - |
| API Pricing | $0.30/1M | $0.60/1M | $0.40/1M |
| Unique Feature | Digital Employee | Preserved Thinking | 200+ tool calls |
Community Endorsements
"We're excited for powerful open-source models like M2.1 that bring frontier performance (and in some cases exceed the frontier) for a wide variety of software development tasks. Developers deserve choice, and M2.1 provides that much needed choice!"
Eno Reyes, Co-Founder, CTO of Factory AI
"Our users have come to rely on MiniMax for frontier-grade coding assistance at a fraction of the cost, and early testing shows M2.1 excelling at everything from architecture and orchestration to code reviews and deployment."
Scott Breitenother, Co-Founder, CEO of Kilo
"M2.1 handles the nuances of complex, multi-step programming tasks with a level of consistency that is rare in this space. By providing high-quality reasoning and context awareness at scale, MiniMax has become a core component of how we help developers."
Robert Rizk, Co-Founder, CEO of BlackBox
"The latest M2.1 release builds on that foundation with meaningful improvements in speed and reliability, performing well across a wider range of languages and frameworks. It's a great choice for high-throughput, agentic coding workflows."
Matt Rubens, Co-Founder, CEO of RooCode
Frequently Asked Questions
What is MiniMax M2.1?
MiniMax M2.1 is an open-source large language model released December 23, 2025, featuring a 230B Mixture-of-Experts (MoE) architecture with only 10B active parameters per token. It's designed for real-world complex tasks including multi-language programming, agentic workflows, and office automation, positioning itself as a 'Digital Employee' rather than just a coding assistant.
Who is MiniMax?
MiniMax is a Shanghai-based AI company founded in December 2021 with a $4 billion valuation. Key investors include Alibaba, Tencent, and MiHoYo. They operate products like Talkie (29M MAU AI companion app), Hailuo AI (video generation), and Conch AI (education). They're planning a Hong Kong IPO in Q1 2026.
What does '10B active parameters' mean?
MiniMax M2.1 uses a Mixture-of-Experts (MoE) architecture where only 10B of its 230B total parameters are activated for each token processed. This provides access to 230B parameters worth of knowledge while only incurring the inference cost of a 10B model, making it exceptionally efficient for agentic workflows requiring many sequential calls.
How does M2.1 compare to Claude Sonnet 4.5?
M2.1 achieves 74% on SWE-bench Verified (Claude ~77%) and outperforms Claude Sonnet 4.5 in multilingual coding scenarios. The key advantage is cost: M2.1 costs approximately 10% of Claude Sonnet 4.5 ($0.30 vs $3.00 per 1M input tokens) while maintaining competitive performance, especially in agentic and tool-use scenarios.
What is the VIBE benchmark?
VIBE (Visual & Interactive Benchmark for Execution) is a new benchmark created by MiniMax that tests full-stack capability to build functional applications 'from zero to one.' It covers Web, Android, iOS, Simulation, and Backend subsets, using an Agent-as-a-Verifier (AaaV) paradigm that judges both code correctness and visual/interactive quality in real runtime environments.
What is the Digital Employee feature?
Digital Employee is M2.1's capability to perform end-to-end office automation tasks. It accepts web content in text form and controls mouse clicks and keyboard inputs via text commands. It handles workflows in administration (equipment requests, budget calculations), project management (issue tracking), and software development (Merge Request queries) autonomously.
How much does MiniMax M2.1 cost?
API pricing is $0.30/1M input tokens and $1.20/1M output tokens - approximately 10% of Claude Sonnet 4.5. MiniMax also offers Coding Plans: Starter ($10/month), Pro ($20/month), and Max ($50/month), providing significant value compared to Claude Code's pricing. OpenRouter offers slightly lower rates at $0.20-0.27/1M input.
Can I run MiniMax M2.1 locally?
Yes, M2.1 weights are available on HuggingFace and ModelScope under MIT license. You can deploy using vLLM (recommended), SGLang, or Ollama. However, the full model requires significant hardware - recommended production setup is 4x H200/H20 or 4x A100/A800 GPUs with 96GB VRAM each. Consumer setups require 2x RTX 4090 minimum with quantization.
What hardware do I need for local deployment?
Production: 4x H200/H20 or 4x A100/A800 GPUs (96GB VRAM each) supports up to 400K tokens context. Extended: 8x 144GB GPUs (1.15TB total) supports up to 3M tokens. Consumer/Development: 2x RTX 4090 minimum with AWQ/GPTQ/experts_int8 quantization. Q6 quantization achieves ~14 tokens/second.
Does M2.1 work with Claude Code?
Yes, M2.1 demonstrates excellent framework generalization. It works consistently with Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox. It also supports context management conventions like Skill.md, Claude.md/agent.md/.cursorrule files, and Slash Commands.
What are MiniMax M2.1's main limitations?
M2.1 is weaker on pure mathematical reasoning compared to GLM-4.7 (78.3% vs 95.7% on AIME 2025). It's not suited for extended autonomous research tasks where models like Kimi K2 Thinking excel. Users report inconsistencies in LaTeX understanding and role-play/character simulation. It's also text-only with no native multimodal capabilities.
How does M2.1 compare to GLM-4.7?
Both released within 24 hours (GLM-4.7 on Dec 22, M2.1 on Dec 23). M2.1 is faster with lower active parameters (10B vs 32B) and 4-7x cheaper on API pricing. GLM-4.7 excels in mathematical reasoning and has Preserved Thinking for multi-turn sessions. M2.1 leads in VIBE benchmark scores and has the Digital Employee feature. Choose M2.1 for speed/cost, GLM-4.7 for math/research.