Hassann

Originally published at apidog.com

What Is MiniMax M3? The First Open-Weight Frontier Coding Model

MiniMax M3 is an open-weight AI model released by MiniMax on June 1, 2026. It combines frontier-level coding, a context window of up to 1,000,000 tokens, and native multimodality for image, video, and desktop-computer interaction in one model.

That combination is the main reason developers should pay attention. Many models are strong at one or two of these areas, but M3 is positioned as the first open-weight model designed to do all three together. MiniMax has also said it will publish the open weights and a full technical report within roughly 10 days of launch, which means the model should become self-hostable shortly after release. If you’ve followed open-weight releases like Qwen 3.7, M3 is the next major entry. The launch details come from the MiniMax M3 announcement.

This guide focuses on what developers can do with M3: how it differs from other models, what MiniMax reported in benchmarks, how its sparse-attention architecture affects long-context workloads, how to access the API, and how to test integrations safely.

💡 If you plan to connect M3 to tools, agents, or production APIs, inspect the model’s responses and tool calls early. Tools like Apidog help you validate request and response shapes before you wire them into application code.

What makes M3 different

Most frontier models force a trade-off among capabilities such as:

  • strong coding ability
  • very large context windows
  • multimodal input
  • agentic computer use
  • self-hostable/open-weight deployment

M3’s pitch is that you do not have to choose only one or two.

In practical terms, M3 combines:

  • Frontier coding: MiniMax positions M3 against the strongest closed models on coding and agentic software benchmarks.
  • 1M-token context: You can pass very large inputs such as repositories, documentation sets, logs, or long conversations without aggressive truncation.
  • Native multimodality: M3 accepts image and video input. MiniMax also demonstrated desktop-computer operation, including opening a local ERP client and batch-entering invoices.
  • Open weights: Once released, teams can self-host, run private workloads, and potentially fine-tune for domain-specific use cases.

The open-weight part is especially important for developers working with sensitive data or custom infrastructure. If the weights are public, you are not limited to a hosted API forever. You can run M3 closer to your data and avoid full dependency on per-call vendor access.

For more context on why large Chinese AI labs are pushing models into the open, see this overview of the Chinese LLM price war of 2026.

The benchmark numbers to watch

MiniMax published benchmark results at launch. These are vendor-reported numbers, so treat them as MiniMax’s measurements until independent testing confirms them.

The standout result is 59.0% on SWE-Bench Pro, a difficult software-engineering benchmark built around real-world coding tasks. You can read more about the methodology on the SWE-Bench project site.

According to MiniMax, M3:

  • beats GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro
  • lands close to Claude Opus 4.7
  • performs strongly enough to be considered near the closed-model frontier for coding tasks

M3 does not lead everywhere. On PostTrainBench, MiniMax reports:

  • M3: 0.37
  • Claude Opus 4.7: 0.42
  • GPT-5.5: 0.39

That matters because it gives a more realistic picture: M3 looks strong, but not universally ahead.

MiniMax has not yet disclosed parameter counts or active-parameter figures. Those details are expected in the technical report, so exact cost-per-parameter comparisons are not possible yet.

For a closer comparison, see MiniMax M3 vs Opus 4.7 vs GPT-5.5.

MSA architecture in plain English

M3’s long-context efficiency comes from MSA, short for MiniMax Sparse Attention.

Standard attention compares every token with every other token, so compute grows quadratically with context length. With a 1M-token prompt, naive attention would be extremely costly.

Sparse attention changes the computation pattern. Instead of attending to the full sequence, each token attends to a selected subset of tokens.

MiniMax reports that MSA reduces per-token compute to roughly 1/20 of what its previous-generation model required. The reported inference improvements are:

  • Prefill: more than 9x faster
  • Decode: more than 15x faster
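
As a rough back-of-the-envelope illustration of why the computation pattern matters, the sketch below counts query-key comparisons for dense attention and for a sparse scheme in which each token attends to a fixed number of selected tokens. The subset size `k` is a hypothetical value picked for illustration, and causal masking is ignored; MiniMax has not yet published how MSA selects tokens or how many it selects.

```python
def dense_pairs(n: int) -> int:
    """Query-key comparisons when every token attends to every token."""
    return n * n


def sparse_pairs(n: int, k: int) -> int:
    """Comparisons when each token attends to at most k selected tokens."""
    return n * min(n, k)


n = 1_000_000  # tokens in the prompt
k = 4_096      # hypothetical subset size, not a published MSA parameter

print(f"dense:  {dense_pairs(n):.2e} comparisons")
print(f"sparse: {sparse_pairs(n, k):.2e} comparisons")
print(f"ratio:  {dense_pairs(n) / sparse_pairs(n, k):.0f}x fewer")
```

The ratio printed here depends entirely on the chosen `k` and leaves out the cost of selecting tokens, so it should not be compared with MiniMax's 1/20 figure, which describes whole-model per-token compute.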

For developers, this affects architecture decisions.

Without efficient long context, you usually need to:

  • chunk documents
  • build retrieval pipelines
  • summarize old context
  • aggressively trim chat history
  • manually select files from a codebase

With cheaper long context, you can test simpler workflows first:

User request
→ include relevant repository files or long document set
→ ask M3 to reason over the full context
→ validate output
→ add retrieval/chunking only if needed

That does not mean RAG becomes unnecessary. It means you can choose RAG for precision and scalability, not only because the model cannot fit your input.
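
The full-context-first workflow can be sketched as a small packing helper: concatenate files into one prompt until a token budget is reached, and report what did not fit. This is a minimal sketch under stated assumptions. The four-characters-per-token estimate is a crude heuristic rather than M3's tokenizer, and `TOKEN_BUDGET`, `pack_files`, and `load_repo` are hypothetical names introduced for illustration.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4      # rough heuristic, not M3's real tokenizer
TOKEN_BUDGET = 900_000   # hypothetical headroom below the 1M-token limit


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_files(files: dict[str, str], budget: int = TOKEN_BUDGET) -> tuple[str, list[str]]:
    """Concatenate files into one prompt until the budget is reached.

    Returns the packed prompt and the paths that did not fit.
    """
    parts, skipped, used = [], [], 0
    for path, text in files.items():
        block = f"### FILE: {path}\n{text}\n"
        cost = estimate_tokens(block)
        if used + cost > budget:
            skipped.append(path)
            continue
        parts.append(block)
        used += cost
    return "".join(parts), skipped


def load_repo(root: str, suffixes: tuple[str, ...] = (".py", ".md")) -> dict[str, str]:
    """Read matching text files under root into a path -> content mapping."""
    return {
        str(p): p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(Path(root).rglob("*"))
        if p.is_file() and p.suffix in suffixes
    }
```

Calling `pack_files(load_repo("."))` returns the prompt plus a list of skipped paths. A non-empty skipped list is a reasonable signal that retrieval or chunking is worth adding for that workload.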

What you can build with M3

M3 is designed for long-running agentic work: tasks where the model reads context, acts, checks results, and continues over many steps.

MiniMax showed examples including:

  • 24-hour CUDA kernel optimization: M3 autonomously worked on a kernel and reached a 9.4x speedup.
  • Autonomous paper reproduction: M3 reproduced a research paper across 18 commits and generated 23 experimental figures.
  • Computer use: M3 operated desktop software directly, such as opening a local ERP client and batch-entering invoices.

Good developer use cases include:

  • repository-wide code review
  • large-scale refactoring
  • migration planning
  • test generation across multiple packages
  • bug reproduction from logs and issue history
  • document analysis over large internal knowledge bases
  • tool-using agents that interact with APIs, files, browsers, or desktop apps

MiniMax’s product wrapper for this is MiniMax Code, which supports agent-team workflows such as multi-stage, concurrent, and dynamically adjustable processes.

One useful pattern is a Producer + Verifier loop:

Producer agent
→ proposes code, patch, query, or action

Verifier agent
→ checks correctness, schema, tests, or constraints

Only accepted output
→ moves to the next step

This design helps reduce silent failures in agent workflows, especially when outputs trigger real tools or production APIs.
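
A minimal sketch of the Producer + Verifier loop, assuming both roles are plain callables. In practice the producer would wrap a model call and the verifier might run tests or a schema check; the stand-in functions below are hypothetical and exist only to make the loop runnable.

```python
from typing import Callable, Optional


def produce_and_verify(
    produce: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    task: str,
    max_attempts: int = 3,
) -> str:
    """Return the first candidate the verifier accepts.

    produce(task, feedback) returns a candidate; verify(candidate) returns
    None on success or a reason string on rejection.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = produce(task, feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate
    raise RuntimeError(f"no candidate accepted after {max_attempts} attempts: {feedback}")


# Stand-in callables for illustration only.
def fake_producer(task: str, feedback: Optional[str]) -> str:
    # Produces a safe query only after receiving feedback.
    return "SELECT 1" if feedback else "DROP TABLE users"


def fake_verifier(candidate: str) -> Optional[str]:
    return None if candidate.startswith("SELECT") else "only SELECT statements are allowed"


print(produce_and_verify(fake_producer, fake_verifier, "count users"))
```

The rejection reason is passed back to the producer so the next attempt can correct itself, and the loop raises rather than returning an unverified candidate once attempts run out.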

Testing M3 tool calls with Apidog

When you build agents on top of M3, the model is only one part of the system. The fragile part is usually the interface between the model and your tools.

Common issues include:

  • malformed JSON
  • missing required fields
  • wrong enum values
  • incorrect argument types
  • tool-call schema drift
  • responses that look valid but fail downstream

A practical workflow is:

  1. Define your tool schema.
  2. Send test prompts to M3.
  3. Capture the model’s tool-call output.
  4. Validate the output against the expected request schema.
  5. Save working examples as regression tests.
  6. Only then connect the call to real side-effecting systems.
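
Steps 3 and 5 of the workflow above can be sketched as two small helpers. This assumes M3 returns tool calls in the same shape as OpenAI's chat-completions responses (`choices[0].message.tool_calls`, with arguments encoded as a JSON string). That shape is an assumption based on the API being described as OpenAI-style, so confirm it against the MiniMax API reference. The helper names are hypothetical.

```python
import json


def extract_tool_calls(response: dict) -> list[dict]:
    """Return [{'name': ..., 'arguments': {...}}] from an OpenAI-style response.

    Raises ValueError when a call's arguments string is not valid JSON.
    """
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        try:
            args = json.loads(fn["arguments"])
        except json.JSONDecodeError as exc:
            raise ValueError(f"malformed JSON in call to {fn['name']}: {exc}") from exc
        calls.append({"name": fn["name"], "arguments": args})
    return calls


def save_fixture(path: str, prompt: str, calls: list[dict]) -> None:
    """Store a prompt and its accepted tool calls as a regression fixture."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"prompt": prompt, "expected_calls": calls}, fh, indent=2)
```

Failing loudly on malformed arguments keeps a broken call from reaching a real tool, and saved fixtures give you something to diff against when the model or prompt changes.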

For example, if your agent calls an invoice API, validate the payload before executing it:

{
  "customer_id": "cust_123",
  "invoice_date": "2026-06-01",
  "line_items": [
    {
      "description": "Consulting services",
      "quantity": 10,
      "unit_price": 150
    }
  ]
}

You can capture and validate these responses in Apidog before letting the agent call production endpoints.
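
The same check can also live in code as a last guard before execution. The sketch below validates the invoice payload shown above using only the standard library. The specific rules (positive integer quantity, non-negative price, ISO date) are assumptions about a hypothetical invoice API, not requirements from MiniMax or Apidog.

```python
from datetime import date


def validate_invoice(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passed."""
    errors: list[str] = []

    customer_id = payload.get("customer_id")
    if not isinstance(customer_id, str) or not customer_id:
        errors.append("customer_id must be a non-empty string")

    try:
        date.fromisoformat(payload.get("invoice_date"))
    except (TypeError, ValueError):
        errors.append("invoice_date must be an ISO date such as 2026-06-01")

    items = payload.get("line_items")
    if not isinstance(items, list) or not items:
        errors.append("line_items must be a non-empty list")
        return errors

    for i, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"line_items[{i}] must be an object")
            continue
        if not isinstance(item.get("description"), str):
            errors.append(f"line_items[{i}].description must be a string")
        qty = item.get("quantity")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
            errors.append(f"line_items[{i}].quantity must be a positive integer")
        price = item.get("unit_price")
        if isinstance(price, bool) or not isinstance(price, (int, float)) or price < 0:
            errors.append(f"line_items[{i}].unit_price must be a non-negative number")
    return errors
```

Returning every problem at once, rather than stopping at the first, makes the result more useful as feedback to the model on a retry.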

For more design guidance, see agentic workflow tool wiring: patterns and pitfalls.

How to access M3

MiniMax currently provides two access paths:

  1. subscription token plans
  2. API access

The subscription plans bundle a monthly token allowance.

For programmatic access, M3 uses an OpenAI-style chat-completions API.

Base URL:

https://api.minimax.io/v1

Endpoint:

POST /chat/completions

Model ID:

MiniMax-M3

Authentication uses a bearer token:

POST https://api.minimax.io/v1/chat/completions
Authorization: Bearer $API_KEY
Content-Type: application/json

A minimal request looks like this:

curl https://api.minimax.io/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "messages": [
      {
        "role": "user",
        "content": "Review this function and suggest improvements."
      }
    ]
  }'

MiniMax says you can call the API using:

  • raw HTTP
  • the Anthropic SDK, which MiniMax recommends
  • the OpenAI SDK

The official MiniMax API reference has the full schema.
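
For the raw-HTTP route, the curl request above translates to Python's standard library roughly as follows. This is a sketch, not an official client: `build_chat_request` and `send` are hypothetical helper names, and reading the reply from `choices[0].message.content` assumes the response mirrors OpenAI's chat-completions format.

```python
import json
import urllib.request

BASE_URL = "https://api.minimax.io/v1"


def build_chat_request(prompt: str, api_key: str, model: str = "MiniMax-M3") -> urllib.request.Request:
    """Build (but do not send) a POST to the chat-completions endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With an `API_KEY` environment variable set, `send(build_chat_request("Review this function and suggest improvements.", os.environ["API_KEY"]))` performs the same call as the curl example. Separating request construction from sending makes it easy to inspect or log the exact payload before anything leaves your machine.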

Two pricing details matter for implementation:

  • Inputs of 512K tokens or fewer are billed at the standard rate.
  • Inputs above 512K tokens use a higher long-context rate.
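
Because the rate changes at a fixed input size, a small guard can flag requests that will cross into long-context billing before they are sent. The sketch below assumes "512K" means 512,000 tokens; it could also mean 524,288, so check the current docs. `billing_rate` is a hypothetical helper, and the labels it returns are descriptive only, not API values.

```python
LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" read as 512,000 tokens


def billing_rate(input_tokens: int) -> str:
    """Classify a request by the input-size rule described above."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return "standard" if input_tokens <= LONG_CONTEXT_THRESHOLD else "long-context"
```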

MiniMax also provides two service tiers:

  • standard, the default
  • priority

The launch details do not include an exact per-token price, so check the current docs before budgeting production usage.

For a step-by-step setup, see how to use the MiniMax M3 API. If you want no-cost options, see how to use MiniMax M3 for free.

Once you have an API key, you can download Apidog, send your first request, and inspect the response shape before writing application code.

How M3 compares with other open-weight models

M3 enters a crowded open-weight model field. Current contenders include:

  • DeepSeek V4-pro
  • Qwen 3.7
  • Kimi k2.6
  • GLM-5.1

Each model has different strengths across coding, reasoning, multilingual tasks, and cost.

M3’s differentiator is the bundle:

frontier coding
+ 1M-token context
+ native multimodality
+ computer use
+ open weights

Many open-weight peers compete strongly on one axis. M3 is trying to cover several at once.

That said, developers should wait for:

  • the published weights
  • the technical report
  • independent benchmarks
  • real-world latency and cost measurements
  • self-hosting requirements

If you are already evaluating open models, the Qwen 3.7 overview is a useful comparison point.

FAQ

Is MiniMax M3 open source?

M3 is described as open-weight. MiniMax has promised to publish the model weights and a technical report within roughly 10 days of the June 1, 2026 launch.

As of writing, the weights are not available yet, so you cannot download and self-host M3 today. Once released, you should be able to run M3 on your own infrastructure.

What is the context window?

M3 supports up to 1,000,000 tokens.

MiniMax says its MSA architecture makes that context size more practical by cutting per-token compute to roughly 1/20 of the previous-generation model.

Is MiniMax M3 free?

Not directly. MiniMax sells subscription token plans starting at $20/mo for Plus and also offers API access billed by tokens.

The launch details do not mention a free tier, but the guide on how to use MiniMax M3 for free covers the available no-cost routes.

How does M3 compare to Claude Opus 4.7?

According to MiniMax’s reported benchmarks:

  • M3 reaches 59.0% on SWE-Bench Pro.
  • M3 beats Opus 4.7 on SVG-Bench.
  • M3 trails Opus 4.7 on PostTrainBench: 0.37 vs 0.42.

These are vendor-reported numbers, so wait for independent testing before treating them as settled.

When will the weights be released?

MiniMax committed to releasing the open weights and technical report within about 10 days of the June 1, 2026 launch.

The technical report should also include parameter counts, which MiniMax has not disclosed yet.

Can M3 handle images and video?

Yes. M3 is natively multimodal and accepts image and video input.

It also supports computer use, meaning it can operate desktop applications directly rather than only describing visual content.

The short version

MiniMax M3 is an open-weight model that combines frontier coding, a 1M-token context window, and native multimodality. Its MSA architecture is designed to make long-context inference cheaper and faster, while MiniMax’s reported SWE-Bench Pro result places it near the closed-model frontier.

For developers, the immediate path is:

  1. get API access
  2. test basic chat-completion calls
  3. inspect response formats
  4. validate tool-call schemas
  5. start with small agent workflows
  6. scale once the weights, technical report, and independent benchmarks are available

If you are building with M3, test your first API calls and tool responses in Apidog before connecting them to production workflows.
