MiniMax M3 is a frontier reasoning and coding model with a context window of up to 1,000,000 tokens. You can use that large context to send an entire repository, a long log file, or a design document and ask the model to reason across it in one request. For background on the model itself, start with what is MiniMax M3.
This guide focuses on implementation. You’ll generate an API key, send your first MiniMax M3 request with curl, Python, and Node.js, then inspect the raw request and response in Apidog before wiring it into your app. Download Apidog if you want to follow along.
Keep the official MiniMax API docs open while you build.
What you’ll need
- A MiniMax account at platform.minimax.io.
- A MiniMax API key.
- A billing method: pay-as-you-go credits or a subscription token plan.
For curl, you do not need anything else installed. For SDK examples, use Python 3.8+ or Node.js 18+.
Step 1: Create and store your API key
Sign in at platform.minimax.io, open the API keys section, and create a new key.
MiniMax supports two credential types:
- Regular API Key: bills against your pay-as-you-go balance.
- Subscription Key: consumes token credits from a Plus, Max, or Ultra plan. When plan tokens run out, calls using that key stop until renewal or until you switch to a pay-as-you-go key.
Choose the key type that matches your billing setup. Copy the key once and store it securely.
Do not hardcode the key in your source code. Export it as an environment variable:
export MINIMAX_API_KEY="your-key-here"
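This keeps the secret out of your repository and shared files. Typing the export command interactively can still record the key in your shell history, so for anything beyond a quick test, load it from a file excluded from version control or from a secrets manager. If you work with API keys inside an editor, apply the same rules there. See VS Code extension API key security for common leak patterns.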
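To make a missing key fail loudly at startup instead of as a confusing 401 later, a small loader can help. This is a minimal sketch; the helper name `load_api_key` is hypothetical and not part of any MiniMax SDK.

```python
import os


def load_api_key(env=None, name="MINIMAX_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or load it "
            "from a secrets manager before starting the app."
        )
    return key
```

Call `load_api_key()` once when your application starts and pass the result to the client constructor.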
Step 2: Send your first MiniMax M3 request
The chat endpoint is:
POST https://api.minimax.io/v1/chat/completions
Authentication uses a bearer token:
Authorization: Bearer $MINIMAX_API_KEY
The model ID is:
MiniMax-M3
Start with the smallest useful curl request:
curl https://api.minimax.io/v1/chat/completions \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "messages": [
      {
        "role": "user",
        "content": "Refactor this function to be async."
      }
    ]
  }'
You can call M3 in three ways:
- Raw HTTP
- OpenAI-compatible SDK
- Anthropic SDK
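MiniMax recommends the Anthropic SDK. That SDK sends requests in the Anthropic Messages format rather than to the chat completions route, so configure it with the Anthropic-compatible base URL given in the MiniMax docs. Raw HTTP and the OpenAI SDK both use the chat endpoint shown above.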
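If you want the raw HTTP option from Python without installing an SDK, the curl request above can be mirrored with the standard library. This is a sketch: the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response is assumed to have the same shape the SDK examples read from.

```python
import json
import urllib.request

API_URL = "https://api.minimax.io/v1/chat/completions"


def build_chat_request(api_key, prompt, model="MiniMax-M3"):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting build from send lets you print or log the exact request before any network call happens. To run it, pass the result of `build_chat_request(os.environ["MINIMAX_API_KEY"], "...")` to `send_chat_request`.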
Python with the OpenAI SDK
The main change from a normal OpenAI setup is base_url:
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.minimax.io/v1",
    api_key=os.environ["MINIMAX_API_KEY"],
)

response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": "Refactor this function to be async.",
        }
    ],
)

print(response.choices[0].message.content)
Node.js with the OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.minimax.io/v1",
  apiKey: process.env.MINIMAX_API_KEY,
});

const response = await client.chat.completions.create({
  model: "MiniMax-M3",
  messages: [
    {
      role: "user",
      content: "Refactor this function to be async.",
    },
  ],
});

console.log(response.choices[0].message.content);
If you’ve used the Qwen 3.7 API, this pattern is familiar: point an OpenAI-compatible SDK at a different base_url. For full client options, check the OpenAI Python SDK docs and Anthropic SDK docs.
Step 3: Test the request in Apidog
Before embedding the call in your application, send it manually and inspect the raw response. This helps catch authentication issues, schema mismatches, and response-shape assumptions early.
Open Apidog and create the request.
- Create a new HTTP request.
- Set the method to POST.
- Set the URL to:
https://api.minimax.io/v1/chat/completions
- Open the Environments panel.
- Add an environment variable named MINIMAX_API_KEY.
- Store your key as the variable value.
- Add this request header:
Authorization: Bearer {{MINIMAX_API_KEY}}
- Add this header too:
Content-Type: application/json
- Set the body type to raw JSON.
- Paste the same payload used in the curl example:
{
  "model": "MiniMax-M3",
  "messages": [
    {
      "role": "user",
      "content": "Refactor this function to be async."
    }
  ]
}
- Click Send and inspect the response body.
[Screenshot: the MiniMax-M3 request and response in Apidog]
Storing the token as an environment variable lets you share the request with teammates without exposing the secret. It also makes it easy to switch between a pay-as-you-go key and a subscription key by changing one variable.
If you later enable streaming, Apidog can show server-sent events as they arrive, which is useful before you write parsing logic in your application.
Step 4: Toggle reasoning output
M3 is a reasoning model. By default, it returns the final answer. You can also ask it to expose intermediate reasoning by passing reasoning_split.
With the OpenAI SDK, pass it through extra_body:
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.minimax.io/v1",
    api_key=os.environ["MINIMAX_API_KEY"],
)

response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": "Refactor this function to be async.",
        }
    ],
    extra_body={
        "reasoning_split": True
    },
)

print(response.choices[0].message.reasoning_details[0]["text"])  # reasoning
print(response.choices[0].message.content)  # final answer
When reasoning_split is enabled:
- Reasoning text is available at:
response.choices[0].message.reasoning_details[0]["text"]
- Final output stays at:
response.choices[0].message.content
Use reasoning output for complex tasks such as multi-step refactors, debugging, code review, and verification workflows. Turn it off for simple or latency-sensitive requests where extra reasoning tokens add cost and time without much value.
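Indexing `reasoning_details[0]` directly raises an error when the field is absent, for example when `reasoning_split` is off. A defensive reader could look like the sketch below. The helper name `split_reasoning` is hypothetical, and it assumes each entry in `reasoning_details` carries a `text` field as in the example above.

```python
def split_reasoning(message):
    """Return (reasoning_text, final_answer) from a chat message.

    Works whether or not the response carries reasoning_details, and
    whether entries arrive as dicts or as attribute-style objects.
    """
    details = getattr(message, "reasoning_details", None) or []
    parts = []
    for item in details:
        if isinstance(item, dict):
            text = item.get("text")
        else:
            text = getattr(item, "text", None)
        if text:
            parts.append(text)
    return "\n".join(parts), (message.content or "")
```

Usage: `reasoning, answer = split_reasoning(response.choices[0].message)`. Log `reasoning` for debugging and show only `answer` to end users.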
Step 5: Use the 1M-token context carefully
The large context window is the main reason to use M3. For example, you can send a large production log and ask one question across the whole file:
# Reuses the `client` created in Step 2.
with open("production-2026-05-30.log") as f:
    log_text = f.read()

response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": (
                "Find the root cause of the 502 spike at 14:20 UTC.\n\n"
                f"{log_text}"
            ),
        }
    ],
)

print(response.choices[0].message.content)
But do not send a million tokens by default.
MiniMax charges a standard rate for calls with 512K input tokens or fewer. Once input exceeds 512K tokens, a higher long-context rate applies. That means going from a 400K-token prompt to a 600K-token prompt crosses a pricing threshold.
Practical rules:
- Send only the relevant slice of logs, code, or documentation.
- Trim repeated context in agent loops.
- Summarize or chunk when a full-context call is not required.
- Watch token size before crossing the 512K threshold.
For more cost-control patterns, see how to reduce agent token costs.
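The rules above can be sketched as a pre-flight check that trims the log to a time window and estimates size before sending. Everything here is an assumption for illustration: the log lines are taken to start with an `HH:MM` timestamp, "512K" is read as 512,000 tokens, and the four-characters-per-token ratio is a rough heuristic rather than the model's real tokenizer.

```python
LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" read as 512,000 tokens
CHARS_PER_TOKEN = 4  # rough heuristic, not the model's tokenizer


def estimate_tokens(text):
    """Approximate the token count from character length."""
    return len(text) // CHARS_PER_TOKEN + 1


def slice_log(lines, start, end):
    """Keep lines whose leading HH:MM timestamp falls within [start, end]."""
    return [line for line in lines if start <= line[:5] <= end]


def check_prompt(text):
    """Return (estimated_tokens, crosses_threshold) for a prompt."""
    tokens = estimate_tokens(text)
    return tokens, tokens > LONG_CONTEXT_THRESHOLD
```

For the 502 example, `slice_log(log_text.splitlines(), "14:10", "14:30")` keeps only the minutes around the spike. Treat the estimate as a warning signal and rely on the token usage reported by the API for actual billing.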
Step 6: Add tool calling
M3 supports tool calling, so you can let the model request actions from your application.
First, declare the tools the model is allowed to call:
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the test suite for a given module path.",
            "parameters": {
                "type": "object",
                "properties": {
                    "module": {
                        "type": "string"
                    }
                },
                "required": ["module"],
            },
        },
    }
]
Then pass the tools with your chat request:
response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": "Fix the failing test in auth/session.py and confirm it passes.",
        }
    ],
    tools=tools,
)
When the model decides to call a tool, the response includes a tool_calls array. Your application should:
- Read the tool call from the response.
- Run the corresponding local function.
- Append the tool result as a tool message.
- Call the API again so the model can continue.
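The four steps above can be sketched as a loop. This assumes the response follows the OpenAI SDK's tool-call shape (`message.tool_calls`, each with `id`, `function.name`, and `function.arguments` as a JSON string). The name `run_tool_loop` and the `handlers` mapping are hypothetical.

```python
import json


def run_tool_loop(client, messages, tools, handlers, model="MiniMax-M3", max_turns=5):
    """Call the model, run any requested tools, and repeat until it answers.

    `handlers` maps tool names to local Python callables.
    `messages` is extended in place with the assistant and tool turns.
    """
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no tool requested: final answer

        # Echo the assistant turn so each tool result has a matching call.
        messages.append({
            "role": "assistant",
            "content": message.content,
            "tool_calls": [
                {
                    "id": call.id,
                    "type": "function",
                    "function": {
                        "name": call.function.name,
                        "arguments": call.function.arguments,
                    },
                }
                for call in message.tool_calls
            ],
        })

        for call in message.tool_calls:
            try:
                handler = handlers.get(call.function.name)
                if handler is None:
                    raise KeyError(f"unknown tool: {call.function.name}")
                args = json.loads(call.function.arguments or "{}")
                result = handler(**args)
            except Exception as exc:
                # Report failures to the model instead of crashing the loop.
                result = {"error": str(exc)}
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })

    raise RuntimeError("tool loop did not finish within max_turns")
```

The `max_turns` cap keeps a confused model from looping forever, and returning errors as tool results gives the model a chance to correct a bad argument.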
That handshake is where many agent bugs happen. For wiring patterns and failure modes, read agentic workflow tool wiring.
Apidog is useful here too: save each step of the multi-turn exchange as a separate request so you can verify the initial request, tool-call response, tool result, and follow-up request independently.
Step 7: Send multimodal input
M3 also supports multimodal input, so you can include image content alongside text prompts.
Use the standard content-parts message shape for multimodal requests, where content is a list of typed parts instead of a single string. The field names for multimodal input may change more often than those of the text-only API, so check the MiniMax API reference before implementing production logic.
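As an illustration of the content-parts shape, the sketch below builds a user message with one text part and one image part. The field names (`type`, `text`, `image_url`, `url`) follow the OpenAI-style convention and are an assumption here, not confirmed MiniMax fields. The helper name `build_image_message` is hypothetical.

```python
import base64


def build_image_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message with a text part and an inline base64 image part.

    Field names follow the OpenAI-style content-parts convention
    (an assumption; verify against the MiniMax API reference).
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

Pass the returned dict as an entry in `messages`, the same way the text-only examples do.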
Pricing and tiers
There are two separate billing-related concepts to understand.
Token plans
Token plans define your credit budget.
MiniMax subscription tiers include:
- Plus: $20
- Max: $50
- Ultra: $120
Each tier bundles a pool of token credits consumed by your Subscription Key. Pay-as-you-go uses a regular API Key and bills against your balance instead.
Service tiers
Service tiers control scheduling priority:
- standard: default tier, suitable for most workloads.
- priority: intended for latency-sensitive or SLA-bound traffic.
Your real cost depends on:
- Input token count
- Whether the request crosses the 512K input-token threshold
- Billing plan
- Service tier
For current pricing, check the MiniMax pricing and model page and the API docs.
FAQ
Is there a free way to try M3?
Yes. You can test the model without committing to a plan. See how to use MiniMax M3 for free.
Which SDKs work with the API?
You can use:
- Raw HTTP
- Anthropic SDK
- OpenAI SDK
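MiniMax recommends the Anthropic SDK, which sends Anthropic Messages-format requests and needs the Anthropic-compatible base URL from the MiniMax docs. Raw HTTP and the OpenAI SDK use the chat completions endpoint: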
https://api.minimax.io/v1/chat/completions
For OpenAI-compatible clients, set the MiniMax base URL:
https://api.minimax.io/v1
How do I stream responses?
Add stream: true to the request body:
{
  "model": "MiniMax-M3",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "Explain this error log."
    }
  ]
}
The API returns server-sent events. SDKs typically expose an iterator you can loop over to read chunks as they arrive. Test streaming in Apidog first so you can inspect the event format before writing parser code.
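If you do end up parsing the stream by hand, the sketch below shows one way to pull text out of the event lines. It assumes the events follow the OpenAI-style chunk format (`data: {...}` lines with `choices[].delta.content`, ending with `data: [DONE]`). Confirm that against what you see in Apidog. The function name `iter_stream_text` is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With an SDK you normally would not need this: pass `stream=True` and iterate over the returned object instead.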
What is the rate limit?
Rate limits depend on your account tier and whether you use standard or priority service. If you receive a 429, back off and retry. For latency-sensitive traffic, consider the priority tier. Check your account dashboard and the API docs for current limits.
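A minimal back-off-and-retry sketch is shown below. It treats any exception with a `status_code` attribute equal to 429 as a rate limit, which matches how the OpenAI Python SDK exposes HTTP status errors. The helper name `call_with_backoff` and the default delays are hypothetical choices, not MiniMax recommendations.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on HTTP 429 with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            is_rate_limit = getattr(exc, "status_code", None) == 429
            if not is_rate_limit or attempt == max_attempts - 1:
                raise  # not a rate limit, or out of attempts
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Wrap a request as `call_with_backoff(lambda: client.chat.completions.create(...))`. The OpenAI SDK also retries some failures on its own through its `max_retries` client option, so avoid stacking both layers with high limits.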
How does the 512K threshold affect cost?
Calls with 512K input tokens or fewer use the standard rate. Calls above 512K input tokens use the higher long-context rate. Trim prompts aggressively, especially in agent loops where token cost compounds across calls.
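The arithmetic can be sketched as follows. Two assumptions are baked in: "512K" is read as 512,000 tokens, and the long-context rate is applied to the whole request once the threshold is crossed. The rates are parameters you supply from the pricing page; no real prices are implied.

```python
LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" read as 512,000 tokens


def estimate_input_cost(input_tokens, standard_rate, long_context_rate):
    """Estimate input cost; rates are in currency units per million tokens.

    Assumes the long-context rate applies to the entire request once the
    input exceeds the threshold (verify against the pricing page).
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rate = long_context_rate
    else:
        rate = standard_rate
    return input_tokens / 1_000_000 * rate
```

With placeholder rates of 1.0 and 2.0 per million tokens, a 400K-token prompt costs 0.4 units and a 600K-token prompt costs 1.2 units: three times the cost for 1.5 times the input.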
Can I self-host M3 instead of using the API?
This guide covers the hosted API, which is the fastest path to implementation. Self-hosting depends on what MiniMax publishes for M3, so check the model page for current weight and license details.
Wrap
You now have the core pieces needed to call MiniMax M3:
- An API key stored as an environment variable
- Working curl, Python, and Node.js requests
- A way to inspect requests in Apidog
- Reasoning output with reasoning_split
- Awareness of the 512K long-context pricing threshold
- A basic tool-calling flow
The fastest next step is to run one real request manually. Add the endpoint to Apidog, store your bearer token as an environment variable, send the refactor prompt, and inspect the raw response. Once the response shape is clear, integrating MiniMax M3 into your code should be straightforward.


