Mistral Large 3 launched in December 2025 as Mistral's flagship open-weight model. Six months later it remains the largest model Mistral has publicly released under a permissive license. This guide covers the architecture, benchmarks, pricing, and practical considerations for developers deciding whether to use it in 2026.
What Mistral Large 3 Is
Mistral Large 3 (model ID mistral-large-2512, the 2512 indicating December 2025) is a sparse Mixture-of-Experts (MoE) model with 675 billion total parameters and 41 billion active parameters per forward pass. Mistral trained it from scratch on 3,000 NVIDIA H200 GPUs.
The MoE architecture is the key efficiency decision. Instead of activating all 675B parameters for every token, the model routes each token through a subset of "expert" subnetworks. With 41B active parameters, Mistral Large 3 runs at roughly the same computational cost as a 41B dense model while accessing the capacity of a 675B one.
For self-hosting, the distinction cuts two ways: per-token inference compute scales with the 41B active parameters, but the memory footprint is set by the full 675B, since every expert must be resident in memory (roughly 1.35 TB of weights at bf16 before quantization). The payoff of MoE is speed, not a smaller deployment.
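To make the routing concrete, here is a minimal sketch of top-k gating in plain NumPy. The expert count, hidden size, and k value are illustrative assumptions; Mistral has not published Large 3's router configuration at this level of detail.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token's hidden state x through the k best-scoring experts.

    Illustrative sketch only: expert count, k, and gating details are
    assumptions, not Mistral Large 3's published configuration.
    """
    logits = x @ gate_w                            # (n_experts,) router scores
    top = np.argsort(logits)[-k:]                  # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                       # softmax over the selected k
    # Only k experts execute: per-token compute scales with k,
    # while memory must still hold all len(experts) networks.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

d, n_experts = 64, 16
rng = np.random.default_rng(0)
gate_w = rng.normal(size=(d, n_experts))
experts = [
    lambda x, W=rng.normal(size=(d, d)) / np.sqrt(d): np.tanh(x @ W)
    for _ in range(n_experts)
]
y = moe_forward(rng.normal(size=d), gate_w, experts)  # only 2 of 16 experts ran
```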
License: Apache 2.0
Mistral Large 3 is released under Apache 2.0. This means:
- Free commercial use
- Can be fine-tuned and redistributed
- No usage restrictions beyond standard Apache terms
- Model weights downloadable from HuggingFace: mistralai/Mistral-Large-3-675B-Instruct-2512
The license is what distinguishes Mistral Large 3 from comparable models. GPT-4o and Claude Opus 4 are API-only. Llama 4 405B carries a Meta community license that adds restrictions for services with more than 700M monthly active users. Mistral Large 3 has no such conditions.
For enterprise deployment teams that need to run models on-premises, in air-gapped environments, or with full weight access for fine-tuning, this is the practical differentiator.
Specifications at a Glance
Verified from official Mistral documentation and HuggingFace model card.
| Specification | Value |
|---|---|
| Total parameters | 675B |
| Active parameters | 41B |
| Architecture | Sparse MoE |
| Context window | 256,000 tokens |
| License | Apache 2.0 |
| Model ID (Mistral API) | mistral-large-latest or mistral-large-2512 |
| HuggingFace ID | mistralai/Mistral-Large-3-675B-Instruct-2512 |
| Release date | December 2025 |
| Training hardware | 3,000 NVIDIA H200 GPUs |
Benchmarks
Scores from ArtificialAnalysis.ai and IntuitionLabs (independent evaluators).
| Benchmark | Score |
|---|---|
| MMLU | ~85.5% |
| MMLU-Pro | 73.11% |
| MATH-500 | 93.60% |
| HumanEval | Top tier (varies by test configuration) |
The MATH-500 score of 93.60% is high relative to most non-reasoning-specialized models. This reflects Mistral's investment in mathematical reasoning capability during training.
Important caveat: Benchmark scores are useful for directional comparison, not absolute guarantees. Model performance on your specific task will differ based on instruction format, system prompt, temperature, and the nature of your domain. Always run your own evaluation on a representative sample before making infrastructure decisions.
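A minimal evaluation harness against the Mistral API might look like the sketch below. The eval_sample.jsonl file name, its prompt/expected fields, and the exact-match metric are assumptions for illustration; substitute your own data and scoring.

```python
import json
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# "eval_sample.jsonl" with {"prompt": ..., "expected": ...} per line is a
# hypothetical layout; adapt to your own task format.
correct = total = 0
with open("eval_sample.jsonl") as f:
    for line in f:
        case = json.loads(line)
        resp = client.chat.complete(
            model="mistral-large-2512",  # pin the version for reproducible evals
            messages=[{"role": "user", "content": case["prompt"]}],
            temperature=0.0,             # reduce sampling noise while scoring
        )
        answer = resp.choices[0].message.content.strip()
        correct += int(answer == case["expected"])  # swap in your real metric
        total += 1

print(f"exact match: {correct}/{total}")
```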
API Pricing
Verified from Mistral API documentation and pricepertoken.com (May 2026).
| Token type | Price |
|---|---|
| Input | $0.50 per 1M tokens |
| Output | $1.50 per 1M tokens |
At $0.50/$1.50 per million tokens, Mistral Large 3 is positioned as a high-quality model at roughly half the cost of closed-source frontier options. The pricing sits in the same tier as mid-range API models from OpenAI and Anthropic, with the difference that the weights are also open.
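For budgeting, the arithmetic is straightforward. A small helper, with illustrative traffic numbers:

```python
# Mistral Large 3 list prices (per million tokens, May 2026)
INPUT_PER_M, OUTPUT_PER_M = 0.50, 1.50

def monthly_cost(calls_per_day, in_tokens, out_tokens, days=30):
    """Estimate monthly API spend for a fixed per-call token profile."""
    per_call = in_tokens / 1e6 * INPUT_PER_M + out_tokens / 1e6 * OUTPUT_PER_M
    return calls_per_day * days * per_call

# Example: 10,000 calls/day, 2K input + 500 output tokens per call
print(f"${monthly_cost(10_000, 2_000, 500):,.2f}/month")  # $525.00/month
```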
Where to Access It
Via Mistral API:
```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-large-latest",
    "messages": [{"role": "user", "content": "Explain MoE architecture in one paragraph."}]
  }'
```
Via Python SDK:
```python
from mistralai import Mistral

client = Mistral(api_key="YOUR_MISTRAL_API_KEY")

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "Write a Python function to parse JSONL files."}
    ],
)

print(response.choices[0].message.content)
```
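For interactive use, the same SDK also supports streaming. A minimal sketch; the event shape below matches the mistralai v1 SDK at the time of writing, so check the SDK docs if it has changed:

```python
# Stream tokens as they arrive instead of waiting for the full reply
stream = client.chat.stream(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Summarize Apache 2.0 in three bullets."}],
)
for event in stream:
    delta = event.data.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```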
Via Azure AI Foundry:
Mistral Large 3 is available on Azure AI Foundry through Mistral's partnership with NVIDIA and Microsoft. This allows Azure customers to access it with Azure-native billing and compliance controls.
Self-hosted:
Download weights from HuggingFace and serve with vLLM, llama.cpp (for quantized versions), or Mistral's own inference tools. Keep in mind that all 675B parameters must fit in memory (quantization helps), so a multi-GPU node is the floor; what the 41B active count buys you is speed, with per-token compute close to that of a 41B dense model rather than a 675B one.
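Since vLLM exposes an OpenAI-compatible endpoint, a self-hosted deployment can be queried with plain HTTP. A sketch, assuming you have already started vllm serve for the model on a suitably sized node; the port and served model name here are the defaults you would configure yourself:

```python
import requests

# Assumes a local vLLM instance serving the OpenAI-compatible API on :8000
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mistralai/Mistral-Large-3-675B-Instruct-2512",
        "messages": [{"role": "user", "content": "Hello from self-hosted Large 3"}],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```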
Using the 256K Context Window
256,000 tokens is enough for approximately 200,000 words — a long novel, several codebases, or months of conversation history in a single context. In practice, there are two considerations:
Cost: Input tokens are billed at $0.50/1M, so a 200K-token context costs $0.10 per call. For use cases that genuinely need long context, that is reasonable. For tasks that do not, context management (summarization, retrieval) remains more economical.
Quality at context extremes: Models trained with long context windows often exhibit "lost in the middle" degradation — important information buried in the middle of a very long context is weighted less than information at the beginning or end. This is a general LLM limitation, not specific to Mistral. If your use case places critical information in the middle of large documents, test explicitly at that position.
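A simple way to run that test is a "needle" probe: plant a known fact at a controlled depth inside filler text and check whether the model retrieves it. A minimal sketch, reusing the client from the SDK example above; the filler sentence and needle are arbitrary, and at roughly 180K input tokens each probe costs under a dime:

```python
def build_needle_prompt(depth: float, n_filler: int = 20_000) -> str:
    """Plant a known fact at a relative depth (0.0 = start, 1.0 = end)
    inside n_filler copies of a filler sentence. Words are a rough proxy
    for tokens here; use a real tokenizer for precise placement.
    """
    sentences = ["Nothing notable happens in this sentence."] * n_filler
    sentences.insert(int(n_filler * depth), "The vault code is 7426.")
    return " ".join(sentences) + "\n\nWhat is the vault code?"

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    resp = client.chat.complete(
        model="mistral-large-2512",
        messages=[{"role": "user", "content": build_needle_prompt(depth)}],
    )
    print(f"depth {depth}: {resp.choices[0].message.content[:80]}")
```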
Function Calling and Tool Use
Mistral Large 3 supports native function calling in the same JSON schema format as OpenAI:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)
```
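When the model decides to call the function, the response carries a tool_calls entry rather than final text; your code executes the function and sends the result back. A sketch of the round trip, where the get_weather implementation is a stub for illustration:

```python
import json

def get_weather(location: str, unit: str = "celsius") -> str:
    return f"18 degrees {unit} and clear in {location}"  # stub for illustration

msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)  # model-produced JSON string
    result = get_weather(**args)
    followup = client.chat.complete(
        model="mistral-large-latest",
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            msg,  # the assistant turn containing the tool call
            {"role": "tool", "name": call.function.name,
             "content": result, "tool_call_id": call.id},
        ],
        tools=tools,
    )
    print(followup.choices[0].message.content)
```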
Tool-use quality is one of the model's strong points, consistent with its strong MMLU-Pro result.
Mistral Large 3 vs the Broader Mistral 2026 Lineup
As of May 2026, Mistral has several models in production. Understanding the lineup helps pick the right one:
| Model | Best for | Notes |
|---|---|---|
| Mistral Large 3 | Highest-quality tasks, self-hosting, fine-tuning | 675B MoE, Apache 2.0, $0.50/$1.50/1M |
| Mistral Small 4 | General use, lower cost | Dense model, merges reasoning + vision + coding |
| Mistral Medium 3.5 | Long-running agentic tasks | Powers Le Chat Work Mode and Mistral Vibe |
| Codestral | Code-specialized tasks | Dedicated coding model |
| Ministral | Edge / low-latency | Small models for on-device use |
A note on Le Chat Work Mode: Le Chat's "Work Mode" feature — which runs multi-step agentic tasks like email research, calendar management, and document production — is powered by Mistral Medium 3.5, not Mistral Large 3. The two are separate products at different capability levels. Large 3 is the open flagship; Work Mode is a product feature built on Medium 3.5's instruction-following and long-context strengths.
When to Choose Mistral Large 3
It makes sense when:
- You need open weights (Apache 2.0) for self-hosting, air-gap deployment, or fine-tuning
- You want a strong general model at a moderate API price point
- Your use case involves long documents and needs genuine 256K context support
- You are building on Azure and want Mistral native on Azure AI Foundry
Consider alternatives when:
- You need the latest reasoning capabilities (dedicated reasoning models like o3 or Gemini 2.5 Pro outperform Large 3 on multi-step reasoning tasks)
- You need multimodal (image/audio) input — Large 3 is text-only; Mistral Small 4 includes vision
- Cost is the primary constraint — Mistral Small 4 at a lower price tier handles most general tasks
Fine-Tuning
Apache 2.0 means you can fine-tune and redistribute Mistral Large 3 without restrictions. For fine-tuning at this model size, the common approaches are:
- QLoRA (quantized low-rank adaptation): 4-bit quantization cuts base-weight memory to roughly a quarter, but at 675B total parameters that is still several hundred GB, so plan for a multi-GPU server node rather than consumer hardware (see the sketch after this list)
- LoRA at fp16/bf16: all 675B base weights must be resident (about 1.35 TB), which puts this in multi-node territory even though only the small low-rank adapters are trained
- Full fine-tuning: requires a large multi-node setup; practical only with datacenter hardware
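A QLoRA setup with the standard transformers + peft + bitsandbytes stack might look like the sketch below. The target module names and LoRA hyperparameters are illustrative assumptions; inspect the model card for the actual module naming, and expect to need a multi-GPU node even at 4-bit for a model this size.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-Large-3-675B-Instruct-2512"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" shards the quantized weights across available GPUs;
# even at 4-bit, 675B total parameters need several hundred GB of memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# Target module names are an assumption; check model.named_modules()
# for the projection names this architecture actually uses.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapters only: a tiny fraction of 675B
```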
Mistral's own fine-tuning API (fine_tuning.jobs.create) is available through the Mistral API for managed fine-tuning without local GPU infrastructure.
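The managed flow is upload-then-create-job. A sketch against the Mistral API; the parameter names follow the SDK at the time of writing, and whether Large 3 is eligible for managed fine-tuning is an assumption to verify against the fine-tuning docs:

```python
# Upload training data, then create a managed fine-tuning job.
training = client.files.upload(
    file={"file_name": "train.jsonl", "content": open("train.jsonl", "rb")}
)
job = client.fine_tuning.jobs.create(
    model="mistral-large-2512",  # assumed fine-tunable via API; check the docs
    training_files=[{"file_id": training.id, "weight": 1}],
    hyperparameters={"training_steps": 100, "learning_rate": 1e-4},
)
print(job.id, job.status)
```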
Model specifications verified from Mistral official announcement (mistral.ai/news/mistral-3), HuggingFace model card (mistralai/Mistral-Large-3-675B-Instruct-2512), and Mistral API documentation. Benchmarks from ArtificialAnalysis.ai and IntuitionLabs. Pricing from Mistral API documentation and pricepertoken.com (verified May 2026). Le Chat Work Mode attribution confirmed via testingcatalog.com and mistral.ai/news/vibe-remote-agents-mistral-medium-3-5.