Mistral released Medium 3.5 on April 29, 2026. The hosted API model ID is mistral-medium-3.5, the chat endpoint is https://api.mistral.ai/v1/chat/completions, and the request format is close to the OpenAI Chat Completions schema. For most existing OpenAI-compatible clients, migration is mainly a base URL and model-name change. Medium 3.5 adds a 256K context window, native vision, function calling, 24-language support, and a reported 77.6% on SWE-Bench Verified, making it relevant for code-heavy agent workflows.
This guide shows how to call Mistral Medium 3.5 from curl, Python, and Node.js; configure streaming, tool calling, JSON output, and vision input; handle common errors; and use Apidog to inspect requests, token usage, and cost while iterating on prompts. For comparable model guides, see how to use the DeepSeek V4 API and how to use the GPT-5.5 API.
TL;DR
- Endpoint: POST https://api.mistral.ai/v1/chat/completions
- Auth: Authorization: Bearer <MISTRAL_API_KEY>
- Model ID: mistral-medium-3.5
- Context window: 256K tokens
- Pricing: $1.50 per million input tokens, $7.50 per million output tokens
- Features: reasoning, vision, native function calling, structured JSON output, 24-language coverage
- Open weights: mistralai/Mistral-Medium-3.5-128B on Hugging Face under a Modified MIT License with a large-revenue carve-out
- Benchmarks: 77.6% SWE-Bench Verified, 91.4 τ³-Telecom
- Use Apidog to save requests, store API keys as secrets, compare model outputs, and track cost per call.
What changed in Medium 3.5
Medium 3 shipped earlier as a text-only model with a 128K context window. Medium 3.5 changes the API surface and deployment profile:
- 256K context instead of 128K
- Native vision input
- Function calling at the model level
- Merged instruction-following, reasoning, and coding capabilities in one dense 128B model
- Hosted API plus open weights
The practical impact: you can send a larger codebase, a long transcript, or a full document set in one request and still use tools or structured output. For agents, that reduces the amount of custom orchestration needed around chunking, tool selection, and JSON validation.
The main tradeoff is cost. Medium 3 was priced at $0.40 per million input tokens and $2.00 per million output tokens. Medium 3.5 is $1.50 input and $7.50 output. Treat Medium 3.5 as the higher-accuracy tier for reasoning, coding, vision, and agents, not as the default model for every high-volume request.
Prerequisites
Before making the first request, prepare:
- A Mistral account at console.mistral.ai with billing enabled.
- A project-scoped API key.
- Python, Node.js, curl, or another HTTP client.
- An API workspace such as Apidog if you want reusable requests, secret variables, and visible token usage.
Export your key locally:
export MISTRAL_API_KEY="..."
Endpoint and authentication
Mistral exposes chat completions at:
POST https://api.mistral.ai/v1/chat/completions
Authentication uses a bearer token:
Authorization: Bearer <MISTRAL_API_KEY>
Minimal curl request:
curl https://api.mistral.ai/v1/chat/completions \
-H "Authorization: Bearer $MISTRAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "mistral-medium-3.5",
"messages": [
{
"role": "user",
"content": "Explain dense merged checkpoints in two sentences."
}
]
}'
A successful response includes:
- choices[], with the reply text in choices[0].message.content
- usage.prompt_tokens, usage.completion_tokens, and usage.total_tokens
- id for tracing
Failures return an error envelope with code and message, similar to OpenAI-style APIs.
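As an illustrative example (field names may differ; inspect the body you actually receive), a 401 might return:
{
  "object": "error",
  "message": "Unauthorized",
  "type": "invalid_request_error",
  "code": 401
}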
Request parameters
Use these parameters for most Medium 3.5 integrations:
| Parameter | Type | Values | Notes |
|---|---|---|---|
| model | string | mistral-medium-3.5 | Required |
| messages | array | role/content pairs | Required |
| temperature | float | 0 to 1.5 | Mistral recommends 0.7 for general use, 0.3 for code |
| top_p | float | 0 to 1 | Default 1.0 |
| max_tokens | int | 1 to context limit | Caps output length |
| stream | bool | true / false | Enables SSE streaming |
| tools | array | OpenAI-style tool spec | Native function calling |
| tool_choice | string/object | auto, any, none, or specific tool | any forces a tool call |
| response_format | object | {"type":"json_object"} or JSON schema | Structured output |
| random_seed | int | any int | Reproducibility |
| safe_prompt | bool | true / false | Adds Mistral’s safety preamble |
| presence_penalty | float | -2 to 2 | Penalizes repeated topics |
| frequency_penalty | float | -2 to 2 | Penalizes repeated tokens |
Two migration gotchas:
- OpenAI tool_choice="required" becomes Mistral tool_choice="any".
- OpenAI seed becomes Mistral random_seed.
Python client
Install the official SDK:
pip install mistralai
Call Medium 3.5:
import os
from mistralai import Mistral
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
response = client.chat.complete(
model="mistral-medium-3.5",
messages=[
{"role": "system", "content": "Reply in code only."},
{"role": "user", "content": "Write a Rust function that debounces events."},
],
temperature=0.3,
max_tokens=2048,
)
print("Content:", response.choices[0].message.content)
print("Total tokens:", response.usage.total_tokens)
cost = (
response.usage.prompt_tokens * 1.5 / 1_000_000
+ response.usage.completion_tokens * 7.5 / 1_000_000
)
print("Cost estimate USD:", cost)
If your app already uses the OpenAI Python SDK, switch the base URL and model ID:
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["MISTRAL_API_KEY"],
base_url="https://api.mistral.ai/v1",
)
response = client.chat.completions.create(
model="mistral-medium-3.5",
messages=[
{"role": "user", "content": "Hello, Mistral."}
],
)
print(response.choices[0].message.content)
Use the native mistralai SDK when you want first-class support for Mistral-specific behavior. Use the OpenAI SDK route when you are maintaining a provider-agnostic client.
Node.js client
Install the native SDK:
npm install @mistralai/mistralai
Use it like this:
import { Mistral } from "@mistralai/mistralai";
const client = new Mistral({
apiKey: process.env.MISTRAL_API_KEY,
});
const response = await client.chat.complete({
model: "mistral-medium-3.5",
messages: [
{
role: "user",
content: "Explain dense merged checkpoints in plain English.",
},
],
temperature: 0.7,
});
console.log(response.choices[0].message.content);
console.log("Usage:", response.usage);
Or use the OpenAI SDK with a base URL override:
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.MISTRAL_API_KEY,
baseURL: "https://api.mistral.ai/v1",
});
const response = await client.chat.completions.create({
model: "mistral-medium-3.5",
messages: [
{
role: "user",
content: "Hello, Mistral.",
},
],
});
console.log(response.choices[0].message.content);
Streaming responses
Enable streaming with stream: true in the raw request body. The native Python SDK wraps this in a dedicated stream method:
stream = client.chat.stream(
model="mistral-medium-3.5",
messages=[
{
"role": "user",
"content": "Stream a 300-word essay on merged checkpoints.",
}
],
)
for chunk in stream:
delta = chunk.data.choices[0].delta.content or ""
print(delta, end="", flush=True)
The streamed response follows an OpenAI-like shape. Content arrives through choices[].delta.content.
For debugging streamed output, the Apidog response viewer is useful for comparing latency, chunks, and token usage across repeated runs.
Tool calling
Medium 3.5 supports native function calling. Define tools in the tools array, let the model select a tool, execute the function in your code, then send the tool result back to the model.
Example tool definition:
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Return the current weather for a city.",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"},
"unit": {"type": "string", "enum": ["c", "f"]},
},
"required": ["city"],
},
},
}
]
Call the model:
response = client.chat.complete(
model="mistral-medium-3.5",
messages=[
{
"role": "user",
"content": "Weather in Lagos in Celsius?",
}
],
tools=tools,
tool_choice="auto",
)
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)
print(tool_call.function.arguments)
Then:
- Parse tool_call.function.arguments.
- Execute get_weather() locally.
- Append the result as a role: "tool" message.
- Call the model again so it can produce the final answer, as in the sketch below.
To force a tool call, use:
"tool_choice": "any"
Do not use OpenAI’s required value; Mistral uses any.
JSON mode and structured output
For arbitrary valid JSON, use:
response = client.chat.complete(
model="mistral-medium-3.5",
messages=[
{
"role": "user",
"content": "Return a JSON object with name, language, and use_cases.",
}
],
response_format={"type": "json_object"},
)
For schema-constrained output, pass a JSON schema:
schema = {
"type": "json_schema",
"json_schema": {
"name": "release_note",
"schema": {
"type": "object",
"properties": {
"title": {"type": "string"},
"date": {"type": "string"},
"bullets": {
"type": "array",
"items": {"type": "string"},
},
},
"required": ["title", "date", "bullets"],
"additionalProperties": False,
},
"strict": True,
},
}
response = client.chat.complete(
model="mistral-medium-3.5",
messages=[
{
"role": "system",
"content": "Reply with a single JSON object matching the schema.",
},
{
"role": "user",
"content": "Summarize today's Mistral Medium 3.5 release.",
},
],
response_format=schema,
)
print(response.choices[0].message.content)
Use strict schema output when downstream code expects a stable shape. Use json_object when you only need valid JSON and will validate separately with Pydantic, Zod, or another schema library.
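For the json_object route, a minimal validation sketch with Pydantic (the ModelInfo fields are illustrative, matching the prompt above):
import json
from pydantic import BaseModel, ValidationError

class ModelInfo(BaseModel):
    name: str
    language: str
    use_cases: list[str]

try:
    info = ModelInfo(**json.loads(response.choices[0].message.content))
except (json.JSONDecodeError, ValidationError):
    # Retry or repair before passing the payload downstream.
    raise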
Vision input
Medium 3.5 supports image input alongside text. Send an array of content parts in the user message:
response = client.chat.complete(
model="mistral-medium-3.5",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image and what is it doing wrong?",
},
{
"type": "image_url",
"image_url": "https://example.com/diagram.png",
},
],
}
],
)
print(response.choices[0].message.content)
Image inputs are billed as input tokens at the same $1.50 per million input-token rate. The exact count depends on the image and appears in usage.prompt_tokens.
For production vision workloads (a downscaling sketch follows this list):
- Crop to the relevant region.
- Downscale when high resolution is unnecessary.
- Log image-token cost separately.
- Avoid sending repeated frames unless the task requires them.
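A minimal Pillow sketch for the downscaling step, sending the result as a base64 data URL (assuming the endpoint accepts data URLs in image_url, as OpenAI-style APIs commonly do):
import base64
import io
from PIL import Image

img = Image.open("receipt.png").convert("RGB")
img.thumbnail((1024, 1024))  # downscale in place, preserving aspect ratio

buf = io.BytesIO()
img.save(buf, format="JPEG", quality=80)
data_url = "data:image/jpeg;base64," + base64.b64encode(buf.getvalue()).decode()

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the line items and total."},
                {"type": "image_url", "image_url": data_url},
            ],
        }
    ],
)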
Build the request collection in Apidog
A repeatable API workflow is better than one-off curl commands. In Apidog:
- Download Apidog and create a project.
- Create an environment.
- Add MISTRAL_API_KEY as a secret variable.
- Add BASE_URL=https://api.mistral.ai/v1.
- Create a POST {{BASE_URL}}/chat/completions request.
- Add the header Authorization: Bearer {{MISTRAL_API_KEY}}.
- Save a baseline body:
{
"model": "mistral-medium-3.5",
"messages": [
{
"role": "user",
"content": "Explain this API in practical terms."
}
],
"temperature": 0.3,
"max_tokens": 1000
}
- Parameterize model, temperature, max_tokens, and tool_choice.
- Inspect usage after every run.
- Add a post-response cost calculation:
const usage = response.body.usage;
const cost =
usage.prompt_tokens * 1.5 / 1_000_000 +
usage.completion_tokens * 7.5 / 1_000_000;
console.log(`Estimated cost: $${cost}`);
If you already use the DeepSeek V4 API collection, duplicate it, change the base URL to https://api.mistral.ai/v1, update the model to mistral-medium-3.5, and run the same prompts against both providers. The same approach works when comparing against GPT-5.5.
Error handling
Common errors:
| Code | Meaning | Fix |
|---|---|---|
| 400 | Bad request | Validate messages, tools, and JSON schema |
| 401 | Invalid key | Regenerate the key at console.mistral.ai |
| 402 | Payment required | Add credit or a payment method |
| 403 | Model not allowed | Check project scope and model ID |
| 422 | Invalid parameter | Check max_tokens, tool_choice, and schema fields |
| 429 | Rate limit | Retry with exponential backoff and jitter |
| 500 | Server error | Retry once, then check status |
| 503 | Overloaded | Retry later or fall back to another model |
Basic retry pattern:
import time
import random
def call_with_retry(fn, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Only retry transient statuses; anything else is a request bug.
            if status not in (429, 500, 502, 503, 504):
                raise
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to avoid synchronized retries.
            sleep = (2 ** attempt) + random.random()
            time.sleep(sleep)
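Usage, wrapping a chat call in the helper:
response = call_with_retry(
    lambda: client.chat.complete(
        model="mistral-medium-3.5",
        messages=[{"role": "user", "content": "Ping"}],
    )
)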
Do not automatically retry 400, 401, 402, 403, or 422. Those usually indicate invalid configuration, malformed payloads, billing issues, or permissions problems.
Cost control patterns
Medium 3.5 is more expensive than Medium 3, so route requests intentionally.
1. Default to a cheaper model, escalate when needed
Use Medium 3 for simpler requests. Escalate to Medium 3.5 (as in the router sketch after this list) when:
- A validator fails.
- The prompt requires vision.
- The task needs long context.
- The request involves complex code generation or agentic tool use.
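A minimal router sketch, assuming a validate() function you define against your own output contract:
def answer(prompt, needs_vision=False, needs_long_context=False):
    # Route obvious Medium 3.5 work directly; try the cheaper tier otherwise.
    if needs_vision or needs_long_context:
        model = "mistral-medium-3.5"
    else:
        model = "mistral-medium-3"
    resp = client.chat.complete(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    text = resp.choices[0].message.content
    # validate() is your own output check: schema validation, tests, etc.
    if model == "mistral-medium-3" and not validate(text):
        # Validator failed: escalate once to the higher-accuracy tier.
        resp = client.chat.complete(
            model="mistral-medium-3.5",
            messages=[{"role": "user", "content": prompt}],
        )
        text = resp.choices[0].message.content
    return text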
2. Cap output tokens
Output tokens cost $7.50 per million. Set max_tokens explicitly:
{
"max_tokens": 1500
}
Do not rely on the model to stop at the length you expected.
3. Keep system prompts short
System prompts are billed on every request. At $1.50 per million input tokens, cutting a 2,000-token system prompt down to 500 tokens saves about $2.25 per 1,000 requests in input cost alone.
4. Log usage
Persist these fields:
{
"prompt_tokens": 1234,
"completion_tokens": 567,
"total_tokens": 1801
}
Also log estimated cost:
cost_usd = (
prompt_tokens * 1.5 / 1_000_000
+ completion_tokens * 7.5 / 1_000_000
)
5. Be selective with vision
For image workflows:
- Crop irrelevant areas.
- Compress or downscale images when acceptable.
- Avoid sending duplicate screenshots.
- Measure token usage before scaling.
Comparing Medium 3.5 to other Mistral tiers
Mistral lineup as of late April 2026:
| Model | Context | Input $/M | Output $/M | Vision | Best for |
|---|---|---|---|---|---|
| mistral-small | 32K | $0.10 | $0.30 | No | High-volume classification, light chat |
| mistral-medium-3 | 128K | $0.40 | $2.00 | No | Bulk throughput, longer chat |
| mistral-medium-3.5 | 256K | $1.50 | $7.50 | Yes | Reasoning, code, vision, agents |
| mistral-large | 128K | $2.00 | $6.00 | Limited | Frontier-tier text reasoning |
Medium 3.5 is the tier that combines long context, vision, and merged reasoning capabilities. Choose it by workload requirements, not by model name alone.
Migrating from another provider
For OpenAI-compatible code, migration is mostly configuration.
From OpenAI:
- base_url="https://api.openai.com/v1"
- model="gpt-5.5"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"
From DeepSeek:
- base_url="https://api.deepseek.com/v1"
- model="deepseek-v4-pro"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"
Then check these fields:
- tool_choice="required"
+ tool_choice="any"
- seed=123
+ random_seed=123
Before production rollout:
- Run your existing prompt test suite.
- Compare structured-output validity.
- Compare tool-call argument quality.
- Mirror production traffic in shadow mode (a minimal comparison sketch follows this list).
- Review response diffs in Apidog before switching live traffic.
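For the shadow-mode step, a minimal side-by-side sketch using the OpenAI SDK against both providers (model IDs as used elsewhere in this guide):
import os
from openai import OpenAI

providers = {
    "gpt-5.5": OpenAI(api_key=os.environ["OPENAI_API_KEY"]),
    "mistral-medium-3.5": OpenAI(
        api_key=os.environ["MISTRAL_API_KEY"],
        base_url="https://api.mistral.ai/v1",
    ),
}

prompt = "Summarize this changelog in three bullets."
for model, client in providers.items():
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)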
Real-world use cases
Medium 3.5 is especially useful for:
- Code review assistants: 77.6% SWE-Bench Verified and 256K context help with PR-level review involving diffs plus surrounding files.
- Long-document QA: 256K context can fit many contracts, RFPs, and policy documents without chunking.
- Multimodal extraction: Extract structured fields from receipts, screenshots, or diagrams without running OCR as a separate step.
- Agent loops: Native function calling and strong multi-turn dialogue performance reduce tool-call retries and malformed JSON loops.
FAQ
What is the API model ID?
Use:
mistral-medium-3.5
The Hugging Face checkpoint is:
mistralai/Mistral-Medium-3.5-128B
Use the Hugging Face ID if you serve the open weights yourself. Use the short model ID for the hosted API.
Is Medium 3.5 OpenAI-compatible?
Mostly. Headers, endpoint shape, messages, and many parameters are close enough that OpenAI Python and Node clients can work with a base URL override.
The two main differences are:
- tool_choice="any" instead of OpenAI’s required
- random_seed instead of OpenAI’s seed
Can I run Medium 3.5 locally?
Yes. The weights are open under a Modified MIT License with a large-revenue carve-out. The model has 128B parameters, so local serving requires significant GPU memory. Quantized GGUF builds from unsloth/Mistral-Medium-3.5-128B-GGUF can run on a single high-end consumer card. The patterns from how to run DeepSeek V4 locally translate directly.
Does it support streaming with tool calls?
Yes. Streaming tool calls return argument fragments incrementally on delta.tool_calls. Accumulate the fragments until the stream closes, then parse the completed JSON arguments.
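A minimal accumulation sketch with the native Python SDK (the index field on each fragment is assumed to follow the OpenAI-style streaming convention):
import json

calls = {}  # tool-call index -> accumulated name and argument text
stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Weather in Lagos in Celsius?"}],
    tools=tools,
)
for chunk in stream:
    delta = chunk.data.choices[0].delta
    for tc in delta.tool_calls or []:
        slot = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function.name:
            slot["name"] = tc.function.name
        if tc.function.arguments:
            slot["arguments"] += tc.function.arguments

# Parse only after the stream closes, once the JSON is complete.
for slot in calls.values():
    print(slot["name"], json.loads(slot["arguments"]))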
How do I count input tokens before sending?
Use the mistral-common Python package tokenizer. It matches the tokenizer used by the API, so counts should align with usage.prompt_tokens.
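A counting sketch with mistral-common (which tokenizer version matches Medium 3.5 is an assumption; pin the release that corresponds to your model):
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()  # assumed version; verify against your model
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="Hello, Mistral.")])
)
print("Prompt tokens:", len(tokenized.tokens))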
What context length should I plan for?
The cap is 256K tokens, but cost scales linearly. A 200K-token request costs about $0.30 in input tokens before generation starts. Most production requests should stay far below the maximum unless the task genuinely requires long context.
Is there a free tier?
Mistral does not advertise a permanent free tier, though new accounts may include trial credit. For sustained free experimentation on similar-tier models, see how to use the DeepSeek V4 API for free.

