
Hassann

Posted on • Originally published at apidog.com

How to Use the Mistral Medium 3.5 API?

Mistral released Medium 3.5 on April 29, 2026. The hosted API model ID is mistral-medium-3.5, the chat endpoint is https://api.mistral.ai/v1/chat/completions, and the request format is close to the OpenAI Chat Completions schema. For most existing OpenAI-compatible clients, migration is mainly a base URL and model-name change. Medium 3.5 adds a 256K context window, native vision, function calling, 24-language support, and a reported 77.6% on SWE-Bench Verified, making it relevant for code-heavy agent workflows.


This guide shows how to call Mistral Medium 3.5 from curl, Python, and Node.js; configure streaming, tool calling, JSON output, and vision input; handle common errors; and use Apidog to inspect requests, token usage, and cost while iterating on prompts. For comparable model guides, see how to use the DeepSeek V4 API and how to use the GPT-5.5 API.

TL;DR

  • Endpoint: POST https://api.mistral.ai/v1/chat/completions
  • Auth: Authorization: Bearer <MISTRAL_API_KEY>
  • Model ID: mistral-medium-3.5
  • Context window: 256K tokens
  • Pricing: $1.50 per million input tokens, $7.50 per million output tokens
  • Features: reasoning, vision, native function calling, structured JSON output, 24-language coverage
  • Open weights: mistralai/Mistral-Medium-3.5-128B on Hugging Face under a Modified MIT License with a large-revenue carve-out
  • Benchmarks: 77.6% SWE-Bench Verified, 91.4 τ³-Telecom
  • Use Apidog to save requests, store API keys as secrets, compare model outputs, and track cost per call.

What changed in Medium 3.5

Medium 3 shipped earlier as a text-only model with a 128K context window. Medium 3.5 changes the API surface and deployment profile:

  • 256K context instead of 128K
  • Native vision input
  • Function calling at the model level
  • Merged instruction-following, reasoning, and coding capabilities in one dense 128B model
  • Hosted API plus open weights

Mistral Medium 3.5 overview

The practical impact: you can send a larger codebase, a long transcript, or a full document set in one request and still use tools or structured output. For agents, that reduces the amount of custom orchestration needed around chunking, tool selection, and JSON validation.

The main tradeoff is cost. Medium 3 was priced at $0.40 per million input tokens and $2.00 per million output tokens. Medium 3.5 is $1.50 input and $7.50 output, a 3.75x increase on both sides. Treat Medium 3.5 as the higher-accuracy tier for reasoning, coding, vision, and agents, not as the default model for every high-volume request.
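To make the tradeoff concrete, here is a back-of-the-envelope comparison for a request with 10,000 input tokens and 1,000 output tokens, using the published per-million-token rates:

```python
# Published per-million-token rates (USD) for each tier.
MEDIUM_3 = {"input": 0.40, "output": 2.00}
MEDIUM_3_5 = {"input": 1.50, "output": 7.50}

def request_cost(rates, prompt_tokens, completion_tokens):
    """Estimate the USD cost of a single request at the given rates."""
    return (prompt_tokens * rates["input"]
            + completion_tokens * rates["output"]) / 1_000_000

# Example: a 10K-input / 1K-output request on each tier.
old = request_cost(MEDIUM_3, 10_000, 1_000)    # $0.0060
new = request_cost(MEDIUM_3_5, 10_000, 1_000)  # $0.0225
print(f"Medium 3: ${old:.4f}  Medium 3.5: ${new:.4f}  ratio: {new / old:.2f}x")
```

At one million such requests per month, that gap is $6,000 versus $22,500, which is why the routing patterns later in this guide matter.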

Prerequisites

Before making the first request, prepare:

  1. A Mistral account at console.mistral.ai with billing enabled.
  2. A project-scoped API key.
  3. Python, Node.js, curl, or another HTTP client.
  4. An API workspace such as Apidog if you want reusable requests, secret variables, and visible token usage.

Mistral API key setup

Export your key locally:

export MISTRAL_API_KEY="..."

Endpoint and authentication

Mistral exposes chat completions at:

POST https://api.mistral.ai/v1/chat/completions

Authentication uses a bearer token:

Authorization: Bearer <MISTRAL_API_KEY>

Minimal curl request:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-medium-3.5",
    "messages": [
      {
        "role": "user",
        "content": "Explain dense merged checkpoints in two sentences."
      }
    ]
  }'

A successful response includes:

  • choices[]
  • choices[0].message.content
  • usage.prompt_tokens
  • usage.completion_tokens
  • usage.total_tokens
  • id for tracing

Failures return an error envelope with code and message, similar to OpenAI-style APIs.
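Because the envelope is only "similar" to OpenAI's, a defensive parse is safer than hard-coding one shape. This sketch tolerates both nesting styles; the exact field layout is an assumption to verify against real error responses:

```python
import json

def parse_error(body_text):
    """Pull a (code, message) pair out of an OpenAI-style error envelope.

    Some providers nest the details under "error", others return them at
    the top level, so this checks both before giving up.
    """
    body = json.loads(body_text)
    err = body.get("error", body)  # tolerate both nesting styles
    return err.get("code"), err.get("message")

code, message = parse_error(
    '{"error": {"code": "invalid_api_key", "message": "Bad key"}}'
)
print(code, message)
```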

Request parameters

Use these parameters for most Medium 3.5 integrations:

| Parameter | Type | Values | Notes |
| --- | --- | --- | --- |
| model | string | mistral-medium-3.5 | Required |
| messages | array | role/content pairs | Required |
| temperature | float | 0 to 1.5 | Mistral recommends 0.7 for general use, 0.3 for code |
| top_p | float | 0 to 1 | Default 1.0 |
| max_tokens | int | 1 to context limit | Caps output length |
| stream | bool | true / false | Enables SSE streaming |
| tools | array | OpenAI-style tool spec | Native function calling |
| tool_choice | string/object | auto, any, none, or a specific tool | any forces a tool call |
| response_format | object | {"type":"json_object"} or a JSON schema | Structured output |
| random_seed | int | any int | Reproducibility |
| safe_prompt | bool | true / false | Adds Mistral’s safety preamble |
| presence_penalty | float | -2 to 2 | Penalizes repeated topics |
| frequency_penalty | float | -2 to 2 | Penalizes repeated tokens |

Two migration gotchas:

  • OpenAI tool_choice="required" becomes Mistral tool_choice="any".
  • OpenAI seed becomes Mistral random_seed.
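Both renames can be handled by a tiny shim when porting OpenAI-style call kwargs; everything not covered by the two gotchas passes through unchanged:

```python
def to_mistral_kwargs(kwargs):
    """Translate OpenAI-style chat kwargs to Mistral's names.

    Covers the two renames called out above: seed -> random_seed and
    tool_choice="required" -> tool_choice="any". Everything else is
    left untouched.
    """
    out = dict(kwargs)
    if "seed" in out:
        out["random_seed"] = out.pop("seed")
    if out.get("tool_choice") == "required":
        out["tool_choice"] = "any"
    return out

print(to_mistral_kwargs({"seed": 123, "tool_choice": "required", "temperature": 0.3}))
```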

Python client

Install the official SDK:

pip install mistralai

Call Medium 3.5:

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "Reply in code only."},
        {"role": "user", "content": "Write a Rust function that debounces events."},
    ],
    temperature=0.3,
    max_tokens=2048,
)

print("Content:", response.choices[0].message.content)
print("Total tokens:", response.usage.total_tokens)

cost = (
    response.usage.prompt_tokens * 1.5 / 1_000_000
    + response.usage.completion_tokens * 7.5 / 1_000_000
)

print("Cost estimate USD:", cost)

If your app already uses the OpenAI Python SDK, switch the base URL and model ID:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MISTRAL_API_KEY"],
    base_url="https://api.mistral.ai/v1",
)

response = client.chat.completions.create(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Hello, Mistral."}
    ],
)

print(response.choices[0].message.content)

Use the native mistralai SDK when you want first-class support for Mistral-specific behavior. Use the OpenAI SDK route when you are maintaining a provider-agnostic client.

Node.js client

Install the native SDK:

npm install @mistralai/mistralai

Use it like this:

import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({
  apiKey: process.env.MISTRAL_API_KEY,
});

const response = await client.chat.complete({
  model: "mistral-medium-3.5",
  messages: [
    {
      role: "user",
      content: "Explain dense merged checkpoints in plain English.",
    },
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
console.log("Usage:", response.usage);

Or use the OpenAI SDK with a base URL override:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MISTRAL_API_KEY,
  baseURL: "https://api.mistral.ai/v1",
});

const response = await client.chat.completions.create({
  model: "mistral-medium-3.5",
  messages: [
    {
      role: "user",
      content: "Hello, Mistral.",
    },
  ],
});

console.log(response.choices[0].message.content);

Streaming responses

Enable streaming with stream: true. In Python with the native SDK:

stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": "Stream a 300-word essay on merged checkpoints.",
        }
    ],
)

for chunk in stream:
    delta = chunk.data.choices[0].delta.content or ""
    print(delta, end="", flush=True)

The streamed response follows an OpenAI-like shape. Content arrives through choices[].delta.content.

For debugging streamed output, the Apidog response viewer is useful for comparing latency, chunks, and token usage across repeated runs.

Tool calling

Medium 3.5 supports native function calling. Define tools in the tools array, let the model select a tool, execute the function in your code, then send the tool result back to the model.

Example tool definition:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["c", "f"]},
                },
                "required": ["city"],
            },
        },
    }
]

Call the model:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": "Weather in Lagos in Celsius?",
        }
    ],
    tools=tools,
    tool_choice="auto",
)

tool_call = response.choices[0].message.tool_calls[0]

print(tool_call.function.name)
print(tool_call.function.arguments)

Then:

  1. Parse tool_call.function.arguments.
  2. Execute get_weather() locally.
  3. Append the result as a role: "tool" message.
  4. Call the model again so it can produce the final answer.
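The local half of that loop (steps 1 to 3) can be sketched without any network calls. The get_weather body and the tool-call payload below are stand-ins; the message shape follows the OpenAI-style schema the API uses:

```python
import json

def get_weather(city, unit="c"):
    # Stand-in implementation; a real app would call a weather service here.
    return {"city": city, "unit": unit, "temp": 31}

# Stand-in for what comes back in choices[0].message.tool_calls[0].
tool_call = {
    "id": "call_123",
    "function": {"name": "get_weather",
                 "arguments": '{"city": "Lagos", "unit": "c"}'},
}

# 1. Parse the JSON-encoded arguments.
args = json.loads(tool_call["function"]["arguments"])

# 2. Execute the matching local function.
result = get_weather(**args)

# 3. Append the result as a tool message; the next chat.complete call
#    receives this so the model can produce the final answer.
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "name": tool_call["function"]["name"],
    "content": json.dumps(result),
}
print(tool_message)
```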

To force a tool call, use:

"tool_choice": "any"

Do not use OpenAI’s required value; Mistral uses any.

JSON mode and structured output

For arbitrary valid JSON, use:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with name, language, and use_cases.",
        }
    ],
    response_format={"type": "json_object"},
)

For schema-constrained output, pass a JSON schema:

schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "release_note",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "date": {"type": "string"},
                "bullets": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["title", "date", "bullets"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "system",
            "content": "Reply with a single JSON object matching the schema.",
        },
        {
            "role": "user",
            "content": "Summarize today's Mistral Medium 3.5 release.",
        },
    ],
    response_format=schema,
)

print(response.choices[0].message.content)

Use strict schema output when downstream code expects a stable shape. Use json_object when you only need valid JSON and will validate separately with Pydantic, Zod, or another schema library.
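If you go the json_object route, the separate validation step can be as small as a stdlib check. This sketch hand-rolls the key and type checks; Pydantic or Zod does the same job with less code:

```python
import json

# Expected shape for the json_object example above.
REQUIRED = {"name": str, "language": str, "use_cases": list}

def validate_payload(text):
    """Parse model output and verify the expected keys and types.

    Returns the parsed dict, or raises ValueError describing what is
    wrong, which is the signal to retry or escalate the request.
    """
    data = json.loads(text)
    for key, expected_type in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for {key}")
    return data

ok = validate_payload(
    '{"name": "Medium 3.5", "language": "en", "use_cases": ["code"]}'
)
print(ok["name"])
```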

Vision input

Medium 3.5 supports image input alongside text. Send an array of content parts in the user message:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image and what is it doing wrong?",
                },
                {
                    "type": "image_url",
                    "image_url": "https://example.com/diagram.png",
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Image inputs are billed as input tokens at the same $1.50 per million input-token rate. The exact count depends on the image and appears in usage.prompt_tokens.

For production vision workloads:

  • Crop to the relevant region.
  • Downscale when high resolution is unnecessary.
  • Log image-token cost separately.
  • Avoid sending repeated frames unless the task requires them.
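For local files, the common OpenAI-compatible pattern is to inline the image as a base64 data URL in the image_url field. Whether the hosted endpoint accepts data URLs for every image format is worth confirming against Mistral's docs; the helper below only builds the string:

```python
import base64

def to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a data URL usable in an image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# With a real file you would read the bytes first:
#   to_data_url(open("diagram.png", "rb").read())
url = to_data_url(b"\x89PNG\r\n\x1a\n")
print(url[:40])
```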

Build the request collection in Apidog

A repeatable API workflow is better than one-off curl commands. In Apidog:

  1. Download Apidog and create a project.
  2. Create an environment.
  3. Add MISTRAL_API_KEY as a secret variable.
  4. Add BASE_URL=https://api.mistral.ai/v1.
  5. Create a POST {{BASE_URL}}/chat/completions request.
  6. Add the header:
   Authorization: Bearer {{MISTRAL_API_KEY}}
  7. Save a baseline body:
   {
     "model": "mistral-medium-3.5",
     "messages": [
       {
         "role": "user",
         "content": "Explain this API in practical terms."
       }
     ],
     "temperature": 0.3,
     "max_tokens": 1000
   }
  8. Parameterize model, temperature, max_tokens, and tool_choice.
  9. Inspect usage after every run.
  10. Add a post-response cost calculation:
   const usage = response.body.usage;

   const cost =
     usage.prompt_tokens * 1.5 / 1_000_000 +
     usage.completion_tokens * 7.5 / 1_000_000;

   console.log(`Estimated cost: $${cost}`);

If you already use the DeepSeek V4 API collection, duplicate it, change the base URL to https://api.mistral.ai/v1, update the model to mistral-medium-3.5, and run the same prompts against both providers. The same approach works when comparing against GPT-5.5.

Error handling

Common errors:

| Code | Meaning | Fix |
| --- | --- | --- |
| 400 | Bad request | Validate messages, tools, and JSON schema |
| 401 | Invalid key | Regenerate the key at console.mistral.ai |
| 402 | Payment required | Add credit or a payment method |
| 403 | Model not allowed | Check project scope and model ID |
| 422 | Invalid parameter | Check max_tokens, tool_choice, and schema fields |
| 429 | Rate limit | Retry with exponential backoff and jitter |
| 500 | Server error | Retry once, then check status |
| 503 | Overloaded | Retry later or fall back to another model |

Basic retry pattern:

import time
import random

def call_with_retry(fn, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)

            if status not in (429, 500, 502, 503, 504):
                raise

            if attempt == max_attempts - 1:
                raise

            sleep = (2 ** attempt) + random.random()
            time.sleep(sleep)

Do not automatically retry 400, 401, 402, 403, or 422. Those usually indicate invalid configuration, malformed payloads, billing issues, or permissions problems.

Cost control patterns

Medium 3.5 is more expensive than Medium 3, so route requests intentionally.

1. Default to a cheaper model, escalate when needed

Use Medium 3 for simpler requests. Escalate to Medium 3.5 when:

  • A validator fails.
  • The prompt requires vision.
  • The task needs long context.
  • The request involves complex code generation or agentic tool use.
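That routing policy fits in a few lines. The trigger flags here are illustrative; wire them to whatever signals your pipeline already produces, such as a failed validator or the presence of image parts:

```python
def pick_model(has_images=False, needs_long_context=False,
               validator_failed=False, agentic=False):
    """Route a request to the cheapest tier that can handle it.

    The escalation triggers mirror the list above; the model IDs are
    the hosted API names from the tier comparison table.
    """
    if has_images or needs_long_context or validator_failed or agentic:
        return "mistral-medium-3.5"
    return "mistral-medium-3"

print(pick_model())                       # cheap default
print(pick_model(validator_failed=True))  # escalated retry
```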

2. Cap output tokens

Output tokens cost $7.50 per million. Set max_tokens explicitly:

{
  "max_tokens": 1500
}

Do not rely on the model to stop at the length you expected.

3. Keep system prompts short

System prompts are billed on every request. If your system prompt is 2,000 tokens and can be trimmed to 500, you cut its share of input cost by 75 percent on every call.
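The savings compound with volume. At the $1.50-per-million input rate, trimming 1,500 tokens from a prompt re-sent 100,000 times a month works out as follows:

```python
INPUT_RATE = 1.50 / 1_000_000  # USD per input token

def system_prompt_savings(before_tokens, after_tokens, requests):
    """Savings from shrinking a system prompt re-sent on every request."""
    return (before_tokens - after_tokens) * requests * INPUT_RATE

# 100K requests with the prompt cut from 2,000 to 500 tokens.
print(f"${system_prompt_savings(2_000, 500, 100_000):.2f}")  # $225.00
```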

4. Log usage

Persist these fields:

{
  "prompt_tokens": 1234,
  "completion_tokens": 567,
  "total_tokens": 1801
}

Also log estimated cost:

cost_usd = (
    prompt_tokens * 1.5 / 1_000_000
    + completion_tokens * 7.5 / 1_000_000
)

5. Be selective with vision

For image workflows:

  • Crop irrelevant areas.
  • Compress or downscale images when acceptable.
  • Avoid sending duplicate screenshots.
  • Measure token usage before scaling.

Comparing Medium 3.5 to other Mistral tiers

Mistral lineup as of late April 2026:

| Model | Context | Input $/M | Output $/M | Vision | Best for |
| --- | --- | --- | --- | --- | --- |
| mistral-small | 32K | $0.10 | $0.30 | No | High-volume classification, light chat |
| mistral-medium-3 | 128K | $0.40 | $2.00 | No | Bulk throughput, longer chat |
| mistral-medium-3.5 | 256K | $1.50 | $7.50 | Yes | Reasoning, code, vision, agents |
| mistral-large | 128K | $2.00 | $6.00 | Limited | Frontier-tier text reasoning |

Medium 3.5 is the tier that combines long context, vision, and merged reasoning capabilities. Choose it by workload requirements, not by model name alone.

Migrating from another provider

For OpenAI-compatible code, migration is mostly configuration.

From OpenAI:

- base_url="https://api.openai.com/v1"
- model="gpt-5.5"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"

From DeepSeek:

- base_url="https://api.deepseek.com/v1"
- model="deepseek-v4-pro"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"

Then check these fields:

- tool_choice="required"
+ tool_choice="any"
- seed=123
+ random_seed=123

Before production rollout:

  1. Run your existing prompt test suite.
  2. Compare structured-output validity.
  3. Compare tool-call argument quality.
  4. Mirror production traffic in shadow mode.
  5. Review response diffs in Apidog before switching live traffic.

Real-world use cases

Medium 3.5 is especially useful for:

  • Code review assistants: 77.6% SWE-Bench Verified and 256K context help with PR-level review involving diffs plus surrounding files.
  • Long-document QA: 256K context can fit many contracts, RFPs, and policy documents without chunking.
  • Multimodal extraction: Extract structured fields from receipts, screenshots, or diagrams without running OCR as a separate step.
  • Agent loops: Native function calling and strong multi-turn dialogue performance reduce tool-call retries and malformed JSON loops.

FAQ

What is the API model ID?

Use:

mistral-medium-3.5

The Hugging Face checkpoint is:

mistralai/Mistral-Medium-3.5-128B

Use the Hugging Face ID if you serve the open weights yourself. Use the short model ID for the hosted API.

Is Medium 3.5 OpenAI-compatible?

Mostly. Headers, endpoint shape, messages, and many parameters are close enough that OpenAI Python and Node clients can work with a base URL override.

The two main differences are:

  • tool_choice="any" instead of OpenAI’s required
  • random_seed instead of OpenAI’s seed

Can I run Medium 3.5 locally?

Yes. The weights are open under a Modified MIT License with a large-revenue carve-out. The model has 128B parameters, so local serving requires significant GPU memory. Quantized GGUF builds from unsloth/Mistral-Medium-3.5-128B-GGUF can run on a single high-end consumer card. The patterns from how to run DeepSeek V4 locally translate directly.

Does it support streaming with tool calls?

Yes. Streaming tool calls return argument fragments incrementally on delta.tool_calls. Accumulate the fragments until the stream closes, then parse the completed JSON arguments.
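The accumulation step is string concatenation keyed by tool-call index, followed by one json.loads when the stream closes. The chunk dicts below are simplified stand-ins for the delta.tool_calls fragments described above:

```python
import json

# Simplified stand-ins for delta.tool_calls entries across streamed chunks.
chunks = [
    {"index": 0, "function": {"name": "get_weather",
                              "arguments": '{"city": '}},
    {"index": 0, "function": {"arguments": '"Lagos", "unit": "c"}'}},
]

calls = {}
for fragment in chunks:
    slot = calls.setdefault(fragment["index"], {"name": None, "arguments": ""})
    fn = fragment["function"]
    if fn.get("name"):
        slot["name"] = fn["name"]          # name arrives once, keep it
    slot["arguments"] += fn.get("arguments", "")  # arguments arrive in pieces

# Once the stream closes, the concatenated arguments parse as normal JSON.
args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args)
```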

How do I count input tokens before sending?

Use the mistral-common Python package tokenizer. It matches the tokenizer used by the API, so counts should align with usage.prompt_tokens.

What context length should I plan for?

The cap is 256K tokens, but cost scales linearly. A 200K-token request costs about $0.30 in input tokens before generation starts. Most production requests should stay far below the maximum unless the task genuinely requires long context.

Is there a free tier?

Mistral does not advertise a permanent free tier, though new accounts may include trial credit. For sustained free experimentation on similar-tier models, see how to use the DeepSeek V4 API for free.
