
Hassann

Posted on • Originally published at apidog.com

How to Use the Mistral Medium 3.5 API?

Mistral released Medium 3.5 on April 29, 2026. The hosted API model ID is mistral-medium-3.5, the chat endpoint is https://api.mistral.ai/v1/chat/completions, and the request format is close to the OpenAI Chat Completions schema. For most existing OpenAI-compatible clients, migration is mainly a base URL and model-name change. Medium 3.5 adds a 256K context window, native vision, function calling, 24-language support, and a reported 77.6% on SWE-Bench Verified, making it relevant for code-heavy agent workflows.


This guide shows how to call Mistral Medium 3.5 from curl, Python, and Node.js; configure streaming, tool calling, JSON output, and vision input; handle common errors; and use Apidog to inspect requests, token usage, and cost while iterating on prompts. For comparable model guides, see how to use the DeepSeek V4 API and how to use the GPT-5.5 API.

TL;DR

  • Endpoint: POST https://api.mistral.ai/v1/chat/completions
  • Auth: Authorization: Bearer <MISTRAL_API_KEY>
  • Model ID: mistral-medium-3.5
  • Context window: 256K tokens
  • Pricing: $1.50 per million input tokens, $7.50 per million output tokens
  • Features: reasoning, vision, native function calling, structured JSON output, 24-language coverage
  • Open weights: mistralai/Mistral-Medium-3.5-128B on Hugging Face under a Modified MIT License with a large-revenue carve-out
  • Benchmarks: 77.6% SWE-Bench Verified, 91.4 τ³-Telecom
  • Use Apidog to save requests, store API keys as secrets, compare model outputs, and track cost per call.

What changed in Medium 3.5

Medium 3 shipped earlier as a text-only model with a 128K context window. Medium 3.5 changes the API surface and deployment profile:

  • 256K context instead of 128K
  • Native vision input
  • Function calling at the model level
  • Merged instruction-following, reasoning, and coding capabilities in one dense 128B model
  • Hosted API plus open weights

Mistral Medium 3.5 overview

The practical impact: you can send a larger codebase, a long transcript, or a full document set in one request and still use tools or structured output. For agents, that reduces the amount of custom orchestration needed around chunking, tool selection, and JSON validation.

The main tradeoff is cost. Medium 3 was priced at $0.40 per million input tokens and $2.00 per million output tokens. Medium 3.5 is $1.50 input and $7.50 output, a 3.75x increase on both sides. Treat Medium 3.5 as the higher-accuracy tier for reasoning, coding, vision, and agents, not as the default model for every high-volume request.
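To make the tradeoff concrete, here is a back-of-the-envelope comparison for a request with 10,000 input tokens and 1,000 output tokens, using the published per-million-token rates:

```python
# Published per-million-token rates (USD) for each tier.
MEDIUM_3 = {"input": 0.40, "output": 2.00}
MEDIUM_3_5 = {"input": 1.50, "output": 7.50}

def request_cost(rates, prompt_tokens, completion_tokens):
    """Estimate the USD cost of a single request at the given rates."""
    return (prompt_tokens * rates["input"]
            + completion_tokens * rates["output"]) / 1_000_000

# Example: a 10K-input / 1K-output request on each tier.
old = request_cost(MEDIUM_3, 10_000, 1_000)    # $0.0060
new = request_cost(MEDIUM_3_5, 10_000, 1_000)  # $0.0225
print(f"Medium 3: ${old:.4f}  Medium 3.5: ${new:.4f}  ratio: {new / old:.2f}x")
```

At one million such requests per month, that gap is $6,000 versus $22,500, which is why the routing patterns later in this guide matter.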

Prerequisites

Before making the first request, prepare:

  1. A Mistral account at console.mistral.ai with billing enabled.
  2. A project-scoped API key.
  3. Python, Node.js, curl, or another HTTP client.
  4. An API workspace such as Apidog if you want reusable requests, secret variables, and visible token usage.

Mistral API key setup

Export your key locally:

export MISTRAL_API_KEY="..."

Endpoint and authentication

Mistral exposes chat completions at:

POST https://api.mistral.ai/v1/chat/completions

Authentication uses a bearer token:

Authorization: Bearer <MISTRAL_API_KEY>

Minimal curl request:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-medium-3.5",
    "messages": [
      {
        "role": "user",
        "content": "Explain dense merged checkpoints in two sentences."
      }
    ]
  }'

A successful response includes:

  • choices[]
  • choices[0].message.content
  • usage.prompt_tokens
  • usage.completion_tokens
  • usage.total_tokens
  • id for tracing

Failures return an error envelope with code and message, similar to OpenAI-style APIs.
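Because the envelope is only "similar" to OpenAI's, a defensive parse is safer than hard-coding one shape. This sketch tolerates both nesting styles; the exact field layout is an assumption to verify against real error responses:

```python
import json

def parse_error(body_text):
    """Pull a (code, message) pair out of an OpenAI-style error envelope.

    Some providers nest the details under "error", others return them at
    the top level, so this checks both before giving up.
    """
    body = json.loads(body_text)
    err = body.get("error", body)  # tolerate both nesting styles
    return err.get("code"), err.get("message")

code, message = parse_error(
    '{"error": {"code": "invalid_api_key", "message": "Bad key"}}'
)
print(code, message)
```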

Request parameters

Use these parameters for most Medium 3.5 integrations:

| Parameter | Type | Values | Notes |
| --- | --- | --- | --- |
| model | string | mistral-medium-3.5 | Required |
| messages | array | role/content pairs | Required |
| temperature | float | 0 to 1.5 | Mistral recommends 0.7 for general use, 0.3 for code |
| top_p | float | 0 to 1 | Default 1.0 |
| max_tokens | int | 1 to context limit | Caps output length |
| stream | bool | true / false | Enables SSE streaming |
| tools | array | OpenAI-style tool spec | Native function calling |
| tool_choice | string/object | auto, any, none, or a specific tool | any forces a tool call |
| response_format | object | {"type":"json_object"} or a JSON schema | Structured output |
| random_seed | int | any int | Reproducibility |
| safe_prompt | bool | true / false | Adds Mistral’s safety preamble |
| presence_penalty | float | -2 to 2 | Penalizes repeated topics |
| frequency_penalty | float | -2 to 2 | Penalizes repeated tokens |

Two migration gotchas:

  • OpenAI tool_choice="required" becomes Mistral tool_choice="any".
  • OpenAI seed becomes Mistral random_seed.
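Both renames can be handled by a tiny shim when porting OpenAI-style call kwargs; everything not covered by the two gotchas passes through unchanged:

```python
def to_mistral_kwargs(kwargs):
    """Translate OpenAI-style chat kwargs to Mistral's names.

    Covers the two renames called out above: seed -> random_seed and
    tool_choice="required" -> tool_choice="any". Everything else is
    left untouched.
    """
    out = dict(kwargs)
    if "seed" in out:
        out["random_seed"] = out.pop("seed")
    if out.get("tool_choice") == "required":
        out["tool_choice"] = "any"
    return out

print(to_mistral_kwargs({"seed": 123, "tool_choice": "required", "temperature": 0.3}))
```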

Python client

Install the official SDK:

pip install mistralai

Call Medium 3.5:

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "Reply in code only."},
        {"role": "user", "content": "Write a Rust function that debounces events."},
    ],
    temperature=0.3,
    max_tokens=2048,
)

print("Content:", response.choices[0].message.content)
print("Total tokens:", response.usage.total_tokens)

cost = (
    response.usage.prompt_tokens * 1.5 / 1_000_000
    + response.usage.completion_tokens * 7.5 / 1_000_000
)

print("Cost estimate USD:", cost)

If your app already uses the OpenAI Python SDK, switch the base URL and model ID:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MISTRAL_API_KEY"],
    base_url="https://api.mistral.ai/v1",
)

response = client.chat.completions.create(
    model="mistral-medium-3.5",
    messages=[
        {"role": "user", "content": "Hello, Mistral."}
    ],
)

print(response.choices[0].message.content)

Use the native mistralai SDK when you want first-class support for Mistral-specific behavior. Use the OpenAI SDK route when you are maintaining a provider-agnostic client.

Node.js client

Install the native SDK:

npm install @mistralai/mistralai

Use it like this:

import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({
  apiKey: process.env.MISTRAL_API_KEY,
});

const response = await client.chat.complete({
  model: "mistral-medium-3.5",
  messages: [
    {
      role: "user",
      content: "Explain dense merged checkpoints in plain English.",
    },
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
console.log("Usage:", response.usage);

Or use the OpenAI SDK with a base URL override:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MISTRAL_API_KEY,
  baseURL: "https://api.mistral.ai/v1",
});

const response = await client.chat.completions.create({
  model: "mistral-medium-3.5",
  messages: [
    {
      role: "user",
      content: "Hello, Mistral.",
    },
  ],
});

console.log(response.choices[0].message.content);

Streaming responses

Enable streaming with stream: true. In Python with the native SDK:

stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": "Stream a 300-word essay on merged checkpoints.",
        }
    ],
)

for chunk in stream:
    delta = chunk.data.choices[0].delta.content or ""
    print(delta, end="", flush=True)

The streamed response follows an OpenAI-like shape. Content arrives through choices[].delta.content.

For debugging streamed output, the Apidog response viewer is useful for comparing latency, chunks, and token usage across repeated runs.

Tool calling

Medium 3.5 supports native function calling. Define tools in the tools array, let the model select a tool, execute the function in your code, then send the tool result back to the model.

Example tool definition:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["c", "f"]},
                },
                "required": ["city"],
            },
        },
    }
]

Call the model:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": "Weather in Lagos in Celsius?",
        }
    ],
    tools=tools,
    tool_choice="auto",
)

tool_call = response.choices[0].message.tool_calls[0]

print(tool_call.function.name)
print(tool_call.function.arguments)

Then:

  1. Parse tool_call.function.arguments.
  2. Execute get_weather() locally.
  3. Append the result as a role: "tool" message.
  4. Call the model again so it can produce the final answer.
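The local half of that loop (steps 1 to 3) can be sketched without any network calls. The get_weather body and the tool-call payload below are stand-ins; the message shape follows the OpenAI-style schema the API uses:

```python
import json

def get_weather(city, unit="c"):
    # Stand-in implementation; a real app would call a weather service here.
    return {"city": city, "unit": unit, "temp": 31}

# Stand-in for what comes back in choices[0].message.tool_calls[0].
tool_call = {
    "id": "call_123",
    "function": {"name": "get_weather",
                 "arguments": '{"city": "Lagos", "unit": "c"}'},
}

# 1. Parse the JSON-encoded arguments.
args = json.loads(tool_call["function"]["arguments"])

# 2. Execute the matching local function.
result = get_weather(**args)

# 3. Append the result as a tool message; the next chat.complete call
#    receives this so the model can produce the final answer.
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "name": tool_call["function"]["name"],
    "content": json.dumps(result),
}
print(tool_message)
```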

To force a tool call, use:

"tool_choice": "any"

Do not use OpenAI’s required value; Mistral uses any.

JSON mode and structured output

For arbitrary valid JSON, use:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with name, language, and use_cases.",
        }
    ],
    response_format={"type": "json_object"},
)

For schema-constrained output, pass a JSON schema:

schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "release_note",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "date": {"type": "string"},
                "bullets": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["title", "date", "bullets"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "system",
            "content": "Reply with a single JSON object matching the schema.",
        },
        {
            "role": "user",
            "content": "Summarize today's Mistral Medium 3.5 release.",
        },
    ],
    response_format=schema,
)

print(response.choices[0].message.content)

Use strict schema output when downstream code expects a stable shape. Use json_object when you only need valid JSON and will validate separately with Pydantic, Zod, or another schema library.
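If you go the json_object route, the separate validation step can be as small as a stdlib check. This sketch hand-rolls the key and type checks; Pydantic or Zod does the same job with less code:

```python
import json

# Expected shape for the json_object example above.
REQUIRED = {"name": str, "language": str, "use_cases": list}

def validate_payload(text):
    """Parse model output and verify the expected keys and types.

    Returns the parsed dict, or raises ValueError describing what is
    wrong, which is the signal to retry or escalate the request.
    """
    data = json.loads(text)
    for key, expected_type in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for {key}")
    return data

ok = validate_payload(
    '{"name": "Medium 3.5", "language": "en", "use_cases": ["code"]}'
)
print(ok["name"])
```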

Vision input

Medium 3.5 supports image input alongside text. Send an array of content parts in the user message:

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image and what is it doing wrong?",
                },
                {
                    "type": "image_url",
                    "image_url": "https://example.com/diagram.png",
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Image inputs are billed as input tokens at the same $1.50 per million input-token rate. The exact count depends on the image and appears in usage.prompt_tokens.

For production vision workloads:

  • Crop to the relevant region.
  • Downscale when high resolution is unnecessary.
  • Log image-token cost separately.
  • Avoid sending repeated frames unless the task requires them.
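For local files, the common OpenAI-compatible pattern is to inline the image as a base64 data URL in the image_url field. Whether the hosted endpoint accepts data URLs for every image format is worth confirming against Mistral's docs; the helper below only builds the string:

```python
import base64

def to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a data URL usable in an image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# With a real file you would read the bytes first:
#   to_data_url(open("diagram.png", "rb").read())
url = to_data_url(b"\x89PNG\r\n\x1a\n")
print(url[:40])
```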

Build the request collection in Apidog

A repeatable API workflow is better than one-off curl commands. In Apidog:

  1. Download Apidog and create a project.
  2. Create an environment.
  3. Add MISTRAL_API_KEY as a secret variable.
  4. Add BASE_URL=https://api.mistral.ai/v1.
  5. Create a POST {{BASE_URL}}/chat/completions request.
  6. Add the header:
   Authorization: Bearer {{MISTRAL_API_KEY}}
  7. Save a baseline body:
   {
     "model": "mistral-medium-3.5",
     "messages": [
       {
         "role": "user",
         "content": "Explain this API in practical terms."
       }
     ],
     "temperature": 0.3,
     "max_tokens": 1000
   }
  8. Parameterize model, temperature, max_tokens, and tool_choice.
  9. Inspect usage after every run.
  10. Add a post-response cost calculation:
   const usage = response.body.usage;

   const cost =
     usage.prompt_tokens * 1.5 / 1_000_000 +
     usage.completion_tokens * 7.5 / 1_000_000;

   console.log(`Estimated cost: $${cost}`);

If you already use the DeepSeek V4 API collection, duplicate it, change the base URL to https://api.mistral.ai/v1, update the model to mistral-medium-3.5, and run the same prompts against both providers. The same approach works when comparing against GPT-5.5.

Error handling

Common errors:

| Code | Meaning | Fix |
| --- | --- | --- |
| 400 | Bad request | Validate messages, tools, and JSON schema |
| 401 | Invalid key | Regenerate the key at console.mistral.ai |
| 402 | Payment required | Add credit or a payment method |
| 403 | Model not allowed | Check project scope and model ID |
| 422 | Invalid parameter | Check max_tokens, tool_choice, and schema fields |
| 429 | Rate limit | Retry with exponential backoff and jitter |
| 500 | Server error | Retry once, then check status |
| 503 | Overloaded | Retry later or fall back to another model |

Basic retry pattern:

import time
import random

def call_with_retry(fn, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)

            if status not in (429, 500, 502, 503, 504):
                raise

            if attempt == max_attempts - 1:
                raise

            sleep = (2 ** attempt) + random.random()
            time.sleep(sleep)

Do not automatically retry 400, 401, 402, 403, or 422. Those usually indicate invalid configuration, malformed payloads, billing issues, or permissions problems.

Cost control patterns

Medium 3.5 is more expensive than Medium 3, so route requests intentionally.

1. Default to a cheaper model, escalate when needed

Use Medium 3 for simpler requests. Escalate to Medium 3.5 when:

  • A validator fails.
  • The prompt requires vision.
  • The task needs long context.
  • The request involves complex code generation or agentic tool use.
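That routing policy fits in a few lines. The trigger flags here are illustrative; wire them to whatever signals your pipeline already produces, such as a failed validator or the presence of image parts:

```python
def pick_model(has_images=False, needs_long_context=False,
               validator_failed=False, agentic=False):
    """Route a request to the cheapest tier that can handle it.

    The escalation triggers mirror the list above; the model IDs are
    the hosted API names from the tier comparison table.
    """
    if has_images or needs_long_context or validator_failed or agentic:
        return "mistral-medium-3.5"
    return "mistral-medium-3"

print(pick_model())                       # cheap default
print(pick_model(validator_failed=True))  # escalated retry
```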

2. Cap output tokens

Output tokens cost $7.50 per million. Set max_tokens explicitly:

{
  "max_tokens": 1500
}

Do not rely on the model to stop at the length you expected.

3. Keep system prompts short

System prompts are billed on every request. If your system prompt is 2,000 tokens and can be trimmed to 500, you cut its share of input cost by 75 percent on every call.
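The savings compound with volume. At the $1.50-per-million input rate, trimming 1,500 tokens from a prompt re-sent 100,000 times a month works out as follows:

```python
INPUT_RATE = 1.50 / 1_000_000  # USD per input token

def system_prompt_savings(before_tokens, after_tokens, requests):
    """Savings from shrinking a system prompt re-sent on every request."""
    return (before_tokens - after_tokens) * requests * INPUT_RATE

# 100K requests with the prompt cut from 2,000 to 500 tokens.
print(f"${system_prompt_savings(2_000, 500, 100_000):.2f}")  # $225.00
```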

4. Log usage

Persist these fields:

{
  "prompt_tokens": 1234,
  "completion_tokens": 567,
  "total_tokens": 1801
}

Also log estimated cost:

cost_usd = (
    prompt_tokens * 1.5 / 1_000_000
    + completion_tokens * 7.5 / 1_000_000
)

5. Be selective with vision

For image workflows:

  • Crop irrelevant areas.
  • Compress or downscale images when acceptable.
  • Avoid sending duplicate screenshots.
  • Measure token usage before scaling.

Comparing Medium 3.5 to other Mistral tiers

Mistral lineup as of late April 2026:

| Model | Context | Input $/M | Output $/M | Vision | Best for |
| --- | --- | --- | --- | --- | --- |
| mistral-small | 32K | $0.10 | $0.30 | No | High-volume classification, light chat |
| mistral-medium-3 | 128K | $0.40 | $2.00 | No | Bulk throughput, longer chat |
| mistral-medium-3.5 | 256K | $1.50 | $7.50 | Yes | Reasoning, code, vision, agents |
| mistral-large | 128K | $2.00 | $6.00 | Limited | Frontier-tier text reasoning |

Medium 3.5 is the tier that combines long context, vision, and merged reasoning capabilities. Choose it by workload requirements, not by model name alone.

Migrating from another provider

For OpenAI-compatible code, migration is mostly configuration.

From OpenAI:

- base_url="https://api.openai.com/v1"
- model="gpt-5.5"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"

From DeepSeek:

- base_url="https://api.deepseek.com/v1"
- model="deepseek-v4-pro"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"

Then check these fields:

- tool_choice="required"
+ tool_choice="any"
- seed=123
+ random_seed=123

Before production rollout:

  1. Run your existing prompt test suite.
  2. Compare structured-output validity.
  3. Compare tool-call argument quality.
  4. Mirror production traffic in shadow mode.
  5. Review response diffs in Apidog before switching live traffic.

Real-world use cases

Medium 3.5 is especially useful for:

  • Code review assistants: 77.6% SWE-Bench Verified and 256K context help with PR-level review involving diffs plus surrounding files.
  • Long-document QA: 256K context can fit many contracts, RFPs, and policy documents without chunking.
  • Multimodal extraction: Extract structured fields from receipts, screenshots, or diagrams without running OCR as a separate step.
  • Agent loops: Native function calling and strong multi-turn dialogue performance reduce tool-call retries and malformed JSON loops.

FAQ

What is the API model ID?

Use:

mistral-medium-3.5

The Hugging Face checkpoint is:

mistralai/Mistral-Medium-3.5-128B

Use the Hugging Face ID if you serve the open weights yourself. Use the short model ID for the hosted API.

Is Medium 3.5 OpenAI-compatible?

Mostly. Headers, endpoint shape, messages, and many parameters are close enough that OpenAI Python and Node clients can work with a base URL override.

The two main differences are:

  • tool_choice="any" instead of OpenAI’s required
  • random_seed instead of OpenAI’s seed

Can I run Medium 3.5 locally?

Yes. The weights are open under a Modified MIT License with a large-revenue carve-out. The model has 128B parameters, so local serving requires significant GPU memory. Quantized GGUF builds from unsloth/Mistral-Medium-3.5-128B-GGUF can run on a single high-end consumer card. The patterns from how to run DeepSeek V4 locally translate directly.

Does it support streaming with tool calls?

Yes. Streaming tool calls return argument fragments incrementally on delta.tool_calls. Accumulate the fragments until the stream closes, then parse the completed JSON arguments.
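The accumulation step is string concatenation keyed by tool-call index, followed by one json.loads when the stream closes. The chunk dicts below are simplified stand-ins for the delta.tool_calls fragments described above:

```python
import json

# Simplified stand-ins for delta.tool_calls entries across streamed chunks.
chunks = [
    {"index": 0, "function": {"name": "get_weather",
                              "arguments": '{"city": '}},
    {"index": 0, "function": {"arguments": '"Lagos", "unit": "c"}'}},
]

calls = {}
for fragment in chunks:
    slot = calls.setdefault(fragment["index"], {"name": None, "arguments": ""})
    fn = fragment["function"]
    if fn.get("name"):
        slot["name"] = fn["name"]          # name arrives once, keep it
    slot["arguments"] += fn.get("arguments", "")  # arguments arrive in pieces

# Once the stream closes, the concatenated arguments parse as normal JSON.
args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args)
```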

How do I count input tokens before sending?

Use the mistral-common Python package tokenizer. It matches the tokenizer used by the API, so counts should align with usage.prompt_tokens.

What context length should I plan for?

The cap is 256K tokens, but cost scales linearly. A 200K-token request costs about $0.30 in input tokens before generation starts. Most production requests should stay far below the maximum unless the task genuinely requires long context.

Is there a free tier?

Mistral does not advertise a permanent free tier, though new accounts may include trial credit. For sustained free experimentation on similar-tier models, see how to use the DeepSeek V4 API for free.
