Two things happened in late April 2026 that are worth understanding together. Mistral released Medium 3.5, a 128B dense open-weight model that replaces both Devstral 2 and Magistral in a single set of weights. At the same time, they shipped remote agents for their Vibe coding platform — cloud-based sessions that run in isolated sandboxes, open pull requests, and notify you when they're done.
The combination changes how you think about AI coding workflows. Until now, running an AI coding agent meant keeping a local terminal busy for the duration of a task. Remote agents decouple task execution from your machine. You queue work, the agent runs independently, and you review a pull request — closer to delegating a ticket than pair-programming in a live session.
This guide covers what Medium 3.5 is, how Vibe remote agents work, how to set up both, and when self-hosting makes more sense than the API.
What Is Mistral Medium 3.5?
Mistral Medium 3.5 is a 128B dense model released April 29, 2026. The "dense" part matters architecturally: unlike Mistral Large 3's 675B Mixture-of-Experts design (which activates only ~41B parameters per inference pass), Medium 3.5 uses all 128B parameters on every call. That means consistent latency at the cost of more VRAM — a deliberate trade-off for predictable performance in agentic loops.
The model consolidates three previously separate Mistral products:
- Mistral Medium 3.1 — general instruction-following
- Magistral — reasoning-focused tasks
- Devstral 2 — agentic coding and code review
You no longer need to route between model IDs based on task type. One endpoint handles conversational responses, multi-step reasoning with configurable effort, vision input, and long coding sessions.
Key specifications:
| Property | Value |
|---|---|
| Architecture | Dense transformer, 128B parameters |
| Context window | 256K tokens |
| Input modalities | Text and images |
| SWE-Bench Verified | 77.6% |
| API model ID | mistral-medium-3.5 |
| Input pricing | $1.50 per million tokens |
| Output pricing | $7.50 per million tokens |
| License | Modified MIT |
| Hugging Face weights | mistralai/Mistral-Medium-3.5-128B |
| Release date | April 29, 2026 |
The 256K context window is meaningfully larger than Claude Sonnet 4.6's 200K and double GPT-4o's 128K, which matters for sessions that need to hold large codebases in context without chunking.
The Vision Encoder
One underreported detail: Mistral trained Medium 3.5's vision encoder from scratch to handle variable image sizes and aspect ratios, rather than forcing inputs into a fixed resolution. In practice, this means the model processes tall mobile screenshots, wide system diagrams, and landscape images without distortion or information loss — relevant for documentation analysis or visual debugging workflows.
What Are Vibe Remote Agents?
Vibe is Mistral's CLI coding agent, initially released alongside Devstral 2. The April 29 update adds remote execution: instead of holding your local terminal open, you hand the task off to a cloud sandbox that runs independently.
Each remote session runs in an isolated environment with its own dependency layer. The agent can:
- Read and edit files across a full repository
- Install packages within the sandbox
- Run tests and iterate on failures
- Open a pull request to GitHub with completed changes
- Post a task summary to Slack or Microsoft Teams
Integrations confirmed at launch include GitHub (code access and PR creation), Linear (issue reference and status updates), Jira (ticket linking), and Sentry (incident context for bug fixes).
This workflow sits between a CI job and a junior engineer. You assign a task with the relevant context, the agent works through it in isolation, and you review the output — without watching every intermediate step.
Session Teleportation
One practical feature: if you're mid-session locally and the task will take longer than expected, you can migrate the active session to a cloud agent without losing context. Task state, file diffs, and approval history carry across. Your local terminal is freed immediately.
Getting Started with Mistral Vibe
Vibe requires Python 3.12 or later and a Mistral API key with a Pro, Team, or Enterprise plan for remote agent access.
Installation
# Install with pip
pip install mistral-vibe
# Or with uv
uv pip install mistral-vibe
# Set your API key
export MISTRAL_API_KEY=your_key_here
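Before starting a session, it's worth confirming the key is live. A minimal check with the mistralai Python SDK (an assumption on my part that you have it installed; `pip install mistralai` if not):

import os
from mistralai import Mistral

# Confirm the API key works by listing the models it can reach
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
print([m.id for m in client.models.list().data])

If this prints a model list that includes mistral-medium-3.5, Vibe will authenticate with the same key.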
Running a Local Session
Start a local coding session in your project directory:
cd your-project
vibe
Vibe reads your codebase, plans steps, and applies edits interactively. Local sessions use Medium 3.5 by default as of the April 29 release.
Launching a Remote Agent
To hand a task off to a cloud agent, use the --remote flag:
vibe --remote "Add connection pooling to the database module and add integration tests"
From inside an active local session, prefix your prompt with &:
> & fix the failing integration tests in src/auth/
Both methods start a cloud session and return control to your terminal. You receive a notification (via Slack, Teams, or email depending on your configuration) when the agent finishes or needs input.
Connecting GitHub for PR Automation
After installing the Vibe GitHub App and granting repository access, the agent creates a branch, applies changes, runs tests in the sandbox, and opens a PR with a change summary. You review it like any other pull request — the agent's commits appear under your configured bot account.
Using Medium 3.5 via API
The model is available through the Mistral API with the identifier mistral-medium-3.5 (alias: mistral-medium-3):
from mistralai import Mistral

client = Mistral(api_key="your_api_key")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": (
                "Review this function for SQL injection risk and rewrite it safely:\n\n"
                "```python\n"
                "def get_user(user_id):\n"
                "    return db.execute(f'SELECT * FROM users WHERE id = {user_id}')\n"
                "```"
            )
        }
    ]
)

print(response.choices[0].message.content)
Multimodal Input
Pass images alongside text using the standard multimodal format:
response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/architecture-diagram.png"}
                },
                {
                    "type": "text",
                    "text": "Identify any single points of failure in this architecture."
                }
            ]
        }
    ]
)
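For local files (screenshots, exported diagrams), the same message format accepts a base64 data URI in place of a public URL. A sketch, with screenshot.png standing in for whatever file you're analyzing:

import base64
from mistralai import Mistral

client = Mistral(api_key="your_api_key")

# Encode a local image as a data URI instead of hosting it somewhere public
with open("screenshot.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{encoded}"}
                },
                {"type": "text", "text": "What error state does this screenshot show?"}
            ]
        }
    ]
)
print(response.choices[0].message.content)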
Pricing in Context
At $1.50 / M input and $7.50 / M output, Medium 3.5 sits above flash-tier models but below frontier pricing. Community feedback at launch centered on the output price, which many felt was high for a "medium" tier label. Mistral's counter-argument: one endpoint replaces three, and open weights make self-hosting viable for high-volume workloads.
| Model | Input / M | Output / M | SWE-Bench | Context |
|---|---|---|---|---|
| Mistral Medium 3.5 | $1.50 | $7.50 | 77.6% | 256K |
| Mistral Large 3 | $2.00 | $6.00 | n/a | 128K |
| GPT-4o | $2.50 | $10.00 | n/a | 128K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | ~80%+ | 200K |
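To turn those rates into a budget, the arithmetic is simple enough to sanity-check in a few lines. The traffic numbers below are illustrative, not a benchmark:

# Back-of-envelope monthly API cost at Medium 3.5 rates
INPUT_PER_M = 1.50   # USD per million input tokens
OUTPUT_PER_M = 7.50  # USD per million output tokens

def monthly_cost(input_tokens_per_day, output_tokens_per_day, days=30):
    daily = (
        (input_tokens_per_day / 1e6) * INPUT_PER_M
        + (output_tokens_per_day / 1e6) * OUTPUT_PER_M
    )
    return daily * days

# Example: an agent pipeline burning 10M input / 2M output tokens per day
print(f"${monthly_cost(10e6, 2e6):,.2f}/month")  # $900.00/month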
Benchmark Performance
77.6% on SWE-Bench Verified places Medium 3.5 near the top of the open-weight leaderboard. SWE-Bench Verified measures an AI system's ability to resolve real GitHub issues in Python repositories — more grounded than synthetic coding tasks because the issues include ambiguity, missing context, and test suites that must actually pass.
The model also handles reasoning tasks that previously required Magistral, with configurable reasoning effort built in. You can request more compute for harder problems without routing to a separate model.
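How that effort knob is exposed isn't specified here, so treat the following as a sketch only: it sends a hypothetical reasoning_effort field over the raw chat completions endpoint. Check Mistral's API reference for the real parameter name before relying on it.

import os
import requests

payload = {
    "model": "mistral-medium-3.5",
    "messages": [{"role": "user", "content": "Plan a zero-downtime schema migration."}],
    "reasoning_effort": "high",  # hypothetical field name; verify against the API docs
}
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])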
One caveat: SWE-Bench Verified scores depend on the surrounding agent harness — scaffolding, retry logic, test execution. Mistral's reported 77.6% uses their own setup. Third-party evaluations using different harnesses will produce different numbers, so treat it as a directional signal rather than an absolute benchmark.
Self-Hosting the Open Weights
The full 128B weights are published at mistralai/Mistral-Medium-3.5-128B on Hugging Face under a modified MIT license, permitting commercial use with attribution.
Hardware requirements for production inference:
| Precision | VRAM for weights | GPU configuration |
|---|---|---|
| FP8 | ~128 GB | 4× H100 80GB |
| BF16 | ~256 GB | 4× H100 80GB |
FP8 leaves comfortable headroom on four 80GB GPUs (320 GB total) for the full 256K context window. BF16 delivers higher per-token accuracy on the same hardware, but its ~256 GB of weights leave only ~64 GB for KV cache, so expect to cap the usable context well below 256K on that configuration.
Serve with vLLM:
pip install vllm

# Shard across 4 GPUs with tensor parallelism. FP8 goes through
# --quantization (vLLM's --dtype flag does not accept float8).
# --max-model-len is a conservative starting point; raise it toward
# 256K as KV-cache memory allows.
vllm serve mistralai/Mistral-Medium-3.5-128B \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --quantization fp8
Speculative decoding with EAGLE:
Mistral released mistralai/Mistral-Medium-3.5-128B-EAGLE, an EAGLE speculative decoding companion that works with vLLM and SGLang to increase throughput on generation-heavy workloads like bulk code review or document processing.
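vLLM wires draft models in through its speculative decoding config. The exact flag shape has changed across vLLM versions, so the invocation below (JSON-style --speculative-config, current in recent releases) is a sketch to adapt against vLLM's docs rather than copy verbatim:

vllm serve mistralai/Mistral-Medium-3.5-128B \
  --tensor-parallel-size 4 \
  --quantization fp8 \
  --speculative-config '{"method": "eagle", "model": "mistralai/Mistral-Medium-3.5-128B-EAGLE", "num_speculative_tokens": 4}'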
Ollama for local evaluation:
ollama run mistral-medium-3.5
This downloads a quantized version for workstations with 64GB+ unified memory (Apple M3 Ultra, for example). Not a production configuration, but useful for offline evaluation before committing to a cloud or on-premises deployment.
For teams already using secure sandbox environments for code execution, Medium 3.5's open weights offer a path to running isolated coding agents entirely on-premises: the model generates code, a sandboxed executor runs it, and no data leaves your infrastructure.
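As a minimal sketch of that loop, assuming the vLLM server from the section above is running on localhost:8000, and using a timeout-limited subprocess where a real deployment would use a container or gVisor-class sandbox:

import subprocess
import tempfile
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint; nothing leaves the box
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

completion = client.chat.completions.create(
    model="mistralai/Mistral-Medium-3.5-128B",
    messages=[{
        "role": "user",
        "content": "Write a Python script that prints the first 10 primes. Code only."
    }],
)
code = completion.choices[0].message.content

# Strip a markdown fence if the model wrapped its answer in one
if code.startswith("```"):
    code = code.split("\n", 1)[1].rsplit("```", 1)[0]

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(code)
    path = f.name

# Illustrative only: a subprocess with a timeout is not a security boundary
result = subprocess.run(
    ["python3", path], capture_output=True, text=True, timeout=30
)
print(result.stdout or result.stderr)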
Who Should Use This
Strong Fit
- Teams already using Mistral APIs who want to consolidate from three model endpoints to one
- Organizations with GitHub-based workflows who want async PR automation without a dedicated DevOps agent service
- Infrastructure teams with H100 access who want to self-host and remove per-token costs
- Projects with large codebases that exceed 128K context limits on other models
- Teams using Linear, Jira, or Sentry who want issue-to-PR automation in a single workflow
Consider Alternatives
- Teams needing the highest possible SWE-Bench scores without hardware constraints — frontier proprietary models still lead
- Workflows already deeply integrated with Claude Code or GitHub Copilot Workspace — switching costs are real
- High-volume pipelines where $7.50/M output exceeds unit economics and self-hosting isn't feasible
- Small teams that can't dedicate 4× H100s but need more throughput than the API budget allows
FAQ
Q: Do Vibe remote agents require a paid Mistral plan?
Yes. Remote agents are available on Mistral Pro, Team, and Enterprise plans for Vibe and Le Chat. The free tier gives local CLI access to Vibe, but cloud agent sessions and async PR workflows require a paid plan.
Q: Is mistral-medium-3.5 a direct replacement for Devstral 2?
Functionally yes for coding tasks, but Medium 3.5 is a larger model. Devstral 2 was optimized for lightweight agentic coding on smaller hardware. Medium 3.5 adds vision, general reasoning, and longer context at the cost of significantly more VRAM. If you were using Devstral 2 for latency-sensitive inline completions, test Medium 3.5's response time on your workload before migrating.
Q: Can remote agents access private GitHub repositories?
Yes. Vibe uses a GitHub App integration that you install at the organization or repository level. Access is scoped to the specific repositories you grant — the agent cannot access repos outside that scope.
Q: How does "teleporting" a local session work in practice?
Inside an active local vibe session, running a task prefixed with & migrates the session context — including file diffs and task history — to a cloud sandbox. Your local terminal is freed immediately. If the agent needs approval for a destructive action (like deleting files or pushing to main), it pauses and sends you a notification rather than proceeding autonomously.
Q: At $7.50/M output tokens, when does self-hosting break even?
A rough calculation: four H100 80GB GPUs in a cloud environment cost approximately $8–12 per hour, or $192–288 per day. At 5,000 output tokens per second of aggregate throughput (FP8, tensor parallel, heavily batched), that is ~18M tokens per hour, which works out to roughly $0.45–0.67 per million tokens. The catch is that the GPUs bill around the clock whether busy or idle, so self-hosting only wins once volume covers that fixed cost. Against the API's $7.50/M output price, break-even lands around 25–40 million output tokens per day.
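The same estimate as code, so you can swap in your own GPU pricing and measured throughput. Every constant below is an assumption carried over from the figures above:

# Break-even sketch: self-hosted 4x H100 cluster vs. API output pricing
GPU_COST_PER_HOUR = 10.0   # midpoint of the $8-12 range
THROUGHPUT_TOK_S = 5000    # assumed aggregate FP8 throughput, heavily batched
API_OUTPUT_PER_M = 7.50    # Medium 3.5 output price

tokens_per_hour = THROUGHPUT_TOK_S * 3600                      # ~18M/hour
self_hosted_per_m = GPU_COST_PER_HOUR / (tokens_per_hour / 1e6)
breakeven_per_day = GPU_COST_PER_HOUR * 24 / API_OUTPUT_PER_M  # in millions

print(f"self-hosted: ${self_hosted_per_m:.2f}/M output tokens")   # ~$0.56/M
print(f"break-even: ~{breakeven_per_day:.0f}M output tokens/day")  # ~32M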
Key Takeaways
- Mistral Medium 3.5 is a 128B dense open-weight model (modified MIT) that replaces Devstral 2 and Magistral at a single API endpoint (mistral-medium-3.5)
- 77.6% SWE-Bench Verified places it near the top of available open-weight coding models
- Vibe remote agents run async coding sessions in isolated cloud sandboxes, opening GitHub PRs automatically with Linear, Jira, Sentry, and Slack/Teams integration
- The CLI interface is minimal: vibe --remote "task" or the & task prefix inside an active session
- API pricing is $1.50 / M input and $7.50 / M output — self-hosting on open weights with 4× H100s removes per-token costs at sufficient scale
- The 256K context window and from-scratch vision encoder make it competitive for large-codebase and multimodal workflows
For teams already in the Mistral ecosystem via Devstral 2, the upgrade path is the clearest immediate use case. If you're evaluating agentic coding tools more broadly, the remote agent workflow and GitHub PR integration are worth testing against other approaches — including the latest comparison of AI coding agents — before committing.
Bottom Line
Mistral Medium 3.5 is the strongest open-weight coding model available right now, and Vibe remote agents make async AI-driven PR workflows practically accessible for the first time in the Mistral ecosystem. The $7.50/M output pricing will push high-volume teams toward self-hosting — but the open weights make that a genuine option, not a theoretical one.