This article was originally published on aifoss.dev
TL;DR: Devstral Small 2 is a 24B Apache 2.0 coding model from Mistral that scores 68% on SWE-bench Verified — a serious benchmark for a local model — and runs on a single RTX 4090. If you want an open-weight coding agent that keeps code on your machine and handles multi-file edits, this is the strongest 24B option available as of June 2026. The catch: it's purpose-built for agentic software engineering tasks, not casual code completion.
| | Devstral Small 2 | Devstral 2 (123B) | Claude Sonnet 4.5 |
|---|---|---|---|
| Best for | Local deployment, solo devs | Team servers, max quality | Cloud API, highest accuracy |
| SWE-bench Verified | 68.0% | 72.2% | 77.2% |
| VRAM (Q4_K_M) | ~14 GB | ~70 GB+ | API only |
| License | Apache 2.0 | Modified MIT | Proprietary |
| Context window | 256K | 256K | 200K |
| API cost (input/1M) | $0.10 | $0.40 | ~$3.00 |
Honest take: For local coding agents with a single consumer GPU, Devstral Small 2 is the model to run in mid-2026. It won't match Claude Sonnet 4.5 on hard tasks, but it costs you nothing per token and keeps your code off the internet.
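To put the API prices from the table in perspective, here is a rough sketch of monthly input-token cost. The 50M-token monthly volume is a made-up figure for illustration, and the table lists input prices only, so output tokens are left out.

```python
# Input prices per 1M tokens, copied from the comparison table above.
INPUT_PRICE_PER_MILLION = {
    "Devstral Small 2 (API)": 0.10,
    "Devstral 2 (API)": 0.40,
    "Claude Sonnet 4.5 (API)": 3.00,  # the table lists this one as approximate
}


def monthly_input_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Return the input-token cost in dollars for one month."""
    return tokens_per_month / 1_000_000 * price_per_million


# Hypothetical workload: 50M input tokens per month (an assumption, not a measurement).
HYPOTHETICAL_TOKENS = 50_000_000

for name, price in INPUT_PRICE_PER_MILLION.items():
    cost = monthly_input_cost(HYPOTHETICAL_TOKENS, price)
    print(f"{name}: ${cost:,.2f}/month")
```

Running the model locally swaps these per-token charges for hardware and electricity costs, which this sketch does not try to model.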
What Devstral Small 2 Is
Mistral released Devstral Small 2 on December 9, 2025, alongside its larger sibling Devstral 2 (123B) and the Mistral Vibe CLI. Where Devstral 2 targets multi-GPU servers, Small 2 targets single-GPU workstations.
The model is fine-tuned specifically for software engineering agent tasks: exploring codebases, editing multiple files in a single pass, and calling tools in agentic loops. It handles those tasks differently from a general-purpose chat model — it's optimized to read file trees, understand diffs, and apply targeted edits rather than generate boilerplate from scratch.
Key specs (tested on Devstral-Small-2-24B-Instruct-2512):
- Parameters: 24B
- Context window: 256K tokens
- License: Apache 2.0 — commercial use allowed, no revenue threshold restrictions
- Released: December 9, 2025
- Ollama tag: devstral-small-2
The Apache 2.0 license is meaningful here. The 123B Devstral 2 ships under a modified MIT license that restricts organizations with over $20M in monthly revenue. Small 2 has no such clause, so commercial deployment is not gated on company revenue.
Benchmark Reality Check
68.0% on SWE-bench Verified is the headline number. Here's what that actually means.
SWE-bench Verified tests models on real GitHub issues from popular Python repositories. A successful "resolve" means the model read the issue, edited the codebase, and produced a patch that passes the tests associated with that issue, all without being given the solution. It's a meaningful proxy for agentic software engineering capability.
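As a rough illustration of how a resolve rate is computed, the sketch below assumes the convention used by the public SWE-bench harness: each task tracks tests that must flip from failing to passing, plus tests that must keep passing. The `TaskResult` type and its field names are hypothetical, and the 500-task count is the published size of the Verified subset rather than a figure from this article.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one benchmark task after the model's patch is applied."""
    fail_to_pass: list  # booleans: tests that failed before the fix and should now pass
    pass_to_pass: list  # booleans: tests that passed before and must still pass


def is_resolved(result: TaskResult) -> bool:
    """A task counts as resolved only if every tracked test passes."""
    return all(result.fail_to_pass) and all(result.pass_to_pass)


def resolve_rate(results: list) -> float:
    """Percentage of tasks resolved (0.0 for an empty run)."""
    if not results:
        return 0.0
    return 100.0 * sum(is_resolved(r) for r in results) / len(results)


# Assumed benchmark size of 500 tasks, with 340 of them resolved.
solved = [TaskResult([True], [True, True])] * 340
unsolved = [TaskResult([False], [True, True])] * 160
print(f"{resolve_rate(solved + unsolved):.1f}%")
```

A patch that fixes the issue but breaks a previously passing test still counts as unresolved under this rule, which is part of why the benchmark is hard to game.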
For reference:
- GPT-4o: ~38% at launch (early 2024 snapshot)
- Claude Sonnet 3.5: ~49% at launch
- Devstral Small 2 (24B, local): 68.0%
- Devstral 2 (123B, API/server): 72.2%
- Claude Sonnet 4.5 (API, current): 77.2%
A 4.2-point gap between Small 2 and the 123B version is smaller than you'd expect given a 5x parameter difference. The large gap vs. GPT-4o and older Claude versions reflects how much Mistral specialized this model for software agent tasks. General-purpose models trained to be chatty assistants perform worse on this benchmark than a 24B model trained specifically to edit files.
The benchmark also doesn't tell you everything. On tasks that require deep reasoning across a large unfamiliar codebase, or refactors that span many files, you'll notice the quality gap between 68% and 77% more clearly. For standalone functions, unit tests, and targeted bug fixes, the difference is often imperceptible.
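The gaps quoted in this section are simple arithmetic on the published scores. A small sketch, using only the numbers listed above:

```python
# SWE-bench Verified scores and parameter counts as quoted in this article.
SCORES = {"Devstral Small 2": 68.0, "Devstral 2": 72.2, "Claude Sonnet 4.5": 77.2}
PARAMS_BILLIONS = {"Devstral Small 2": 24, "Devstral 2": 123}


def point_gap(lower: str, higher: str) -> float:
    """Percentage-point difference between two models' scores."""
    return round(SCORES[higher] - SCORES[lower], 1)


def param_ratio(small: str, large: str) -> float:
    """How many times more parameters the larger model has."""
    return PARAMS_BILLIONS[large] / PARAMS_BILLIONS[small]


print(point_gap("Devstral Small 2", "Devstral 2"))
print(point_gap("Devstral Small 2", "Claude Sonnet 4.5"))
print(round(param_ratio("Devstral Small 2", "Devstral 2"), 1))
```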
Installation: Ollama in 3 Commands
Ollama is the fastest path to running Devstral Small 2 locally. If you don't have Ollama installed, the full Ollama setup guide covers it on Linux, macOS, and Windows.
# Pull the model (Q4_K_M by default, ~15 GB)
ollama pull devstral-small-2
# Run interactively
ollama run devstral-small-2
# Or specify a tag explicitly
ollama pull devstral-small-2:24b-instruct-2512-q4_K_M
Quantization options and VRAM requirements:
| Quantization | Size | Min VRAM | Quality |
|---|---|---|---|
| Q4_K_M (default) | ~15 GB | 16 GB | Good for coding tasks |
| Q6_K | ~20 GB | 22 GB | Noticeably better on complex edits |
| Q8_0 | ~26 GB | 28 GB | Near-lossless |
| FP16 | ~48 GB | 50 GB | Reference quality, multi-GPU only |
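The sizes in the table follow roughly from parameter count times bits per weight. The sketch below reproduces them. The bits-per-weight figures are approximate values commonly cited for llama.cpp quantization types, not numbers from this article, and real files carry some extra metadata.

```python
# Approximate effective bits per weight (assumed values; K-quants mix
# precisions across tensors, so these are averages).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5, "FP16": 16.0}


def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Estimate the weight size in GB (10^9 bytes) for a given quantization."""
    # params (billions) * bits / 8 bits-per-byte gives billions of bytes, i.e. GB.
    return params_billions * bits_per_weight / 8


for quant, bits in BITS_PER_WEIGHT.items():
    print(f"{quant}: ~{weight_size_gb(24, bits):.1f} GB")
```

The Min VRAM column sits a little above these estimates, presumably because the KV cache and runtime buffers need room on top of the weights.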
The Q4_K_M default fits comfortably on an RTX 4090 (24 GB). A Mac Mini M4 Pro with 48 GB unified memory can run Q8_0 with headroom. On a 16 GB GPU, Q4_K_M fits, but little memory is left for context, so long files may push part of the workload off the GPU and slow generation.
For a deeper look at how quantization levels affect output quality for coding tasks, see the GGUF quantization guide.
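To turn the table into a quick decision, here is a minimal sketch that picks the highest-quality quantization whose Min VRAM fits a given memory budget. The thresholds are copied from the table above; the function name is illustrative.

```python
from typing import Optional

# (quantization, minimum VRAM in GB), ordered from highest to lowest quality,
# using the Min VRAM column from the table above.
QUANT_MIN_VRAM_GB = [("FP16", 50), ("Q8_0", 28), ("Q6_K", 22), ("Q4_K_M", 16)]


def best_quant(available_gb: float) -> Optional[str]:
    """Return the highest-quality quantization that fits, or None if none does."""
    for name, min_gb in QUANT_MIN_VRAM_GB:
        if available_gb >= min_gb:
            return name
    return None


print(best_quant(24))  # e.g. an RTX 4090
print(best_quant(48))  # e.g. 48 GB of unified memory
print(best_quant(12))  # below the smallest listed requirement
```

This only checks the minimum. A pick that barely clears its threshold leaves little room for long contexts, so dropping one level can be the better trade on a 24 GB card.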
Once pulled, test the model:
ollama run devstral-small-2 "Write a Python function that finds all duplicate entries in a list of dicts by a given key."
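Output should start streaming as soon as the model has loaded into memory; the first run takes longer while the weights load. Ollama applies the model's chat template, including tool-call formatting, automatically.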
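To judge what the model returns for that prompt, it helps to have a hand-written reference. The version below is one reasonable interpretation (return every dict whose key value occurs more than once, in original order), not the model's actual output. It assumes the values under the key are hashable.

```python
from collections import defaultdict
from typing import Any


def find_duplicates(rows: list, key: str) -> list:
    """Return every dict whose value for `key` appears more than once.

    Rows missing the key are ignored. Original order is preserved.
    """
    positions: dict = defaultdict(list)  # key value -> indexes of rows holding it
    for index, row in enumerate(rows):
        if key in row:
            positions[row[key]].append(index)

    duplicate_indexes = sorted(
        i for indexes in positions.values() if len(indexes) > 1 for i in indexes
    )
    return [rows[i] for i in duplicate_indexes]


users: list = [
    {"id": 1, "name": "ana"},
    {"id": 2, "name": "ben"},
    {"id": 1, "name": "cy"},
    {"name": "no-id"},
]
print(find_duplicates(users, "id"))
```

A model answer that returns only the repeated key values, or only the second and later occurrences, is also a defensible reading of the prompt; the point is to check that it handles missing keys and preserves order sensibly.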
Use It With Aider
Aider is where Devstral Small 2 actually shines. Its architect mode is a close match for how the model was designed to work: read the codebase, plan the edit, then apply it.
Install Aider if you haven't already — the Aider setup guide covers full configuration. Then point it at your local Ollama instance:
# Via local Ollama
aider --model ollama/devstral-small-2:latest
# Architect mode with an explicit editor model
aider \
--model ollama/devstral-small-2:latest \
--architect \
--editor-model ollama/devstral-small-2:latest
The --architect flag puts Aider in a two-step mode: the first call plans the edit, the second applies it. This maps well to how Devstral was trained — it expects to reason about a file tree before making changes.
One practical note: Devstral Small 2 generates longer "thinking" sections when given open-ended architecture questions. For targeted bug fixes (aider --message "fix the race condition in worker.py line 42"), it's fast and accurate. For open-ended feature requests on a large codebase, give it explicit file context with aider file1.py file2.py rather than letting it figure out which files to open on its own.
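If you launch Aider from scripts, the invocations in this section can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of Aider; it uses only the flags shown in this article (`--model`, `--architect`, `--editor-model`, `--message`) plus positional file paths.

```python
import shlex
from typing import Optional, Sequence


def aider_command(
    model: str,
    files: Sequence[str] = (),
    architect: bool = False,
    message: Optional[str] = None,
) -> list:
    """Build an argument list for launching Aider against a local model."""
    cmd = ["aider", "--model", model]
    if architect:
        # Use the same model for the planning and the editing step.
        cmd += ["--architect", "--editor-model", model]
    if message is not None:
        cmd += ["--message", message]
    cmd += list(files)  # explicit file context goes last
    return cmd


cmd = aider_command(
    "ollama/devstral-small-2:latest",
    files=["file1.py", "file2.py"],
    architect=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)`; keeping it as a list avoids shell-quoting problems when the message contains spaces or quotes.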
Use It With Continue.dev
Continue.dev can use Devstral Small 2 through its Ollama provider, which talks to the local Ollama server. The model's 256K context window is an advantage here: you can include larger files as context without hitting limits.
In VS Code, open your Continue config (~/.continue/config.json) and add:
{
"models": [
{
"title": "Devstral Small 2 (local)",
"provider": "ollama",
"model": "devstral-small-2:latest",
"apiBase": "http://localhost:11434"
}
]
}
For the agent tab in Continue 0.9+, set it as the default agent model — the tool-calling support Devstral was trained on maps directly to Continue's agent tool loop. In the VS Code sidebar, select the model from the dropdown and switch to Agent mode.
If you're using Continue with a team and want a shared Ollama instance, run Ollama on the server with OLLAMA_HOST=0.0.0.0 ollama serve and point the apiBase at the server IP. See the Continue.dev + Ollama guide for multi-user setup details.
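If a Continue config already exists, the model entry can be merged in rather than pasted by hand. This is a minimal sketch assuming the JSON layout shown above; the function names are illustrative, and the non-local title format is an invented convention.

```python
import json
from pathlib import Path


def ollama_model_entry(host: str = "localhost", model: str = "devstral-small-2:latest") -> dict:
    """Build a Continue model entry pointing at an Ollama server on the default port."""
    label = "local" if host == "localhost" else host
    return {
        "title": f"Devstral Small 2 ({label})",
        "provider": "ollama",
        "model": model,
        "apiBase": f"http://{host}:11434",
    }


def add_model(config: dict, entry: dict) -> dict:
    """Return a copy of config with entry added, replacing any model with the same title."""
    kept = [m for m in config.get("models", []) if m.get("title") != entry["title"]]
    return {**config, "models": kept + [entry]}


def update_config_file(path: Path, entry: dict) -> None:
    """Read the config if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_model(config, entry), indent=2))
```

For example, `update_config_file(Path.home() / ".continue" / "config.json", ollama_model_entry())` adds the local entry, while passing a server address as `host` covers the shared-instance setup. Back up the file first, since this rewrites it.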
Use It With Mistral Vibe CLI
Mistral Vibe is the native CLI that shipped alongside Devstral 2. It's open-source (MIT license, available on GitHub), built specifically for Devstral, and runs in your terminal without an IDE.
Install and configure it:
# Install via pip
pip install mistral-vibe
# Point at local Ollama (no API key needed)
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"
# Run in your project directory
vib