Richard Gibbons

Originally published at digitalapplied.com

Devstral 2 & Mistral Vibe CLI: Complete Coding Guide

Master Devstral 2 (72.2% SWE-bench) and Mistral Vibe CLI. Open-weight coding models that run locally. Complete autonomous agent guide.

Key Takeaways

  • 72.2% SWE-bench Performance: Devstral 2 (123B parameters) achieves 72.2% on SWE-bench Verified, making it the highest-performing open-weight coding model and competitive with proprietary solutions like Claude Sonnet (77.2%).
  • 7x Cheaper Than Claude: At $0.40/$2.00 per million tokens (input/output), Devstral 2 is approximately 7x cheaper than Claude Sonnet 4.5 ($3/$15). Free API access available through December 2025.
  • Run Locally on Consumer Hardware: Devstral Small 2 (24B) scores 68% on SWE-bench and runs on RTX 4090 (24GB VRAM) or Mac with 32GB RAM, enabling unlimited local AI coding without per-token API fees.
  • Mistral Vibe CLI Integration: Terminal-based agentic coding interface with file editing, codebase search, bash execution, and MCP integration. Integrates with Zed, Kilo Code, and Cline IDEs.

Devstral 2 Technical Specifications

Released December 9, 2025 | Free API through December 2025

Devstral 2 (123B) - Flagship

Parameters: 123 billion
Context Window: 256K tokens
SWE-bench Verified: 72.2%
API Pricing: $0.40 / $2.00 per 1M tokens (input/output)
License: Modified MIT
Hardware: 4x H100 GPUs

Devstral Small 2 (24B) - Local-Friendly

Parameters: 24 billion
Context Window: 256K tokens
SWE-bench Verified: 68.0%
API Pricing: $0.10 / $0.30 per 1M tokens (input/output)
License: Apache 2.0
Hardware: RTX 4090 or 32GB Mac

Features: Vision/Multimodal Support, Tool Calling, Agentic Workflows, MCP Compatible

Free API Access: Mistral is offering free API access to Devstral 2 through December 2025. Pricing begins January 2026—excellent time to evaluate before costs apply.
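
The quickest way to try it during the free window is a direct call to Mistral's chat completions endpoint. A minimal sketch, assuming the standard Mistral API shape; the model identifier below is a placeholder, so check Mistral's model list for the exact ID.

# Minimal API smoke test (model ID is a placeholder -- confirm it in Mistral's model list)
export MISTRAL_API_KEY="your-key-here"
curl -s https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "devstral-2",
        "messages": [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
      }'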

Introduction

Mistral AI released Devstral 2 and Mistral Vibe CLI on December 9, 2025, delivering the most capable open-weight coding models available. Devstral 2 (123B parameters) achieves 72.2% on SWE-bench Verified—surpassing DeepSeek V3.2 (63.8%) and approaching Claude Sonnet 4.5 territory (77.2%)—while Devstral Small 2 (24B) scores 68% and runs on consumer laptops with 32GB RAM. At 7x cheaper than Claude Sonnet per token, this release fundamentally changes the economics of AI-assisted development.

The significance extends beyond benchmark numbers. Open-weight models like Devstral 2 run entirely on your infrastructure—your code never leaves your machine, eliminating data privacy concerns that limit AI adoption in security-conscious organizations. Devstral Small 2's Apache 2.0 license enables unrestricted commercial use at any scale, while Devstral 2 (123B) uses a Modified MIT license suitable for companies under $20M monthly revenue. For individual developers, Devstral Small 2 offers unlimited local AI coding assistance without the $20-200/month subscription costs of Claude Code, GitHub Copilot, or Cursor Pro.

Benchmark Comparison: Devstral 2 vs Competitors

Benchmark | Devstral 2 (123B) | Devstral Small 2 (24B) | Claude Sonnet 4.5 | DeepSeek V3.2
SWE-bench Verified | 72.2% | 68.0% | 77.2% | 63.8%
Terminal Bench 2 | 22.5% | ~18% | 42.8% | ~20%
HumanEval+ | 89.7% | ~85% | 91.2% | 87.4%
MBPP+ | 78.4% | ~74% | 79.8% | 75.1%
Context Window | 256K | 256K | 200K | 128K

Head-to-head win rate: Devstral 2 beats DeepSeek V3.2 42.8% to 28.6%, while Claude Sonnet 4.5 is preferred over Devstral 53.1% of the time.

Choose Devstral 2 When

  • High-volume coding tasks (7x cheaper)
  • Privacy-sensitive codebases
  • Bug fixes, tests, refactoring
  • Self-hosted/air-gapped environments

Choose Claude When

  • Architectural decisions
  • Complex reasoning tasks
  • Terminal-heavy workflows
  • Security-critical code

Hybrid Strategy

  • Devstral for drafts and boilerplate
  • Claude for review and complex logic
  • Route by task complexity
  • Optimize cost vs quality

API Pricing & Cost Optimization: 7x Cheaper Than Claude

Model | Input (per 1M tokens) | Output (per 1M tokens) | Context | Free Tier
Devstral 2 | $0.40 | $2.00 | 256K | Free until Jan 2026
Devstral Small 2 | $0.10 | $0.30 | 256K | Free until Jan 2026
Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | None
Claude Opus 4.5 | $15.00 | $75.00 | 200K | None
DeepSeek V3.2 | $0.27 | $1.10 | 128K | Limited
GPT-4.1 | $2.00 | $8.00 | 128K | None

Cost Optimization Strategies

  1. Start with Small 2: Devstral Small 2 at $0.10/$0.30 is 4x cheaper than the 123B model and sufficient for 90% of coding tasks. Scale up only when needed.

  2. Local Deployment: Run Devstral Small 2 locally for zero marginal cost. RTX 4090 hardware amortizes quickly at high usage volumes.

  3. Task Routing: Route high-volume tasks (tests, docs, boilerplate) to Devstral locally, complex reasoning to Claude API. Optimize cost vs quality.

  4. Free Period Evaluation: Use free API access through December 2025 to evaluate both models on your workloads before pricing begins January 2026.
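
To put the pricing table in concrete terms, here is a back-of-envelope monthly cost comparison using the list prices above. The token volumes (50M input, 10M output per month) are assumed for illustration only.

# Back-of-envelope monthly cost at the list prices above (token volumes are assumptions)
awk 'BEGIN {
  in_tok  = 50;   # 50M input tokens per month (assumed workload)
  out_tok = 10;   # 10M output tokens per month (assumed workload)

  small2 = in_tok*0.10 + out_tok*0.30;    # Devstral Small 2
  dev2   = in_tok*0.40 + out_tok*2.00;    # Devstral 2 (123B)
  sonnet = in_tok*3.00 + out_tok*15.00;   # Claude Sonnet 4.5

  printf "Devstral Small 2:  $%.2f/month\n", small2;
  printf "Devstral 2 (123B): $%.2f/month\n", dev2;
  printf "Claude Sonnet 4.5: $%.2f/month\n", sonnet;
}'
# Roughly $8 vs $40 vs $300 -- the same ratio behind the "7x cheaper" headline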

Mistral Vibe CLI: Terminal-Based Agentic Coding

Mistral Vibe CLI is a command-line AI coding assistant that provides a conversational interface to your codebase. Unlike cloud-based alternatives, Vibe can run entirely locally with Devstral models—your code never leaves your machine. Built in Python (not Node.js like Claude Code or Gemini CLI), it offers file manipulation, terminal access, semantic search, and MCP integration.

Core Capabilities

File Operations

  • read_file - View file contents
  • write_file - Create/update files
  • search_replace - Patch existing code
  • Multi-file editing across codebase

Terminal Access

  • bash - Stateful shell execution
  • Run tests, git operations
  • Execute build commands
  • ! prefix for direct commands

Code Search

  • grep with ripgrep support
  • Fast recursive search
  • Auto-ignores .venv, .pyc
  • @ autocomplete for files

Installation & Setup

# Quick install (requires Python 3.12+)
curl -LsSf https://mistral.ai/vibe/install.sh | bash

# Or with uv (recommended for faster dependency management)
uv tool install mistral-vibe

# First run creates config and prompts for API key
vibe

# Configuration stored at:
# ~/.vibe/config.toml  - Settings
# ~/.vibe/.env         - API key (MISTRAL_API_KEY)

# Basic usage
vibe                                      # Interactive chat
vibe --prompt "add error handling"        # Non-interactive
!ls -la                                   # Direct shell command
@src/main.py                              # Reference file
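
The non-interactive mode also makes Vibe easy to script. A small sketch using the --prompt flag shown above; the git filter and the prompt wording are just examples.

# Sketch: batch-run Vibe over files changed in the last commit (prompt wording is illustrative)
for f in $(git diff --name-only HEAD~1 -- '*.py'); do
  vibe --prompt "Add missing docstrings and type hints to $f"
done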

IDE Integrations: Zed, Kilo Code, and Cline

Mistral Vibe integrates with popular development environments through the Agent Client Protocol (ACP), enabling seamless multi-file operations within your preferred IDE.

Zed Editor - Native Integration

  • Built-in extension support
  • Fastest setup - just add API key
  • Best for speed-focused devs
  • Limited to Zed ecosystem

Kilo Code - VS Code Compatible

  • Feature-rich agent workflows
  • Advanced customization
  • Best for power users
  • Steeper learning curve

Cline - VS Code Extension

  • Familiar VS Code interface
  • Works with existing setup
  • Best for VS Code users
  • Requires extension install

MCP Integration: Extend Vibe's capabilities by configuring MCP (Model Context Protocol) servers in config.toml. Supports HTTP, streamable-HTTP, and stdio transports for database access, external APIs, and custom tools.
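
Vibe's settings live in ~/.vibe/config.toml. The sketch below appends a per-tool permission (the [tools.bash] syntax is quoted in the FAQ later in this guide) and a hypothetical MCP server entry; the [[mcp_servers]] table name and its keys are assumptions, so verify them against Mistral's Vibe documentation before copying.

# Sketch: extend ~/.vibe/config.toml (the MCP table name and keys below are assumptions)
cat >> ~/.vibe/config.toml <<'EOF'

# Let the bash tool run without per-command confirmation
[tools.bash]
permission = "always"

# Hypothetical MCP server over the stdio transport
[[mcp_servers]]
name = "postgres"
transport = "stdio"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/dev"]
EOF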

Deployment Options: vLLM vs llama.cpp vs Ollama

Method | Best For | Setup | Performance | Production
Mistral API | Quick start, no hardware | Very Easy | Fast (cloud) | Yes
vLLM (Recommended) | Production deployment | Medium | Fastest local | Yes
llama.cpp | Single-user local | Easy | Good | Development
Ollama | Beginner-friendly local | Very Easy | Good | Development
LM Studio | GUI preference | Very Easy | Moderate | Development

Deployment Commands

# vLLM (Production - Recommended by Mistral)
vllm serve mistralai/Devstral-Small-2-24B-Instruct-2512 \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2

# llama.cpp (Development)
./llama-cli -m devstral-small-2-Q4_K_M.gguf \
  -p "You are a coding expert." \
  -n -1 --ctx-size 8192 -ngl 99 --jinja
# Note: --jinja required for system prompts
# -ngl 99 offloads all layers to GPU

# Ollama (Easiest)
ollama run devstral-small-2

# Requirements:
# - mistral_common >= 1.8.6 for correct tool calls
# - Use official GGUF files from bartowski or Mistral

Local Serving Note: Current llama.cpp/Ollama/LM Studio implementations may have subpar performance compared to vLLM. If you notice issues, report to the relevant framework and consider using Mistral API in the meantime.
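
Since vllm serve exposes an OpenAI-compatible endpoint (http://localhost:8000/v1 by default), any OpenAI-style client can talk to the locally served model. A quick check against the vLLM command above; the port and prompt are assumptions.

# Query the locally served model through vLLM's OpenAI-compatible API (default port assumed)
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Devstral-Small-2-24B-Instruct-2512",
        "messages": [{"role": "user", "content": "Write a unit test for a binary search function."}],
        "temperature": 0.2
      }'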

Hardware Requirements: From Laptop to Data Center

Hobbyist / Freelancer - Consumer Hardware

Model: Devstral Small 2 (Q4)
GPU: RTX 3090/4090 (24GB)
Mac: M2/M3 Max, 32GB+
Context: 16K-57K tokens
Speed: 15-44 tok/s generation

Startup Team - Mid-Range Hardware

Model: Devstral Small 2 (Q8)
GPU: RTX 5090 (32GB) or 2x RTX 4090
Mac: Mac Studio M3 Ultra, 64GB
Context: 64K-120K tokens
Speed: 25-60 tok/s generation

Enterprise Production - Data Center Hardware

Model: Devstral 2 (123B)
GPU: 4x H100 80GB
VRAM: 320GB total
Context: Full 256K tokens
Use Case: Team serving, maximum quality

Performance Benchmarks - Real-World Speed

RTX 4090 prompt processing: 1,296 tok/s
RTX 4090 generation: 44 tok/s
Mac M3 Max generation: ~15-20 tok/s
Qwen 2.5 Coder 32B (same hardware): 826 / 26 tok/s (slower)
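
To translate throughput into wall-clock feel, here is a quick calculation; the response length is an assumed example.

# What those generation speeds feel like in practice (response length is an assumption)
awk 'BEGIN {
  tokens = 600;                                   # assumed length of a generated patch
  printf "RTX 4090:   ~%.0f s\n", tokens / 44;
  printf "Mac M3 Max: ~%.0f s\n", tokens / 17;    # midpoint of the 15-20 tok/s range
}'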

When NOT to Use Devstral: Honest Guidance

Don't Use Devstral For

  • Architectural Decisions - Claude provides nuanced tradeoff analysis; Devstral gives generic advice
  • Front-End Development - Limited UI/animation capabilities; use specialized tools
  • Novel Algorithms - Creative problem-solving beyond pattern matching favors proprietary models
  • Terminal-Heavy Tasks - Terminal Bench shows 22.5% vs Claude's 42.8%
  • Security-Critical Code - 5% quality gap matters; extra review recommended

When Human Expertise Wins

  • System Architecture - Understanding business context and real-world tradeoffs
  • Code Review - Catching subtle issues, mentoring junior developers
  • Security Audits - Threat modeling, compliance requirements
  • Performance Optimization - Understanding production constraints
  • Technical Leadership - Making build-vs-buy decisions

Common Mistakes to Avoid

Mistake #1: Starting with the 123B Model

The Error: Developers try the largest model assuming "bigger = better."

The Impact: Massive hardware requirements (4x H100), slower iteration, unnecessary cost, and potential licensing complications ($20M threshold).

The Fix: Start with Devstral Small 2 (24B)—it's sufficient for 90% of coding tasks and runs on consumer hardware.

Mistake #2: Ignoring Quantization Benefits

The Error: Running full precision (FP16/FP32) models when unnecessary.

The Impact: 2-3x higher memory usage (25GB vs 14GB), slower inference, can't fit in consumer GPU VRAM.

The Fix: Use Q4_K_M quantization—delivers 95%+ quality at 40% memory. Q4 fits in 24GB VRAM with 57K context.
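
A quick way to sanity-check the memory math: weights take roughly parameters times bytes-per-weight, and whatever is left on the card goes to the KV cache. The 4.5-bits-per-weight figure below is an approximation for Q4_K_M.

# Back-of-envelope check that a Q4 24B model fits in 24 GB (approximate)
awk 'BEGIN {
  params   = 24e9;    # 24B parameters
  q4_bytes = 0.56;    # ~4.5 bits per weight for Q4_K_M (approximation)
  printf "Q4_K_M weights: ~%.1f GB of a 24 GB card\n", params * q4_bytes / 1e9;
}'
# The remaining VRAM holds the KV cache and activations, which is why
# context tops out around 57K tokens on an RTX 4090.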

Mistake #3: Misunderstanding the Modified MIT License

The Error: Using 123B model in large enterprise without checking license terms.

The Impact: Companies exceeding $20M in monthly revenue cannot use it (including derivatives) without a commercial license.

The Fix: Use Apache 2.0-licensed Devstral Small 2 for unrestricted commercial use, or obtain commercial license from Mistral.

Mistake #4: Loading Entire Codebase into Context

The Error: Assuming "more context = better results" and loading all files.

The Impact: Increased latency, higher API costs, may actually confuse the model with irrelevant context.

The Fix: Use Vibe CLI's semantic search (@ autocomplete) to load only relevant files. Let the tool manage context intelligently.

Mistake #5: Expecting Identical Results Across Frameworks

The Error: Assuming llama.cpp, Ollama, and vLLM produce identical outputs.

The Impact: Subpar performance, inconsistent results, frustration with local deployment.

The Fix: Use vLLM for production (recommended by Mistral). Report framework issues to maintainers. Use official GGUF files.

Enterprise Licensing Warning: The Devstral 2 (123B) Modified MIT license prohibits use by companies with over $20 million in monthly revenue without a commercial license. This applies to the base model, fine-tuned versions, and derivatives. For unrestricted commercial use at any scale, use Devstral Small 2 (Apache 2.0).

Conclusion

Devstral 2 and Mistral Vibe CLI represent a significant milestone for open-weight AI coding tools. The 72.2% SWE-bench score proves that open models can compete with proprietary solutions on core coding tasks, while the 7x cost advantage over Claude and completely free API access through December 2025 make evaluation compelling. Devstral Small 2's Apache 2.0 license removes all commercial use barriers—local, unlimited, and surprisingly capable.

The competitive landscape has shifted. Organizations can no longer assume that effective AI coding assistance requires sending code to third-party servers or paying per-token API fees. For privacy-conscious teams, budget-constrained startups, or developers who simply want unlimited local AI assistance, Devstral delivers genuine value. The hybrid strategy—Devstral for volume tasks, Claude for complex reasoning—offers the best of both worlds.

Frequently Asked Questions

What is Devstral 2 and how does it compare to proprietary coding models?

Devstral 2, released December 9, 2025, is Mistral AI's open-weight coding model designed for autonomous software engineering. The flagship model (123B parameters) achieves 72.2% on SWE-bench Verified, compared to Claude Sonnet 4.5's 77.2% and Claude Opus 4.5's 80.9%. In head-to-head testing, Devstral beats DeepSeek V3.2 42.8% vs 28.6%, but Claude Sonnet is preferred 53.1% of the time. The key advantage: Devstral is 7x cheaper per token and can run locally for zero marginal cost.

What are the hardware requirements for running Devstral 2 locally?

Devstral Small 2 (24B) runs on consumer hardware: RTX 4090 with 24GB VRAM achieves ~44 tokens/sec generation with Q4_K_M quantization (14-16GB). Mac M3 Max with 32GB unified memory achieves ~15-20 tok/sec. Full 123B model requires 4x H100 GPUs (production) or 256GB RAM (slow CPU inference). Context window scales with VRAM: RTX 4090 supports 57K tokens, RTX 5090 (32GB) supports 120K tokens.

What is Mistral Vibe CLI and how does it work?

Mistral Vibe CLI is a terminal-based AI coding assistant powered by Devstral models. It provides file operations (read_file, write_file, search_replace), bash execution in stateful terminal, grep search with ripgrep support, and todo task management. Install via curl -LsSf https://mistral.ai/vibe/install.sh | bash. Configuration stored at ~/.vibe/config.toml. Supports MCP (Model Context Protocol) for extending capabilities.

How do I install and set up Mistral Vibe CLI?

Install: curl -LsSf https://mistral.ai/vibe/install.sh | bash (requires Python 3.12+). First run creates ~/.vibe/config.toml and prompts for API key (stored in ~/.vibe/.env). Set MISTRAL_API_KEY environment variable or let Vibe prompt you. Basic usage: vibe for interactive chat, vibe --prompt 'task' for non-interactive, !command for direct shell execution, @filename for file references.

What are the licensing implications of using Devstral 2 commercially?

Devstral Small 2 (24B) uses the Apache 2.0 license—fully permissive for any commercial use without restrictions. Devstral 2 (123B) uses a Modified MIT license with a critical limitation: companies with over $20 million in monthly revenue cannot use it (including derivatives) without a separate commercial license from Mistral. For enterprise use without licensing concerns, use Small 2 or obtain a commercial license.

How does Devstral 2 perform on real-world development tasks beyond SWE-bench?

On HumanEval+ (code generation), Devstral 2 scores 89.7% vs GPT-4's 87.1%. On MBPP+ (Python problems), 78.4% vs 76.2%. However, Terminal Bench 2 reveals limitations: Devstral scores 22.5% vs Claude's 42.8% for command-line tasks. Main gaps: architectural reasoning (generic vs nuanced advice), explanation quality, and novel algorithm design. Best for: bug fixes, test generation, refactoring. Use Claude for: architecture decisions, security code, complex reasoning.

What is the API pricing for Devstral 2 after the free period?

Starting January 2026: Devstral 2 (123B) costs $0.40 per million input tokens, $2.00 per million output tokens. Devstral Small 2 (24B) costs $0.10/$0.30. For comparison: Claude Sonnet 4.5 is $3/$15 (7.5x more expensive), Claude Opus 4.5 is $15/$75. Free API access continues through December 2025—excellent time to evaluate before pricing begins.

How do I deploy Devstral locally with vLLM vs llama.cpp vs Ollama?

vLLM (production recommended): vllm serve mistralai/Devstral-Small-2-24B-Instruct-2512 --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2. llama.cpp (development): ./llama-cli -m devstral-small-2-Q4_K_M.gguf -n -1 --ctx-size 8192 -ngl 99 --jinja (--jinja required for system prompts). Ollama (easiest): ollama run devstral-small-2. Note: local serving may have subpar performance—report issues to frameworks.

Can Devstral Small 2 process images (vision/multimodal)?

Yes, Devstral Small 2 (24B) supports image inputs and can power multimodal agents. This enables analyzing screenshots, processing diagrams, and understanding visual context in codebases. The architecture is based on Ministral 3, which includes vision capabilities. Both 24B and 123B models support multimodal inputs.
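
As a rough illustration, a vision request looks like a normal chat completion with an image content part. This sketch assumes Devstral Small 2 accepts the same image_url content format Mistral uses for its other vision models; the model ID and image URL are placeholders.

# Sketch: multimodal request (image_url format and model ID are assumptions)
curl -s https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "devstral-small-2",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "What does this stack trace suggest is failing?"},
            {"type": "image_url", "image_url": "https://example.com/stacktrace.png"}
          ]
        }]
      }'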

What is the $20 million revenue threshold for the 123B model?

The 123B model's Modified MIT license restricts use by companies exceeding $20 million in monthly revenue—they must obtain a commercial license from Mistral. This applies to the base model, fine-tuned versions, and derivatives regardless of who hosts them. For unrestricted commercial use at any scale, use Devstral Small 2 (24B) which has Apache 2.0 licensing with no revenue restrictions.

How does Devstral 2 compare to DeepSeek V3.2 and Qwen Coder?

Devstral 2 outperforms DeepSeek V3.2 in head-to-head testing: 42.8% win rate vs 28.6% loss rate. Devstral is 5x smaller than DeepSeek V3.2 (671B) and 8x smaller than Kimi K2 (984B) while achieving competitive performance. Devstral Small 2 outperforms Qwen 2.5 Coder 32B on the same hardware: 1296 tok/s vs 826 tok/s prompt processing, 44 tok/s vs 26 tok/s generation.

What IDE integrations are available for Mistral Vibe CLI?

Vibe CLI integrates with: Zed (native extension, fastest setup), Kilo Code (VS Code compatible, advanced features), and Cline (VS Code extension, familiar interface). All support the Agent Client Protocol (ACP) for multi-file operations. Terminal remains the most flexible option for CLI enthusiasts. Configure MCP servers in config.toml for custom tool integration.

What is Terminal Bench 2 and why does Claude score higher?

Terminal Bench 2 measures command-line interaction capabilities—complex shell operations, system administration tasks, and CLI-heavy workflows. Claude Sonnet scores 42.8% while Devstral achieves 22.5%. This gap indicates Claude handles terminal-centric tasks better. For system administration scripts, complex git operations, or DevOps automation, consider routing those tasks to Claude while using Devstral for standard coding.

How do I configure MCP servers with Mistral Vibe CLI?

Vibe supports MCP (Model Context Protocol) servers via config.toml at ~/.vibe/config.toml. Configure HTTP, streamable-HTTP, or stdio transports to extend capabilities with database access, external APIs, or custom tools. Tool permissions are configurable: [tools.bash] permission = 'always' or disable specific tools with disabled_tools = ['search_replace', 'write_file']. See Mistral documentation for detailed server configuration.
