JetBrains just open-sourced the missing piece of self-hosted AI pipelines

#ai #opensource #llm #programming

JetBrains just open-sourced Mellum2 — a 12B-parameter coding model built for the infrastructure layer of agentic AI systems. It's available under Apache 2.0 from day one, runs entirely on hardware you control, and is explicitly designed for the deployment scenarios where Claude Code and OpenAI Codex can't go: air-gapped environments, compliance-sensitive orgs, and teams that don't want to route every inference call through an external API.

"Frontier models will continue to push the limits, but practical AI products also require focal models: fast, specialized components that handle high-frequency tasks efficiently."

That's JetBrains positioning Mellum2 not as a challenger to frontier models but as a specialist: fast, lean, and aimed squarely at software engineering workflows.

What actually changed

Mellum (the original) was a 4B-parameter model that did one thing: code completion inside JetBrains IDEs. It launched as a proprietary model in late 2024 and was open-sourced in April 2025.

Mellum2 is a different animal. It's built for the broader set of tasks that now define how engineering teams ship AI: coordinating between models, handling sub-agent workloads, compressing context in retrieval pipelines. JetBrains calls it a "focal model" — not trying to beat GPT-4o on breadth, but winning on the high-frequency tasks that matter in production.

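The architecture is Mixture-of-Experts (MoE): 12B total parameters, but only 2.5B active per token, because a router sends each token through a small subset of its 64 experts. Per-token compute tracks the active parameters rather than the total, which is why the headline numbers are interesting: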

Single-request throughput: matches Qwen2.5-7B (192 vs 193 tokens/sec on one H100)
Throughput under concurrent load: 21% higher than Qwen2.5-7B and 79% higher than Qwen3-8B
EvalPlus (thinking variant): 78.4%, ahead of Qwen3.5-9B (71.8%) and Seed-Coder-8B (73.8%)
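To make the "12B total, 2.5B active" split concrete, here is a minimal toy sketch of top-k expert routing in plain Python, assuming standard top-k gating. It illustrates the general MoE technique, not Mellum2's actual implementation: only the expert count (64) comes from the source, while `TOP_K`, `HIDDEN`, and the linear "experts" are hypothetical stand-ins. The router scores every expert for each token, but only the top-k experts actually run, so per-token compute follows the active parameters.

```python
import math
import random

NUM_EXPERTS = 64  # expert count reported for Mellum2
TOP_K = 2         # hypothetical: the source does not say how many experts fire per token
HIDDEN = 8        # toy hidden size, far smaller than any real model


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed):
    """Toy expert: a fixed random linear map (real experts are feed-forward blocks)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_layer(x, router_weights, experts, k=TOP_K):
    """Route one token vector x to its top-k experts and mix their outputs."""
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the k highest-scoring experts; the others never run for this token.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the selected logits into gate weights that sum to 1.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


rng = random.Random(0)
router = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(s) for s in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(HIDDEN)]

output, used = moe_layer(token, router, experts)
print(f"experts used for this token: {used} ({len(used)} of {NUM_EXPERTS})")
# Reported parameter split: about 2.5B active out of 12B total.
print(f"active share of parameters: {2.5 / 12:.1%}")
```

In a real MoE model the active count also includes shared layers such as attention and embeddings, so the reported 2.5B/12B ratio (about 20.8%) should not be read as k/64.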

Two variants ship alongside the base model: an instruct version for direct answers, and a thinking version that produces an explicit reasoning trace for harder multi-step and agentic tasks. The tradeoff is real, though: Qwen3.5-9B still leads on broader reasoning benchmarks (GPQA Diamond, MMLU-Redux). JetBrains acknowledges it: "The gap reflects a deliberate tradeoff in our training mix toward code and developer documentation rather than broad encyclopedic coverage."

The dependency argument

This is the real story. Claude Code runs locally but sends its inference requests to Anthropic. OpenAI Codex does the same with OpenAI. Cursor's capabilities are tied to its hosted platform, and its xAI partnership adds another layer of external control. Every one of these tools hands inference to someone else's infrastructure.

Mellum2 doesn't have to. It ships with open weights under Apache 2.0 and is fully self-hostable. For teams in regulated industries or air-gapped environments, and for anyone doing serious cost modeling at scale, that's not a minor footnote; it's the whole point.

JetBrains is making a bet: as AI embeds deeper into engineering workflows, deployment flexibility and operational control will matter more, not less.

What to do

If you're evaluating AI tooling for a compliance-sensitive environment, Mellum2 is now a credible option worth a benchmark run. Grab the weights on Hugging Face.
If you're building agentic pipelines, the MoE throughput advantage under load makes it worth testing as a routing or sub-agent model.
If you're on the frontier-model-only path, keep an eye on how the thinking variant matures. Its EvalPlus numbers are already competitive for code-focused tasks.
If you run JetBrains IDEs, this is coming to your toolchain anyway. Understanding the architecture will help you configure it well.
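The routing idea in the agentic-pipelines item can be sketched as a small dispatcher: high-frequency task types go to a self-hosted focal model, everything else to a frontier model. This is a minimal illustration under stated assumptions, not a JetBrains API. `ModelRouter`, the task names in `FOCAL_TASKS`, and the two stub functions are all hypothetical; in practice the stubs would wrap whatever inference client your serving stack exposes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical task taxonomy: context compression and sub-agent chores are the kind of
# high-frequency work the article assigns to focal models; the exact names are illustrative.
FOCAL_TASKS = frozenset({"compress_context", "classify_intent", "summarize_diff"})


@dataclass
class ModelRouter:
    """Send high-frequency task types to a local focal model, everything else to a frontier model."""

    focal: Callable[[str], str]     # e.g. a client for self-hosted weights (assumption)
    frontier: Callable[[str], str]  # e.g. a client for an external API (assumption)
    calls: Dict[str, int] = field(default_factory=lambda: {"focal": 0, "frontier": 0})

    def run(self, task: str, prompt: str) -> str:
        # Route by task type and count calls so the traffic split can be measured.
        if task in FOCAL_TASKS:
            self.calls["focal"] += 1
            return self.focal(prompt)
        self.calls["frontier"] += 1
        return self.frontier(prompt)


def local_stub(prompt: str) -> str:
    # Stand-in for a call to a self-hosted model server (hypothetical).
    return f"[local] {prompt[:20]}"


def frontier_stub(prompt: str) -> str:
    # Stand-in for a call to an external frontier-model API (hypothetical).
    return f"[frontier] {prompt[:20]}"


router = ModelRouter(focal=local_stub, frontier=frontier_stub)
print(router.run("compress_context", "Long retrieval chunk to shrink"))
print(router.run("plan_refactor", "Split the billing module"))
print(router.calls)
```

Keeping the routing rule in one place makes benchmarking straightforward: swap the focal callable between candidate models and compare latency and output quality on the same task mix. For an air-gapped deployment, both callables would point at local models.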

Source: The New Stack

✏️ Drafted with KewBot (AI), edited and approved by Drew.