Dave Kurian

Posted on Jun 2 • Originally published at otf-kit.dev

JetBrains open-sources Mellum2 for private, high-speed AI coding infrastructure

#ai #llm #opensource #coding

Mellum2 open-source coding model: JetBrains’ fast AI alternative to Claude Code that runs anywhere

JetBrains’ Mellum2 open-source coding model combines speed with genuine infrastructure freedom: it is a 12B-parameter Mixture-of-Experts model designed for agentic orchestration where every token stays on your side of the firewall. In a space dominated by third-party API coding tools and restrictive cloud access, Mellum2 is the first mainstream step toward practical, on-premises agentic AI infrastructure. For AI engineers and teams who need control, privacy, and tight data locality, Mellum2 enables use cases that are out of reach for cloud-only offerings such as Claude Code.

What is Mellum2? An overview of JetBrains’ open-source coding model

Mellum2 is JetBrains’ freshly open-sourced, 12-billion-parameter code model built for infrastructure-level agentic AI. Announced in early June 2026, Mellum2 takes aim at a new class of AI engineering requirements: routing actions between sub-agents, managing retrieval contexts, and orchestrating code-based pipelines entirely within an enterprise’s control.

The roots go back to Mellum, a 4B-parameter proprietary coding model that JetBrains first built into its IDEs in late 2024 and then open-sourced in April 2025. Mellum was single-purpose: pure code completion, optimized for a JetBrains-centric workflow. Mellum2 jumps categories. From day one, it is both open and specialized for the jobs that define real AI infrastructure: not just completing code, but coordinating agentic workflows and reasoning across multiple moving parts.

Crucially, the shift isn’t just parameter count. Where Mellum started out proprietary and narrow in scope, Mellum2 is “open from the start,” as JetBrains put it in its own announcement. It launches with public weights, a permissive OSS license, and documentation built for self-serve deployment, making it a direct answer to self-hosted agentic AI ambitions rather than another cloud-bound product.

Main data points:

12B total parameters (Mixture-of-Experts, see below).
Open-sourced in early June 2026 by JetBrains.
Public repo, not limited to JetBrains IDEs.
Focused on infrastructure tasks: agent routing, context compression, sub-agent orchestration.

If Mellum was prototype infrastructure, Mellum2 is built for real pipelines.

How does Mellum2 differ from Anthropic’s Claude Code?

The dividing line is clear: Mellum2 runs wherever you control the stack. Claude Code and its kin are API-first, third-party cloud models. What does this mean in practice?

Parameters and model design: Mellum2 has 12B total parameters in a Mixture-of-Experts arrangement (about 2.5B active per token), tuned for speed and high-frequency infrastructure tasks. Anthropic doesn’t publish parameter counts for the models behind Claude Code, and the product is built around cloud API access and general code understanding rather than sub-agent orchestration.

Deployment model: Mellum2 can be set up fully on-premises or on dedicated VPC infrastructure. There is no required internet-facing component or dependency on JetBrains itself after download. By contrast, Claude Code cannot be self-hosted and always passes your queries over third-party APIs.

Privacy and data governance: Because Mellum2 runs entirely on infrastructure you operate, all input code, context, and outputs remain under your infrastructure team’s control, which is crucial for industries with data residency or privacy requirements. Claude Code is, by design, a black box beyond API boundaries.

Scope of usage: Where Claude Code is positioned as a context-rich coding assistant, Mellum2 targets the infrastructure “glue”: routing, context compression, and orchestrating the work of other agents and models, especially in sensitive, local environments.

In short: Mellum2 is built to work wherever Claude Code isn’t allowed.

What are the practical use cases for Mellum2 in AI development?

Mellum2’s real strength is infrastructure flexibility. Its main use cases reflect this bias:

1. Model and sub-agent coordination:

Instead of brute-forcing everything through a general LLM, engineers wire up Mellum2 to manage the flow—splitting complex jobs between specialized agents, handling state, and routing context based on custom business logic.

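// Example: using Mellum2 to broker tasks between a retrieval agent and a synthesis agent.
// `mellum2` is a hypothetical client object wrapping your local deployment.
const query = "Where is the retry policy configured?"
const customContext = { repo: "my-service" }

// Template literals interpolate with ${...}; a bare {query} would be sent literally.
const routingPrompt = `
[task]
Type: information retrieval
Context: ${query}
Actions: route to subagent A, then summarize via subagent B.
`

const response = mellum2.run(routingPrompt, customContext)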
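
The same brokering idea can be sketched end to end in Python. This is a minimal illustration, assuming the router model replies with a JSON plan listing sub-agent names. The `fake_router_model` function, the plan schema, and the two sub-agents are hypothetical stand-ins and not part of any Mellum2 API.

```python
import json
from typing import Callable, Dict, List

# Hypothetical sub-agents: plain functions standing in for real agents.
def retrieval_agent(payload: str) -> str:
    return f"docs for: {payload}"

def synthesis_agent(payload: str) -> str:
    return f"summary of ({payload})"

AGENTS: Dict[str, Callable[[str], str]] = {
    "retrieve": retrieval_agent,
    "summarize": synthesis_agent,
}

def fake_router_model(prompt: str) -> str:
    """Stand-in for a call to a locally hosted router model (assumption).
    It always returns the same JSON plan so the sketch runs offline."""
    return json.dumps({"steps": ["retrieve", "summarize"]})

def route(query: str, model: Callable[[str], str] = fake_router_model) -> str:
    """Ask the router for a plan, then run each named sub-agent in order."""
    prompt = f"[task]\nType: information retrieval\nContext: {query}\n"
    plan: List[str] = json.loads(model(prompt))["steps"]
    result = query
    for step in plan:
        if step not in AGENTS:
            raise ValueError(f"unknown sub-agent: {step}")
        result = AGENTS[step](result)
    return result

print(route("MoE routing"))  # summary of (docs for: MoE routing)
```

Keeping the plan as data rather than free text means an unknown sub-agent name fails loudly instead of being silently ignored.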

2. Context compression in retrieval pipelines:

With expensive retrieval pipelines, compacting context before it reaches the main model is crucial for both speed and token budget. Mellum2’s focus on rapid inference makes it well suited to pre-filtering and summarization.

python run_mellum2.py \
  --task compress_context \
  --input data/chunked_context.json \
  --output data/short_context.json
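
To make the pre-filtering step concrete, here is a rough Python sketch of budgeted context selection. It scores chunks by keyword overlap, which is only a stand-in for whatever scoring a model would do, and it counts tokens by whitespace splitting, which is an approximation. The function names and sample chunks are illustrative assumptions.

```python
from typing import List

def approx_tokens(text: str) -> int:
    # Crude whitespace token count; a real pipeline would use the model's tokenizer.
    return len(text.split())

def overlap_score(query: str, chunk: str) -> int:
    # Number of distinct query words that also appear in the chunk.
    query_words = set(query.lower().split())
    return len(query_words & set(chunk.lower().split()))

def compress_context(query: str, chunks: List[str], budget: int) -> List[str]:
    """Keep the highest-scoring chunks that fit in `budget` tokens,
    returned in their original order."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: overlap_score(query, chunks[i]),
                    reverse=True)
    kept, used = [], 0
    for i in ranked:
        cost = approx_tokens(chunks[i])
        if used + cost <= budget:
            kept.append(i)
            used += cost
    return [chunks[i] for i in sorted(kept)]

chunks = [
    "the cache layer stores session tokens",
    "deployment uses blue green rollout",
    "session tokens expire after one hour",
]
print(compress_context("when do session tokens expire", chunks, budget=12))
# ['the cache layer stores session tokens', 'session tokens expire after one hour']
```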

3. Private on-premises inference:

Compliance, data residency, and regulatory constraints often bar cloud models entirely. Mellum2 fills the gap, running on private clusters and answering code prompts where API-based assistants are non-starters.

4. Multi-agent workflow management:

Orchestrate complex pipelines (e.g., code review, static analysis, deployment scripting) by letting Mellum2 act as the “glue” between tool outputs and downstream triggers—without ever hitting the public cloud.
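
One way to picture the "glue" role is a loop that feeds each tool result to a decision step and follows the trigger it names. In the sketch below, `decide_next` is a rule-based stand-in for the model's decision, and the tool names and triggers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ToolResult:
    tool: str
    passed: bool
    detail: str

def decide_next(result: ToolResult) -> str:
    """Stand-in for the model's decision (assumption): map a tool result to a trigger."""
    if not result.passed:
        return "notify_author"
    transitions = {"static_analysis": "run_tests", "run_tests": "deploy"}
    return transitions.get(result.tool, "stop")

def run_pipeline(tools: Dict[str, Callable[[], ToolResult]], start: str) -> List[str]:
    """Run tools in the order chosen by decide_next and return the trace of triggers."""
    trace, current = [], start
    while current in tools:
        trace.append(current)
        current = decide_next(tools[current]())
    trace.append(current)  # terminal trigger, e.g. "deploy" or "notify_author"
    return trace

tools = {
    "static_analysis": lambda: ToolResult("static_analysis", True, "no findings"),
    "run_tests": lambda: ToolResult("run_tests", True, "42 passed"),
}
print(run_pipeline(tools, "static_analysis"))
# ['static_analysis', 'run_tests', 'deploy']
```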

Enterprise impact:

In short, Mellum2 lets enterprises finally run agentic AI tooling in environments where Claude Code and co. will never be certified.

How to deploy and use Mellum2 today? Step-by-step guide for developers

Mellum2 is available from JetBrains’ official public repository. A typical deployment—local or cloud-based—can be up in under an hour, assuming adequate hardware for a 12B MoE model.

Requirements:

Hardware capable of hosting a 12B-parameter model (multiple A100s recommended for production; smaller cards work for dev).
Python ≥3.10.
PyTorch ≥2.1 or compatible.
32 GB or more of RAM, plus fast storage for weights.
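
For a rough sense of what these requirements imply, the weight footprint can be estimated from the parameter count. The sketch below uses the 12B total and 2.5B active figures from this article. The precisions listed are common choices, not a statement of which formats the released weights use, and the estimate ignores KV cache and activations.

```python
TOTAL_PARAMS = 12e9    # total parameters, from the article
ACTIVE_PARAMS = 2.5e9  # active parameters per token, from the article

# Bytes per parameter for common precisions (assumed options, not confirmed formats).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_gb(params: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(TOTAL_PARAMS, dtype):.0f} GB")
# fp32: 48 GB
# bf16: 24 GB
# int8: 12 GB
# int4: 6 GB

print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
# active fraction: 20.8%
```

In a Mixture-of-Experts model, all experts typically need to be resident in memory even though only a subset runs per token, so memory planning should use the total count while per-token compute follows the active count.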

1. Download weights

git clone 
cd mellum2
# Download official weights (link in repo readme)
python scripts/download_weights.py --variant base

2. Set up environment

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
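
Before launching anything, it can help to confirm that the environment meets the stated minimums. The following preflight check is a small sketch: the function names are illustrative, and it only uses `sys.version_info`, `torch.__version__`, and `torch.cuda.is_available()`.

```python
import importlib.util
import sys

def parse_version(text: str) -> tuple:
    """Turn '2.1.0+cu118' into (2, 1); ignores patch level and local build tags."""
    major, minor = text.split("+")[0].split(".")[:2]
    return int(major), int(minor)

def preflight(min_python=(3, 10), min_torch=(2, 1)) -> dict:
    """Report whether Python and PyTorch meet the minimum versions."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        report["torch_ok"] = False
        report["cuda"] = False
        return report
    import torch
    report["torch_ok"] = parse_version(torch.__version__) >= min_torch
    report["cuda"] = torch.cuda.is_available()
    return report

print(preflight())
```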

3. Run the model

To launch Mellum2 on a single-GPU node for dev/experimentation:

python run.py \
    --model_dir ./weights/mellum2-base \
    --mode instruct

Both “instruct” and “thinking” variants are available. The “instruct” mode answers directly, whereas the “thinking” option produces a reasoning trace before an answer.
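
If you consume the “thinking” variant programmatically, you will usually want to separate the reasoning trace from the final answer. The sketch below assumes the trace is wrapped in `<think>...</think>` tags, a convention used by several open reasoning models. The article does not state Mellum2’s actual trace format, so treat the delimiter as an assumption and check the model card.

```python
import re
from typing import Tuple

# Assumed delimiter; the real trace format may differ.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_trace(output: str) -> Tuple[str, str]:
    """Return (reasoning, answer). Reasoning is empty for direct, instruct-style output."""
    match = THINK_PATTERN.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer

print(split_trace("<think>needs retrieval first</think>route: subagent A"))
# ('needs retrieval first', 'route: subagent A')
print(split_trace("route: subagent A"))
# ('', 'route: subagent A')
```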

4. Integrate with JetBrains IDEs (or your workflow)

JetBrains IDEs: Point the in-IDE AI assistant to your local Mellum2 endpoint using the built-in AI provider configuration UI.
Custom tools: Expose Mellum2 as a REST endpoint or implement it as a sub-agent in your workflow (see provided Python wrappers).
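
As a rough idea of what a REST wrapper could look like, here is a sketch using only Python’s standard library. The `generate` function is a hypothetical stand-in for your local inference call, and the `/`-agnostic POST handler, the `prompt` and `completion` field names, and port 8080 are all assumptions rather than anything defined by the Mellum2 repo.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

def generate(prompt: str) -> str:
    """Hypothetical stand-in for local model inference; replace with your own call."""
    return f"echo: {prompt}"

def handle_body(body: bytes) -> Tuple[int, bytes]:
    """Pure request logic: validate the JSON payload and build the response."""
    try:
        prompt = json.loads(body)["prompt"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON with a 'prompt' field"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"completion": generate(prompt)}).encode()

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    # Binds to localhost by default so the endpoint stays on the machine.
    HTTPServer((host, port), Handler).serve_forever()

# Exercise the request logic without opening a socket.
print(handle_body(b'{"prompt": "hi"}'))
```

Calling `serve()` starts the blocking server. Keeping the validation in `handle_body` makes the logic testable without network access.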

5. Extend as needed

Mellum2’s open repo enables:

Custom post-training or fine-tuning for your codebase/domain.
Experimentation with context pipelines and agent routing.
Building adapters for other code and agent infrastructure.

There are no phone-home, callout, or cloud dependencies by default. Once installed, Mellum2 is yours—no caveats.

What makes Mellum2’s architecture and performance stand out?

Mellum2 isn’t just scaling up parameter count. Its Mixture-of-Experts (MoE) architecture means that while the weights total 12B parameters, only 2.5B are active per token. Each token is processed by a routed subset of the model’s 64 experts, rather than by the full dense network.
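
The routing idea can be illustrated with a toy top-k gate in plain Python. This is a generic sketch of how MoE routers commonly work, not a description of Mellum2’s actual router: the value of `TOP_K`, the softmax gate, and the renormalization step are assumptions, and the logits are random placeholders.

```python
import math
import random
from typing import List, Tuple

NUM_EXPERTS = 64  # expert count as read from the article
TOP_K = 4         # assumption: the article does not say how many experts fire per token

def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)
print(chosen)  # list of (expert index, gate weight) pairs
```

Only the chosen experts’ feed-forward blocks would run for that token, which is why the active parameter count is a fraction of the total.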

What does this actually yield?

Speed: By limiting the number of active parameters, Mellum2 attains fast inference even on hardware that struggles with fully dense 12B models. It’s built for latency-sensitive infrastructure, not maximal model size.
Specialization: JetBrains’ engineers call Mellum2 a “focal model”: lean and targeted at high-frequency engineering tasks (routing, retrieval, orchestration) rather than general conversation.
Variants: Two variants ship:
- “instruct”: answers directly, for fast agentic commands.
- “thinking”: produces an explicit step-by-step reasoning trace, for multi-step agent flows.
Efficiency: MoE routing makes it more viable to deploy Mellum2 on mid-range enterprise hardware than its parameter count suggests. Per-token compute follows the 2.5B active parameters, although memory for weights generally still follows the 12B total.

As JetBrains frames it: frontier models will keep pushing raw size and generality, but real AI products need smaller, sharper “focal models” to glue agentic systems together practically.

What are the limitations and future prospects for Mellum2?

Mellum2 is not trying to be a general-purpose coding assistant or conversation partner. Its “focal” specialty is a strength for infrastructure, but you can hit limits if you treat it like a ChatGPT replacement.

Limitations:

Out-of-scope for creative suggestions or open-ended coding support.
Not competitive with cloud LLMs for massive context or full-app codegen.
Dependency on community for further post-training/integration (but repo is open from day one).

Mellum2 is already in active use and JetBrains is inviting the open-source and research communities to contribute, fine-tune, and experiment on real workflows. The roadmap (per hints in the announcement) includes variants for more agents, context formats, and even tighter IDE integration.

In the broader AI dev tooling landscape, Mellum2 signals a shift: real, on-prem agentic AI can now be practical, not hype. For industries where owning your entire inference and orchestration stack is a requirement rather than an ideal, this is the first credible open offering.

Closing thoughts

The Mellum2 open-source coding model is a real advance: a lean, fast, locally routable agentic AI built for the messy, tactical infrastructure jobs that general coding assistants cannot touch. By sidestepping API dependencies and cloud restrictions, Mellum2 finally lets engineering teams run and evolve their own agentic AI infrastructure privately—enabling use cases from secure pipeline orchestration to flexible context retrieval that simply cannot be done on Claude Code. For those ready to move AI out of the public cloud and into real, auditable pipelines, Mellum2 is usable and extensible today.