DEV Community

David

how to run qwen3.6-27b locally — the dense 27B that beats the 35B MoE on coding

Alibaba just dropped Qwen3.6-27B, a 27-billion parameter dense model that scores 77.2% on SWE-bench Verified. That's higher than Qwen3.6-35B-A3B (73.4%) — the MoE version everyone was talking about last week.

I've been building Locally Uncensored, a desktop AI app, and just added Qwen3.6-27B support.

install with ollama

If you already have Ollama set up, it's a one-liner:

ollama run qwen3.6-27b

That's it. If you want a specific quantization:

ollama run qwen3.6-27b:q4_K_M   # 16GB RAM recommended
ollama run qwen3.6-27b:q8_0     # 27GB RAM recommended
ollama run qwen3.6-27b:fp8      # needs ~27GB VRAM (FP8)

Note: if ollama run qwen3.6-27b returns "model not found", give it a minute — Ollama's library updates periodically. You can also pull manually with ollama pull qwen3.6-27b.
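If you'd rather script against the model than use the interactive REPL, Ollama also exposes a local HTTP API on port 11434. Here's a minimal Python sketch using only the standard library (the model tag is the one pulled above; adjust if you pulled a specific quantization):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False makes Ollama return one JSON object instead of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call it with something like `ask("qwen3.6-27b", "Explain list comprehensions in one sentence.")` while the Ollama server is running.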

what makes qwen3.6-27b different

The 35B-A3B is a Mixture-of-Experts model: 35B total params but only 3B activated per token. Qwen3.6-27B is a different beast — a dense 27B model with a Gated DeltaNet + Gated Attention hybrid architecture.

Key specs:

  • 27B parameters (all active, no MoE routing)
  • 64 layers, 5120 hidden dimension
  • 262,144 token context natively (extensible to 1,010,000)
  • Vision encoder included (image-text-to-text)
  • Apache 2.0 license

The Gated DeltaNet architecture processes tokens through alternating Gated DeltaNet and Gated Attention layers — a hybrid that combines linear-attention efficiency with gated selective attention. It's a different design philosophy from both vanilla transformers and the 35B MoE.
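To make the "alternating layers" idea concrete, here's a toy sketch of what a layer schedule could look like. The exact interleaving ratio isn't published in detail, so the one-attention-layer-every-four-layers ratio below is a hypothetical assumption, not a spec:

```python
def layer_schedule(num_layers: int = 64, attention_every: int = 4) -> list:
    # Hypothetical ratio: one Gated Attention layer per `attention_every` layers,
    # the rest Gated DeltaNet. 64 layers matches the spec list above; the 4:1
    # interleaving is an illustrative assumption.
    return [
        "gated_attention" if (i + 1) % attention_every == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]
```

Under that assumption, a 64-layer stack would contain 16 Gated Attention layers and 48 Gated DeltaNet layers, which is where the linear-attention efficiency comes from: most layers avoid full quadratic attention.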

benchmark table

| Benchmark | Qwen3.6-27B | Qwen3.6-35B-A3B | Gemma4-31B |
|---|---|---|---|
| SWE-bench Verified | 77.2 | 73.4 | 52.0 |
| SWE-bench Pro | 53.5 | 49.5 | 35.7 |
| Terminal-Bench 2.0 | 59.3 | 51.5 | 42.9 |
| SkillsBench Avg5 | 48.2 | 28.7 | 23.6 |
| MMLU-Pro | 86.2 | 85.2 | 85.2 |
| LiveCodeBench v6 | 83.9 | 80.4 | 80.0 |
| AIME 2026 | 94.1 | 92.7 | 89.2 |

All numbers from the official Qwen3.6-27B model card.

The 27B dense model pulls ahead of the 35B MoE on agentic coding tasks — SWE-bench, Terminal-Bench, SkillsBench. The gap is especially wide on SkillsBench (48.2 vs 28.7), which tests real-world dev skills.
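To put those gaps in one place, here's a quick script that computes the percentage-point deltas of the dense 27B over the 35B MoE from the coding rows of the table above:

```python
# Scores copied from the benchmark table above (percentage points):
# (Qwen3.6-27B, Qwen3.6-35B-A3B)
scores = {
    "SWE-bench Verified": (77.2, 73.4),
    "SWE-bench Pro": (53.5, 49.5),
    "Terminal-Bench 2.0": (59.3, 51.5),
    "SkillsBench Avg5": (48.2, 28.7),
}

# Delta of the dense 27B over the 35B MoE, per benchmark.
deltas = {name: round(dense - moe, 1) for name, (dense, moe) in scores.items()}

for name, d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: +{d} pts")
```

SkillsBench dominates at +19.5 points; the SWE-bench deltas are smaller but consistent.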

vram requirements

Qwen3.6-27B is a dense model, so all 27B parameters stay in memory:

| Quantization | VRAM (approx.) | Recommended GPU |
|---|---|---|
| Q2_K | 10-11 GB | RTX 3060, RTX 4060 |
| Q4_K_M | 16-17 GB | RTX 4070, RTX 3080 |
| Q8_0 | 27-28 GB | RTX 4090, A5000 |
| FP8 | 27 GB | RTX 4090, H100 |
| FP16 | 54 GB | dual GPU or professional card |

Note: these are for the base model only. With the vision encoder + KV cache for long context, add 2-4 GB overhead.
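These figures follow a simple rule of thumb: weight memory ≈ parameters × bits-per-weight ÷ 8, plus overhead for the KV cache and vision encoder. Here's a rough estimator — the bits-per-weight values are approximations I'm using for illustration (k-quants mix bit widths across tensors, so exact GGUF sizes differ):

```python
# Approximate effective bits per weight for common quantizations.
# These are rules of thumb, not exact GGUF file sizes.
BITS_PER_WEIGHT = {"q2_k": 2.6, "q4_k_m": 4.5, "q8_0": 8.5, "fp8": 8.0, "fp16": 16.0}

def vram_estimate_gb(params_billions: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Weight memory = params * bits / 8, plus KV-cache / vision-encoder overhead."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

print(vram_estimate_gb(27, "q4_k_m"))  # roughly in line with the 16-17 GB row above
```

The estimate lands within a couple of GB of each table row, which is about as precise as VRAM planning gets before you factor in context length.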

why not just use the 35B MoE?

The 35B-A3B activates fewer params per token, which means faster inference and lower memory during generation. But if you're doing agentic coding with longer context windows, the dense 27B is showing real advantages on benchmark tasks that require deep repository reasoning.

The 35B MoE also requires more total disk space (the full expert bank is still loaded even if only 3B activate per token) and the routing decisions can introduce variability.

try it with locally uncensored

I've been building Locally Uncensored — a cross-platform desktop app that lets you run Qwen3.6-27B (and other models) with uncensored outputs, image understanding, and a built-in code agent.

Features:

  • One-click model setup via Ollama
  • Image + text input
  • Built-in code agent mode
  • Chat history and export
  • No cloud, no data leaving your machine

# clone and run
git clone https://github.com/PurpleDoubleD/locally-uncensored
cd locally-uncensored && npm install && npm run tauri dev

Check the GitHub releases for pre-built binaries.


What GPU are you running? And have you tried the 27B vs the 35B MoE side-by-side? Drop a comment with your setup.

Locally Uncensored — AGPL-3.0 license.
