DEV Community

David

how to run qwen3.6-27b locally — the dense 27B that beats the 35B MoE on coding

Alibaba just dropped Qwen3.6-27B, a 27-billion parameter dense model that scores 77.2% on SWE-bench Verified. That's higher than Qwen3.6-35B-A3B (73.4%) — the MoE version everyone was talking about last week.

I've been building Locally Uncensored, a desktop AI app, and just added Qwen3.6-27B support.

install with ollama

If you already have Ollama set up, it's a one-liner:

ollama run qwen3.6-27b

That's it. If you want a specific quantization:

ollama run qwen3.6-27b:q4_K_M   # 16GB RAM recommended
ollama run qwen3.6-27b:q8_0     # 27GB RAM recommended
ollama run qwen3.6-27b:fp8      # needs ~27GB VRAM (FP8)

Note: if ollama run qwen3.6-27b returns "model not found", give it a minute — Ollama's library updates periodically. You can also pull manually with ollama pull qwen3.6-27b.
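If you'd rather script against the model than use the interactive REPL, Ollama also exposes a local HTTP API on port 11434. Here's a minimal Python sketch using only the standard library (the model tag is the one pulled above; adjust if you pulled a specific quantization):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False makes Ollama return one JSON object instead of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call it with something like `ask("qwen3.6-27b", "Explain list comprehensions in one sentence.")` while the Ollama server is running.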

what makes qwen3.6-27b different

The 35B-A3B is a Mixture-of-Experts model: 35B total params but only 3B activated per token. Qwen3.6-27B is a different beast — a dense 27B model with a Gated DeltaNet + Gated Attention hybrid architecture.

Key specs:

  • 27B parameters (all active, no MoE routing)
  • 64 layers, 5120 hidden dimension
  • 262,144 token context natively (extensible to 1,010,000)
  • Vision encoder included (image-text-to-text)
  • Apache 2.0 license

The Gated DeltaNet architecture processes tokens through alternating Gated DeltaNet and Gated Attention layers — a hybrid that combines linear-attention efficiency with gated selective attention. It's a different design philosophy from both vanilla transformers and the 35B MoE.
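To make the "alternating layers" idea concrete, here's a toy sketch of what a layer schedule could look like. The exact interleaving ratio isn't published in detail, so the one-attention-layer-every-four-layers ratio below is a hypothetical assumption, not a spec:

```python
def layer_schedule(num_layers: int = 64, attention_every: int = 4) -> list:
    # Hypothetical ratio: one Gated Attention layer per `attention_every` layers,
    # the rest Gated DeltaNet. 64 layers matches the spec list above; the 4:1
    # interleaving is an illustrative assumption.
    return [
        "gated_attention" if (i + 1) % attention_every == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]
```

Under that assumption, a 64-layer stack would contain 16 Gated Attention layers and 48 Gated DeltaNet layers, which is where the linear-attention efficiency comes from: most layers avoid full quadratic attention.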

benchmark table

| Benchmark | Qwen3.6-27B | Qwen3.6-35B-A3B | Gemma4-31B |
|---|---|---|---|
| SWE-bench Verified | 77.2 | 73.4 | 52.0 |
| SWE-bench Pro | 53.5 | 49.5 | 35.7 |
| Terminal-Bench 2.0 | 59.3 | 51.5 | 42.9 |
| SkillsBench Avg5 | 48.2 | 28.7 | 23.6 |
| MMLU-Pro | 86.2 | 85.2 | 85.2 |
| LiveCodeBench v6 | 83.9 | 80.4 | 80.0 |
| AIME 2026 | 94.1 | 92.7 | 89.2 |

All numbers from the official Qwen3.6-27B model card.

The 27B dense model pulls ahead of the 35B MoE on agentic coding tasks — SWE-bench, Terminal-Bench, SkillsBench. The gap is especially wide on SkillsBench (48.2 vs 28.7), which tests real-world dev skills.
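To put those gaps in one place, here's a quick script that computes the percentage-point deltas of the dense 27B over the 35B MoE from the coding rows of the table above:

```python
# Scores copied from the benchmark table above (percentage points):
# (Qwen3.6-27B, Qwen3.6-35B-A3B)
scores = {
    "SWE-bench Verified": (77.2, 73.4),
    "SWE-bench Pro": (53.5, 49.5),
    "Terminal-Bench 2.0": (59.3, 51.5),
    "SkillsBench Avg5": (48.2, 28.7),
}

# Delta of the dense 27B over the 35B MoE, per benchmark.
deltas = {name: round(dense - moe, 1) for name, (dense, moe) in scores.items()}

for name, d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: +{d} pts")
```

SkillsBench dominates at +19.5 points; the SWE-bench deltas are smaller but consistent.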

vram requirements

Qwen3.6-27B is a dense model, so all 27B parameters stay in memory:

| Quantization | VRAM (approx.) | Recommended GPU |
|---|---|---|
| Q2_K | 10-11 GB | RTX 3060, RTX 4060 |
| Q4_K_M | 16-17 GB | RTX 4070, RTX 3080 |
| Q8_0 | 27-28 GB | RTX 4090, A5000 |
| FP8 | 27 GB | RTX 4090, H100 |
| FP16 | 54 GB | dual GPU or professional card |

Note: these are for the base model only. With the vision encoder + KV cache for long context, add 2-4 GB overhead.
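These figures follow a simple rule of thumb: weight memory ≈ parameters × bits-per-weight ÷ 8, plus overhead for the KV cache and vision encoder. Here's a rough estimator — the bits-per-weight values are approximations I'm using for illustration (k-quants mix bit widths across tensors, so exact GGUF sizes differ):

```python
# Approximate effective bits per weight for common quantizations.
# These are rules of thumb, not exact GGUF file sizes.
BITS_PER_WEIGHT = {"q2_k": 2.6, "q4_k_m": 4.5, "q8_0": 8.5, "fp8": 8.0, "fp16": 16.0}

def vram_estimate_gb(params_billions: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Weight memory = params * bits / 8, plus KV-cache / vision-encoder overhead."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

print(vram_estimate_gb(27, "q4_k_m"))  # roughly in line with the 16-17 GB row above
```

The estimate lands within a couple of GB of each table row, which is about as precise as VRAM planning gets before you factor in context length.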

why not just use the 35B MoE?

The 35B-A3B activates fewer params per token, which means faster inference and lower memory during generation. But if you're doing agentic coding with longer context windows, the dense 27B is showing real advantages on benchmark tasks that require deep repository reasoning.

The 35B MoE also requires more total disk space (the full expert bank is still loaded even if only 3B activate per token) and the routing decisions can introduce variability.

try it with locally uncensored

I've been building Locally Uncensored — a cross-platform desktop app that lets you run Qwen3.6-27B (and other models) with uncensored outputs, image understanding, and a built-in code agent.

Features:

  • One-click model setup via Ollama
  • Image + text input
  • Built-in code agent mode
  • Chat history and export
  • No cloud, no data leaving your machine

# clone and run
git clone https://github.com/PurpleDoubleD/locally-uncensored
cd locally-uncensored && npm install && npm run tauri dev

Check the GitHub releases for pre-built binaries.


What GPU are you running? And have you tried the 27B vs the 35B MoE side-by-side? Drop a comment with your setup.

Locally Uncensored — AGPL-3.0 license.
