Alibaba just dropped Qwen3.6-27B, a 27-billion-parameter dense model that scores 77.2% on SWE-bench Verified. That's higher than Qwen3.6-35B-A3B (73.4%) — the MoE version everyone was talking about last week.
I've been building Locally Uncensored, a desktop AI app, and we just added Qwen3.6-27B support.
## install with ollama
If you already have Ollama set up, it's a one-liner:
```
ollama run qwen3.6-27b
```
That's it. If you want a specific quantization:
```
ollama run qwen3.6-27b:q4_K_M   # 16 GB RAM recommended
ollama run qwen3.6-27b:q8_0     # 27 GB RAM recommended
ollama run qwen3.6-27b:fp8      # ~27 GB VRAM (FP8)
```
Note: if `ollama run qwen3.6-27b` returns "model not found", give it a minute; Ollama's library updates periodically. You can also pull manually with `ollama pull qwen3.6-27b`.
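Beyond the CLI, Ollama exposes a local REST API (default port 11434), which is how desktop apps integrate the model. Here's a minimal stdlib-only sketch in Python — the `qwen3.6-27b` tag is an assumption, so check `ollama list` for the exact name on your machine:

```python
import json
import urllib.request

# Request payload for Ollama's /api/generate endpoint.
# "stream": False asks for a single JSON object instead of a token stream.
payload = {
    "model": "qwen3.6-27b",  # assumed tag; verify with `ollama list`
    "prompt": "Write a one-line hello world in Python.",
    "stream": False,
}
body = json.dumps(payload).encode("utf-8")

def query_ollama(url: str = "http://localhost:11434/api/generate") -> str:
    """POST the payload to a locally running Ollama server and return the text."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server with the model pulled:
# print(query_ollama())
```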
## what makes qwen3.6-27b different
The 35B-A3B is a Mixture-of-Experts model: 35B total params but only 3B activated per token. Qwen3.6-27B is a different beast — a dense 27B model with a Gated DeltaNet + Gated Attention hybrid architecture.
Key specs:
- 27B parameters (all active, no MoE routing)
- 64 layers, 5120 hidden dimension
- 262,144 token context natively (extensible to 1,010,000)
- Vision encoder included (image-text-to-text)
- Apache 2.0 license
The Gated DeltaNet architecture processes tokens through alternating Gated DeltaNet and Gated Attention layers — a hybrid that combines linear-attention efficiency with gated selective attention. It's a different design philosophy from both vanilla transformers and the 35B MoE.
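The exact interleaving ratio isn't stated above, so treat this as a sketch: hybrids like this commonly insert one full (gated) attention layer every few linear-attention layers. Here's what a 64-layer schedule would look like under an assumed 3:1 ratio — the ratio is the illustrative part, not a confirmed spec:

```python
# Illustrative hybrid layer schedule: every `attn_every`-th layer is
# Gated Attention, the rest are Gated DeltaNet (linear attention).
NUM_LAYERS = 64  # from the model card

def layer_schedule(num_layers: int, attn_every: int = 4) -> list[str]:
    """Return a per-layer type list under the assumed interleaving ratio."""
    return [
        "gated_attention" if (i + 1) % attn_every == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

schedule = layer_schedule(NUM_LAYERS)
print(schedule[:8])  # three DeltaNet layers, one attention layer, repeated
```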
## benchmarks
| Benchmark | Qwen3.6-27B | Qwen3.6-35B-A3B | Gemma4-31B |
|---|---|---|---|
| SWE-bench Verified | 77.2 | 73.4 | 52.0 |
| SWE-bench Pro | 53.5 | 49.5 | 35.7 |
| Terminal-Bench 2.0 | 59.3 | 51.5 | 42.9 |
| SkillsBench Avg5 | 48.2 | 28.7 | 23.6 |
| MMLU-Pro | 86.2 | 85.2 | 85.2 |
| LiveCodeBench v6 | 83.9 | 80.4 | 80.0 |
| AIME 2026 | 94.1 | 92.7 | 89.2 |
All numbers from the official Qwen3.6-27B model card.
The 27B dense model is pulling ahead of the 35B MoE on agentic coding tasks — SWE-bench, Terminal-Bench, SkillsBench. The gap is especially wide on SkillsBench (48.2 vs 28.7), which tests real-world dev skills.
## vram requirements
Qwen3.6-27B is a dense model, so all 27B parameters stay in memory:
| Quantization | VRAM (approx) | Recommended GPU |
|---|---|---|
| Q2_K | 10-11 GB | RTX 3060, RTX 4060 |
| Q4_K_M | 16-17 GB | RTX 4070, RTX 3080 |
| Q8_0 | 27-28 GB | RTX 4090, A5000 |
| FP8 | 27 GB | RTX 4090, H100 |
| FP16 | 54 GB | dual GPU or professional |
Note: these are for the base model only. With the vision encoder + KV cache for long context, add 2-4 GB overhead.
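You can sanity-check the table yourself: weight memory is roughly parameters × bits-per-weight ÷ 8. The bits-per-weight values below are rough community approximations for GGUF quants, not official figures:

```python
# Back-of-the-envelope VRAM estimate: weights ≈ params * bits-per-weight / 8.
# Bits-per-weight values are approximate; overhead for the vision encoder
# and KV cache (2-4 GB, per the note above) comes on top.
PARAMS = 27e9

BITS_PER_WEIGHT = {  # rough averages, including quantization metadata
    "Q2_K": 3.0,
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
    "FP8": 8.0,
    "FP16": 16.0,
}

def weight_gb(params: float, bpw: float) -> float:
    """Approximate weight memory in decimal gigabytes."""
    return params * bpw / 8 / 1e9

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:>7}: ~{weight_gb(PARAMS, bpw):.1f} GB weights (+2-4 GB overhead)")
```

FP16 comes out at exactly 54 GB (27e9 × 2 bytes), matching the table; the quantized rows land within a gigabyte or two of the listed ranges.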
## why not just use the 35B MoE?
The 35B-A3B activates fewer params per token, which means faster inference and lower memory during generation. But if you're doing agentic coding with longer context windows, the dense 27B is showing real advantages on benchmark tasks that require deep repository reasoning.
The 35B MoE also requires more total disk space (the full expert bank is still loaded even if only 3B activate per token) and the routing decisions can introduce variability.
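Putting rough numbers on that trade-off — assuming ~4.85 bits/weight for Q4_K_M quants of both models, and that per-token compute scales roughly with active parameters:

```python
# Disk scales with TOTAL params; per-token compute scales roughly with
# ACTIVE params. The 4.85 bits/weight figure is an approximation.
BPW = 4.85

def gb(params: float) -> float:
    """Approximate on-disk size in decimal gigabytes at the assumed quant."""
    return params * BPW / 8 / 1e9

dense_total, dense_active = 27e9, 27e9
moe_total, moe_active = 35e9, 3e9

print(f"dense 27B:   ~{gb(dense_total):.1f} GB on disk, 27B active per token")
print(f"MoE 35B-A3B: ~{gb(moe_total):.1f} GB on disk,  3B active per token")
print(f"MoE per-token compute: ~{moe_active / dense_active:.0%} of dense")
```

So the MoE is cheaper per token but larger on disk — exactly the trade the dense 27B reverses.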
## try it with locally uncensored
I've been building Locally Uncensored — a cross-platform desktop app that lets you run Qwen3.6-27B (and other models) with uncensored outputs, image understanding, and a built-in code agent.
Features:
- One-click model setup via Ollama
- Image + text input
- Built-in code agent mode
- Chat history and export
- No cloud, no data leaving your machine
```
# clone and run
git clone https://github.com/PurpleDoubleD/locally-uncensored
cd locally-uncensored
npm install
npm run tauri dev
```
Check the GitHub releases for pre-built binaries.
What GPU are you running? And have you tried the 27B vs the 35B MoE side-by-side? Drop a comment with your setup.
Locally Uncensored — AGPL-3.0 license.