Originally published on locallyuncensored.com
Qwen 3.6 dropped on April 21, 2026. Two main families: a 27B dense model that activates every parameter on every token, and a 35B MoE that activates 3B per token. Both ship with vision, agentic coding, thinking-mode preservation, and a 256K context window.
If you only have time for the short version: install Locally Uncensored, open Model Manager > Discover > Text, search Qwen 3.6, hit the download arrow on the variant that fits your VRAM.
Which Qwen 3.6 Variant Should You Pick?
The biggest decision is dense vs MoE. The second biggest is which quant.
The 27B dense activates all 27B parameters for every token. Slower per token, but every token gets the full model. Quality is consistent. Recommended default for general chat, reasoning, and most coding.
The 35B MoE activates only 3B parameters per token via learned routing. Much faster per token (often 2-3x throughput at similar quants) because each token only touches a small slice of the weights. Note that all experts still have to be resident, so VRAM tracks the full 35B size - the savings are in compute, not memory. Routing also introduces some run-to-run variance. The MoE wins on coding benchmarks (SWE-bench specifically) when you pick the coding-specialised variant.
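A back-of-envelope way to see why 3B-active is so much cheaper per token, using the standard ~2 × active-parameters approximation for forward-pass FLOPs (the parameter counts are the article's headline figures; everything else here is illustrative):

```python
def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token: ~2 * active parameters."""
    return 2 * active_params

dense_active = 27e9   # 27B dense: every parameter is active
moe_active = 3e9      # 35B MoE: router selects ~3B active per token

ratio = flops_per_token(dense_active) / flops_per_token(moe_active)
print(f"MoE does ~{ratio:.0f}x less compute per token")  # ~9x
```

The observed 2-3x throughput gain is well below the ~9x compute ratio because local decoding is mostly memory-bandwidth bound, and the MoE's full weight set still has to stream from VRAM.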
Quant Comparison Table
| Variant | Quant | Disk | VRAM Target | Quality |
|---|---|---|---|---|
| 27B dense | UD-IQ2_XXS | 8.7 GB | 8 GB GPU | Good (low-VRAM lifesaver) |
| 27B dense | Q3_K_M | 13 GB | 12 GB GPU | Very good (RTX 3060 sweet spot) |
| 27B dense | Q4_K_M | 16 GB | 16 GB GPU | Recommended default |
| 27B dense | UD-Q4_K_XL | 16 GB | 16 GB GPU | Better quality per GB |
| 27B dense | Q5_K_M | 18 GB | 20 GB GPU | High |
| 27B dense | Q6_K | 21 GB | 24 GB GPU | Near-lossless |
| 27B dense | Q8_0 | 27 GB | 32 GB GPU | Effectively lossless |
| 35B MoE | Q4_K_M | 24 GB | 24 GB GPU | Recommended for MoE |
| 35B MoE | NVFP4 | 22 GB | 22 GB GPU (RTX 50+) | Smallest with full quality on Blackwell |
| 35B MoE coding | NVFP4 | 22 GB | 22 GB GPU (RTX 50+) | Best coding-bench-per-GB |
| 35B MoE | BF16 | 71 GB | 96 GB GPU | Reference quality |
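The disk numbers in the table follow from simple bits-per-weight arithmetic. A sketch, using approximate effective bits-per-weight for llama.cpp K-quants (exact bpw varies slightly per model; these rates are assumptions, not spec values):

```python
def quant_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate file size: parameters * bits per weight, in decimal GB."""
    return params * bits_per_weight / 8 / 1e9

# ~4.8 bpw is a common effective rate for Q4_K_M, ~3.9 for Q3_K_M
print(round(quant_size_gb(27e9, 4.8), 1))  # ~16.2 GB, matching the 16 GB row
print(round(quant_size_gb(27e9, 3.9), 1))  # ~13.2 GB, matching the 13 GB row
```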
Recommendation by Hardware
- 8 GB VRAM (RTX 3060 8GB, RTX 4060 8GB): 27B UD-IQ2_XXS - the only quant that fits
- 12 GB VRAM (RTX 3060 12GB, RTX 3080 Ti, RTX 4070): 27B Q3_K_M - sweet spot, ~15-25 tok/s
- 16 GB VRAM: 27B Q4_K_M or UD-Q4_K_XL - the recommended default
- 24 GB VRAM (RTX 3090, RTX 4090): 27B Q6_K for max dense quality, OR 35B MoE Q4_K_M for coding
- RTX 50+ (Blackwell): 35B MoE NVFP4 - smallest size with native quality
- Apple Silicon M3/M4 (96 GB+ unified memory): 35B MoE BF16 via the MLX runtime
- CPU only with 32 GB RAM: 27B Q4_K_M at 1-3 tok/s - usable for short tasks
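The list above collapses into a small helper if you want to script the choice; the thresholds and names just mirror the article's recommendations (the function itself is a convenience sketch, not anything shipped by the project):

```python
def pick_variant(vram_gb: int, coding: bool = False) -> str:
    """Map available VRAM (GB) to the recommended Qwen 3.6 quant."""
    if vram_gb < 12:
        return "27B UD-IQ2_XXS"   # only quant that fits in 8 GB
    if vram_gb < 16:
        return "27B Q3_K_M"       # RTX 3060 12GB sweet spot
    if vram_gb < 24:
        return "27B Q4_K_M"       # recommended default
    # At 24 GB: max dense quality, or the MoE for coding workloads
    return "35B MoE Q4_K_M" if coding else "27B Q6_K"

print(pick_variant(12))               # 27B Q3_K_M
print(pick_variant(24, coding=True))  # 35B MoE Q4_K_M
```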
Installation Path 1 - Ollama (CLI)
```shell
ollama pull qwen3.6:27b                   # dense Q4_K_M, 16 GB
ollama pull qwen3.6                       # 35B MoE Q4_K_M, 24 GB
ollama pull qwen3.6:35b-a3b-coding-nvfp4  # coding NVFP4
ollama run qwen3.6:27b
```
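Once a model is pulled, Ollama also serves a local REST API on port 11434. A minimal sketch against its standard /api/generate endpoint, assuming `ollama serve` is running and using the model tag from the pull commands above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt: str, model: str = "qwen3.6:27b") -> dict:
    # stream=False returns the whole completion as one JSON object
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "qwen3.6:27b") -> str:
    """Send one completion request; requires a running Ollama server."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(build_payload("Hello")["model"])  # qwen3.6:27b
```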
Installation Path 2 - Locally Uncensored (GUI)
If you want a one-click experience plus chat, agent mode, image generation, and A/B model comparison in the same window:
- Download the v2.4.0 installer for your OS
- First-launch wizard auto-detects Ollama (or offers one-click install)
- Model Manager > Discover > Text > search Qwen 3.6
- Click the download arrow on the variant matching your VRAM
Performance on RTX 3060 12 GB
Tested with Qwen 3.6 27B Q3_K_M, 4096-token context, fp16 KV cache:
| Workload | tok/s |
|---|---|
| Cold first response | ~3 (includes model load) |
| Warm chat (50-token answers) | 22-26 |
| Long-form (1000 tokens) | 18-20 |
| Thinking-mode enabled | 15-18 |
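Translating those rates into wall-clock time is simple division; the 19 tok/s below is the midpoint of the long-form row:

```python
def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady rate."""
    return tokens / tok_per_s

print(round(seconds_for(1000, 19)))     # ~53 s for a 1000-token answer
print(round(seconds_for(50, 24), 1))    # ~2.1 s for a short chat reply
```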
Vision Support
Both 27B dense and 35B MoE accept image input. Drag-and-drop a screenshot, photo, or chart. VRAM cost for vision is +1-2 GB on top of the base model.
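For scripted use, images ride along in the same /api/generate payload. A sketch assuming Qwen 3.6's Ollama packaging uses Ollama's standard base64 `images` field, as its other multimodal models do:

```python
import base64

def vision_payload(prompt: str, image_bytes: bytes,
                   model: str = "qwen3.6:27b") -> dict:
    """Build an Ollama /api/generate payload with one attached image.

    Ollama multimodal models take base64-encoded images in an
    `images` list next to the text prompt.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode()],
        "stream": False,
    }

# With real input, pass open("chart.png", "rb").read() as image_bytes.
payload = vision_payload("What does this chart show?", b"\x89PNG")
print(sorted(payload))  # ['images', 'model', 'prompt', 'stream']
```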
Coding Performance
The 35B MoE coding-specialised variants are tuned on SWE-bench-style training data. The coding NVFP4 variant scores in the same ballpark as Claude 3.5 Sonnet on SWE-bench Verified at a fraction of the inference cost.
For day-to-day coding inside LU's Codex agent, the 27B dense Q4_K_M is the better default - consistent quality, no MoE-routing variance.
Qwen 3.6 vs Qwen 3.5
| Feature | Qwen 3.5 | Qwen 3.6 |
|---|---|---|
| Vision | No | Yes (both 27B and 35B) |
| Context window | 128K | 256K |
| Thinking mode | QwQ-only | Preserved across variants |
| Coding-specific MoE | No | Yes (35B-a3b-coding) |
| NVFP4 quant | No | Yes (35B MoE) |
| MLX variant for Apple Silicon | No | Yes |
Locally Uncensored is AGPL-3.0 licensed. Built by PurpleDoubleD. Bug reports on GitHub Discussions or in the Discord.