
David

Posted on • Originally published at locallyuncensored.com

How to Run Qwen 3.6 Locally - 27B Dense, 35B MoE, and Coding Variants Setup Guide

Qwen 3.6 dropped on April 21, 2026. There are two main families: a 27B dense model that activates every parameter per token, and a 35B MoE with 3B active parameters per token. Both ship with vision, agentic coding, thinking-mode preservation, and a 256K context window.

If you only have time for the short version: install Locally Uncensored, open Model Manager > Discover > Text, search Qwen 3.6, hit the download arrow on the variant that fits your VRAM.

Which Qwen 3.6 Variant Should You Pick?

The biggest decision is dense vs MoE. The second biggest is which quant.

The 27B dense activates all 27B parameters for every token. Slower per token, but every token gets the full model. Quality is consistent. Recommended default for general chat, reasoning, and most coding.

The 35B MoE only activates 3B parameters per token via routing. Much faster per token (often 2-3x throughput at similar quants). VRAM peak during inference is lower than the model size suggests. But routing introduces variance. The MoE wins on coding benchmarks (SWE-bench specifically) when you pick the coding-specialised variant.
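A quick back-of-envelope shows why the MoE is faster per token, but not proportionally so. Per-token decode compute scales with *active* parameters (a common rule of thumb is ~2 FLOPs per active parameter per generated token):

```python
# Figures from this article: the 27B dense activates all 27B parameters
# per token; the 35B MoE routes each token through ~3B active parameters.
DENSE_ACTIVE = 27e9
MOE_ACTIVE = 3e9

# Rule of thumb: a decoder forward pass costs ~2 FLOPs per active
# parameter per generated token.
flops_dense = 2 * DENSE_ACTIVE
flops_moe = 2 * MOE_ACTIVE

print(f"per-token compute ratio (dense / MoE): {flops_dense / flops_moe:.0f}x")  # 9x
```

The observed 2-3x throughput gap is well below that 9x compute gap because local decoding is usually memory-bandwidth-bound, and all 35B of the MoE's weights still have to sit in memory even though only 3B fire per token.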

Quant Comparison Table

| Variant | Quant | Disk | VRAM Target | Quality |
|---|---|---|---|---|
| 27B dense | UD-IQ2_XXS | 8.7 GB | 8 GB GPU | Good (low-VRAM lifesaver) |
| 27B dense | Q3_K_M | 13 GB | 12 GB GPU | Very good (RTX 3060 sweet spot) |
| 27B dense | Q4_K_M | 16 GB | 16 GB GPU | Recommended default |
| 27B dense | UD-Q4_K_XL | 16 GB | 16 GB GPU | Better quality per GB |
| 27B dense | Q5_K_M | 18 GB | 20 GB GPU | High |
| 27B dense | Q6_K | 21 GB | 24 GB GPU | Near-lossless |
| 27B dense | Q8_0 | 27 GB | 32 GB GPU | Effectively lossless |
| 35B MoE | Q4_K_M | 24 GB | 24 GB GPU | Recommended for MoE |
| 35B MoE | NVFP4 | 22 GB | 22 GB GPU (Blackwell) | Smallest with full quality on Blackwell |
| 35B MoE coding | NVFP4 | 22 GB | 22 GB GPU (Blackwell) | Best coding-bench-per-GB |
| 35B MoE | BF16 | 71 GB | 96 GB GPU | Reference quality |
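The disk figures above line up with simple bits-per-weight arithmetic. A minimal sketch (the ~4.8 bits/weight for Q4_K_M is an approximate average across tensor types, not an official number):

```python
def quant_disk_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: total parameters times average bits per weight."""
    return params_billion * bits_per_weight / 8

# 27B dense at Q4_K_M (~4.8 bits/weight average) -> close to the table's 16 GB
print(round(quant_disk_gb(27, 4.8), 1))   # 16.2
# 35B MoE at BF16 (16 bits/weight) -> close to the table's 71 GB
print(round(quant_disk_gb(35, 16), 1))    # 70.0
```

The small remaining gaps come from metadata, embeddings quantised at different widths, and rounding in the published file sizes.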

Recommendation by Hardware

  • 8 GB VRAM (RTX 3060 8GB, RTX 4060 8GB): 27B UD-IQ2_XXS - the only quant that fits
  • 12 GB VRAM (RTX 3060 12GB, RTX 3080 Ti, RTX 4070): 27B Q3_K_M - sweet spot, ~15-25 tok/s
  • 16 GB VRAM: 27B Q4_K_M or UD-Q4_K_XL - the recommended default
  • 24 GB VRAM (RTX 3090, RTX 4090): 27B Q6_K for max dense quality, OR 35B MoE Q4_K_M for coding
  • Blackwell GPUs (RTX 50 series): 35B MoE NVFP4 - smallest size with native quality
  • Apple Silicon M3/M4: 35B MoE MLX BF16 via MLX runtime
  • CPU only with 32 GB RAM: 27B Q4_K_M at 1-3 tok/s - usable for short tasks

Installation Path 1 - Ollama (CLI)

```shell
ollama pull qwen3.6:27b                      # dense Q4_K_M, 16 GB
ollama pull qwen3.6                          # 35B MoE Q4_K_M, 24 GB
ollama pull qwen3.6:35b-a3b-coding-nvfp4     # coding NVFP4
ollama run qwen3.6:27b
```
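Once a model is pulled, Ollama also serves a local REST API on port 11434, which is handy for scripting. A stdlib-only sketch against Ollama's `/api/generate` endpoint (the model tag matches the pulls above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "qwen3.6:27b") -> dict:
    """Request body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str, model: str = "qwen3.6:27b") -> str:
    """POST the prompt to a locally running Ollama server, return the text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running:
#   print(ask("Summarise MoE routing in two sentences."))
```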

Installation Path 2 - Locally Uncensored (GUI)

If you want a one-click experience plus chat, agent mode, image generation, and A/B model comparison in the same window:

  1. Download the v2.4.0 installer for your OS
  2. First-launch wizard auto-detects Ollama (or offers one-click install)
  3. Model Manager > Discover > Text > search Qwen 3.6
  4. Click the download arrow on the variant matching your VRAM

Performance on RTX 3060 12 GB

Tested with Qwen 3.6 27B Q3_K_M, 4096-token context, fp16 KV cache:

| Workload | tok/s |
|---|---|
| Cold first response | ~3 (model load) |
| Warm chat (50-token answers) | 22-26 |
| Long-form (1000 tokens) | 18-20 |
| Thinking-mode enabled | 15-18 |
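Those rates translate directly into wall-clock wait times (output tokens divided by tokens per second):

```python
def wall_clock_s(tokens: int, tok_per_s: float) -> float:
    """Seconds to generate `tokens` output tokens at a steady rate."""
    return tokens / tok_per_s

# A 1000-token long-form answer at the table's 18-20 tok/s:
print(f"{wall_clock_s(1000, 20):.0f}-{wall_clock_s(1000, 18):.0f} s")   # 50-56 s
```

So on a 3060 you wait under a minute for a full long-form answer, and short chat turns feel near-instant.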

Vision Support

Both 27B dense and 35B MoE accept image input. Drag-and-drop a screenshot, photo, or chart. VRAM cost for vision is +1-2 GB on top of the base model.
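Over the API, images are attached as base64 strings. A sketch of building a vision request body for Ollama's `/api/generate` (Ollama's multimodal endpoint takes an `images` list of base64-encoded bytes; the model tag is from this article):

```python
import base64
import json

def vision_payload(prompt: str, image_path: str,
                   model: str = "qwen3.6:27b") -> str:
    """JSON body for a /api/generate call with one image attached."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [img_b64],   # base64-encoded image bytes
    })

# POST the result to http://localhost:11434/api/generate
```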

Coding Performance

The 35B MoE coding-specialised variants are tuned on SWE-bench-style training data. The coding NVFP4 variant scores in the same ballpark as Claude 3.5 Sonnet on SWE-bench Verified at a fraction of the inference cost.

For day-to-day coding inside LU's Codex agent, the 27B dense Q4_K_M is the better default - consistent quality, no MoE-routing variance.

Qwen 3.6 vs Qwen 3.5

| Feature | Qwen 3.5 | Qwen 3.6 |
|---|---|---|
| Vision | No | Yes (both 27B and 35B) |
| Context window | 128K | 256K |
| Thinking mode | QwQ-only | Preserved across variants |
| Coding-specific MoE | No | Yes (35B-a3b-coding) |
| NVFP4 quant | No | Yes (35B MoE) |
| MLX variant for Apple Silicon | No | Yes |

Locally Uncensored is AGPL-3.0 licensed. Built by PurpleDoubleD. Bug reports on GitHub Discussions or in the Discord.
