
David

Posted on • Originally published at locallyuncensored.com

How to Run Qwen 3.6 Locally - 27B Dense, 35B MoE, and Coding Variants Setup Guide

Qwen 3.6 dropped on April 21, 2026. There are two main families: a 27B dense model that activates every parameter per token, and a 35B MoE with 3B active parameters per token. Both ship with vision, agentic coding, thinking-mode preservation, and a 256K context window.

If you only have time for the short version: install Locally Uncensored, open Model Manager > Discover > Text, search Qwen 3.6, hit the download arrow on the variant that fits your VRAM.

Which Qwen 3.6 Variant Should You Pick?

The biggest decision is dense vs MoE. The second biggest is which quant.

The 27B dense activates all 27B parameters for every token. Slower per token, but every token gets the full model. Quality is consistent. Recommended default for general chat, reasoning, and most coding.

The 35B MoE only activates 3B parameters per token via routing. Much faster per token (often 2-3x throughput at similar quants). VRAM peak during inference is lower than the model size suggests. But routing introduces variance. The MoE wins on coding benchmarks (SWE-bench specifically) when you pick the coding-specialised variant.
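A quick back-of-envelope shows why the MoE is faster per token, but not proportionally so. Per-token decode compute scales with *active* parameters (a common rule of thumb is ~2 FLOPs per active parameter per generated token):

```python
# Figures from this article: the 27B dense activates all 27B parameters
# per token; the 35B MoE routes each token through ~3B active parameters.
DENSE_ACTIVE = 27e9
MOE_ACTIVE = 3e9

# Rule of thumb: a decoder forward pass costs ~2 FLOPs per active
# parameter per generated token.
flops_dense = 2 * DENSE_ACTIVE
flops_moe = 2 * MOE_ACTIVE

print(f"per-token compute ratio (dense / MoE): {flops_dense / flops_moe:.0f}x")  # 9x
```

The observed 2-3x throughput gap is well below that 9x compute gap because local decoding is usually memory-bandwidth-bound, and all 35B of the MoE's weights still have to sit in memory even though only 3B fire per token.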

Quant Comparison Table

| Variant | Quant | Disk | VRAM Target | Quality |
|---|---|---|---|---|
| 27B dense | UD-IQ2_XXS | 8.7 GB | 8 GB GPU | Good (low-VRAM lifesaver) |
| 27B dense | Q3_K_M | 13 GB | 12 GB GPU | Very good (RTX 3060 sweet spot) |
| 27B dense | Q4_K_M | 16 GB | 16 GB GPU | Recommended default |
| 27B dense | UD-Q4_K_XL | 16 GB | 16 GB GPU | Better quality per GB |
| 27B dense | Q5_K_M | 18 GB | 20 GB GPU | High |
| 27B dense | Q6_K | 21 GB | 24 GB GPU | Near-lossless |
| 27B dense | Q8_0 | 27 GB | 32 GB GPU | Effectively lossless |
| 35B MoE | Q4_K_M | 24 GB | 24 GB GPU | Recommended for MoE |
| 35B MoE | NVFP4 | 22 GB | 22 GB GPU (Blackwell) | Smallest with full quality on Blackwell |
| 35B MoE coding | NVFP4 | 22 GB | 22 GB GPU (Blackwell) | Best coding-bench-per-GB |
| 35B MoE | BF16 | 71 GB | 96 GB GPU | Reference quality |
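The disk figures above line up with simple bits-per-weight arithmetic. A minimal sketch (the ~4.8 bits/weight for Q4_K_M is an approximate average across tensor types, not an official number):

```python
def quant_disk_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: total parameters times average bits per weight."""
    return params_billion * bits_per_weight / 8

# 27B dense at Q4_K_M (~4.8 bits/weight average) -> close to the table's 16 GB
print(round(quant_disk_gb(27, 4.8), 1))   # 16.2
# 35B MoE at BF16 (16 bits/weight) -> close to the table's 71 GB
print(round(quant_disk_gb(35, 16), 1))    # 70.0
```

The small remaining gaps come from metadata, embeddings quantised at different widths, and rounding in the published file sizes.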

Recommendation by Hardware

  • 8 GB VRAM (RTX 3060 8GB, RTX 4060 8GB): 27B UD-IQ2_XXS - the only quant that fits
  • 12 GB VRAM (RTX 3060 12GB, RTX 3080 Ti, RTX 4070): 27B Q3_K_M - sweet spot, ~15-25 tok/s
  • 16 GB VRAM: 27B Q4_K_M or UD-Q4_K_XL - the recommended default
  • 24 GB VRAM (RTX 3090, RTX 4090): 27B Q6_K for max dense quality, OR 35B MoE Q4_K_M for coding
  • Blackwell GPUs (RTX 50 series): 35B MoE NVFP4 - smallest size with native quality
  • Apple Silicon M3/M4: 35B MoE MLX BF16 via MLX runtime
  • CPU only with 32 GB RAM: 27B Q4_K_M at 1-3 tok/s - usable for short tasks

Installation Path 1 - Ollama (CLI)

```shell
ollama pull qwen3.6:27b                      # dense Q4_K_M, 16 GB
ollama pull qwen3.6                          # 35B MoE Q4_K_M, 24 GB
ollama pull qwen3.6:35b-a3b-coding-nvfp4     # coding NVFP4
ollama run qwen3.6:27b
```
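Once a model is pulled, Ollama also serves a local REST API on port 11434, which is handy for scripting. A stdlib-only sketch against Ollama's `/api/generate` endpoint (the model tag matches the pulls above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "qwen3.6:27b") -> dict:
    """Request body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str, model: str = "qwen3.6:27b") -> str:
    """POST the prompt to a locally running Ollama server, return the text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running:
#   print(ask("Summarise MoE routing in two sentences."))
```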

Installation Path 2 - Locally Uncensored (GUI)

If you want a one-click experience plus chat, agent mode, image generation, and A/B model comparison in the same window:

  1. Download the v2.4.0 installer for your OS
  2. First-launch wizard auto-detects Ollama (or offers one-click install)
  3. Model Manager > Discover > Text > search Qwen 3.6
  4. Click the download arrow on the variant matching your VRAM

Performance on RTX 3060 12 GB

Tested with Qwen 3.6 27B Q3_K_M, 4096-token context, fp16 KV cache:

| Workload | tok/s |
|---|---|
| Cold first response | ~3 (model load) |
| Warm chat (50-token answers) | 22-26 |
| Long-form (1000 tokens) | 18-20 |
| Thinking-mode enabled | 15-18 |
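Those rates translate directly into wall-clock wait times (output tokens divided by tokens per second):

```python
def wall_clock_s(tokens: int, tok_per_s: float) -> float:
    """Seconds to generate `tokens` output tokens at a steady rate."""
    return tokens / tok_per_s

# A 1000-token long-form answer at the table's 18-20 tok/s:
print(f"{wall_clock_s(1000, 20):.0f}-{wall_clock_s(1000, 18):.0f} s")   # 50-56 s
```

So on a 3060 you wait under a minute for a full long-form answer, and short chat turns feel near-instant.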

Vision Support

Both 27B dense and 35B MoE accept image input. Drag-and-drop a screenshot, photo, or chart. VRAM cost for vision is +1-2 GB on top of the base model.
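Over the API, images are attached as base64 strings. A sketch of building a vision request body for Ollama's `/api/generate` (Ollama's multimodal endpoint takes an `images` list of base64-encoded bytes; the model tag is from this article):

```python
import base64
import json

def vision_payload(prompt: str, image_path: str,
                   model: str = "qwen3.6:27b") -> str:
    """JSON body for a /api/generate call with one image attached."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [img_b64],   # base64-encoded image bytes
    })

# POST the result to http://localhost:11434/api/generate
```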

Coding Performance

The 35B MoE coding-specialised variants are tuned on SWE-bench-style training data. The coding NVFP4 variant scores in the same ballpark as Claude 3.5 Sonnet on SWE-bench Verified at a fraction of the inference cost.

For day-to-day coding inside LU's Codex agent, the 27B dense Q4_K_M is the better default - consistent quality, no MoE-routing variance.

Qwen 3.6 vs Qwen 3.5

| Feature | Qwen 3.5 | Qwen 3.6 |
|---|---|---|
| Vision | No | Yes (both 27B and 35B) |
| Context window | 128K | 256K |
| Thinking mode | QwQ-only | Preserved across variants |
| Coding-specific MoE | No | Yes (35B-a3b-coding) |
| NVFP4 quant | No | Yes (35B MoE) |
| MLX variant for Apple Silicon | No | Yes |

Locally Uncensored is AGPL-3.0 licensed. Built by PurpleDoubleD. Bug reports on GitHub Discussions or in the Discord.
