DEV Community

EveryLocalAI
EveryLocalAI

Posted on

Run Qwen3.6-27B Locally: The Most Capable Open Model for a Single GPU

Run Qwen3.6-27B Locally: The Most Capable Open Model for a Single GPU

Qwen3.6-27B is a dense 27-billion parameter model from Alibaba that scores 77.2% on SWE-bench Verified — matching closed-source models like Claude Sonnet 4.5 on real-world coding tasks. It ships under Apache 2.0 license with native vision support, 262K context window, and hybrid thinking mode.

Paired with Ollama for one-command serving and Open WebUI for a ChatGPT-like interface, this stack gives you a private AI assistant that rivals cloud services with no monthly fee.

What makes Qwen3.6-27B special

  • Vision understanding — baked-in vision encoder, upload images and ask about them
  • 262K context window — entire codebases or long documents in one pass
  • Hybrid thinking — shows reasoning before answering, skip with /no_think
  • 77.2% SWE-bench — competes with Sonnet 4.5 on real PRs
  • Apache 2.0 license — free for any use

Hardware requirements

Quantization VRAM needed Hardware
Q4_K_M 16-18 GB RTX 3090, RTX 4070 Ti Super, Mac 24GB+
Q8_0 28 GB RTX 4090, Mac 32GB+
BF16 54 GB 2x RTX 4090, A100

The Q4_K_M sweet spot fits a single RTX 3090 (24GB, ~$750 used). On Mac, you need 24GB+ unified memory.

One-command setup with Ollama

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull Qwen3.6-27B (auto-selects Q4 for your hardware)
ollama pull qwen3.6:27b

# Run it
ollama run qwen3.6:27b
Enter fullscreen mode Exit fullscreen mode

That's it. Hybrid thinking is on by default — the model shows reasoning before answering. Use /no_think for faster responses.

Add a chat UI with Open WebUI

Run Open WebUI alongside Ollama for a polished ChatGPT experience:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
Enter fullscreen mode Exit fullscreen mode

Performance on consumer GPUs

Hardware Q4 speed Q8 speed
RTX 3090 (24GB) 25-35 tok/s 15-20 tok/s
RTX 4070 Ti Super (16GB) 10-15 tok/s
Mac M4 Max (48GB) 20-30 tok/s 12-18 tok/s
Mac M2 Pro (24GB) 10-15 tok/s

Cost vs cloud

Local: $0/month + $750 for RTX 3090. Claude Sonnet: $20/month + per-token charges. The GPU pays for itself in ~8 months of heavy API use. Plus complete privacy and no rate limits.


Originally published on everylocalai.com

Top comments (0)