Jovan Chan

Posted on Jun 23 • Originally published at aicoderscope.com

WSL 3 for AI Coding on Windows 2026: GPU Passthrough, Claude Code, Aider, and Cline Without Dual-Booting

#wsl #localllm #ollama #claudecode

TL;DR: WSL 3, previewed at Microsoft Build 2026 on June 2, swaps WSL 2's heavy Hyper-V virtual machine for a lightweight paravirtualized layer that puts Linux GPU and NPU workloads within 3–5% of bare-metal Linux speed. That matters for running Claude Code, Aider, Cline, and local Ollama models on Windows without dual-booting. The catch: the preview is locked to Copilot+ PCs with Qualcomm Snapdragon X Elite, Intel Meteor Lake, and Lunar Lake NPUs — AMD and most discrete NVIDIA desktop setups aren't on the launch list, and WSL 2 already does NVIDIA CUDA passthrough today.

What you'll be able to do after this guide:

Decide whether WSL 3 is worth chasing now or whether your existing WSL 2 setup already does the job
Run a Linux-native AI coding stack (Claude Code, Aider, Cline + Ollama) on Windows with GPU acceleration
Avoid the two traps that bite developers moving their coding agents into WSL

	WSL 2 (today)	WSL 3 (preview)	Dual-boot Linux
GPU passthrough	NVIDIA CUDA works; some virtualization overhead	Near-native, ~3–5% overhead; adds NPU	Native, zero overhead
Hardware at launch	Any WSL2-capable PC	Copilot+ PCs: Snapdragon X Elite, Intel Meteor/Lunar Lake	Any
Setup friction	`wsl --install`, one reboot	Windows Insider preview build required	Partition, bootloader, driver pain
Best for	Most devs running Ollama/CUDA on Windows now	Copilot+ laptops doing local NPU + GPU AI	Squeezing every last token/sec out of a GPU

Honest take: If you already run an RTX desktop with WSL 2, your CUDA-backed Aider and Cline setup is fine — stay put. WSL 3 is the real upgrade for Copilot+ laptop owners who want their NPU and GPU available to Linux coding agents without dual-booting. Treat it as preview, not production.

Why this is news at all (WSL 2 already runs CUDA)

Worth clearing up first, because the headlines oversell it: WSL 2 has supported NVIDIA CUDA GPU passthrough since 2020. If you have an RTX card and run `ollama run qwen2.5-coder:14b` inside Ubuntu on WSL 2 today, it already uses your GPU. The driver bridge (the Windows driver libraries mounted at /usr/lib/wsl/lib and the dxg kernel interface behind /dev/dxg) has been shipping for years. So WSL 3 is not "finally, GPU access for Linux on Windows"; that already existed.
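
A quick way to confirm that bridge is actually present in your distro before blaming the GPU is to check the two pieces named above. The sketch assumes the dxg interface shows up as /dev/dxg and that the Windows driver's Linux-side libraries (including libcuda.so on NVIDIA systems) are mounted under /usr/lib/wsl/lib. The function name and its optional ROOT argument are hypothetical, added so it can be tried against a scratch directory.

```bash
# check_wsl_gpu_bridge [ROOT]: report whether the WSL GPU bridge is visible from this distro.
# Assumption: the dxg kernel interface appears as /dev/dxg and the Windows driver's
# user-mode libraries are mounted under /usr/lib/wsl/lib. ROOT (hypothetical, for testing)
# is prepended to both paths and defaults to the real filesystem root.
check_wsl_gpu_bridge() {
  local root="${1:-}"
  local status=0

  if [ -e "$root/dev/dxg" ]; then
    echo "ok: $root/dev/dxg present"
  else
    echo "missing: $root/dev/dxg (no GPU paravirtualization device)"
    status=1
  fi

  if [ -d "$root/usr/lib/wsl/lib" ]; then
    echo "ok: $root/usr/lib/wsl/lib present"
    # libcuda.so* appears here when the Windows NVIDIA driver exposes CUDA to WSL
    if ls "$root"/usr/lib/wsl/lib/libcuda.so* >/dev/null 2>&1; then
      echo "ok: libcuda.so found (CUDA path available)"
    else
      echo "note: no libcuda.so (no CUDA driver exposed)"
    fi
  else
    echo "missing: $root/usr/lib/wsl/lib (driver bridge not mounted)"
    status=1
  fi

  return "$status"
}
```

Inside Ubuntu on WSL 2, calling `check_wsl_gpu_bridge` with no argument checks the real paths. On NVIDIA machines, running nvidia-smi (the WSL driver usually places it in the same /usr/lib/wsl/lib directory) is a second sanity check.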

What WSL 3 actually changes, per Microsoft's Build 2026 preview, is two things. First, it replaces the full Hyper-V VM that backs WSL 2 with a lighter paravirtualized hardware access model, dropping the virtualization tax so CUDA, DirectML, ROCm, ONNX Runtime, and OpenVINO workloads land within 3–5% of a native Linux install. Second — and this is the genuinely new capability — it exposes the NPU to Linux, not just the GPU. On a Copilot+ laptop, that means PyTorch, JAX, llama.cpp, and Ollama running inside Linux can finally reach the neural accelerator that was previously Windows-only.

For an AI coding workflow, the practical read is narrow but real: if you code on a Snapdragon X Elite or Intel Lunar Lake laptop and want a Linux-native agent stack hitting local models on-device, WSL 3 is the first time that's near-native. For everyone on a desktop NVIDIA box, the eventual win is a few percent less overhead, which you probably won't notice.
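
To put "a few percent" in numbers: a fixed overhead scales throughput by (1 - overhead/100), so 3 to 5% overhead on a model that does 40 tokens/s natively leaves roughly 38.0 to 38.8 tokens/s. The sketch below only does that arithmetic; the 40 tokens/s baseline is an illustrative assumption, not a benchmark from Microsoft or from this article, and the function name is hypothetical.

```bash
# throughput_after_overhead NATIVE_TPS OVERHEAD_PCT
# Prints the tokens/s you would expect after a fixed percentage overhead:
#   effective = native * (1 - overhead / 100)
throughput_after_overhead() {
  awk -v tps="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", tps * (1 - pct / 100) }'
}

# Illustrative baseline only: 40 tokens/s is an assumed native figure, not a measurement.
for pct in 3 5; do
  echo "${pct}% overhead: $(throughput_after_overhead 40 "$pct") tokens/s"
done
```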

What WSL 3 supports at launch

The preview is restricted. From Microsoft's Build 2026 announcement and the coverage that followed:

Hardware: Copilot+ PCs (they have an NPU) on Qualcomm Snapdragon X Elite, Intel Meteor Lake, and Intel Lunar Lake. AMD support comes later. Discrete NVIDIA desktop GPUs were not called out as part of the launch passthrough story — the headline demo was about Copilot+ NPUs and integrated graphics.
Frameworks: CUDA, ROCm, DirectML, ONNX Runtime, and OpenVINO are all listed as supported acceleration paths. DirectML is the layer doing the heavy lifting for NPU and integrated-GPU access.
AI tools demonstrated: Ollama, PyTorch, llama.cpp, and JAX running inside WSL with near-native acceleration.
Distribution: preview ships through the Windows Insider Program first; Microsoft said it plans to push WSL 3 through Windows Update later, but gave no GA date.

That last point is the one to internalize. As of June 16, 2026, WSL 3 is a Windows Insider preview with no general-availability date. If your machine isn't a recent Copilot+ device, you cannot run it yet regardless of how good your GPU is.

Getting on the WSL 3 preview

There's no wsl --upgrade-to-3 button. The path is the Windows Insider Program:

# Check what you're on today (in PowerShell; from inside WSL, call wsl.exe --version)
wsl --version

# Example output on a current WSL 2 install:
# WSL version: 2.6.1.0
# Kernel version: 6.6.87.2-1
# WSLg version: 1.0.66
# Windows version: 10.0.26200.8728

To get the preview build:

Enroll the machine in the Windows Insider Program (Settings → Windows Update → Windows Insider Program) and pick the channel carrying the WSL 3 preview. Microsoft typically lands this kind of feature in the Dev or Canary channel first.
Install the flagged build and reboot.
After the preview lands, wsl --version reports a 3.x WSL version on supported hardware.
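
If you script your setup, that version check can be automated. A minimal sketch, assuming the English-language `wsl --version` layout shown above (a "WSL version:" line whose first number is the major version) and that a preview build reports 3.x there as described; the function name is hypothetical. Because wsl.exe output piped into Linux tools is typically UTF-16, NUL bytes are stripped before parsing.

```bash
# wsl_major_version: read `wsl --version` output on stdin and print the major number
# from the "WSL version:" line (2 on today's WSL 2 package, 3 on the preview per Microsoft).
# NUL bytes and carriage returns are removed first to cope with UTF-16 / CRLF output.
# Assumes an English-language Windows install; translated labels are not handled.
wsl_major_version() {
  tr -d '\000\r' | awk -F': *' '/WSL version:/ { split($2, v, "."); print v[1]; exit }'
}

# From inside WSL, Windows interop lets you call wsl.exe directly.
if command -v wsl.exe >/dev/null 2>&1; then
  echo "WSL major version: $(wsl.exe --version | wsl_major_version)"
fi
```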

If you're not on a Snapdragon X Elite, Meteor Lake, or Lunar Lake machine, stop here — the preview won't activate the new passthrough path, and you gain nothing over WSL 2.

The AI coding stack that benefits

Once you have a GPU- or NPU-accelerated Linux environment, the coding tools that gain the most are the Linux-native and CLI-first ones. Here's what to install and why each one cares about WSL.

Claude Code runs cleanly in any Linux shell, so WSL has always been a reasonable home for it. The agent itself calls Anthropic's API, so the GPU isn't doing inference — but if you pair Claude Code with a local model gateway or run local tooling (linters, test suites, build steps) that the agent drives, the near-native I/O and compute of WSL 3 cut the friction. See our Claude Code review for what it's actually good at.
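
One WSL-specific snag when installing npm-based CLI agents such as Claude Code: WSL appends the Windows PATH to the Linux PATH by default, so `npm` (and sometimes `node`) can resolve to a Windows install under /mnt/c instead of a Linux one. A minimal check, assuming the default /mnt/<drive> automount root; both function names are hypothetical.

```bash
# is_windows_mount_path PATH: succeed when PATH sits on a Windows drive that WSL
# mounts under /mnt/<drive letter>/ (the default automount root).
is_windows_mount_path() {
  case "$1" in
    /mnt/[a-zA-Z]/*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_linux_node: warn when node or npm resolves to a Windows install, which can
# happen because WSL appends Windows PATH entries by default.
check_linux_node() {
  local tool resolved status=0
  for tool in node npm; do
    resolved="$(command -v "$tool" 2>/dev/null)"
    if [ -z "$resolved" ]; then
      echo "missing: $tool not found on PATH"
      status=1
    elif is_windows_mount_path "$resolved"; then
      echo "warning: $tool resolves to Windows copy at $resolved"
      status=1
    else
      echo "ok: $tool -> $resolved"
    fi
  done
  return "$status"
}
```

If either tool points at /mnt/c, install Node.js inside the distro (or set appendWindowsPath = false under [interop] in /etc/wsl.conf) before running Anthropic's documented install command, `npm install -g @anthropic-ai/claude-code`.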

Aider is terminal-native and a natural WSL citizen. Point it at a local Ollama model and the GPU passthrough is what makes that usable:

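# Inside WSL, with Ollama running locally
ollama pull qwen2.5-coder:14b
# Aider's Ollama instructions point it at the server explicitly
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama/qwen2.5-coder:14b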

# Aider confirms the model and shows the repo map:
# Aider v0.86.0
# Model: ollama/qwen2.5-coder:14b
# Git repo: .git with 142 files
# Repo-map: using 1024 tokens

If the pulled model runs on your CPU instead of your accelerator, you'll feel it: single-digit tokens per second instead of usable speed. Our full Aider + Ollama setup guide covers model choice and the context-window trap.
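
To tell which case you're in, `ollama ps` lists the currently loaded models with a PROCESSOR column (for example "100% GPU", "100% CPU", or a CPU/GPU split). A minimal sketch that turns that column into a one-word verdict per model; the function name is hypothetical, and how an NPU-backed model would be reported under WSL 3 isn't covered here, so anything unrecognized prints as unknown.

```bash
# classify_ollama_ps: read `ollama ps` output on stdin and print "<model> <placement>"
# for each loaded model, based on the PROCESSOR column text:
#   "100% GPU" -> gpu, "100% CPU" -> cpu, "NN%/MM% CPU/GPU" -> split, anything else -> unknown
classify_ollama_ps() {
  awk '
    NR == 1 { next }   # header row: NAME ID SIZE PROCESSOR ...
    NF == 0 { next }   # ignore blank lines
    {
      if ($0 ~ /CPU\/GPU/)      placement = "split"
      else if ($0 ~ /100% GPU/) placement = "gpu"
      else if ($0 ~ /100% CPU/) placement = "cpu"
      else                      placement = "unknown"
      print $1, placement
    }'
}

# `ollama ps` only lists models that are currently loaded, so run a prompt first.
if command -v ollama >/dev/null 2>&1; then
  ollama ps | classify_ollama_ps
fi
```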

Cline lives in VS Code, which on Windows talks to the WSL backend through Microsoft's WSL extension (formerly Remote - WSL). With a local model served from inside WSL (or from LM Studio on the Windows side), Cline's agentic tool-calling runs against your accelerated local model server. The Cline + LM Studio setup and Continue.dev + Ollama guides both apply directly.
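
Whichever side of the WSL boundary Cline's requests come from, the model server has to answer and the model has to be pulled. A minimal probe from inside WSL, assuming Ollama on its default port 11434, curl installed, and Ollama's `/api/tags` endpoint (which lists locally available models). Both function names are hypothetical, and the substring match is a shortcut rather than real JSON parsing.

```bash
# ollama_tags_json [BASE_URL]: fetch the list of locally pulled models from Ollama's
# /api/tags endpoint. BASE_URL defaults to Ollama's standard local address.
ollama_tags_json() {
  curl -sf --max-time 3 "${1:-http://localhost:11434}/api/tags"
}

# model_in_tags MODEL: read /api/tags JSON on stdin and succeed if MODEL is listed.
# Fixed-string matching on "name":"MODEL" keeps this dependency-free; it assumes
# Ollama's compact JSON output and is not a general JSON parser.
model_in_tags() {
  grep -qF "\"name\":\"$1\""
}
```

For example, `ollama_tags_json | model_in_tags qwen2.5-coder:14b && echo ready` prints ready only when the server responds and the model is present.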

The common thread: WSL 3 doesn't make the agent smarter. It makes the local inference the agent depends on fast enough to be worth using on a Windows machine — which is exactly the "local LLM + coding tool" combination most readers are searching for.

The problem I keep seeing: networking between WSL and the host

The single most common failure when wiring an editor-side agent (Cline, Continue.dev, Cursor) on Windows to a model server inside WSL isn't the GPU; it's the network boundary. In its default NAT networking mode, WSL gets its own virtual NIC, so localhost:11434 on the Windows side does not always reach Ollama running inside WSL.

The fix that works reliably: bind Ollama to all interfaces inside WSL, then point the editor at the WSL IP, not localhost.
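
The second half of that fix, finding the address to give the editor, can be sketched as follows. It assumes WSL 2's default NAT networking, where the distro's address sits on eth0 as reported by `ip -4 -o addr show eth0`; the helper names are hypothetical, and 11434 is simply Ollama's default port. Under NAT the address can change after `wsl --shutdown` or a reboot, so re-check it if the editor stops connecting.

```bash
# wsl_ipv4_from_ip_output: read `ip -4 -o addr show <iface>` output on stdin and print
# the first IPv4 address with its /prefix length removed.
wsl_ipv4_from_ip_output() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "inet") {
        split($(i + 1), parts, "/")
        print parts[1]
        exit
      }
    }
  }'
}

# ollama_base_url IP [PORT]: format the base URL an editor-side agent should use.
ollama_base_url() {
  printf 'http://%s:%s\n' "$1" "${2:-11434}"
}
```

Inside WSL, `ollama_base_url "$(ip -4 -o addr show eth0 | wsl_ipv4_from_ip_output)"` prints a URL of the form http://<address>:11434 to paste into the editor's Ollama base-URL setting; `hostname -I` is a quicker manual alternative.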


# Inside WSL — make Ollama listen on all interfaces
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

# Find the WSL IP to use f
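
One caveat for the block above: if Ollama was installed with its Linux install script and systemd is enabled in your distro, Ollama usually already runs as the ollama service on 127.0.0.1:11434, so a second `ollama serve` can fail because the port is taken. Ollama's FAQ instead sets OLLAMA_HOST on the service through a systemd override. A sketch that writes that override; the function name and its directory argument are hypothetical, and /etc/systemd/system/ollama.service.d is where `systemctl edit` normally stores overrides.

```bash
# write_ollama_host_override DIR [HOST]: write DIR/override.conf, a systemd drop-in that
# sets OLLAMA_HOST for ollama.service. HOST defaults to all interfaces on Ollama's port.
write_ollama_host_override() {
  local dir="$1" host="${2:-0.0.0.0:11434}"
  mkdir -p "$dir" || return 1
  # Heredoc body is unindented so the file contains exactly these two lines.
  cat > "$dir/override.conf" <<EOF
[Service]
Environment="OLLAMA_HOST=$host"
EOF
}
```

Define and run it in a root shell (for example after `sudo -s`) with DIR set to /etc/systemd/system/ollama.service.d, or paste the same two lines into `sudo systemctl edit ollama.service`, then apply the change with `sudo systemctl daemon-reload` and `sudo systemctl restart ollama`.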
