Run a powerful, private AI coding assistant on your laptop — completely offline.
No API keys. No monthly fees. No telemetry. Your code never leaves your machine.
## What You'll Get
- Intelligent code generation, refactoring, debugging, and explanation
- Support for Python, JavaScript, TypeScript, Go, Rust, Java, C++, PHP, SQL, and more
- Works on airplanes, remote sites, air-gapped networks, or when internet is down
- Full privacy and zero cost after initial setup
## Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| Operating System | macOS 12+, Windows 10/11, Linux | Latest macOS / Windows 11 |
| RAM | 8 GB | 16 GB+ |
| Disk Space | 6 GB | 10 GB+ |
| VS Code | Latest version | Latest version |
| GPU (Optional) | None | NVIDIA 6GB+ / Apple Silicon |
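Before pulling a multi-gigabyte model, it can be worth confirming you actually have the disk headroom from the table above. A minimal sanity-check sketch (the 6 GB threshold comes from the minimum column; adjust the path to the drive where Ollama stores models):

```python
import shutil

def check_disk(path=".", required_gb=6):
    """Return (free_gb, ok): free space on `path` and whether it meets the minimum."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return round(free_gb, 1), free_gb >= required_gb

free, ok = check_disk()
print(f"Free disk: {free} GB -> {'OK' if ok else 'need more space'}")
```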
## Step-by-Step Setup

### Step 1: Install Ollama

**macOS / Linux:**

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

**Windows:** Download the installer from ollama.com/download

Verify:

```bash
ollama --version
```
### Step 2: Download the Qwen3 8B Model

```bash
ollama pull qwen3:8b
```

The download is ≈5.2 GB, and this is the only step that needs internet.
### Step 3: Install Continue.dev in VS Code

- Open VS Code
- Go to Extensions (`Ctrl/Cmd + Shift + X`)
- Search for "Continue" (by Continue Dev, Inc. — blue spiral icon)
- Install it
### Step 4: Configure Continue for Local Qwen3

Press `Ctrl/Cmd + Shift + P` → search "Continue: Open Config File"

Replace everything with:
```json
{
  "models": [
    {
      "title": "Qwen3-8B (Code)",
      "provider": "ollama",
      "model": "qwen3:8b",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.2,
        "maxTokens": 4096
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen3-8B Autocomplete",
    "provider": "ollama",
    "model": "qwen3:8b"
  }
}
```
Save the file.
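JSON is unforgiving about trailing commas and stray quotes, so it can help to confirm the file parses before reloading VS Code. A quick sketch, assuming Continue's usual config location of `~/.continue/config.json` (confirm against whatever file the "Open Config File" command actually opened on your machine):

```python
import json
import pathlib

def validate_config(text):
    """Parse the config text and spot-check the keys used in Step 4."""
    cfg = json.loads(text)  # raises ValueError on any JSON syntax slip
    assert cfg["models"][0]["model"] == "qwen3:8b"
    assert cfg["tabAutocompleteModel"]["provider"] == "ollama"
    return cfg

# Assumed path; skip silently if your config lives elsewhere.
config_path = pathlib.Path.home() / ".continue" / "config.json"
if config_path.exists():
    validate_config(config_path.read_text())
    print("Config looks valid.")
```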
### Step 5: Test It Offline

- Open the Continue sidebar (`Ctrl/Cmd + Shift + L`)
- Select Qwen3-8B (Code)
- Turn off your internet completely
- Type in the chat: "Write a fast Python function to validate email addresses"

If it responds, your offline setup is working perfectly.
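For comparison, a reasonable answer to that test prompt looks something like the sketch below: a regex pre-compiled once so repeated calls stay fast, covering everyday address shapes rather than the full RFC 5322 grammar.

```python
import re

# Compiled once at import time; matches common addresses, not every
# corner case the email RFCs allow.
_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Fast check that `address` looks like a typical email address."""
    return bool(_EMAIL_RE.match(address))

print(is_valid_email("dev@example.com"))  # True
print(is_valid_email("not-an-email"))     # False
```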
## Useful Daily Workflows

- **Code Review:** `@Current File` + "Review this function for bugs, security issues and performance"
- **Refactoring:** Highlight code → `Ctrl/Cmd + Shift + I` → "Refactor with proper error handling and type hints"
- **Test Generation:** "Write comprehensive pytest tests covering edge cases"
- **Faster responses:** Set temperature to `0.1`
- **Deeper analysis:** Type `/think` in the chat
## Hardware Performance Guide
| Hardware | Tokens/sec | Experience |
|---|---|---|
| Apple M1/M2 (16GB) | 18–28 | Very Good |
| NVIDIA RTX 3060 / 4060 | 25–45 | Excellent |
| NVIDIA RTX 4090 | 50–80+ | Near Instant |
| CPU Only (8-core) | 2–6 | Usable |
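To translate those tokens-per-second figures into wait times, the arithmetic is simple: time is just response length divided by throughput, so a 500-token answer at 25 tok/s takes about 20 seconds. A throwaway sketch (this ignores prompt-processing time, which adds a few extra seconds on long contexts):

```python
def response_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Estimated wall time to generate a response, ignoring prompt processing."""
    return tokens / tokens_per_sec

# A 500-token answer at the low end of the RTX 3060 row (25 tok/s):
print(round(response_seconds(500, 25), 1))  # 20.0

# The same answer on an 8-core CPU at 4 tok/s:
print(round(response_seconds(500, 4), 1))   # 125.0
```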
## Troubleshooting
| Issue | Solution |
|---|---|
| Model not appearing | Save config → Reload VS Code |
| Slow generation | Check GPU usage (`nvidia-smi`) |
| Ollama not running | Run `ollama serve` in a terminal |
| Connection refused | Restart the Ollama desktop app |
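Several of those symptoms reduce to one question: is anything listening on Ollama's default port, 11434? A socket-level check sketch (this only confirms the server is up, not that the model loads):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if port_open("127.0.0.1", 11434):
    print("Ollama is listening.")
else:
    print("Nothing on 11434 -- try `ollama serve` or restart the app.")
```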
## Why This Setup Matters in 2026
- Complete privacy for client or proprietary code
- Zero recurring costs
- True offline capability anywhere
- Full control over your AI tools
This is currently one of the strongest local AI coding setups available.
Originally published on mike.co.ke
Follow me for more practical WordPress, AI, and development guides.
