Run a powerful, private AI coding assistant on your laptop — completely offline.
No API keys. No monthly fees. No telemetry. Your code never leaves your machine.
## What You'll Get
- Intelligent code generation, refactoring, debugging, and explanation
- Support for Python, JavaScript, TypeScript, Go, Rust, Java, C++, PHP, SQL, and more
- Works on airplanes, remote sites, air-gapped networks, or when internet is down
- Full privacy and zero cost after initial setup
## Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| Operating System | macOS 12+, Windows 10/11, Linux | Latest macOS / Windows 11 |
| RAM | 8 GB | 16 GB+ |
| Disk Space | 6 GB | 10 GB+ |
| VS Code | Latest version | Latest version |
| GPU (Optional) | None | NVIDIA 6GB+ / Apple Silicon |
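Before pulling a multi-gigabyte model, it can be worth confirming you actually have the disk headroom from the table above. A minimal sanity-check sketch (the 6 GB threshold comes from the minimum column; adjust the path to the drive where Ollama stores models):

```python
import shutil

def check_disk(path=".", required_gb=6):
    """Return (free_gb, ok): free space on `path` and whether it meets the minimum."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return round(free_gb, 1), free_gb >= required_gb

free, ok = check_disk()
print(f"Free disk: {free} GB -> {'OK' if ok else 'need more space'}")
```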
## Step-by-Step Setup

### Step 1: Install Ollama

**macOS / Linux:**

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

**Windows:** Download the installer from ollama.com/download

Verify:

```bash
ollama --version
```
### Step 2: Download the Qwen3 8B Model

```bash
ollama pull qwen3:8b
```

The download is ≈5.2 GB, and this is the only step that needs internet.
### Step 3: Install Continue.dev in VS Code

- Open VS Code
- Go to Extensions (`Ctrl/Cmd + Shift + X`)
- Search for "Continue" (by Continue Dev, Inc. — blue spiral icon)
- Install it
### Step 4: Configure Continue for Local Qwen3

Press `Ctrl/Cmd + Shift + P` → search "Continue: Open Config File"

Replace everything with:
```json
{
  "models": [
    {
      "title": "Qwen3-8B (Code)",
      "provider": "ollama",
      "model": "qwen3:8b",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.2,
        "maxTokens": 4096
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen3-8B Autocomplete",
    "provider": "ollama",
    "model": "qwen3:8b"
  }
}
```
Save the file.
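JSON is unforgiving about trailing commas and stray quotes, so it can help to confirm the file parses before reloading VS Code. A quick sketch, assuming Continue's usual config location of `~/.continue/config.json` (confirm against whatever file the "Open Config File" command actually opened on your machine):

```python
import json
import pathlib

def validate_config(text):
    """Parse the config text and spot-check the keys used in Step 4."""
    cfg = json.loads(text)  # raises ValueError on any JSON syntax slip
    assert cfg["models"][0]["model"] == "qwen3:8b"
    assert cfg["tabAutocompleteModel"]["provider"] == "ollama"
    return cfg

# Assumed path; skip silently if your config lives elsewhere.
config_path = pathlib.Path.home() / ".continue" / "config.json"
if config_path.exists():
    validate_config(config_path.read_text())
    print("Config looks valid.")
```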
### Step 5: Test It Offline

- Open the Continue sidebar (`Ctrl/Cmd + Shift + L`)
- Select Qwen3-8B (Code)
- Turn off your internet completely
- Type in the chat: "Write a fast Python function to validate email addresses"

If it responds, your offline setup is working perfectly.
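For comparison, a reasonable answer to that test prompt looks something like the sketch below: a regex pre-compiled once so repeated calls stay fast, covering everyday address shapes rather than the full RFC 5322 grammar.

```python
import re

# Compiled once at import time; matches common addresses, not every
# corner case the email RFCs allow.
_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Fast check that `address` looks like a typical email address."""
    return bool(_EMAIL_RE.match(address))

print(is_valid_email("dev@example.com"))  # True
print(is_valid_email("not-an-email"))     # False
```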
## Useful Daily Workflows

- **Code Review:** `@Current File` + "Review this function for bugs, security issues and performance"
- **Refactoring:** Highlight code → `Ctrl/Cmd + Shift + I` → "Refactor with proper error handling and type hints"
- **Test Generation:** "Write comprehensive pytest tests covering edge cases"
- **Faster responses:** Set temperature to `0.1`
- **Deeper analysis:** Type `/think` in the chat
## Hardware Performance Guide
| Hardware | Tokens/sec | Experience |
|---|---|---|
| Apple M1/M2 (16GB) | 18–28 | Very Good |
| NVIDIA RTX 3060 / 4060 | 25–45 | Excellent |
| NVIDIA RTX 4090 | 50–80+ | Near Instant |
| CPU Only (8-core) | 2–6 | Usable |
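To translate those tokens-per-second figures into wait times, the arithmetic is simple: time is just response length divided by throughput, so a 500-token answer at 25 tok/s takes about 20 seconds. A throwaway sketch (this ignores prompt-processing time, which adds a few extra seconds on long contexts):

```python
def response_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Estimated wall time to generate a response, ignoring prompt processing."""
    return tokens / tokens_per_sec

# A 500-token answer at the low end of the RTX 3060 row (25 tok/s):
print(round(response_seconds(500, 25), 1))  # 20.0

# The same answer on an 8-core CPU at 4 tok/s:
print(round(response_seconds(500, 4), 1))   # 125.0
```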
## Troubleshooting
| Issue | Solution |
|---|---|
| Model not appearing | Save config → Reload VS Code |
| Slow generation | Check GPU usage (`nvidia-smi`) |
| Ollama not running | Run `ollama serve` in a terminal |
| Connection refused | Restart the Ollama desktop app |
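Several of those symptoms reduce to one question: is anything listening on Ollama's default port, 11434? A socket-level check sketch (this only confirms the server is up, not that the model loads):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if port_open("127.0.0.1", 11434):
    print("Ollama is listening.")
else:
    print("Nothing on 11434 -- try `ollama serve` or restart the app.")
```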
## Why This Setup Matters in 2026
- Complete privacy for client or proprietary code
- Zero recurring costs
- True offline capability anywhere
- Full control over your AI tools
This is currently one of the strongest local AI coding setups available.
Originally published on mike.co.ke
Follow me for more practical WordPress, AI, and development guides.
