You're paying $19/month for GitHub Copilot. Your code is leaving your machine, hitting someone else's servers, and coming back as suggestions. It works. But you could also run the same workflow locally — for free, with full privacy, on hardware you probably already own.
This guide sets up Continue.dev with Ollama so you get AI code completion, chat, and refactoring directly in VS Code — no API keys, no subscriptions, no data leaving your network.
What You Need
- A machine with 16GB+ RAM (Mac mini M-series is ideal, but any modern desktop works)
- VS Code or a fork (Cursor users: you already have this built in, but keep reading for the self-hosted angle)
- Docker (optional, for running Ollama in a container)
- 10 minutes
Step 1: Install Ollama
Ollama makes running local LLMs trivially simple. One binary, one command.
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or with Docker
docker run -d --name ollama \
-p 11434:11434 \
-v ollama_data:/root/.ollama \
ollama/ollama:latest
Pull a model that's good at code. For 16GB machines, deepseek-coder-v2:16b hits the sweet spot between quality and speed:
# Primary code model
ollama pull deepseek-coder-v2:16b
# Smaller alternative if RAM is tight
ollama pull codellama:7b
# For chat/explanations (optional)
ollama pull llama3.1:8b
Verify it works:
ollama run deepseek-coder-v2:16b "Write a Python function that retries HTTP requests with exponential backoff"
You should get a working function back in a few seconds.
Step 2: Install Continue.dev
Continue is an open-source AI code assistant that plugs into VS Code. It supports any OpenAI-compatible API — which Ollama exposes out of the box.
- Open VS Code
- Extensions → Search "Continue" → Install
- You'll see a new sidebar icon (the Continue logo)
Or from the terminal:
code --install-extension continue.continue
Step 3: Configure Continue for Ollama
Open Continue's config. Press Cmd+Shift+P (or Ctrl+Shift+P) → "Continue: Open Config File". Replace or merge with:
{
  "models": [
    {
      "title": "DeepSeek Coder v2 (Local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Llama 3.1 Chat (Local)",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Autocomplete",
    "provider": "ollama",
    "model": "deepseek-coder-v2:16b",
    "apiBase": "http://localhost:11434"
  },
  "allowAnonymousTelemetry": false
}
The key settings:
- models: What shows up in Continue's chat sidebar. You can add multiple and switch between them.
- tabAutocompleteModel: The model that powers inline completions (the Copilot-like experience).
- allowAnonymousTelemetry: Disable it. You're self-hosting for privacy — act like it.
Step 4: Use It Like Copilot
Once configured, you get three workflows:
Inline Completions
Just type. Continue will suggest completions as you code, much like Copilot. Press Tab to accept.
Chat
Open the Continue sidebar and ask questions about your code:
Explain what this function does and suggest improvements
You can highlight code, right-click → "Continue: Add to Chat" to give it context.
Refactoring
Highlight a block of code, press Cmd+I (or Ctrl+I), and type what you want:
Refactor this to use async/await instead of callbacks
Continue rewrites the selection in-place with a diff view.
Step 5: Serve It Across Your Network
If Ollama runs on a dedicated machine (like a Mac mini), expose it to your local network:
# Set Ollama to listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve
Or in Docker:
docker run -d --name ollama \
-p 0.0.0.0:11434:11434 \
-v ollama_data:/root/.ollama \
ollama/ollama:latest
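Note that `OLLAMA_HOST=0.0.0.0 ollama serve` only applies to that single run. For a native install you can set the variable at the service level so it survives restarts; the commands below follow Ollama's documented approach, but verify them against your install:

```shell
# macOS (native app): set the variable for launchd, then restart Ollama
launchctl setenv OLLAMA_HOST "0.0.0.0"

# Linux (systemd install): add an override, then restart the service
sudo systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
```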
Then update Continue's config on your other machines to point at the server:
"apiBase": "http://192.168.1.100:11434"
Now every machine in your house gets AI code assistance, powered by one box. No internet required.
Performance Tips
Model selection matters more than hardware. A well-quantized 16B model on Apple Silicon will outperform a 70B model struggling on insufficient VRAM.
# Check what's loaded and how much memory it uses
ollama ps
# Pre-load your coding model so first completion is fast
curl http://localhost:11434/api/generate \
-d '{"model": "deepseek-coder-v2:16b", "keep_alive": "24h"}'
Keep the model warm. By default, Ollama unloads models after 5 minutes of inactivity. The keep_alive parameter above keeps it in memory for 24 hours — instant completions all day.
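If you'd rather not re-send that request after every server restart, Ollama also reads a server-side default from an environment variable (documented in Ollama's FAQ), which applies to every model the server loads:

```shell
# Make 24h the default keep-alive for all models on this server
OLLAMA_KEEP_ALIVE=24h ollama serve
```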
Use different models for different tasks. Fast 7B model for autocomplete, larger 16B for chat and complex refactoring. Continue lets you configure this separately.
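A minimal sketch of that split, assuming you pulled codellama:7b in Step 1 (the titles are arbitrary labels; swap in whatever models you actually have):

```json
{
  "models": [
    {
      "title": "DeepSeek Coder v2 (Local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama Autocomplete",
    "provider": "ollama",
    "model": "codellama:7b",
    "apiBase": "http://localhost:11434"
  }
}
```

The 7B model keeps inline completions snappy, while chat requests (which you wait on anyway) get the stronger model.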
How Does It Compare to Copilot?
Honestly? For autocomplete, Copilot is still faster and more accurate — GitHub has the training data advantage. But the gap has narrowed dramatically. DeepSeek Coder v2 handles 80-90% of what Copilot suggests, and for the rest, the chat workflow compensates.
What you gain:
- Zero data leaves your machine. Your proprietary code stays private.
- No subscription. One-time hardware cost, then it's free forever.
- Works offline. Airplane mode? Still coding with AI.
- No rate limits. Hit it as hard as you want.
- Full control. Swap models, tweak parameters, add custom prompts.
What's Next
Once this is running, you can extend it:
- Add context providers in Continue to index your codebase (docs, Git history, file tree)
- Set up a reverse proxy with auth if you want to expose it outside your LAN
- Try Qwen2.5-Coder — another strong option that's gaining ground fast
- Connect the same Ollama instance to other tools (n8n, Open WebUI, your own scripts)
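On that last point, here's a minimal sketch of calling the same Ollama endpoint from your own script, using only the Python standard library. The URL and model name are the ones assumed throughout this guide; adjust to your setup:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # same endpoint Continue talks to


def build_payload(model: str, prompt: str) -> dict:
    # Body for Ollama's /api/generate; stream=False returns one JSON object
    # instead of a stream of partial responses.
    return {"model": model, "prompt": prompt, "stream": False}


def ask(model: str, prompt: str) -> str:
    # POST the payload and return the generated text from the "response" field.
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (with Ollama running):
#   print(ask("deepseek-coder-v2:16b", "Explain list comprehensions in one sentence."))
```

Point `OLLAMA_URL` at your networked server from Step 5 and the same script works from any machine on your LAN.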
The AI code assistant market is a $19/month tax on developers who don't know they can run the same thing locally. Now you know. Set it up once, and every keystroke stays yours.
Building things that run on your hardware? Follow @signal-weekly for practical homelab + AI guides every week.