DEV Community

SIGNAL


Self-Host Your AI Code Assistant With Continue.dev + Ollama — VS Code Copilot Without the Subscription

You're paying $19/month for GitHub Copilot. Your code is leaving your machine, hitting someone else's servers, and coming back as suggestions. It works. But you could also run the same workflow locally — for free, with full privacy, on hardware you probably already own.

This guide sets up Continue.dev with Ollama so you get AI code completion, chat, and refactoring directly in VS Code — no API keys, no subscriptions, no data leaving your network.

What You Need

  • A machine with 16GB+ RAM (Mac mini M-series is ideal, but any modern desktop works)
  • VS Code or a fork (Cursor users: you already have this built in, but keep reading for the self-hosted angle)
  • Docker (optional, for running Ollama in a container)
  • 10 minutes

Step 1: Install Ollama

Ollama makes running local LLMs trivially simple. One binary, one command.

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or with Docker
docker run -d --name ollama \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:latest

Pull a model that's good at code. For 16GB machines, deepseek-coder-v2:16b hits the sweet spot between quality and speed:

# Primary code model
ollama pull deepseek-coder-v2:16b

# Smaller alternative if RAM is tight
ollama pull codellama:7b

# For chat/explanations (optional)
ollama pull llama3.1:8b

Verify it works:

ollama run deepseek-coder-v2:16b "Write a Python function that retries HTTP requests with exponential backoff"

You should get a working function back in a few seconds.
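If you want to sanity-check what the model hands back, a correct answer should have roughly this shape. This is a minimal sketch, not the model's literal output — the function and parameter names (`retry_with_backoff`, `base_delay`) are mine, and the retry logic is deliberately transport-agnostic so you can wrap any request call:

```python
import time

def retry_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure, wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Example: wrap an HTTP GET so transient network errors are retried.
# import urllib.request
# page = retry_with_backoff(lambda: urllib.request.urlopen("https://example.com").read())
```

If the model's answer is close to this — a loop, a growing delay, a re-raise when retries run out — your setup is working.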

Step 2: Install Continue.dev

Continue is an open-source AI code assistant that plugs into VS Code. It supports any OpenAI-compatible API — which Ollama exposes out of the box.

  1. Open VS Code
  2. Extensions → Search "Continue" → Install
  3. You'll see a new sidebar icon (the Continue logo)

Or from the terminal:

code --install-extension continue.continue

Step 3: Configure Continue for Ollama

Open Continue's config. Press Cmd+Shift+P (or Ctrl+Shift+P) → "Continue: Open Config File". Replace or merge with:

{
  "models": [
    {
      "title": "DeepSeek Coder v2 (Local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Llama 3.1 Chat (Local)",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Autocomplete",
    "provider": "ollama",
    "model": "deepseek-coder-v2:16b",
    "apiBase": "http://localhost:11434"
  },
  "allowAnonymousTelemetry": false
}

The key settings:

  • models: What shows up in Continue's chat sidebar. You can add multiple and switch between them.
  • tabAutocompleteModel: The model that powers inline completions (the Copilot-like experience).
  • allowAnonymousTelemetry: Disable it. You're self-hosting for privacy — act like it.

Step 4: Use It Like Copilot

Once configured, you get three workflows:

Inline Completions

Just type. Continue will suggest completions as you code, exactly like Copilot. Press Tab to accept.

Chat

Open the Continue sidebar and ask questions about your code:

Explain what this function does and suggest improvements

You can highlight code, right-click → "Continue: Add to Chat" to give it context.

Refactoring

Highlight a block of code, press Cmd+I, and type what you want:

Refactor this to use async/await instead of callbacks

Continue rewrites the selection in-place with a diff view.
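To make the ask concrete, here's a toy version of that callbacks-to-async transformation (Python, with hypothetical names — your real refactor will be over your own code, and the diff view lets you reject anything the model gets wrong):

```python
import asyncio

# Before: callback style. The caller hands over a function to receive the result.
def load_user(user_id, on_done):
    on_done({"id": user_id, "name": "Ada"})

# After: async/await style. The result flows back directly to an awaiting caller.
async def load_user_async(user_id):
    return {"id": user_id, "name": "Ada"}

user = asyncio.run(load_user_async(1))
```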

Step 5: Serve It Across Your Network

If Ollama runs on a dedicated machine (like a Mac mini), expose it to your local network:

# Set Ollama to listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve

Or in Docker:

docker run -d --name ollama \
  -p 0.0.0.0:11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:latest

Then update Continue's config on your other machines to point at the server:

"apiBase": "http://192.168.1.100:11434"

Now every machine in your house gets AI code assistance, powered by one box. No internet required.
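Before editing configs on every machine, confirm the clients can actually reach the server. Ollama's `/api/tags` endpoint lists the models a server has pulled; a small check might look like this (the IP is an example from this guide, and the helper names are mine):

```python
import json
from urllib.request import urlopen

def model_names(tags_json):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_json.get("models", [])]

def list_remote_models(api_base="http://192.168.1.100:11434"):
    """Ask a (possibly remote) Ollama server which models it has available."""
    with urlopen(f"{api_base}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

If this raises a connection error, check `OLLAMA_HOST` on the server and any firewall between the machines before blaming Continue.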

Performance Tips

Model selection matters more than hardware. A well-quantized 16B model on Apple Silicon will outperform a 70B model struggling on insufficient VRAM.

# Check what's loaded and how much memory it uses
ollama ps

# Pre-load your coding model so first completion is fast
curl http://localhost:11434/api/generate \
  -d '{"model": "deepseek-coder-v2:16b", "keep_alive": "24h"}'

Keep the model warm. By default, Ollama unloads models after 5 minutes of inactivity. The keep_alive parameter above keeps it in memory for 24 hours — instant completions all day.

Use different models for different tasks. Fast 7B model for autocomplete, larger 16B for chat and complex refactoring. Continue lets you configure this separately.
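Applied to the config from Step 3, that split might look like this (assuming you've pulled `codellama:7b` as the fast autocomplete model — adjust to whatever pair fits your RAM):

```json
{
  "models": [
    {
      "title": "DeepSeek Coder v2 (Local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama Autocomplete",
    "provider": "ollama",
    "model": "codellama:7b",
    "apiBase": "http://localhost:11434"
  }
}
```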

How Does It Compare to Copilot?

Honestly? For autocomplete, Copilot is still faster and more accurate — GitHub has the training data advantage. But the gap has narrowed dramatically. DeepSeek Coder v2 handles 80-90% of what Copilot suggests, and for the rest, the chat workflow compensates.

What you gain:

  • Zero data leaves your machine. Your proprietary code stays private.
  • No subscription. One-time hardware cost, then it's free forever.
  • Works offline. Airplane mode? Still coding with AI.
  • No rate limits. Hit it as hard as you want.
  • Full control. Swap models, tweak parameters, add custom prompts.

What's Next

Once this is running, you can extend it:

  • Add context providers in Continue to index your codebase (docs, Git history, file tree)
  • Set up a reverse proxy with auth if you want to expose it outside your LAN
  • Try Qwen2.5-Coder — another strong option that's gaining ground fast
  • Connect the same Ollama instance to other tools (n8n, Open WebUI, your own scripts)

The AI code assistant market is a $19/month tax on developers who don't know they can run the same thing locally. Now you know. Set it up once, and every keystroke stays yours.

Building things that run on your hardware? Follow @signal-weekly for practical homelab + AI guides every week.
