Zepher Ashe
Running Local AI (Self-hosted) Coding Assistants in VS Code with Ollama and GitHub Copilot

Introduction

GitHub Copilot now supports Bring Your Own Key (BYOK), allowing developers to connect local or self-hosted AI models directly to VS Code.

This means you can run a coding assistant entirely on your own hardware with tools like Ollama, without sending prompts to external providers.


Architecture

VS Code
   ↓
GitHub Copilot Chat (BYOK)
   ↓
Ollama API
   ↓
Local LLM

Example:

VS Code → localhost:11434 → Qwen2.5-Coder

Step 0 - Prerequisites

  • VS Code with the GitHub Copilot Chat extension and a Copilot plan that supports BYOK
  • A machine to run Ollama (this guide uses Linux)
  • Enough RAM/VRAM for your chosen model (see the table in Step 2)

Step 1 - Install Ollama

Linux

Official:

curl -fsSL https://ollama.com/install.sh | sh

Reference: https://docs.ollama.com/integrations/vscode


Verify

ollama --version

Start service:

sudo systemctl enable --now ollama

Check service:

systemctl status ollama

Step 2 - Pull a Model

Best Starting Models

Model             | Approx. VRAM                  | Recommended Hardware | Notes
qwen2.5-coder:14b | 10–12 GB                      | RTX 3090 / 4090      | Best balance
qwen2.5-coder:32b | 24 GB+                        | RTX 4090 / A5000     | Excellent coding performance
deepseek-coder-v2 | 24 GB+                        | RTX 4090 / A5000     | Strong reasoning
phi4              | CPU friendly                  | Modern x86 CPU       | Lightweight
phi4-mini         | CPU friendly (~3 GB RAM free) | Modern x86 CPU       | Lightweight

Note: Hardware requirements vary depending on quantisation level and context size.

  • LLMs can run on CPU-only systems, but response latency may increase significantly depending on model size and quantisation.
  • For practical coding assistance, GPU acceleration is strongly recommended for models larger than 7B–14B parameters.
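Before choosing a model, it can help to check how much memory is actually available. A quick sketch, assuming an NVIDIA GPU for the VRAM check (skip it on CPU-only systems):

# Available system RAM (relevant for the CPU-friendly models such as phi4-mini)
free -h

# Available GPU VRAM (NVIDIA only)
nvidia-smi --query-gpu=memory.total,memory.used --format=csv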

Step 3 - Test the Model

Run:

ollama pull phi4-mini
ollama run phi4-mini

You should get an interactive prompt.
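You can also send a one-off prompt without staying in the interactive session (type /bye to leave it):

ollama run phi4-mini "Write a Python function that reverses a string"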


Step 4 - Confirm API Endpoint

Ollama automatically exposes:

http://localhost:11434

Test:

curl http://localhost:11434/api/tags

You should see JSON listing models.
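You can also run a quick test generation against the API. A minimal sketch using the /api/generate endpoint (assumes the phi4-mini model from Step 3 is installed):

curl http://localhost:11434/api/generate -d '{
  "model": "phi4-mini",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'

The response should be JSON containing the generated text.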


Step 5 - Configure VS Code

Open:

Ctrl+Shift+P 
(macOS: Command+Shift+P)

Run:

Chat: Manage Language Models

Then:

Add Models → Ollama

NOTE: the model you add must show “tools” in its capabilities; otherwise GitHub Copilot cannot select it.

VS Code should auto-detect:

http://localhost:11434

Or, if using a remote Ollama server (not running locally), enter its URL instead (see Step 8):

https://ollama.internal.domain:443

VS Code Language Models

Reference: https://docs.ollama.com/integrations/vscode


Step 6 - Select Your Model

Inside Copilot Chat, set the session target to your local model:

  • Open the model picker
  • Under the Ollama section (“other models”), choose:

    • qwen2.5-coder
    • etc…

Now your prompts go to your local model.

GitHub Copilot Models


Important Limitation

This currently applies mainly to:

  • ✅ Chat
  • ✅ Agent mode
  • ✅ AI interactions

But NOT fully to:

  • ❌ inline autocomplete

Microsoft documents this explicitly.

https://code.visualstudio.com/docs/copilot/customization/language-models


Homelab Setup

Small Setup

High-level overview

Mini PC / NUC
Debian Server
Ollama
Qwen2.5-Coder 14B
Tailscale
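A minimal sketch of joining the node to a tailnet so VS Code can reach it privately (assumes an existing Tailscale account; the install script is the official one):

# Install Tailscale on the Debian server
curl -fsSL https://tailscale.com/install.sh | sh

# Join your tailnet; the node becomes reachable via its Tailscale IP or MagicDNS name
sudo tailscale up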


Step 7 - Remote Access (Optional)

If your inference server is elsewhere (not hosted locally):

Example:

http://ai-node:11434

or

https://ollama.internal.domain

Then configure Ollama to listen on an address that VS Code can reach.
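By default, Ollama listens only on 127.0.0.1. One way to change that on the systemd-based install from Step 1 is to override the OLLAMA_HOST environment variable; a sketch, to be combined with the security measures below:

sudo systemctl edit ollama

Add to the override file:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Then restart the service:

sudo systemctl restart ollama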


SECURITY (VERY IMPORTANT)

Do NOT expose Ollama publicly.

Bad:

0.0.0.0:11434

without auth/firewall.

Use:

  • Tailscale
  • WireGuard
  • reverse proxy auth
  • firewall ACLs

There are already reports of exposed Ollama servers online.
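As one example of a firewall ACL, a sketch using ufw that only allows the Ollama port from Tailscale's address range (100.64.0.0/10 is the CGNAT space Tailscale uses; adjust for your own network):

# Allow the Ollama API only from Tailscale addresses, block it for everyone else
sudo ufw allow from 100.64.0.0/10 to any port 11434 proto tcp
sudo ufw deny 11434/tcp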


Step 8 - Configure Nginx Reverse Proxy (Optional)

You can skip this step if you are only testing locally.

If you plan to access Ollama remotely, it is recommended to place it behind a reverse proxy rather than exposing the API directly.

VS Code
   ↓
HTTPS
   ↓
Nginx Reverse Proxy
   ↓
Ollama API
   ↓
Local Model

Install Nginx

Debian/Ubuntu:

sudo apt update
sudo apt install nginx -y

RHEL/AlmaLinux:

sudo dnf install nginx -y

Enable and start the service:

sudo systemctl enable --now nginx

Verify:

systemctl status nginx

Create Reverse Proxy Configuration

Create a new Nginx site configuration:

sudo vim /etc/nginx/conf.d/ollama.conf

Example configuration:

This step uses HTTP for testing (not recommended for production)

server {
    listen 80;
    server_name ollama.internal.domain;

    location / {
        proxy_pass http://127.0.0.1:11434;

        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Test configuration:

sudo nginx -t

Reload Nginx:

sudo systemctl reload nginx

Verify Reverse Proxy

Test locally:

curl http://localhost/api/tags

Or remotely (DNS must resolve for this to work):

curl http://ollama.internal.domain/api/tags

You should receive JSON output listing available models.
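To move from HTTP to HTTPS, one option is Let's Encrypt with certbot on Debian/Ubuntu, assuming ollama.internal.domain is resolvable and reachable for the HTTP challenge (for purely internal names, use a DNS-01 challenge or an internal CA instead):

sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d ollama.internal.domain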


Step 9 - Testing

Inside VS Code Copilot Chat:

  1. Select the Ollama model
  2. Ensure the session target is set to Local
  3. Open a source file
  4. Highlight a small code block
  5. Test prompts such as:
Explain this function

or:

Suggest improvements

If successful:

  • responses should come from the local model
  • Ollama logs will show /v1/chat/completions
  • no external provider API keys are required

Monitor logs:

journalctl -u ollama -f

Example successful request:

POST "/v1/chat/completions"

References

  • VS Code Language Models Documentation: https://code.visualstudio.com/docs/copilot/customization/language-models
  • Ollama VS Code Integration Docs: https://docs.ollama.com/integrations/vscode
  • GitHub Copilot BYOK Documentation
  • Expanding Model Choice in VS Code with BYOK (announcement)
  • vLLM GitHub Repository: https://github.com/vllm-project/vllm
  • Ollama: https://ollama.com
