Introduction
GitHub Copilot now supports Bring Your Own Key (BYOK), allowing developers to connect local or self-hosted AI models directly to VS Code.
This means you can run a coding assistant entirely on your own hardware with tools like Ollama, without sending prompts to external providers.
- Architecture
- Step 0 - Prerequisites
- Step 1 - Install Ollama
- Step 2 - Pull a Model
- Step 3 - Test the Model
- Step 4 - Confirm API Endpoint
- Step 5 - Configure VS Code
- Step 6 - Select Your Model
- Step 7 - Remote Access (Optional)
- Step 8 - Configure Nginx Reverse Proxy (Optional)
- Step 9 - Testing
Architecture
VS Code
↓
GitHub Copilot Chat (BYOK)
↓
Ollama API
↓
Local LLM
Example:
VS Code → localhost:11434 → Qwen2.5-Coder
Step 0 - Prerequisites
- Ollama v0.18.3+
- VS Code 1.113+
- GitHub Copilot Chat extension 0.41.0+
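You can check the VS Code and extension versions from a terminal before starting (assuming the code CLI launcher is on your PATH); Step 1 covers installing and verifying Ollama itself:
# Print the VS Code version
code --version
# List installed extensions with versions and filter for Copilot
code --list-extensions --show-versions | grep -i copilot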
Step 1 - Install Ollama
Linux
Official:
curl -fsSL https://ollama.com/install.sh | sh
Reference: https://docs.ollama.com/integrations/vscode
Verify:
ollama --version
Start service:
sudo systemctl enable --now ollama
Check service:
systemctl status ollama
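As an optional sanity check, you can confirm the service is listening on its default port; on recent versions the root endpoint typically replies with a short "Ollama is running" message:
# Confirm something is listening on the default Ollama port (11434)
ss -ltn | grep 11434
# The root endpoint typically answers with a short status message
curl http://localhost:11434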
Step 2 - Pull a Model
Best Starting Models
| Model | Approx VRAM | Recommended Hardware | Notes |
|---|---|---|---|
| qwen2.5-coder:14b | 10–12GB | RTX 3090 / 4090 | Best balance |
| qwen2.5-coder:32b | 24GB+ | RTX 4090 / A5000 | Excellent coding performance |
| deepseek-coder-v2 | 24GB+ | RTX 4090 / A5000 | Strong reasoning |
| phi4 | CPU friendly | Modern x86 CPU | Lightweight |
| phi4-mini | CPU friendly (~3GB RAM free) | Modern x86 CPU | Lightweight |
Note: Hardware requirements vary depending on quantisation level and context size.
- LLMs can run on CPU-only systems, but response latency may increase significantly depending on model size and quantisation.
- For practical coding assistance, GPU acceleration is strongly recommended for models larger than 7B–14B parameters.
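For example, to pull the 14B coder model from the table and see how much memory it actually uses once loaded, you can combine ollama pull with ollama ps (the latter reports the loaded size and whether the model runs on CPU or GPU):
# Pull a larger coding model (actual size varies with quantisation)
ollama pull qwen2.5-coder:14b
# After sending it a prompt, check loaded size and CPU/GPU placement
ollama ps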
Step 3 - Test the Model
Run:
ollama pull phi4-mini
ollama run phi4-mini
You should get an interactive prompt.
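You can also pass a single prompt non-interactively, which is handy as a quick smoke test (the prompt text here is just an example):
# One-shot prompt; prints the response and exits
ollama run phi4-mini "Write a one-line bash command that counts files in the current directory"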
Step 4 - Confirm API Endpoint
Ollama automatically exposes:
http://localhost:11434
Test:
curl http://localhost:11434/api/tags
You should see JSON listing models.
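Beyond listing models, you can confirm that generation works end to end through the API (assuming phi4-mini from Step 3 is pulled):
# Ask the native /api/generate endpoint for a short, non-streamed reply
curl http://localhost:11434/api/generate -d '{
  "model": "phi4-mini",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'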
Step 5 - Configure VS Code
Open:
Ctrl+Shift+P
(macOS: Command+Shift+P)
Run:
Chat: Manage Language Models
Then:
Add Models → Ollama
NOTE: the model listed here must show "tools" support;
otherwise GitHub Copilot cannot select it (see the quick check at the end of this step).
VS Code should auto-detect:
http://localhost:11434
Or, if you are using a remote Ollama server (not running locally), enter its URL instead - see Steps 7 and 8:
https://ollama.internal.domain:443
Reference: https://docs.ollama.com/integrations/vscode
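To check from the terminal whether a pulled model advertises tool support (relevant to the note above), recent Ollama releases include a capabilities section in ollama show; treat this as a quick sanity check rather than an official Copilot requirement listing:
# Inspect model metadata; look for "tools" in the Capabilities section
ollama show qwen2.5-coder:14b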
Step 6 - Select Your Model
Inside Copilot Chat, set the session target to Local:
- Open the model picker
- Choose Ollama under the other models
- Select qwen2.5-coder (or whichever model you pulled)
Now your prompts go to your local model.
Important Limitation
This currently applies mainly to:
- ✅ Chat
- ✅ Agent mode
- ✅ AI interactions
But NOT fully to:
- ❌ inline autocomplete
Microsoft documents this explicitly.
https://code.visualstudio.com/docs/copilot/customization/language-models
Homelab Setup
Small Setup
High-level overview:
- Mini PC / NUC
- Debian Server
- Ollama
- Qwen2.5-Coder 14B
- Tailscale
Step 7 - Remote Access (Optional)
If your inference server is elsewhere (not hosted locally):
Example:
http://ai-node:11434
or
https://ollama.internal.domain
Then point the VS Code Ollama endpoint at that address, and configure Ollama on the server to listen on a reachable interface (see the sketch after the security notes below).
SECURITY (VERY IMPORTANT)
Do NOT expose Ollama publicly.
Bad:
0.0.0.0:11434
without auth/firewall.
Use:
- Tailscale
- WireGuard
- reverse proxy auth
- firewall ACLs
There are already reports of exposed Ollama servers online.
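By default Ollama listens only on 127.0.0.1. If a remote VS Code client needs to reach it directly (rather than through the reverse proxy in Step 8), the documented way to change the bind address is the OLLAMA_HOST environment variable. A minimal sketch, assuming a systemd-based install and a hypothetical Tailscale address of 100.64.0.10 (adjust to your own tailnet IP):
# Open an override file for the ollama service
sudo systemctl edit ollama
# Add these lines in the editor, then save:
# [Service]
# Environment="OLLAMA_HOST=100.64.0.10:11434"
# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama
Binding to the tailnet address keeps the API reachable over Tailscale without exposing it on public interfaces.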
Step 8 - Configure Nginx Reverse Proxy (Optional)
You can skip this step if you are only testing locally.
If you plan to access Ollama remotely, it is recommended to place it behind a reverse proxy rather than exposing the API directly.
VS Code
↓
HTTPS
↓
Nginx Reverse Proxy
↓
Ollama API
↓
Local Model
Install Nginx
Debian/Ubuntu:
sudo apt update
sudo apt install nginx -y
RHEL/AlmaLinux:
sudo dnf install nginx -y
Enable and start the service:
sudo systemctl enable --now nginx
Verify:
systemctl status nginx
Create Reverse Proxy Configuration
Create a new Nginx site configuration:
sudo vim /etc/nginx/conf.d/ollama.conf
Example configuration:
This step uses HTTP for testing (not recommended for production)
server {
    listen 80;
    server_name ollama.internal.domain;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Test configuration:
sudo nginx -t
Reload Nginx:
sudo systemctl reload nginx
Verify Reverse Proxy
Test locally:
curl http://localhost/api/tags
Or remotely (DNS must resolve for this to work):
curl http://ollama.internal.domain/api/tags
You should receive JSON output listing available models.
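If internal DNS is not set up yet, you can still exercise the proxy by sending the expected Host header to the server's IP address (192.168.1.50 here is a placeholder):
# Hit the proxy by IP while presenting the configured server_name
curl -H "Host: ollama.internal.domain" http://192.168.1.50/api/tags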
Step 9 - Testing
Inside VS Code Copilot Chat:
- Select the Ollama model
- Ensure the session target is set to Local
- Open a source file
- Highlight a small code block
- Test prompts such as:
Explain this function
or:
Suggest improvements
If successful:
- responses should come from the local model
- Ollama logs will show /v1/chat/completions
- no external provider API keys are required
Monitor logs:
journalctl -u ollama -f
Example successful request:
POST "/v1/chat/completions"
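To reproduce the same kind of call Copilot makes, without VS Code in the loop, you can hit Ollama's OpenAI-compatible endpoint directly (model name as pulled earlier):
# Minimal chat completion against the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:14b",
    "messages": [{"role": "user", "content": "Explain what a mutex is in one sentence."}]
  }'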
References
- Official VS Code Docs: VS Code Language Models Documentation (https://code.visualstudio.com/docs/copilot/customization/language-models)
- Official Ollama VS Code Integration: Ollama VS Code Integration Docs (https://docs.ollama.com/integrations/vscode)
- GitHub Copilot BYOK Docs: GitHub Copilot BYOK Documentation
- VS Code BYOK Announcement: Expanding Model Choice in VS Code with BYOK

