Introduction
GitHub Copilot now supports Bring Your Own Key (BYOK), allowing developers to connect local or self-hosted AI models directly to VS Code.
This means you can run a coding assistant entirely on your own hardware with tools like Ollama, without sending prompts to external providers.
- Architecture
- Step 0 - Prerequisites
- Step 1 - Install Ollama
- Step 2 - Pull a Model
- Step 3 - Test the Model
- Step 4 - Confirm API Endpoint
- Step 5 - Configure VS Code
- Step 6 - Select Your Model
- Step 7 - Remote Access (Optional)
- Step 8 - Configure Nginx Reverse Proxy (Optional)
- Step 9 - Testing
Architecture
VS Code
↓
GitHub Copilot Chat (BYOK)
↓
Ollama API
↓
Local LLM
Example:
VS Code → localhost:11434 → Qwen2.5-Coder
Step 0 - Prerequisites
- Ollama v0.18.3+
- VS Code 1.113+
- GitHub Copilot Chat extension 0.41.0+
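You can check the VS Code and extension versions from a terminal before starting (assuming the code CLI launcher is on your PATH); Step 1 covers installing and verifying Ollama itself:
# Print the VS Code version
code --version
# List installed extensions with versions and filter for Copilot
code --list-extensions --show-versions | grep -i copilot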
Step 1 - Install Ollama
Linux
Official:
curl -fsSL https://ollama.com/install.sh | sh
Reference: https://docs.ollama.com/integrations/vscode
Verify:
ollama --version
Start service:
sudo systemctl enable --now ollama
Check service:
systemctl status ollama
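As an optional sanity check, you can confirm the service is listening on its default port; on recent versions the root endpoint typically replies with a short "Ollama is running" message:
# Confirm something is listening on the default Ollama port (11434)
ss -ltn | grep 11434
# The root endpoint typically answers with a short status message
curl http://localhost:11434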
Step 2 - Pull a Model
Best Starting Models
| Model | Approx VRAM | Recommended Hardware | Notes |
|---|---|---|---|
| qwen2.5-coder:14b | 10–12GB | RTX 3090 / 4090 | Best balance |
| qwen2.5-coder:32b | 24GB+ | RTX 4090 / A5000 | Excellent coding performance |
| deepseek-coder-v2 | 24GB+ | RTX 4090 / A5000 | Strong reasoning |
| phi4 | CPU friendly | Modern x86 CPU | Lightweight |
| phi4-mini | CPU friendly (~3GB RAM free) | Modern x86 CPU | Lightweight |
Note: Hardware requirements vary depending on quantisation level and context size.
- LLMs can run on CPU-only systems, but response latency may increase significantly depending on model size and quantisation.
- For practical coding assistance, GPU acceleration is strongly recommended for models larger than 7B–14B parameters.
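For example, to pull the 14B coder model from the table and see how much memory it actually uses once loaded, you can combine ollama pull with ollama ps (the latter reports the loaded size and whether the model runs on CPU or GPU):
# Pull a larger coding model (actual size varies with quantisation)
ollama pull qwen2.5-coder:14b
# After sending it a prompt, check loaded size and CPU/GPU placement
ollama ps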
Step 3 - Test the Model
Run:
ollama pull phi4-mini
ollama run phi4-mini
You should get an interactive prompt.
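You can also pass a single prompt non-interactively, which is handy as a quick smoke test (the prompt text here is just an example):
# One-shot prompt; prints the response and exits
ollama run phi4-mini "Write a one-line bash command that counts files in the current directory"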
Step 4 - Confirm API Endpoint
Ollama automatically exposes:
http://localhost:11434
Test:
curl http://localhost:11434/api/tags
You should see JSON listing models.
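Beyond listing models, you can confirm that generation works end to end through the API (assuming phi4-mini from Step 3 is pulled):
# Ask the native /api/generate endpoint for a short, non-streamed reply
curl http://localhost:11434/api/generate -d '{
  "model": "phi4-mini",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'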
Step 5 - Configure VS Code
Open:
Ctrl+Shift+P
(macOS: Command+Shift+P)
Run:
Chat: Manage Language Models
Then:
Add Models → Ollama
NOTE: the model listed here must show "tools" support;
otherwise GitHub Copilot cannot select it (see the quick check at the end of this step).
VS Code should auto-detect:
http://localhost:11434
Or, if you are using a remote Ollama server (not running locally), enter its URL instead - see Steps 7 and 8:
https://ollama.internal.domain:443
Reference: https://docs.ollama.com/integrations/vscode
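To check from the terminal whether a pulled model advertises tool support (relevant to the note above), recent Ollama releases include a capabilities section in ollama show; treat this as a quick sanity check rather than an official Copilot requirement listing:
# Inspect model metadata; look for "tools" in the Capabilities section
ollama show qwen2.5-coder:14b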
Step 6 - Select Your Model
Inside Copilot Chat, set the session target to Local:
- Open the model picker
- Choose Ollama under the other models
- Select qwen2.5-coder (or whichever model you pulled)
Now your prompts go to your local model.
Important Limitation
This currently applies mainly to:
- ✅ Chat
- ✅ Agent mode
- ✅ AI interactions
But NOT fully to:
- ❌ inline autocomplete
Microsoft documents this explicitly.
https://code.visualstudio.com/docs/copilot/customization/language-models
Homelab Setup
Small Setup
High-level overview:
- Mini PC / NUC
- Debian Server
- Ollama
- Qwen2.5-Coder 14B
- Tailscale
Step 7 - Remote Access (Optional)
If your inference server is elsewhere (not hosted locally):
Example:
http://ai-node:11434
or
https://ollama.internal.domain
Then point the VS Code Ollama endpoint at that address, and configure Ollama on the server to listen on a reachable interface (see the sketch after the security notes below).
SECURITY (VERY IMPORTANT)
Do NOT expose Ollama publicly.
Bad:
0.0.0.0:11434
without auth/firewall.
Use:
- Tailscale
- WireGuard
- reverse proxy auth
- firewall ACLs
There are already reports of exposed Ollama servers online.
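By default Ollama listens only on 127.0.0.1. If a remote VS Code client needs to reach it directly (rather than through the reverse proxy in Step 8), the documented way to change the bind address is the OLLAMA_HOST environment variable. A minimal sketch, assuming a systemd-based install and a hypothetical Tailscale address of 100.64.0.10 (adjust to your own tailnet IP):
# Open an override file for the ollama service
sudo systemctl edit ollama
# Add these lines in the editor, then save:
# [Service]
# Environment="OLLAMA_HOST=100.64.0.10:11434"
# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama
Binding to the tailnet address keeps the API reachable over Tailscale without exposing it on public interfaces.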
Step 8 - Configure Nginx Reverse Proxy (Optional)
You can skip this step if you are only testing locally.
If you plan to access Ollama remotely, it is recommended to place it behind a reverse proxy rather than exposing the API directly.
VS Code
↓
HTTPS
↓
Nginx Reverse Proxy
↓
Ollama API
↓
Local Model
Install Nginx
Debian/Ubuntu:
sudo apt update
sudo apt install nginx -y
RHEL/AlmaLinux:
sudo dnf install nginx -y
Enable and start the service:
sudo systemctl enable --now nginx
Verify:
systemctl status nginx
Create Reverse Proxy Configuration
Create a new Nginx site configuration:
sudo vim /etc/nginx/conf.d/ollama.conf
Example configuration:
This step uses HTTP for testing (not recommended for production)
server {
    listen 80;
    server_name ollama.internal.domain;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Test configuration:
sudo nginx -t
Reload Nginx:
sudo systemctl reload nginx
Verify Reverse Proxy
Test locally:
curl http://localhost/api/tags
Or remotely (DNS must resolve for this to work):
curl http://ollama.internal.domain/api/tags
You should receive JSON output listing available models.
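If internal DNS is not set up yet, you can still exercise the proxy by sending the expected Host header to the server's IP address (192.168.1.50 here is a placeholder):
# Hit the proxy by IP while presenting the configured server_name
curl -H "Host: ollama.internal.domain" http://192.168.1.50/api/tags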
Step 9 - Testing
Inside VS Code Copilot Chat:
- Select the Ollama model
- Ensure the session target is set to Local
- Open a source file
- Highlight a small code block
- Test prompts such as:
Explain this function
or:
Suggest improvements
If successful:
- responses should come from the local model
- Ollama logs will show /v1/chat/completions
- no external provider API keys are required
Monitor logs:
journalctl -u ollama -f
Example successful request:
POST "/v1/chat/completions"
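To reproduce the same kind of call Copilot makes, without VS Code in the loop, you can hit Ollama's OpenAI-compatible endpoint directly (model name as pulled earlier):
# Minimal chat completion against the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:14b",
    "messages": [{"role": "user", "content": "Explain what a mutex is in one sentence."}]
  }'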
References
- Official VS Code Docs: VS Code Language Models Documentation (https://code.visualstudio.com/docs/copilot/customization/language-models)
- Official Ollama VS Code Integration: Ollama VS Code Integration Docs (https://docs.ollama.com/integrations/vscode)
- GitHub Copilot BYOK Docs: GitHub Copilot BYOK Documentation
- VS Code BYOK Announcement: Expanding Model Choice in VS Code with BYOK

