How to Use Any OAI-Compatible API with GitHub Copilot — Custom Model Setup Guide
TL;DR — GitHub Copilot now lets you point Chat (VS Code) and the Copilot CLI at any OpenAI-compatible endpoint. In VS Code, run Chat: Manage Language Models, pick the OpenAI Compatible provider, paste a base URL plus key. In Copilot CLI, export COPILOT_PROVIDER_BASE_URL, COPILOT_PROVIDER_API_KEY, and COPILOT_MODEL. Inline completions are unaffected — they still run on Copilot's own infra.
You don't need to leave Copilot to escape Copilot's model menu. Twenty seconds of env vars and your copilot CLI is talking to Claude Opus 4.6, GPT-5.4, or a local vLLM box — billed by the provider, not your Copilot quota.
What BYOK actually does in Copilot
BYOK (Bring Your Own Key) lets the Chat surface and the agent CLI use a model you authenticate to directly, instead of going through GitHub's hosted model pool. The wiring is narrow on purpose:
| Surface | BYOK supported? | Billing |
|---|---|---|
| VS Code Chat / Agent mode | Yes | Your provider |
| Copilot CLI | Yes | Your provider |
| Inline code completions | No | Copilot subscription |
| Pull request summaries, code review | No | Copilot subscription |
The split exists because completions need single-digit-millisecond latency budgets that arbitrary endpoints can't promise. Chat and agents tolerate the round trip, so they got opened up first.
Setup in VS Code
The path was announced in October 2025 and has since landed in the stable channel for several providers (GA was confirmed in the April 2026 GitHub changelog). For the generic OpenAI-compatible flow:
- Open the Command Palette → Chat: Manage Language Models.
- Pick OpenAI Compatible from the provider list.
- Fill in the Base URL (must serve /chat/completions), the API key, and a Model ID that the provider exposes.
- Hit Add Model. The model now appears in the Copilot Chat model dropdown.
There are two JSON shapes worth knowing about. The legacy github.copilot.chat.customOAIModels object in settings.json still works in stable releases but is marked deprecated:
"github.copilot.chat.customOAIModels": {
  "anthropic/claude-opus-4.6": {
    "name": "Claude Opus 4.6 (via ofox)",
    "url": "https://api.ofox.ai/v1/chat/completions",
    "toolCalling": true,
    "vision": true,
    "maxInputTokens": 200000,
    "maxOutputTokens": 16000
  }
}
The replacement (currently Insiders-only) is the chatLanguageModels.json workspace file using the customendpoint vendor — note the array shape and the apiType selector that picks between OpenAI's chat-completions, OpenAI's responses, and Anthropic's messages protocol:
[
  {
    "name": "ofox.ai",
    "vendor": "customendpoint",
    "apiKey": "${OFOX_API_KEY}",
    "apiType": "chat-completions",
    "models": [
      {
        "id": "anthropic/claude-opus-4.6",
        "name": "Claude Opus 4.6",
        "url": "https://api.ofox.ai/v1/chat/completions",
        "toolCalling": true,
        "vision": true,
        "maxInputTokens": 200000,
        "maxOutputTokens": 16000
      }
    ]
  }
]
Capability flags (toolCalling, vision) matter. If the agent thinks the model doesn't support tools, it silently falls back to plain chat and your custom commands never fire.
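Because a missing flag fails silently, it can be worth linting an entry before pasting it into settings.json. The sketch below is hypothetical tooling, not part of Copilot: `lint_model_entry` and its warning messages are invented for illustration, and the field names are taken only from the example entries above. The rules it enforces are assumptions about what a sensible entry looks like, not documented validation behavior.

```python
from typing import Any

# Field names mirror the example entries in this article; the lint rules
# themselves are assumptions, not documented Copilot behavior.
CAPABILITY_FLAGS = ("toolCalling", "vision")
TOKEN_LIMITS = ("maxInputTokens", "maxOutputTokens")


def lint_model_entry(model_id: str, entry: dict[str, Any]) -> list[str]:
    """Return human-readable warnings for one custom model entry."""
    warnings: list[str] = []

    url = entry.get("url")
    if not isinstance(url, str) or not url.rstrip("/").endswith("/chat/completions"):
        warnings.append(f"{model_id}: url does not end in /chat/completions")

    for flag in CAPABILITY_FLAGS:
        # Require an explicit boolean so a missing flag is never a silent default.
        if not isinstance(entry.get(flag), bool):
            warnings.append(f"{model_id}: {flag} is not set to true or false")

    for limit in TOKEN_LIMITS:
        value = entry.get(limit)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            warnings.append(f"{model_id}: {limit} should be a positive integer")

    return warnings
```

Run it over each key/value pair of the customOAIModels object after loading the file with `json.load`; an empty list means the entry at least declares everything the examples above declare.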
Setup in Copilot CLI
The CLI's BYOK docs are the cleanest reference. Three environment variables, exported before launching copilot:
export COPILOT_PROVIDER_BASE_URL=https://api.ofox.ai/v1
export COPILOT_PROVIDER_API_KEY=$OFOX_API_KEY
export COPILOT_MODEL=anthropic/claude-opus-4.6
copilot
For a local Ollama box, drop the key entirely:
export COPILOT_PROVIDER_BASE_URL=http://localhost:11434
export COPILOT_MODEL=qwen2.5-coder:14b
copilot
The CLI talks the OpenAI Chat Completions protocol against whatever you point it at. If /v1/chat/completions resolves and the model ID is valid on that endpoint, it works.
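To make "talks the OpenAI Chat Completions protocol" concrete, here is a rough sketch of the kind of request those three variables describe. It is not the CLI's actual code: `build_chat_request` is a hypothetical helper, and it assumes the base URL already carries any version prefix such as /v1. How the CLI itself joins paths is not something this article confirms.

```python
import json
import urllib.request
from typing import Mapping


def build_chat_request(env: Mapping[str, str], prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request from BYOK-style env vars."""
    # Assumption: the base URL already includes any version prefix such as /v1.
    base_url = env["COPILOT_PROVIDER_BASE_URL"].rstrip("/")
    model = env["COPILOT_MODEL"]
    api_key = env.get("COPILOT_PROVIDER_API_KEY")  # may be absent for local servers

    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing `os.environ` as `env` and handing the result to `urllib.request.urlopen` sends the request. Keeping the build step separate makes it easy to print the URL and headers first and confirm they match what the provider expects.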
Worked example: ofox.ai as the endpoint
ofox.ai is a gateway that exposes OpenAI, Anthropic, Google, Alibaba and Moonshot models behind the OpenAI Chat Completions schema. That is useful for Copilot BYOK because you get Claude or Gemini in the Chat dropdown without juggling a separate SDK per vendor. The base URL is https://api.ofox.ai/v1 and the auth header is a standard Authorization: Bearer <key>.
A typical model ID set to expose to Copilot:
| Model ID (use as COPILOT_MODEL) | What it is |
|---|---|
| openai/gpt-5.4 | GPT-5.4 (general-purpose OpenAI tier) |
| anthropic/claude-opus-4.6 | Claude Opus 4.6 |
| google/gemini-3.1-pro-preview | Gemini 3.1 Pro preview |
| bailian/qwen3-max | Qwen3-Max |
| moonshotai/kimi-k2.6 | Kimi K2.6 |
Smoke test before pointing Copilot at it:
curl https://api.ofox.ai/v1/chat/completions \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"anthropic/claude-opus-4.6","messages":[{"role":"user","content":"ping"}]}'
If the response includes choices[0].message.content, the endpoint speaks the schema Copilot expects and the connection should work. If the request 404s on the model ID, fix the ID first: Copilot surfaces those errors as a generic "model unavailable" toast that masks the real cause. For deeper debugging of mismatched IDs and 404s, see Model Not Found errors troubleshooting.
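If you would rather script that check than eyeball the JSON, a small validator does the job. This is a minimal sketch: `extract_reply` is a hypothetical helper, and the top-level "error" key it looks for is a common convention among OpenAI-style servers rather than something every gateway is guaranteed to return.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return choices[0].message.content, or raise ValueError saying what is missing."""
    if "error" in response:
        # Many OpenAI-style servers report failures under a top-level "error" key.
        raise ValueError(f"endpoint returned an error: {response['error']}")

    choices = response.get("choices")
    if not isinstance(choices, list) or not choices:
        raise ValueError("response has no choices array")

    first = choices[0]
    message = first.get("message") if isinstance(first, dict) else None
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str):
        raise ValueError("choices[0].message.content is missing or not a string")
    return content
```

Feed it the parsed body of the curl call (for example via `json.load(sys.stdin)`); a ValueError tells you which part of the shape the endpoint got wrong before Copilot hides it behind a toast.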
For broader background on the gateway pattern — one key, many providers — see the OpenAI SDK migration guide and the pillar overview AI API aggregation: every model behind one endpoint.
Caveats worth knowing before you commit
- Authentication is static credentials only. BYOK accepts an API key or bearer token. There's no OAuth handshake, no service-account flow, no key rotation hook. Treat the key like any other long-lived secret — scope it, rotate it manually, and don't put it in a public repo.
- Telemetry still flows to GitHub. BYOK changes where the inference happens, not where the usage telemetry goes. Enterprise admins who need to move to a different model for compliance reasons should re-read the data-handling docs before assuming BYOK is sufficient.
- Rate limits become yours to manage. Copilot's quota stops protecting you; if your provider rate-limits you, the Chat panel will just stall. Watch your provider dashboard for the first week.
- Code completions remain on Copilot. Repeating this because it's the #1 misunderstanding: BYOK does not replace the inline ghost-text completions. Those still hit GitHub's hosted models.
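The static-credentials caveat is the one most easily enforced with a script. The sketch below walks a parsed config and flags any apiKey that is a literal string instead of a `${VAR}` placeholder like the one in the chatLanguageModels.json example earlier. `find_literal_keys` is a hypothetical helper, and treating `${VAR}` as the only safe form is an assumption drawn from that example, not from official documentation.

```python
import re
from typing import Any

# Matches a value that consists solely of a ${VAR_NAME} placeholder.
PLACEHOLDER = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def find_literal_keys(config: Any, path: str = "$") -> list[str]:
    """Return JSON paths of apiKey values that are not ${VAR} placeholders."""
    hits: list[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}"
            if key == "apiKey" and isinstance(value, str) and not PLACEHOLDER.match(value):
                hits.append(child)
            else:
                # Recurse into nested objects and arrays; scalars yield nothing.
                hits.extend(find_literal_keys(value, child))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            hits.extend(find_literal_keys(item, f"{path}[{index}]"))
    return hits
```

Wired into a pre-commit hook, a non-empty result is a reasonable signal to block the commit until the key moves into an environment variable.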
Comparing the IDE custom-API options
If you're choosing between Copilot BYOK and the equivalent feature in other editors, the surface area looks similar but the agent capabilities don't. The Cursor / Claude Code / Cline custom API setup guide walks through the same exercise for those three. Short version: Copilot's BYOK is the cleanest in-editor flow (it's a UI form), Claude Code gives you the most agent power per dollar when paired with ANTHROPIC_BASE_URL, and Cursor sits in between.
Troubleshooting
"Failed to fetch model list" — Your base URL is missing /v1 or your endpoint doesn't serve a GET /models route. The OpenAI-Compatible provider probes /models to populate the dropdown. If your gateway doesn't expose it, type the model ID manually in the form.
Chat hangs after first turn — Tool calling is enabled in Copilot but the model isn't returning the expected tool_calls payload shape. Either flip toolCalling: false in your customOAIModels entry, or switch to a model that fully implements the OpenAI tools spec.
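To tell whether the model or the flag is at fault, capture one assistant message from the endpoint and check it against the OpenAI function-calling shape: each entry in tool_calls carries an id, a type of "function", and a function object whose arguments field is a JSON-encoded string. The checker below is a hypothetical helper sketched for that purpose. It covers only function-type tool calls and says nothing about how Copilot itself parses the payload.

```python
import json
from typing import Any


def tool_call_problems(message: dict[str, Any]) -> list[str]:
    """List ways an assistant message's tool_calls deviate from the OpenAI shape."""
    calls = message.get("tool_calls")
    if calls is None:
        return []  # plain text reply, nothing to check
    if not isinstance(calls, list):
        return ["tool_calls is not a list"]

    problems: list[str] = []
    for i, call in enumerate(calls):
        if not isinstance(call, dict):
            problems.append(f"tool_calls[{i}] is not an object")
            continue
        if not isinstance(call.get("id"), str):
            problems.append(f"tool_calls[{i}].id is missing")
        if call.get("type") != "function":
            problems.append(f"tool_calls[{i}].type is not 'function'")
        function = call.get("function")
        if not isinstance(function, dict) or not isinstance(function.get("name"), str):
            problems.append(f"tool_calls[{i}].function.name is missing")
            continue
        arguments = function.get("arguments")
        # Arguments travel as a JSON-encoded string, not as a nested object.
        try:
            if not isinstance(arguments, str):
                raise ValueError("not a string")
            json.loads(arguments)  # JSONDecodeError is a ValueError subclass
        except ValueError:
            problems.append(f"tool_calls[{i}].function.arguments is not a JSON string")
    return problems
```

An empty list on a message that does contain tool calls suggests the hang lies elsewhere; a nested-object arguments field is one of the more common deviations among partially compatible servers.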
CLI says "context length exceeded" early — COPILOT_MODEL is set to an alias your provider remaps to a smaller-context variant. Use the canonical model ID from the provider's docs, not a shorthand.
Vision attachments silently dropped — Set vision: true on the model entry in settings.json. Without that flag, Copilot strips image parts from the multimodal payload before sending.
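For the "Failed to fetch model list" case, it helps to reproduce the /models probe yourself and see what comes back. The helpers below are a sketch under two assumptions: the route is formed by appending /models to the base URL, and the response follows the OpenAI list shape with a data array of objects that each have an id. Both function names are hypothetical.

```python
from typing import Any


def models_url(base_url: str) -> str:
    """Join a base URL such as https://host/v1 with the /models route."""
    return base_url.rstrip("/") + "/models"


def listed_model_ids(models_response: dict[str, Any]) -> list[str]:
    """Extract model IDs from an OpenAI-style list response: {"data": [{"id": ...}]}."""
    data = models_response.get("data")
    if not isinstance(data, list):
        raise ValueError("response has no data array; the endpoint may not serve /models")
    return [item["id"] for item in data if isinstance(item, dict) and "id" in item]
```

Fetch `models_url(base)` with the same Authorization header as the smoke test, parse the body, and check that your model ID appears in `listed_model_ids(...)`. If the call raises, typing the ID manually in the form is the workaround described above.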
The interesting thing about Copilot BYOK isn't that it lets you switch models — it's that it lets you switch vendors without leaving the editor. Copilot becomes a thin chat shell; the intelligence is rented from whoever's winning this month.
Originally published on ofox.ai/blog.