Laksmana Tri Moerdani

Posted on Jul 4

How to Use 30+ AI Models in GitHub Copilot Chat for Free

#ai #vscode #opensource #tutorial

Copilot Pro+ is $39 a month. The free tier caps you at 2,000 completions and two models. Want DeepSeek V4, Kimi K2.6, GLM 5.1, Qwen3.7 Max? Not in the default catalog.

There's a VS Code API that lets extensions register custom providers into the native model picker. I built one that plugs in 30+ models. No Copilot subscription required. Same Chat UI, same Agent Mode, just more models to pick from.

Why bother adding more models to Copilot Chat?

A few reasons I kept running into:

The free tier is thin. Two models, 2,000 completions. Fine for a weekend. Falls apart on a real project.
Pro+ locks the interesting models behind $39. Claude Opus, GPT-5.5 premium, Gemini 3.5 Pro. You pay the full monthly price even if you only need them occasionally.
Open-weight models got good. DeepSeek V4, Kimi K2.6, GLM 5.1, and Qwen3.7 can hold their own against frontier models for most coding tasks. They're cheap or free, but wiring each one into VS Code manually is a chore.
Different tasks want different models. Quick refactor vs. deep debugging vs. large-context review. One model doesn't fit all.

The extension sits in between. Free models when the task is simple. Pay-per-use when you need Claude Opus for one tricky prompt. Flat subscription when you're grinding through a refactor.

A few things to know before you start

You need GitHub Copilot Chat installed. The free version works. You don't need Pro, Pro+, or Max.
You need a local VS Code install, version 1.100 or newer.
API keys are stored in VS Code SecretStorage. They don't leave your machine.
Free models rotate. Big Pickle is always free. DeepSeek V4 Flash Free, MiMo V2.5 Free, and Nemotron rotate in and out. The paid models stay put.
The extension is MIT, independent. Not affiliated with GitHub, OpenCode, or any model provider.

Which plan fits your use case?

Three options, same extension. Pick based on how much you code.

Just testing, or light use. OpenCode Zen free models. $0. DeepSeek V4 Flash works at $0 balance. No card needed. Rate limits are low without balance, but enough to try the extension.

Daily coding. OpenCode Go subscription. $10 a month, $5 the first month. DeepSeek V4 Pro, Kimi K2.6, GLM 5.1, Qwen3.7 Max, MiMo V2.5 Pro, MiniMax M3. Generous limits across 5-hour, weekly, and monthly windows.

Need Claude, GPT, or Gemini occasionally. Zen pay-per-use. Add $20 balance. Prices are input/output per 1M tokens: Claude Opus 4.7 ($5/$25), GPT-5.5 ($5/$30), Gemini 3.5 Flash ($0.50/$3). You pay only for what you use. Adding balance also improves rate limits on the free models.

My honest take: start with the free tier. If you hit the rate limit more than twice a week, upgrade to Go. Add Zen balance only when you specifically need Claude or GPT for a task.
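
If you want to sanity-check the pay-per-use option before adding balance, the arithmetic is simple enough to script. This is a rough sketch, not anything the extension ships. It assumes the two numbers in each price pair are input and output prices per 1M tokens, and the workload (40 prompts of 20K tokens in, 2K tokens out) is made up for illustration.

```python
# Prices in USD per 1M tokens, copied from the plan list above.
# Assumption: the first number is the input price, the second the output price.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.5 Flash": (0.50, 3.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical month: 40 tricky prompts, each 20K tokens in and 2K tokens out.
monthly = 40 * request_cost("Claude Opus 4.7", 20_000, 2_000)
print(f"Estimated Claude Opus 4.7 spend: ${monthly:.2f}")
```

With those made-up numbers the month comes to $6.00. Swap in your own token counts to see where pay-per-use stops being cheaper than a flat plan.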

How to set it up

Takes about 60 seconds.

1. Install GitHub Copilot Chat

If you don't have it already:

code --install-extension GitHub.copilot-chat

2. Install OpenCode for Copilot Chat

From the marketplace: OpenCode for Copilot Chat

Or via command line:

code --install-extension ltmoerdani.opencode-copilot-chat

3. Get an OpenCode API key

Go to opencode.ai/auth, sign up, and copy your API key.

You don't need to add a payment method to start. The free models work at $0 balance.

4. Add the provider to Copilot Chat

Open Copilot Chat in VS Code (Cmd+Shift+I or Ctrl+Shift+I). Click the model name in the picker.

Model picker → "Add Models..." → OpenCode Zen → paste API key

Press Enter to accept the default group name. The models appear in the picker.

5. Pick a model and start chatting

Pick any OpenCode model from the dropdown. Start with DeepSeek V4 Flash Free if you want to test without paying anything.

That's it. Tool calling, Agent Mode, file edits, and terminal commands all work natively, because the extension translates the tool-call format for each endpoint.

What you actually get

Feature	Copilot Free	Copilot Pro+ $39/mo	OpenCode Extension
Models	2	Premium only	30+
Free model	No	No	DeepSeek V4 Flash
Reasoning controls	None	GitHub decides	You set per model
Agent Mode	No	Yes	Yes
Vision, PDF, Audio	Limited	Limited	Per-model
Provider	GitHub	GitHub	OpenCode Zen or Go

The models

Free (no payment needed):

Big Pickle (always free, 200K context)
DeepSeek V4 Flash Free (200K context)
MiMo V2.5 Free (rotating)
Nemotron 3 Super Free (rotating)

Go subscription, $10/mo ($5 first month):

DeepSeek V4 Pro (1M context, 384K output, reasoning off to max)
Kimi K2.6 (262K context, reasoning on/off)
GLM 5.1 (202K context)
Qwen3.7 Max (1M context, thinking budget 4,096 to 81,920 tokens)
MiMo V2.5 Pro (1M context)
MiniMax M3 (512K context)

Zen pay-per-use (add balance):

Claude Opus 4.7 ($5/$25 per 1M tokens)
Claude Sonnet 4.6 ($3/$15)
GPT-5.5 ($5/$30)
GPT-5.4 ($0.75 to $30 depending on variant)
Gemini 3.5 Flash ($0.50/$3)
Grok 4 (256K context)
Mistral Large, Llama 4 Maverick, Sonar Pro, Command R+

Thinking controls per model family

Each family has its own reasoning knob. You set it from the model picker, no config file.

DeepSeek: off / low / medium / high / max
Qwen: thinking_budget from 4,096 to 81,920 tokens
MiMo: low / medium / high
MiniMax: on / off
GLM and Kimi: on / off

I usually keep DeepSeek on high for debugging, medium for refactors, off for quick questions. Qwen's thinking_budget is handy when you want reasoning but need to cap token cost on a long session.
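
To make the per-family knobs concrete, here is a small sketch of how the allowed values above could be checked before a request goes out. It is not the extension's code. The function name `reasoning_setting` is hypothetical, and clamping an out-of-range Qwen budget (rather than rejecting it) is my own choice for the example.

```python
# Allowed reasoning settings per family, copied from the list above.
LEVELS = {
    "deepseek": ["off", "low", "medium", "high", "max"],
    "mimo": ["low", "medium", "high"],
    "minimax": ["on", "off"],
    "glm": ["on", "off"],
    "kimi": ["on", "off"],
}

QWEN_MIN_BUDGET = 4_096
QWEN_MAX_BUDGET = 81_920


def reasoning_setting(family: str, value):
    """Validate a reasoning setting and return it in normalized form.

    For Qwen, `value` is a token budget and is clamped into the allowed range.
    For every other family, `value` must be one of the listed levels.
    """
    family = family.lower()
    if family == "qwen":
        return max(QWEN_MIN_BUDGET, min(QWEN_MAX_BUDGET, int(value)))
    allowed = LEVELS.get(family)
    if allowed is None:
        raise ValueError(f"unknown model family: {family}")
    if value not in allowed:
        raise ValueError(f"{family} accepts {allowed}, got {value!r}")
    return value


print(reasoning_setting("DeepSeek", "high"))
print(reasoning_setting("Qwen", 200_000))
```

The second call asks for more than Qwen allows, so the sketch caps it at the 81,920-token ceiling.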

How the routing actually works

Different model families speak different protocols. The extension routes each one to its native endpoint.

GPT models        → OpenAI /responses
Gemini            → Google :streamGenerateContent?alt=sse
Claude, MiniMax   → Anthropic /messages
DeepSeek, Qwen,
  Kimi, GLM, MiMo → /chat/completions
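
The table above boils down to a lookup. Here's a minimal sketch of that idea, assuming families can be told apart by a lowercase name prefix. That matching rule and the function name `endpoint_for` are assumptions for illustration, not how the extension is actually written. Models that aren't in the diagram raise an error here because the diagram doesn't say where they go.

```python
# Routing table from the diagram above: name prefixes -> endpoint path.
# Assumption: a model's family can be read off the start of its name.
ROUTES = [
    (("gpt",), "/responses"),
    (("gemini",), ":streamGenerateContent?alt=sse"),
    (("claude", "minimax"), "/messages"),
    (("deepseek", "qwen", "kimi", "glm", "mimo"), "/chat/completions"),
]


def endpoint_for(model_name: str) -> str:
    """Return the endpoint path used for a given model name."""
    name = model_name.lower()
    for prefixes, endpoint in ROUTES:
        # str.startswith accepts a tuple and matches any of its entries.
        if name.startswith(prefixes):
            return endpoint
    raise ValueError(f"no route listed for {model_name!r}")


for model in ["GPT-5.5", "Claude Opus 4.7", "MiniMax M3", "MiMo V2.5 Pro"]:
    print(model, "->", endpoint_for(model))
```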

Tool-call format gets translated per endpoint. OpenAI uses tool_calls, Anthropic uses tool_use content blocks. Agent Mode (read files, edit, run terminal) keeps working because the translation happens in the streaming layer.
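
To show what "translated per endpoint" means in practice, here is a sketch that turns one OpenAI-style tool call into an Anthropic-style tool_use block. The two shapes follow the public API formats. The function name, the `read_file` tool, and the reused call ID are placeholders for illustration, and the extension's real streaming code will look different.

```python
import json


def openai_tool_call_to_anthropic(tool_call: dict) -> dict:
    """Convert one OpenAI-style tool call into an Anthropic tool_use block."""
    function = tool_call["function"]
    return {
        "type": "tool_use",
        "id": tool_call["id"],
        "name": function["name"],
        # OpenAI sends arguments as a JSON string; Anthropic expects an object.
        "input": json.loads(function["arguments"] or "{}"),
    }


# Hypothetical tool call, shaped like an entry in an OpenAI tool_calls array.
call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "read_file", "arguments": '{"path": "src/main.ts"}'},
}
block = openai_tool_call_to_anthropic(call)
print(block)
```

The main difference is where the arguments live: a JSON string under function.arguments on the OpenAI side, a parsed object under input on the Anthropic side.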

There's also a retry layer. If the upstream API rejects a parameter because models.dev metadata is stale, the extension parses the error, patches the request body, and retries once. This handles thinking config mismatches and temperature rejections without requiring a code release.
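
The parse, patch, retry-once flow can be sketched in a few lines. Everything named here is hypothetical: `UpstreamError`, the error message format, and `fake_send` stand in for a real HTTP call, and real providers word their errors differently. The sketch also only drops the rejected parameter, which is the simplest possible patch.

```python
import re


class UpstreamError(Exception):
    """Hypothetical error raised when the upstream API rejects a request."""


def rejected_parameter(message: str):
    """Pull a parameter name out of an error message, or return None.

    Assumption: the message looks like "Unsupported parameter: 'temperature'".
    """
    match = re.search(r"parameter:? '([\w.]+)'", message)
    return match.group(1) if match else None


def send_with_retry(send, body: dict) -> dict:
    """Call send(body); on a parameter rejection, drop it and retry once."""
    try:
        return send(body)
    except UpstreamError as error:
        name = rejected_parameter(str(error))
        if name is None or name not in body:
            raise
        patched = {key: value for key, value in body.items() if key != name}
        return send(patched)  # a second failure propagates to the caller


def fake_send(body: dict) -> dict:
    """Stand-in for the upstream API: rejects any request with a temperature."""
    if "temperature" in body:
        raise UpstreamError("Unsupported parameter: 'temperature'")
    return {"ok": True, "sent": body}


result = send_with_retry(fake_send, {"model": "kimi-example", "temperature": 0.2})
print(result)
```

Retrying exactly once keeps a genuinely broken request from looping: if the patched body fails too, the error surfaces as usual.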

Honest limitations

Worth being upfront about these:

Session cost doesn't show in VS Code's native session popover. As of VS Code 1.126, usage data from bring-your-own-key (BYOK) providers isn't converted into IChatUsage progress events. The extension tracks cost in its own status bar item instead. When VS Code fixes this, the data will flow through automatically.
Free models have low rate limits without balance. Adding $20 to Zen improves this, but if you want truly free, expect to wait during peak hours.
Some models need specific configurations. Kimi K2.7 Code rejects temperature and forces thinking on. GLM only accepts off, high, or max. The extension handles these per-model quirks, but if a new model drops with a new quirk, it might 400 until metadata catches up.
Vision support varies. GLM, MiniMax, and a few others don't support image input. The extension filters this based on models.dev metadata.
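
The vision filtering in the last point is, at its core, a capability check against metadata. A minimal sketch, with a made-up record shape: the "input" field name and the first catalog entry are assumptions for this example and aren't copied from models.dev. Only the GLM and MiniMax rows reflect what's stated above.

```python
# Hypothetical metadata records. The "input" field name and the first model
# are made up for this sketch; only the GLM and MiniMax rows reflect the post.
CATALOG = [
    {"name": "example-vision-model", "input": ["text", "image"]},
    {"name": "GLM 5.1", "input": ["text"]},
    {"name": "MiniMax M3", "input": ["text"]},
]


def models_accepting(modality: str, catalog: list) -> list:
    """Return the names of models whose metadata lists the given input modality."""
    return [m["name"] for m in catalog if modality in m.get("input", [])]


print(models_accepting("image", CATALOG))
```

A model with no "input" field at all is treated as supporting nothing, which errs on the side of hiding an option rather than sending an image a model can't read.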

Conclusion

If you're on Copilot Free and keep hitting the limit, or you're on Pro+ and keep wishing for a model that isn't in the catalog, this is a way out. The free tier is enough to test. The Go subscription is the cheapest way to get daily access to the open-weight catalog. Zen pay-per-use covers the Claude/GPT/Gemini cases without a $39 commitment.

Same Chat UI you already use. More models. Lower bill most months.

Repo is MIT on GitHub. Contributions welcome. Four external contributors have already shipped features, bug fixes, and docs.
