Using GitHub Copilot CLI with Local Models (LM Studio)

Local AI is getting attention for one simple reason: control.

Cloud models are strong and fast, but for many companies and developers, especially when experimenting or working with sensitive code, sending everything to an external service is not ideal.

This is where local models come in.

Tools like LM Studio let you run LLMs directly on your machine. No external calls. No data leaving your environment or your network.

Instead of sending prompts to cloud models, you can point Copilot CLI to a local model running in LM Studio.

This setup is not perfect. It is not officially seamless. But it works well enough for learning, experimentation, and some real workflows.

A quick demo

What You Need

Before setting this up, make sure the basics are clear. This is not a plug-and-play setup. There are a few moving parts, and some assumptions.

GitHub Copilot CLI

You need GitHub Copilot CLI installed and working.

You can launch GitHub Copilot CLI with the following command:

copilot

or even better, if you want to see the banner in the terminal:

copilot --banner

By default, the CLI uses GitHub-managed models, but now you can override that.

Note: You must be authenticated with GitHub and have access to Copilot.


LM Studio

LM Studio is the simplest way to run local LLMs without dealing with raw model tooling.

What it gives you:

  • A UI to download and run models
  • A local server that exposes an OpenAI-compatible API
  • No need to manually manage inference engines

Once running, it exposes an endpoint like (OpenAI compatible):

http://localhost:1234/v1

This is the key piece. Copilot CLI will talk to this endpoint instead of the cloud.
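If you want to confirm the server is actually reachable before wiring Copilot CLI to it, you can query the standard OpenAI-compatible `/v1/models` route. A quick sketch (the fallback message is just a convenience for when the server is not running yet):

```shell
# Ask LM Studio's OpenAI-compatible server which models it exposes.
# If the server is down, print a friendly message instead of failing.
MODELS=$(curl -s --max-time 2 http://localhost:1234/v1/models \
  || echo "LM Studio server not reachable on localhost:1234")
echo "$MODELS"
```

If you get back a JSON list of models, Copilot CLI will be able to talk to the same endpoint.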


A Local Model (Be Realistic)

Not all models are equal. And your hardware matters.

For this guide, a small model like:

qwen/qwen3-coder-30b

is enough to get started.

But be clear on the trade-offs:

  • Small models → fast, but weaker reasoning
  • Large models → better output, but slow or unusable on most laptops

If you are on a standard laptop:

  • 1B–3B models → OK
  • 7B+ models → borderline
  • 13B+ → usually not practical without a GPU

On my laptop I am giving the NVIDIA model nemotron-3-nano-4b a chance, but on my gaming PC (which I use for development, not gaming) I run bigger models like Qwen3 Coder or similar.

Tip: Start small. The goal here is understanding the workflow, not benchmarking models.


Connecting Copilot CLI to LM Studio

This is the part most people get wrong.

Copilot CLI is not designed primarily for local models. You are using a BYOK (Bring Your Own Key/Model) path that is still evolving.

It works, but you need to be precise.

Reference: https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/use-byok-models


1. Set the Environment Variables

You must override the default Copilot provider.

In PowerShell:

$env:COPILOT_PROVIDER_BASE_URL="http://localhost:1234/v1"
$env:COPILOT_MODEL="google/gemma-3-1b"
$env:COPILOT_OFFLINE="true"

What they do:

  • COPILOT_PROVIDER_BASE_URL → points to your local LM Studio server
  • COPILOT_MODEL → defines which model to use
  • COPILOT_OFFLINE → prevents fallback to cloud models

If COPILOT_OFFLINE is not set, Copilot may silently use cloud models.
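If you are on bash or zsh instead of PowerShell, the same override looks like this (the model name is just an example; use whichever model you loaded in LM Studio):

```shell
# Point Copilot CLI at the local LM Studio server instead of the cloud.
export COPILOT_PROVIDER_BASE_URL="http://localhost:1234/v1"
# Must match a model identifier loaded in LM Studio.
export COPILOT_MODEL="google/gemma-3-1b"
# Prevent a silent fallback to GitHub-hosted models.
export COPILOT_OFFLINE="true"
```

Set these in the same terminal session where you will launch `copilot`, since they only apply to child processes of that shell.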

CLI and LM Studio together on the screen


2. Run a Simple Test

Open LM Studio and go to the *Developer* tab.
Toggle the Status switch and the server will start in a second.

LM Studio Developer Settings

Then click on Load Models and load one of the models you previously downloaded to your machine.

LM Studio Load Models

Open Copilot CLI with the following command in a project folder you want to test:

copilot --banner

Ask a simple task like: "Give me the list of all the files larger than 2MB".

If everything is configured correctly:

  • The response comes from your local model
  • No external API calls are made

Don't expect a result in seconds like with a cloud model; depending on your hardware, it can sometimes take minutes.
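For reference, the kind of command a capable model should land on for that task is something like:

```shell
# Find regular files larger than 2 MB under the current directory.
find . -type f -size +2M
```

If the local model suggests something in this spirit (and nothing reaches out to the network), the wiring is working.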


Reality Check

This is not a first-class integration.

  • No guarantee of full compatibility
  • No optimization for local inference
  • No smart routing like in GitHub-hosted Copilot

But for simple CLI workflows, it is good enough.


When This Setup Makes Sense

Do not use this everywhere. Use it where it actually gives you an advantage.

1. Privacy-Sensitive Environments

If code cannot leave your machine, this setup is useful.

Examples:

  • Internal tools
  • Proprietary scripts
  • Regulated environments

You avoid sending prompts and contexts to external services.

This is the strongest reason to use local models.
Especially in industries like insurance or defense, it makes perfect sense.


2. Offline Workflows

If you work without a stable connection, or with no connection at all (like during a flight), this setup keeps working.

It is slower, but always available.


3. Learning and Understanding AI

This setup forces you to see how LLMs actually behave.

You learn:

  • Prompt sensitivity
  • Model limitations
  • Output variability

This is valuable if you want to go beyond "just using Copilot".


When It Does NOT Make Sense

Avoid this setup if you need:

  • High accuracy
  • Large context handling
  • Production-grade reliability

In those cases, cloud models are still the better option.


👀 GitHub Copilot quota visibility in VS Code

If you use GitHub Copilot and ever wondered:

  • what plan you’re on
  • whether you have limits
  • how much premium quota is left
  • when it resets

I built a small VS Code extension called Copilot Insights.

It shows Copilot plan and quota status directly inside VS Code.

No usage analytics. No productivity scoring. Just clarity.

👉 VS Code Marketplace:
https://marketplace.visualstudio.com/items?itemName=emanuelebartolesi.vscode-copilot-insights
