A practical guide to running a private, GPU-accelerated coding assistant locally using Docker Desktop — no API costs, no data leaving your machine.
⏱️ Setup time: ~30–60 minutes
My Story
I'm a professional Django developer based in India.
Like most developers, I was using cloud-based AI tools for coding assistance — paying for API credits, sending my code to third-party servers, and depending on a stable internet connection just to get a code suggestion.
Then one day I was exploring Docker Desktop and noticed two new things in the sidebar — Models and MCP Toolkit.
I had no idea what they were.
A few hours of tinkering later, I had a fully local AI coding assistant running on my laptop:
- ✅ Free
- ✅ Private (no code leaves my machine)
- ✅ Works offline
- ✅ GPU-accelerated (~273ms response time)
No GitHub Copilot subscription. No API costs.
This is exactly how I built it — including all the mistakes I made along the way.
⚠️ Before We Start (Important Reality Check)
Let's be honest:
This is NOT a perfect replacement for GitHub Copilot.
Cloud models (like GPT-4/5-level) are still:
- better at reasoning
- better at large codebases
- more consistent
But…
👉 For most day-to-day coding tasks, a local setup like this is:
- fast enough
- smart enough
- and WAY more private
Think of this as a practical alternative, not a 1:1 replacement.
🧠 Why Run AI Locally?
| Problem with Cloud AI | Local AI Solution |
|---|---|
| 💸 Costly at scale ($10–30/million tokens) | ✅ Completely free |
| 🌐 Needs internet | ✅ Works offline |
| 🔒 Code sent to third-party servers | ✅ Stays on your machine |
| ⚡ Network latency | ✅ GPU-accelerated (~273ms) |
🖥️ My Setup
This guide is based on my machine — adjust based on your hardware.
| Component | Spec |
|---|---|
| 💻 Laptop | Lenovo IdeaPad Pro 5 |
| 🧠 CPU | Intel Core Ultra 9 185H |
| 🎮 GPU | 6GB NVIDIA |
| 💾 RAM | 32GB |
| 🪟 OS | Windows 11 |
Minimum Setup Recommendation
| Hardware | What to Expect |
|---|---|
| No GPU | Works, but slow (~2–5s responses) |
| 8GB RAM | Very limited models only |
| 16GB RAM | Usable |
| 32GB + GPU | 🔥 Ideal |
📦 Step 1 — Install Docker Desktop
Install Docker Desktop with AI features enabled:
👉 https://www.docker.com/products/docker-desktop
Make sure you see Models and MCP Toolkit in the left sidebar.
🤖 Step 2 — Understanding Model Selection
Before pulling any model, you need to understand three things. This took me a while to figure out — so I'll keep it quick.
🔢 Parameters = Brain Size
- 7B → good for most coding tasks ✅
- 30B → needs 16GB+ VRAM
- 70B → high-end machines only

📦 Quantization = Compression
Think of it like image compression — smaller file, slight quality trade-off.
- F16 → full precision, largest file
- Q4_0 → ~4x compressed, best balance ✅
- Q2 → smallest, noticeable quality loss

💡 The Golden Rule — RAM vs VRAM
- Fits in VRAM (6GB)? → GPU ⚡ ~273ms
- Spills to system RAM? → CPU 🐢 ~3000ms
- Too big for RAM? → ❌ won't run
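To sanity-check the golden rule before downloading anything, here's a rough back-of-the-envelope estimate I use (a heuristic, not an official formula): weight memory is parameters times bits per weight divided by 8, plus some headroom for the KV cache and activations. Q4_0 works out to roughly 4.5 bits per weight once block scales are included.

```python
def approx_model_size_gib(params_billions: float, bits_per_weight: float,
                          overhead: float = 1.2) -> float:
    """Estimate a model's memory footprint in GiB.

    overhead=1.2 adds ~20% headroom for KV cache and activations
    (my guess; tune it upward for long contexts).
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / (1024 ** 3)

# qwen2.5 7.62B at Q4_0 (~4.5 bits/weight including block scales)
print(round(approx_model_size_gib(7.62, 4.5), 2))   # → 4.79, tight but OK in 6GB VRAM

# The same model at F16 would need ~17 GiB: CPU-only territory on this laptop
print(round(approx_model_size_gib(7.62, 16), 1))    # → 17.0
```

If the estimate lands within your VRAM budget, you'll likely get the GPU-speed path; if it lands between VRAM and total RAM, expect the slow CPU path.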
🏆 Step 3 — Pull the Right Model
My Pick: qwen2.5:7B-Q4_0
| Property | Value | Why |
|---|---|---|
| Parameters | 7.62B | Smart enough for coding |
| Quantization | Q4_0 | 4x compressed, great quality |
| Size | 4.12 GiB | Fits perfectly in 6GB VRAM ✅ |
Steps:
1. Docker Desktop → Models
2. Search `qwen2.5`
3. Find `qwen2.5:7B-Q4_0` → click Pull
⏳ ~4GB download. Grab a coffee.
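If you prefer the terminal, recent Docker Desktop releases also bundle a `docker model` CLI (run `docker model --help` first to confirm your version ships it). The same pull then looks like:

```shell
# Pull the quantized model from Docker Hub's ai/ namespace
docker model pull ai/qwen2.5:7B-Q4_0

# Confirm it's available locally
docker model list
```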
⚡ Step 4 — Enable GPU-Accelerated Inference (CRITICAL)
Most guides miss this step entirely.
By default Docker Model Runner uses CPU only. One checkbox changes everything.
Go to: Docker Desktop → Settings → AI
Enable all three:
- ✅ Enable Docker Model Runner
- ✅ Enable host-side TCP support → Port: `12434`
- ✅ Enable GPU-backed inference
Click Apply.
⚠️ GPU inference downloads additional components — takes a few minutes the first time.
Verify It's Working
Open this URL in your browser:

```
http://localhost:12434/engines/v1/models
```

You should see:

```json
{
  "object": "list",
  "data": [
    {
      "id": "docker.io/ai/qwen2.5:7B-Q4_0",
      "object": "model",
      "owned_by": "docker"
    }
  ]
}
```
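You can also smoke-test actual generation from a terminal. This sketch assumes the standard OpenAI-style `/chat/completions` path; if the short `ai/qwen2.5:7B-Q4_0` id returns a 404, substitute the full id shown by the `/models` endpoint. On Windows, run it from Git Bash or WSL (PowerShell quotes JSON differently):

```shell
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/qwen2.5:7B-Q4_0",
        "messages": [{"role": "user", "content": "Say hello in one word"}]
      }'
```

A JSON response with a `choices` array means the model is serving requests.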
Check Response Speed
Go to Docker Desktop → Models → Requests tab after sending a prompt:
| Mode | Speed |
|---|---|
| GPU ⚡ | ~273ms |
| CPU 🐢 | ~3000ms |
That's a 10x speedup from one checkbox.
🔌 Step 5 — Connect VS Code via Continue.dev
Docker Model Runner exposes a local OpenAI-compatible API at:

```
http://localhost:12434/engines/v1
```
👉 Key insight: any tool that supports OpenAI's API works here — just change the URL from OpenAI's server to localhost.
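To make that concrete, here's a minimal Python sketch (standard library only, no SDK needed) that builds an OpenAI-style chat-completion request aimed at the local endpoint. The commented lines at the bottom show how you'd send it once the model is running:

```python
import json
import urllib.request

BASE_URL = "http://localhost:12434/engines/v1"   # Docker Model Runner endpoint
MODEL = "ai/qwen2.5:7B-Q4_0"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for the local endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Passing data= makes this a POST; the Bearer token is the same
    # placeholder value used in the Continue config below.
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer docker",
        },
    )

# With the model running, sending it looks like:
# with urllib.request.urlopen(build_chat_request("Say hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Swap `BASE_URL` for OpenAI's server and a real key, and the exact same code talks to the cloud; that interchangeability is the whole point.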
Install Continue.dev
1. VS Code → Extensions (`Ctrl+Shift+X`)
2. Search `Continue`
3. Install "Continue - open-source AI code agent"
Configure It
Open `C:\Users\<yourname>\.continue\config.yaml` and paste:

```yaml
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Qwen2.5 Coder Local
    provider: openai
    model: ai/qwen2.5:7B-Q4_0
    apiBase: http://localhost:12434/engines/v1
    apiKey: docker
```

💡 Why `provider: openai`? Docker's API speaks the OpenAI protocol — same language, different address.

💡 Why `apiKey: docker`? Just a placeholder — localhost needs no real auth.
Windows PowerShell Fix (if needed)
If you hit a script execution error:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
Test It!
Open the Continue panel in VS Code and ask the model a question.
✅ If it responds — you're done.
🧪 Real Comparison — Copilot vs Local AI
Prompt I tested:

```
Write a Django REST Framework viewset for a User model
with JWT authentication and permission classes
```
GitHub Copilot output:
Clean, complete, production-ready code with proper imports, docstrings, and edge case handling. Roughly 60 lines, zero follow-up needed.
Local Qwen 7B output:
```python
from rest_framework import viewsets, permissions
from rest_framework_simplejwt.authentication import JWTAuthentication

from .models import User
from .serializers import UserSerializer


class UserViewSet(viewsets.ModelViewSet):
    queryset = User.objects.all()
    serializer_class = UserSerializer
    authentication_classes = [JWTAuthentication]
    permission_classes = [permissions.IsAuthenticated]

    def get_queryset(self):
        # Users can only see their own data
        return User.objects.filter(id=self.request.user.id)
```
Solid, functional, correct — but less complete than Copilot. Needed a follow-up prompt for edge cases.
Verdict:
| Feature | Copilot | Local Qwen 7B |
|---|---|---|
| Speed | Fast | Fast (GPU ⚡) |
| Boilerplate | Excellent | Good |
| Reasoning | Strong | Moderate |
| Multi-file context | Better | Limited |
| Cost | $10–19/mo | FREE |
| Privacy | External servers | Your machine |
🏗️ Architecture (Mental Model)
```
┌─────────────────────────────────────────────┐
│                YOUR MACHINE                 │
│                                             │
│  ┌─────────────────┐                        │
│  │  Docker Models  │ ← qwen2.5:7B-Q4_0      │
│  │  (AI Brain) 🧠  │   runs on your GPU     │
│  └────────┬────────┘                        │
│           │ exposes                         │
│           ▼                                 │
│  ┌─────────────────┐                        │
│  │ localhost:12434 │ ← OpenAI-compatible    │
│  │  REST API 🔌    │   just like -p 8080    │
│  └────────┬────────┘                        │
│           │ connects to                     │
│           ▼                                 │
│  ┌─────────────────┐                        │
│  │     VS Code     │ ← Continue.dev         │
│  │  (Your IDE) 💻  │   extension            │
│  └─────────────────┘                        │
└─────────────────────────────────────────────┘
```
No internet. No API costs. No data leaks.
⚠️ Where This Falls Short
Be honest with yourself:
- ❌ Not as smart as GPT-4-level models
- ❌ Limited context window (struggles with large codebases)
- ❌ Needs decent hardware for best results
- ❌ Setup takes 30–60 minutes vs just paying for Copilot
❌ When You Should NOT Use This
- Working on large enterprise codebases
- Need best-in-class reasoning (GPT-4 level)
- Want zero setup / plug-and-play
- Low-end hardware (<16GB RAM, no GPU)
❓ Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Connection error | TCP not enabled | Docker Desktop → Settings → AI → Enable host-side TCP |
| Slow responses (>2s) | GPU not enabled | Docker Desktop → Settings → AI → Enable GPU-backed inference |
| `npx` script error | PowerShell policy | Run `Set-ExecutionPolicy RemoteSigned` as Admin |
| Model not showing | Not pulled | Docker Desktop → Models → Pull `qwen2.5:7B-Q4_0` |
🚀 What's Next?
This is just the foundation.
Docker's MCP Toolkit can let your local AI actually act — read your codebase, modify files, understand requirements. That's a full agent setup, and I'll cover it in Part 2.
💬 Final Thoughts
This setup won't replace Copilot for everyone.
But if you care about privacy, cost, and full control over your tools — it's absolutely worth the 30 minutes to set up.
If you’re running this setup (or planning to), I’d love to hear:
👉 What hardware are you using?
Let’s compare setups 👇
Check out my knowledge vault where I document everything I learn hands-on:
👉 https://github.com/Riju007/dev-knowledge-vault
March 2026 | 🐳 Docker Desktop AI features


