Avisek Dey

Posted on • Originally published at github.com

I Tried Replacing GitHub Copilot with Local AI — Here’s What Happened (Docker + GPU)

A practical guide to running a private, GPU-accelerated coding assistant locally using Docker Desktop — no API costs, no data leaving your machine.

⏱️ Setup time: ~30–60 minutes

My Story

I'm a professional Django developer based in India.

Like most developers, I was using cloud-based AI tools for coding assistance — paying for API credits, sending my code to third-party servers, and depending on a stable internet connection just to get a code suggestion.

Then one day I was exploring Docker Desktop and noticed two new things in the sidebar — Models and MCP Toolkit.

I had no idea what they were.

A few hours of tinkering later, I had a fully local AI coding assistant running on my laptop:

  • ✅ Free
  • ✅ Private (no code leaves my machine)
  • ✅ Works offline
  • ✅ GPU-accelerated (~273ms response time)

No GitHub Copilot subscription. No API costs.

This is exactly how I built it — including all the mistakes I made along the way.


⚠️ Before We Start (Important Reality Check)

Let's be honest:

This is NOT a perfect replacement for GitHub Copilot.

Cloud models (like GPT-4/5-level) are still:

  • better at reasoning
  • better at large codebases
  • more consistent

But…

👉 For most day-to-day coding tasks, a local setup like this is:

  • fast enough
  • smart enough
  • and WAY more private

Think of this as a practical alternative, not a 1:1 replacement.


🧠 Why Run AI Locally?

| Problem with Cloud AI | Local AI Solution |
| --- | --- |
| 💸 Costly at scale ($10–30/million tokens) | ✅ Completely free |
| 🌐 Needs internet | ✅ Works offline |
| 🔒 Code sent to third-party servers | ✅ Stays on your machine |
| ⚡ Network latency | ✅ GPU-accelerated (~273ms) |

🖥️ My Setup

This guide is based on my machine — adjust based on your hardware.

| Component | Spec |
| --- | --- |
| 💻 Laptop | Lenovo IdeaPad Pro 5 |
| 🧠 CPU | Intel Core Ultra 9 185H |
| 🎮 GPU | 6GB NVIDIA |
| 💾 RAM | 32GB |
| 🪟 OS | Windows 11 |

Minimum Setup Recommendation

| Hardware | What to Expect |
| --- | --- |
| No GPU | Works, but slow (~2–5s responses) |
| 8GB RAM | Very limited models only |
| 16GB RAM | Usable |
| 32GB + GPU | 🔥 Ideal |

📦 Step 1 — Install Docker Desktop

Install Docker Desktop with AI features enabled:

👉 https://www.docker.com/products/docker-desktop

Make sure you see Models and MCP Toolkit in the left sidebar.


🤖 Step 2 — Understanding Model Selection

Before pulling any model, you need to understand three things. This took me a while to figure out — so I'll keep it quick.

🔢 Parameters = Brain Size

```
7B  → Good for most coding tasks ✅
30B → Needs 16GB+ VRAM
70B → High-end machines only
```

📦 Quantization = Compression

Think of it like image compression — smaller file, slight quality trade-off.

```
F16  → Full precision, largest file
Q4_0 → 4x compressed, best balance ✅
Q2   → Smallest, noticeable quality loss
```

💡 The Golden Rule — RAM vs VRAM

```
Fits in VRAM (6GB)?  → GPU  ⚡ ~273ms
Spills to RAM?       → CPU  🐢 ~3000ms
Too big for RAM?     → ❌ Won't run
```
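The golden rule above can be sketched as a quick back-of-the-envelope calculator. The bytes-per-parameter figures are rough approximations I'm assuming (real GGUF files add per-block overhead), not exact numbers:

```python
# Rough sizing heuristic: weights ≈ parameter count × bytes per parameter.
# These per-quantization byte counts are approximations, not GGUF exact values.
BYTES_PER_PARAM = {"F16": 2.0, "Q8_0": 1.0, "Q4_0": 0.55, "Q2_K": 0.35}

def estimated_size_gib(params_billion: float, quant: str) -> float:
    """Approximate in-memory size of the model weights, in GiB."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[quant]
    return total_bytes / (1024 ** 3)

def where_it_runs(params_billion: float, quant: str,
                  vram_gib: float, ram_gib: float) -> str:
    """Apply the golden rule: VRAM → GPU, RAM → CPU, neither → won't run."""
    size = estimated_size_gib(params_billion, quant)
    if size <= vram_gib:
        return f"GPU ({size:.1f} GiB fits in {vram_gib} GiB VRAM)"
    if size <= ram_gib:
        return f"CPU ({size:.1f} GiB spills to system RAM)"
    return f"won't run ({size:.1f} GiB exceeds {ram_gib} GiB RAM)"

# My laptop: 6GB VRAM, 32GB RAM, qwen2.5 at 7.62B params, Q4_0
print(where_it_runs(7.62, "Q4_0", vram_gib=6, ram_gib=32))
```

Running this for a 7.62B Q4_0 model lands just under 4 GiB, which matches why it fits my 6GB card.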

🏆 Step 3 — Pull the Right Model

My Pick: qwen2.5:7B-Q4_0

| Property | Value | Why |
| --- | --- | --- |
| Parameters | 7.62B | Smart enough for coding |
| Quantization | Q4_0 | 4x compressed, great quality |
| Size | 4.12 GiB | Fits perfectly in 6GB VRAM ✅ |

Steps:

  1. Docker Desktop → Models
  2. Search qwen2.5
  3. Find qwen2.5:7B-Q4_0 → click Pull

*Screenshot: Docker Models showing qwen2.5 variants with pull options*

⏳ ~4GB download. Grab a coffee.


⚡ Step 4 — Enable GPU-Accelerated Inference (CRITICAL)

Most guides miss this step entirely.

By default Docker Model Runner uses CPU only. One checkbox changes everything.

Go to: Docker Desktop → Settings → AI

Enable all three:

  • ✅ Enable Docker Model Runner
  • ✅ Enable host-side TCP support → Port: 12434
  • ✅ Enable GPU-backed inference

Click Apply.

⚠️ GPU inference downloads additional components — takes a few minutes the first time.

Verify It's Working

Open your browser:

```
http://localhost:12434/engines/v1/models
```

You should see:

```json
{
  "object": "list",
  "data": [
    {
      "id": "docker.io/ai/qwen2.5:7B-Q4_0",
      "object": "model",
      "owned_by": "docker"
    }
  ]
}
```
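If you'd rather verify from a script than the browser, here's a stdlib-only sketch, assuming the default port 12434 from Step 4:

```python
# Sanity-check the Docker Model Runner endpoint from Python.
# Assumes host-side TCP support is enabled on port 12434 (Step 4).
import json
import urllib.request

def list_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /models response."""
    return [m["id"] for m in payload.get("data", [])]

if __name__ == "__main__":
    url = "http://localhost:12434/engines/v1/models"
    with urllib.request.urlopen(url, timeout=5) as resp:
        payload = json.load(resp)
    for model_id in list_model_ids(payload):
        print(model_id)  # expect something like docker.io/ai/qwen2.5:7B-Q4_0
```

A connection error here means TCP support isn't enabled yet (see Troubleshooting below).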

Check Response Speed

Go to Docker Desktop → Models → Requests tab after sending a prompt:

*Screenshot: Docker Models Requests tab showing 273ms GPU response time*

| Mode | Speed |
| --- | --- |
| GPU | ⚡ ~273ms |
| CPU | 🐢 ~3000ms |

That's a 10x speedup from one checkbox.


🔌 Step 5 — Connect VS Code via Continue.dev

Docker Models exposes a local OpenAI-compatible API at:

```
http://localhost:12434/engines/v1
```

👉 Key insight: any tool that supports OpenAI's API works here — just change the URL from OpenAI's server to localhost.
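To make that insight concrete, here's a stdlib-only sketch that posts a chat completion to the local endpoint using the standard OpenAI wire format. The model id matches my config; the prompt is just an example (the official `openai` Python SDK also works if you pass `base_url="http://localhost:12434/engines/v1"`):

```python
# Talk to the local model using the OpenAI chat-completions wire format.
# Same protocol as api.openai.com — only the address changes.
import json
import urllib.request

API_BASE = "http://localhost:12434/engines/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Standard OpenAI-style chat payload; any compatible server accepts it."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_local(prompt: str, model: str = "ai/qwen2.5:7B-Q4_0") -> str:
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer docker",  # placeholder — localhost needs no real key
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_local("Write a Python one-liner to reverse a string"))
```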

Install Continue.dev

  1. VS Code → Extensions (Ctrl+Shift+X)
  2. Search Continue
  3. Install Continue - open-source AI code agent

Configure It

Open C:\Users\<yourname>\.continue\config.yaml and paste:

```yaml
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Qwen2.5 Coder Local
    provider: openai
    model: ai/qwen2.5:7B-Q4_0
    apiBase: http://localhost:12434/engines/v1
    apiKey: docker
```

💡 Why provider: openai? Docker's API speaks the OpenAI protocol — same language, different address.

💡 Why apiKey: docker? Just a placeholder — localhost needs no real auth.

Windows PowerShell Fix (if needed)

If you hit a script execution error:

```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

Test It!

*Screenshot: Continue.dev working with local Qwen model in VS Code*

✅ If it responds — you're done.


🧪 Real Comparison — Copilot vs Local AI

Prompt I tested:

```
Write a Django REST Framework viewset for a User model
with JWT authentication and permission classes
```

GitHub Copilot output:

Clean, complete, production-ready code with proper imports, docstrings, and edge case handling. Roughly 60 lines, zero follow-up needed.

Local Qwen 7B output:

```python
from rest_framework import viewsets, permissions
from rest_framework_simplejwt.authentication import JWTAuthentication
from .models import User
from .serializers import UserSerializer

class UserViewSet(viewsets.ModelViewSet):
    queryset = User.objects.all()
    serializer_class = UserSerializer
    authentication_classes = [JWTAuthentication]
    permission_classes = [permissions.IsAuthenticated]

    def get_queryset(self):
        # Users can only see their own data
        return User.objects.filter(id=self.request.user.id)
```

Solid, functional, correct — but less complete than Copilot. Needed a follow-up prompt for edge cases.

Verdict:

| Feature | Copilot | Local Qwen 7B |
| --- | --- | --- |
| Speed | Fast | Fast (GPU ⚡) |
| Boilerplate | Excellent | Good |
| Reasoning | Strong | Moderate |
| Multi-file context | Better | Limited |
| Cost | $10–19/mo | FREE |
| Privacy | External servers | Your machine |

🏗️ Architecture (Mental Model)

```
┌─────────────────────────────────────────────┐
│              YOUR MACHINE                    │
│                                              │
│  ┌─────────────────┐                         │
│  │  Docker Models  │  ← qwen2.5:7B-Q4_0     │
│  │  (AI Brain) 🧠  │    runs on your GPU     │
│  └────────┬────────┘                         │
│           │ exposes                          │
│           ▼                                  │
│  ┌─────────────────┐                         │
│  │  localhost:12434│  ← OpenAI-compatible    │
│  │  REST API  🔌   │    just like -p 8080    │
│  └────────┬────────┘                         │
│           │ connects to                      │
│           ▼                                  │
│  ┌─────────────────┐                         │
│  │  VS Code        │  ← Continue.dev         │
│  │  (Your IDE) 💻  │    extension            │
│  └─────────────────┘                         │
└─────────────────────────────────────────────┘
```

No internet. No API costs. No data leaks.

⚠️ Where This Falls Short

Be honest with yourself:

  • ❌ Not as smart as GPT-4-level models
  • ❌ Limited context window (struggles with large codebases)
  • ❌ Needs decent hardware for best results
  • ❌ Setup takes 30–60 minutes vs just paying for Copilot

❌ When You Should NOT Use This

  • Working on large enterprise codebases
  • Need best-in-class reasoning (GPT-4 level)
  • Want zero setup / plug-and-play
  • Low-end hardware (<16GB RAM, no GPU)

❓ Troubleshooting

| Issue | Cause | Fix |
| --- | --- | --- |
| Connection error | TCP not enabled | Docker Desktop → Settings → AI → Enable host-side TCP |
| Slow responses (>2s) | GPU not enabled | Docker Desktop → Settings → AI → Enable GPU-backed inference |
| npx script error | PowerShell policy | Run `Set-ExecutionPolicy RemoteSigned` as Admin |
| Model not showing | Not pulled | Docker Desktop → Models → Pull `qwen2.5:7B-Q4_0` |

🚀 What's Next?

This is just the foundation.

Docker's MCP Toolkit can let your local AI actually act — read your codebase, modify files, understand requirements. That's a full agent setup, and I'll cover it in Part 2.


💬 Final Thoughts

This setup won't replace Copilot for everyone.

But if you care about privacy, cost, and full control over your tools — it's absolutely worth the 30 minutes to set up.


If you’re running this setup (or planning to), I’d love to hear:
👉 What hardware are you using?

Let’s compare setups 👇

Check out my knowledge vault where I document everything I learn hands-on:
👉 https://github.com/Riju007/dev-knowledge-vault


March 2026 | 🐳 Docker Desktop AI features
