A practical guide to running a private, GPU-accelerated coding assistant locally using Docker Desktop — no API costs, no data leaving your machine.
⏱️ Setup time: ~30–60 minutes
My Story
I'm a professional Django developer based in India.
Like most developers, I was using cloud-based AI tools for coding assistance — paying for API credits, sending my code to third-party servers, and depending on a stable internet connection just to get a code suggestion.
Then one day I was exploring Docker Desktop and noticed two new things in the sidebar — Models and MCP Toolkit.
I had no idea what they were.
A few hours of tinkering later, I had a fully local AI coding assistant running on my laptop:
- ✅ Free
- ✅ Private (no code leaves my machine)
- ✅ Works offline
- ✅ GPU-accelerated (~273ms response time)
No GitHub Copilot subscription. No API costs.
This is exactly how I built it — including all the mistakes I made along the way.
⚠️ Before We Start (Important Reality Check)
Let's be honest:
This is NOT a perfect replacement for GitHub Copilot.
Cloud models (like GPT-4/5-level) are still:
- better at reasoning
- better at large codebases
- more consistent
But…
👉 For most day-to-day coding tasks, a local setup like this is:
- fast enough
- smart enough
- and WAY more private
Think of this as a practical alternative, not a 1:1 replacement.
🧠 Why Run AI Locally?
| Problem with Cloud AI | Local AI Solution |
|---|---|
| 💸 Costly at scale ($10–30/million tokens) | ✅ Completely free |
| 🌐 Needs internet | ✅ Works offline |
| 🔒 Code sent to third-party servers | ✅ Stays on your machine |
| ⚡ Network latency | ✅ GPU-accelerated (~273ms) |
🖥️ My Setup
This guide is based on my machine — adjust based on your hardware.
| Component | Spec |
|---|---|
| 💻 Laptop | Lenovo IdeaPad Pro 5 |
| 🧠 CPU | Intel Core Ultra 9 185H |
| 🎮 GPU | 6GB NVIDIA |
| 💾 RAM | 32GB |
| 🪟 OS | Windows 11 |
Minimum Setup Recommendation
| Hardware | What to Expect |
|---|---|
| No GPU | Works, but slow (~2–5s responses) |
| 8GB RAM | Very limited models only |
| 16GB RAM | Usable |
| 32GB + GPU | 🔥 Ideal |
📦 Step 1 — Install Docker Desktop
Install Docker Desktop with AI features enabled:
👉 https://www.docker.com/products/docker-desktop
Make sure you see Models and MCP Toolkit in the left sidebar.
🤖 Step 2 — Understanding Model Selection
Before pulling any model, you need to understand three things. This took me a while to figure out — so I'll keep it quick.
🔢 Parameters = Brain Size
- 7B → good for most coding tasks ✅
- 30B → needs 16GB+ VRAM
- 70B → high-end machines only

📦 Quantization = Compression
Think of it like image compression — smaller file, slight quality trade-off.
- F16 → full precision, largest file
- Q4_0 → ~4x compressed, best balance ✅
- Q2 → smallest, noticeable quality loss

💡 The Golden Rule — RAM vs VRAM
- Fits in VRAM (6GB)? → GPU ⚡ ~273ms
- Spills to system RAM? → CPU 🐢 ~3000ms
- Too big for RAM? → ❌ won't run
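To sanity-check the golden rule before downloading anything, here's a rough back-of-the-envelope estimate I use (a heuristic, not an official formula): weight memory is parameters times bits per weight divided by 8, plus some headroom for the KV cache and activations. Q4_0 works out to roughly 4.5 bits per weight once block scales are included.

```python
def approx_model_size_gib(params_billions: float, bits_per_weight: float,
                          overhead: float = 1.2) -> float:
    """Estimate a model's memory footprint in GiB.

    overhead=1.2 adds ~20% headroom for KV cache and activations
    (my guess; tune it upward for long contexts).
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / (1024 ** 3)

# qwen2.5 7.62B at Q4_0 (~4.5 bits/weight including block scales)
print(round(approx_model_size_gib(7.62, 4.5), 2))   # → 4.79, tight but OK in 6GB VRAM

# The same model at F16 would need ~17 GiB: CPU-only territory on this laptop
print(round(approx_model_size_gib(7.62, 16), 1))    # → 17.0
```

If the estimate lands within your VRAM budget, you'll likely get the GPU-speed path; if it lands between VRAM and total RAM, expect the slow CPU path.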
🏆 Step 3 — Pull the Right Model
My Pick: qwen2.5:7B-Q4_0
| Property | Value | Why |
|---|---|---|
| Parameters | 7.62B | Smart enough for coding |
| Quantization | Q4_0 | 4x compressed, great quality |
| Size | 4.12 GiB | Fits perfectly in 6GB VRAM ✅ |
Steps:
1. Docker Desktop → Models
2. Search `qwen2.5`
3. Find `qwen2.5:7B-Q4_0` → click Pull
⏳ ~4GB download. Grab a coffee.
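If you prefer the terminal, recent Docker Desktop releases also bundle a `docker model` CLI (run `docker model --help` first to confirm your version ships it). The same pull then looks like:

```shell
# Pull the quantized model from Docker Hub's ai/ namespace
docker model pull ai/qwen2.5:7B-Q4_0

# Confirm it's available locally
docker model list
```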
⚡ Step 4 — Enable GPU-Accelerated Inference (CRITICAL)
Most guides miss this step entirely.
By default Docker Model Runner uses CPU only. One checkbox changes everything.
Go to: Docker Desktop → Settings → AI
Enable all three:
- ✅ Enable Docker Model Runner
- ✅ Enable host-side TCP support → Port: `12434`
- ✅ Enable GPU-backed inference
Click Apply.
⚠️ GPU inference downloads additional components — takes a few minutes the first time.
Verify It's Working
Open this URL in your browser:

```
http://localhost:12434/engines/v1/models
```

You should see:

```json
{
  "object": "list",
  "data": [
    {
      "id": "docker.io/ai/qwen2.5:7B-Q4_0",
      "object": "model",
      "owned_by": "docker"
    }
  ]
}
```
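You can also smoke-test actual generation from a terminal. This sketch assumes the standard OpenAI-style `/chat/completions` path; if the short `ai/qwen2.5:7B-Q4_0` id returns a 404, substitute the full id shown by the `/models` endpoint. On Windows, run it from Git Bash or WSL (PowerShell quotes JSON differently):

```shell
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/qwen2.5:7B-Q4_0",
        "messages": [{"role": "user", "content": "Say hello in one word"}]
      }'
```

A JSON response with a `choices` array means the model is serving requests.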
Check Response Speed
Go to Docker Desktop → Models → Requests tab after sending a prompt:
| Mode | Speed |
|---|---|
| GPU ⚡ | ~273ms |
| CPU 🐢 | ~3000ms |
That's a 10x speedup from one checkbox.
🔌 Step 5 — Connect VS Code via Continue.dev
Docker Model Runner exposes a local OpenAI-compatible API at:

```
http://localhost:12434/engines/v1
```
👉 Key insight: any tool that supports OpenAI's API works here — just change the URL from OpenAI's server to localhost.
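To make that concrete, here's a minimal Python sketch (standard library only, no SDK needed) that builds an OpenAI-style chat-completion request aimed at the local endpoint. The commented lines at the bottom show how you'd send it once the model is running:

```python
import json
import urllib.request

BASE_URL = "http://localhost:12434/engines/v1"   # Docker Model Runner endpoint
MODEL = "ai/qwen2.5:7B-Q4_0"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for the local endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Passing data= makes this a POST; the Bearer token is the same
    # placeholder value used in the Continue config below.
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer docker",
        },
    )

# With the model running, sending it looks like:
# with urllib.request.urlopen(build_chat_request("Say hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Swap `BASE_URL` for OpenAI's server and a real key, and the exact same code talks to the cloud; that interchangeability is the whole point.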
Install Continue.dev
1. VS Code → Extensions (`Ctrl+Shift+X`)
2. Search `Continue`
3. Install "Continue - open-source AI code agent"
Configure It
Open `C:\Users\<yourname>\.continue\config.yaml` and paste:

```yaml
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Qwen2.5 Coder Local
    provider: openai
    model: ai/qwen2.5:7B-Q4_0
    apiBase: http://localhost:12434/engines/v1
    apiKey: docker
```

💡 Why `provider: openai`? Docker's API speaks the OpenAI protocol — same language, different address.

💡 Why `apiKey: docker`? Just a placeholder — localhost needs no real auth.
Windows PowerShell Fix (if needed)
If you hit a script execution error:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
Test It!
Open the Continue panel in VS Code and ask the model a question.
✅ If it responds — you're done.
🧪 Real Comparison — Copilot vs Local AI
Prompt I tested:

```
Write a Django REST Framework viewset for a User model
with JWT authentication and permission classes
```
GitHub Copilot output:
Clean, complete, production-ready code with proper imports, docstrings, and edge case handling. Roughly 60 lines, zero follow-up needed.
Local Qwen 7B output:
```python
from rest_framework import viewsets, permissions
from rest_framework_simplejwt.authentication import JWTAuthentication

from .models import User
from .serializers import UserSerializer


class UserViewSet(viewsets.ModelViewSet):
    queryset = User.objects.all()
    serializer_class = UserSerializer
    authentication_classes = [JWTAuthentication]
    permission_classes = [permissions.IsAuthenticated]

    def get_queryset(self):
        # Users can only see their own data
        return User.objects.filter(id=self.request.user.id)
```
Solid, functional, correct — but less complete than Copilot. Needed a follow-up prompt for edge cases.
Verdict:
| Feature | Copilot | Local Qwen 7B |
|---|---|---|
| Speed | Fast | Fast (GPU ⚡) |
| Boilerplate | Excellent | Good |
| Reasoning | Strong | Moderate |
| Multi-file context | Better | Limited |
| Cost | $10–19/mo | FREE |
| Privacy | External servers | Your machine |
🏗️ Architecture (Mental Model)
```
┌─────────────────────────────────────────────┐
│                YOUR MACHINE                 │
│                                             │
│  ┌─────────────────┐                        │
│  │  Docker Models  │ ← qwen2.5:7B-Q4_0      │
│  │  (AI Brain) 🧠  │   runs on your GPU     │
│  └────────┬────────┘                        │
│           │ exposes                         │
│           ▼                                 │
│  ┌─────────────────┐                        │
│  │ localhost:12434 │ ← OpenAI-compatible    │
│  │  REST API 🔌    │   just like -p 8080    │
│  └────────┬────────┘                        │
│           │ connects to                     │
│           ▼                                 │
│  ┌─────────────────┐                        │
│  │     VS Code     │ ← Continue.dev         │
│  │  (Your IDE) 💻  │   extension            │
│  └─────────────────┘                        │
└─────────────────────────────────────────────┘
```
No internet. No API costs. No data leaks.
⚠️ Where This Falls Short
Be honest with yourself:
- ❌ Not as smart as GPT-4-level models
- ❌ Limited context window (struggles with large codebases)
- ❌ Needs decent hardware for best results
- ❌ Setup takes 30–60 minutes vs just paying for Copilot
❌ When You Should NOT Use This
- Working on large enterprise codebases
- Need best-in-class reasoning (GPT-4 level)
- Want zero setup / plug-and-play
- Low-end hardware (<16GB RAM, no GPU)
❓ Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Connection error | TCP not enabled | Docker Desktop → Settings → AI → Enable host-side TCP |
| Slow responses (>2s) | GPU not enabled | Docker Desktop → Settings → AI → Enable GPU-backed inference |
| `npx` script error | PowerShell policy | Run `Set-ExecutionPolicy RemoteSigned` as Admin |
| Model not showing | Not pulled | Docker Desktop → Models → Pull `qwen2.5:7B-Q4_0` |
🚀 What's Next?
This is just the foundation.
Docker's MCP Toolkit can let your local AI actually act — read your codebase, modify files, understand requirements. That's a full agent setup, and I'll cover it in Part 2.
💬 Final Thoughts
This setup won't replace Copilot for everyone.
But if you care about privacy, cost, and full control over your tools — it's absolutely worth the 30 minutes to set up.
If you’re running this setup (or planning to), I’d love to hear:
👉 What hardware are you using?
Let’s compare setups 👇
Check out my knowledge vault where I document everything I learn hands-on:
👉 https://github.com/Riju007/dev-knowledge-vault
March 2026 | 🐳 Docker Desktop AI features


