<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Avisek Dey</title>
    <description>The latest articles on DEV Community by Avisek Dey (@riju005).</description>
    <link>https://dev.to/riju005</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1412412%2F2729af02-37e8-4806-a562-18a543b8eced.png</url>
      <title>DEV Community: Avisek Dey</title>
      <link>https://dev.to/riju005</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/riju005"/>
    <language>en</language>
    <item>
      <title>I Tried Replacing GitHub Copilot with Local AI — Here’s What Happened (Docker + GPU)</title>
      <dc:creator>Avisek Dey</dc:creator>
      <pubDate>Sat, 28 Mar 2026 19:57:28 +0000</pubDate>
      <link>https://dev.to/riju005/i-tried-replacing-github-copilot-with-local-ai-heres-what-happened-docker-gpu-5b44</link>
      <guid>https://dev.to/riju005/i-tried-replacing-github-copilot-with-local-ai-heres-what-happened-docker-gpu-5b44</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A practical guide to running a private, GPU-accelerated coding assistant locally using Docker Desktop — no API costs, no data leaving your machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;⏱️ Setup time: ~30–60 minutes&lt;/p&gt;

&lt;h2&gt;
  
  
  My Story
&lt;/h2&gt;

&lt;p&gt;I'm a professional Django developer based in India.&lt;/p&gt;

&lt;p&gt;Like most developers, I was using cloud-based AI tools for coding assistance — paying for API credits, sending my code to third-party servers, and depending on a stable internet connection just to get a code suggestion.&lt;/p&gt;

&lt;p&gt;Then one day I was exploring Docker Desktop and noticed two new things in the sidebar — &lt;strong&gt;Models&lt;/strong&gt; and &lt;strong&gt;MCP Toolkit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I had no idea what they were.&lt;/p&gt;

&lt;p&gt;A few hours of tinkering later, I had a fully local AI coding assistant running on my laptop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Free
&lt;/li&gt;
&lt;li&gt;✅ Private (no code leaves my machine)
&lt;/li&gt;
&lt;li&gt;✅ Works offline
&lt;/li&gt;
&lt;li&gt;✅ GPU-accelerated (~273ms response time)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No GitHub Copilot subscription. No API costs.&lt;/p&gt;

&lt;p&gt;This is exactly how I built it — including all the mistakes I made along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚠️ Before We Start (Important Reality Check)
&lt;/h2&gt;

&lt;p&gt;Let's be honest:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is NOT a perfect replacement for GitHub Copilot.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cloud models (GPT-4/5 class) are still:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better at reasoning&lt;/li&gt;
&lt;li&gt;better at large codebases&lt;/li&gt;
&lt;li&gt;more consistent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But…&lt;/p&gt;

&lt;p&gt;👉 For &lt;strong&gt;most day-to-day coding tasks&lt;/strong&gt;, a local setup like this is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast enough&lt;/li&gt;
&lt;li&gt;smart enough&lt;/li&gt;
&lt;li&gt;and WAY more private&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of this as a &lt;strong&gt;practical alternative&lt;/strong&gt;, not a 1:1 replacement.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Why Run AI Locally?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem with Cloud AI&lt;/th&gt;
&lt;th&gt;Local AI Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;💸 Costly at scale ($10–30/million tokens)&lt;/td&gt;
&lt;td&gt;✅ Completely free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🌐 Needs internet&lt;/td&gt;
&lt;td&gt;✅ Works offline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔒 Code sent to third-party servers&lt;/td&gt;
&lt;td&gt;✅ Stays on your machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;⚡ Network latency&lt;/td&gt;
&lt;td&gt;✅ GPU-accelerated (~273ms)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🖥️ My Setup
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;This guide is based on my machine — adjust based on your hardware.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;💻 Laptop&lt;/td&gt;
&lt;td&gt;Lenovo IdeaPad Pro 5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🧠 CPU&lt;/td&gt;
&lt;td&gt;Intel Core Ultra 9 185H&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🎮 GPU&lt;/td&gt;
&lt;td&gt;6GB NVIDIA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;💾 RAM&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🪟 OS&lt;/td&gt;
&lt;td&gt;Windows 11&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Minimum Setup Recommendation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;th&gt;What to Expect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No GPU&lt;/td&gt;
&lt;td&gt;Works, but slow (~2–5s responses)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8GB RAM&lt;/td&gt;
&lt;td&gt;Very limited models only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16GB RAM&lt;/td&gt;
&lt;td&gt;Usable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32GB + GPU&lt;/td&gt;
&lt;td&gt;🔥 Ideal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  📦 Step 1 — Install Docker Desktop
&lt;/h2&gt;

&lt;p&gt;Install Docker Desktop with AI features enabled:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.docker.com/products/docker-desktop" rel="noopener noreferrer"&gt;https://www.docker.com/products/docker-desktop&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Make sure you see &lt;strong&gt;Models&lt;/strong&gt; and &lt;strong&gt;MCP Toolkit&lt;/strong&gt; in the left sidebar.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 Step 2 — Understanding Model Selection
&lt;/h2&gt;

&lt;p&gt;Before pulling any model, you need to understand three things. This took me a while to figure out — so I'll keep it quick.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔢 Parameters = Brain Size
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;7B  → Good for most coding tasks ✅
30B → Needs 16GB+ VRAM
70B → High-end machines only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  📦 Quantization = Compression
&lt;/h3&gt;

&lt;p&gt;Think of it like image compression — smaller file, slight quality trade-off.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;F16  → Full precision, largest file
Q4_0 → 4x compressed, best balance ✅
Q2   → Smallest, noticeable quality loss
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  💡 The Golden Rule — RAM vs VRAM
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fits in VRAM (6GB)?  → GPU  ⚡ ~273ms
Spills to RAM?       → CPU  🐢 ~3000ms
Too big for RAM?     → ❌ Won't run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
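&lt;p&gt;The golden rule turns into quick arithmetic. Here's a rough sketch (my own back-of-envelope approximation, weights only — it ignores KV cache and runtime overhead, so treat it as a lower bound, not Docker's exact accounting):&lt;/p&gt;

```python
# Back-of-envelope only: file size is roughly params * bits_per_weight / 8.
# Ignores KV cache and runtime overhead, so treat results as a lower bound.
def estimated_size_gib(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8.0
    return bytes_total / 2**30

def where_it_runs(size_gib, vram_gib=6.0, ram_gib=32.0):
    # Mirrors the golden rule: too big for RAM fails, spills to RAM means CPU.
    if size_gib > ram_gib:
        return "won't run"
    if size_gib > vram_gib:
        return "CPU"
    return "GPU"

# Q4_0 works out to roughly 4.5 bits per weight once block scales are counted.
size = estimated_size_gib(7.62, 4.5)
print(round(size, 2), where_it_runs(size))  # -> 3.99 GPU
```

&lt;p&gt;The estimate lands close to the 4.12 GiB Docker reports for the model pulled in the next step, which is exactly why it squeezes into a 6GB card.&lt;/p&gt;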






&lt;h2&gt;
  
  
  🏆 Step 3 — Pull the Right Model
&lt;/h2&gt;

&lt;h3&gt;
  
  
  My Pick: &lt;code&gt;qwen2.5:7B-Q4_0&lt;/code&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Parameters&lt;/td&gt;
&lt;td&gt;7.62B&lt;/td&gt;
&lt;td&gt;Smart enough for coding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quantization&lt;/td&gt;
&lt;td&gt;Q4_0&lt;/td&gt;
&lt;td&gt;4x compressed, great quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size&lt;/td&gt;
&lt;td&gt;4.12 GiB&lt;/td&gt;
&lt;td&gt;Fits perfectly in 6GB VRAM ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Docker Desktop → &lt;strong&gt;Models&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Search &lt;code&gt;qwen2.5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Find &lt;code&gt;qwen2.5:7B-Q4_0&lt;/code&gt; → click &lt;strong&gt;Pull&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvgcfyjsi4elru19fe7h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvgcfyjsi4elru19fe7h.png" alt="Docker Models showing qwen2.5 variants with pull options"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⏳ ~4GB download. Grab a coffee.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  ⚡ Step 4 — Enable GPU-Accelerated Inference (CRITICAL)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Most guides miss this step entirely.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By default Docker Model Runner uses CPU only. One checkbox changes everything.&lt;/p&gt;

&lt;p&gt;Go to: &lt;strong&gt;Docker Desktop → Settings → AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enable all three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Enable Docker Model Runner&lt;/li&gt;
&lt;li&gt;✅ Enable host-side TCP support → Port: &lt;code&gt;12434&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ Enable GPU-backed inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Apply&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ GPU inference downloads additional components — takes a few minutes the first time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Verify It's Working
&lt;/h3&gt;

&lt;p&gt;Open your browser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;http://localhost:12434/engines/v1/models
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"list"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker.io/ai/qwen2.5:7B-Q4_0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"owned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
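&lt;p&gt;The same check can be scripted with nothing but the Python standard library. This is a sketch assuming host-side TCP is enabled on the default port 12434 from the settings above:&lt;/p&gt;

```python
# Standard-library version of the browser check above; assumes host-side TCP
# is enabled on the default port 12434.
import json
import urllib.request

BASE_URL = "http://localhost:12434/engines/v1"

def model_ids(payload):
    # Pull just the ids out of the OpenAI-style list response.
    return [entry["id"] for entry in payload.get("data", [])]

def fetch_model_ids(base_url=BASE_URL, timeout=5):
    try:
        with urllib.request.urlopen(base_url + "/models", timeout=timeout) as resp:
            return model_ids(json.load(resp))
    except OSError as err:
        # Covers connection refused and timeouts alike.
        print("endpoint not reachable:", err)
        return []

print(fetch_model_ids())
```

&lt;p&gt;If the pull succeeded and TCP is on, the printed list should contain the qwen2.5 tag; an empty list plus the error message points you back at the settings checkbox.&lt;/p&gt;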



&lt;h3&gt;
  
  
  Check Response Speed
&lt;/h3&gt;

&lt;p&gt;Go to &lt;strong&gt;Docker Desktop → Models → Requests tab&lt;/strong&gt; after sending a prompt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66rnekpop2paeljq9sce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66rnekpop2paeljq9sce.png" alt="Docker Models Requests tab showing 273ms GPU response time"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU ⚡&lt;/td&gt;
&lt;td&gt;~273ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU 🐢&lt;/td&gt;
&lt;td&gt;~3000ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;That's a 10x speedup from one checkbox.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🔌 Step 5 — Connect VS Code via Continue.dev
&lt;/h2&gt;

&lt;p&gt;Docker Models exposes a &lt;strong&gt;local OpenAI-compatible API&lt;/strong&gt; at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;http://localhost:12434/engines/v1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👉 Key insight: &lt;strong&gt;any tool that supports OpenAI's API works here&lt;/strong&gt; — just change the URL from OpenAI's server to localhost.&lt;/p&gt;
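&lt;p&gt;A minimal sketch of that insight, using only the standard library (no &lt;code&gt;openai&lt;/code&gt; package needed): the request body is exactly what you'd send to OpenAI, only the address differs. The model tag assumes the qwen2.5 pull from Step 3 — swap in whatever you pulled:&lt;/p&gt;

```python
# OpenAI-style chat completion POSTed to localhost instead of api.openai.com.
# Assumes the qwen2.5:7B-Q4_0 model from Step 3; no real API key is needed locally.
import json
import urllib.request

API = "http://localhost:12434/engines/v1/chat/completions"

def build_request(prompt, model="ai/qwen2.5:7B-Q4_0"):
    # Same JSON shape OpenAI expects: a model tag plus a messages array.
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(prompt):
    try:
        with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    except OSError as err:
        return f"model endpoint not reachable: {err}"

print(ask("Write a one-line Python hello world"))
```

&lt;p&gt;If the endpoint is down you get the error string instead of a traceback, which makes this easy to drop into scripts.&lt;/p&gt;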

&lt;h3&gt;
  
  
  Install Continue.dev
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;VS Code → Extensions (&lt;code&gt;Ctrl+Shift+X&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Search &lt;code&gt;Continue&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Install &lt;strong&gt;Continue - open-source AI code agent&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Configure It
&lt;/h3&gt;

&lt;p&gt;Open &lt;code&gt;C:\Users\&amp;lt;yourname&amp;gt;\.continue\config.yaml&lt;/code&gt; and paste:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Local Config&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Qwen2.5 Coder Local&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai/qwen2.5:7B-Q4_0&lt;/span&gt;
    &lt;span class="na"&gt;apiBase&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:12434/engines/v1&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Why &lt;code&gt;provider: openai&lt;/code&gt;?&lt;/strong&gt; Docker's API speaks the OpenAI protocol — same language, different address.&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Why &lt;code&gt;apiKey: docker&lt;/code&gt;?&lt;/strong&gt; Just a placeholder — localhost needs no real auth.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Windows PowerShell Fix (if needed)
&lt;/h3&gt;

&lt;p&gt;If you hit a script execution error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Set-ExecutionPolicy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ExecutionPolicy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;RemoteSigned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Scope&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CurrentUser&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test It!
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljzt5rkxe99w3n6k5y6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljzt5rkxe99w3n6k5y6y.png" alt="Continue.dev working with local Qwen model in VS Code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If it responds — &lt;strong&gt;you're done.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 Real Comparison — Copilot vs Local AI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prompt I tested:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write a Django REST Framework viewset for a User model
with JWT authentication and permission classes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GitHub Copilot output:
&lt;/h3&gt;

&lt;p&gt;Clean, complete, production-ready code with proper imports, docstrings, and edge case handling. Roughly 60 lines, zero follow-up needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local Qwen 7B output:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rest_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;viewsets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;permissions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rest_framework_simplejwt.authentication&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JWTAuthentication&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.serializers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;UserSerializer&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UserViewSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;viewsets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ModelViewSet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;queryset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;serializer_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UserSerializer&lt;/span&gt;
    &lt;span class="n"&gt;authentication_classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;JWTAuthentication&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;permission_classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;permissions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsAuthenticated&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_queryset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Users can only see their own data
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Solid, functional, correct — but less complete than Copilot. Needed a follow-up prompt for edge cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verdict:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Copilot&lt;/th&gt;
&lt;th&gt;Local Qwen 7B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Fast (GPU ⚡)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boilerplate&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-file context&lt;/td&gt;
&lt;td&gt;Better&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$10–19/mo&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;FREE&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;External servers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Your machine&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🏗️ Architecture (Mental Model)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────┐
│              YOUR MACHINE                    │
│                                              │
│  ┌─────────────────┐                         │
│  │  Docker Models  │  ← qwen2.5:7B-Q4_0     │
│  │  (AI Brain) 🧠  │    runs on your GPU     │
│  └────────┬────────┘                         │
│           │ exposes                          │
│           ▼                                  │
│  ┌─────────────────┐                         │
│  │  localhost:12434│  ← OpenAI-compatible    │
│  │  REST API  🔌   │    just like -p 8080    │
│  └────────┬────────┘                         │
│           │ connects to                      │
│           ▼                                  │
│  ┌─────────────────┐                         │
│  │  VS Code        │  ← Continue.dev         │
│  │  (Your IDE) 💻  │    extension            │
│  └─────────────────┘                         │
└─────────────────────────────────────────────┘

No internet. No API costs. No data leaks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚠️ Where This Falls Short
&lt;/h2&gt;

&lt;p&gt;Be honest with yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Not as smart as GPT-4-level models&lt;/li&gt;
&lt;li&gt;❌ Limited context window (struggles with large codebases)&lt;/li&gt;
&lt;li&gt;❌ Needs decent hardware for best results&lt;/li&gt;
&lt;li&gt;❌ Setup takes 30–60 minutes vs just paying for Copilot&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ❌ When You Should NOT Use This
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Working on large enterprise codebases&lt;/li&gt;
&lt;li&gt;Need best-in-class reasoning (GPT-4 level)&lt;/li&gt;
&lt;li&gt;Want zero setup / plug-and-play&lt;/li&gt;
&lt;li&gt;Low-end hardware (&amp;lt;16GB RAM, no GPU)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ❓ Troubleshooting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Connection error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TCP not enabled&lt;/td&gt;
&lt;td&gt;Docker Desktop → Settings → AI → Enable host-side TCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slow responses (&amp;gt;2s)&lt;/td&gt;
&lt;td&gt;GPU not enabled&lt;/td&gt;
&lt;td&gt;Docker Desktop → Settings → AI → Enable GPU-backed inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;npx&lt;/code&gt; script error&lt;/td&gt;
&lt;td&gt;PowerShell policy&lt;/td&gt;
&lt;td&gt;Run &lt;code&gt;Set-ExecutionPolicy RemoteSigned&lt;/code&gt; as Admin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model not showing&lt;/td&gt;
&lt;td&gt;Not pulled&lt;/td&gt;
&lt;td&gt;Docker Desktop → Models → Pull qwen2.5:7B-Q4_0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
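&lt;p&gt;For the first two rows, a tiny port probe (my own helper, assuming the default port 12434) tells you in seconds whether the TCP endpoint is even reachable:&lt;/p&gt;

```python
# Quick diagnostic for "Connection error": is anything listening on 12434?
import socket

def port_open(host="localhost", port=12434, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_open():
    print("TCP endpoint reachable - if responses are slow, check the GPU setting")
else:
    print("Nothing listening - enable host-side TCP in Docker Desktop settings")
```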




&lt;h2&gt;
  
  
  🚀 What's Next?
&lt;/h2&gt;

&lt;p&gt;This is just the foundation.&lt;/p&gt;

&lt;p&gt;Docker's &lt;strong&gt;MCP Toolkit&lt;/strong&gt; can let your local AI actually &lt;em&gt;act&lt;/em&gt; — read your codebase, modify files, understand requirements. That's a full agent setup, and I'll cover it in Part 2.&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This setup won't replace Copilot for everyone.&lt;/p&gt;

&lt;p&gt;But if you care about privacy, cost, and full control over your tools — it's absolutely worth the 30 minutes to set up.&lt;/p&gt;




&lt;p&gt;If you’re running this setup (or planning to), I’d love to hear:&lt;br&gt;
👉 What hardware are you using?&lt;/p&gt;

&lt;p&gt;Let’s compare setups 👇&lt;/p&gt;

&lt;p&gt;Check out my knowledge vault where I document everything I learn hands-on:&lt;br&gt;
👉 &lt;a href="https://github.com/Riju007/dev-knowledge-vault" rel="noopener noreferrer"&gt;https://github.com/Riju007/dev-knowledge-vault&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;March 2026 | 🐳 Docker Desktop AI features&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>ai</category>
      <category>vscode</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
