You are paying $20 a month for GitHub Copilot. In our local economy, that is almost 6,000 PKR every single month. You are paying this "cloud tax" for a tool that lags the second your internet connection drops, goes offline when Microsoft has a server outage, and silently feeds your proprietary code into corporate training clusters.
If you want long-term freedom and leverage as a developer in 2026, you need to stop renting your tools and start owning them.
The era of relying exclusively on cloud-based AI is ending for serious engineers. The hardware has caught up. You can now run state-of-the-art models entirely offline, directly on your machine, with zero latency and absolute privacy.
This is not a theoretical concept. This is a practical, 10-minute setup that will replace your Copilot subscription today. We are going to use Ollama as the local engine, the Continue extension for VS Code, and a highly optimized DeepSeek model as the brain.
Here is the exact blueprint. No excuses. Let's build it.
The Architecture of a Local Copilot
To understand what we are building, you need to understand the three layers of an AI coding assistant:
- The Inference Engine (Ollama): This is the software that loads the AI model into your computer's RAM/VRAM and serves it locally as an API.
- The Brain (DeepSeek): This is the actual language model trained on code.
- The Interface (Continue.dev): This is the VS Code extension that replaces the standard Copilot sidebar and autocomplete engine, redirecting the requests to your local Ollama server instead of the cloud.
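Once everything is installed (Steps 1 and 2 below), you can exercise the first two layers from a plain terminal, no editor involved. A minimal sketch using Ollama's documented /api/generate endpoint:

```shell
# Ask the inference engine (layer 1) to run the model (layer 2) directly,
# bypassing the editor (layer 3). Continue does essentially this on every request.
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "deepseek-coder-v2", "prompt": "Write a hello-world in Python.", "stream": false}' \
  || echo "Ollama is not running yet (see Step 1)"
```

If this works from the command line, any tool on your machine can use the same endpoint — the editor extension is just a convenient client.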
Step 1: Install the Engine (Ollama)
Ollama is the standard for local LLM execution. It handles all the complex GPU acceleration and memory management silently in the background.
If you are on macOS or Windows, download the installer from the official site: ollama.com.
If you are on a Linux distribution or WSL, open your terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, verify the daemon is running:
ollama --version
You should see the current version output. That is your local server ready to accept models.
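You can also confirm the background server itself is listening. Ollama serves HTTP on port 11434 by default, and a plain GET on the root path returns the literal text "Ollama is running":

```shell
# Probe the default Ollama port; a healthy install answers "Ollama is running".
curl -s http://127.0.0.1:11434 || echo "daemon not reachable on port 11434"
```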
Step 2: Pull the Brain (DeepSeek Reality Check)
Let us address a hard technical truth right now: You are not going to run the full, uncompressed DeepSeek-V3 on a standard laptop. The full V3 is a massive Mixture-of-Experts model that requires serious server-grade clusters.
If you see tutorials claiming you can run the full V3 on 8GB of RAM, they are lying for clicks.
However, we do not need the massive generalized model. We need the smaller, quantized coding variants. For local machines with 16GB to 32GB of RAM, you want the DeepSeek-Coder series; the deepseek-coder-v2 tag on Ollama pulls the 16B "Lite" variant, quantized down to roughly 9GB on disk.
Open your terminal and pull the model:
ollama run deepseek-coder-v2
The download will take a few minutes depending on your connection. Once it finishes, you will be dropped into a local chat prompt. Test it by asking it to write a simple Python script.
Notice the speed. Notice that your Wi-Fi could be disconnected right now and it would still work.
Type /bye to exit. The model is now cached on your machine.
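Two optional follow-ups from the same terminal, both standard ollama subcommands: list what is cached on disk, and run a one-shot prompt without entering the interactive chat.

```shell
# Show every model cached locally, with its size on disk.
ollama list 2>/dev/null || echo "ollama not on PATH"

# One-shot prompt: pass the prompt as an argument instead of opening the REPL.
ollama run deepseek-coder-v2 "Write a Python one-liner that reverses a string." \
  2>/dev/null || echo "model not available"
```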
Step 3: Install the Interface (Continue)
We have the engine and the brain. Now we need it inside our editor.
- Open VS Code
- Go to the Extensions marketplace
- Search for "Continue" (publisher: Continue)
- Install it
Continue is an open-source AI code assistant. It gives you the familiar chat sidebar and inline autocomplete, but unlike proprietary tools, it lets you choose your API endpoint.
Step 4: The Configuration
By default, Continue might try to connect to free cloud APIs. We need to route it entirely to your local Ollama instance.
Click the gear icon in the bottom right of the Continue sidebar to open the config.json file. Replace the models and tabAutocompleteModel sections with the following:
{
  "models": [
    {
      "title": "Local DeepSeek Coder",
      "provider": "ollama",
      "model": "deepseek-coder-v2",
      "apiBase": "http://127.0.0.1:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Autocomplete",
    "provider": "ollama",
    "model": "deepseek-coder-v2",
    "apiBase": "http://127.0.0.1:11434"
  },
  "allowAnonymousTelemetry": false
}
Save the file.
Look at what you just did. apiBase is pointing to your localhost. allowAnonymousTelemetry is false. Your code does not leave your machine. You have successfully air-gapped your development environment.
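You can prove the routing to yourself: Continue is just issuing HTTP requests to that apiBase. The same /api/chat endpoint is reachable from any terminal or script (a sketch, assuming the model from Step 2 is pulled):

```shell
# Hit the exact endpoint Continue now talks to. Nothing here leaves localhost.
curl -s http://127.0.0.1:11434/api/chat \
  -d '{"model": "deepseek-coder-v2", "messages": [{"role": "user", "content": "Say hello in one word."}], "stream": false}' \
  || echo "no local server on 11434"
```

Unplug your network cable and run it again — the response is identical.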
The Workflow in Practice
Restart VS Code so Continue picks up the new configuration and connects to the local Ollama daemon.
Open a complex project file. Start typing a function. You will see the ghost text appear just like it did with GitHub Copilot. Press Tab to accept it.
Highlight a block of code, press Cmd/Ctrl + L to send it to the Continue sidebar, and tell it:
"Refactor this database query to prevent SQL injection."
The local model will read the context, stream the explanation, and offer a unified diff you can accept with one click.
The Hard Truth About Local AI
I will not sugarcoat this. Running models locally is a trade-off.
You are trading cloud dependency for hardware utilization. When the model is generating code, your fans will spin up and your battery will drain faster. On a machine with only 8GB of RAM it will be slow, and you will need to pull an even smaller model such as qwen2.5-coder:1.5b.
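A rough sizing heuristic (my assumption, not an official Ollama guideline): a 4-bit quantized model needs on the order of 1GB of memory per billion parameters, plus headroom for context. A sketch that suggests a model tag based on total RAM on Linux:

```shell
# Suggest a model tag based on total system RAM (Linux; uses `free`).
# Thresholds are rough assumptions, not official requirements.
mem_gb=$(free -g 2>/dev/null | awk '/^Mem:/{print $2}')
if [ "${mem_gb:-0}" -ge 16 ]; then
  echo "suggestion: deepseek-coder-v2"   # 16B MoE, comfortable with 16GB+
else
  echo "suggestion: qwen2.5-coder:1.5b"  # small enough for 8GB machines
fi
```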
But consider the upside:
- ✅ You have completely removed a monthly financial drain
- ✅ You can take on freelance work with strict NDAs — you can credibly guarantee that their source code is never transmitted to third-party AI servers
- ✅ You have removed the latency of web requests
- ✅ You understand how AI orchestration actually works at the infrastructure level
Development is about building systems and understanding architecture, not just memorizing syntax. By setting this up, you have taken a step toward owning your tools.
Stop relying on black-box subscriptions. Build your own tools, keep your focus sharp, and get back to work.
Find Me Online
| Platform | Link |
|---|---|
| ✍️ Medium | @syedahmershah |
| 💬 Dev.to | @syedahmershah |
| 🧠 Hashnode | @syedahmershah |
| 💻 GitHub | @ahmershahdev |
| 🧭 Beacons | Syed Ahmer Shah |
| 🌐 Portfolio | ahmershah.dev |
Tags: Ollama · DeepSeek · Continue · VS Code · Local AI · Privacy · Developer Tools