DEV Community


Stop Paying for GitHub Copilot: Build a Free, 100% Private AI Assistant Locally

The landscape of AI coding assistants has completely shifted. While cloud-based subscriptions like GitHub Copilot or Claude Pro are excellent, they come with two major pain points: recurring monthly costs and privacy concerns regarding your proprietary code.

Thanks to major gains in model efficiency, you can now run capable open-weights coding LLMs directly on your own machine, with no network round-trips, full privacy, and no subscription fees.

In this guide, we will set up a blazing-fast, context-aware local AI coding assistant using Ollama and Continue.dev in VS Code.


Why Go Local?

  • 💰 Cost-Efficient: $0/month forever.
  • 🔒 100% Private: Your code never leaves your local machine. Perfect for NDA-protected or enterprise projects.
  • ✈️ Offline Capability: Code with full AI assistance on a plane, a train, or anywhere without internet.
  • 🚀 Customization: Swap models instantly depending on whether you need quick autocomplete or deep architectural reasoning.

Prerequisites

To get a smooth experience, you ideally need:

  • A modern machine (Apple Silicon M-series chips or a Windows/Linux PC with a dedicated Nvidia RTX GPU).
  • 16GB of RAM or VRAM recommended (8GB can work with smaller, heavily quantized models, but 16GB+ is the sweet spot).
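Why is 16GB the sweet spot? A rough rule of thumb is that a model's weights need about (parameter count × bits per weight ÷ 8) bytes. The sketch below applies that formula, assuming roughly 4-bit quantized weights, which is typical of the default downloads for local runtimes like Ollama (exact sizes vary by model tag). It deliberately ignores the context (KV) cache, the runtime itself, your OS, and your IDE, so real memory use is higher.

```typescript
// Rough lower-bound estimate of the memory needed just to hold model weights.
// Assumption: weights dominate; KV cache, context length, and runtime
// overhead are NOT included.
function estimateWeightsGB(paramCount: number, bitsPerWeight: number): number {
  const bytes = (paramCount * bitsPerWeight) / 8; // bits -> bytes
  return bytes / 1e9; // decimal gigabytes
}

// Assumption: ~4 bits per weight approximates common quantized downloads;
// real quantization schemes usually cost slightly more than 4 bits.
const sizes: Array<[string, number]> = [
  ["qwen2.5-coder:1.5b", 1.5e9],
  ["qwen2.5-coder:7b", 7e9],
];

for (const [name, params] of sizes) {
  const q4 = estimateWeightsGB(params, 4);
  const fp16 = estimateWeightsGB(params, 16);
  console.log(`${name}: ~${q4.toFixed(2)} GB at 4-bit, ~${fp16.toFixed(2)} GB at 16-bit`);
}
```

By this estimate, the 7B chat model and the 1.5B autocomplete model together need a little over 4GB for their weights at 4-bit, leaving headroom in 16GB for the context cache and everything else you run, while the same 7B model at 16-bit precision would need about 14GB on its own.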

Step 1: Install Ollama and the Coding Model

Ollama is the easiest way to manage and run LLMs locally.

  1. Download and install Ollama for your OS.
  2. Open your terminal and run the following command to download Qwen2.5-Coder (7B) and start a chat session with it (DeepSeek-Coder is a popular alternative). For most modern setups, the 7B (7-billion-parameter) model offers the best balance between speed and coding quality.
ollama run qwen2.5-coder:7b


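Note: The configuration in Step 3 also uses the smaller qwen2.5-coder:1.5b model for tab autocomplete, so download it too with ollama pull qwen2.5-coder:1.5b. On lower-spec machines, you can also use the 1.5b model for chat (ollama run qwen2.5-coder:1.5b); it is much faster, at some cost in answer quality.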

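Once the download is complete, ollama run opens an interactive prompt where you can test the model; type /bye to exit. Ollama itself keeps running quietly in the background and serves a local HTTP API (by default at http://localhost:11434), which is what your editor extension connects to.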
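Because the editor talks to that local API, you can also call it directly to confirm everything works. Below is a minimal TypeScript sketch against Ollama's POST /api/generate endpoint, assuming the default address and a non-streaming request ("stream": false), in which case the generated text comes back in the response object's response field. The FetchLike type and the helper names are illustrative, not part of any library; passing fetch in as a parameter keeps the sketch testable without a running server.

```typescript
// Minimal sketch of calling Ollama's local HTTP API (POST /api/generate).
// Assumptions: Ollama listens on its default address, and the caller passes
// in a fetch implementation (e.g. the global fetch in Node.js 18+).
interface FetchResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<FetchResponseLike>;

const OLLAMA_URL = "http://localhost:11434";

// Build the request without sending it, so it can be inspected or tested.
function buildGenerateRequest(model: string, prompt: string) {
  return {
    url: `${OLLAMA_URL}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks for one JSON object instead of a stream of chunks
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

async function generate(fetchFn: FetchLike, model: string, prompt: string): Promise<string> {
  const { url, init } = buildGenerateRequest(model, prompt);
  const res = await fetchFn(url, init);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  // In non-streaming mode the generated text is in the `response` field.
  const data = (await res.json()) as { response?: unknown };
  if (typeof data.response !== "string") {
    throw new Error("Unexpected response shape from Ollama");
  }
  return data.response;
}
```

With Node.js 18 or later, which ships a global fetch, you could run generate(fetch, "qwen2.5-coder:7b", "Write a function that reverses a string.").then(console.log) while Ollama is running.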


Step 2: Set Up Continue.dev in Your IDE

Continue is an open-source AI code assistant extension for VS Code and JetBrains IDEs that provides a Copilot-style chat, inline edit, and autocomplete experience.

  1. Open VS Code.
  2. Open the Extensions view (Ctrl+Shift+X on Windows/Linux, Cmd+Shift+X on macOS).
  3. Search for Continue and click Install.

Step 3: Configure Continue to Use Your Local Model

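Once installed, a new Continue icon will appear in your sidebar. Click it, then click the gear icon (⚙️) in the Continue panel to open your config.json file. (The icon's exact position varies between versions, and newer releases of Continue use a config.yaml file instead; the settings below use the config.json format.)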

Replace the contents of the file with the following configuration to link it to your local Ollama instance:

{
  "models": [
    {
      "title": "Qwen2.5-Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 1.5B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b"
  },
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive unit test suite for the code above using Jest/Vitest.",
      "description": "Write unit tests"
    }
  ],
  "contextProviders": [
    { "name": "codebase", "params": {} },
    { "name": "open", "params": {} }
  ]
}
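The {{{ input }}} in the custom command is a Handlebars-style placeholder; triple braces insert text as-is, without HTML escaping. The sketch below is a hypothetical, simplified stand-in for that substitution, assuming input is the code you send with the /test command. It is not Continue's implementation and ignores any extra context Continue attaches; it only shows roughly what the prompt reaching the model looks like.

```typescript
// Hypothetical, simplified stand-in for custom-command templating.
// Handles only the "{{{ input }}}" placeholder (with optional inner spaces).
function renderPrompt(template: string, input: string): string {
  // A replacer function inserts `input` literally, so "$&" or "$1" inside
  // the code is not interpreted as a replacement pattern.
  return template.replace(/\{\{\{\s*input\s*\}\}\}/g, () => input);
}

const testCommandPrompt =
  "{{{ input }}}\n\nWrite a comprehensive unit test suite for the code above using Jest/Vitest.";

// Hypothetical selection the user runs /test on.
const selection = "export const add = (a: number, b: number) => a + b;";

console.log(renderPrompt(testCommandPrompt, selection));
```

Using a replacer function rather than a replacement string is the safer choice here, because source code often contains $ characters that String.prototype.replace would otherwise treat as special patterns.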


What this configuration does:

  • Chat Model: Uses the larger 7B model for complex tasks, refactoring, and general conversation in the sidebar chat (Cmd+L or Ctrl+L).
  • Tab Autocomplete: Uses the lightweight 1.5B model to suggest inline code as you type, where low latency matters more than depth.
  • Context Awareness: Lets you type @codebase in the chat so Continue can search a locally built index of your project and pass the most relevant files and snippets to the model, while the open-files provider adds the files you currently have open.
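Every Ollama model named in the config must already be downloaded, or Continue will not be able to use it. The sketch below compares the config against Ollama's list of local models, assuming the config shape shown above and the response shape of Ollama's GET /api/tags endpoint ({ "models": [{ "name": "qwen2.5-coder:7b", ... }] }). The interface and function names are hypothetical.

```typescript
// Sketch: list Ollama models referenced in the config that are not installed.
// Assumes the config.json shape above and the GET /api/tags response shape.
interface ModelEntry { title?: string; provider: string; model: string }
interface ContinueConfig {
  models: ModelEntry[];
  tabAutocompleteModel?: ModelEntry;
}
interface TagsResponse { models: Array<{ name: string }> }

function missingOllamaModels(config: ContinueConfig, tags: TagsResponse): string[] {
  const installed = new Set(tags.models.map((m) => m.name));
  // Chat models plus the optional autocomplete model.
  const wanted = config.tabAutocompleteModel
    ? config.models.concat([config.tabAutocompleteModel])
    : config.models;
  const missing = wanted
    .filter((m) => m.provider === "ollama" && !installed.has(m.model))
    .map((m) => m.model);
  return Array.from(new Set(missing)); // de-duplicate, keep first-seen order
}

const config: ContinueConfig = {
  models: [{ title: "Qwen2.5-Coder 7B", provider: "ollama", model: "qwen2.5-coder:7b" }],
  tabAutocompleteModel: { title: "Qwen2.5-Coder 1.5B", provider: "ollama", model: "qwen2.5-coder:1.5b" },
};
// Example: only the 7B model has been pulled so far.
const tags: TagsResponse = { models: [{ name: "qwen2.5-coder:7b" }] };

console.log(missingOllamaModels(config, tags)); // → ["qwen2.5-coder:1.5b"]
```

To check by hand, run ollama list in a terminal and fetch anything missing with ollama pull. This simple comparison requires exact tag matches, so a config entry without a tag (for example, qwen2.5-coder) would be reported as missing even if qwen2.5-coder:latest is installed.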

How to Maximize Your Local AI Workflow

Now that you are fully set up, here are three essential shortcuts to replace your paid workflow:

1. The Inline Edit (Cmd+I / Ctrl+I)

Highlight a block of code, press Cmd+I (Ctrl+I on Windows/Linux), and ask the local model to modify it directly. For example: "Refactor this fetch request to use async/await and add error handling."
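As an illustration of the kind of change that prompt asks for, here is a hypothetical before/after in TypeScript. The /api/users endpoint and the loadUser names are made up, and fetch is passed in as a parameter (FetchLike) so the sketch can be exercised without a network; it is not output captured from any particular model, so review whatever your local model produces just as carefully.

```typescript
// Illustration only: a fetch call before and after an async/await refactor.
// FetchLike is a hypothetical stand-in for the global fetch.
interface ResponseLike { ok: boolean; status: number; json(): Promise<unknown> }
type FetchLike = (url: string) => Promise<ResponseLike>;

// Before: promise chain; HTTP errors (404, 500, ...) are silently parsed as data.
function loadUserBefore(fetchFn: FetchLike, id: string): Promise<unknown> {
  return fetchFn(`/api/users/${id}`).then((res) => res.json());
}

// After: async/await with explicit handling of network and HTTP errors.
async function loadUser(fetchFn: FetchLike, id: string): Promise<unknown> {
  let res: ResponseLike;
  try {
    res = await fetchFn(`/api/users/${id}`);
  } catch (err) {
    // The request never completed (offline, DNS failure, aborted, ...).
    throw new Error(`Network error while loading user ${id}: ${String(err)}`);
  }
  if (!res.ok) {
    throw new Error(`Failed to load user ${id}: HTTP ${res.status}`);
  }
  return res.json();
}
```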

2. Full Project Context (@codebase)

If you are debugging a tricky bug that spans multiple files, open the chat and type:

@codebase why is my authentication state resetting on page refresh?

Continue searches its local index of your project, feeds the most relevant snippets to Ollama along with your question, and answers based on your actual code, without uploading a single byte to the cloud.
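Under the hood this is retrieval-augmented prompting: find the parts of the repository most related to the question, then send only those to the model alongside it. The toy TypeScript sketch below shows the idea with simple keyword-overlap scoring. It is a deliberately simplified stand-in, not how Continue's @codebase provider ranks files, and the file names and contents are hypothetical.

```typescript
// Toy retrieval sketch: score snippets by keyword overlap with the question,
// keep the top k, and build a prompt. Not Continue's actual ranking.
interface Snippet { path: string; text: string }

function tokenize(s: string): Set<string> {
  // Lowercase, split on non-alphanumerics, drop very short words.
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2));
}

function topSnippets(question: string, snippets: Snippet[], k: number): Snippet[] {
  const q = tokenize(question);
  return snippets
    .map((s) => {
      let score = 0;
      tokenize(s.text).forEach((t) => { if (q.has(t)) score++; });
      return { s, score };
    })
    .filter((x) => x.score > 0) // ignore snippets with no overlap at all
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.s);
}

function buildPrompt(question: string, context: Snippet[]): string {
  const blocks = context.map((c) => `// ${c.path}\n${c.text}`).join("\n\n");
  return `${blocks}\n\nQuestion: ${question}`;
}

// Hypothetical project files.
const files: Snippet[] = [
  { path: "src/auth/store.ts", text: "export const authState = { user: null }; // auth state kept only in memory" },
  { path: "src/ui/button.tsx", text: "export function Button() { return null; }" },
  { path: "src/auth/session.ts", text: "function restoreSession() { /* reads token on page refresh */ }" },
];
const question = "why is my authentication state resetting on page refresh?";

console.log(buildPrompt(question, topSnippets(question, files, 2)));
```

Production tools typically use embeddings or other search techniques instead of raw keyword overlap, but the shape of the pipeline (score, keep the top k, build a prompt) is what lets a local model answer questions about a codebase far larger than its context window.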

3. Automatic Doc Generation

Highlight a function, open the chat (Cmd+L / Ctrl+L), and ask: "Add JSDoc comments to this function explaining the parameters."
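For reference, this is the kind of result to aim for, shown on a hypothetical chunk helper written in TypeScript. The function and its comments illustrate the JSDoc format (@param, @returns, @throws); they are not output captured from a model, so check what your local model generates against the code it documents.

```typescript
/**
 * Splits an array into consecutive chunks of a fixed size.
 *
 * @param items The array to split. It is not modified.
 * @param size Maximum number of elements per chunk; must be a positive integer.
 * @returns A new array of chunks; the last chunk may be shorter than `size`.
 * @throws {RangeError} If `size` is not a positive integer.
 */
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

For example, chunk([1, 2, 3, 4, 5], 2) returns [[1, 2], [3, 4], [5]].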


Final Thoughts

The era of paying $10-$20 a month per developer for basic AI autocompletion is coming to an end. By leveraging open-weights models and local orchestration tools, you gain total control over your development environment, secure your data, and save money.

Give it a spin and see how it handles your toughest codebases!



Top comments (1)

Alex Shev

Local coding assistants are getting practical, but the setup should be judged by more than privacy and price. Context quality, latency under real projects, and how the tool handles repo-specific instructions are what decide whether it sticks.