I was paying $10/month for GitHub Copilot. It's fine. It works. But it means every keystroke I type goes to Microsoft's servers, my code context gets shipped off somewhere, and I'm locked into whatever pricing they decide next year.
Then I found out I could run a Copilot-class setup on my own machine, completely free, with no data leaving my computer.
Here's exactly how I did it, and how you can too.
What You Actually Need
Before anything else — honest expectations:
- A machine with at least 8GB RAM (16GB is better)
- ~5GB free disk space per model
- A decent CPU, or an NVIDIA/Apple Silicon GPU for speed
- About 20 minutes of setup time
No API keys. No credit cards. No subscriptions.
Step 1: Install Ollama
Ollama is the piece that makes all of this possible. It's basically a runtime that lets you pull and run open-source LLMs the same way Docker lets you run containers.
macOS / Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows:
Download the installer from ollama.com.
Verify it's running:
ollama --version
Step 2: Pull a Coding Model
This is the part where you choose your AI. For coding specifically, these are the ones worth using:
For most people (8GB RAM):
ollama pull qwen2.5-coder:7b
Qwen 2.5 Coder from Alibaba is genuinely impressive. Beats older Copilot versions on HumanEval benchmarks. Specialised entirely for code.
If you have 16GB+ RAM:
ollama pull qwen2.5-coder:14b
Noticeably better at multi-file context and explaining complex logic.
If you're on Apple Silicon (M1/M2/M3):
ollama pull deepseek-coder-v2:16b
Runs fast on Metal. Great at refactoring and docstring generation.
Absolute minimum (4GB RAM):
ollama pull qwen2.5-coder:3b
Smaller but still surprisingly capable for autocomplete and simple functions.
Test it immediately:
ollama run qwen2.5-coder:7b "write a Python function to flatten a nested list"
If you get a clean response, you're good.
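For reference, a correct answer to that prompt looks roughly like the sketch below (the model's exact output will vary run to run):

```python
def flatten(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Recurse into sublists and splice their contents in
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

# flatten([1, [2, [3, 4]], 5]) -> [1, 2, 3, 4, 5]
```

If the model produces something in this ballpark, it's working.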
Step 3: Pick Your Editor Integration
Now the fun part — making it feel like Copilot inside your actual editor.
VS Code → Continue
Continue is the open-source Copilot alternative. It's a VS Code (and JetBrains) extension that hooks directly into your local Ollama instance.
- Install the Continue extension from the VS Code marketplace
- Open Continue's config file (~/.continue/config.json) and add:
{
  "models": [
    {
      "title": "Qwen Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen Coder 7B",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}
- Press Tab to accept inline completions, Cmd+I (Mac) or Ctrl+I (Windows) to open the chat panel.
That's it. You now have inline autocomplete, a chat panel, and codebase-aware Q&A — all local.
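Under the hood, Continue is just talking to Ollama's local HTTP API (port 11434 by default). You can hit the same endpoint yourself, which is handy for scripting. A minimal sketch, assuming a stock Ollama install with the model already pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="qwen2.5-coder:7b"):
    """Build the JSON body Ollama's /api/generate endpoint expects."""
    # stream=False returns one complete JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt, model="qwen2.5-coder:7b"):
    """Send a prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires Ollama running locally:
# print(ask("write a Python function to flatten a nested list"))
```

Nothing leaves localhost, which is the whole point.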
Neovim → gen.nvim or avante.nvim
If you're a Neovim user, add this to your config with gen.nvim:
require('gen').setup({
  model = "qwen2.5-coder:7b",
  host = "localhost",
  port = "11434",
})
Then :Gen opens a prompt. Select code visually and run :Gen Enhance_Code or :Gen Add_Tests.
JetBrains → Continue (same plugin, different install)
Install Continue from the JetBrains marketplace. Same config file works.
Step 4: Supercharge It With Open WebUI (Optional but Worth It)
Open WebUI gives you a ChatGPT-like interface for your local models. Useful when you want to have a longer conversation about architecture, paste in a whole file, or explain a bug.
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Open http://localhost:3000, connect to your Ollama instance, and you have a full ChatGPT-style interface running entirely offline.
Real-World Performance
After a month of daily use on a MacBook Pro M2 with 16GB RAM, here's what I found:
| Task | Qwen 2.5 Coder 7B | GitHub Copilot |
|---|---|---|
| Simple function completion | ✅ Excellent | ✅ Excellent |
| Refactoring a 100-line file | ✅ Good | ✅ Good |
| Explaining unfamiliar code | ✅ Very good | ✅ Very good |
| Multi-file context | ⚠️ Limited | ✅ Better |
| Speed (M2 Mac) | ~2–3 tok/sec | Near instant |
| Privacy | ✅ 100% local | ❌ Sent to servers |
| Cost | ✅ Free | ❌ $10/month |
Speed is the real tradeoff. On CPU-only machines, responses are slower than a cloud API. On Apple Silicon or an NVIDIA GPU, the gap closes a lot.
The Part Nobody Tells You
Prompting matters more locally than with cloud models.
Cloud models like GPT-4 or Claude have been fine-tuned to be forgiving — they infer what you meant even if you're vague. Smaller local models are more literal. A vague prompt gets a vague answer.
Instead of:
fix this function
Try:
This Python function is supposed to parse a JWT token and return the
payload as a dict. It currently throws a KeyError when the token is
expired. Fix the expiry handling and add a try/except that returns None
on any decode failure.
More context = dramatically better output. Once I adjusted my prompting habit, the quality difference between local and cloud shrank a lot.
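If you find yourself writing these structured prompts often, it can help to template them. A hypothetical helper (the structure mirrors the example above; the function name and fields are my own):

```python
def build_fix_prompt(language, purpose, symptom, fix_request, code):
    """Assemble a specific, context-rich bug-fix prompt for a local model.

    Spelling out what the code should do, what it does instead, and what
    fix you want gives a small local model far more to work with than
    'fix this function'.
    """
    return (
        f"This {language} code is supposed to {purpose}. "
        f"It currently {symptom}. {fix_request}\n\n{code}"
    )

# Example:
# build_fix_prompt(
#     "Python",
#     "parse a JWT token and return the payload as a dict",
#     "throws a KeyError when the token is expired",
#     "Fix the expiry handling and return None on any decode failure.",
#     source_code,
# )
```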
Bonus: Free Cloud Options When Local Isn't Enough
Sometimes you need a bigger model for a hard problem. These are genuinely free with no credit card:
- Groq — Llama 3.1 70B running at insane speed. Free tier is generous.
- Google AI Studio — Gemini 1.5 Flash, 1M token context window, free.
- Cerebras — 1M tokens/day free, fastest inference available right now.
You can configure all of these in Continue the same way as Ollama — just swap the provider and add an API key.
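As a rough sketch, a Groq entry in ~/.continue/config.json could look like this (the provider name and model ID are assumptions based on Continue's provider list; check Continue's docs for your version):

```json
{
  "title": "Llama 3.1 70B (Groq)",
  "provider": "groq",
  "model": "llama-3.1-70b-versatile",
  "apiKey": "YOUR_GROQ_API_KEY"
}
```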
TL;DR
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull a coding model
ollama pull qwen2.5-coder:7b
# 3. Install Continue extension in VS Code
# 4. Start coding for free
Your code stays on your machine. You pay nothing. It's genuinely good enough for daily use.
The setup takes 20 minutes and you'll never think about it again.
If this helped, I'm posting more practical AI dev workflow stuff — follow along. And if your local setup is different from mine, drop it in the comments — curious what models people are running.