
Chappie

How to Run Local LLMs for Coding (No Cloud, No API Keys)

I got tired of paying for API calls. Every time I wanted an AI coding assistant, it was another subscription, another API key, another company reading my code. So I went local. Here's exactly how to do it.

Why Local LLMs for Coding?

Three reasons:

  1. Privacy - Your code never leaves your machine
  2. Cost - Zero ongoing fees after initial setup
  3. Speed - No network latency, works offline

The tradeoff? You need decent hardware. But if you've got 16GB+ RAM and a GPU from the last few years, you're set.

The Stack: Ollama + Continue

Forget complicated setups. Ollama makes running local models trivially easy, and Continue gives you a Copilot/Cursor-style experience inside VS Code without the cloud dependency.

Step 1: Install Ollama

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows - download from ollama.com

That's it. No Docker, no Python environments, no dependency hell.

Step 2: Pull a Coding Model

Not all models are equal for code. Here's what actually works:

# Best overall for coding (needs 16GB+ RAM)
ollama pull deepseek-coder-v2:16b

# Lighter option (8GB RAM)
ollama pull codellama:7b

# For code review and explanations
ollama pull mistral:7b

DeepSeek Coder v2 is genuinely impressive - it holds its own against GPT-4 on many everyday coding tasks. If you're RAM-constrained, CodeLlama 7B still handles autocomplete and simple generations well.
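If you lose track of what you've pulled, Ollama's local REST API can tell you. Here's a minimal Python sketch (standard library only), assuming the Ollama server is running on its default port 11434; `/api/tags` is the endpoint behind `ollama list`:

```python
import json
import urllib.request


def model_names(tags_response: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_local_models(url: str = "http://localhost:11434/api/tags") -> list:
    """Ask a locally running Ollama server which models have been pulled."""
    with urllib.request.urlopen(url) as resp:
        return model_names(json.loads(resp.read()))


# Usage (requires the Ollama server to be running):
# print(list_local_models())
```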

Step 3: Test It

ollama run deepseek-coder-v2:16b
>>> Write a Python function to parse JSON from a file safely

You should get a response in seconds. If it's slow, you're probably swapping to disk - try a smaller model.
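The CLI isn't the only way in. Ollama also exposes a local REST API, which is handy for scripting. A stdlib-only sketch, assuming the default port; setting `"stream": False` makes the server return a single JSON object instead of a token stream:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the completion text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires the Ollama server to be running):
# print(generate("deepseek-coder-v2:16b",
#                "Write a Python function to parse JSON from a file safely"))
```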

Step 4: Connect to Your Editor

Here's where it gets good. Install the Continue extension for VS Code:

  1. Open VS Code
  2. Extensions → Search "Continue"
  3. Install it
  4. Open Continue sidebar (Cmd/Ctrl + L)

Configure it to use Ollama. Create ~/.continue/config.json:

{
  "models": [
    {
      "title": "DeepSeek Coder Local",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama",
    "provider": "ollama",
    "model": "codellama:7b"
  }
}

Now you've got:

  • Chat with your codebase (Cmd+L)
  • Inline edits (Cmd+I)
  • Tab autocomplete

All running locally. Zero API calls.
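One variation worth knowing: if Ollama runs on a beefier machine on your LAN instead of your laptop, Continue can point at it. This is a hedged sketch - to my knowledge the `apiBase` field is how Continue's Ollama provider takes a custom endpoint, and the address below is a placeholder; check Continue's docs for your version:

```json
{
  "models": [
    {
      "title": "DeepSeek Coder (remote)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "apiBase": "http://192.168.1.50:11434"
    }
  ]
}
```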

Real-World Performance

I've been using this setup for three months. Here's the honest assessment:

What works great:

  • Autocomplete (feels like Copilot)
  • Explaining code
  • Writing boilerplate
  • Simple refactoring
  • Regex and SQL generation

What's mediocre:

  • Complex multi-file changes
  • Understanding large codebases
  • Subtle bug detection

What still needs cloud models:

  • Cutting-edge reasoning (still reach for Claude for architecture)
  • Very large context windows

For 80% of daily coding tasks, local is enough. For the other 20%, I still use Claude - but my API bill dropped from $80/month to under $15.

Optimizing Performance

GPU Acceleration

If you have an NVIDIA GPU:

# With a model loaded, check whether it's running on the GPU
ollama ps

# The PROCESSOR column should read "100% GPU" when CUDA is working

For AMD GPUs on Linux, Ollama supports ROCm. M1/M2/M3 Macs get Metal acceleration automatically.

Multiple Models

I keep two running:

# Terminal 1 - for chat
ollama serve

# Terminal 2 - load models
ollama run deepseek-coder-v2:16b  # stays in memory

First load takes 10-30 seconds. After that, it's instant.

Memory Management

Models stay loaded in RAM. To unload:

ollama stop deepseek-coder-v2:16b

Or tune it with the OLLAMA_KEEP_ALIVE environment variable, which controls how long models stay resident after their last request (the default is five minutes).
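You can also control residency per request: Ollama's `/api/generate` endpoint accepts a `keep_alive` field, and sending `0` with no prompt unloads the model immediately. A stdlib-only sketch, assuming the default port:

```python
import json
import urllib.request


def keep_alive_payload(model: str, keep_alive) -> dict:
    """Request body that adjusts how long a model stays in RAM.
    0 unloads it immediately, "10m" keeps it for ten minutes, -1 pins it."""
    return {"model": model, "keep_alive": keep_alive}


def send(payload: dict, url: str = "http://localhost:11434/api/generate") -> None:
    """POST a payload to a locally running Ollama server."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req).close()


# Usage (requires the Ollama server to be running):
# send(keep_alive_payload("deepseek-coder-v2:16b", 0))  # free the RAM now
```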

Free Copilot Alternative? Yes, Actually

This setup is a legitimate free Copilot alternative. The autocomplete is comparable, the chat is sometimes better (DeepSeek Coder handles Python and TypeScript particularly well), and you own your data.

Is it as good as Copilot Enterprise or Claude? No. But it's free, private, and works offline. For indie devs and privacy-conscious teams, that's the right tradeoff.

Quick Comparison

| Feature | Copilot | This Setup |
| --- | --- | --- |
| Cost | $10-19/mo | Free |
| Privacy | Cloud | Local |
| Offline | No | Yes |
| Quality | Better | Good enough |
| Setup | 2 min | 15 min |

What's Next

Local models are improving fast. Six months ago this wasn't viable. Now it's my daily driver. In another year, the gap with cloud models will shrink further.

Start with Ollama + Continue. See if it fits your workflow. Worst case, you've lost 15 minutes. Best case, you've cut your AI coding costs to zero.


More at dev.to/cumulus
