I got tired of paying for API calls. Every time I wanted an AI coding assistant, it was another subscription, another API key, another company reading my code. So I went local. Here's exactly how to do it.
Why Local LLMs for Coding?
Three reasons:
- Privacy - Your code never leaves your machine
- Cost - Zero ongoing fees after initial setup
- Speed - No network latency, works offline
The tradeoff? You need decent hardware. But if you've got 16GB+ RAM and a GPU from the last few years, you're set.
The Stack: Ollama + Continue
Forget complicated setups. Ollama makes running local models trivially easy, and Continue gives you Copilot-style chat and autocomplete inside VS Code, without the cloud dependency.
Step 1: Install Ollama
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows - download from ollama.com
That's it. No Docker, no Python environments, no dependency hell.
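If you script your dev-environment setup, you can verify the install programmatically. A minimal Python sketch (the helper name is mine) that checks the binary landed on your PATH:

```python
import shutil

def binary_on_path(name: str) -> bool:
    """Return True if an executable with this name is findable on PATH."""
    return shutil.which(name) is not None

# After the install script finishes, this should print True
print(binary_on_path("ollama"))
```

From a shell, `ollama --version` does the same sanity check.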
Step 2: Pull a Coding Model
Not all models are equal for code. Here's what actually works:
# Best overall for coding (needs 16GB+ RAM)
ollama pull deepseek-coder-v2:16b
# Lighter option (8GB RAM)
ollama pull codellama:7b
# For code review and explanations
ollama pull mistral:7b
DeepSeek Coder v2 is genuinely impressive - it rivals GPT-4 for most coding tasks. If you're RAM-constrained, CodeLlama 7B still handles autocomplete and simple generations well.
Step 3: Test It
ollama run deepseek-coder-v2:16b
>>> Write a Python function to parse JSON from a file safely
You should get a response in seconds. If it's slow, you're probably swapping to disk - try a smaller model.
Step 4: Connect to Your Editor
Here's where it gets good. Install the Continue extension for VS Code:
- Open VS Code
- Extensions → Search "Continue"
- Install it
- Open Continue sidebar (Cmd/Ctrl + L)
Configure it to use Ollama. Create ~/.continue/config.json:
{
  "models": [
    {
      "title": "DeepSeek Coder Local",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama",
    "provider": "ollama",
    "model": "codellama:7b"
  }
}
Now you've got:
- Chat with your codebase (Cmd+L)
- Inline edits (Cmd+I)
- Tab autocomplete
All running locally. Zero API calls.
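Continue talks to Ollama over its local HTTP API (port 11434 by default), and you can script against that API yourself. A standard-library-only sketch of Ollama's documented `/api/generate` endpoint; the function names are mine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Request body for a non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its completion."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": False, the full text arrives in the "response" field
        return json.loads(resp.read())["response"]
```

With a model pulled and the server running, `generate("codellama:7b", "Write a regex for ISO dates")` returns the completion as a string.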
Real-World Performance
I've been using this setup for three months. Here's the honest assessment:
What works great:
- Autocomplete (feels like Copilot)
- Explaining code
- Writing boilerplate
- Simple refactoring
- Regex and SQL generation
What's mediocre:
- Complex multi-file changes
- Understanding large codebases
- Subtle bug detection
What still needs cloud models:
- Cutting-edge reasoning (still reach for Claude for architecture)
- Very large context windows
For 80% of daily coding tasks, local is enough. For the other 20%, I still use Claude - but my API bill dropped from $80/month to under $15.
Optimizing Performance
GPU Acceleration
If you have an NVIDIA GPU:
# With a model loaded, check where it's running
ollama ps
# The PROCESSOR column should show GPU, not CPU
For AMD GPUs on Linux, Ollama supports ROCm. M1/M2/M3 Macs get Metal acceleration automatically.
Multiple Models
I keep two running:
# Terminal 1 - the Ollama server itself
ollama serve
# Terminal 2 - preload the chat model; it stays in memory
ollama run deepseek-coder-v2:16b
First load takes 10-30 seconds. After that, it's instant.
Memory Management
Models stay loaded in RAM. To unload:
ollama stop deepseek-coder-v2:16b
Or control how long models stay loaded with the OLLAMA_KEEP_ALIVE environment variable.
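Ollama's `/api/generate` endpoint also accepts a `keep_alive` field, and per the API docs, a request with no prompt and `keep_alive` set to 0 unloads the model immediately. A sketch, assuming the default port (the function names are mine):

```python
import json
import urllib.request

def unload_request(model: str) -> dict:
    """Body that asks Ollama to unload a model: no prompt, keep_alive 0."""
    return {"model": model, "keep_alive": 0}

def unload(model: str, url: str = "http://localhost:11434/api/generate") -> None:
    """Tell the local Ollama server to evict a model from memory."""
    body = json.dumps(unload_request(model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req).close()
```

`keep_alive` also accepts durations like "10m", and the OLLAMA_KEEP_ALIVE environment variable sets the server-wide default.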
Free Copilot Alternative? Yes, Actually
This setup is a legitimate free Copilot alternative. The autocomplete is comparable, the chat is sometimes better (DeepSeek Coder handles Python and TypeScript particularly well), and you own your data.
Is it as good as Copilot Enterprise or Claude? No. But it's free, private, and works offline. For indie devs and privacy-conscious teams, that's the right tradeoff.
Quick Comparison
| Feature | Copilot | This Setup |
|---|---|---|
| Cost | $10-19/mo | Free |
| Privacy | Cloud | Local |
| Offline | No | Yes |
| Quality | Better | Good enough |
| Setup | 2 min | 15 min |
What's Next
Local models are improving fast. Six months ago this wasn't viable. Now it's my daily driver. In another year, the gap with cloud models will shrink further.
Start with Ollama + Continue. See if it fits your workflow. Worst case, you've lost 15 minutes. Best case, you've cut your AI coding costs to zero.
More at dev.to/cumulus