I got tired of paying for API calls. Every time I wanted an AI coding assistant, it was another subscription, another API key, another company reading my code. So I went local. Here's exactly how to do it.
Why Local LLMs for Coding?
Three reasons:
- Privacy - Your code never leaves your machine
- Cost - Zero ongoing fees after initial setup
- Speed - No network latency, works offline
The tradeoff? You need decent hardware. But if you've got 16GB+ RAM and a GPU from the last few years, you're set.
The Stack: Ollama + Continue
Forget complicated setups. Ollama makes running local models trivially easy, and Continue gives you Copilot-style chat and autocomplete inside VS Code, without the cloud dependency.
Step 1: Install Ollama
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows - download from ollama.com
That's it. No Docker, no Python environments, no dependency hell.
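If you script your dev-environment setup, you can verify the install programmatically. A minimal Python sketch (the helper name is mine) that checks the binary landed on your PATH:

```python
import shutil

def binary_on_path(name: str) -> bool:
    """Return True if an executable with this name is findable on PATH."""
    return shutil.which(name) is not None

# After the install script finishes, this should print True
print(binary_on_path("ollama"))
```

From a shell, `ollama --version` does the same sanity check.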
Step 2: Pull a Coding Model
Not all models are equal for code. Here's what actually works:
# Best overall for coding (needs 16GB+ RAM)
ollama pull deepseek-coder-v2:16b
# Lighter option (8GB RAM)
ollama pull codellama:7b
# For code review and explanations
ollama pull mistral:7b
DeepSeek Coder v2 is genuinely impressive - it rivals GPT-4 for most coding tasks. If you're RAM-constrained, CodeLlama 7B still handles autocomplete and simple generations well.
Step 3: Test It
ollama run deepseek-coder-v2:16b
>>> Write a Python function to parse JSON from a file safely
You should get a response in seconds. If it's slow, you're probably swapping to disk - try a smaller model.
Step 4: Connect to Your Editor
Here's where it gets good. Install the Continue extension for VS Code:
- Open VS Code
- Extensions → Search "Continue"
- Install it
- Open Continue sidebar (Cmd/Ctrl + L)
Configure it to use Ollama. Create ~/.continue/config.json:
{
  "models": [
    {
      "title": "DeepSeek Coder Local",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama",
    "provider": "ollama",
    "model": "codellama:7b"
  }
}
Now you've got:
- Chat with your codebase (Cmd+L)
- Inline edits (Cmd+I)
- Tab autocomplete
All running locally. Zero API calls.
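Continue talks to Ollama over its local HTTP API (port 11434 by default), and you can script against that API yourself. A standard-library-only sketch of Ollama's documented `/api/generate` endpoint; the function names are mine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Request body for a non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its completion."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": False, the full text arrives in the "response" field
        return json.loads(resp.read())["response"]
```

With a model pulled and the server running, `generate("codellama:7b", "Write a regex for ISO dates")` returns the completion as a string.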
Real-World Performance
I've been using this setup for three months. Here's the honest assessment:
What works great:
- Autocomplete (feels like Copilot)
- Explaining code
- Writing boilerplate
- Simple refactoring
- Regex and SQL generation
What's mediocre:
- Complex multi-file changes
- Understanding large codebases
- Subtle bug detection
What still needs cloud models:
- Cutting-edge reasoning (still reach for Claude for architecture)
- Very large context windows
For 80% of daily coding tasks, local is enough. For the other 20%, I still use Claude - but my API bill dropped from $80/month to under $15.
Optimizing Performance
GPU Acceleration
If you have an NVIDIA GPU:
# With a model loaded, check where it's running
ollama ps
# The PROCESSOR column should show GPU, not CPU
For AMD GPUs on Linux, Ollama supports ROCm. M1/M2/M3 Macs get Metal acceleration automatically.
Multiple Models
I keep two running:
# Terminal 1 - the Ollama server itself
ollama serve
# Terminal 2 - preload the chat model; it stays in memory
ollama run deepseek-coder-v2:16b
First load takes 10-30 seconds. After that, it's instant.
Memory Management
Models stay loaded in RAM. To unload:
ollama stop deepseek-coder-v2:16b
Or control how long models stay loaded with the OLLAMA_KEEP_ALIVE environment variable.
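Ollama's `/api/generate` endpoint also accepts a `keep_alive` field, and per the API docs, a request with no prompt and `keep_alive` set to 0 unloads the model immediately. A sketch, assuming the default port (the function names are mine):

```python
import json
import urllib.request

def unload_request(model: str) -> dict:
    """Body that asks Ollama to unload a model: no prompt, keep_alive 0."""
    return {"model": model, "keep_alive": 0}

def unload(model: str, url: str = "http://localhost:11434/api/generate") -> None:
    """Tell the local Ollama server to evict a model from memory."""
    body = json.dumps(unload_request(model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req).close()
```

`keep_alive` also accepts durations like "10m", and the OLLAMA_KEEP_ALIVE environment variable sets the server-wide default.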
Free Copilot Alternative? Yes, Actually
This setup is a legitimate free Copilot alternative. The autocomplete is comparable, the chat is sometimes better (DeepSeek Coder handles Python and TypeScript particularly well), and you own your data.
Is it as good as Copilot Enterprise or Claude? No. But it's free, private, and works offline. For indie devs and privacy-conscious teams, that's the right tradeoff.
Quick Comparison
| Feature | Copilot | This Setup |
|---|---|---|
| Cost | $10-19/mo | Free |
| Privacy | Cloud | Local |
| Offline | No | Yes |
| Quality | Better | Good enough |
| Setup | 2 min | 15 min |
What's Next
Local models are improving fast. Six months ago this wasn't viable. Now it's my daily driver. In another year, the gap with cloud models will shrink further.
Start with Ollama + Continue. See if it fits your workflow. Worst case, you've lost 15 minutes. Best case, you've cut your AI coding costs to zero.
More at dev.to/cumulus