Jovan Chan

Posted on Jun 2 • Originally published at aicoderscope.com

Aider + Ollama Local LLM Setup Guide 2026: Official Config, Model Selection, Context Fix



Aider plus Ollama is the cleanest path to fully local, zero-API-cost AI pair programming in 2026. It's also a setup where 80% of the public tutorials silently produce broken output, because of a default in Ollama that truncates your context without telling you. This guide gets you a working install, the right model for your hardware, and explicitly walks through the failure modes that most other tutorials skip.

Aider is a command-line AI pair programmer that edits files, runs git commits automatically, and works against any LLM endpoint—including a local Ollama server. The combination gives you a fully offline coding assistant with no API bills and no data leaving your machine. If you have a GPU with 8 GB of VRAM or more, you have enough to start.

What You'll Have at the End

A working setup that lets you:

cd ~/projects/my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

…and then have a conversation with a local LLM that reads your code, suggests changes, edits files directly, and commits them to git—all without an internet connection. Total install time: about 20 minutes if you already have an NVIDIA GPU set up. Total cost after install: $0.

Step 0: Hardware Reality Check

The model size you can run is bounded by your VRAM. Approximate fits for the recommended coding models:

Model	VRAM needed (Q4)	Coding quality	Practical tier
qwen2.5-coder:7b	~5 GB	Decent for completion, weak on refactors	RTX 3060 12GB, RTX 4060 8GB
qwen2.5-coder:14b	~9 GB	Good — solid for everyday Aider work	RTX 4060 Ti 16GB, RTX 3090
qwen2.5-coder:32b	~20 GB	Best local option	RTX 4090, RTX 5090, dual 3090s
deepseek-coder-v2:16b (MoE)	~10 GB	Excellent for completion, weaker on agent tasks	RTX 4060 Ti 16GB+

If you're running on an Apple Silicon Mac, unified memory takes the place of VRAM—a 32 GB Mac Studio fits the 14B comfortably, a 64 GB+ machine fits the 32B. For a deeper breakdown of which model actually fits where, our sister site has a Best Local AI Models by VRAM tier guide that covers this in detail.

The honest baseline: don't bother with a 7B coder model on Aider for anything past one-line edits. It's the equivalent of pairing with someone who has read your code once and forgotten most of it. The 14B is the practical floor for getting work done; the 32B is what makes Aider feel close to a paid cloud service.
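The table above can be turned into a small lookup. This is a minimal sketch, not part of Aider or Ollama: `pick_model` is a hypothetical helper name, and its thresholds are simply the table's approximate Q4 figures, so they assume no extra headroom for a large context window.

```shell
# Map available VRAM (whole GB) to the largest qwen2.5-coder tag in the table.
# Thresholds are the table's approximate Q4 figures; a large context window
# needs memory on top of that, so treat the result as an upper bound.
pick_model() {
  vram_gb=$1
  if [ "$vram_gb" -ge 20 ]; then
    echo "qwen2.5-coder:32b"
  elif [ "$vram_gb" -ge 9 ]; then
    echo "qwen2.5-coder:14b"
  elif [ "$vram_gb" -ge 5 ]; then
    echo "qwen2.5-coder:7b"
  else
    echo "none"
    return 1
  fi
}

# Example (NVIDIA only): nvidia-smi reports total memory in MiB.
#   vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
#   pick_model $((vram_mib / 1024))
```

If the suggested model leaves little memory to spare, dropping one tier is usually the safer choice once the context window is raised in Step 2.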

Step 1: Install Ollama

On Linux:

curl -fsSL https://ollama.com/install.sh | sh

On macOS: download the installer from ollama.com/download and run it.

On Windows: download the installer from the same page and run it.

Verify the install:

ollama --version

You should see a version string. Ollama installs itself as a service that runs on http://127.0.0.1:11434 by default.

Pull the model

Pick the largest coder model your VRAM tier supports from the table above:

ollama pull qwen2.5-coder:14b

This downloads ~9 GB. Wait for it to finish. Verify:

ollama list

You should see the model listed.
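For scripted setups, the same check can be automated. The sketch below assumes `ollama list` prints a header row followed by one row per model with the tag in the first column; `has_model` is a hypothetical helper name, not an Ollama command.

```shell
# Succeeds if the model tag given as $1 appears in the first column of
# `ollama list` output read from stdin (the first line is the header).
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example:
#   ollama list | has_model qwen2.5-coder:14b && echo "model is ready"
```

Reading from stdin keeps the helper independent of a running Ollama server, so it can be exercised with saved output as well.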

Step 2: Fix the Context Window Trap

This is the step that breaks 80% of public Aider+Ollama tutorials, including some that have been quoted in posts on Hacker News. Ollama defaults to a 2,048-token context window and silently discards anything beyond that.

For Aider, which loads your code into context, this is catastrophic. With a 2k window, Aider sees the first ~1,500 tokens of your repo (maybe 2-3 small files), then everything else is silently dropped. The model has no idea your project has 50 more files. Aider has no way to know its context is being truncated. The output looks plausible but is operating on a tiny fragment of the actual codebase.

The fix is to set OLLAMA_CONTEXT_LENGTH before starting Ollama. The official Aider docs are explicit about this. Stop Ollama if it's running:

# Linux/macOS
pkill ollama
# Or on systemd-based Linux:
sudo systemctl stop ollama

Restart with a larger context:

OLLAMA_CONTEXT_LENGTH=16384 ollama serve

16k is a reasonable floor for serious Aider work. If your codebase is large, push it to 32k or 64k, provided your model supports it. qwen2.5-coder supports up to 128k context, but every additional token costs VRAM. Aider itself sizes the context window for each request to fit your prompt plus 8k tokens for the reply, so a higher OLLAMA_CONTEXT_LENGTH gives Aider room to work.
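To get a feel for what value to pick, a rough estimate helps. The sketch below assumes about 4 bytes per token, which is a common rule of thumb rather than an exact tokenizer count, and adds the 8k-token reply budget mentioned above. `estimate_tokens` and `suggest_ctx` are hypothetical helper names.

```shell
# Rough token estimate for a set of files: ~4 bytes per token is a rule of
# thumb, not an exact tokenizer count.
estimate_tokens() {
  cat "$@" | wc -c | awk '{ print int($1 / 4) }'
}

# Smallest power-of-two context length (starting at 2048) that holds the
# prompt tokens given as $1 plus an 8192-token reply budget.
suggest_ctx() {
  need=$(( $1 + 8192 ))
  ctx=2048
  while [ "$ctx" -lt "$need" ]; do
    ctx=$(( ctx * 2 ))
  done
  echo "$ctx"
}

# Example:
#   suggest_ctx "$(estimate_tokens src/*.py)"
```

Rounding up to a power of two is only a convenience for picking familiar values such as 16384 or 32768; any length the model and your VRAM can handle should work.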

For a permanent fix on Linux with systemd, create or edit the drop-in file /etc/systemd/system/ollama.service.d/override.conf (running sudo systemctl edit ollama opens this file for you):

[Service]
Environment="OLLAMA_CONTEXT_LENGTH=16384"

Then sudo systemctl daemon-reload && sudo systemctl restart ollama. The setting now persists across reboots.
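If you would rather script the drop-in than edit it by hand, something like the following may do. It is a sketch: `print_override` is a hypothetical helper that only prints the two lines shown above, and writing them into place is left to the commented commands, which need root.

```shell
# Print a systemd drop-in that sets OLLAMA_CONTEXT_LENGTH to $1 (default 16384).
print_override() {
  printf '[Service]\nEnvironment="OLLAMA_CONTEXT_LENGTH=%s"\n' "${1:-16384}"
}

# Example (needs root; the directory may not exist yet):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   print_override 32768 | sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```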

Step 3: Install Aider

The cleanest install path is via aider-install, which handles its own Python environment so it doesn't conflict with project dependencies:

python -m pip install aider-install
aider-install

Python 3.8 through 3.13 is supported with this installer. The traditional pip install works too if you want manual control:

pip install -U --upgrade-strategy only-if-needed aider-chat

For pip install, Python 3.9–3.12 is the supported range.

Verify:

aider --help

If you get aider: command not found, your Python user-bin directory isn't on PATH. The workaround is python -m aider instead of aider.
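To confirm that PATH really is the culprit, you can test whether a given directory is on it. This is a minimal sketch; `on_path` is a hypothetical helper, and the example assumes the scripts landed in pip's per-user directory, which may not match every install method.

```shell
# Succeeds if directory $1 is one of the colon-separated entries in PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check pip's per-user script directory on Linux/macOS.
#   user_bin="$(python -m site --user-base)/bin"
#   on_path "$user_bin" || echo "add to your shell profile: export PATH=\"$user_bin:\$PATH\""
```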

Step 4: Connect Aider to Ollama

Two ways: environment variable or config file.

Quick path: environment variable

Linux/macOS:

export OLLAMA_API_BASE=http://127.0.0.1:11434
cd ~/projects/my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

Windows (PowerShell):

$env:OLLAMA_API_BASE = "http://127.0.0.1:11434"
cd $HOME\projects\my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

Important: use ollama_chat/<model>, not ollama/<model>. The _chat suffix routes through the Ollama chat endpoint, which gives proper instruction-following. Without it, Aider talks to the base completion endpoint and the model behaves like it's autocompleting tokens rather than responding to a prompt.
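Because the prefix is easy to get wrong, a wrapper script can normalize it before launching Aider. The helper below is a sketch; `to_chat_model` is a hypothetical name, and it only rewrites the string without checking that the model exists.

```shell
# Rewrite an ollama/<model> name to ollama_chat/<model>, add the prefix to a
# bare tag, and leave names that already use ollama_chat/ unchanged.
to_chat_model() {
  case "$1" in
    ollama_chat/*) echo "$1" ;;
    ollama/*)      echo "ollama_chat/${1#ollama/}" ;;
    *)             echo "ollama_chat/$1" ;;
  esac
}

# Example:
#   aider --model "$(to_chat_model qwen2.5-coder:14b)"
```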

Persistent path: config file

Drop a .aider.conf.yml in your home directory or repo root:

model: ollama_chat/qwen2.5-coder:14b
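# Ollama's address is read from the OLLAMA_API_BASE environment variable,
# so set it via set-env rather than openai-api-base
set-env:
  - OLLAMA_API_BASE=http://127.0.0.1:11434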
weak-model: ollama_chat/qwen2.5-coder:7b
auto-commits: true
dirty-commits: true

The weak-model field is what Aider uses for cheap operations like commit message generation. Pointing it at a smaller, faster model saves seconds per interaction.

Step 5: The Workflow That Actually Works

A first session typically goes:

cd ~/projects/my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

Aider scans your git repository, lists the files it found, and gives you a prompt. From here:

/add path/to/file.py: explicitly add a file to the chat. Aider only edits files you have added and sees their full contents; this is intentional, to control token usage.
/drop path/to/file.py: remove a file from the chat once you're done with it.
/diff: show the diff of changes made since your last message.
Plain prompt: describe what you want. Example: "Refactor the auth middleware to use bcrypt instead of MD5."
/undo: roll back the last Aider commit if the change was wrong.
Ctrl+C: stop a running generation.

Aider auto-commits each accepted change with a sensible commit message. You can disable this with --no-auto-commits if you want manual control. The auto-commit pattern is what
