This article was originally published on aicoderscope.com
TL;DR: OpenCode v1.15.13 connects to Ollama via an OpenAI-compatible custom provider in five config lines. Code exploration and read-only analysis in Plan mode work reliably with a 14B+ model. Build mode file writes are broken for every local model tested to date — open bug #29940 means the write tool schema causes token truncation before the file path is ever generated. Use Ollama for exploration; keep a cloud API key for the moment you need actual files written.
| OpenCode + Ollama (local) | OpenCode + Claude Sonnet | OpenCode + BYOK Anthropic | |
|---|---|---|---|
| Best for | Codebase exploration, read-only analysis | Daily coding tasks, file writes | Cost-controlled full agent work |
| Monthly cost | $0 (hardware only) | ~$5–$15 API depending on usage | Same — pay-per-token |
| File writes | Broken (open bug #29940) | Works reliably | Works reliably |
| Privacy | Fully local, nothing leaves machine | Prompts to Anthropic | Prompts to Anthropic |
Honest take: Install Ollama for Plan mode exploration — it's fast, free, and genuinely useful for understanding unfamiliar codebases. Don't expect to replace a cloud agent for actual coding tasks until the schema bug ships a fix.
What you're actually building
OpenCode is a terminal-first AI coding agent — a Go binary, a polished TUI, and an agent loop that can read files, write files, run shell commands, and answer questions about your code. It's open-source under MIT, version v1.15.13 as of May 30, 2026.
Ollama runs large language models locally. Version 0.30.0 ships an OpenAI-compatible REST API at http://localhost:11434/v1, which means any tool that knows how to speak to OpenAI can speak to Ollama instead.
OpenCode uses the Vercel AI SDK under the hood. It supports a @ai-sdk/openai-compatible provider type for exactly this use case — pointing at any OpenAI-compatible endpoint. Ollama is not a first-class bundled provider in OpenCode's source, but it works through this compatibility layer.
The combination: a free, fully local AI coding agent. That's the pitch. The reality has one sharp edge, documented below.
Hardware floor for local inference
Before configuring anything, map your hardware to realistic model choices.
| GPU VRAM | Practical Ollama model | OpenCode use case |
|---|---|---|
| 6–8 GB (RTX 4060) | qwen2.5-coder:7b Q4 | Plan mode only — file writes fail |
| 10–12 GB (RTX 3060 12GB) | qwen2.5-coder:14b Q4 | Plan mode reliable, Build mode unreliable |
| 16 GB (RTX 4060 Ti 16GB) | qwen2.5-coder:14b Q5 | Same as above, better quality |
| 24 GB (RTX 3090 / RTX 4090) | qwen2.5-coder:32b Q4 | Best local tier; Build mode may work for small files |
| 32+ GB RAM CPU-only | qwen2.5-coder:14b Q4 (slow) | Plan mode at 2–4 tok/s, acceptable for exploration |
CPU-only inference with Ollama is viable for Plan mode since you're not blocked on low latency — a 2-second response time for a code question is acceptable. It is not viable for Build mode even if the file write bug were fixed; the slowness makes an agent loop unusable.
For deeper guidance on hardware tiers for local LLM inference, see runaihome.com's local AI hardware guide.
Step 1: Install Ollama 0.30.0 and pull a coding model
# macOS / Linux one-liner
curl -fsSL https://ollama.com/install.sh | sh
# Verify
ollama --version
# ollama version is 0.30.0
On Windows, download the installer from ollama.com. The 0.30.0 release adds improved compatibility with NVIDIA hardware and faster model loading via an updated llama.cpp backend.
Pull the model you'll use. Pick based on your VRAM:
# 8 GB VRAM — fast, hits the file-write bug reliably
ollama pull qwen2.5-coder:7b
# 12–16 GB VRAM — best balance for Plan mode
ollama pull qwen2.5-coder:14b
# 24 GB VRAM — highest local quality
ollama pull qwen2.5-coder:32b
Confirm Ollama's OpenAI-compatible endpoint is running:
curl http://localhost:11434/v1/models
# Expected: {"object":"list","data":[{"id":"qwen2.5-coder:14b",...}]}
If you get Connection refused, run ollama serve in a separate terminal first. On macOS the menu bar app starts it automatically at login; on Linux you may need systemctl enable ollama.
Step 2: Install OpenCode v1.15.13
# Quick install
curl -fsSL https://opencode.ai/install | bash
# Or via npm
npm install -g opencode-ai@latest
# macOS / Linux via Homebrew
brew install anomalyco/tap/opencode
# Verify
opencode --version
# opencode 1.15.13
OpenCode installs to $HOME/.opencode/bin by default. Add that to your PATH if the shell doesn't pick it up automatically:
echo 'export PATH="$HOME/.opencode/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
Step 3: Configure OpenCode to use Ollama
The config file lives at the project root as opencode.json (preferred for per-project settings) or in a global location that follows your OS conventions. Create opencode.json in your project directory:
{
"$schema": "https://opencode.ai/config.json",
"model": "ollama/qwen2.5-coder:14b",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"qwen2.5-coder:14b": {
"name": "qwen2.5-coder:14b"
}
}
}
}
}
The model field uses the format "provider/model" — the provider key (ollama) must match the key in the provider object. The model name inside the models map must exactly match the tag you pulled with ollama pull.
To add multiple models (useful for switching between a fast 7B for quick lookups and a 14B for heavier analysis):
{
"$schema": "https://opencode.ai/config.json",
"model": "ollama/qwen2.5-coder:14b",
"small_model": "ollama/qwen2.5-coder:7b",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"qwen2.5-coder:14b": {
"name": "qwen2.5-coder:14b"
},
"qwen2.5-coder:7b": {
"name": "qwen2.5-coder:7b"
}
}
}
}
}
small_model is used by OpenCode for lightweight auxiliary tasks like session title generation — routing those to the 7B keeps token burn low without sacrificing quality on the main task.
Step 4: Launch and verify
cd /path/to/your/project
opencode
OpenCode opens its TUI. If the config is valid, the model selector in the bottom status bar shows qwen2.5-coder:14b (ollama). If it shows a different model or prompts for an API key, the opencode.json isn't being picked up — check that the file is in the directory where you launched OpenCode.
Press Tab to switch between build (full agent access) and plan (read-only) modes. For local models, start in plan mode.
Ask it something about your codebase:
> explain the authentication flow in this codebase
Expected: the model reads your files, traces the auth code, and returns a clear explanation. With qwen2.5-coder:14b on a 12 GB VRAM GPU, response time is typically 3–8 seconds for a file-reading task. Acceptable for exploration work.
What works well: Plan mode code exploration
Plan mode is where OpenCode + Ollama genuinely earns its place. The agent reads files, follows call chains, answers architecture questions, and drafts step-by-step plans for you to review before any code changes.
Top comments (0)