OpenCode + Ollama in 2026: the setup that works (and the file-write bug you need to know about)

#opencode #ollama #localllm #setupguide

This article was originally published on aicoderscope.com

TL;DR: OpenCode v1.15.13 connects to Ollama via an OpenAI-compatible custom provider in five config lines. Code exploration and read-only analysis in Plan mode work reliably with a 14B+ model. Build mode file writes are broken for every local model tested to date — open bug #29940 means the write tool schema causes token truncation before the file path is ever generated. Use Ollama for exploration; keep a cloud API key for the moment you need actual files written.

	OpenCode + Ollama (local)	OpenCode + Claude Sonnet	OpenCode + BYOK Anthropic
Best for	Codebase exploration, read-only analysis	Daily coding tasks, file writes	Cost-controlled full agent work
Monthly cost	$0 (hardware only)	~$5–$15 API depending on usage	Same — pay-per-token
File writes	Broken (open bug #29940)	Works reliably	Works reliably
Privacy	Fully local, nothing leaves machine	Prompts to Anthropic	Prompts to Anthropic

Honest take: Install Ollama for Plan mode exploration — it's fast, free, and genuinely useful for understanding unfamiliar codebases. Don't expect to replace a cloud agent for actual coding tasks until the schema bug ships a fix.

What you're actually building

OpenCode is a terminal-first AI coding agent — a Go binary, a polished TUI, and an agent loop that can read files, write files, run shell commands, and answer questions about your code. It's open-source under MIT, version v1.15.13 as of May 30, 2026.

Ollama runs large language models locally. Version 0.30.0 ships an OpenAI-compatible REST API at http://localhost:11434/v1, which means any tool that knows how to speak to OpenAI can speak to Ollama instead.

OpenCode uses the Vercel AI SDK under the hood. It supports a @ai-sdk/openai-compatible provider type for exactly this use case — pointing at any OpenAI-compatible endpoint. Ollama is not a first-class bundled provider in OpenCode's source, but it works through this compatibility layer.

The combination: a free, fully local AI coding agent. That's the pitch. The reality has one sharp edge, documented below.

Hardware floor for local inference

Before configuring anything, map your hardware to realistic model choices.

GPU VRAM	Practical Ollama model	OpenCode use case
6–8 GB (RTX 4060)	qwen2.5-coder:7b Q4	Plan mode only — file writes fail
10–12 GB (RTX 3060 12GB)	qwen2.5-coder:14b Q4	Plan mode reliable, Build mode unreliable
16 GB (RTX 4060 Ti 16GB)	qwen2.5-coder:14b Q5	Same as above, better quality
24 GB (RTX 3090 / RTX 4090)	qwen2.5-coder:32b Q4	Best local tier; Build mode may work for small files
32+ GB RAM CPU-only	qwen2.5-coder:14b Q4 (slow)	Plan mode at 2–4 tok/s, acceptable for exploration

CPU-only inference with Ollama is viable for Plan mode since you're not blocked on low latency — a 2-second response time for a code question is acceptable. It is not viable for Build mode even if the file write bug were fixed; the slowness makes an agent loop unusable.

For deeper guidance on hardware tiers for local LLM inference, see runaihome.com's local AI hardware guide.

Step 1: Install Ollama 0.30.0 and pull a coding model

# macOS / Linux one-liner
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version
# ollama version is 0.30.0

On Windows, download the installer from ollama.com. The 0.30.0 release adds improved compatibility with NVIDIA hardware and faster model loading via an updated llama.cpp backend.

Pull the model you'll use. Pick based on your VRAM:

# 8 GB VRAM — fast, hits the file-write bug reliably
ollama pull qwen2.5-coder:7b

# 12–16 GB VRAM — best balance for Plan mode
ollama pull qwen2.5-coder:14b

# 24 GB VRAM — highest local quality
ollama pull qwen2.5-coder:32b

Confirm Ollama's OpenAI-compatible endpoint is running:

curl http://localhost:11434/v1/models
# Expected: {"object":"list","data":[{"id":"qwen2.5-coder:14b",...}]}

If you get Connection refused, run ollama serve in a separate terminal first. On macOS the menu bar app starts it automatically at login; on Linux you may need systemctl enable ollama.

Step 2: Install OpenCode v1.15.13

# Quick install
curl -fsSL https://opencode.ai/install | bash

# Or via npm
npm install -g opencode-ai@latest

# macOS / Linux via Homebrew
brew install anomalyco/tap/opencode

# Verify
opencode --version
# opencode 1.15.13

OpenCode installs to $HOME/.opencode/bin by default. Add that to your PATH if the shell doesn't pick it up automatically:

echo 'export PATH="$HOME/.opencode/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Step 3: Configure OpenCode to use Ollama

The config file lives at the project root as opencode.json (preferred for per-project settings) or in a global location that follows your OS conventions. Create opencode.json in your project directory:

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/qwen2.5-coder:14b",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b": {
          "name": "qwen2.5-coder:14b"
        }
      }
    }
  }
}

The model field uses the format "provider/model" — the provider key (ollama) must match the key in the provider object. The model name inside the models map must exactly match the tag you pulled with ollama pull.

To add multiple models (useful for switching between a fast 7B for quick lookups and a 14B for heavier analysis):

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/qwen2.5-coder:14b",
  "small_model": "ollama/qwen2.5-coder:7b",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b": {
          "name": "qwen2.5-coder:14b"
        },
        "qwen2.5-coder:7b": {
          "name": "qwen2.5-coder:7b"
        }
      }
    }
  }
}

small_model is used by OpenCode for lightweight auxiliary tasks like session title generation — routing those to the 7B keeps token burn low without sacrificing quality on the main task.

Step 4: Launch and verify

cd /path/to/your/project
opencode

OpenCode opens its TUI. If the config is valid, the model selector in the bottom status bar shows qwen2.5-coder:14b (ollama). If it shows a different model or prompts for an API key, the opencode.json isn't being picked up — check that the file is in the directory where you launched OpenCode.

Press Tab to switch between build (full agent access) and plan (read-only) modes. For local models, start in plan mode.

Ask it something about your codebase:

> explain the authentication flow in this codebase

Expected: the model reads your files, traces the auth code, and returns a clear explanation. With qwen2.5-coder:14b on a 12 GB VRAM GPU, response time is typically 3–8 seconds for a file-reading task. Acceptable for exploration work.

What works well: Plan mode code exploration

Plan mode is where OpenCode + Ollama genuinely earns its place. The agent reads files, follows call chains, answers architecture questions, and drafts step-by-step plans for you to review before any code changes.