OpenHands 1.7.0 + Ollama in 2026: complete local setup, the Docker networking trap, and which models actually complete agentic tasks

#openhands #ollama #localllm #setupguide

This article was originally published on aicoderscope.com

TL;DR: OpenHands 1.7.0 + Ollama gives you a fully offline autonomous coding agent — no API key, no cloud, no data leaving your machine. The setup takes about 30 minutes if you know the two traps: Docker's host.docker.internal doesn't resolve on Linux by default, and the sandbox runtime image tag must exactly match the OpenHands image version. Miss either and the container fails silently or connects to nothing.

What you'll be able to do after this guide:

Run OpenHands autonomously against your local codebase with zero API cost
Give it a GitHub issue URL and watch it write the fix, run tests, and commit the result — all on your machine
Know which local models actually complete multi-file tasks vs. which ones stall out

	OpenHands + Ollama (local)	OpenHands Cloud Free	OpenHands Cloud Pro
Best for	Privacy-first teams, air-gapped infra, zero ongoing cost	Evaluation, 10 tasks/day	Daily driver with managed sandbox
Price / Cost	$0 after hardware	$0, BYOK at-cost	$20/mo + LLM at-cost
Model quality	Devstral Small 2: ~46.8% SWE-bench on OpenHands	Your key, your rate	Same
The catch	Requires GPU with ≥15 GB VRAM; Linux needs --add-host flag	10 conv/day hard limit	LLM bill on top

Honest take: If you have an RTX 4090 or equivalent sitting idle, the local path makes more sense than paying Claude API rates every time OpenHands spins up a multi-step task. On smaller hardware, Cloud Free is the smarter evaluation path before committing to the Docker setup.

Why local OpenHands

The OpenHands review covers the full architecture. The short version relevant to this guide: OpenHands is an autonomous coding agent backed by whichever LLM you configure. Swap the LLM and the agent behavior changes substantially.

Running locally means:

Zero data exposure — no code, no prompts, no environment variables reach an external server
No per-task API bill — model inference is just GPU utilization on your own machine
Freedom to use models that aren't available via commercial APIs (Apache 2.0-licensed Devstral Small 2 for example)

The trade-off is raw task completion capability. A frontier cloud model scores roughly 72% on SWE-bench when used as the OpenHands backend. The best local option as of June 2026, Devstral Small 2 (24B), scores approximately 46.8% on the same benchmark when driven by OpenHands. That 25-point gap is real — local inference misses roughly 70 more bug fixes per 277 test cases. For most real-world tasks (single-file fixes, feature additions, test generation), the local path still delivers. For the hardest 10% of issues — cross-cutting refactors, subtle invariant bugs, multi-service changes — the gap shows.

Hardware floor

Model size determines what you can run. OpenHands needs a model with solid tool-calling support — it issues file read/write calls, terminal commands, and browser interactions as structured tool invocations. Models without reliable JSON tool calling stall immediately.

Model	Ollama tag	VRAM (Q4_K_M)	Tool calling	Task quality
Devstral Small 2	`devstral-small-2`	~15 GB	Native, MoE-trained	Best local option; ~46.8% SWE-bench via OpenHands
qwen2.5-coder:32b	`qwen2.5-coder:32b`	~20 GB	Native	Strong; slightly behind Devstral on agentic multi-step
qwen2.5-coder:14b	`qwen2.5-coder:14b`	~9 GB	Native	Budget floor; completes single-file tasks reliably
Older llama3 / codellama	various	varies	None or unreliable	Not recommended — stalls on first tool call

Devstral Small 2 runs at Q4_K_M on 15 GB of VRAM, fitting inside any 24 GB card (RTX 4090, RTX 3090). For context: its 256k token window means you can feed an entire Python package without truncating. If your card tops out at 12–16 GB, use qwen2.5-coder:14b as your starting point — it completes simple one-file tasks and is a realistic daily evaluation model before upgrading hardware.

For a full breakdown of which models fit which GPU tier, the runaihome.com guide at Best Local AI Models by VRAM is worth checking before buying hardware for this setup.

Step 1: Install Ollama and pull a coding model

Current version: Ollama 0.30.6.

Linux:

curl -fsSL https://ollama.com/install.sh | sh

macOS / Windows: Download from ollama.com/download.

Before pulling a model, set the context window. OpenHands agentic loops consume 20k–50k tokens per task. Ollama's default 2k context silently truncates input — same trap as in Aider + Ollama setups and Continue.dev. Fix it before you start:

# Add to ~/.bashrc or ~/.zshrc
export OLLAMA_NUM_CTX=65536

Restart the Ollama service after setting the variable:

# Linux
sudo systemctl restart ollama

# macOS
# Restart via the menu bar icon, or kill and reopen Ollama.app

Now pull the model. For Devstral Small 2:

ollama pull devstral-small-2

This downloads the Q4_K_M quantized version at ~15 GB. Wait for the download to complete, then verify:

ollama list
# Expected output includes:
# devstral-small-2   ...   15.5 GB   ...

For the budget tier (qwen2.5-coder:14b):

ollama pull qwen2.5-coder:14b

Quick sanity check that tool calling is responding:

ollama run devstral-small-2 "What is 2 + 2? Use a tool call to return the answer."
# A model with proper tool-call support produces a structured response.
# A model without it just says "4" in plain text — not what OpenHands needs.

Step 2: Run OpenHands via Docker

OpenHands requires Docker because every task runs inside an isolated sandbox container. The isolation prevents runaway agent processes from affecting your host filesystem.

Install Docker if you haven't: docker.com/get-docker. On Linux, confirm your user is in the docker group:

sudo usermod -aG docker $USER
newgrp docker

The version-matching requirement

OpenHands' main container and the sandbox runtime container must run the same version. Mismatched tags produce a cryptic container start error that has no obvious diagnostic message. Always pin both explicitly.

For OpenHands 1.7.0:

export OH_VERSION=1.7.0
export RUNTIME_TAG=1.7.0-nikolaik

The Linux networking trap

On macOS and Windows (Docker Desktop), host.docker.internal resolves to your host machine automatically. On Linux, it does not. Ollama runs on your host at port 11434; the OpenHands container needs to reach it via host.docker.internal. Without the --add-host flag, OpenHands starts, you configure Ollama as the LLM provider, and every single task attempt returns a connection error without a clear reason why.

The fix is one flag in the Docker run command: --add-host host.docker.internal:host-gateway.

The full Docker run command

docker run -it --rm \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:${RUNTIME_TAG} \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands:/.openhands \
  -v ~/workspace:/opt/workspace_base \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  docker.all-hands.dev/all-hands-ai/openhands:${OH_VERSION}

Volume mounts explained:

/var/run/docker.sock — lets OpenHands spawn sandbox containers (required)
~/.openhands:/.openhands — persists your LLM config across container restarts
~/workspace:/opt/workspace_base — the local directory OpenHands works in; set this to your actual project path

macOS note: On macOS with Docker Desktop, you can omit --add-host and replace host.docker.internal in the Ollama URL with the same token — Docker Desktop handles the resolution automati