DEV Community

Jovan Chan
Jovan Chan

Posted on • Originally published at aicoderscope.com

OpenHands 1.7.0 + Ollama in 2026: complete local setup, the Docker networking trap, and which models actually complete agentic tasks

This article was originally published on aicoderscope.com

TL;DR: OpenHands 1.7.0 + Ollama gives you a fully offline autonomous coding agent — no API key, no cloud, no data leaving your machine. The setup takes about 30 minutes if you know the two traps: Docker's host.docker.internal doesn't resolve on Linux by default, and the sandbox runtime image tag must exactly match the OpenHands image version. Miss either and the container fails silently or connects to nothing.

What you'll be able to do after this guide:

  • Run OpenHands autonomously against your local codebase with zero API cost
  • Give it a GitHub issue URL and watch it write the fix, run tests, and commit the result — all on your machine
  • Know which local models actually complete multi-file tasks vs. which ones stall out
OpenHands + Ollama (local) OpenHands Cloud Free OpenHands Cloud Pro
Best for Privacy-first teams, air-gapped infra, zero ongoing cost Evaluation, 10 tasks/day Daily driver with managed sandbox
Price / Cost $0 after hardware $0, BYOK at-cost $20/mo + LLM at-cost
Model quality Devstral Small 2: ~46.8% SWE-bench on OpenHands Your key, your rate Same
The catch Requires GPU with ≥15 GB VRAM; Linux needs --add-host flag 10 conv/day hard limit LLM bill on top

Honest take: If you have an RTX 4090 or equivalent sitting idle, the local path makes more sense than paying Claude API rates every time OpenHands spins up a multi-step task. On smaller hardware, Cloud Free is the smarter evaluation path before committing to the Docker setup.


Why local OpenHands

The OpenHands review covers the full architecture. The short version relevant to this guide: OpenHands is an autonomous coding agent backed by whichever LLM you configure. Swap the LLM and the agent behavior changes substantially.

Running locally means:

  • Zero data exposure — no code, no prompts, no environment variables reach an external server
  • No per-task API bill — model inference is just GPU utilization on your own machine
  • Freedom to use models that aren't available via commercial APIs (Apache 2.0-licensed Devstral Small 2 for example)

The trade-off is raw task completion capability. A frontier cloud model scores roughly 72% on SWE-bench when used as the OpenHands backend. The best local option as of June 2026, Devstral Small 2 (24B), scores approximately 46.8% on the same benchmark when driven by OpenHands. That 25-point gap is real — local inference misses roughly 70 more bug fixes per 277 test cases. For most real-world tasks (single-file fixes, feature additions, test generation), the local path still delivers. For the hardest 10% of issues — cross-cutting refactors, subtle invariant bugs, multi-service changes — the gap shows.


Hardware floor

Model size determines what you can run. OpenHands needs a model with solid tool-calling support — it issues file read/write calls, terminal commands, and browser interactions as structured tool invocations. Models without reliable JSON tool calling stall immediately.

Model Ollama tag VRAM (Q4_K_M) Tool calling Task quality
Devstral Small 2 devstral-small-2 ~15 GB Native, MoE-trained Best local option; ~46.8% SWE-bench via OpenHands
qwen2.5-coder:32b qwen2.5-coder:32b ~20 GB Native Strong; slightly behind Devstral on agentic multi-step
qwen2.5-coder:14b qwen2.5-coder:14b ~9 GB Native Budget floor; completes single-file tasks reliably
Older llama3 / codellama various varies None or unreliable Not recommended — stalls on first tool call

Devstral Small 2 runs at Q4_K_M on 15 GB of VRAM, fitting inside any 24 GB card (RTX 4090, RTX 3090). For context: its 256k token window means you can feed an entire Python package without truncating. If your card tops out at 12–16 GB, use qwen2.5-coder:14b as your starting point — it completes simple one-file tasks and is a realistic daily evaluation model before upgrading hardware.

For a full breakdown of which models fit which GPU tier, the runaihome.com guide at Best Local AI Models by VRAM is worth checking before buying hardware for this setup.


Step 1: Install Ollama and pull a coding model

Current version: Ollama 0.30.6.

Linux:

curl -fsSL https://ollama.com/install.sh | sh
Enter fullscreen mode Exit fullscreen mode

macOS / Windows: Download from ollama.com/download.

Before pulling a model, set the context window. OpenHands agentic loops consume 20k–50k tokens per task. Ollama's default 2k context silently truncates input — same trap as in Aider + Ollama setups and Continue.dev. Fix it before you start:

# Add to ~/.bashrc or ~/.zshrc
export OLLAMA_NUM_CTX=65536
Enter fullscreen mode Exit fullscreen mode

Restart the Ollama service after setting the variable:

# Linux
sudo systemctl restart ollama

# macOS
# Restart via the menu bar icon, or kill and reopen Ollama.app
Enter fullscreen mode Exit fullscreen mode

Now pull the model. For Devstral Small 2:

ollama pull devstral-small-2
Enter fullscreen mode Exit fullscreen mode

This downloads the Q4_K_M quantized version at ~15 GB. Wait for the download to complete, then verify:

ollama list
# Expected output includes:
# devstral-small-2   ...   15.5 GB   ...
Enter fullscreen mode Exit fullscreen mode

For the budget tier (qwen2.5-coder:14b):

ollama pull qwen2.5-coder:14b
Enter fullscreen mode Exit fullscreen mode

Quick sanity check that tool calling is responding:

ollama run devstral-small-2 "What is 2 + 2? Use a tool call to return the answer."
# A model with proper tool-call support produces a structured response.
# A model without it just says "4" in plain text — not what OpenHands needs.
Enter fullscreen mode Exit fullscreen mode

Step 2: Run OpenHands via Docker

OpenHands requires Docker because every task runs inside an isolated sandbox container. The isolation prevents runaway agent processes from affecting your host filesystem.

Install Docker if you haven't: docker.com/get-docker. On Linux, confirm your user is in the docker group:

sudo usermod -aG docker $USER
newgrp docker
Enter fullscreen mode Exit fullscreen mode

The version-matching requirement

OpenHands' main container and the sandbox runtime container must run the same version. Mismatched tags produce a cryptic container start error that has no obvious diagnostic message. Always pin both explicitly.

For OpenHands 1.7.0:

export OH_VERSION=1.7.0
export RUNTIME_TAG=1.7.0-nikolaik
Enter fullscreen mode Exit fullscreen mode

The Linux networking trap

On macOS and Windows (Docker Desktop), host.docker.internal resolves to your host machine automatically. On Linux, it does not. Ollama runs on your host at port 11434; the OpenHands container needs to reach it via host.docker.internal. Without the --add-host flag, OpenHands starts, you configure Ollama as the LLM provider, and every single task attempt returns a connection error without a clear reason why.

The fix is one flag in the Docker run command: --add-host host.docker.internal:host-gateway.

The full Docker run command

docker run -it --rm \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:${RUNTIME_TAG} \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands:/.openhands \
  -v ~/workspace:/opt/workspace_base \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  docker.all-hands.dev/all-hands-ai/openhands:${OH_VERSION}
Enter fullscreen mode Exit fullscreen mode

Volume mounts explained:

  • /var/run/docker.sock — lets OpenHands spawn sandbox containers (required)
  • ~/.openhands:/.openhands — persists your LLM config across container restarts
  • ~/workspace:/opt/workspace_base — the local directory OpenHands works in; set this to your actual project path

macOS note: On macOS with Docker Desktop, you can omit --add-host and replace host.docker.internal in the Ollama URL with the same token — Docker Desktop handles the resolution automati

Top comments (0)