Vishal VeeraReddy

Posted on May 22

Run OpenHands on Any Model You Want

#ai #agents #automation #opensource

There's a quiet shift happening in how serious developers are using AI in 2026. The hype cycle has moved past "ask the chatbot" and landed somewhere more interesting: autonomous coding agents that actually open files, run commands, and ship work — paired with self-hosted routing layers that decide which model handles which turn so you don't go broke doing it.

This post is about the two open-source projects I've quietly stitched together into my daily driver — and the setup has all but replaced the closed tools I used to pay a small fortune for. OpenHands is the agent: a sandboxed, autonomous software engineer that opens files, runs commands, writes tests, and ships PRs. Lynkr is the router: a self-hosted proxy that sits in front of every LLM provider on the market and decides, request by request, which one should answer. One runs the work. The other decides what the work is worth. Together, they run locally, leak nothing to a third-party SaaS, and cost a fraction of anything closed you can buy in 2026.

What is OpenHands?

OpenHands is the most credible open-source answer to closed AI coding agents like Devin, Cursor's background agents, and Codex CLI. It grew out of the OpenDevin research project (renamed in early 2025) and is now maintained by All-Hands-AI, a venture-backed company with an $18.8M Series A. The repo lives at All-Hands-AI/OpenHands, ships under the MIT license, and at version 1.7.0 (May 2026) has crossed 74,400+ GitHub stars, 9,400+ forks, 102 releases, and 6,700+ commits — making it by a wide margin the most-adopted open agent framework in the world. The codebase is roughly 63% Python and 36% TypeScript. To understand why it works, it helps to look at the architecture, the agent design, the runtime, and the customization surface in turn.

The architecture: event-sourced, modular, deploy-anywhere

The V1 SDK that landed alongside the November 2025 paper "The OpenHands Software Agent SDK" is a clean reimagining of the original V0 monolith. Where V0 had 140+ configuration fields spread across 15 classes and tightly coupled the agent to the sandbox, V1 splits the system into four independent packages: an SDK for agent definitions, a Tools layer for action handlers, a Workspace layer for execution environments, and a Server for hosting. Everything is stateless and immutable — agents are configuration objects, not living entities — and all mutable context lives in a single event-sourced ConversationState object. The conversation itself is a typed stream of Action and Observation Pydantic events, both immutable, both replayable. This is the single most important design decision in OpenHands: an autonomous agent is modeled as a pure function from event history to next event, run in a loop. Pause, resume, fork, deterministic replay, and full audit trails come for free. The same code runs in a Jupyter notebook locally or against a remote container farm in production — the Conversation factory transparently picks LocalConversation (in-process) or RemoteConversation (HTTP/WebSocket to a containerized server) based on config, with zero code changes.

The agent: CodeActAgent and the "code as universal tool" insight

The flagship agent is CodeActAgent, and its premise comes from the original CodeAct paper: instead of handing the LLM 20 bespoke tools each wrapped in their own JSON schema, give it bash, Python (via Jupyter), and a browser DSL, and let it express any action as code. Want to read a file? cat. Want to refactor across the repo? Python. Want to verify a fix lands? Run the tests. Want to scrape documentation? Drive the browser. Empirically this generalizes far better than tool-per-task designs and dramatically reduces parsing errors, because the model is doing what it's already best at — writing code — instead of filling in arbitrary JSON. Tool use is unified through a single Action → Execution → Observation contract. The SDK also ships MCPToolDefinition extending the standard ToolDefinition interface, so Model Context Protocol tools plug in alongside native ones with no glue code, and a built-in RouterLLM that can switch models mid-conversation (e.g., escalate to a multimodal model only when an image actually appears in context). A LLMSecurityAnalyzer scores every proposed tool call as LOW / MEDIUM / HIGH / UNKNOWN and can pause for confirmation on dangerous operations — a layer most agent frameworks simply don't have.

The runtime: a sandboxed Docker microservice

Agents never touch your host directly. Every action runs inside a sandboxed runtime container that exposes an action execution server over a REST API, which the OpenHands backend talks to in a tight loop: send Action, receive Observation, repeat. The container ships its own bash shell, Jupyter kernel, headless browser, and a pluggable skills system, plus optional VS Code Web access on a tokenized URL so you can drop into the sandbox visually if you want to. Images are managed through a clever three-tier tagging system — source-hash, lock-hash, versioned — so rebuilds are incremental and reproducible across machines. The runtime is pluggable too: Docker is the default, LocalRuntime runs the action server on the host for fastest iteration, and RemoteRuntime targets remote container infrastructure for fleet-scale deployment. Plugin backends include E2B, Modal, and Daytona for teams that already have sandbox infrastructure they trust. Storage supports bind mounts and Docker named volumes with optional copy-on-write overlay mode for isolation.

The customization surface: microagents (V0) / Skills (V1)

OpenHands is opinionated but extensible through what V0 calls microagents and V1 renames to Skills (both terms still work — V1 reads the V0 layout for backward compatibility). You drop a .openhands/microagents/ directory at the root of your repo, add markdown files like repo.md, frontend.md, or migrations.md, and the main agent loads them on demand. A repo.md gives the agent the high-level mental model of your codebase — directory layout, build commands, test conventions, where the gotchas are — so it doesn't have to rediscover them every session. Triggered microagents activate only when their keywords appear in the conversation, so you can teach the agent narrow domain knowledge without bloating every prompt. This is where the project's "agents that ship in production" claim earns its keep: tuning OpenHands for your specific codebase is a markdown commit, not a fork.

The interfaces: four ways in

You can use OpenHands as a CLI (terminal-native, like Aider or Claude Code), a local GUI (React SPA backed by a REST/WebSocket API, the default docker run experience), the hosted cloud at all-hands.dev (free tier on Minimax models, paid tiers on frontier providers, GitHub/GitLab login), or as a headless service driven programmatically via the SDK. The headless mode is what powers the OpenHands Resolver: connect it to a GitHub repo, label an issue, and OpenHands spins up a sandboxed runtime, analyzes the issue, edits the code, runs the tests, and opens a PR — fully autonomous. The same Resolver pattern works for GitLab. For enterprise teams, the deployment story extends to private Kubernetes clusters with RBAC, Slack/Jira/Linear connectors, and source-available enterprise features.

The numbers that matter

OpenHands publishes against the standard benchmark: 77.6% on SWE-Bench Verified with Claude 3.5 Sonnet Thinking on the V0 harness, and 72.8% SWE-Bench Verified + 67.9% GAIA with Claude Sonnet 4.5 on the V1 SDK. Those aren't the highest numbers ever posted, but they are the highest by an open-source, self-hostable system you can actually inspect, fork, and run yourself. In January 2026 the team launched the OpenHands Index — a continuously-updated leaderboard evaluating models across five real engineering categories (Issue Resolution, Greenfield Development, Frontend Development, Software Testing, and Information Gathering) by ability, cost, and execution time. That tells you something about where the project is heading: away from "agent as parlor trick" and toward "agent as measured, reproducible infrastructure."

What ties it together

The throughline is that OpenHands is not trying to be a chatbot or an autocomplete. It's trying to be the runtime layer for autonomous software agents — composable, sandboxed, observable, model-agnostic, and equally happy in a notebook or behind a load balancer. It uses LiteLLM under the hood for LLM dispatch, which means 100+ providers are reachable today (Claude, GPT, Gemini, Bedrock, Vertex, Azure, OpenRouter, Ollama, llama.cpp, anything LiteLLM speaks) with no code changes. That last property is what makes the Lynkr pairing not just possible but obvious.

What is Lynkr?

Lynkr is a self-hosted Node.js proxy that sits between your AI coding tools and the dozen-plus LLM providers worth using, listening on http://localhost:8081 and presenting both Anthropic Messages and OpenAI Chat Completions APIs simultaneously. From the outside it looks like a normal LLM endpoint — point Claude Code, Cursor, Aider, Codex CLI, or OpenHands at it with one environment variable and you're done — but on the inside it does something none of those tools do on their own: it analyzes every request across 15 weighted dimensions (including an AST-based knowledge graph called Graphify that understands code structure across 19 languages, detecting god nodes, community cohesion, and architectural blast radius) and routes it to one of four model tiers — simple, medium, complex, or reasoning — based on what the request actually needs. Supported backends include local-and-free options (Ollama, llama.cpp, LM Studio, MLX Server) and cloud providers (AWS Bedrock with 100+ models, OpenRouter with 100+ more, Azure OpenAI, Azure Anthropic, OpenAI, Google Vertex, Databricks, Moonshot, Z.AI, DeepSeek). On top of routing, Lynkr layers a seven-phase token optimization pipeline — smart tool selection, Code Mode which collapses 100+ MCP tools into 4 meta-tools (≈96% tool-overhead reduction), Distill structural compression, SHA-256-keyed LRU prompt caching, memory deduplication, sliding-window history compression, and an optional ML-based headroom sidecar (Smart Crusher, CCR, LLMLingua) — plus a Titans-inspired long-term memory subsystem that stores observations in a SQLite FTS5 database (lynkr.db) scored on surprise, recency, and relevance, and injects only the relevant slice back into system prompts on future requests. Production deployments get Prometheus metrics at /metrics, Kubernetes-ready health checks, circuit breakers with half-open probe recovery, hot-reloadable config via POST /v1/admin/reload, SQLite-backed routing telemetry with P50/P95/P99 latencies and 0–100 quality scoring on every decision, and load shedding under pressure.

Why You Should Use Lynkr With OpenHands

The honest case for stacking these two has four layers, and each one matters more than the last.

Cost, but not the dumb kind. OpenHands is brilliant precisely because it doesn't ask permission — it just does the work, which means it burns through tokens at a rate that's painful if every turn hits a frontier model. A long session can quietly cost $20+ in Opus tokens because the agent doesn't know — and shouldn't have to know — that a mv foo.py bar.py doesn't deserve a $15-per-million model. Lynkr's complexity analysis is the missing brain: simple file moves and grep calls drop to simple tier (Haiku, GPT-4o-mini, local Qwen), real architectural reasoning gets routed to reasoning tier (Opus, GPT-5, Gemini Ultra). Users typically report 60–80% lower spend without a meaningful drop in output quality, because the expensive models still get called — just for the turns that actually need them. Unlike OpenRouter, Lynkr doesn't take a 5.5% cut of your credits, and unlike LiteLLM's proxy, the token optimization pipeline is built in rather than something you bolt on.

Provider redundancy that doesn't require you to redeploy. When Anthropic has a bad afternoon — and they do — your OpenHands session doesn't stop. Lynkr's circuit breakers detect the failure, route around it to a configured fallback (Bedrock Claude, Azure Anthropic, OpenAI, whatever you've set), and quietly recover via half-open probes when the primary comes back. You change zero code.

Local models when you want them, frontier when you don't. Drop in Ollama or LM Studio as your simple tier and the cheapest turns in your session cost literally nothing — they never leave your machine. The same OpenHands install can be 100% offline-capable for development tasks and seamlessly burst to cloud Opus for the hard problems. No other agent + router combination on the market does this without significant glue code.

One config owns your model strategy. This is the underrated win. As models drop monthly (GPT-5.1, Sonnet 4.7, Gemini 3, the next open-weights surprise from DeepSeek), you stop rewiring your tools — you change one line in Lynkr's config and hot-reload. OpenHands keeps doing what it does. Cursor, Claude Code, Aider, and every other tool pointed at the same Lynkr instance get the new strategy for free.

A note on philosophy: nothing about your code, prompts, or context ever passes through a third-party SaaS on the way to the model. Lynkr is self-hosted, your provider keys live in your environment, and the routing decisions are auditable in a local SQLite database. For anyone working in a regulated environment — or who just doesn't love handing prompt logs to an intermediary — this matters.

How to Use It

The full setup is three steps and roughly five minutes.

1. Install and start Lynkr

Pick whichever feels right:

# One-line install (recommended)
curl -fsSL https://raw.githubusercontent.com/Fast-Editor/Lynkr/main/install.sh | bash

# Or npm
npm install -g lynkr

# Or Homebrew
brew tap vishalveerareddy123/lynkr && brew install lynkr

# Or Docker
docker-compose up -d

2. Configure your providers

Lynkr reads from environment variables (or a .env file). A reasonable starter config that mixes a local model for cheap turns with cloud frontier models for hard ones:

# Tier definitions
SIMPLE_PROVIDER=ollama
SIMPLE_MODEL=qwen2.5-coder:latest

MEDIUM_PROVIDER=openai
MEDIUM_MODEL=gpt-4o-mini

COMPLEX_PROVIDER=anthropic
COMPLEX_MODEL=claude-sonnet-4-6

REASONING_PROVIDER=anthropic
REASONING_MODEL=claude-opus-4-7

# Provider credentials
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Optimizations
SEMANTIC_CACHE_ENABLED=true
CODE_MODE_ENABLED=true
MEMORY_ENABLED=true

Start it and verify:

lynkr start
curl http://localhost:8081/v1/models   # should return a JSON model list

3. Point OpenHands at Lynkr

Three environment variables on the OpenHands container do the entire job. The LLM_BASE_URL tells LiteLLM where to send requests, the LLM_API_KEY is a placeholder (Lynkr accepts anything because the real auth lives upstream), and the openai/ prefix on LLM_MODEL tells LiteLLM to use OpenAI wire format against Lynkr's /v1/chat/completions endpoint — even when the model on the other side is Claude or Gemini.

Full copy-pasteable command:

docker run -it --rm --pull=always \
  -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \
  -e AGENT_SERVER_IMAGE_TAG=1.19.1-python \
  -e LOG_ALL_EVENTS=true \
  -e LLM_BASE_URL="http://host.docker.internal:8081/v1" \
  -e LLM_API_KEY="sk-lynkr" \
  -e LLM_MODEL="openai/claude-sonnet-4-6" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands:/.openhands \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app \
  docker.openhands.dev/openhands/openhands:1.7

Open http://localhost:3000, give OpenHands a task, and watch the proxy do its job. If you tail lynkr logs in another terminal you'll see the routing decisions in real time — which tier each request landed on, latency, cached vs cold, and the quality score.

If you prefer config.toml over env vars, OpenHands also accepts:

[llm]
model = "openai/claude-sonnet-4-6"
base_url = "http://localhost:8081/v1"
api_key  = "sk-lynkr"

The same pattern works from OpenHands' GUI under Settings → LLM → Advanced.

What's Actually Happening Under the Hood

When OpenHands fires a request, here's the full lifecycle:

OpenHands → LiteLLM formats the call as OpenAI Chat Completions and ships it to http://host.docker.internal:8081/v1/chat/completions.
Lynkr ingests the request, runs token counting and budget enforcement.
Optimization pipeline applies prompt caching (SHA-256 LRU), memory deduplication, tool truncation (Code Mode collapses MCP tool definitions), and history compression.
Complexity analysis scores the request across 15 dimensions including Graphify AST signals; the router picks a tier.
Format translation converts to whatever wire protocol the destination needs — Bedrock Converse, Vertex Gemini, Anthropic Messages, or stays OpenAI.
Provider invocation via the unified invokeModel() abstraction. If the circuit breaker is open, the request silently fails over.
Response translation converts back to OpenAI Chat Completions for LiteLLM.
Telemetry writes the routing decision, latency percentiles, and quality score to SQLite for later inspection.
Memory subsystem scores the exchange for surprise / recency / relevance and stores any high-signal observations for future injection.

OpenHands sees a perfectly normal OpenAI response. It has no idea any of that happened.

Gotchas Worth Knowing

Don't also set ANTHROPIC_BASE_URL inside the OpenHands container. OpenHands talks via LiteLLM, not the Anthropic SDK directly — the LLM_* env vars are the correct lever.
On Linux, the --add-host host.docker.internal:host-gateway flag is what lets the container resolve back to your host. The command above already includes it; don't drop it.
Keep the openai/ prefix on LLM_MODEL. Without it, LiteLLM tries to use the Anthropic SDK against an OpenAI-shaped endpoint and fails confusingly.
If you're running Lynkr in Docker too, put both containers on the same Docker network and reference Lynkr by container name instead of host.docker.internal.
The LLM_API_KEY value is ignored by Lynkr but must be set to something — LiteLLM refuses to send requests without an API key field present.

The Bigger Picture

The reason this combo works isn't that either project is doing something magical in isolation — it's that they respect the same boundary. OpenHands knows it shouldn't care which model is on the other side of LiteLLM; Lynkr knows it shouldn't care which tool is asking. That clean separation is what makes the stack composable. Tomorrow you can swap OpenHands for Aider or Cline without touching Lynkr's config. Next month you can add Kimi K2 or whatever Mistral ships next without touching OpenHands. The agent layer and the routing layer evolve independently, and your wallet quietly benefits from both.

That's the version of the AI coding stack worth betting on in 2026: open, local-first, model-agnostic, and configurable from one file you actually control.

If you want the source: OpenHands lives at github.com/All-Hands-AI/OpenHands, Lynkr at github.com/Fast-Editor/Lynkr. Both accept contributions, and both maintainers ship fast.

DEV Community