Tang Weigang

Posted on Jul 2

OpenHands Can Run Coding Agents. First Decide What They Are Allowed to Touch.

#opensource

OpenHands puts an AI coding session inside a larger control surface: a frontend, an app server, sandbox services, code-host integrations, skills, and agent runtimes all meeting in one place.

That is the reason it is useful. It is also the reason the first question should not be "which model should I use?" The first question is: what is this agent allowed to touch?

Doramagic project page: https://doramagic.ai/en/projects/openhands/

Doramagic manual: https://doramagic.ai/en/projects/openhands/manual/

Upstream project: https://github.com/OpenHands/OpenHands

Treat OpenHands as an execution boundary, not a shortcut

The OpenHands repository describes a platform for running coding agents across local, remote, and cloud backends. The manual maps the system into a few practical layers: a React/Remix frontend, a FastAPI app server, a conversation service, a sandbox service, integrations, secrets, settings, event callbacks, and skills or microagents.

That layout matters because an agent session is more than text generation. A real session can involve:

selecting a repository;
preparing a sandbox;
running setup scripts;
using GitHub, GitLab, or Bitbucket integrations;
reading or writing files;
calling tools;
storing settings or secrets;
exposing local web apps through a browser surface.

If you only evaluate OpenHands by asking whether it can "code", you miss the operational surface. The practical evaluation is whether the agent can work inside boundaries that a human can inspect.

The smallest safe first run

Do not start by giving OpenHands a production bug and a real credential set. A safer first run is boring by design:

Use a temporary repository or a disposable branch.
Pick one small issue with a clear expected diff.
Disable or avoid real secrets.
Record the selected repository, branch, sandbox, and model profile.
Ask for a plan and a proposed file list before any edits.
Require tests or a smoke check before the session is called complete.
Review the final diff and command transcript before merge.

This does not make the agent slow. It makes the first failure diagnosable.

Watch the sandbox state

The manual shows sandbox states such as missing, starting, running, paused, and error. That sounds like an infrastructure detail until a session appears stuck or silently stops progressing.

For day-one use, the operator should be able to answer:

which sandbox is active;
whether it is local, Docker, remote, or cloud-backed;
whether setup scripts have run;
which URLs were exposed;
whether the session is waiting for a sandbox or already running code;
what happens when concurrency or runtime limits are hit.

One recorded community risk is that hitting a runtime limit can look like a workflow problem when older sandboxes are silently paused instead of the operator seeing a clear quota error. The lesson is simple: log sandbox state as product evidence, not as a hidden implementation detail.
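One way to make that evidence concrete is a small operator-side log of observed states. The sketch below is illustrative only: the state names come from the list the manual gives, but `SandboxState`, `SandboxLog`, and the `sandbox-demo` id are hypothetical and not part of any OpenHands API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class SandboxState(Enum):
    # State names follow the manual's list; the enum itself is hypothetical.
    MISSING = "missing"
    STARTING = "starting"
    RUNNING = "running"
    PAUSED = "paused"
    ERROR = "error"


@dataclass
class SandboxLog:
    """Hypothetical operator-side record of observed sandbox states."""
    sandbox_id: str
    entries: list = field(default_factory=list)

    def record(self, state: SandboxState, reason: str = "") -> None:
        # Store a UTC timestamp, the state, and whatever reason was reported.
        stamp = datetime.now(timezone.utc).isoformat()
        self.entries.append((stamp, state, reason))

    def unexplained_pauses(self) -> list:
        # A pause or error with no stated reason is the case worth flagging.
        flagged = (SandboxState.PAUSED, SandboxState.ERROR)
        return [e for e in self.entries if e[1] in flagged and not e[2]]


log = SandboxLog("sandbox-demo")
log.record(SandboxState.STARTING)
log.record(SandboxState.RUNNING)
log.record(SandboxState.PAUSED)  # no reason given: looks like a stuck workflow
for stamp, state, _ in log.unexplained_pauses():
    print(f"{stamp} {log.sandbox_id}: {state.value} with no recorded reason")
```

A pause that arrives with a reason such as "runtime limit reached" would not be flagged, which is the distinction between a quota event and a mystery.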

Secrets and integrations need a sharper policy than "be careful"

OpenHands integrates with code-host resources and can handle user context, settings, secrets, and event callbacks. That power is exactly why the first policy should be explicit.

A useful first policy is:

no personal token or production secret in the first session;
no write access to the primary repository until a read-only run succeeds;
no automatic webhook or callback configuration without review;
no claim of "ready" unless the selected repository and branch are visible;
no merge or deploy action inside the same exploratory run.

This is not anti-agent. It is how an agent earns more authority.
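The policy above can be written down as a check that runs before a session starts. This is a minimal sketch under stated assumptions: `SessionRequest` and `policy_violations` are hypothetical names invented for illustration, and how each field would actually be populated depends on your own setup, not on any OpenHands interface.

```python
from dataclasses import dataclass


@dataclass
class SessionRequest:
    """Hypothetical description of what a session is asking for."""
    uses_real_secrets: bool
    wants_write_access: bool
    read_only_run_succeeded: bool
    configures_callbacks: bool
    callbacks_reviewed: bool
    repo_and_branch_visible: bool
    includes_merge_or_deploy: bool


def policy_violations(req: SessionRequest) -> list:
    """Return one message per broken rule of the first-session policy."""
    problems = []
    if req.uses_real_secrets:
        problems.append("real secret or personal token present")
    if req.wants_write_access and not req.read_only_run_succeeded:
        problems.append("write access requested before a read-only run succeeded")
    if req.configures_callbacks and not req.callbacks_reviewed:
        problems.append("webhook or callback configured without review")
    if not req.repo_and_branch_visible:
        problems.append("selected repository and branch not visible")
    if req.includes_merge_or_deploy:
        problems.append("merge or deploy action in an exploratory run")
    return problems


first_session = SessionRequest(
    uses_real_secrets=False,
    wants_write_access=True,
    read_only_run_succeeded=False,
    configures_callbacks=False,
    callbacks_reviewed=False,
    repo_and_branch_visible=True,
    includes_merge_or_deploy=False,
)
for problem in policy_violations(first_session):
    print("blocked:", problem)
```

An empty result means the request stays inside the first-session policy; anything else is a reason to stop and narrow the request.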

Do not skip the boring failure modes

The Doramagic pitfall log for OpenHands points to installation, configuration, and permission risks. The manual also preserves concrete operational issues: self-hosted UI startup problems, browser launch flags in containerized environments, root-owned files from dev containers, LLM profile editing rough edges, and runtime-cap behavior.

Those are not reasons to avoid the project. They are reasons to keep the first workflow narrow enough that the cause of failure is visible.

If the first run fails, the right question is not "is OpenHands bad?" It is:

did the sandbox start;
did the repository preparation finish;
did setup scripts run as the expected user;
did the browser or exposed URL need an extra flag;
did a model profile or API key setting change unexpectedly;
did the session pause because of a runtime limit;
did the final diff match the original issue?

That list is more useful than a generic "AI coding agents are risky" warning.
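The same questions can be walked in order so that triage stops at the first one that is unanswered or answered "no". The sketch below is an assumption-laden illustration: the keys and the `first_open_question` helper are hypothetical, and the questions are rephrased so that `True` always means "healthy".

```python
from typing import Optional

# Ordered triage questions, rephrased so that True always means "healthy".
TRIAGE_STEPS = [
    ("sandbox_started", "Did the sandbox start?"),
    ("repo_prepared", "Did the repository preparation finish?"),
    ("setup_ran_as_expected_user", "Did setup scripts run as the expected user?"),
    ("browser_or_url_ok", "Did the browser or exposed URL work without an extra flag?"),
    ("model_profile_unchanged", "Did the model profile and API key settings stay unchanged?"),
    ("no_runtime_limit_pause", "Did the session avoid a runtime-limit pause?"),
    ("diff_matches_issue", "Did the final diff match the original issue?"),
]


def first_open_question(observations: dict) -> Optional[str]:
    """Return the earliest question that is not confirmed True, or None."""
    for key, question in TRIAGE_STEPS:
        # A missing answer counts as open: unknown is not the same as healthy.
        if observations.get(key) is not True:
            return question
    return None


observed = {
    "sandbox_started": True,
    "repo_prepared": True,
    "setup_ran_as_expected_user": False,
}
print(first_open_question(observed))
```

Treating a missing answer as open is a deliberate choice here: it keeps "nobody checked" from being read as "it worked".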

What Doramagic adds

The Doramagic OpenHands pack is not an official upstream document and should not be treated as endorsement. It is an independent project context pack: quick start, prompt preview, human manual, pitfall log, boundary card, and eval checks.

The pack is useful when you want to load a host instruction into Claude Code, Codex, Cursor, Aider, or another AI coding host and make it ask better first questions before installing or changing anything.

The useful prompt is not "use OpenHands." It is closer to:

Before taking action, restate the task, identify the required tool access, separate read-only checks from write actions, mark any command or install step as requiring approval, and say when evidence is missing.

That changes the first interaction from hype to control.

A practical acceptance checklist

Before you trust a real OpenHands session, ask for these proofs:

selected repository and branch;
sandbox id or state;
model or LLM profile being used;
tool and integration surface;
files the agent intends to read or edit;
commands it intends to run;
expected rollback path;
final diff;
test or smoke-check result;
unresolved risks.

If an agent cannot show these, it may still be useful, but it has not earned unsupervised authority.
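As a rough illustration, the checklist can be turned into a gate that reports which proofs are still missing. The key names and the `missing_proofs` helper are hypothetical, and the evidence values are made-up placeholders rather than real session output.

```python
# Proof names mirror the checklist above; the keys themselves are hypothetical.
REQUIRED_PROOFS = [
    "repository_and_branch",
    "sandbox_id_or_state",
    "model_or_llm_profile",
    "tool_and_integration_surface",
    "files_to_read_or_edit",
    "commands_to_run",
    "rollback_path",
    "final_diff",
    "test_or_smoke_result",
    "unresolved_risks",
]


def missing_proofs(evidence: dict) -> list:
    """List every required proof that is absent or empty in the evidence."""
    # An empty value counts as missing, so "no unresolved risks" has to be
    # stated explicitly (for example "none identified") rather than left blank.
    return [name for name in REQUIRED_PROOFS if not evidence.get(name)]


evidence = {
    "repository_and_branch": "demo-repo@fix/typo",  # placeholder value
    "sandbox_id_or_state": "running",               # placeholder value
    "final_diff": "1 file changed",                 # placeholder value
}
gaps = missing_proofs(evidence)
print(f"{len(gaps)} proofs missing:", ", ".join(gaps))
```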

The operating rule

OpenHands is strongest when it is treated as an agent control plane rather than a magic developer. Give it small, checkable tasks first. Make the sandbox, repository, integration, secret, and rollback boundaries visible. Then expand its authority only when the evidence improves.

That is the difference between "letting an AI code" and operating an AI coding system.

DEV Community