An AI coding agent that can't run shell commands, edit files, or install
packages is useless. An AI coding agent that can do all of those things
without restriction is dangerous. Every production agent system sits
somewhere on this spectrum — and most of them are struggling with it.
In 2025 alone, three runc CVEs (CVE-2025-31133, CVE-2025-52565, CVE-2025-52881) demonstrated that standard Docker containers
share the host kernel and can be escaped. A filesystem MCP server had
symlink-based CVEs that allowed full system takeover from inside a
sandboxed setup. A Cursor sandbox vulnerability
leaked secrets
from the host environment. The problem isn't theoretical.
We surveyed how seven major agent systems handle sandboxing and permissions,
mapped the pain points, and looked at what's emerging beyond Docker — including
why WASM with host functions doesn't solve what you think it solves.
How production agents handle permissions
Claude Code: OS-level sandbox, no containers
Claude Code uses the operating system's own isolation primitives.
On macOS: Apple's Seatbelt (sandbox-exec) with a runtime-generated
policy. On Linux: bubblewrap
(bwrap) for namespace isolation. No Docker daemon, no containers, no VMs.
The sandbox restricts the agent to the working directory for writes.
All network traffic routes through a proxy Unix domain socket — outbound
traffic is blocked by default, with an allowlist for approved domains.
On top of the sandbox, Claude Code has five
permission modes:
default (ask before everything risky), acceptEdits (auto-approve
file edits, prompt for shell), plan (read-only), dontAsk and
bypassPermissions (for fully isolated CI environments). A
PreToolUse hook system lets users write shell scripts that
approve or deny tool calls programmatically — automating the
permission decision without disabling the safety layer.
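A PreToolUse hook is just a program that reads the proposed tool call and signals a verdict. The sketch below assumes the documented contract in broad strokes — the tool call arrives as JSON on stdin, exit code 0 approves, and the blocking exit code (2) denies with a reason on stderr. The field names and the deny list here are illustrative, not Claude Code's exact schema.

```python
import json
import sys

# Illustrative deny list; a real hook would load patterns from config.
DENY_PREFIXES = ("rm -rf /", "git push --force")

def decide(command: str) -> bool:
    """Return True to allow a proposed shell command, False to block it."""
    return not any(command.strip().startswith(p) for p in DENY_PREFIXES)

def handle_event(raw: str) -> int:
    """Process one hook event and return the process exit code:
    0 approves the tool call, 2 blocks it with a reason on stderr."""
    event = json.loads(raw)
    command = event.get("tool_input", {}).get("command", "")
    if decide(command):
        return 0
    print(f"blocked by policy: {command}", file=sys.stderr)
    return 2

# A real hook's entry point would be:
#   sys.exit(handle_event(sys.stdin.read()))
```

The point is that the decision moves from an interactive prompt into code the user controls, while the OS sandbox stays in place underneath.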
Cursor: Landlock + seccomp, workspace-scoped
Cursor's sandbox uses the same primitives as Claude Code — Seatbelt on macOS,
Landlock LSM + seccomp on
Linux — but with a workspace-scoped policy generated at runtime. The agent
can read/write the open workspace and /tmp, read the broader filesystem,
but cannot write outside the workspace or make network requests without
explicit user approval.
The results are concrete: Cursor
reports that sandboxed agents
stop 40% less often than unsandboxed ones. Fewer false positives, fewer
interruptions, more autonomy. As of early 2026, one third of all Cursor
requests run sandboxed.
Enterprise admins get granular network allowlists/denylists. Regular users
don't — network approval is per-request.
OpenAI Codex CLI: Landlock + seccomp in Rust
Codex CLI follows the same pattern — Seatbelt on macOS, Landlock + seccomp on
Linux — but implements the Linux sandbox in Rust using the
seccompiler crate (the same BPF
compiler used by AWS Firecracker). A codex-linux-sandbox helper process
enforces restrictions. No Docker, no VM.
Network is blocked by default in sandboxed modes. Filesystem writes are
restricted to configured writable roots. Debug commands (codex debug seatbelt,
codex debug landlock) help diagnose policy issues.
Devin: Docker with managed cloud
Devin runs each session in a Docker container hosting a full environment:
terminal, browser, code editor, and planner. The container runs on AWS
(multi-tenant SaaS) or in the customer's VPC via AWS PrivateLink.
SOC 2 Type II certified.
The advantage: the agent gets a complete, isolated machine. The tradeoff:
Docker's shared-kernel isolation model, cloud latency, and the inability
to run locally. Devin is a managed service, not a local tool.
OpenHands: Docker per session, docker.sock risk
OpenHands (formerly OpenDevin)
runs each task in a Docker container, torn down post-session. The agent
accesses the container via SSH. Workspace files are mounted via Docker
volumes.
The documented security caveat: OpenHands requires mounting
/var/run/docker.sock, which gives the container full control over the
host Docker daemon. If an agent escapes the container, it has access to
all Docker resources on the host. This is
acknowledged in their own materials
and remains an active research problem.
Aider: no sandbox
Aider runs entirely in your terminal
with direct filesystem access. No container, no VM, no OS-level isolation.
The safety model is git: all changes go through git, so git diff and
git checkout are the undo mechanism. This is an explicit design choice —
simplicity over isolation. The community workaround for stronger isolation
is to run Aider inside a devcontainer.
GitHub Copilot Coding Agent: Actions + firewall
GitHub's Copilot Coding Agent runs in a GitHub Actions environment with
internet access controlled by a firewall. The agent has read-only repo
access for exploration and can only push to branches prefixed with copilot/.
Standard branch protection rules and required checks apply.
No arbitrary terminal access to the user's local machine. The agent lives
in GitHub's infrastructure, not yours.
The permission model spectrum
[Interactive chart — see original post]
The radar shows a clear split. Claude Code, Cursor, and Codex CLI converge
on the same architecture: OS-level primitives (Seatbelt/Landlock/seccomp)
with no Docker dependency. They score high on filesystem isolation and
network control but share a cross-platform weakness — maintaining three
separate sandbox implementations for macOS, Linux, and Windows.
Devin and OpenHands use Docker, which gives strong shell restrictions
(everything runs inside the container) but weaker network control and no
cross-platform story beyond "install Docker." Aider skips sandboxing
entirely, trading security for zero setup friction.
The real gate: in-system permissions for bash
All the sandbox technologies above — Docker, Firecracker, Landlock,
Seatbelt — answer the same question: if the agent runs a dangerous
command, how do we limit the blast radius? But there's a prior
question that matters more in practice: how does the agent get
permission to run bash in the first place?
Every agent system is fundamentally a loop that decides whether to
grant an LLM access to a shell. The sandbox is the safety net. The
permission system is the gate. In practice, the gate does more work
than the net.
Permission model comparison
| System | Default bash policy | Approval mechanism | Escape hatch | Granularity |
|---|---|---|---|---|
| Claude Code | Ask before every shell command | User prompt per-command; PreToolUse hooks for automation | --dangerously-skip-permissions | Per-command pattern matching (Bash(git commit:*)) |
| Cursor | Deny outside workspace | Auto-approve safe tools (grep, ls); prompt for state-changing commands | Enterprise admin overrides | Per-tool category (safe / state-changing) |
| Codex CLI | Blocked in sandbox modes | --full-auto mode for trusted environments | --full-auto flag | Per-mode (interactive / sandbox / full-auto) |
| Windsurf | Ask before terminal commands | User approval per-command; .codeiumignore for file exclusions | Security rules with NEVER flags | Per-command with rule overrides |
| Devin | Full access inside container | Container is the boundary — no per-command approval | N/A (container is the sandbox) | All-or-nothing (container-level) |
| OpenHands | Full access inside container | Event stream API gates actions inside container | N/A | Per-action type via event API |
| Aider | Full access, no restriction | None — git is the undo mechanism | N/A | None |
| GitHub Copilot | No local shell access | Runs in Actions; can only push to copilot/ branches | N/A | Branch-scoped |
Two philosophies
The table reveals two fundamentally different approaches:
Gate the command, sandbox the process. Claude Code, Cursor, Codex CLI,
and Windsurf all run on the user's machine and mediate individual bash
commands. The agent proposes a command; the system (or user) approves or
denies it. The OS-level sandbox (Landlock/Seatbelt) is the fallback if
an approved command does something unexpected. This is high-friction but
fine-grained — you can allow git commit while blocking rm -rf /.
Gate the environment, give full access inside. Devin, OpenHands, and
GitHub Copilot put the agent in an isolated environment and let it run
anything. No per-command approval, no permission prompts. The boundary
is the container or VM wall. This is low-friction but coarse — you can't
allow some commands while blocking others, because the agent has root
inside its sandbox.
The first approach treats the sandbox as defense-in-depth. The second
treats the sandbox as the only defense. When the container is the only
boundary, a container escape is game over. When per-command approval
exists alongside OS-level restrictions, both have to fail simultaneously.
The browser problem
There's a third dimension that neither philosophy handles well: browser
access with authenticated sessions.
For autonomous agents doing real-world tasks — filling forms, navigating
internal tools, interacting with SaaS dashboards — the browser with the
user's logged-in profiles is arguably the most valuable resource on the
machine. Cookies, saved passwords, OAuth tokens, browser extensions,
session state across dozens of services — all of it lives in the user's
browser profile.
Docker and VM-based sandboxes cut the agent off from this entirely. The
agent gets a fresh browser with no sessions, no cookies, no saved
credentials. Every service requires re-authentication. Extensions don't
exist. The agent is starting from zero in the environment where the
user has years of accumulated state.
Anthropic's Computer Use
runs agents against a visible desktop — with access to whatever the
user has open, including browsers. Daytona's Computer Use sandbox takes
a similar approach for Linux/macOS/Windows desktop automation. But these
operate outside the sandboxing models discussed above. The agent
either gets the user's full desktop (powerful but dangerous) or a
fresh container desktop (safe but useless for authenticated workflows).
This is the fundamental tension for autonomous agents. The resources
that make agents most useful — authenticated browser sessions, logged-in
APIs, the user's actual environment — are exactly the resources that
sandboxing is designed to restrict. No current system resolves this
cleanly. The closest approximation is scoped OAuth tokens and
per-service API keys granted to the agent explicitly, but that requires
every service to support programmatic access, which most internal tools
don't.
What works best
Cursor's data gives the clearest signal: agents with proper environmental
sandboxing stop 40% less often than unsandboxed ones. The finding is
counterintuitive: more isolation produces more autonomy, not less.
The reason: without a sandbox, the system must prompt the user for every
potentially dangerous action because there's no safety net. With a sandbox,
the system can auto-approve most actions because the blast radius is
contained. The sandbox replaces per-command friction with environmental
safety.
Claude Code's hook system points at a middle path. PreToolUse hooks
let users write shell scripts that approve or deny tool calls based on
pattern matching — Bash(npm install:*) auto-approves, Bash(rm -rf:*)
always denies. This moves permission decisions from interactive prompts
to declarative policy. The agent doesn't stop to ask; the policy
pre-answers.
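The idea can be sketched as a tiny first-match-wins evaluator. The Bash(prefix:*) syntax belongs to Claude Code; the glob matching below is an illustrative stand-in, not its actual implementation.

```python
from fnmatch import fnmatch

# Ordered rules, first match wins; patterns approximate the
# Bash(prefix:*) idea as plain shell-style globs.
RULES = [
    ("rm -rf*", "deny"),        # always deny recursive force-remove
    ("npm install*", "allow"),  # auto-approve dependency installs
    ("git commit*", "allow"),   # auto-approve commits
]

def evaluate(command: str, default: str = "ask") -> str:
    """Return 'allow', 'deny', or fall through to 'ask' the user."""
    for pattern, decision in RULES:
        if fnmatch(command, pattern):
            return decision
    return default
```

Anything the rules don't cover still falls back to an interactive prompt, so the declarative layer removes friction without removing the gate.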
The pattern that emerges across all systems: the best permission model
is the one where the user configures policy once and then forgets about
it. Per-command prompts don't scale. Container-level all-or-nothing
doesn't give enough control. Declarative per-capability policies —
what commands, what directories, what network endpoints — hit the
sweet spot. No system has fully nailed this yet.
Pain points
The restrictive/permissive policy trap
ARMO's research
documents a pattern seen across organizations: security teams set overly
restrictive sandbox policies. The agent breaks within 48 hours. Teams
loosen policies incrementally until they become permissive enough to be
meaningless. The result is security theater — a sandbox that exists on
paper but blocks nothing in practice.
The fix requires progressive enforcement: start permissive with monitoring,
tighten based on observed behavior. But most sandbox systems offer
all-or-nothing controls, not gradual ones.
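Progressive enforcement is straightforward to sketch, even though few systems ship it: run the strict policy in monitor mode, log what it would have blocked, and flip to enforcement once the log shows the policy matches real agent behavior. Everything below (class name, prefix matching) is a hypothetical design, not any existing system's API.

```python
from collections import Counter

class ProgressivePolicy:
    """Monitor-then-enforce sketch: permissive at first, but records
    every command a strict policy *would* have denied, so the policy
    can be tightened against observed behavior rather than guesses."""

    def __init__(self, allowed_prefixes):
        self.allowed = tuple(allowed_prefixes)
        self.enforcing = False
        self.would_deny = Counter()   # audit log of hypothetical denials

    def check(self, command: str) -> bool:
        permitted = command.startswith(self.allowed)
        if not permitted:
            self.would_deny[command.split()[0]] += 1
            if self.enforcing:
                return False          # enforce mode: actually block
        return True                   # monitor mode: allow, just log

policy = ProgressivePolicy(["git ", "npm ", "pytest"])
policy.check("cargo build")           # allowed in monitor mode, but logged
policy.enforcing = True               # tighten after reviewing the log
```

The audit log is the mechanism that breaks the restrictive/permissive oscillation: the team loosens or tightens against data instead of breakage reports.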
Docker cold starts and resource overhead
Docker containers incur 500ms–2s cold starts depending on base image.
For coding agents, the situation is worse — cloning a repository and
installing packages can add 20–30 seconds before any code executes.
At high concurrency (~400 parallel starts), CNI plugin and virtual switch
setup inflate boot times by up to
263%.
Teams maintain "warm pools" of pre-booted containers to mitigate this,
creating an economic trap: idle containers waste compute, but building
a predictive autoscaler is a serious engineering challenge.
Docker socket escape
OpenHands requires /var/run/docker.sock access. This is well-documented
and widely discussed. The socket gives any process with access full control
over the host Docker daemon — start containers, mount host filesystems,
access networks. Container escape equals full host access.
Cross-platform inconsistency
Claude Code, Cursor, and Codex CLI each maintain separate sandbox
implementations for macOS (Seatbelt) and Linux (Landlock/bubblewrap/seccomp).
Windows gets WSL2 at best. Every sandbox has different capabilities,
different edge cases, and different failure modes. Apple marks sandbox-exec
as deprecated — the macOS implementation is load-bearing but architecturally
uncertain.
Symlink escapes and path traversal
CVE-2025-53109 and CVE-2025-53110 in a filesystem MCP server showed that
path prefix matching — the most common way sandboxes restrict filesystem
access — can be bypassed through symlinks. An agent creates a symlink
inside the allowed directory pointing outside it, and the sandbox's path
check passes while the actual access goes elsewhere. This is a class of
vulnerability, not a single bug.
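The standard mitigation is to resolve symlinks before the containment check rather than comparing raw path prefixes. A minimal sketch:

```python
import os

def is_contained(requested: str, root: str) -> bool:
    """Reject paths that resolve outside `root`, including via symlinks.

    A naive `path.startswith(root)` check passes for a symlink that
    lives inside `root` but points outside it; resolving first does not.
    A naive check also misses plain traversal:
        "/sandbox/../etc/passwd".startswith("/sandbox")  -> True
    """
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(os.path.join(real_root, requested))
    return os.path.commonpath([real_root, real_path]) == real_root
```

This closes the symlink-inside-allowed-directory case, though a time-of-check/time-of-use race (swapping the symlink in after the check) remains, which is why kernel-enforced restrictions like Landlock are the stronger answer.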
Constant permission prompts vs autonomy
Agents that sandbox at the process level but need host access for installing
packages, running tests, or accessing databases face a dilemma: prompt the
user for every action (destroying flow) or run without restriction (destroying
safety). Cursor's data — 40% fewer stops with proper sandboxing — shows
that environmental sandboxing (the agent runs inside the sandbox, so package
installs and test runs don't need approval) is better than per-action
approval.
Beyond Docker: the sandbox landscape
Firecracker microVMs
Firecracker — the technology
behind AWS Lambda — gives each sandbox its own kernel. Boot time is under
200ms. Memory overhead per microVM is under 5 MiB. This eliminates the
shared-kernel escape vectors that plague runc containers.
E2B builds on Firecracker to offer ephemeral sandboxes
purpose-built for AI agents. Pre-warmed snapshot pools bring startup under
200ms. Manus
uses E2B for its agent execution.
Microsandbox is an
open-source alternative using libkrun (KVM-based microVMs) with sub-200ms
boot, Apache-2.0 license, and MCP integration.
The tradeoff: Firecracker requires KVM (Linux only), and networking setup
at scale (TAP interfaces, IP tables, CNI plugins) becomes the primary
bottleneck at ~400 parallel starts.
OS-level primitives: what Claude Code and Codex actually use
The most significant finding in this survey: the two most widely-used
local coding agents — Claude Code and Codex CLI — don't use Docker or
VMs at all. They use kernel-level primitives directly.
Landlock (Linux 5.13+) lets a process restrict its own filesystem and
network access at the kernel level without requiring root. Unlike seccomp
(which filters syscalls) or namespaces (which need privileges), Landlock
lets a process sandbox itself from within. It filters at the point of
operation in the kernel, not at the syscall interface.
seccomp-bpf attaches a BPF program to a process that filters every
syscall. It's the most surgical tool: you can allow or deny specific
syscalls with specific arguments. The limitation: syscall lists are
architecture-specific.
bubblewrap combines Linux namespaces (PID, mount, network, user) with
an unprivileged setup — the same tool Flatpak uses for desktop app
sandboxing.
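A typical bubblewrap invocation for an agent workspace combines these namespaces with bind mounts. The helper below only assembles the argv; the flag set is a plausible illustration (read-only root, writable workspace, private /tmp, no network), not any specific agent's actual policy.

```python
import subprocess

def bwrap_argv(workspace: str, command: list) -> list:
    """Build a bubblewrap command line: read-only root filesystem,
    writable workspace, fresh /tmp, no network, dies with the parent."""
    return [
        "bwrap",
        "--ro-bind", "/", "/",            # whole filesystem read-only
        "--bind", workspace, workspace,   # workspace stays writable
        "--tmpfs", "/tmp",                # fresh private /tmp
        "--dev", "/dev",                  # minimal device nodes
        "--proc", "/proc",                # fresh procfs
        "--unshare-net",                  # no network access
        "--die-with-parent",
        "--chdir", workspace,
        *command,
    ]

# Usage (requires bubblewrap installed):
#   subprocess.run(bwrap_argv("/home/me/project", ["pytest", "-q"]))
```

Because bwrap is unprivileged (via user namespaces), this runs without root and without a daemon — the property that makes it attractive for local agent tooling.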
On macOS, Seatbelt (sandbox-exec) provides kernel-enforced sandboxing
via a Scheme-like profile language. Several community tools have been built
on this pattern for AI agent sandboxing:
agent-seatbelt-sandbox,
scode.
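For a sense of what the Scheme-like profile language looks like, here is an illustrative workspace-scoped profile. The rules sketch the deny-by-default pattern; the paths are placeholders, and the rule set is not any shipping agent's actual policy.

```scheme
(version 1)
(deny default)                        ; deny-by-default baseline
(allow process-fork)
(allow process-exec)
(allow file-read*)                    ; broad reads
(allow file-write*                    ; writes only inside the workspace
    (subpath "/Users/me/project")
    (subpath "/private/tmp"))
(deny network*)                       ; no outbound network
```

A profile like this is applied with sandbox-exec -f profile.sb <command> — the interface Apple marks as deprecated but which remains functional.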
The advantage: near-zero overhead, no daemon process, no root required
(on Linux). The disadvantage: not a full machine — the agent shares the
host kernel and can't install system packages without host access.
gVisor: user-space kernel
gVisor interposes on syscalls in user space,
implementing a subset of the Linux kernel in Go. It sits between containers
and the host kernel, providing stronger isolation than runc without
requiring a full VM.
Google's
Agent Sandbox
— announced at KubeCon NA 2025 as a CNCF project — uses gVisor as its
foundation, with optional Kata Containers for workloads needing full
microVM isolation. Pre-booted warm pools and a declarative CRD API
target sub-second cold starts.
The tradeoff: 20–50% overhead on syscall-heavy workloads. For AI agents
running compilers and test suites, that's significant.
WebContainers: browser-native
StackBlitz WebContainers
run a WASM-based operating system entirely in the browser — no remote
server provisioned. The browser's own security sandbox handles isolation.
Bolt.new uses this to give AI
agents full control over a Node.js environment including filesystem,
package manager, and terminal, entirely client-side.
The hard limitation: Node.js and browser-compatible runtimes only. You
can't run Python, Rust, Go, or any native binary. For web development
agents, it's excellent. For general-purpose coding agents, it's a
non-starter.
Emerging platforms
Daytona pivoted from dev environments to
AI agent infrastructure in early 2025. It claims sub-27ms sandbox spin-up
and raised a $24M Series A in early 2026. It supports process execution,
filesystem access, native Git, and Computer Use sandboxes for desktop automation.
Modal offers a serverless container fabric
tested up to 1,000 sandbox creations per second. Used by Lovable and
Quora's Poe for AI code execution.
Morph Cloud focuses on instant environment
branching — snapshot-and-restore workflows where agents branch from a
known state, execute, then discard or persist.
Why WASM doesn't solve it
WASM is the obvious candidate for sandboxing: memory-safe execution with
microsecond overhead, capability-based I/O, and strong module isolation.
Wasmtime enforces that modules cannot access system resources without
explicit capability grants. In theory, perfect.
In practice, the boundary breaks as soon as you need it.
WASI capabilities punch holes in the sandbox. The moment you expose
filesystem or network capabilities to a WASM module — which you must, for
any agent that needs to read files, install packages, or make API calls —
the module has access to those resources. The sandbox is only as strong as
the capability grants, and useful agents need broad capabilities.
Resource exhaustion bypasses the capability model entirely. A 2025
security analysis showed that exposed WASI/WASIX interfaces allow
malicious modules to starve shared OS resources — CPU cycles, disk I/O,
bandwidth, entropy pools, kernel objects. Even with cgroups and quotas,
syscall floods can
degrade system performance by over 94%.
WASM isolates memory. It doesn't isolate compute.
Arbitrary tools can't run in WASM. An agent that runs cargo test,
pytest, gcc, or docker compose needs native binaries. Compiling
every tool and its dependencies to WASM is a massive operational burden —
and many tools (anything using raw syscalls, threads, or mmap) don't
compile at all. Pydantic's Monty is the most credible attempt at a
WASM-based Python sandbox, and it explicitly covers only a subset of
Python.
The I/O complexity problem. Agents need external data — files,
databases, APIs — to act. WASM strictly limits I/O. The result is complex
JavaScript glue code or custom host functions that effectively bypass
the security model you built WASM to enforce.
Microsoft's Wassette
(August 2025) is the most principled attempt — a Rust-based runtime
executing WASM Components via MCP with deny-by-default capabilities.
But it's limited to WASM Components, not arbitrary code execution.
The bottom line: WASM is excellent for plugin sandboxing — running
untrusted extensions in a controlled environment with minimal
capabilities. It is not a solution for general agent execution where the
agent needs to run arbitrary tools, install packages, and interact with
the host system.
The isolation hierarchy
[Interactive chart — see original post]
| Technology | Boot time | Memory overhead | Isolation | Root needed | Best for |
|---|---|---|---|---|---|
| Landlock/Seatbelt | ~1ms | near-zero | OS-level (shared kernel) | No | Local coding agents |
| WASM (Wasmtime) | microseconds | ~1 MiB | Language VM | No | Plugin sandboxing |
| Docker/runc | 500ms–2s | 10–50 MiB | Shared kernel + namespaces | Yes (daemon) | Legacy, dev environments |
| gVisor (runsc) | ~container | moderate | User-space kernel | Yes | Kubernetes workloads |
| Firecracker | ~125ms | under 5 MiB | Dedicated kernel | Yes (KVM) | Untrusted agent code at scale |
| Full VM (KVM/QEMU) | 1–30s | hundreds of MiB | Strongest | Yes | Maximum isolation |
The industry direction: for executing AI-generated code with untrusted
inputs at scale, the minimum viable isolation is a microVM (Firecracker,
libkrun, Kata Containers). For local developer tooling, OS-level
primitives (Landlock + seccomp + bubblewrap/Seatbelt) are practical and
require no daemon.
Docker sits in an awkward middle ground — more overhead than OS-level
primitives, less isolation than microVMs. Three runc CVEs in 2025 alone
have made it clear that shared-kernel container isolation is not sufficient
for untrusted code.
Open questions
This survey raises more questions than it answers. A few that stood out:
Can permission models be progressive rather than binary? The
restrictive/permissive policy trap suggests that all-or-nothing
controls don't work in practice. A permission system that starts
restrictive and widens based on observed behavior — earning trust
over a session — doesn't exist yet. What would it take to build one?
Is there a cross-platform sandbox primitive waiting to be built?
Every system maintaining three implementations (macOS/Linux/Windows)
is doing redundant work. Apple marking sandbox-exec as deprecated
adds urgency. A portable capability-based sandbox — something like
pledge/unveil but cross-platform —
would change the economics for every agent project. The Rust ecosystem
has pieces (extrasafe,
seccompiler,
cap-std) but no
unified abstraction.
How do you sandbox an agent that needs your browser? The user's
authenticated browser — cookies, extensions, OAuth tokens, session state
across dozens of services — is the single most valuable resource for
autonomous agents doing real-world tasks. Every sandboxing approach
either gives the agent full desktop access (dangerous) or a fresh
environment with no sessions (useless). Scoped OAuth tokens are a
partial answer, but most internal tools don't support programmatic
access. This might be the hardest unsolved problem in agent sandboxing.
What's the right granularity for agent permissions? Per-command
approval (Claude Code's default mode) is safe but slow. Per-session
blanket access (Aider) is fast but unsafe. Per-capability declarations
(the pledge model) sit in between but require the agent to know its
needs upfront. The approval gates pattern
from planning research suggests permission decisions should happen at
plan time, not execution time — but no system implements this yet.
We explore one possible design direction — a workspace-as-sandbox model
with opt-in host resource linking — in a
follow-up post.
Further reading
- Anthropic Engineering: Claude Code Sandboxing — Anthropic's own writeup of the Seatbelt/bubblewrap approach
- Cursor Agent Sandboxing — Cursor's Landlock/seccomp implementation and 40% fewer stops metric
- Deep Dive on Agent Sandboxes — Pierce Freeman's technical comparison
- Docker Sandboxes Aren't Enough for Agent Safety — why execution sandboxing ≠ agent safety
- OWASP Top 10 for Agentic Applications — the first industry-standard framework for AI agent security
- Porting OpenBSD pledge() to Linux — Justine Tunney's implementation of pledge/unveil using seccomp + Landlock
- Why multi-agent workflows fail in production — environment reliability is a dimension of multi-agent coordination
- Less code, more skills — the case for declarative policy over hardcoded behavior
Originally published at OpenWalrus.