Alan West

Posted on May 24

How to sandbox AI coding agents without crippling them

#security #ai #docker #devops

The Problem: Your AI Agent Has Root

A few months back I was helping a team set up a self-hosted AI coding agent. Standard setup — an LLM with tool access, running on a shared dev server, able to read files, execute commands, hit APIs. The usual.

Then someone ran a prompt that included pasted output from an untrusted webpage. The agent dutifully interpreted some embedded instructions and started rm -rf'ing a directory it had no business touching.

Nothing critical was lost. But it could have been.

This is the dirty secret of running agents that execute code — by default, they run with whatever permissions your process has. If that process is your dev environment, your agent has access to your SSH keys, your cloud credentials, your git history. Everything.

Let me walk through how to actually sandbox these things properly.

Why "Just Use Docker" Isn't Enough

The obvious answer is to stick the agent in a container. And yes, that's a start. But naive Docker setups still have:

Root inside the container by default (escapable through several known paths)
Full network access to your internal services
Bind mounts you didn't think hard enough about
Only Docker's permissive default syscall filter, which still leaves plenty of kernel attack surface

I've seen "sandboxed" setups where docker.sock was mounted in for convenience. That's not a sandbox. That's a hot tub.
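
To make that checklist concrete, here is a minimal sketch that scans one entry of `docker inspect` JSON for a few of the problems above. The function name `audit_container` and the warning strings are my own for illustration. It is not a complete audit, just the obvious red flags.

```python
from typing import Any


def audit_container(inspect: dict[str, Any]) -> list[str]:
    """Return warnings for one entry of `docker inspect` JSON output."""
    warnings = []
    host_cfg = inspect.get("HostConfig") or {}
    config = inspect.get("Config") or {}

    # An empty User field means the image default, which is usually root.
    user = (config.get("User") or "").split(":")[0]
    if user in ("", "0", "root"):
        warnings.append("runs as root (or the image default user)")

    if host_cfg.get("Privileged"):
        warnings.append("privileged mode enabled")

    # Mounting the Docker socket hands the container control of the daemon.
    for mount in inspect.get("Mounts") or []:
        if (mount.get("Source") or "").endswith("docker.sock"):
            warnings.append("docker.sock is mounted")

    for opt in host_cfg.get("SecurityOpt") or []:
        if opt in ("seccomp=unconfined", "seccomp:unconfined"):
            warnings.append("seccomp disabled")

    return warnings
```

Feed it the parsed output of `docker inspect <container>` (a JSON list, one dict per container) and fail your deploy script if the returned list is non-empty.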

The Layered Approach

The way I've come around to thinking about this: defense in depth. Each layer assumes the previous one was bypassed.

Layer 1: Drop root

Containers should not run as root. Basic, but skipped constantly.

FROM ubuntu:22.04

# Dedicated user, no sudo, no shell escalation
RUN useradd -m -s /bin/bash agent

# Switch BEFORE any app setup so caches/files are owned correctly
USER agent
WORKDIR /home/agent

COPY --chown=agent:agent ./app /home/agent/app

Layer 2: User namespaces

Even as non-root inside the container, you want the container's UIDs remapped on the host. So even if the agent somehow becomes root in the container, it's an unprivileged UID on the outside.

Configure it in /etc/docker/daemon.json:

{
  "userns-remap": "default"
}

After restarting the daemon, container UIDs get shifted to a high host range. A "root" process inside maps to an unprivileged UID on the host, so it has no special rights over host files. See the Docker user namespace docs for the full setup.
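
The arithmetic behind the remap is simple enough to sketch. With "default", Docker uses a dockremap user whose range lives in /etc/subuid as name:start:count. The helper name `host_uid_for` and the range values below are hypothetical. Check your own /etc/subuid for the real numbers.

```python
def host_uid_for(subuid_line: str, container_uid: int) -> int:
    """Map a container UID to its host UID using one /etc/subuid entry.

    Entries have the form "name:start:count".
    """
    _name, start, count = subuid_line.strip().split(":")
    if not 0 <= container_uid < int(count):
        raise ValueError("container UID is outside the remapped range")
    return int(start) + container_uid


# Hypothetical entry; real values depend on the host.
entry = "dockremap:165536:65536"
print(host_uid_for(entry, 0))     # container root -> 165536
print(host_uid_for(entry, 1000))  # container UID 1000 -> 166536
```

So a process that believes it is UID 0 shows up on the host as UID 165536 in this example, an account that owns nothing outside the container's storage.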

Layer 3: Seccomp filtering

This is the layer most people skip. seccomp lets you whitelist syscalls — so even if the agent compromises the container, it can't make syscalls you haven't allowed.

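Docker ships a default seccomp profile that blocks around 44 dangerous syscalls. For agent workloads I tighten it further with an allowlist. The profile below is trimmed for readability, so expect to add more entries (mprotect, munmap, and rt_sigprocmask are common) before a real workload runs under it: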

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "openat", "close",
        "stat", "fstat", "lstat", "mmap", "brk",
        "rt_sigaction", "execve", "exit", "exit_group",
        "futex", "clone", "fork", "wait4"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Run it like this:

docker run \
  --security-opt seccomp=./agent-seccomp.json \
  --security-opt no-new-privileges \
  --cap-drop=ALL \
  agent-image

--cap-drop=ALL strips every Linux capability. --security-opt no-new-privileges blocks setuid binaries from elevating. Together they shrink the attack surface inside the container down to almost nothing.
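
Hand-editing that JSON gets tedious once the allowlist grows, so generating it is one option. This is a rough sketch: `build_seccomp_profile` and the `BASE` list are names I made up, and the syscall list is deliberately short.

```python
import json


def build_seccomp_profile(allowed: list[str]) -> dict:
    """Build an allowlist-style profile: deny by default, allow the listed names."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Illustrative base list; a real agent needs a longer one.
BASE = ["read", "write", "openat", "close", "execve", "exit_group"]

profile = build_seccomp_profile(BASE + ["futex", "read"])  # duplicate "read" is dropped
print(json.dumps(profile, indent=2))
```

Keeping the list in code means you can diff it in review and keep separate lists per agent type, then write the result to the file you pass to --security-opt seccomp=.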

Layer 4: Network egress control

Agents need to make HTTP calls. They do not need to scan your internal network.

The cleanest pattern I've found is routing the container through a proxy that whitelists destinations:

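# docker-compose.yml
services:
  agent:
    image: agent-image
    # Agent joins only the internal network, which has no route out
    networks:
      - sandbox

  proxy:
    image: nginx:alpine
    volumes:
      - ./proxy.conf:/etc/nginx/nginx.conf:ro
    networks:
      - sandbox
      - egress

networks:
  sandbox:
    internal: true   # No external connectivity on this network
  egress:
    driver: bridge

The proxy allows only the endpoints the agent legitimately needs. The agent is attached only to an internal network with no route to the outside, so every request has to go through nginx, which has to recognize the host. Don't reach for network_mode: "service:proxy" here: that shares the proxy's network namespace, so the agent would inherit the proxy's interfaces and its direct egress.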
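
The decision the proxy makes boils down to an exact hostname match. Here is a sketch of that logic, handy for unit-testing an allowlist before it goes into proxy config. `ALLOWED_HOSTS` and `is_allowed` are hypothetical names, and the hosts are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your agent really needs.
ALLOWED_HOSTS = {"api.example.com", "pypi.org"}


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Exact-match the URL's hostname against the allowlist."""
    # .hostname is lowercased and excludes any port or userinfo part.
    host = urlsplit(url).hostname
    return host is not None and host in allowed_hosts
```

Exact matching is the point. Substring or prefix checks are how pypi.org.evil.net or pypi.org@10.0.0.5 slip through.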

Layer 5: Filesystem isolation

Mount points are where I see the most mistakes. The agent needs to work on code, but the principle is: mount exactly what's needed, read-only where possible, and never anything sensitive.

# --read-only: root FS is immutable
# --tmpfs:     scratch space, capped at 100M
# first -v:    the actual work dir
# second -v:   read-only inputs
docker run \
  --read-only \
  --tmpfs /tmp:size=100M \
  -v "$PROJECT_DIR:/workspace:rw" \
  -v "$PROMPT_FILE:/input/prompt:ro" \
  agent-image

Notice what is NOT mounted: no ~/.ssh, no ~/.aws, no docker.sock, no parent directories that happen to contain a .env file.
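
A pre-flight check before mounting can catch the "parent directory with a .env" case. This is a minimal sketch: `SENSITIVE_NAMES` and `risky_entries` are illustrative names, and the name list only covers the items mentioned above.

```python
import os

# Names called out above as things that should never end up in a mount.
SENSITIVE_NAMES = {".ssh", ".aws", ".env", "docker.sock"}


def risky_entries(mount_root: str) -> list[str]:
    """List sensitive files or directories anywhere under a directory about to be mounted."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(mount_root):
        for name in dirnames + filenames:
            if name in SENSITIVE_NAMES:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Call it on $PROJECT_DIR before starting the container and refuse to launch if it returns anything. It walks the whole tree, so on very large repositories you may want to prune directories like node_modules.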

Handling Multi-Session Workloads

If multiple developers share the agent infrastructure, isolation between sessions becomes its own problem. The fix is straightforward — one container per session, lifecycle tied to the session.

import uuid
import subprocess

def start_agent_session(user_id: str, project_path: str) -> str:
    session_id = str(uuid.uuid4())
    container_name = f"agent-{user_id}-{session_id}"

    subprocess.run([
        "docker", "run", "-d",
        "--name", container_name,
        "--rm",                              # Auto-cleanup on stop
        "--memory", "2g",                    # Hard memory cap
        "--cpus", "1.0",                     # CPU quota
        "--pids-limit", "100",               # Prevent fork bombs
        "-v", f"{project_path}:/workspace",
        "agent-image",
    ], check=True)

    return session_id

The cgroup limits (--memory, --cpus, --pids-limit) are the unsung heroes. Without them, one runaway agent can take down the host. I learned this one the hard way after an agent got stuck in a loop spawning subprocesses.
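
The session launcher above only sets resource limits. One way to fold in the earlier layers is to build the argument list in a single place. This is a sketch under my own assumptions: `hardened_run_args` is a hypothetical helper, and the flag values are simply the ones used earlier in this post.

```python
def hardened_run_args(
    container_name: str,
    project_path: str,
    seccomp_profile: str = "./agent-seccomp.json",
) -> list[str]:
    """Build a `docker run` argv combining the flags shown in this post."""
    return [
        "docker", "run", "-d", "--rm",
        "--name", container_name,
        # Layer 3: syscall filter, no privilege escalation, no capabilities
        "--security-opt", f"seccomp={seccomp_profile}",
        "--security-opt", "no-new-privileges",
        "--cap-drop=ALL",
        # Layer 5: immutable root FS with capped scratch space
        "--read-only",
        "--tmpfs", "/tmp:size=100M",
        # Per-session cgroup limits
        "--memory", "2g",
        "--cpus", "1.0",
        "--pids-limit", "100",
        "-v", f"{project_path}:/workspace:rw",
        "agent-image",
    ]
```

Because it returns a plain list, you can assert on it in tests and pass it straight to subprocess.run(..., check=True) without going through a shell.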

Prevention Tips

A few things I've learned that weren't obvious to me at first:

Treat the agent's environment as untrusted. Anything in its filesystem or env vars can be exfiltrated via prompt injection.
Audit your mounts every single time. Bind mounts are the #1 source of escapes I've actually witnessed.
Log every command the agent runs. When something goes wrong, you'll want the trail.
Set timeouts on everything. Agents that should take 30 seconds sometimes try to run for 30 hours. Kill them.
Use ephemeral containers. Reusing the same container across sessions invites state pollution and credential leakage between users.
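
The logging and timeout tips can share one wrapper. Here is a minimal sketch, assuming the agent's tool calls funnel through a single function. `run_logged` and the "agent-audit" logger name are made up for illustration.

```python
import logging
import subprocess
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")


def run_logged(argv: list[str], timeout_s: float = 30.0) -> Optional[int]:
    """Run one command, log it, and kill it if it exceeds the timeout.

    Returns the exit code, or None if the command timed out.
    """
    log.info("exec: %s", argv)
    try:
        result = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        log.warning("timed out after %ss: %s", timeout_s, argv)
        return None
    log.info("exit %d: %s", result.returncode, argv)
    return result.returncode
```

One caveat: the timeout kills only the direct child, so grandchildren can survive it. That is another reason to keep the container-level --pids-limit and tear the whole container down when a session ends.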

The mental shift that helped me — stop thinking of the agent as "code I trust running on infra I trust." Think of it as a stranger you handed a terminal. Then design accordingly.

The layers above won't make an agent invulnerable. But they'll turn a single bad prompt from a catastrophe into a footnote.
