DEV Community

lelandfy

Stop Writing Docker Wrappers for Your AI Agent's Code Execution

Every AI agent that executes code needs a sandbox. And teams building one often end up writing the same thing: a Python wrapper around subprocess.run(["docker", "run", ...]) with a growing list of security flags they keep forgetting to set.

The Problem

Here's what a typical "sandbox" looks like in an agent codebase:

import subprocess

result = subprocess.run(
    ["docker", "run", "--rm", "--network=none",
     "--memory=512m", "--cpus=1",
     "--read-only", "--security-opt=no-new-privileges",
     "--pids-limit=64",
     "python:3.12-slim",
     "python3", "-c", "print('hello')"],
    capture_output=True, text=True, timeout=300
)
print(result.stdout)

This works. Until it doesn't:

  • Someone forgets --network=none, and your agent starts making HTTP requests.
  • Timeout handling is a mess when Docker itself hangs: the subprocess timeout kills the docker CLI process, but the container can keep running.
  • Parsing stdout/stderr gets fragile fast.
  • Cleanup on crash? Good luck.
  • Want to swap Docker for Firecracker? Rewrite everything.
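
To make the timeout and cleanup bullets concrete, here is a minimal sketch of what a hand-rolled wrapper needs just to avoid leaking a container after a timeout. The function names, the container-name prefix, and the injectable run parameter are illustrative assumptions, not code from any particular project; the Docker flags are the ones from the example above, plus --name so the container can be found again.

```python
import subprocess
import uuid


def build_run_argv(name: str, image: str, command: list[str]) -> list[str]:
    """Assemble the `docker run` argv with the hardening flags shown earlier."""
    return [
        "docker", "run", "--rm", "--name", name,
        "--network=none", "--memory=512m", "--cpus=1",
        "--read-only", "--security-opt=no-new-privileges",
        "--pids-limit=64",
        image, *command,
    ]


def run_in_container(image, command, timeout=300, run=subprocess.run):
    """Run a command in a throwaway container and remove it on timeout.

    `run` is injectable (hypothetical design choice) so the logic can be
    exercised without a Docker daemon.
    """
    # A unique name lets us address the container after the CLI process dies.
    name = f"agent-sandbox-{uuid.uuid4().hex[:12]}"
    try:
        return run(build_run_argv(name, image, command),
                   capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has killed the docker CLI, but the container is
        # owned by the daemon and may still be running: force-remove it.
        run(["docker", "rm", "-f", name],
            capture_output=True, text=True, timeout=30)
        raise
```

That is roughly twenty lines for one failure mode, and it still ignores the case where the wrapper process itself crashes before the except clause runs.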

What We Built

Roche is a sandbox orchestrator that replaces all of that with:

from roche_sandbox import Roche

with Roche().create(image="python:3.12-slim") as sandbox:
    result = sandbox.exec(["python3", "-c", "print('hello')"])
    print(result.stdout)

That's it. The sandbox is created with secure defaults, the command runs, and the sandbox is destroyed when the context manager exits, even if your code raises an exception.
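
The cleanup guarantee comes from Python's context manager protocol rather than anything exotic. The sketch below uses a hypothetical DemoSandbox stand-in (not Roche's actual class) to show how __exit__ runs on both the normal and the exception path.

```python
class DemoSandbox:
    """Hypothetical stand-in for a sandbox handle; not Roche's real class."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        # A real implementation would tear down the container or VM here.
        self.destroyed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Called whether the with-body finished normally or raised.
        self.destroy()
        return False  # do not swallow the exception


def run_and_fail(sandbox):
    """Simulate agent code that blows up inside the with-block."""
    with sandbox:
        raise RuntimeError("agent code failed")
```

Calling run_and_fail(DemoSandbox()) still propagates the RuntimeError, but the sandbox has been destroyed by the time the caller sees it.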

What "Secure Defaults" Actually Means

When you call Roche().create() with no arguments, you get:

| Setting | Default | Why |
|---|---|---|
| Network | Disabled | LLM-generated code should not make HTTP calls |
| Filesystem | Read-only | No persistent writes, no dropping payloads |
| Timeout | 300 seconds | No infinite loops eating your CPU |
| PID limit | 64 | No fork bombs |
| Privileges | no-new-privileges | No privilege escalation |

Every one of these can be overridden when you need to:

from roche_sandbox import Roche

roche = Roche()
sandbox = roche.create(
    image="python:3.12-slim",
    network=True,       # enable network
    writable=True,      # writable filesystem
    timeout_secs=600,   # longer timeout
    memory="1g",        # memory limit
    cpus=2.0,           # CPU limit
)

But you have to opt in. Dangerous capabilities are never on by default.
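
One way to picture "opt in" is a config object whose defaults are all the restrictive choice. The sketch below is an assumption about how such a config could be modeled, not Roche's internals: SandboxConfig, pids_limit, and docker_flags are hypothetical names, while network, writable, and timeout_secs mirror the keyword arguments shown above. The Docker flags are real CLI flags.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SandboxConfig:
    """Hypothetical config: every default is the locked-down option."""
    network: bool = False     # no network unless requested
    writable: bool = False    # read-only filesystem unless requested
    timeout_secs: int = 300   # enforced by the caller, not a docker flag
    pids_limit: int = 64      # hypothetical field name for the PID limit


def docker_flags(cfg: SandboxConfig) -> list[str]:
    """Translate a config into `docker run` flags (illustrative mapping)."""
    flags = ["--security-opt=no-new-privileges",
             f"--pids-limit={cfg.pids_limit}"]
    if not cfg.network:
        flags.append("--network=none")
    if not cfg.writable:
        flags.append("--read-only")
    return flags


# Opting in is an explicit, visible change to one field.
with_network = replace(SandboxConfig(), network=True)
```

Because the restrictive flags are added whenever a field is left at its default, forgetting an argument can only make the sandbox tighter, never looser.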

Async Support

If you're building an async agent (most are), there's AsyncRoche:

from roche_sandbox import AsyncRoche

async def run_code(code: str) -> str:
    roche = AsyncRoche()
    async with (await roche.create()) as sandbox:
        result = await sandbox.exec(["python3", "-c", code])
        return result.stdout

Using It With Agent Frameworks

Roche doesn't care what framework you use. Here's a quick example with OpenAI Agents:

from agents import Agent, Runner, function_tool
from roche_sandbox import Roche

roche = Roche()

@function_tool
def execute_python(code: str) -> str:
    """Execute Python code in a secure sandbox."""
    with roche.create() as sandbox:
        result = sandbox.exec(["python3", "-c", code])
        if result.exit_code != 0:
            return f"Error:\n{result.stderr}"
        return result.stdout

agent = Agent(
    name="Coder",
    instructions="You can run Python code using execute_python.",
    tools=[execute_python],
)

The same pattern works with LangChain, CrewAI, Anthropic tool use, AutoGen, and others. The sandbox logic stays the same regardless of the framework.
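
One way to keep that logic in a single place is a plain function factory that each framework's decorator or tool class then wraps. This is a sketch under assumptions: make_python_tool is a hypothetical helper, and it only relies on the create / exec / exit_code / stdout / stderr names used in the example above.

```python
def make_python_tool(create_sandbox):
    """Build a framework-neutral `execute_python(code) -> str` function.

    `create_sandbox` is any zero-argument callable returning a context
    manager whose value has an exec() method, e.g. Roche().create.
    """
    def execute_python(code: str) -> str:
        """Execute Python code in a sandbox and return its output."""
        with create_sandbox() as sandbox:
            result = sandbox.exec(["python3", "-c", code])
            if result.exit_code != 0:
                return f"Error:\n{result.stderr}"
            return result.stdout

    return execute_python
```

With this in place, execute_python = make_python_tool(Roche().create) gives a plain callable that can be handed to whichever tool-registration mechanism your framework uses.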

Swapping Providers

The whole point of Roche is that provider choice is a config change, not a rewrite:

# Docker (default)
roche = Roche(provider="docker")

# Firecracker microVMs (stronger isolation)
roche = Roche(provider="firecracker")

# WebAssembly (lightweight, fast)
roche = Roche(provider="wasm")

Your create / exec / destroy calls don't change. The security defaults adjust per provider, but dangerous capabilities stay off unless you opt in.
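
Since the provider is just a string, it can come from deployment configuration. The sketch below reads it from an environment variable; the variable name SANDBOX_PROVIDER and the provider_from_env helper are assumptions for illustration, and the accepted values are the three provider names shown above.

```python
import os
from typing import Mapping

# Provider names taken from the examples above.
SUPPORTED_PROVIDERS = ("docker", "firecracker", "wasm")


def provider_from_env(env: Mapping[str, str] = os.environ,
                      default: str = "docker") -> str:
    """Pick a provider name from the environment, validating it early.

    SANDBOX_PROVIDER is a hypothetical variable name, not one Roche defines.
    """
    name = env.get("SANDBOX_PROVIDER", default).strip().lower()
    if name not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unknown provider {name!r}; expected one of {SUPPORTED_PROVIDERS}"
        )
    return name
```

The agent code would then construct Roche(provider=provider_from_env()) once at startup, so moving from Docker in development to Firecracker in production is a deployment setting.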

Architecture (For the Curious)

The core is a Rust library (roche-core) built around a SandboxProvider trait. The layers stack like this:

Your Code (Python/TS/Go)
    |
    v
SDK (roche-sandbox on PyPI)
    |
    v
CLI subprocess or gRPC daemon (roched)
    |
    v
roche-core (Rust)
    |
    v
Docker / Firecracker / WASM

The SDKs communicate with the Rust core either by shelling out to the roche CLI (zero setup) or through a gRPC daemon (roched) that adds sandbox pooling for faster acquisition.
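
To give a feel for what a provider abstraction buys you, here is a Python analogue of the idea. It is an illustration only: the real SandboxProvider is a Rust trait whose exact methods are not shown in this post, so the method signatures, ExecResult, InMemoryProvider, and run_once below are all assumptions based on the create / exec / destroy lifecycle.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ExecResult:
    exit_code: int
    stdout: str
    stderr: str


class SandboxProvider(Protocol):
    """Hypothetical Python analogue of a provider interface."""
    def create(self, image: str) -> str: ...
    def exec(self, sandbox_id: str, command: list[str]) -> ExecResult: ...
    def destroy(self, sandbox_id: str) -> None: ...


class InMemoryProvider:
    """Toy backend that echoes commands; stands in for Docker and friends."""

    def __init__(self):
        self.live: set[str] = set()
        self._counter = 0

    def create(self, image: str) -> str:
        self._counter += 1
        sandbox_id = f"sb-{self._counter}"
        self.live.add(sandbox_id)
        return sandbox_id

    def exec(self, sandbox_id: str, command: list[str]) -> ExecResult:
        if sandbox_id not in self.live:
            raise KeyError(sandbox_id)
        return ExecResult(0, " ".join(command) + "\n", "")

    def destroy(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)


def run_once(provider: SandboxProvider, image: str, command: list[str]) -> ExecResult:
    """Backend-agnostic lifecycle: create, exec, always destroy."""
    sandbox_id = provider.create(image)
    try:
        return provider.exec(sandbox_id, command)
    finally:
        provider.destroy(sandbox_id)
```

run_once never mentions a specific backend, which is the property that lets the layers above the core stay unchanged when the provider is swapped.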

You don't need to install Rust. pip install roche-sandbox is enough if you have Docker on your machine.

Getting Started

pip install roche-sandbox

Then run a command in a sandbox:

from roche_sandbox import Roche

with Roche().create() as sandbox:
    out = sandbox.exec(["python3", "-c", "import sys; print(sys.version)"])
    print(out.stdout)

Requirements: Python 3.10+ and Docker.
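
If you want to fail fast when a requirement is missing, a small preflight check is enough. This is a sketch using only the standard library; missing_requirements is a hypothetical helper, and it only checks that a docker executable is on PATH, not that the daemon is running.

```python
import shutil
import sys


def missing_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; empty means good to go.

    The parameters are injectable so the check can be tested without
    changing the real interpreter or PATH.
    """
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if which("docker") is None:
        problems.append("the docker executable was not found on PATH")
    return problems
```

Calling missing_requirements() at startup and printing any returned messages gives a clearer error than a failed sandbox creation later on.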

The whole thing is Apache-2.0. Contributions welcome.
