lelandfy

Posted on Mar 16

Stop Writing Docker Wrappers for Your AI Agent's Code Execution

#python #rust #docker #ai

Every AI agent that executes code needs a sandbox. And teams building one often end up writing the same thing: a Python wrapper around subprocess.run(["docker", "run", ...]) with a growing list of security flags they keep forgetting to set.

The Problem

Here's what a typical "sandbox" looks like in most agent codebases:

import subprocess
import json

result = subprocess.run(
    ["docker", "run", "--rm", "--network=none",
     "--memory=512m", "--cpus=1",
     "--read-only", "--security-opt=no-new-privileges",
     "--pids-limit=64",
     "python:3.12-slim",
     "python3", "-c", "print('hello')"],
    capture_output=True, text=True, timeout=300
)
print(result.stdout)

This works. Until it doesn't:

Someone forgets --network=none and your agent starts making HTTP requests.
The timeout handling is a mess when Docker itself hangs
Parsing stdout/stderr gets fragile fast
Cleanup on crash? Good luck
Want to swap Docker for Firecracker? Rewrite everything

What We Built

Roche is a sandbox orchestrator that replaces all of that with:

from roche_sandbox import Roche

with Roche().create(image="python:3.12-slim") as sandbox:
    result = sandbox.exec(["python3", "-c", "print('hello')"])
    print(result.stdout)

That's it. The sandbox is created with secure defaults, the command runs, and the sandbox is destroyed when the context manager exits. Even if your code throws an exception.

What "Secure Defaults" Actually Means

When you call Roche().create() with no arguments, you get:

Setting	Default	Why
Network	Disabled	LLM-generated code should not make HTTP calls
Filesystem	Read-only	No persistent writes, no dropping payloads
Timeout	300 seconds	No infinite loops eating your CPU
PID limit	64	No fork bombs
Privileges	no-new-privileges	No privilege escalation

Every one of these can be overridden when you need to:

sandbox = roche.create(
    image="python:3.12-slim",
    network=True,       # enable network
    writable=True,      # writable filesystem
    timeout_secs=600,   # longer timeout
    memory="1g",        # memory limit
    cpus=2.0,           # CPU limit
)

But you have to opt in. Dangerous capabilities are never on by default.

Async Support

If you're building an async agent (most are), there's AsyncRoche:

from roche_sandbox import AsyncRoche

async def run_code(code: str) -> str:
    roche = AsyncRoche()
    async with (await roche.create()) as sandbox:
        result = await sandbox.exec(["python3", "-c", code])
        return result.stdout

Using It With Agent Frameworks

Roche doesn't care what framework you use. Here's a quick example with OpenAI Agents:

from agents import Agent, Runner, function_tool
from roche_sandbox import Roche

roche = Roche()

@function_tool
def execute_python(code: str) -> str:
    """Execute Python code in a secure sandbox."""
    with roche.create() as sandbox:
        result = sandbox.exec(["python3", "-c", code])
        if result.exit_code != 0:
            return f"Error:\n{result.stderr}"
        return result.stdout

agent = Agent(
    name="Coder",
    instructions="You can run Python code using execute_python.",
    tools=[execute_python],
)

Same pattern works with LangChain, CrewAI, Anthropic tool use, AutoGen, etc. The sandbox logic stays the same regardless of the framework.

Swapping Providers

The whole point of Roche is that provider choice is a config change, not a rewrite:

# Docker (default)
roche = Roche(provider="docker")

# Firecracker microVMs (stronger isolation)
roche = Roche(provider="firecracker")

# WebAssembly (lightweight, fast)
roche = Roche(provider="wasm")

Your create / exec / destroy calls don't change. The security defaults adjust per provider but stay safe.

Architecture (For the Curious)

The core is a Rust library (roche-core) with a SandboxProvider trait:

Your Code (Python/TS/Go)
    |
    v
SDK (roche-sandbox on PyPI)
    |
    v
CLI subprocess or gRPC daemon (roched)
    |
    v
roche-core (Rust)
    |
    v
Docker / Firecracker / WASM

The SDKs communicate with the Rust core either by shelling out to the roche CLI (zero setup) or through a gRPC daemon (roched) that adds sandbox pooling for faster acquisition.

You don't need to install Rust. pip install roche-sandbox is enough if you have Docker on your machine.

Getting Started

pip install roche-sandbox

from roche_sandbox import Roche

with Roche().create() as sandbox:
    out = sandbox.exec(["python3", "-c", "import sys; print(sys.version)"])
    print(out.stdout)

Requirements: Python 3.10+ and Docker.

Links

The whole thing is Apache-2.0. Contributions welcome.

Top comments (1)

klement Gunndu • Mar 17

The secure-by-default approach is smart. The number of agent codebases I've seen with network=none missing is alarming — forcing opt-in for dangerous capabilities is the right pattern.