DEV Community

AttractivePenguin

OpenSandbox: Build AI Agents That Actually Execute Code (Free & Open Source)

The Problem Every AI Developer Faces

You want to build an AI coding agent. You have the LLM. You have the prompt engineering. But here's the uncomfortable truth: your agent can't actually do anything.

It can write code. It can explain code. But execute it? Run tests? Interact with a browser? That requires infrastructure, and most options fall short in at least one way:

  • Expensive: Third-party sandbox APIs charge per-minute (e.g., E2B at $0.04/min)
  • Complex: Rolling your own Docker isolation takes weeks of devops work
  • Insecure: Running agent code on your own infrastructure risks system compromise

This is the "execution layer" problem in the AI agent stack—and Alibaba just open-sourced the solution.


What is OpenSandbox?

OpenSandbox is an open-source (Apache 2.0) framework that provides AI agents with secure, isolated environments for:

  • Code execution (Python, TypeScript, Java/Kotlin)
  • Browser automation (full Chrome/Playwright support)
  • GUI interaction (full VNC desktop access)
  • RL training (isolated reinforcement learning environments)

Think of it as giving your AI agent its own sandboxed computer to work in—completely isolated from your host system, with a unified API that works regardless of language or deployment scale.

Why This Matters

Current sandbox solutions tend to fall into one of three categories:

  1. Proprietary & expensive (E2B, CodeInterpreter)
  2. Complex to self-host (manual Docker + networking + security)
  3. Not designed for agents (local dev environments)

OpenSandbox solves all three. It's:

  • ✅ Open source (Apache 2.0)
  • ✅ Free to self-host
  • ✅ Built specifically for AI agents
  • ✅ Scales from laptop to Kubernetes cluster

Getting Started: Local Development

Let's build a working example. We'll create a simple agent that can:

  1. Receive a task
  2. Execute Python code in a sandbox
  3. Return results

Step 1: Install the Server

# Install OpenSandbox server
pip install opensandbox-server

# Initialize configuration
opensandbox-server init-config

# Start the server (runs on port 8000 by default)
opensandbox-server

The server starts a FastAPI instance that manages sandbox lifecycle via Docker.
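Before pointing the SDK at the server, it can help to confirm that something is actually listening on the port. The following is a minimal sketch using only the standard library. The helper name `server_is_listening` is hypothetical, and the host and port assume the default mentioned above (localhost, port 8000). It only checks that a TCP connection opens, not that the OpenSandbox API itself is healthy.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_listening())
```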

Step 2: Install the Python SDK

pip install opensandbox

Step 3: Create Your First Sandbox

from opensandbox import SandboxClient

# Connect to your local server
client = SandboxClient("http://localhost:8000")

# Create a coding agent sandbox
sandbox = client.create_sandbox(
    sandbox_type="coding",  # Options: coding, gui, code-execution, rl-training
    runtime="docker"        # Options: docker (local), kubernetes (production)
)

print(f"Sandbox created: {sandbox.id}")
# Output: Sandbox created: sb-abc123xyz

Step 4: Execute Code

# Execute Python code in the sandbox
result = sandbox.execute_code("""
import numpy as np
import json

# Simulate some data processing
data = np.random.rand(100, 5)
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)

result = {
    "mean": mean.tolist(),
    "std": std.tolist(),
    "shape": data.shape
}

print(json.dumps(result))
""")

print(result.stdout)
# Example output (the data is random, so values vary between runs):
# {"mean": [0.52, 0.48, 0.51, 0.49, 0.50], "std": [0.29, 0.28, 0.30, 0.29, 0.29], "shape": [100, 5]}
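Because the sandboxed script prints JSON, the calling side can turn `result.stdout` back into Python objects. Here is a minimal sketch, assuming stdout is a plain string whose last non-empty line is a single-line JSON document, as in the script above. The helper name `parse_json_stdout` is hypothetical and not part of the SDK, and `sample_stdout` is a hand-written stand-in for real sandbox output.

```python
import json
from typing import Any


def parse_json_stdout(stdout: str) -> Any:
    """Parse the last non-empty line of captured stdout as JSON.

    Raises ValueError if there is no output or the line is not valid JSON.
    """
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("sandbox produced no output")
    # json.JSONDecodeError is a subclass of ValueError
    return json.loads(lines[-1])


# Stand-in for result.stdout from the sandbox call above
sample_stdout = 'warming up...\n{"mean": [0.5, 0.5], "shape": [100, 5]}\n'
stats = parse_json_stdout(sample_stdout)
print(stats["shape"])  # [100, 5]
```

Reading only the last line tolerates stray log output printed earlier in the script, but it would not handle pretty-printed (multi-line) JSON.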

Step 5: Real-World Example - Web Scraping + Analysis

This is where it gets interesting. Your agent can combine tools:

# Create sandbox with browser automation
sandbox = client.create_sandbox(
    sandbox_type="coding",
    enable_browser=True  # Injects Playwright/Chrome
)

# Task: Fetch data, analyze it, return results
task = """
import asyncio
from playwright.async_api import async_playwright
import json

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Navigate to a page
        await page.goto("https://example.com")

        # Get content
        title = await page.title()
        content = await page.content()

        await browser.close()

        return {"title": title, "content_length": len(content)}

result = asyncio.run(main())
print(json.dumps(result, indent=2))
"""

result = sandbox.execute_code(task)
print(result.stdout)

All of this happens in isolation. The agent can scrape, process, analyze—without touching your system.


Connecting to AI Frameworks

OpenSandbox integrates with popular agent frameworks:

LangGraph Integration

from langgraph.prebuilt import create_react_agent
from opensandbox import SandboxClient

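# Wrap OpenSandbox as a tool the agent can call.
# The docstring doubles as the tool description shown to the model.
def code_executor(code: str) -> str:
    """Execute Python code in an isolated sandbox and return its stdout."""
    client = SandboxClient("http://localhost:8000")
    sandbox = client.create_sandbox(sandbox_type="coding")
    result = sandbox.execute_code(code)
    return result.stdout

# Create agent with code execution tool.
# `llm` is any LangChain chat model instance, created elsewhere.
agent = create_react_agent(
    llm,
    [code_executor]
)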

# Now your agent can write AND execute code
result = agent.invoke({
    "messages": [{"role": "user", "content": "Calculate fibonacci(20) and return the result"}]
})
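The `code_executor` tool above creates a new sandbox on every call, so each invocation pays the startup cost. One possible refinement is to create the sandbox lazily and reuse it. The sketch below is an illustration under stated assumptions: `ReusableExecutor`, `FakeSandbox`, and `FakeResult` are hypothetical names, and the only SDK behavior it relies on is what this post shows, namely an object with `execute_code(code)` that returns something with a `.stdout` attribute.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ReusableExecutor:
    """Create a sandbox on first use and reuse it for later calls."""

    def __init__(self, sandbox_factory: Callable[[], Any], max_chars: int = 4000) -> None:
        self._factory = sandbox_factory
        self._sandbox: Optional[Any] = None
        self._max_chars = max_chars

    def __call__(self, code: str) -> str:
        if self._sandbox is None:
            self._sandbox = self._factory()
        result = self._sandbox.execute_code(code)
        # Truncate so a chatty script cannot flood the model's context window
        return result.stdout[: self._max_chars]


# Fake stand-ins so the sketch runs without a server
@dataclass
class FakeResult:
    stdout: str


class FakeSandbox:
    created = 0  # counts how many sandboxes were constructed

    def __init__(self) -> None:
        FakeSandbox.created += 1

    def execute_code(self, code: str) -> FakeResult:
        return FakeResult(stdout=f"ran {len(code)} chars")


executor = ReusableExecutor(FakeSandbox)
print(executor("print(1)"))
print(executor("print(22)"))
print("sandboxes created:", FakeSandbox.created)
```

With the client shown in this post, the factory would be something like `lambda: client.create_sandbox(sandbox_type="coding")`. Reuse means state persists between tool calls, which is convenient for multi-step tasks but may be undesirable when calls should not see each other's files or variables.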

Claude Code / Gemini CLI Integration

OpenSandbox provides native compatibility with:

  • Claude Code
  • Gemini CLI
  • OpenAI Codex
  • Google ADK

This means you can extend existing coding agents with secure execution environments without reinventing the wheel.


Deploying to Production (Kubernetes)

When you're ready to scale from dev to production:

# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opensandbox-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: opensandbox
  template:
    metadata:
      labels:
        app: opensandbox  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: server
        image: alibaba/opensandbox-server:latest
        ports:
        - containerPort: 8000
        resources:
          limits:
            memory: "2Gi"
            cpu: "1000m"

The same API works whether you're running locally with Docker or at scale on Kubernetes. No code changes needed.
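One way to keep application code identical across environments is to read the server address from configuration instead of hard-coding it. Below is a minimal sketch, assuming a hypothetical environment variable named `OPENSANDBOX_URL`. That name and the helper `resolve_server_url` are illustrations, not something the project defines.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

DEFAULT_URL = "http://localhost:8000"  # local server address used earlier in this post


def resolve_server_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the sandbox server URL from the environment, or the local default."""
    env = os.environ if env is None else env
    # OPENSANDBOX_URL is a hypothetical variable name chosen for this sketch
    url = env.get("OPENSANDBOX_URL", DEFAULT_URL).rstrip("/")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid http(s) URL: {url!r}")
    return url


print(resolve_server_url({}))
print(resolve_server_url({"OPENSANDBOX_URL": "https://sandbox.internal.example/"}))
```

The returned string can then be passed wherever the earlier examples use the literal `"http://localhost:8000"`.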


Limitations & Considerations

OpenSandbox is powerful, but understand the trade-offs:

Aspect          | Consideration
----------------|----------------------------------------------------------
Self-hosting    | You manage infrastructure (vs. managed E2B)
Cold starts     | New sandboxes take seconds to initialize
Resource limits | Configure CPU/memory caps per sandbox
Security        | Still need to sanitize prompts; agents can write any code
Scope           | Python/TypeScript/Java only (Go/C# coming)
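To see what cold starts cost in a given setup, sandbox creation can be timed directly. This is a minimal sketch using only the standard library. `timed` is a hypothetical helper, and `fake_create_sandbox` is a stand-in that just sleeps, so the number it prints says nothing about real OpenSandbox startup times.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in for sandbox creation; replace with the real call to measure it
def fake_create_sandbox() -> str:
    time.sleep(0.05)
    return "sb-demo"


sandbox, seconds = timed(fake_create_sandbox)
print(f"created {sandbox} in {seconds:.2f}s")
```

Measuring a handful of creations, rather than one, gives a better picture, since the first start on a machine often includes one-off costs such as pulling images.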

When to Use What?

Use Case                                 | Recommended Solution
-----------------------------------------|--------------------------------------------
Prototyping AI agents locally            | OpenSandbox (free)
Production AI app needing code execution | E2B or OpenSandbox on Kubernetes
Simple script execution                  | OpenAI Function Calling (no sandbox needed)
Full browser automation                  | OpenSandbox + Playwright

TL;DR

  • OpenSandbox is Alibaba's open-source solution for AI agent code execution
  • Provides secure, isolated sandboxes via unified API
  • Works locally (Docker) or at scale (Kubernetes)
  • Integrates with LangGraph, Claude Code, Gemini CLI, and more
  • Free to self-host (Apache 2.0 license)

If you're building AI coding agents and currently struggling with execution infrastructure, this is worth a serious look.

GitHub: alibaba/OpenSandbox
Docs: open-sandbox.ai


Have you tried OpenSandbox? Found interesting use cases? Drop a comment below.
