AttractivePenguin

Posted on Mar 11

OpenSandbox: Build AI Agents That Actually Execute Code (Free & Open Source)

#ai #opensource #agents #llm

The Problem Every AI Developer Faces

You want to build an AI coding agent. You have the LLM. You have the prompt engineering. But here's the uncomfortable truth: your agent can't actually do anything.

It can write code. It can explain code. But execute? Run tests? Interact with a browser? That requires infrastructure. And most options are either:

Expensive: Hosted sandbox APIs (e.g., E2B's managed cloud) bill for sandbox runtime, which adds up quickly for long-running agents
Complex: Rolling your own Docker isolation takes weeks of devops work
Insecure: Running agent code on your own infrastructure risks system compromise
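To make the "rolling your own" option concrete, here is a minimal sketch of the container flags a hand-rolled Python sandbox needs just for basic isolation. The function name, base image, and limit values are illustrative assumptions. It only builds a `docker run` argument list from standard Docker CLI flags and starts nothing.

```python
from typing import List


def build_docker_sandbox_cmd(
    code: str,
    image: str = "python:3.12-slim",  # assumed base image
    memory: str = "512m",
    cpus: str = "1.0",
    pids_limit: int = 128,
) -> List[str]:
    """Return a `docker run` argv that runs `code` in a locked-down, throwaway container."""
    return [
        "docker", "run",
        "--rm",                                 # delete the container when it exits
        "--network", "none",                    # no network access at all
        "--memory", memory,                     # hard memory cap
        "--cpus", cpus,                         # CPU quota
        "--pids-limit", str(pids_limit),        # limit process count (fork bombs)
        "--read-only",                          # read-only root filesystem...
        "--tmpfs", "/tmp",                      # ...plus a writable scratch directory
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid privilege escalation
        "--user", "65534:65534",                # run as 'nobody' instead of root
        image,
        "python", "-c", code,
    ]
```

`shlex.join(build_docker_sandbox_cmd("print(1 + 1)"))` gives a copy-pasteable command line. That still leaves timeouts, output capture, cleanup of stuck containers, file transfer, and scheduling across hosts, which is the operational work this bullet refers to.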

This is the "execution layer" problem in the AI agent stack—and Alibaba just open-sourced the solution.

What is OpenSandbox?

OpenSandbox is an open-source (Apache 2.0) framework that provides AI agents with secure, isolated environments for:

Code execution (Python, TypeScript, Java/Kotlin)
Browser automation (full Chrome/Playwright support)
GUI interaction (full VNC desktop access)
RL training (isolated reinforcement learning environments)

Think of it as giving your AI agent its own sandboxed computer to work in—completely isolated from your host system, with a unified API that works regardless of language or deployment scale.

Why This Matters

Current sandbox solutions are either:

Hosted & usage-billed (E2B's managed cloud, hosted code interpreters)
Complex to self-host (manual Docker + networking + security)
Not designed for agents (local dev environments)

OpenSandbox solves all three. It's:

✅ Open source (Apache 2.0)
✅ Free to self-host
✅ Built specifically for AI agents
✅ Scales from laptop to Kubernetes cluster

Getting Started: Local Development

Let's build a working example. We'll create a simple agent that can:

Receive a task
Execute Python code in a sandbox
Return results

Step 1: Install the Server

# Install OpenSandbox server
pip install opensandbox-server

# Initialize configuration
opensandbox-server init-config

# Start the server (runs on port 8000 by default)
opensandbox-server

The server is a FastAPI application that manages the sandbox lifecycle (create, run, tear down) through Docker, so Docker must be installed and running on the host.
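Because the server runs as a separate process, scripts (for example in CI) should wait until it is accepting connections before creating sandboxes. A minimal stdlib-only sketch; `wait_for_port` is a hypothetical helper, and its defaults assume the `localhost:8000` address used throughout this article:

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000,
                  timeout_s: float = 30.0, interval_s: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful TCP connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or unreachable: the server is not up yet, retry shortly.
            time.sleep(interval_s)
    return False
```

Call `wait_for_port()` after starting `opensandbox-server` in the background and fail fast if it returns `False`. An open port shows the server is listening, not that it is fully initialized; polling an HTTP health endpoint, if the server provides one, is a stricter check.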

Step 2: Install the Python SDK

pip install opensandbox

Step 3: Create Your First Sandbox

from opensandbox import SandboxClient

# Connect to your local server
client = SandboxClient("http://localhost:8000")

# Create a coding agent sandbox
sandbox = client.create_sandbox(
    sandbox_type="coding",  # Options: coding, gui, code-execution, rl-training
    runtime="docker"        # Options: docker (local), kubernetes (production)
)

print(f"Sandbox created: {sandbox.id}")
# Output: Sandbox created: sb-abc123xyz

Step 4: Execute Code

# Execute Python code in the sandbox
result = sandbox.execute_code("""
import numpy as np
import json

# Simulate some data processing
data = np.random.rand(100, 5)
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)

result = {
    "mean": mean.tolist(),
    "std": std.tolist(),
    "shape": data.shape
}

print(json.dumps(result))
""")

print(result.stdout)
# Example output (values are random and rounded here for readability):
# {"mean": [0.52, 0.48, 0.51, 0.49, 0.50], "std": [0.29, 0.28, 0.30, 0.29, 0.29], "shape": [100, 5]}
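`result.stdout` is plain text, so the host still has to parse it. The script above prints a single line of JSON (note that `json.dumps` turns the `(100, 5)` shape tuple into a list), but agent-written code often prints log lines or warnings too. A small hypothetical helper that takes the last line that parses as JSON is more robust than calling `json.loads` on the whole stream:

```python
import json
from typing import Any, Optional


def last_json_line(stdout: str) -> Optional[Any]:
    """Return the last line of `stdout` that is valid JSON, or None if no line is."""
    for line in reversed(stdout.strip().splitlines()):
        line = line.strip()
        if not line:
            continue
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON (e.g. a log line); keep scanning upward
    return None
```

For example, `last_json_line('Loading data...\n{"shape": [100, 5]}\n')` returns `{"shape": [100, 5]}`. One caveat of this design: a script that prints JSON `null` last is indistinguishable from "no JSON found".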

Step 5: Real-World Example - Web Scraping + Analysis

This is where it gets interesting. Your agent can combine tools:

# Create sandbox with browser automation
sandbox = client.create_sandbox(
    sandbox_type="coding",
    enable_browser=True  # Injects Playwright/Chrome
)

# Task: Fetch data, analyze it, return results
task = """
import asyncio
from playwright.async_api import async_playwright
import json

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Navigate to a page
        await page.goto("https://example.com")

        # Get content
        title = await page.title()
        content = await page.content()

        await browser.close()

        return {"title": title, "content_length": len(content)}

result = asyncio.run(main())
print(json.dumps(result, indent=2))
"""

result = sandbox.execute_code(task)
print(result.stdout)

All of this happens inside the sandbox: the agent can scrape, process, and analyze without touching your host system.
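Sandbox scripts like this are often assembled from strings at runtime, which is an easy place for quoting bugs: an unquoted or unescaped URL breaks the whole script, and a hostile value could inject code. The sketch below embeds the runtime value with `repr()`, which always yields a Python string literal that evaluates back to the exact original string, then checks the result with the built-in `compile()` before anything is sent to the sandbox. `TASK_TEMPLATE` and `build_fetch_task` are hypothetical names; the script body mirrors the Playwright example above.

```python
TASK_TEMPLATE = '''
import asyncio
import json
from playwright.async_api import async_playwright

URL = __URL__

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(URL)
        title = await page.title()
        content = await page.content()
        await browser.close()
        return {"title": title, "content_length": len(content)}

print(json.dumps(asyncio.run(main())))
'''


def build_fetch_task(url: str) -> str:
    """Return a sandbox script that fetches `url`, embedding it as a safe string literal."""
    # repr() escapes quotes and backslashes, so the URL can never end the literal early.
    script = TASK_TEMPLATE.replace("__URL__", repr(url))
    # Syntax check only: compile() does not run the script or import Playwright.
    compile(script, "<sandbox-task>", "exec")
    return script
```

The resulting string can then be passed to `sandbox.execute_code(...)` as in the example above. Using `str.replace` on a placeholder instead of an f-string avoids having to escape every `{` and `}` in the template.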

Connecting to AI Frameworks

OpenSandbox integrates with popular agent frameworks:

LangGraph Integration

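from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from opensandbox import SandboxClient

# Any chat model that supports tool calling works here
llm = init_chat_model("openai:gpt-4o-mini")

# Wrap OpenSandbox as a LangChain tool. The docstring becomes the
# tool description the model sees, so don't omit it.
def code_executor(code: str) -> str:
    """Execute Python code in an isolated sandbox and return its stdout."""
    client = SandboxClient("http://localhost:8000")
    # Note: this creates a new sandbox on every call; in practice, reuse one
    # sandbox per agent session or shut it down when the call finishes.
    sandbox = client.create_sandbox(sandbox_type="coding")
    result = sandbox.execute_code(code)
    return result.stdout

# Create agent with code execution tool
agent = create_react_agent(llm, [code_executor])

# Now your agent can write AND execute code
result = agent.invoke({
    "messages": [{"role": "user", "content": "Calculate fibonacci(20) and return the result"}]
})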
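Returning only stdout from the tool wrapper hides tracebacks from the agent, so it cannot see or fix its own errors, and very long output can flood the model's context. A minimal formatter sketch, assuming the execution result also exposes stderr and an exit code (the article only shows `stdout`, so check the SDK for the actual attribute names); the function name and the 4,000-character cap are illustrative:

```python
def format_tool_result(stdout: str, stderr: str = "", exit_code: int = 0,
                       max_chars: int = 4000) -> str:
    """Combine execution output into one string an LLM can act on, truncating long output."""
    parts = []
    if stdout:
        parts.append(f"STDOUT:\n{stdout}")
    if stderr:
        parts.append(f"STDERR:\n{stderr}")
    if exit_code != 0:
        parts.append(f"EXIT CODE: {exit_code}")
    text = "\n\n".join(parts) or "(no output)"
    if len(text) > max_chars:
        # Keep both head and tail: the actual error usually sits at the end of a traceback.
        half = max_chars // 2
        omitted = len(text) - 2 * half
        text = f"{text[:half]}\n... [{omitted} characters truncated] ...\n{text[-half:]}"
    return text
```

Inside `code_executor`, you would return `format_tool_result(...)` built from whatever output fields the result object provides, instead of returning `result.stdout` directly.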

Claude Code / Gemini CLI Integration

OpenSandbox provides native compatibility with:

Claude Code
Gemini CLI
OpenAI Codex
Google ADK

This means you can extend existing coding agents with secure execution environments without reinventing the wheel.

Deploying to Production (Kubernetes)

When you're ready to scale from dev to production:

# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opensandbox-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: opensandbox
  template:
    metadata:
      labels:
        app: opensandbox   # must match spec.selector.matchLabels
    spec:
      containers:
      - name: server
        image: alibaba/opensandbox-server:latest
        ports:
        - containerPort: 8000
        resources:
          limits:
            memory: "2Gi"
            cpu: "1000m"

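The client-side API is the same whether you run locally with Docker or at scale on Kubernetes, so agent code changes little beyond the server URL (and the `runtime` argument shown in Step 3). Note that this manifest only deploys the API server: you still need a Service to expose it, and running sandboxes on the cluster may require additional permissions described in the project's Kubernetes docs.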
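For capacity planning, it helps to turn the manifest's limits into totals. Kubernetes writes CPU in millicores (`1000m` = 1 CPU) and memory with binary suffixes (`Gi` = 2^30 bytes). The sketch below handles only integer quantities with the suffixes `m`, `Ki`, `Mi`, and `Gi`; the helper names are hypothetical. These totals cover only the API server pods, since the sandboxes the server launches need capacity of their own.

```python
MEM_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(q: str) -> float:
    """Convert a Kubernetes CPU quantity such as '1000m' or '2' to cores."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000  # millicores
    return float(q)


def parse_memory(q: str) -> int:
    """Convert a Kubernetes memory quantity such as '2Gi' to bytes (binary suffixes only)."""
    for suffix, factor in MEM_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * factor
    return int(q)  # plain byte count


# Values from the Deployment above: 3 replicas, each limited to 1000m CPU and 2Gi memory.
replicas = 3
total_cpu = replicas * parse_cpu("1000m")
total_mem_gib = replicas * parse_memory("2Gi") / 2**30
print(f"{total_cpu} CPUs, {total_mem_gib} GiB")  # prints "3.0 CPUs, 6.0 GiB"
```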

Limitations & Considerations

OpenSandbox is powerful, but understand the trade-offs:

Aspect	Consideration
Self-hosting	You manage infrastructure (vs. managed E2B)
Cold starts	New sandboxes take seconds to initialize
Resource limits	Configure CPU/memory caps per sandbox
Security	Agents can still write any code: sanitize prompts and limit what each sandbox can reach (network, credentials)
Scope	Python/TypeScript/Java only (Go/C# coming)
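If cold starts hurt latency, a common general-purpose mitigation (not an OpenSandbox feature described in this article) is to keep a small pool of pre-created sandboxes and hand them out on demand. A minimal sketch: `create` stands in for whatever call creates a sandbox in your SDK, and the class name and pool size are assumptions to tune.

```python
import threading
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Generic, List, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Keep `size` resources warming up in the background so callers rarely wait."""

    def __init__(self, create: Callable[[], T], size: int = 2) -> None:
        self._create = create
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=size)
        # Start `size` creations right away; each Future resolves to a ready resource.
        self._pending: Deque["Future[T]"] = deque(
            self._executor.submit(create) for _ in range(size)
        )

    def acquire(self) -> T:
        """Hand out the oldest warm resource and start creating its replacement."""
        with self._lock:
            future = self._pending.popleft()
            self._pending.append(self._executor.submit(self._create))
        # Returns immediately if warm-up already finished; otherwise waits for it.
        return future.result()

    def close(self) -> List[T]:
        """Stop refilling and return warm resources that were never handed out,
        so the caller can shut them down (e.g. kill idle sandboxes)."""
        with self._lock:
            leftovers = [f.result() for f in self._pending]
            self._pending.clear()
        self._executor.shutdown(wait=True)
        return leftovers
```

The trade-off is cost: `size` sandboxes sit idle while warm, and `close()` hands back the unused ones so they can be shut down rather than leaked.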

When to Use What?

Use Case	Recommended Solution
Prototyping AI agents locally	OpenSandbox (free)
Production AI app needing code execution	E2B or OpenSandbox on Kubernetes
Calling a fixed set of trusted functions	Function/tool calling (the model only requests calls to your own code, so no sandbox is needed)
Full browser automation	OpenSandbox + Playwright

TL;DR

OpenSandbox is Alibaba's open-source solution for AI agent code execution
Provides secure, isolated sandboxes via unified API
Works locally (Docker) or at scale (Kubernetes)
Integrates with LangGraph, Claude Code, Gemini CLI, and more
Free to self-host (Apache 2.0 license)

If you're building AI coding agents and currently struggling with execution infrastructure, this is worth a serious look.

GitHub: alibaba/OpenSandbox
Docs: open-sandbox.ai

Have you tried OpenSandbox? Found interesting use cases? Drop a comment below.
