Every team building a code-executing AI agent eventually runs into the same wall: where does the model-generated code actually run? Running it on your host server is a liability. Running it inside Docker is better but not enough for untrusted LLM output at scale. E2B was built specifically to solve that problem, and in 2026 it became the de facto sandbox layer for production agent systems — including as a native provider in OpenAI's Agents SDK.
This guide covers what E2B is, how it works, how to integrate it with Python and the major LLM providers, and when it makes sense compared to alternatives like Modal and Daytona.
Effloow Lab note: Effloow Lab installed e2b-code-interpreter==2.6.2 and e2b==2.20.3, inspected the SDK surface, and confirmed the API design. Live sandbox execution requires an E2B API key; code examples below are sourced from official E2B documentation and the e2b-cookbook repository.
Why Sandboxed Code Execution Matters
When an AI agent generates and executes code, several things can go wrong that simply don't happen with human-written code:
- The model generates import subprocess; subprocess.run(["rm", "-rf", "/"]) based on a prompt-injection attack
- A data analysis agent leaks API keys from environment variables into output files
- A code-generating chatbot causes a denial of service by spawning infinite threads
- Cross-tenant attacks expose one user's data to another in a multi-tenant deployment
Traditional containerization mitigates some of these, but containers share the host kernel — a kernel exploit in model-generated code could affect other workloads. The security gap between "containerized" and "isolated" matters when the code author is an LLM responding to arbitrary user input.
E2B solves this with Firecracker microVMs: each sandbox runs inside its own lightweight virtual machine with a separate kernel, hardware-level isolation, and no shared attack surface between tenants.
What E2B Is
E2B (short for "Environment to Build") is an open-source cloud infrastructure platform for running AI-generated code in isolated sandboxes. Each sandbox is a microVM that:
- Initializes in approximately 150ms
- Runs a Jupyter kernel for interactive, stateful code execution
- Supports Python, JavaScript/TypeScript, and other languages
- Provides filesystem access, internet access, and package installation
- Can run for up to 24 hours per session
The platform is LLM-agnostic: you bring your own model (Claude, GPT, Gemini, Llama) and use E2B for the execution layer. It ships Python and TypeScript/JavaScript SDKs.
Key numbers as of May 2026:
- ~8,900 GitHub stars on the main E2B repository
- Apache 2.0 license (open-source core)
- Native sandbox provider in OpenAI Agents SDK
- Used by roughly half of the Fortune 500 for agent workloads
SDK Installation and Setup
Install the Python SDK:
pip install e2b-code-interpreter
This installs e2b-code-interpreter==2.6.2 and e2b==2.20.3 (as of May 2026). Get your API key from the E2B dashboard at e2b.dev/dashboard under the Team tab. Set it as an environment variable:
export E2B_API_KEY="e2b_..."
Or pass it directly in code:
from e2b_code_interpreter import Sandbox
with Sandbox.create(api_key="e2b_...") as sandbox:
    execution = sandbox.run_code("print('hello from sandbox')")
    print(execution.text)  # "hello from sandbox"
The with statement ensures the sandbox is killed when the block exits. For long-running agents, you can manage lifecycle manually:
sandbox = Sandbox.create()
# ... do work ...
sandbox.kill()
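A restart of your agent process doesn't have to mean losing a running sandbox either. Here's a minimal sketch of reattaching by ID, assuming the sandbox_id attribute and Sandbox.connect() behave as described in E2B's docs:

from e2b_code_interpreter import Sandbox

# Create a sandbox and stash its ID in your session store
sandbox = Sandbox.create()
sandbox_id = sandbox.sandbox_id

# ... later, possibly from a different worker process ...

# Reattach to the same running sandbox instead of paying another cold start
reconnected = Sandbox.connect(sandbox_id)
reconnected.run_code("x = 41")
print(reconnected.run_code("print(x + 1)").text)

# Kill it explicitly once the agent session is over
reconnected.kill()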
The run_code API
The run_code method is the core of E2B's code interpreter. Its full signature:
def run_code(
    self,
    code: str,
    language: Optional[str] = None,
    context: Optional[Context] = None,
    on_stdout: Optional[Callable[[OutputMessage], Any]] = None,
    on_stderr: Optional[Callable[[OutputMessage], Any]] = None,
    on_result: Optional[Callable[[Result], Any]] = None,
    on_error: Optional[Callable[[ExecutionError], Any]] = None,
    envs: Optional[Dict[str, str]] = None,
    timeout: Optional[float] = None,
    request_timeout: Optional[float] = None,
) -> Execution
The streaming callbacks — on_stdout, on_result, on_error — are particularly useful for agents that need to stream execution feedback to end users in real time rather than waiting for the full result.
A basic example with streaming output:
from e2b_code_interpreter import Sandbox
def handle_stdout(msg):
    print(f"[sandbox stdout] {msg.line}")

with Sandbox.create() as sandbox:
    # Stateful: variables persist within the same sandbox session
    sandbox.run_code("import pandas as pd\nimport numpy as np")

    execution = sandbox.run_code(
        """
df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
print(f"Shape: {df.shape}")
print(df.describe())
""",
        on_stdout=handle_stdout,
    )

    if execution.error:
        print(f"Error: {execution.error.name}: {execution.error.value}")
    else:
        print("Final text output:", execution.text)
Because the sandbox runs a Jupyter kernel, state is preserved between run_code calls within the same session. Imports, variables, and loaded data persist until the sandbox is killed or a new context is created.
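The envs and timeout parameters from the signature above are worth a quick look too. Here's a small sketch, assuming they behave as the signature suggests (the WEATHER_API_KEY name is purely illustrative):

from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    execution = sandbox.run_code(
        # Generated code reads the secret from the environment instead of
        # having it pasted into the prompt or the code string
        "import os; print('key starts with', os.environ['WEATHER_API_KEY'][:4])",
        envs={"WEATHER_API_KEY": "wk_live_example"},  # hypothetical secret
        timeout=30,  # give up if the execution runs longer than 30 seconds
    )
    print(execution.text)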
Integrating E2B with Claude (Anthropic)
E2B maintains an official cookbook with examples for all major providers. The Anthropic pattern uses Claude's tool use API to generate code, then executes it in an E2B sandbox:
import anthropic
from e2b_code_interpreter import Sandbox
client = anthropic.Anthropic()
CODE_INTERPRETER_TOOL = {
    "name": "execute_python",
    "description": "Execute Python code in a secure sandbox. Use for data analysis, calculations, and visualization.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "Python code to execute"
            }
        },
        "required": ["code"]
    }
}

def run_agent_with_sandbox(user_message: str):
    with Sandbox.create() as sandbox:
        messages = [{"role": "user", "content": user_message}]
        while True:
            response = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=4096,
                tools=[CODE_INTERPRETER_TOOL],
                messages=messages
            )
            if response.stop_reason == "end_turn":
                # Extract final text response
                for block in response.content:
                    if hasattr(block, "text"):
                        return block.text
                break
            # Handle tool use
            tool_results = []
            for block in response.content:
                if block.type == "tool_use" and block.name == "execute_python":
                    execution = sandbox.run_code(block.input["code"])
                    result = execution.text if not execution.error else f"Error: {execution.error.name}: {execution.error.value}"
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            # Append assistant turn and tool results
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

result = run_agent_with_sandbox(
    "Load the iris dataset from sklearn, compute correlation matrix, and summarize key findings."
)
print(result)
This pattern gives Claude persistent, stateful code execution that spans multiple tool calls. The sandbox outlives any individual API request and carries state for the entire conversation loop inside run_agent_with_sandbox.
Integrating E2B with OpenAI Agents SDK
As of April 2026, E2B is a native sandbox provider in OpenAI's Agents SDK. The integration lets you declare E2B as the execution environment directly in your agent configuration:
from agents import Agent, Runner
from agents.sandbox.e2b import E2BSandbox
agent = Agent(
    name="DataAnalystAgent",
    instructions="You are a data analysis agent. Use the code interpreter to analyze data.",
    sandbox=E2BSandbox(api_key="e2b_..."),
)
result = Runner.run_sync(agent, "Analyze the distribution of AAPL stock returns in 2025")
print(result.final_output)
The Agents SDK handles the full lifecycle: spinning up the sandbox, routing code execution requests to it, and tearing it down when the session ends.
Execution Isolation With Code Contexts
For agents that need to run code from multiple untrusted users in the same sandbox session (e.g., a multi-tenant chatbot), E2B provides isolated code contexts:
with Sandbox.create() as sandbox:
    # Create isolated contexts for two users
    ctx_alice = sandbox.create_code_context()
    ctx_bob = sandbox.create_code_context()

    # Alice's code doesn't affect Bob's namespace
    sandbox.run_code("secret = 'alice_secret_value'", context=ctx_alice)
    sandbox.run_code("secret = 'bob_secret_value'", context=ctx_bob)

    # Each context has its own namespace
    result_alice = sandbox.run_code("print(secret)", context=ctx_alice)
    result_bob = sandbox.run_code("print(secret)", context=ctx_bob)

    print(result_alice.text)  # "alice_secret_value"
    print(result_bob.text)  # "bob_secret_value"
This is useful when you want to avoid spinning up a new sandbox (and paying the cold start cost) for every user while still preventing state leakage.
Snapshot and Persistence
E2B sandboxes can be snapshotted and restored, which matters for agents with expensive setup steps (installing packages, loading large models, warming up caches):
with Sandbox.create() as sandbox:
    # Expensive one-time setup
    sandbox.run_code("pip install -q scikit-learn matplotlib seaborn pandas")
    sandbox.run_code("from sklearn.datasets import load_iris; iris = load_iris()")

    # Save current state
    snapshot = sandbox.create_snapshot()
    snapshot_id = snapshot.snapshot_id
    print(f"Snapshot saved: {snapshot_id}")

# Later — reconnect to the pre-warmed state
with Sandbox.create(snapshot=snapshot_id) as pre_warmed_sandbox:
    result = pre_warmed_sandbox.run_code("print(iris.target_names)")
    print(result.text)  # ['setosa' 'versicolor' 'virginica']
Snapshots eliminate repeated install costs for workloads like data analysis agents that always need the same libraries pre-loaded.
MCP Server Integration
E2B 2.x ships a built-in MCP (Model Context Protocol) server interface, allowing any MCP-compatible agent to use E2B sandboxes as tool servers:
from e2b_code_interpreter import Sandbox, McpServer
with Sandbox.create() as sandbox:
    # Get MCP-compatible URL and token for this sandbox
    mcp_url = sandbox.get_mcp_url()
    mcp_token = sandbox.get_mcp_token()
    print(f"MCP endpoint: {mcp_url}")
This means LangGraph agents using langchain-mcp-adapters, ADK agents, or any other MCP-capable framework can connect to E2B sandboxes using the standard MCP protocol without custom integration code.
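As a sketch of what that looks like from the LangChain side (the MultiServerMCPClient configuration keys and the Authorization header are assumptions to verify against the langchain-mcp-adapters and E2B MCP docs):

import asyncio
from e2b_code_interpreter import Sandbox
from langchain_mcp_adapters.client import MultiServerMCPClient

async def main():
    with Sandbox.create() as sandbox:
        # Point an MCP client at this sandbox's endpoint
        client = MultiServerMCPClient({
            "e2b_sandbox": {
                "url": sandbox.get_mcp_url(),
                "transport": "streamable_http",
                # Header name/format is an assumption; check E2B's MCP docs
                "headers": {"Authorization": f"Bearer {sandbox.get_mcp_token()}"},
            }
        })
        tools = await client.get_tools()  # standard MCP tool discovery
        print([tool.name for tool in tools])

asyncio.run(main())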
E2B also ships a GitHubMcpServer class for agents that need to interact with GitHub repositories inside a sandbox:
from e2b_code_interpreter import Sandbox, GitHubMcpServer
github_mcp = GitHubMcpServer(token="ghp_...")
with Sandbox.create(mcp_servers=[github_mcp]) as sandbox:
    # Agent can now read/write GitHub repos inside the sandbox
    result = sandbox.run_code("# list files from connected GitHub repo")
Async Support
For production FastAPI or async Python applications, use AsyncSandbox:
import asyncio
from e2b_code_interpreter import AsyncSandbox
async def analyze_data(user_code: str) -> str:
    async with AsyncSandbox.create() as sandbox:
        execution = await sandbox.run_code(user_code)
        if execution.error:
            return f"Error: {execution.error.name}"
        return execution.text
result = asyncio.run(analyze_data("print(sum(range(1000)))"))
print(result) # "499500"
The async API mirrors the sync one completely. Use AsyncSandbox whenever your application is already async (FastAPI, Starlette, etc.) to avoid blocking the event loop on sandbox operations.
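For example, a minimal FastAPI endpoint might look like this sketch, with one ephemeral sandbox per request (route and model names are arbitrary):

from fastapi import FastAPI
from pydantic import BaseModel
from e2b_code_interpreter import AsyncSandbox

app = FastAPI()

class CodeRequest(BaseModel):
    code: str

@app.post("/execute")
async def execute(req: CodeRequest):
    # Sandbox creation and execution are awaited, so the event loop
    # stays free to serve other requests in the meantime
    async with AsyncSandbox.create() as sandbox:
        execution = await sandbox.run_code(req.code)
        if execution.error:
            return {"ok": False, "error": execution.error.name}
        return {"ok": True, "output": execution.text}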
E2B vs. Alternatives
| Feature | E2B | Modal | Daytona | Local Docker |
|---|---|---|---|---|
| Isolation | Firecracker microVM | gVisor (syscall intercept) | Docker container | Docker container |
| Cold start | ~150ms | ~200ms | 27–90ms | 1–5s |
| Purpose-built for AI | Yes | General compute + AI | Dev workspace | No |
| Free tier | $100 credit, no CC | $30 credit/mo | Limited | Self-managed |
| Cost per hour (1 vCPU) | ~$0.05 | ~$0.06 | ~$0.067 | Your infra cost |
| Session max duration | 24h | Configurable | Configurable | Unlimited |
| OpenAI Agents SDK | Native provider | Native provider | Native provider | No |
| MCP server | Built-in | No | No | No |
| Best for | AI agent code execution | General inference + batch | Dev workspaces | Local dev only |
The key distinction: E2B is purpose-built for untrusted LLM code execution with Firecracker microVM isolation. Modal runs on gVisor and sits inside a broader compute platform (inference, training, batch). Daytona comes from a developer workspace perspective — it's fast but designed for persistent environments, not ephemeral code execution from untrusted sources.
For production agents executing LLM-generated code, E2B's microVM boundary is the right default. Switch to Modal if you need to run the same platform for inference and execution. Use Daytona if your agents need persistent dev environment sessions.
Pricing
E2B charges per second of running sandbox time:
- Hobby (free): $100 usage credit, no credit card required. 1-hour max sessions, 20 concurrent sandboxes.
- Pro ($150/month): 24-hour sessions, more concurrency, custom CPU/RAM.
- Enterprise (custom, $3,000/month minimum): SLAs, dedicated infrastructure, SSO.
At $0.000168/second for a 1 vCPU sandbox, the free $100 credit covers approximately 595,000 seconds of execution — plenty for building and testing an agent prototype.
A rough production estimate: 30-second average sandbox run, 100,000 runs/month ≈ $504/month in sandbox costs. Add your LLM API costs on top.
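The arithmetic is simple enough to sanity-check against your own workload; this snippet just reuses the per-second rate quoted above:

# Rough sandbox cost estimator using the per-second rate quoted above
RATE_PER_SECOND = 0.000168  # USD per second, 1 vCPU sandbox

free_credit = 100.0
print(f"Free credit covers ~{free_credit / RATE_PER_SECOND:,.0f} seconds")  # ~595,238

avg_run_seconds = 30
runs_per_month = 100_000
monthly_cost = avg_run_seconds * runs_per_month * RATE_PER_SECOND
print(f"Estimated monthly sandbox cost: ${monthly_cost:,.2f}")  # $504.00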
FAQ
Q: Does E2B work with local/self-hosted models like Ollama or vLLM?
Yes. E2B is completely LLM-agnostic — it handles the execution layer only. You point your agent at any model (Ollama, vLLM, Groq, self-hosted) and use E2B to run the code that model generates. The SDK doesn't care what produced the code string.
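A sketch of that split, assuming Ollama's OpenAI-compatible endpoint on its default port (model name and prompt are placeholders):

from openai import OpenAI
from e2b_code_interpreter import Sandbox

# Ollama exposes an OpenAI-compatible API locally
local_llm = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = local_llm.chat.completions.create(
    model="llama3.1",  # placeholder: any local model that can write Python
    messages=[{
        "role": "user",
        "content": "Write Python that prints the first 10 square numbers. Reply with code only."
    }],
)
generated_code = response.choices[0].message.content
# (A production agent would strip markdown fences from the reply first.)

# E2B only sees the code string; it doesn't care which model produced it
with Sandbox.create() as sandbox:
    execution = sandbox.run_code(generated_code)
    print(execution.text)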
Q: Can I install custom packages in an E2B sandbox?
Yes. E2B sandboxes have internet access by default, so you can run pip install or npm install inside a run_code call. For repeated use, create a snapshot after installation so you don't pay the install time on every run.
Q: How does E2B compare to running code in a subprocess locally?
Local subprocess execution is not sandboxed: model-generated code can read environment variables, make network requests, access the filesystem, and potentially execute system commands. E2B runs code in an isolated microVM with its own kernel — even a kernel exploit can't reach your host. For production agents handling external user input, local subprocess execution is not a safe option.
Q: What languages does E2B support?
The run_code API supports Python and JavaScript by default (the language parameter selects between them). R, Bash, and other languages can be run via subprocess inside the Python environment. Custom language support is possible with custom sandbox templates.
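A quick illustration of switching languages within one sandbox; the exact accepted values for the language parameter (e.g. "js") should be verified against the E2B docs:

from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    # Default language is Python
    sandbox.run_code("print('from python')")

    # JavaScript in the same sandbox; "js" as the value is an assumption
    sandbox.run_code("console.log('from javascript')", language="js")

    # Bash (and similar) can be reached via the Jupyter shell escape
    sandbox.run_code("!uname -a")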
Q: Is there an open-source self-hosted option?
E2B's core SDK is Apache 2.0 open source. The cloud infrastructure (Firecracker orchestration, API, billing) is not open-source. For self-hosted deployments at enterprise scale, contact E2B for the Enterprise plan. For local development, you can mock the sandbox execution locally, but production workloads with untrusted code should use the cloud service or equivalent hardware-level isolation.
Key Takeaways
E2B solves a real production problem that every code-executing agent eventually faces: where does LLM-generated code run safely? The answer isn't "in a container on your server" when the code author is an AI responding to arbitrary user prompts.
The SDK design reflects this clearly. run_code supports streaming callbacks for real-time output, code contexts for multi-tenant isolation, and snapshots for warm restarts. MCP server support means any modern agent framework can connect without custom glue code. The OpenAI Agents SDK integration in April 2026 confirmed E2B's position as the default sandbox provider for production Python agents.
Bottom Line
If your agent executes LLM-generated code, E2B is the right default for 2026: purpose-built for AI workloads, Firecracker microVM isolation, 150ms cold starts, and native support in OpenAI's Agents SDK. The free $100 credit is enough to fully evaluate it before committing to the $150/month Pro plan.