Tang Weigang

Posted on Jun 24

Before an Agent Runs Code in E2B, Define the Sandbox Contract First


E2B is easy to describe too quickly: give an AI agent a secure sandbox so it can run code.

That description is useful, but it skips the part I would care about before connecting it to Claude, Codex, Cursor, or a custom agent:

What is the agent allowed to touch inside the sandbox, and what must happen when the run fails?

This note is based on the independent Doramagic E2B manual:

https://doramagic.ai/en/projects/e2b/manual/

Project page:

https://doramagic.ai/en/projects/e2b/

It is not an official E2B document. I am using it as a pre-adoption checklist. One important caveat from the local Doramagic pack: its test log says real-host dogfooding and runtime installation have not been executed, so there is no evidence for either yet. This is therefore not a production-readiness claim. It is a boundary note for a first, safe evaluation.

1. Treat E2B as a sandbox contract, not a permission switch

The useful promise is clear: E2B gives developers isolated cloud sandboxes where AI agents and apps can execute code, run commands, process data, and manage files.

That is exactly why the first question should not be "can it run code?"

The first question should be:

which commands are allowed;
which paths can be read and written;
whether network access is allowed;
whether dependencies may be installed;
whether credentials are allowed at all;
how stdout, stderr, and non-zero exit codes are handled;
how artifacts are exported;
how the sandbox is torn down.

Without that contract, the upstream tool may be fine while the agent workflow is still unsafe.
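One way to make those questions concrete is to write the contract down as data and check every agent request against it. The sketch below is a minimal, default-deny illustration that assumes POSIX-style absolute paths; `SandboxContract`, `canRunCommand`, `canRead`, `canWrite`, and the example values are hypothetical names of mine, not part of the E2B SDK or the Doramagic manual.

```typescript
// Hypothetical contract type: it models the questions above as data.
// None of these names come from the E2B SDK.
interface SandboxContract {
  allowedCommands: string[]; // exact command lines the agent may run
  readPaths: string[]; // path prefixes the agent may read
  writePaths: string[]; // path prefixes the agent may write
  networkAllowed: boolean;
  dependencyInstallAllowed: boolean;
  credentialsAllowed: boolean;
  timeoutMs: number;
  exportPaths: string[]; // artifacts allowed to leave the sandbox
  destroyAfterRun: boolean;
}

// Resolve "." and ".." so "/tmp/run/../../etc" cannot pass a prefix check.
function normalizePosixPath(path: string): string {
  const parts: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return "/" + parts.join("/");
}

// True only for absolute paths equal to, or nested beneath, an allowed prefix.
function isUnder(path: string, prefixes: string[]): boolean {
  if (!path.startsWith("/")) return false; // reject relative paths outright
  const p = normalizePosixPath(path);
  return prefixes.some((prefix) => {
    const root = normalizePosixPath(prefix);
    return root === "/" || p === root || p.startsWith(root + "/");
  });
}

function canRunCommand(contract: SandboxContract, command: string): boolean {
  return contract.allowedCommands.includes(command.trim());
}

function canRead(contract: SandboxContract, path: string): boolean {
  return isUnder(path, contract.readPaths);
}

function canWrite(contract: SandboxContract, path: string): boolean {
  return isUnder(path, contract.writePaths);
}

// Example default-deny contract for a first evaluation (illustrative values).
const firstRunContract: SandboxContract = {
  allowedCommands: ["python3 /tmp/e2b-first-run/hello.py"],
  readPaths: ["/tmp/e2b-first-run/"],
  writePaths: ["/tmp/e2b-first-run/"],
  networkAllowed: false,
  dependencyInstallAllowed: false,
  credentialsAllowed: false,
  timeoutMs: 30000,
  exportPaths: ["/tmp/e2b-first-run/result.txt"],
  destroyAfterRun: true,
};
```

With this contract, `canRunCommand(firstRunContract, "pip install requests")` is false, and `canWrite(firstRunContract, "/tmp/e2b-first-run/../../etc/passwd")` is false because the path normalizes to `/etc/passwd`. Exact-match commands are deliberately strict; loosening them to patterns is a later, reviewed decision.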

2. The first run should be disposable and boring

The official install command recorded in the Doramagic pack is:

npm i e2b

I would not start by attaching that to a real project. My first E2B check would be intentionally small:

Create one disposable sandbox.
Use no secrets.
Use a tiny input.
Allow one command.
Write only under /tmp/e2b-first-run/.
Set a fixed timeout.
Export one small artifact.
Destroy the sandbox.
Record stdout, stderr, exit code, and cleanup status.

That may sound slow, but it answers the operational question that matters: can the agent complete one reversible run without widening the boundary?

If that first fixture is unstable, adding templates, network calls, persistent volumes, or MCP servers will only make debugging harder.
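The first-run checklist can be sketched as one function that records everything and tears the sandbox down no matter how the command ends. This is a minimal sketch under stated assumptions: `DisposableSandbox` is a hypothetical, synchronous stand-in I made up for illustration. The real E2B SDK is asynchronous and has its own method names, so treat this as the shape of the control flow, not as E2B code.

```typescript
// Hypothetical sandbox handle; NOT the E2B SDK API.
interface DisposableSandbox {
  run(command: string, timeoutMs: number): { stdout: string; stderr: string; exitCode: number };
  readFile(path: string): string;
  destroy(): void;
}

// Everything the checklist asks us to record for one run.
interface FirstRunRecord {
  stdout: string;
  stderr: string;
  exitCode: number | null; // null when the command never returned
  artifact: string | null; // exported only after a clean exit
  error: string | null;
  cleanedUp: boolean;
}

function firstRun(
  createSandbox: () => DisposableSandbox,
  command: string,
  artifactPath: string,
  timeoutMs: number,
): FirstRunRecord {
  const record: FirstRunRecord = {
    stdout: "", stderr: "", exitCode: null, artifact: null, error: null, cleanedUp: false,
  };
  const sandbox = createSandbox();
  try {
    const result = sandbox.run(command, timeoutMs);
    record.stdout = result.stdout;
    record.stderr = result.stderr;
    record.exitCode = result.exitCode;
    // Export the single artifact only when the command exited cleanly.
    if (result.exitCode === 0) {
      record.artifact = sandbox.readFile(artifactPath);
    }
  } catch (err) {
    // A thrown error (e.g. a timeout) is recorded, never silently retried.
    record.error = err instanceof Error ? err.message : String(err);
  } finally {
    // Teardown runs on success, on non-zero exit, and on exceptions.
    try {
      sandbox.destroy();
      record.cleanedUp = true;
    } catch (err) {
      record.error = record.error ?? `destroy failed: ${String(err)}`;
    }
  }
  return record;
}
```

The `finally` block is the point: a run that fails halfway still reports `cleanedUp`, so "was the sandbox destroyed?" is answered by the record rather than by memory.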

3. Command execution needs a failure contract

The Doramagic manual describes E2B command execution as returning stdout, stderr, an exit code, and optional error information. A non-zero exit can also surface as an exception.

That means the host agent needs rules for failure, not just success.

A practical first contract:

non-zero exit code stops the workflow;
stderr is summarized before being fed back to the model;
generated files are listed before export;
no command may read outside the allowed working directory;
no environment variable is printed unless explicitly approved;
timeout is treated as a failed run, not as a reason to retry with a larger scope.

The most common failure in agent tooling is not a dramatic exploit. It is a quiet expansion of scope: one more command, one more path, one more network call, one more unreviewed artifact.
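Part of that failure contract can be expressed as a pure decision function the host calls after every command. The sketch assumes a hypothetical `CommandOutcome` shape (the `timedOut` flag is my assumption, not a documented field); if the SDK throws on a non-zero exit, the host's error handler would build this object from the caught error. The secret-redaction pattern is a simple illustration, not a complete secret scanner.

```typescript
// Hypothetical outcome shape: stdout, stderr, exit code, plus an assumed timeout flag.
interface CommandOutcome {
  stdout: string;
  stderr: string;
  exitCode: number;
  timedOut: boolean;
}

type Decision =
  | { action: "continue" }
  | { action: "stop"; reason: string; stderrSummary: string };

// Masks values in assignments such as E2B_API_KEY=..., GITHUB_TOKEN=..., DB_PASSWORD=...
const SECRET_ASSIGNMENT = /\b([A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)=\S+/g;

// Summarize stderr before it goes back to the model: redact, drop blanks, keep the tail.
function summarizeStderr(stderr: string, maxLines = 5, maxChars = 400): string {
  const redacted = stderr.replace(SECRET_ASSIGNMENT, "$1=[REDACTED]");
  const lines = redacted.split("\n").filter((line) => line.trim() !== "");
  // The last lines usually carry the actual error.
  const tail = lines.slice(-maxLines).join("\n");
  return tail.length > maxChars ? tail.slice(tail.length - maxChars) : tail;
}

function decideNextStep(outcome: CommandOutcome): Decision {
  if (outcome.timedOut) {
    // A timeout is a failed run, not a reason to retry with a larger scope.
    return { action: "stop", reason: "timeout", stderrSummary: summarizeStderr(outcome.stderr) };
  }
  if (outcome.exitCode !== 0) {
    // Any non-zero exit stops the workflow.
    return {
      action: "stop",
      reason: `exit code ${outcome.exitCode}`,
      stderrSummary: summarizeStderr(outcome.stderr),
    };
  }
  return { action: "continue" };
}
```

Note that there is no "retry" branch at all. Adding one should be a deliberate contract change, not something the agent infers from a failure.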

4. Do not enable templates, network, volumes, and MCP at the same time

E2B has more surface area than "run a command." The manual covers SDKs, templates, filesystem operations, network behavior, ready commands, persistent volumes, and MCP server integration.

Those are useful features. They should not all be part of the first test.

My order would be:

start and destroy a default sandbox;
run one harmless command;
write one small file under a controlled path;
export one artifact;
add a template only after the basic path works;
add network only when the task requires it;
add a persistent volume or MCP only after cleanup and access boundaries are clear.

This keeps failures legible. If a first run already combines a template build, network access, file uploads, a ready command, and long-running processes, any failure will be hard to attribute to a single cause.
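The order above can also be enforced as a gate, so a plan cannot switch on a later capability while an earlier one is still unproven. A minimal sketch; the stage names are my own labels for the listed steps, not E2B feature identifiers.

```typescript
// The enablement order from the list above, as data (labels are hypothetical).
const STAGES = [
  "start-and-destroy",
  "one-harmless-command",
  "write-one-file",
  "export-one-artifact",
  "template",
  "network",
  "volume-or-mcp",
] as const;

type Stage = (typeof STAGES)[number];

// The single next stage to enable, or null once every stage has passed.
function nextStage(passed: ReadonlySet<Stage>): Stage | null {
  for (const stage of STAGES) {
    if (!passed.has(stage)) return stage;
  }
  return null;
}

// Reject any plan that reaches past the next unproven stage.
function planIsOrdered(enabled: readonly Stage[], passed: ReadonlySet<Stage>): boolean {
  const next = nextStage(passed);
  if (next === null) return true;
  const limit = STAGES.indexOf(next);
  return enabled.every((stage) => STAGES.indexOf(stage) <= limit);
}
```

For example, after only `start-and-destroy` and `one-harmless-command` have passed, a plan that enables `write-one-file` is accepted, while one that also enables `network` is rejected.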

5. Use the pitfall log as a test plan

The Doramagic pitfall log lists source-linked issues such as auto-paused processes, closed port errors, template creation failures, build polling timeouts, API key authorization problems, file/template confusion, and paused sandbox persistence questions.

I would not present those as confirmed current bugs. Some may have been fixed or changed in later versions.

I would turn them into checks:

after pause/resume, do process and file states match the expected contract?
when a port is unavailable, does the run fail with a bounded timeout?
when template build fails, does the agent stop instead of guessing a workaround?
when an API key is wrong, is the key kept out of logs?
after export, is the sandbox actually destroyed?

That is the value of the manual: it gives you a better first-run checklist than a feature list would.
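Those checks can be written as a table of named predicates over evidence collected from one evaluation run. This is a sketch under loud assumptions: `RunEvidence` and every field in it are hypothetical, and how the host gathers that evidence (logs, file listings, a liveness probe) is left open. The point is that each pitfall becomes a pass/fail line in a report.

```typescript
// Hypothetical evidence from one evaluation run; field names are illustrative.
interface RunEvidence {
  filesBeforePause: string[];
  filesAfterResume: string[];
  portWaitMs: number | null; // null if no port was needed
  portTimeoutMs: number;
  templateBuildFailed: boolean;
  agentRetriedAfterTemplateFailure: boolean;
  apiKeyUsed: string;
  logs: string[];
  sandboxAliveAfterExport: boolean;
}

interface Check {
  name: string;
  passes: (e: RunEvidence) => boolean;
}

// Order-insensitive comparison of two file listings.
function sameMembers(a: string[], b: string[]): boolean {
  const x = a.slice().sort();
  const y = b.slice().sort();
  return x.length === y.length && x.every((v, i) => v === y[i]);
}

// One check per pitfall question, in the order listed above.
const CHECKS: Check[] = [
  { name: "files match after pause/resume", passes: (e) => sameMembers(e.filesBeforePause, e.filesAfterResume) },
  { name: "port wait stayed within its timeout", passes: (e) => e.portWaitMs === null || e.portWaitMs <= e.portTimeoutMs },
  { name: "agent stopped after template build failure", passes: (e) => !e.templateBuildFailed || !e.agentRetriedAfterTemplateFailure },
  { name: "API key never appears in logs", passes: (e) => e.apiKeyUsed === "" || e.logs.every((line) => !line.includes(e.apiKeyUsed)) },
  { name: "sandbox destroyed after export", passes: (e) => !e.sandboxAliveAfterExport },
];

// Names of every failed check; an empty array means the run passed the plan.
function failedChecks(evidence: RunEvidence): string[] {
  return CHECKS.filter((check) => !check.passes(evidence)).map((check) => check.name);
}
```

A run whose logs echo the API key and whose sandbox is still alive after export would report exactly those two failures, which is more useful than a single "it broke".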

6. A safe AI-host handoff

If I were handing E2B context to an AI coding host, I would not only paste the project link. I would add an instruction like this:

You may consider E2B as a candidate sandbox runtime.
Do not install anything yet.
First return a go/no-go review:
- whether this task actually needs code execution;
- whether it needs network;
- whether it needs filesystem access;
- whether credentials are involved;
- the smallest reversible verification fixture;
- timeout, cleanup, and artifact export plan;
- missing evidence that must stop the run.
Do not run install commands, read private local files, or use real API keys unless the command and rollback plan are approved.

That wording is not bureaucracy. It keeps the model from turning "sandbox exists" into "everything inside the sandbox is safe."
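If the host returns that review in a structured form, the go/no-go decision becomes mechanical: any stop reason means no run. The sketch below assumes hypothetical `GoNoGoReview` and `Approvals` shapes that mirror the instruction's bullet points; neither comes from E2B or Doramagic.

```typescript
// Hypothetical structured version of the review requested above.
interface GoNoGoReview {
  needsCodeExecution: boolean;
  needsNetwork: boolean;
  needsFilesystem: boolean; // informs the contract; not a stop condition by itself
  involvesCredentials: boolean;
  verificationFixture: string; // smallest reversible fixture
  timeoutMs: number | null;
  cleanupPlan: string;
  artifactExportPlan: string;
  missingEvidence: string[];
}

// What a human has explicitly approved.
interface Approvals {
  installCommand: boolean;
  rollbackPlan: boolean;
  realCredentials: boolean;
  network: boolean;
}

// Every reason to stop; an empty array means "go".
function stopReasons(review: GoNoGoReview, approvals: Approvals): string[] {
  const reasons: string[] = [];
  if (!review.needsCodeExecution) reasons.push("task does not need code execution");
  for (const item of review.missingEvidence) reasons.push(`missing evidence: ${item}`);
  if (review.verificationFixture.trim() === "") reasons.push("no reversible verification fixture");
  if (review.timeoutMs === null) reasons.push("no timeout");
  if (review.cleanupPlan.trim() === "") reasons.push("no cleanup plan");
  if (review.artifactExportPlan.trim() === "") reasons.push("no artifact export plan");
  if (!approvals.installCommand || !approvals.rollbackPlan) {
    reasons.push("install command and rollback plan not approved");
  }
  if (review.involvesCredentials && !approvals.realCredentials) {
    reasons.push("credentials involved without approval");
  }
  if (review.needsNetwork && !approvals.network) reasons.push("network needed without approval");
  return reasons;
}
```

Returning every reason, rather than stopping at the first, gives the human reviewer the full list of gaps in one pass.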

7. My working conclusion

E2B is interesting for agent workflows because it can give code execution a controlled runtime.

But the adoption question is not "can my agent run code now?"

The better question is:

Can my agent run one tiny task, with no secrets, a known command boundary, fixed timeout, inspectable output, and a cleanup path?

I would not move to real workloads until that answer is yes.

DEV Community