Running untrusted AI-generated code safely is the obvious hard problem.
But sometimes the problems that break an agent workflow look like boring infrastructure work.
v0.6 began as plumbing:
- Persistent sandbox registry
- Automatic cleanup with TTL
Necessary, but not particularly glamorous.
Then the tests started failing.
The output corruption problem
Every execution returned something like this:
WARNING: Running pip as the 'root' user can result in broken permissions...
[notice] A new release of pip is available: 25.0.1 -> 26.1.2
hello world
The actual program output was buried under dependency-installation noise.
For a human reading a terminal, that is annoying.
For an AI agent parsing execution output, it is broken.
The cause was straightforward: dependency installation and code execution were chained into a single Docker call, with stderr redirected into stdout.
Everything ended up in the same stream.
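A minimal local reproduction, which needs no Docker, shows why a merged stream is hard to parse. A child Python process writes installer-style noise to stderr and its real result to stdout, and the parent folds stderr into stdout the way the original single call did. The child script is an illustrative stand-in for pip plus user code, not Jhansi's actual command.

```python
import subprocess
import sys

# Stand-in for "install dependencies, then run user code" in one process:
# installer-style noise goes to stderr, the real result goes to stdout.
CHILD_SCRIPT = (
    "import sys\n"
    "print('WARNING: Running pip as the root user...', file=sys.stderr)\n"
    "print('hello world')\n"
)


def run_merged() -> str:
    """Run the child with stderr redirected into stdout, as the single call did."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # both streams end up in result.stdout
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    output = run_merged()
    print(repr(output))
    # The agent now has to guess which line is the program's answer.
    print(output.strip() == "hello world")  # False: the warning is mixed in
```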
The fix: two Docker calls, not one
We separated the operations.
Call 1: Install dependencies silently.
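import subprocess

# dependency_install_command: argv list for the install step in the sandbox.
# Everything pip prints (warnings, upgrade notices) is discarded.
subprocess.run(
    dependency_install_command,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)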
Call 2: Execute the user command and capture its output.
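result = subprocess.run(
    execution_command,  # argv list for the user's command
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # the program's own stderr stays with its output
    text=True,  # decode to str instead of bytes
)
output = result.stdout  # this, and only this, goes back to the agent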
It is a small change, but the principle matters:
When infrastructure is built for AI agents, clean output is part of the API contract.
Agents parse what you return. Installation logs, warnings and runtime output cannot be treated as one undifferentiated stream.
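One way to make that contract explicit is to return a small structured result instead of raw text. The sketch below is a hypothetical shape, not Jhansi's actual API: `execute`, `ExecutionResult` and its fields are assumptions. It keeps the two-call split and also addresses one cost of sending install output to DEVNULL, which is that a failed install disappears too. Here the install log is captured and surfaced only when the install exits non-zero.

```python
import json
import subprocess
import sys
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ExecutionResult:
    """Hypothetical response shape returned to the agent."""

    output: str                          # user program's stdout+stderr only
    exit_code: int                       # exit status of the failing step
    install_error: Optional[str] = None  # installer log, set only on failure


def execute(install_command: List[str], execution_command: List[str]) -> ExecutionResult:
    # Call 1: capture installer output instead of letting it reach the agent.
    install = subprocess.run(
        install_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if install.returncode != 0:
        # The noise only matters when something broke, so return it then.
        return ExecutionResult(output="", exit_code=install.returncode,
                               install_error=install.stdout)

    # Call 2: only the user's program contributes to `output`.
    run = subprocess.run(
        execution_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return ExecutionResult(output=run.stdout, exit_code=run.returncode)


if __name__ == "__main__":
    noisy_install = [sys.executable, "-c",
                     "import sys; print('[notice] pip upgrade available', file=sys.stderr)"]
    user_code = [sys.executable, "-c", "print('hello world')"]
    print(json.dumps(asdict(execute(noisy_install, user_code))))
    # {"output": "hello world\n", "exit_code": 0, "install_error": null}
```

An agent can branch on `install_error` and `exit_code` without scraping text, and the happy path carries nothing but the program's own output.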
Persistence: SQLite instead of an in-memory dictionary
The original sandbox registry was a Python dictionary.
Restart the service, and every sandbox record disappeared.
The containers might still exist, but Jhansi no longer knew about them. Any agent workflow expecting to reconnect after a service restart would fail.
We considered:
- JSON: simple, but vulnerable to partial writes and corruption during crashes
- Redis: native TTL and a good operational model, but another service for self-hosters to run
- SQLite: durable, transactional and already included with Python
We chose SQLite.
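The "transactional" property is what rules out the half-written record that a crashing JSON writer can leave behind. With Python's built-in `sqlite3`, using the connection as a context manager commits when the block finishes and rolls back if an exception escapes. The two-row scenario below is illustrative only, and `register_pair` is a hypothetical helper; SQLite's journal also covers a process dying mid-write, which is harder to show in a few lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sandboxes (id TEXT PRIMARY KEY, status TEXT NOT NULL)")


def register_pair(conn: sqlite3.Connection, fail_midway: bool) -> None:
    # `with conn:` commits if the block completes and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO sandboxes VALUES (?, ?)", ("sb-1", "running"))
        if fail_midway:
            raise RuntimeError("simulated failure between writes")
        conn.execute("INSERT INTO sandboxes VALUES (?, ?)", ("sb-2", "running"))


try:
    register_pair(conn, fail_midway=True)
except RuntimeError:
    pass

# The first INSERT was rolled back together with the failed block.
print(conn.execute("SELECT COUNT(*) FROM sandboxes").fetchone()[0])  # 0
```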
The schema is intentionally small:
CREATE TABLE IF NOT EXISTS sandboxes (
id TEXT PRIMARY KEY,
language TEXT NOT NULL,
container_id TEXT,
workspace_path TEXT,
status TEXT NOT NULL,
created_at TEXT NOT NULL,
expires_at TEXT NOT NULL
);
No ORM.
No migration framework.
Just SQLite doing what SQLite is good at.
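A minimal registry over this schema might look like the sketch below. `SandboxRegistry`, `register`, `get` and the `"running"` status value are hypothetical; only `update_expires_at` mirrors a name used in the TTL snippet in the next section. Timestamps are assumed to be stored as fixed-width UTC ISO-8601 strings, so plain string comparison in SQL matches chronological order. Closing and reopening the database file stands in for a service restart.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS sandboxes (
    id TEXT PRIMARY KEY,
    language TEXT NOT NULL,
    container_id TEXT,
    workspace_path TEXT,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    expires_at TEXT NOT NULL
)
"""


def to_text(ts: datetime) -> str:
    # Fixed-width UTC ISO-8601, so string order equals time order.
    return ts.astimezone(timezone.utc).isoformat(timespec="microseconds")


class SandboxRegistry:
    """Hypothetical thin wrapper over the sandboxes table: no ORM, plain SQL."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row  # rows addressable by column name
        with self.conn:
            self.conn.execute(SCHEMA)

    def register(self, sandbox_id: str, language: str, container_id: str,
                 workspace_path: str, ttl: timedelta) -> None:
        now = datetime.now(timezone.utc)
        with self.conn:
            self.conn.execute(
                "INSERT INTO sandboxes VALUES (?, ?, ?, ?, ?, ?, ?)",
                (sandbox_id, language, container_id, workspace_path,
                 "running", to_text(now), to_text(now + ttl)),
            )

    def get(self, sandbox_id: str) -> Optional[sqlite3.Row]:
        return self.conn.execute(
            "SELECT * FROM sandboxes WHERE id = ?", (sandbox_id,)
        ).fetchone()

    def update_expires_at(self, sandbox_id: str, expires_at: datetime) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE sandboxes SET expires_at = ? WHERE id = ?",
                (to_text(expires_at), sandbox_id),
            )

    def close(self) -> None:
        self.conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sandboxes.db")
    registry = SandboxRegistry(path)
    registry.register("sb-123", "python", "c0ffee", "/tmp/sb-123", timedelta(hours=1))
    registry.close()  # simulate a service restart
    registry = SandboxRegistry(path)
    print(registry.get("sb-123")["status"])  # running
    registry.close()
```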
TTL: last active, not creation time
Each sandbox receives an expires_at value, initially one hour after creation.
The important decision is that every execution resets the clock:
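from datetime import datetime, timedelta, timezone

# After each execution, push the deadline TTL_SECONDS into the future.
new_expires = datetime.now(timezone.utc) + timedelta(seconds=TTL_SECONDS)
registry.update_expires_at(sandbox_id, new_expires)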
A background task runs every 60 seconds and removes expired sandboxes.
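That background task can be sketched with `asyncio`: on each pass, select rows whose `expires_at` is in the past, remove their containers, and delete the rows. This is an assumption-laden sketch rather than Jhansi's code. `sweep_expired`, `cleanup_loop` and the `remove_container` callback are hypothetical, and `expires_at` is assumed to hold fixed-width UTC ISO-8601 strings so that `<` compares chronologically.

```python
import asyncio
import logging
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List

CLEANUP_INTERVAL_SECONDS = 60


def sweep_expired(conn: sqlite3.Connection,
                  remove_container: Callable[[str], None]) -> List[str]:
    """Remove every sandbox whose expires_at is in the past; return their ids."""
    # Assumes expires_at is stored as fixed-width UTC ISO-8601 text.
    now = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    expired = conn.execute(
        "SELECT id, container_id FROM sandboxes WHERE expires_at < ?", (now,)
    ).fetchall()
    removed = []
    for sandbox_id, container_id in expired:
        if container_id:
            remove_container(container_id)  # hypothetical hook, e.g. docker rm -f
        with conn:  # commit the delete for this sandbox
            conn.execute("DELETE FROM sandboxes WHERE id = ?", (sandbox_id,))
        removed.append(sandbox_id)
    return removed


async def cleanup_loop(conn: sqlite3.Connection,
                       remove_container: Callable[[str], None],
                       interval: float = CLEANUP_INTERVAL_SECONDS) -> None:
    """Sweep forever, pausing `interval` seconds between passes."""
    while True:
        try:
            sweep_expired(conn, remove_container)
        except Exception:
            # One failed pass should not kill the background task.
            logging.exception("sandbox cleanup pass failed")
        await asyncio.sleep(interval)
```

The sweep runs synchronously inside the event loop, which is acceptable for a small SQLite table; if container removal is slow, moving it to a worker thread would keep the loop responsive.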
Resetting the clock on every execution makes the TTL activity-based rather than age-based.
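An agent may perform dozens of small executions during a 20-minute analysis. If that analysis starts 50 minutes after the sandbox was created, a one-hour creation-time TTL expires halfway through it and can terminate the sandbox in the middle of an active workflow.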
A last-active TTL does not.
Active sandboxes remain available. Only idle ones are cleaned up.
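The difference is easy to check with numbers. Suppose a sandbox is created at minute 0, the TTL is one hour, and the agent runs one execution per minute from minute 50 to minute 70. The hypothetical helper below works in whole minutes for readability and reports the first execution that arrives after the sandbox's deadline under each policy.

```python
from typing import List, Optional

TTL_MINUTES = 60  # the one-hour TTL, in minutes


def first_expired_execution(executions: List[int], activity_based: bool) -> Optional[int]:
    """Return the first execution minute that arrives after the deadline, or None.

    The sandbox is created at minute 0, so both policies start with
    expires_at = 0 + TTL. Only the activity-based policy moves it.
    """
    expires_at = TTL_MINUTES
    for minute in executions:
        if minute > expires_at:
            return minute  # past the deadline: eligible for cleanup
        if activity_based:
            expires_at = minute + TTL_MINUTES  # each execution resets the clock
    return None


runs = list(range(50, 71))  # one execution per minute, minutes 50..70
print(first_expired_execution(runs, activity_based=False))  # 61
print(first_expired_execution(runs, activity_based=True))   # None
```

Under the age-based policy the workflow loses its sandbox at minute 61; under the activity-based policy it only expires after a full hour with no executions.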
What this unlocks
With persistence and activity-based TTL, Jhansi sandboxes are becoming reliable execution primitives:
Create a sandbox once.
Use it repeatedly.
Survive service restarts.
Trust that active work will not disappear underneath the agent.
That is the foundation longer-running agent workflows need.
Next in v0.7: streaming execution through Server-Sent Events.
No more waiting for the entire command to finish before seeing its output.
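Since streaming is still planned work, the following is only a sketch of the general idea under our own assumptions, not the upcoming implementation. It reads a command's output line by line with `subprocess.Popen` and frames each line as a Server-Sent Events message: a `data:` field followed by a blank line, plus a final named `exit` event. `stream_as_sse` is a hypothetical name, and an HTTP layer would still need to send these chunks with `Content-Type: text/event-stream`.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_as_sse(command: List[str]) -> Iterator[str]:
    """Yield each output line of `command` as an SSE event as soon as it is read."""
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered on the reading side of the pipe
    ) as proc:
        for line in proc.stdout:
            text = line.rstrip("\n")
            # One "data:" field per event; the blank line terminates the event.
            yield f"data: {text}\n\n"
        exit_code = proc.wait()
    # A final named event tells the client the command has finished.
    yield f"event: exit\ndata: {exit_code}\n\n"


if __name__ == "__main__":
    # -u makes the child Python flush each line immediately.
    cmd = [sys.executable, "-u", "-c", "import time\nfor i in range(3):\n    print(i)\n    time.sleep(0.2)"]
    for event in stream_as_sse(cmd):
        print(repr(event))
```

Lines only arrive promptly if the child flushes its output; a Python child started with `-u` does, while block-buffered programs will still appear in bursts.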
Jhansi is an open-source cloud sandbox for running AI-generated code safely.
Self-host it with:
docker compose up
AI agents need execution, not credentials.
Star it if this problem resonates: https://github.com/jhansi-io/petri