Brian Douglas

OpenClaw in a Box

An OpenClaw agent deleted 200+ emails from Meta's AI alignment director's inbox while ignoring her commands to stop. She had to run to her Mac to kill the process. Context window compaction dropped the safety constraint that said "ask before acting."

No network boundary. No kill switch beyond reaching the machine. No recording of what the agent saw or why it ignored the stop command.

I built openclaw-in-a-box to make that scenario impossible. It runs OpenClaw inside a stereOS VM with Tapes as the flight recorder.

Last month I wrote about running OpenClaw on exe.dev with Discord. That post was about getting started safely on someone else's ephemeral VM. This project takes it further: you own the sandbox, you own the telemetry, and the whole thing is declarative and version-controlled.

The sandbox

stereOS locks the agent down. A network egress allowlist means the agent can only reach APIs you explicitly permit. Gmail, Anthropic, npm. Nothing else. If the agent tries to curl somewhere unexpected, the network layer blocks it. This isn't application-level filtering. It's enforced at the VM's network stack.

Secrets live in tmpfs, never written to disk, gone the moment the VM stops. Auto-teardown after 2 hours. If you walk away, credentials don't linger. The agent doesn't keep running overnight.

One jcard.toml file defines the entire sandbox. Resources, network policy, secrets, timeout. Reproducible, auditable, version-controlled.

mixtape = "opencode-mixtape:latest"
name = "openclaw-in-a-box"

[network]
mode = "nat"
egress_allow = [
  "api.anthropic.com", "openclaw.ai",
  "gmail.googleapis.com", "oauth2.googleapis.com",
  "registry.npmjs.org",
]

[timeout]
duration = "2h"

[secrets]
ANTHROPIC_API_KEY = "${ANTHROPIC_API_KEY}"

The flight recorder

Tapes sits between the agent and the LLM as a transparent proxy, capturing every request and response to SQLite with hash chains. No instrumentation. No SDK. It records at the network layer.

When the Meta incident happened, there was no way to replay the agent's reasoning. Why did it start deleting? What did the compacted context look like? At what point did it lose the safety constraint? Without a recording, all you have is the outcome: 200 emails gone.

With Tapes you get the full prompt, the full response, token counts, timestamps. Content-addressed so the sequence is tamper-evident. If the agent miscategorizes an email, you replay the tape. Every prompt, every response, every decision.

That's the difference between "200 emails gone, no idea why" and a complete forensic replay.
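
If "hash chains" and "tamper-evident" sound abstract, the idea fits in a few lines. The Python sketch below is only an illustration of the general technique. The chain table, its columns, and the choice of SHA-256 are my assumptions and not the actual Tapes schema or hashing scheme, which this post doesn't spell out.

```python
import hashlib
import json
import sqlite3


def node_hash(parent_hash: str, role: str, content: str) -> str:
    """Content address: SHA-256 over the parent's hash plus this node's payload."""
    payload = json.dumps([parent_hash, role, content], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def open_tape() -> sqlite3.Connection:
    """Create an in-memory database with an illustrative (assumed) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chain ("
        "seq INTEGER PRIMARY KEY, hash TEXT, parent_hash TEXT, "
        "role TEXT, content TEXT)"
    )
    return conn


def append(conn: sqlite3.Connection, role: str, content: str) -> str:
    """Append a record whose hash covers the previous record's hash."""
    row = conn.execute(
        "SELECT hash FROM chain ORDER BY seq DESC LIMIT 1"
    ).fetchone()
    parent = row[0] if row else ""  # the first record has no parent
    digest = node_hash(parent, role, content)
    conn.execute(
        "INSERT INTO chain (hash, parent_hash, role, content) VALUES (?, ?, ?, ?)",
        (digest, parent, role, content),
    )
    return digest


def verify(conn: sqlite3.Connection) -> bool:
    """Recompute every hash in order; any edited or reordered row breaks the chain."""
    expected_parent = ""
    rows = conn.execute(
        "SELECT hash, parent_hash, role, content FROM chain ORDER BY seq"
    )
    for digest, parent, role, content in rows:
        if parent != expected_parent or node_hash(parent, role, content) != digest:
            return False
        expected_parent = digest
    return True
```

Because each hash covers its parent's hash, rewriting one recorded prompt after the fact invalidates that record and everything that follows it.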

The architecture

┌─────────────────────────────────────────────┐
│  stereOS VM  (NixOS · 2 CPU · 4 GiB)       │
│                                             │
│  tapes proxy (:8080)                        │
│    ▲ intercepts all LLM traffic             │
│    ▼                                        │
│  openclaw gateway (:18789)                  │
│    ├── Claude API (via Tapes proxy)         │
│    ├── gog CLI → Gmail API                  │
│    └── skills/gmail-triage/SKILL.md         │
│                                             │
│  egress: anthropic, gmail, npm only         │
│  secrets: tmpfs (never on disk)             │
│  timeout: 2h auto-teardown                  │
└─────────────────────────────────────────────┘
       │
       │  shared mount (persists across restarts)
       ▼
  .mb/tapes/tapes.sqlite    agent black box
  .openclaw/                agent config
  output/                   INBOX_REPORT.md

The skill

The triage logic is a Markdown file. No code. The agent reads it, understands the rules, and executes them using the gog CLI for Gmail access.

It classifies messages into four categories: newsletter, receipt, action needed, FYI. Newsletters get archived. Receipts get labeled and archived. Action items get starred. FYI messages get marked as read.

Safety constraints baked into the skill: never delete messages, never send replies. If a message can't be confidently classified, leave it unread. Change the categories, add new ones, tighten the constraints. It's all prose.
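
The real skill stays prose, but it can help to see the same rules as a lookup table, for example when checking a recorded session against them. This Python sketch is my own restatement of the paragraph above. The action names (archive, label, star, mark_read) and the plan_actions helper are hypothetical labels, not gog commands or anything from SKILL.md.

```python
from enum import Enum
from typing import Optional, Tuple


class Category(Enum):
    NEWSLETTER = "newsletter"
    RECEIPT = "receipt"
    ACTION_NEEDED = "action needed"
    FYI = "FYI"


# Category -> actions, as described in the prose above.
ACTIONS = {
    Category.NEWSLETTER: ("archive",),
    Category.RECEIPT: ("label", "archive"),
    Category.ACTION_NEEDED: ("star",),
    Category.FYI: ("mark_read",),
}

# The skill's safety constraints: never delete, never send replies.
FORBIDDEN = frozenset({"delete", "send_reply"})


def plan_actions(category: Optional[Category]) -> Tuple[str, ...]:
    """Return the actions for a message; None means 'not confidently classified'."""
    if category is None:
        return ()  # leave the message unread and untouched
    actions = ACTIONS[category]
    if FORBIDDEN.intersection(actions):
        raise ValueError(f"forbidden action planned for {category.value}")
    return actions
```

Adding a category means adding one enum member and one table row, which mirrors how cheap it is to edit the Markdown.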

Running it

git clone https://github.com/papercomputeco/openclaw-in-a-box
cd openclaw-in-a-box
mb up
mb ssh openclaw-in-a-box
bash /workspace/scripts/install.sh
bash /workspace/scripts/start.sh

The install script handles Node.js, OpenClaw CLI, Tapes CLI, and the gog CLI for Gmail access. OAuth setup for Gmail is a one-time step on the host. After that, mb up and start.sh are all you need between sessions.

What you get from the recordings

Query the black box directly to see the agent's reasoning:

sqlite3 .mb/tapes/tapes.sqlite \
  "SELECT role, substr(content, 1, 200) FROM nodes ORDER BY created_at DESC LIMIT 4"
assistant | Here's your inbox triage for the last 2 days (20 threads):
           ## Needs Attention
           - State DMV: Complete your application
           - Team standup invite: Tuesday 9am PDT...
user      | [tool_result]
assistant | [tool_input: gog gmail messages list ...]
user      | /gmail-triage

Read bottom to top. The user invoked /gmail-triage, the agent called gog to list messages, received the results, then produced the classification. Every step is captured.
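
If you'd rather read the tape top to bottom, a few lines of Python can run the same query and flip the order. This is a sketch and not part of the repo: replay is a hypothetical helper, and it assumes only what the query above already uses, namely a nodes table with role, content, and created_at columns.

```python
import sqlite3


def replay(conn: sqlite3.Connection, limit: int = 4, width: int = 200) -> list:
    """Return the most recent `limit` nodes, oldest first, content truncated to `width`."""
    rows = conn.execute(
        "SELECT role, substr(content, 1, ?) FROM nodes "
        "ORDER BY created_at DESC LIMIT ?",
        (width, limit),
    ).fetchall()
    return list(reversed(rows))  # newest-first from SQL, flipped to chronological
```

To use it, open the shared-mount database with sqlite3.connect(".mb/tapes/tapes.sqlite") and print each (role, content) pair that replay returns.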

Over time the recordings become training data. Analyze 100 triage sessions to find where the skill definition falls short. Which email categories does the agent struggle with? Which prompts produce better classification? The black box isn't just for incident response. It's how agents get better between runs.

When you're done, run mb destroy openclaw-in-a-box. Secrets gone. VM gone. The only thing that survives is the tape.

Go try it.

papercomputeco / openclaw-in-a-box

Secure, sandboxed OpenClaw agents with full telemetry.

openclaw-in-a-box

Run OpenClaw in a stereOS VM with Tapes telemetry.

Get Started

Paste this into Claude Code, OpenCode, or any coding harness:

Set up openclaw-in-a-box from https://github.com/papercomputeco/openclaw-in-a-box — clone the repo and follow SKILL.md to get me running with a secure OpenClaw setup

The agent clones the repo, checks your environment, asks which integrations you want, and walks you through setup.

Manual setup

Prerequisites: Master Blaster (mb CLI) and ANTHROPIC_API_KEY exported.

git clone https://github.com/papercomputeco/openclaw-in-a-box
cd openclaw-in-a-box
export ANTHROPIC_API_KEY="sk-ant-..."
mb up
mb ssh openclaw-in-a-box
bash /workspace/scripts/install.sh   # first time
bash /workspace/scripts/start.sh

Integrations

The VM comes pre-configured for three integrations. Set up whichever ones you need -- the agent loads all available skills at startup.

Integration       | Setup Guide            | What It Does
Gmail Triage      | Google OAuth + gog CLI | Archive newsletters, label receipts, flag action items
GitHub Org Triage | GH_TOKEN + gh CLI      | Flag stale PRs, blocked issues, release

#aiagents #opensource #security
