Brian Douglas

Posted on Mar 21

OpenClaw in a Box

#openclaw #agents #security #opensource

An OpenClaw agent deleted 200+ emails from Meta's AI alignment director's inbox while ignoring her commands to stop. She had to run to her Mac to kill the process. Context window compaction dropped the safety constraint that said "ask before acting."

No network boundary. No kill switch beyond reaching the machine. No recording of what the agent saw or why it ignored the stop command.

I built openclaw-in-a-box to make that scenario impossible. It runs OpenClaw inside a stereOS VM with Tapes as the flight recorder.

Last month I wrote about running OpenClaw on exe.dev with Discord. That post was about getting started safely on someone else's ephemeral VM. This project takes it further: you own the sandbox, you own the telemetry, and the whole thing is declarative and version-controlled.

Previous post: Setting Up OpenClaw on exe.dev with Discord (Brian Douglas, Feb 2)

The sandbox

stereOS locks the agent down. A network egress allowlist means the agent can only reach hosts you explicitly permit. In this config that's Anthropic, openclaw.ai, Gmail (plus Google OAuth), and npm. Nothing else. If the agent tries to curl somewhere unexpected, the network layer blocks it. This isn't application-level filtering. It's enforced at the VM's network stack.

Secrets live in tmpfs, never written to disk, gone the moment the VM stops. Auto-teardown after 2 hours. If you walk away, credentials don't linger. The agent doesn't keep running overnight.

One jcard.toml file defines the entire sandbox. Resources, network policy, secrets, timeout. Reproducible, auditable, version-controlled.

mixtape = "opencode-mixtape:latest"
name = "openclaw-in-a-box"

[network]
mode = "nat"
egress_allow = [
  "api.anthropic.com", "openclaw.ai",
  "gmail.googleapis.com", "oauth2.googleapis.com",
  "registry.npmjs.org",
]

[timeout]
duration = "2h"

[secrets]
ANTHROPIC_API_KEY = "${ANTHROPIC_API_KEY}"
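
To make the allowlist semantics concrete, here is a minimal Python sketch of the allow/deny decision that egress_allow describes. It is an illustration only: stereOS enforces the policy in the VM's network stack, not in Python. The exact-match rule (no wildcards, no subdomain matching) is my assumption, and the name is_egress_allowed is hypothetical.

```python
# Illustration only: models the allow/deny decision implied by egress_allow.
# Assumption: hosts match exactly (case-insensitive), with no wildcards.
EGRESS_ALLOW = frozenset({
    "api.anthropic.com", "openclaw.ai",
    "gmail.googleapis.com", "oauth2.googleapis.com",
    "registry.npmjs.org",
})


def is_egress_allowed(host, allow=EGRESS_ALLOW):
    """Return True if the destination host is on the allowlist."""
    # Normalize whitespace, a trailing root dot, and letter case before matching.
    normalized = host.strip().rstrip(".").lower()
    return normalized in allow
```

Under that exact-match assumption, a lookalike such as evil.api.anthropic.com is blocked just like any other unlisted host.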

The flight recorder

Tapes sits between the agent and the LLM as a transparent proxy, capturing every request and response to SQLite with hash chains. No instrumentation. No SDK. It records at the network layer.

When the Meta incident happened, there was no way to replay the agent's reasoning. Why did it start deleting? What did the compacted context look like? At what point did it lose the safety constraint? Without a recording, all you have is the outcome: 200 emails gone.

With Tapes you get the full prompt, the full response, token counts, timestamps. Content-addressed so the sequence is tamper-evident. If the agent miscategorizes an email, you replay the tape. Every prompt, every response, every decision.

That's the difference between "200 emails gone, no idea why" and a complete forensic replay.
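
Why does a hash chain make the recording tamper-evident? Here is a minimal Python sketch of the general technique. It is not Tapes' actual schema or hashing scheme. The record fields, the GENESIS placeholder, and the function names are all hypothetical. Each record's hash covers its parent's hash, so editing any record breaks verification at that record.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder parent hash for the first record


def record_hash(prev_hash, role, content):
    """Hash a record together with its parent's hash."""
    payload = json.dumps([prev_hash, role, content], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(messages):
    """Turn (role, content) pairs into a list of chained records."""
    chain, prev = [], GENESIS
    for role, content in messages:
        digest = record_hash(prev, role, content)
        chain.append({"prev": prev, "role": role, "content": content, "hash": digest})
        prev = digest
    return chain


def first_broken_link(chain):
    """Return the index of the first record that fails verification, or None."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        expected = record_hash(prev, rec["role"], rec["content"])
        if rec["prev"] != prev or rec["hash"] != expected:
            return i
        prev = rec["hash"]
    return None
```

Rewriting a record after the fact changes its recomputed hash, so first_broken_link points at the exact record that was altered.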

The architecture

┌─────────────────────────────────────────────┐
│  stereOS VM  (NixOS · 2 CPU · 4 GiB)       │
│                                             │
│  tapes proxy (:8080)                        │
│    ▲ intercepts all LLM traffic             │
│    ▼                                        │
│  openclaw gateway (:18789)                  │
│    ├── Claude API (via Tapes proxy)         │
│    ├── gog CLI → Gmail API                  │
│    └── skills/gmail-triage/SKILL.md         │
│                                             │
│  egress: anthropic, gmail, npm only         │
│  secrets: tmpfs (never on disk)             │
│  timeout: 2h auto-teardown                  │
└─────────────────────────────────────────────┘
       │
       │  shared mount (persists across restarts)
       ▼
  .mb/tapes/tapes.sqlite    agent black box
  .openclaw/                agent config
  output/                   INBOX_REPORT.md

The skill

The triage logic is a Markdown file. No code. The agent reads it, understands the rules, and executes them using the gog CLI for Gmail access.

It classifies messages into four categories: newsletter, receipt, action needed, FYI. Newsletters get archived. Receipts get labeled and archived. Action items get starred. FYI messages get marked as read.

Safety constraints baked into the skill: never delete messages, never send replies. If a message can't be confidently classified, leave it unread. Change the categories, add new ones, tighten the constraints. It's all prose.
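
The real skill is Markdown prose, not code. Purely to make the mapping explicit, here are the rules above written as a Python lookup table. Every name in it is hypothetical, and nothing like this ships in the repo.

```python
# Hypothetical restatement of the prose rules; the actual skill is SKILL.md.
ACTIONS = {
    "newsletter": ["archive"],
    "receipt": ["label", "archive"],
    "action_needed": ["star"],
    "fyi": ["mark_read"],
}
FORBIDDEN = {"delete", "send"}  # never delete messages, never send replies


def plan_actions(category):
    """Map a classification to actions; no confident category means do nothing."""
    actions = ACTIONS.get(category, [])  # unclassified: leave the message unread
    if FORBIDDEN.intersection(actions):
        raise ValueError("rule table violates a safety constraint")
    return list(actions)
```

The default case is the important one: anything that isn't one of the four categories produces no actions at all.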

Running it

git clone https://github.com/papercomputeco/openclaw-in-a-box
cd openclaw-in-a-box
mb up
mb ssh openclaw-in-a-box
bash /workspace/scripts/install.sh
bash /workspace/scripts/start.sh

The install script handles Node.js, OpenClaw CLI, Tapes CLI, and the gog CLI for Gmail access. OAuth setup for Gmail is a one-time step on the host. After that, mb up and start.sh are all you need between sessions.

What you get from the recordings

Query the black box directly to see the agent's reasoning:

sqlite3 .mb/tapes/tapes.sqlite \
  "SELECT role, substr(content, 1, 200) FROM nodes ORDER BY created_at DESC LIMIT 4"

assistant | Here's your inbox triage for the last 2 days (20 threads):
           ## Needs Attention
           - State DMV: Complete your application
           - Team standup invite: Tuesday 9am PDT...
user      | [tool_result]
assistant | [tool_input: gog gmail messages list ...]
user      | /gmail-triage

Read bottom to top. The user invoked /gmail-triage, the agent called gog to list messages, received the results, then produced the classification. Every step is captured.
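
If you'd rather read the steps top to bottom, here is a small Python sketch that runs the same query and reverses the rows. It assumes only what the query above shows: a nodes table with role, content, and created_at columns. The function name recent_steps is hypothetical.

```python
import sqlite3


def recent_steps(conn, limit=4, width=200):
    """Return the newest `limit` nodes as (role, snippet) pairs, oldest first."""
    rows = conn.execute(
        "SELECT role, substr(content, 1, ?) FROM nodes "
        "ORDER BY created_at DESC LIMIT ?",
        (width, limit),
    ).fetchall()
    return rows[::-1]  # reverse so the output reads in chronological order


# Usage, with the database path from this post:
#   conn = sqlite3.connect(".mb/tapes/tapes.sqlite")
#   for role, snippet in recent_steps(conn):
#       print(f"{role:<9} | {snippet}")
```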

Over time the recordings become training data. Analyze 100 triage sessions to find where the skill definition falls short. Which email categories does the agent struggle with? Which prompts produce better classification? The black box isn't just for incident response. It's how agents get better between runs.

When you're done, mb destroy openclaw-in-a-box. Secrets gone. VM gone. The only thing that survives is the tape.

Go try it.

papercomputeco / openclaw-in-a-box

Secure, sandboxed OpenClaw agents with full telemetry.

openclaw-in-a-box

Run OpenClaw in a stereOS VM with Tapes telemetry.

Get Started

Paste this into Claude Code, OpenCode, or any coding harness:

Set up openclaw-in-a-box from https://github.com/papercomputeco/openclaw-in-a-box — clone the repo and follow SKILL.md to get me running with a secure OpenClaw setup

The agent clones the repo, checks your environment, asks which integrations you want, and walks you through setup.

Manual setup

Prerequisites: Master Blaster (mb CLI) and ANTHROPIC_API_KEY exported.

git clone https://github.com/papercomputeco/openclaw-in-a-box
cd openclaw-in-a-box
export ANTHROPIC_API_KEY="sk-ant-..."
mb up
mb ssh openclaw-in-a-box
bash /workspace/scripts/install.sh   # first time
bash /workspace/scripts/start.sh

Integrations

The VM comes pre-configured for three integrations. Set up whichever ones you need -- the agent loads all available skills at startup.

Integration	Setup Guide	What It Does
Gmail Triage	Google OAuth + `gog` CLI	Archive newsletters, label receipts, flag action items
GitHub Org Triage	`GH_TOKEN` + `gh` CLI	Flag stale PRs, blocked issues, release

…

Top comments (5)

Ilya Gordey • Mar 22

This is exactly the conversation the AI agent space needs to have.

The Meta inbox incident hits different when you're running agents with real money on the line. I built PolyClawster — an agent that trades Polymarket 24/7 — and the "what happens when it ignores a stop command mid-trade" question kept me up at night.

The flight recorder idea is underrated. Right now most people debug agent failures by staring at outcomes. Having a tamper-evident replay of every prompt/response changes the entire post-mortem workflow.

One thing I haven't solved: network egress for trading agents is tricky — you need the CLOB API, but you don't want the agent calling unexpected endpoints. Your allowlist approach in jcard.toml is cleaner than anything I've hacked together.

Bookmarked. Going to look at stereOS seriously.