mirabilis: a one-command sandbox for autonomous Claude Code that ended up patching itself

#ai #docker #go #opensource

I have a side project that I actually like, and I want to show it. No teaser, no promised secret: this is a post about a tool I built, how I built it, and one episode at the end that pleased me as an engineer.

The tool is mirabilis. It is a one-command launcher that brings up a devcontainer, starts Claude Code inside it with --dangerously-skip-permissions (the agent runs commands and makes edits without asking for approval), and drops me straight into the running agent. The container is isolated from my laptop. The agent gets structured persistent memory and a behavioral harness. From the outside, it is one line:

curl -fsSL https://raw.githubusercontent.com/AlexShchuka/mirabilis/main/install.sh | bash

That clones the repo into ~/.mirabilis, installs the devcontainer CLI, and puts mirabilis on your PATH. Run mirabilis and you land in a terminal menu: launch, plugins, harness, stack, open in VS Code. The first launch builds the container and signs you into GitHub and Claude through their native flows; tokens live in sandbox volumes, never in the repository. Then the launcher hands the terminal to Claude.

Why: I wanted a place to give an agent full autonomy and walk away. Bypass mode is convenient right up until the model makes a mistake, and rm -rf does not forgive mistakes. Hence a box I do not care about: inside, the agent is root; the boundary is the container wall, not my filesystem. Containerizing an agent is standard practice; Anthropic ships a reference devcontainer with the same motivation. What I wanted on top was one command and zero thinking about provisioning.

The name nods to Einstein's annus mirabilis, the 1905 "miracle year" with four physics-changing papers. Cheeky for a personal sandbox; consider it an advance I intend to earn.

What's inside

mirabilis is a single Go binary, roughly 3.8k lines of production code and 9.7k of tests, playing three roles dispatched by argument in cmd/mirabilis/main.go: no arguments is the host TUI launcher, provision is the in-container provisioner, hook is the Claude hook handler. One artifact, one dispatcher.
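To make the three-roles-in-one-binary idea concrete, here is a minimal sketch of argument dispatch. It is not the contents of cmd/mirabilis/main.go: the role type, the constants, and pickRole are hypothetical names, and the real entry point presumably calls into the launcher, provisioner, and hook packages instead of printing.

```go
package main

import (
	"fmt"
	"os"
)

// role is a hypothetical label for the three modes the binary can run in.
type role string

const (
	roleLauncher  role = "launcher"  // no arguments: host TUI launcher
	roleProvision role = "provision" // in-container provisioner
	roleHook      role = "hook"      // Claude hook handler
)

// pickRole maps the arguments after the program name to a role.
// An unknown first argument is an error, not a silent fallback.
func pickRole(args []string) (role, error) {
	if len(args) == 0 {
		return roleLauncher, nil
	}
	switch args[0] {
	case "provision":
		return roleProvision, nil
	case "hook":
		return roleHook, nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	r, err := pickRole(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real dispatcher would hand control to the matching package here.
	fmt.Println("role:", r)
}
```

Keeping the decision in a pure function like pickRole makes the dispatcher testable without spawning the binary.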

The launch pipeline is a DAG, not a script. Steps in internal/pipeline have statuses (stPending, stRunning, stDone, stSkipped, stFailed), dependencies, and a per-step retry policy. Network-shaped steps run under RetryNet: 4 attempts, exponential backoff from 300ms capped at 8s, with jitter (actual delay: uniform between half and full backoff). Deterministic steps such as the build get no retries; retrying a reproducible compile error is four minutes of spinner for nothing.
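The retry numbers above translate into very little code. The sketch below is an illustration under assumptions, not the actual RetryNet from internal/pipeline: the function names are hypothetical, and the growth factor of 2 is my assumption, since "exponential" does not pin it down. Only the 4 attempts, the 300ms base, the 8s cap, and the half-to-full jitter come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

const (
	maxAttempts = 4
	baseDelay   = 300 * time.Millisecond
	maxDelay    = 8 * time.Second
)

// errTransient stands in for any network-shaped failure in the demo.
var errTransient = errors.New("transient network failure")

// backoff returns the un-jittered delay before retry number n (0-based):
// baseDelay doubled n times, capped at maxDelay.
func backoff(n int) time.Duration {
	d := baseDelay
	for i := 0; i < n; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// jitter picks a delay uniformly from [d/2, d].
func jitter(d time.Duration) time.Duration {
	half := d / 2
	return half + time.Duration(rand.Int63n(int64(d-half)+1))
}

// retryNet runs op up to maxAttempts times and sleeps between failed
// attempts. It does not sleep after the final failure.
func retryNet(op func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts-1 {
			time.Sleep(jitter(backoff(attempt)))
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := retryNet(func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

With doubling and only four attempts, the un-jittered delays in this sketch are 300ms, 600ms, and 1.2s, so the 8s cap would only come into play with more attempts or a steeper growth factor.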

Memory. ~/.claude lives on a volume and survives rebuilds, but it is not one big scratch file: memory is split into typed categories, declared in internal/config/config.go:

var MemoryCategories = []MemoryCategory{
    {"about-me", "semantic", "Stable facts about you: identity, role, goals, hard preferences, constraints."},
    {"dev-principles", "procedural", "Cross-project engineering invariants you endorse: style, testing bar, anti-slop."},
    {"research-log", "episodic", "Dated findings tied to a specific investigation, paper, or bug. Append-only, compacted periodically."},

Three types: semantic for timeless facts, procedural for how-to invariants, episodic for dated findings. On every SessionStart, a hook walks the category files, counts the invariant bullets in each, and regenerates a MEMORY.md index. The agent never edits the index by hand. The point is memory the agent loads predictably.
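A rough sketch of what "count the bullets, regenerate the index" can look like. Everything specific here is an assumption made for illustration: the category struct, the rule that a bullet is a line starting with "- " or "* ", the one-file-per-category naming, and the index layout. The real hook reads files from the volume; the sketch takes their contents as strings so it stays self-contained.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// category is a cut-down, hypothetical stand-in for a memory category:
// just a name and a type (semantic, procedural, episodic).
type category struct {
	Name, Kind string
}

// countBullets counts lines whose first non-space characters are "- " or "* ".
func countBullets(text string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "- ") || strings.HasPrefix(line, "* ") {
			n++
		}
	}
	return n
}

// renderIndex builds the index text from scratch on every call, so the
// result depends only on the category files, never on a previous index.
func renderIndex(cats []category, files map[string]string) string {
	var b strings.Builder
	b.WriteString("# MEMORY\n\n")
	for _, c := range cats {
		fmt.Fprintf(&b, "- %s.md (%s): %d bullets\n",
			c.Name, c.Kind, countBullets(files[c.Name]))
	}
	return b.String()
}

func main() {
	cats := []category{{"about-me", "semantic"}, {"research-log", "episodic"}}
	// Made-up file contents, keyed by category name.
	files := map[string]string{
		"about-me":     "- prefers Go\n- works solo\n",
		"research-log": "## findings\n- golden test was flaky\n",
	}
	fmt.Print(renderIndex(cats, files))
}
```

Regenerating instead of patching is the design choice that matters: a derived file nobody edits cannot drift from its sources.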

Two boundaries, deliberately separate. The container is the security boundary: it keeps the agent away from my machine. Behavior is the harness's job: my plugin neuro-matrix, installed automatically by the provisioner, carries invariants like "don't push to main" and "don't exfiltrate credentials".

The frame that kept the slop out

mirabilis was written in about a week, mostly by AI agents. That sentence is not the achievement. In a week, agents will produce exactly as much plausible garbage as you let them. The engineering was making sure garbage never reached main, and that part was mine.

The frame lives in AGENTS.md, which the agent reads at the start of every session:

Code is truth. Every claim about state is backed by tool output: a concrete run with concrete output, not "this probably works".
Minimal diff / YAGNI. Touch only what is broken or what the task needs; no drive-by refactoring, no layers "just in case".
Anti-neuroslop. Plausible shape is not needed code; don't grow files and abstractions detached from what the repo actually needs.
Not green means not done. Changes come with tests.

Principles alone are wishes, so there is machinery behind them. A total ban on comments in code and config, enforced by a pre-commit hook: prose lives in .md only, and the hook greps staged non-markdown, non-Go files for // and # and fails the commit on a hit. Comments are an agent's favorite spot for slop ("here we initialize the variable"); with no place for it, code explains itself with names and structure. A 97% coverage floor in CI; more on that below, it bit me. And a daily canary on cron: rebuild the image, bring the devcontainer up, assert configuration invariants. Drift gets caught by schedule, not by my face at launch time.
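The comment ban is simple enough to sketch. This is a deliberately naive illustration, not the actual hook: the function names are hypothetical, the file contents are passed in as a map instead of being read from the git index, and a plain substring match will also flag a URL containing // or a shebang line, which a real check would have to decide about.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// exempt reports whether the check skips this path entirely.
// Markdown is where prose belongs; Go files are outside this grep.
func exempt(path string) bool {
	switch filepath.Ext(path) {
	case ".md", ".go":
		return true
	}
	return false
}

// commentLines returns the 1-based numbers of lines containing "//" or "#".
func commentLines(content string) []int {
	var hits []int
	for i, line := range strings.Split(content, "\n") {
		if strings.Contains(line, "//") || strings.Contains(line, "#") {
			hits = append(hits, i+1)
		}
	}
	return hits
}

// check returns one message per offending line, in sorted path order.
// staged maps a file path to its staged content.
func check(staged map[string]string) []string {
	paths := make([]string, 0, len(staged))
	for p := range staged {
		if !exempt(p) {
			paths = append(paths, p)
		}
	}
	sort.Strings(paths)
	var msgs []string
	for _, p := range paths {
		for _, n := range commentLines(staged[p]) {
			msgs = append(msgs, fmt.Sprintf("%s:%d: comment marker", p, n))
		}
	}
	return msgs
}

func main() {
	// Made-up staged files for the demo.
	staged := map[string]string{
		"README.md":         "# Title\n",
		"main.go":           "package main\n",
		"devcontainer.json": "{\n  \"name\": \"box\" // sandbox\n}\n",
	}
	msgs := check(staged)
	for _, m := range msgs {
		fmt.Println(m)
	}
	if len(msgs) > 0 {
		fmt.Println("commit blocked:", len(msgs), "hit(s)")
	}
}
```

In a real pre-commit hook the list of paths could come from git diff --cached --name-only, and a non-empty result would exit non-zero to fail the commit.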

What did not work is more useful than the wins.

tinyproxy. An agent decided open egress needed a proxy filter and generated a whole plumbing layer. It looked solid. It filtered nothing. "A pipe, not a filter." I ripped it out and wrote it into AGENTS.md as the canonical neuroslop example: code shaped like the thing you need that does not do the thing.

Agents lie confidently. Three audit findings (next section) claimed problems the code did not have. "install.sh doesn't install git hooks": it does, line 38. "The devcontainer features aren't pinned": they are, three sha256 pins in the lock file. "Go tools will land outside PATH": no, GOPATH is set before go install. Three confident claims; three times the code said no. That experience is the foundation of the frame: a model hallucinates in a perfectly assured tone, and the working countermeasure is mechanical verification plus cross-checking. Not trust.

The coverage floor trap. My own 97% floor produced floor-driven tests: checks that exist to feed the percentage, not to catch bugs. Init() returns non-nil. View() is non-empty. The most brittle one hung on a sleep(200ms) in a golden test (since replaced with teatest.WaitFor). The conclusion I wrote down: slightly below the floor with meaningful tests beats pinning Bubble Tea internals for a number.
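The difference between a fixed sleep and a wait-for is worth one small illustration. The helper below is a generic stdlib polling loop I wrote for this post, not teatest's API: teatest.WaitFor checks program output against a condition, while this hypothetical waitFor just polls a boolean.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitFor polls cond every interval until it returns true or the timeout
// elapses. It returns as soon as the condition holds, so a fast machine
// does not pay for the slowest one.
func waitFor(cond func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	var ready atomic.Bool
	go func() {
		// Simulated async work; a fixed sleep in the test would have to
		// guess how long this takes.
		time.Sleep(20 * time.Millisecond)
		ready.Store(true)
	}()
	ok := waitFor(ready.Load, time.Second, 5*time.Millisecond)
	fmt.Println("ready:", ok)
}
```

A fixed sleep is wrong in both directions: too short on a loaded CI runner, wasted time everywhere else. Polling against a deadline fails only when the condition really never becomes true.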

The sandbox patched itself

This is the part I mostly wrote this post for. Dry, by the numbers.

I ran a multi-agent self-audit on mirabilis. The pipeline: orchestrator full-read → researcher #1 (sonnet) → orchestrator verification → researcher #2 (sonnet) plus a landscape agent (web, arXiv) → two independent critics (sonnet, opus) → reconciliation. Cross-checks at every joint. The output:

~44 raw claims in (15 orchestrator hypotheses plus ~29 agent findings).
3 findings refuted by code verification: the hallucinations above.
5 cut by the critics as taste, not defects.
4 neuroslop spots found by the critics in the orchestrator's own synthesis. The agent checking for slop slopped; another agent caught it.
27 accepted items, nine filed as GitHub issues.

Some accepted items were plainly embarrassing, which is fine. make reset silently destroyed the agent's entire memory while the README promised it "survives rebuilds". Provisioning always reported success, even when sub-steps failed. The container had no resource limits, so an autonomous agent could eat the host's RAM: the exact thing I claim protection from. Ordinary pet-project holes; I filed them.

I did not fix those issues by hand. I launched six developer agents in parallel inside mirabilis itself, one per branch, covering six of the nine issues. They implemented the fixes and opened pull requests. I reviewed and merged them, PRs #101 through #106, in one day. The pipeline used to strand dependent steps when a required step failed; it now cascades them to stSkipped. The launcher no longer silently switches a feature-branch checkout to main. Both fixes came from agents running inside the box they were fixing.
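For the pipeline fix, a minimal sketch of what cascading to stSkipped can mean. The status names are the ones from internal/pipeline; the step struct, the map layout, and cascadeSkips are hypothetical, and the merged PR may well do this differently.

```go
package main

import "fmt"

type status int

const (
	stPending status = iota
	stRunning
	stDone
	stSkipped
	stFailed
)

// step is a hypothetical, cut-down pipeline step: a name, the names of the
// steps it requires, and its current status.
type step struct {
	name   string
	deps   []string
	status status
}

// cascadeSkips marks every pending step that depends, directly or
// transitively, on a failed step as stSkipped. Steps with no failed
// ancestor are left untouched.
func cascadeSkips(steps map[string]*step) {
	blocked := map[string]bool{}
	for name, s := range steps {
		if s.status == stFailed {
			blocked[name] = true
		}
	}
	// Repeat until a full pass changes nothing, so chains of any depth
	// are covered regardless of map iteration order.
	for changed := true; changed; {
		changed = false
		for name, s := range steps {
			if s.status != stPending {
				continue
			}
			for _, d := range s.deps {
				if blocked[d] {
					s.status = stSkipped
					blocked[name] = true
					changed = true
					break
				}
			}
		}
	}
}

func main() {
	// Made-up step names for the demo.
	steps := map[string]*step{
		"build": {name: "build", status: stFailed},
		"up":    {name: "up", deps: []string{"build"}},
		"login": {name: "login", deps: []string{"up"}},
		"menu":  {name: "menu"},
	}
	cascadeSkips(steps)
	for _, n := range []string{"build", "up", "login", "menu"} {
		fmt.Println(n, "skipped:", steps[n].status == stSkipped)
	}
}
```

The point of the cascade is that no step is left in stPending forever: after a failure, every step is either finished, failed, skipped, or still runnable.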

I am not selling this as an AI miracle. It is an engineering fact, and the precision matters: the frame plus cross-verification got the agents to where their PRs were reviewable and mergeable rather than rewritable. The merge button was my call on every one. But the loop — audit, issues, agent wave, PRs, merge — closed inside the sandbox's own walls, and that is exactly what I built it for.

Decisions and boundaries

A few places where I decided differently from common practice. The full threat model is in SECURITY.md.

The container is the boundary. Inside, the agent has full freedom: root, sudo, any file; mirabilis gates nothing there. The direct consequence: trusted code only. For untrusted code a container is not enough; you want a microVM. SECURITY.md says so in plain words.

Egress is open. No host proxy, no in-container allowlist. The canonical approach is the opposite: default-deny plus an allowlist, as in Anthropic's reference devcontainer with its init-firewall.sh. I chose open on purpose. For my scenario (personal sandbox, trusted code, one user) simplicity beats exfiltration hardening, and keeping credentials in is the harness's behavioral job, not a network gate's. WebFetch and WebSearch have to just work; I am not maintaining an allowlist every time the agent needs a new domain. Open egress plus persistent memory plus MCP content is a known memory-poisoning surface (systematic study: arXiv:2606.04329); SECURITY.md lists it as an accepted limitation.

Hardened within the model. Open egress does not mean everything is open: the container runs under Docker's default seccomp profile with cap_drop: ALL and an explicit add-back list. unshare namespaces, io_uring, keyctl, raw sockets, and chroot are gone.

The Docker socket is mounted (docker-outside-of-docker). The in-container agent can drive the host daemon, and anyone with the socket can read container secrets via docker inspect. Documented in SECURITY.md. It is there so the agent can build and run real containers from inside the box — which is also what Ryuk, the reaper testcontainers-go uses for cleanup, requires during integration tests.

Single instance. One container, one volume, not per-task. Other tools isolate per task with git worktrees; microVM products isolate harder than any container. mirabilis is neither the most isolated nor the most parallel: it is one command, a trusted personal scenario, and an anti-slop frame built into the construction. A chosen scope, not an unfinished copy of the bigger tools.

Take what's useful

mirabilis lives at github.com/AlexShchuka/mirabilis, MIT-licensed. I use it myself and I think it is a decent tool. Not the best, not the only one; the field is busy. What I can show is concrete: a one-command sandbox with typed memory, two independent boundaries, a written-down threat model, four working anti-slop mechanisms, and a self-audit that ended with the sandbox merging fixes to itself.

If any of it is useful, take it. If you find where I am wrong, the issues are open.