Permission fatigue is real. You approve so many legitimate actions that the one dangerous action looks identical. Approve file write, approve bash command, approve another file write; twenty approvals in, and it’s muscle memory. You’re not reviewing anymore. You’re the bot.
Auto mode, permission hooks, allow/deny lists: they’re all behavioral controls running inside the environment they’re supposed to protect. The classifier and the thing it’s classifying share the same filesystem, the same network, the same credentials. If something goes wrong, the damage surface is your entire machine and potentially the production systems to which your session has access.
The more you configure permissions, the more you end up debugging the permission system instead of doing actual work. Allow this tool, deny that command, hook into pre-execution, handle the escape hatch when the sandbox blocks something legitimate. It starts feeling like a second project just to safely use the first one.
The issue isn’t permissions. It’s the blast radius.
Containment over constraint
This isn’t a new idea. We already run untrusted code this way in CI. We already run containers this way in production. The security boundary is the environment, not the application’s self-restraint.
The shift is: don’t restrict what the agent does. Restrict what it has. Give it only the filesystem it needs, only the network destinations it needs, only the credentials it needs. Everything else is unreachable by default. Once you have that, you stop caring about what happens inside. The agent can rm -rf / all day.
NVIDIA seems to agree. Their OpenShell project takes the same approach: sandboxed execution environments with declarative YAML policies governing egress, filesystem access, and credentials. It uses containers (K3s under the hood) as the isolation boundary.
Containers are a good start. But they share the host kernel, and the breakout surface is well-documented. For a truly untrusted agent running --dangerously-skip-permissions, a KVM boundary is a categorically different isolation tier. And with microVMs, the performance cost of that stronger boundary has largely disappeared.
What I've been trying
I’ve been experimenting with this idea in a little project called nixbox (a NixOS microVM sandbox). I set out to achieve the following:
- KVM isolation: a compromised agent cannot reach the host. Period.
- Egress filtering: DNS allowlist. Only approved domains resolve. Three modes: off, filtered, open.
- Explicit mounts: virtiofs bind mounts, write access is opt-in. Mount ~/workspace, nothing else.
- Scoped secrets: credentials passed via env, not inherited from the host shell.
- Reproducible: Nix-built image. Same config, same guest, every time.
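To make the "filtered" egress mode concrete, here is an illustrative sketch of the allowlist check at its core. This is not nixbox's actual implementation; the `ALLOWED` set and `is_allowed` helper are hypothetical names, and a real deployment would enforce this at the DNS resolver rather than in application code.

```python
# Hypothetical sketch of a DNS allowlist check for "filtered" mode.
# Only approved domains (and their subdomains) are considered resolvable.
ALLOWED = {"github.com", "pypi.org"}  # example allowlist, not nixbox defaults

def is_allowed(name: str) -> bool:
    """Allow an exact allowed domain or any subdomain of one."""
    name = name.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in ALLOWED)
```

Note the suffix match requires a leading dot, so `notgithub.com` does not sneak through as a "subdomain" of `github.com`.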
The usage is something like this:
nixbox up
nixbox run "cd ~/workspace/myproject && claude --dangerously-skip-permissions -p 'fix tests'"
nixbox down
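The "scoped secrets" idea from the list above can be sketched in a few lines. This is a hypothetical helper, not nixbox code: instead of letting the guest inherit the whole host shell environment, you construct a minimal environment containing only the secrets you explicitly name.

```python
# Illustrative sketch (hypothetical helper, not nixbox internals): build a
# minimal environment containing only explicitly named secrets.
def scoped_env(secret_names, source):
    env = {"PATH": "/usr/bin:/bin"}  # minimal base; nothing else leaks in
    env.update({name: source[name] for name in secret_names if name in source})
    return env
```

You would pass the result as the environment of the guest launcher (e.g. `env=` on `subprocess.run` with `source=os.environ`), so the agent sees only what it was handed.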
No permission prompts. No classifier. Full autonomy inside a box the agent can’t escape from. --dangerously-skip-permissions stops being dangerous when there’s nothing dangerous to reach.
“Isn’t a VM overkill?”
That was my assumption too. It’s not. Cloud-hypervisor boots in seconds. Balloon memory means the guest only uses what it needs and returns the rest. virtiofs gives shared filesystem access without the overhead of network mounts. Feels like opening a terminal, not spinning up a second machine.
Compare the real costs: configuring permissions, hooks, allow/deny lists, trust levels, debugging why the agent got blocked mid-task... vs. nixbox run ….
The way to think about it
When you hire a contractor, you don’t hand them a 47-page list of forbidden tools and stand behind them checking every move. You put them in the right room, with the right materials, and let them work.
Auto mode is the 47-page list. A VM is the room.