Permission fatigue is real. You approve so many legitimate actions that the one dangerous action looks identical. Approve file write, approve bash command, approve another file write; twenty approvals in, and it’s muscle memory. You’re not reviewing anymore. You’re the bot.
Auto mode, permission hooks, allow/deny lists: they’re all behavioral controls running inside the environment they’re supposed to protect. The classifier and the thing it’s classifying share the same filesystem, the same network, the same credentials. If something goes wrong, the damage surface is your entire machine and potentially the production systems to which your session has access.
The more you configure permissions, the more you end up debugging the permission system instead of doing actual work. Allow this tool, deny that command, hook into pre-execution, handle the escape hatch when the sandbox blocks something legitimate. It starts feeling like a second project just to safely use the first one.
The issue isn’t permissions. It’s the blast radius.
Containment over constraint
This isn’t a new idea. We already run untrusted code this way in CI. We already run containers this way in production. The security boundary is the environment, not the application’s self-restraint.
The shift is: don’t restrict what the agent does. Restrict what it has. Give it only the filesystem it needs, only the network destinations it needs, only the credentials it needs. Everything else is unreachable by default. Once you have that, you stop caring about what happens inside. The agent can rm -rf / all day.
NVIDIA seems to agree. Their OpenShell project takes the same approach: sandboxed execution environments with declarative YAML policies governing egress, filesystem access, and credentials. It uses containers (K3s under the hood) as the isolation boundary.
Containers are a good start. But they share the host kernel, and the breakout surface is well-documented. For a truly untrusted agent running --dangerously-skip-permissions, a KVM boundary is a categorically different isolation tier. And with microVMs, the performance cost of that stronger boundary has largely disappeared.
What I've been trying
I’ve been experimenting with this idea in a little project called nixbox (a NixOS microVM sandbox). I set out to achieve the following:
- KVM isolation: a compromised agent cannot reach the host. Period.
- Egress filtering: DNS allowlist. Only approved domains resolve. Three modes: off, filtered, open.
- Explicit mounts: virtiofs bind mounts, write access is opt-in. Mount ~/workspace, nothing else.
- Scoped secrets: credentials passed via env, not inherited from the host shell.
- Reproducible: Nix-built image. Same config, same guest, every time.
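To make the "filtered" egress mode concrete, here is an illustrative sketch of the allowlist check at its core. This is not nixbox's actual implementation; the `ALLOWED` set and `is_allowed` helper are hypothetical names, and a real deployment would enforce this at the DNS resolver rather than in application code.

```python
# Hypothetical sketch of a DNS allowlist check for "filtered" mode.
# Only approved domains (and their subdomains) are considered resolvable.
ALLOWED = {"github.com", "pypi.org"}  # example allowlist, not nixbox defaults

def is_allowed(name: str) -> bool:
    """Allow an exact allowed domain or any subdomain of one."""
    name = name.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in ALLOWED)
```

Note the suffix match requires a leading dot, so `notgithub.com` does not sneak through as a "subdomain" of `github.com`.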
The usage is something like this:
nixbox up
nixbox run "cd ~/workspace/myproject && claude --dangerously-skip-permissions -p 'fix tests'"
nixbox down
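The "scoped secrets" idea from the list above can be sketched in a few lines. This is a hypothetical helper, not nixbox code: instead of letting the guest inherit the whole host shell environment, you construct a minimal environment containing only the secrets you explicitly name.

```python
# Illustrative sketch (hypothetical helper, not nixbox internals): build a
# minimal environment containing only explicitly named secrets.
def scoped_env(secret_names, source):
    env = {"PATH": "/usr/bin:/bin"}  # minimal base; nothing else leaks in
    env.update({name: source[name] for name in secret_names if name in source})
    return env
```

You would pass the result as the environment of the guest launcher (e.g. `env=` on `subprocess.run` with `source=os.environ`), so the agent sees only what it was handed.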
No permission prompts. No classifier. Full autonomy inside a box the agent can’t escape from. --dangerously-skip-permissions stops being dangerous when there’s nothing dangerous to reach.
“Isn’t a VM overkill?”
That was my assumption too. It’s not. Cloud-hypervisor boots in seconds. Balloon memory means the guest only uses what it needs and returns the rest. virtiofs gives shared filesystem access without the overhead of network mounts. Feels like opening a terminal, not spinning up a second machine.
Compare the real costs: configuring permissions, hooks, allow/deny lists, trust levels, debugging why the agent got blocked mid-task... vs. nixbox run ….
The way to think about it
When you hire a contractor, you don’t hand them a 47-page list of forbidden tools and stand behind them checking every move. You put them in the right room, with the right materials, and let them work.
Auto mode is the 47-page list. A VM is the room.