DEV Community

rednakta

Zero Token Architecture: Why Your AI Agent Should Never See Your Real API Key

Hot take: every AI agent security guide I've read is solving the wrong problem.

We spend hours sandboxing the runtime. We lock down the filesystem. We audit every package. We wrap the agent in Docker, then wrap Docker in a VM, then wrap the VM in policy.

And then we hand the agent a plaintext API key and call it secure.

Stop protecting the token. Just don't hand it over.

TL;DR

  • Prompt injection + arbitrary package execution means any token your AI agent can see is a token it can leak.
  • Instead of protecting the token after the agent has it, pass the agent a fake token whose value equals its own name.
  • Intercept the agent's outbound API call at the boundary and swap in the real token there.
  • If the fake leaks, the attacker gets a useless string. The real token never leaves your trusted process.

The problem with "protect the token"

Here's what an AI agent's environment typically looks like:

OPEN_API_TOKEN=sk-proj-1a2b3c4d5e...

That's a real, working key. The agent reads it, puts it in an Authorization: Bearer header, and makes calls. Fine — until any of these happen:

  • Prompt injection convinces the agent to echo $OPEN_API_TOKEN into its next response.
  • A malicious npm/pip package the agent installed reads process.env and POSTs it to a server far, far away.
  • The agent writes a log file that happens to include the header it just sent.
  • A tool call returns the token because the model decided it would be helpful.
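To see how little the attacker needs, here's a minimal sketch of what the second failure mode effectively does once malicious code runs inside the agent's process. The token value and the filter are illustrative; a real package would do this silently on import and POST the result somewhere.

```python
import os

# Simulate the agent's environment holding a real key.
os.environ["OPEN_API_TOKEN"] = "sk-proj-1a2b3c4d5e"

def exfiltrate_env() -> dict:
    """What a malicious dependency does: sweep the environment for secrets.
    (A real attack would POST this payload out; here we only build it.)"""
    return {
        k: v
        for k, v in os.environ.items()
        if "TOKEN" in k or "KEY" in k or "SECRET" in k
    }

payload = exfiltrate_env()
# The real secret is now in attacker-controlled data, one HTTP request away.
```

No sandbox misconfiguration required: the code is inside the process, so the process's secrets are its secrets.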

Every mitigation we reach for — sandboxes, permission prompts, egress filtering, audit logs — is downstream of the mistake. The mistake is that the secret exists inside a process we do not trust.

You cannot perfectly contain a value inside a process that runs arbitrary, model-generated code. You just can't. So stop trying.

The paradigm flip

Ask a different question:

What if the agent never had the real token in the first place?

This sounds impossible, because API calls need tokens. But the agent doesn't need the real token — it just needs the call to succeed. If something else substitutes the real token on the way out, the agent's world is unchanged.

That something else is a tiny proxy sitting between your agent and the upstream LLM. Let's call it the boundary.

Before

# In the agent's environment
OPEN_API_TOKEN=sk-proj-1a2b3c4d5e...

The real token sits inside the agent. Compromise the agent, compromise the token.

After

# In the agent's environment
OPEN_API_TOKEN=OPEN_API_TOKEN

That's not a typo. The variable's value is its own name. The agent reads it, builds Authorization: Bearer OPEN_API_TOKEN, sends the request. It has no idea anything is weird.

The boundary intercepts the outbound call, recognizes the placeholder, swaps in the real token (which lives encrypted, outside the agent's reach), and forwards the request upstream.
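In code, the swap is a one-line header rewrite. A minimal sketch, assuming a hypothetical secret store outside the agent's process (the names here are illustrative, not any real library's API):

```python
# Hypothetical secret store; in practice this lives encrypted,
# outside the agent's reach.
REAL_TOKENS = {"OPEN_API_TOKEN": "sk-proj-1a2b3c4d5e"}

def rewrite_auth(header: str) -> str:
    """Swap a placeholder bearer token for the real one at the boundary.
    Refuses anything that isn't the expected placeholder."""
    scheme, _, value = header.partition(" ")
    if scheme != "Bearer" or value not in REAL_TOKENS:
        raise PermissionError("unexpected credential; refusing to forward")
    return f"Bearer {REAL_TOKENS[value]}"

# The agent built this header from OPEN_API_TOKEN=OPEN_API_TOKEN:
outbound = rewrite_auth("Bearer OPEN_API_TOKEN")
```

Note the side effect of matching on the placeholder: a leaked real token, replayed through the boundary, is rejected too, because the boundary only accepts the name.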

┌───────────┐   OPEN_API_TOKEN   ┌──────────┐   sk-proj-real   ┌──────┐
│  Agent    │  ───────────────▶  │ Boundary │  ──────────────▶ │ LLM  │
└───────────┘                    └──────────┘                  └──────┘
     ▲                                                              │
     │                         response                             │
     └──────────────────────────────────────────────────────────────┘

From the agent's perspective: totally normal request, totally normal response. From the attacker's perspective, there's nothing worth stealing.

The hacker scenario

Let's pretend the worst happened. Prompt injection, malicious dependency, whatever — the attacker exfiltrates everything in the agent's environment.

Old world:

OPEN_API_TOKEN=sk-proj-1a2b3c4d5e...

Game over. Billable incidents. Rotation storm. PagerDuty at 3am.

New world:

OPEN_API_TOKEN=OPEN_API_TOKEN

Congratulations, they got a string. They can't call the LLM with it. They can't charge your account with it. They can't even prove which vendor it was for without extra context.

The leak still happened. We simply made the leaked value worthless.

This is the same logic as a one-time password or an attenuated macaroon: assume the secret will escape, and design so that the escaped value gains the attacker nothing and costs you nothing.

Why this matters right now

Three trends collide:

  1. Agents are running untrusted code. Tool use, code interpreters, and "install this skill" flows mean agent processes routinely execute arbitrary inputs.
  2. Prompt injection is not solved. It's not going to be solved by a better system prompt. Treat agent processes as adversarial, always.
  3. Tokens are expensive. A leaked OpenAI or Anthropic key is not just a credential breach, it's a bill.

Every AI agent stack I see ships with the real token in an env var because that's how twelve-factor apps work. Agents aren't twelve-factor apps. They're sandboxes for arbitrary model output, except the sandbox is "a language model promised to be careful."

The fix isn't a better sandbox. The fix is not putting the secret in the sandbox in the first place.

How to apply this

If you're rolling your own agent harness:

  • Put a local HTTP proxy between your agent and any upstream API.
  • Give the agent a placeholder token (KEY=KEY works fine).
  • Store the real secret outside the agent's process — OS keychain, a separate daemon, whatever.
  • In the proxy, match on the placeholder and substitute the real bearer before forwarding.
  • Refuse to forward requests that didn't come through the expected placeholder — this also catches agents trying to call arbitrary URLs.
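Wired together, the whole boundary fits on a page. Here's a sketch using only the Python standard library; the port, upstream URL, and token values are assumptions for illustration (in practice the real token would be loaded from a keychain or a separate daemon, never hardcoded):

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api.example.com"       # assumed upstream API base
PLACEHOLDER = "Bearer OPEN_API_TOKEN"      # what the agent sends
REAL_TOKEN = "Bearer sk-proj-1a2b3c4d5e"   # in practice: fetched from a keychain/daemon

def is_expected(auth_header):
    """Only requests carrying the placeholder may cross the boundary."""
    return auth_header == PLACEHOLDER

class Boundary(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_expected(self.headers.get("Authorization")):
            self.send_error(403, "unexpected credential; refusing to forward")
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers={
                "Authorization": REAL_TOKEN,  # the swap happens here
                "Content-Type": self.headers.get("Content-Type", "application/json"),
            },
        )
        with urllib.request.urlopen(req) as resp:
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(resp.read())

# To run: HTTPServer(("127.0.0.1", 8080), Boundary).serve_forever()
```

Point the agent's base URL at `127.0.0.1:8080` and it never needs to know the boundary exists. The 403 on anything but the placeholder is what catches agents trying to call arbitrary URLs with arbitrary credentials.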

If you'd rather not build this yourself, this idea is the spine of nilbox, an open-source desktop runtime for AI agents. It bundles the proxy, VM isolation, and an encrypted token store so any agent you install can't see your keys — even if it wants to. The full write-up lives in the Zero Token Architecture docs.

The takeaway

The whole security conversation around AI agents is framed as "how do we protect the token we gave the agent?" That's the wrong question.

The right question is: why did we give it a token at all?

If the agent never had it, the agent can't leak it. Everything else is downstream.
