We Sandboxed Every AI Agent. Here's What We Learned (And What You Should Learn Too)


One morning, my AI agent sent me a reminder to take my fish oil supplements. Specific brands, exact dosages, the time I'm supposed to take them. The problem is I never told my agent any of that. Those were someone else's supplements.

We were in the early days of building our product on OpenClaw, an open-source agent framework. We'd set up a single OpenClaw instance, created about ten agents for the team, and handed them out to colleagues. Everyone was chatting with their agents, testing workflows, having fun with it. The kind of energy you get when something genuinely new is working.

Then the supplement reminder showed up. I started digging, and the root cause was almost comically simple: all the agents were running on one shared instance. One agent couldn't deliver a message to its owner, so it used every available channel to try. That included other people's conversations. Personal health data, showing up in the wrong chat.

Other colleagues reported the same thing. Reminders about appointments they didn't have, tasks they didn't set. One agent's context was leaking into another's conversations because the underlying system treated them all as connected.

That was our wake-up call. Not a theoretical security paper, not a compliance requirement. A real data leak, in the first week, among friends.

What we tried before sandboxing

Our first instinct was communication. We wrote detailed instructions in Markdown (MD) configuration files telling agents what they could and couldn't do. Think of it like giving a child a long list of house rules. Don't go into other people's rooms. Don't share personal information. Don't touch anything you didn't create.

It worked, sort of. For a while. Agents mostly followed the rules. But context windows compact over time, sessions restart, and the rules fade. An agent that respected boundaries on Monday might cross them on Thursday because the instruction had been pushed out of its working memory.

We also realized a second problem that the rules approach couldn't touch: users can break their own agents. When someone's chatting with an AI that has file system access and the ability to modify code, it takes one casual request to accidentally corrupt the configuration that makes the whole thing run. A SaaS product where the customer can brick their own agent with one message isn't a product anyone will keep using.

MD files and good behavior instructions weren't going to cut it. We needed actual walls, not suggestions.

Why containers, not VMs

We briefly considered giving each client their own virtual machine. Full isolation, total separation. The math killed it fast. Supporting thousands of VMs with individual maintenance, patching, and monitoring is a different business entirely. We needed isolation that could scale without requiring a dedicated ops team per hundred clients.

The turning point was a comment from Peter, OpenClaw's creator. He put it simply: "OpenClaw is not a bus carrying lots of different passengers. If you need isolation and multiple owners, dockerize." That settled the debate for us.

Containers were the answer. One isolated Docker container per client, each with its own file system, its own process space, its own resource limits. The agent sees /workspace and nothing else. No /home, no environment variables with API keys, no access to the host. If you ask the agent what's in the root directory, it shows you an empty workspace. That's the entire world from its perspective.
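To make the idea concrete, here is a minimal sketch of how a per-client `docker run` command could be assembled. It is not our production setup: the function name, the `agent-<client_id>` naming, the `agents-internal` network, and the limit values are all assumptions for illustration. The Docker flags themselves are standard CLI options.

```python
import re
from pathlib import PurePosixPath


def docker_run_args(client_id: str, workspace_root: PurePosixPath, image: str) -> list[str]:
    """Build a `docker run` command for one client's agent container (illustrative)."""
    # Reject anything that could escape the per-client directory (e.g. "../other").
    if not re.fullmatch(r"[a-z0-9-]+", client_id):
        raise ValueError(f"invalid client id: {client_id!r}")

    workspace = workspace_root / client_id
    return [
        "docker", "run", "--detach",
        "--name", f"agent-{client_id}",          # hypothetical naming scheme
        # Root file system is read-only; only the mounted /workspace is writable.
        "--read-only",
        "--volume", f"{workspace}:/workspace",
        "--workdir", "/workspace",
        # Placeholder resource limits so one client cannot starve the others.
        "--memory", "512m",
        "--cpus", "0.5",
        "--pids-limit", "128",
        # Drop Linux capabilities and block privilege escalation.
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        # Hypothetical internal-only network; no --env flags, so no keys go in.
        "--network", "agents-internal",
        image,
    ]


if __name__ == "__main__":
    print(" ".join(docker_run_args("alice", PurePosixPath("/srv/agents"), "agent-image:latest")))
```

The function only builds the argument list, so it can be unit-tested without Docker installed and then handed to `subprocess.run` by whatever provisions containers.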

But a container alone wasn't enough. We ended up building seven layers of isolation in total. The container is the most visible one, but the others handle things like credential separation, network boundaries, and media access control. I won't detail every layer here, but the point is that if any single one fails, the others still hold.

One specific OpenClaw setting made the difference between a sandbox and a decoration: tools.elevated.enabled. By default it is true, which lets agents run code on the host machine. That's powerful for development but catastrophic in a multi-tenant environment. We set it to false for every client container. With the flag left on, an agent can reach past the container, and the other isolation measures count for very little.
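A setting this important is worth checking at startup rather than trusting. The sketch below assumes the configuration has already been loaded into a nested dictionary whose keys mirror the dotted name tools.elevated.enabled. That shape, and both function names, are assumptions for illustration, not OpenClaw's documented API.

```python
def elevated_tools_disabled(config: dict) -> bool:
    """Return True only if tools.elevated.enabled is explicitly the boolean False."""
    node = config
    for key in ("tools", "elevated", "enabled"):
        # A missing key means "use the default", which is the unsafe value here.
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is False


def assert_safe_for_tenants(config: dict) -> None:
    """Refuse to start a client container with elevated tools left on."""
    if not elevated_tools_disabled(config):
        raise RuntimeError("tools.elevated.enabled must be explicitly false")


if __name__ == "__main__":
    assert_safe_for_tenants({"tools": {"elevated": {"enabled": False}}})
    print("config is safe for multi-tenant use")
```

The check fails closed: a missing key, a typo in the path, or a truthy value all count as unsafe.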

Agents should never hold real keys

Here's the architectural decision I'm most convinced was right: our agents don't make external API calls. Not to language model providers, not to media generation services, not to messaging platforms. None of it.

A separate service sits outside the sandbox and handles all external communication. When the agent needs to send a Telegram message or generate an image, it talks to an internal endpoint. That endpoint validates the request, injects the real credentials, makes the actual API call, and returns the result. The agent never sees a single API key.

We call these tokenized routes. The agent gets a capability (send a message, generate media) without getting the credentials that make it possible. Even if someone manages to get the agent to dump its entire environment, there's nothing useful there.
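In code, the pattern could look something like the sketch below. It is a simplified, in-process stand-in for a service that would really sit behind an internal HTTP endpoint. `Route`, `Broker`, the token format, and the `call_upstream` callback are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Route:
    capability: str                 # e.g. "send_message"
    secret_name: str                # which credential the broker injects
    allowed_fields: frozenset[str]  # payload keys the agent may supply


class Broker:
    """Runs outside the sandbox; the only component that holds real credentials."""

    def __init__(self, secrets: dict[str, str],
                 call_upstream: Callable[[str, str, dict], dict]) -> None:
        self._secrets = secrets
        self._call_upstream = call_upstream
        self._routes: dict[str, Route] = {}

    def register(self, token: str, route: Route) -> None:
        self._routes[token] = route

    def handle(self, token: str, payload: dict) -> dict:
        # 1. Validate: the token must map to a known capability.
        route = self._routes.get(token)
        if route is None:
            return {"ok": False, "error": "unknown route token"}
        # 2. Validate: reject payload fields the capability does not allow.
        extra = set(payload) - route.allowed_fields
        if extra:
            return {"ok": False, "error": f"unexpected fields: {sorted(extra)}"}
        # 3. Inject the real credential and make the call on the agent's behalf.
        secret = self._secrets[route.secret_name]
        result = self._call_upstream(route.capability, secret, payload)
        # 4. Return only the result; the credential never travels back.
        return {"ok": True, "result": result}
```

The agent side holds only the opaque route token. Dumping the agent's environment reveals that token at most, and it is useless anywhere except against the broker.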

We learned this one the hard way too. Early on, a tester asked their agent for the language model API key. The agent shared it happily. It had access, there was no rule saying it couldn't, so it did. After that, we redesigned the entire approach so secrets never enter the sandbox in the first place.

For comparison: I signed up for another platform that offers AI agents, created an agent, and asked it for its API keys. It refused, which was good. Then I asked it to archive everything visible in its root directory and send me the file. The ZIP arrived within seconds. Inside were working OpenRouter API keys. Their agent followed the instruction not to share keys when asked directly, but didn't understand that zipping the file system and sending the archive bypassed that restriction entirely. The keys were sitting in plaintext config files the agent could read and package.
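A cheap way to catch this class of problem is to scan everything the agent can read, because anything readable can also be zipped and sent. The sketch below is a heuristic audit, not a substitute for keeping secrets out of the sandbox. The function name and both patterns are assumptions, and real key formats vary by provider.

```python
import re
from pathlib import Path

# Illustrative patterns only; tune them for the providers you actually use.
SECRET_PATTERNS = [
    # Assignments such as OPENROUTER_API_KEY=... or "api_key": "..."
    re.compile(r"(?i)(api[_-]?key|secret|token)\w*\s*[\"']?\s*[:=]\s*\S+"),
    # Long strings with an "sk-" prefix, a common API key shape.
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),
]


def find_secret_like_files(root: Path) -> list[Path]:
    """Return files under `root` whose contents look like they contain credentials."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable to us; skip rather than crash the audit
        if any(pattern.search(text) for pattern in SECRET_PATTERNS):
            hits.append(path)
    return hits
```

Run against a client's workspace mount as part of a deployment check, an empty result is the expected outcome. Any hit means a secret has made it inside the walls.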

That's the difference between rules and architecture. Rules can be circumvented by creative phrasing. Architecture can't.

Everything broke, and that was fine

When we first deployed containers, nothing worked. The agent stopped responding entirely. After we fixed the basics, it could talk again but couldn't access its long-term memory, couldn't run skills, couldn't process voice messages. Every single capability in OpenClaw was wired through paths that no longer existed inside the container.

We had to rewire everything. Every file path, every service connection, every tool integration. It took weeks. There were mornings when the temptation to just revert to the old setup was real.

But here's what keeps you going in those moments: you know the old setup leaks data. You know the rules-based approach decays over time. There's no comfortable middle ground to retreat to. The only way is through.

Once it was done, something unexpected happened. The sandbox didn't just protect clients from each other. It protected each client from themselves. Users could experiment freely with their agents, try things, make mistakes, push limits, and the worst that could happen was a restart from a clean state. It turned out that real isolation is as much an enabler as it is a constraint.

The cost of doing it right

I'll be honest about the overhead. Running each agent in its own isolated container, with all the proxy layers and seven isolation boundaries, costs roughly 2.5 times as much per session as running agents on shared infrastructure. That's real money.

I think it's a small price. One data leak in a multi-tenant AI system doesn't just affect the users involved. It destroys trust in the entire product category. We've seen it happen with other types of SaaS. For AI agents that handle personal data, calendars, emails, and voice messages, the bar has to be higher.

We introduced quality scores after the sandbox was stable. Over twenty metrics tracking agent behavior, response quality, tool usage patterns, and boundary compliance. The sandbox gave us a controlled environment where we could actually measure these things reliably.

What I'd tell another team

If you're building anything where AI agents interact with user data, sandbox from day one. Not after your first incident, not after you reach a certain scale. From the beginning.

The refactoring cost only grows over time. Every feature you build on shared infrastructure becomes a feature you later have to re-architect for isolation. We did it relatively early and it still took weeks of everything being broken. Teams that wait longer will have a much harder time.

Rules and prompts are not security. They're suggestions with a half-life measured in context windows. Real isolation means the agent physically cannot access what it shouldn't, regardless of what anyone asks it to do.

This is what we build at Amplify. Personal AI assistants that live in your messenger and handle real work, from email and calendar management to voice notes and media generation. Every client gets their own isolated environment with seven layers of protection. The agent knows your work, your preferences, your contacts. It doesn't know anyone else's.

We built Amplify on OpenClaw. If you're working on agent isolation or multi-tenant AI architecture, the framework and our approach are a good place to start.

Yevhen Fychak is CTO and co-founder of Amplify. He writes about building AI products and the engineering decisions behind them.