Juan Torchia

Posted on Apr 17 • Edited on Apr 20 • Originally published at juanchi.dev

Sandboxes for Coding Agents: What Freestyle Is and Why I Care

#english #technology #devops #claudecode

In 2005, when I was running a cyber café at 14, I got my first real lesson about processes that run unsupervised. A customer had left a script running — something that was "just downloading files," he said — and when I found it half an hour later it had consumed every last bit of the shop's bandwidth. Ten machines dead, people pissed, me with no idea where to even start. I learned that night that what you can't see executing can break everything.

I think about that every time I give Claude Code permission to make changes on a real project.

Sandboxes for coding agents: the problem nobody names clearly

There's something posts about coding agents keep avoiding saying out loud: the biggest risk isn't that they write bad code. Bad code you review, revert, fix. The real risk is execution.

An agent that writes wrong code is a quality problem. An agent that runs rm -rf in the wrong directory, or does an npm publish you never asked for, or calls an API with your cached credentials — that's a security problem. And it's a problem very few people are naming with the precision it deserves.

When I started integrating coding agents into my workflow — real projects, not demos — the first thing I did was read the logs of what was actually executing. Not because I specifically distrust Claude. Because I'm the same guy who took down a production server his first week with an rm -rf. I know exactly how much damage one command in the wrong context can do.

That's what led me to Freestyle.

What Freestyle is and what it actually solves

Freestyle landed on Hacker News recently with 188 points — a number that, to me, signals it hit a real nerve, not just hype. The pitch is straightforward: an execution sandbox for coding agents.

It's not a new concept. Sandboxes have existed in security forever. What's new is applying it specifically to the problem of coding agents that need to:

Run arbitrary code
Install dependencies
Execute tests
Possibly make HTTP requests
All of that without touching your real system

Freestyle gives you an isolated execution environment where the agent can do all of those things. If it breaks something, it breaks the sandbox. Your machine, your database, your credentials — they stay intact.

The architecture, in plain terms:

// What happens WITHOUT a sandbox (probably your current situation)
const agentRun = async (code: string) => {
  // The agent executes directly in your Node process
  // Has access to process.env (your secrets!)
  // Has access to the real filesystem
  // An npm install modifies your real node_modules
  eval(code) // oversimplified, but conceptually this is it
}

// What Freestyle proposes (conceptual shape; check the SDK docs for the real method names)
const agentRunSandboxed = async (code: string) => {
  // Each execution runs in an isolated environment
  // Ephemeral filesystem — dies with the sandbox
  // Environment variables controlled explicitly
  // Network access is configurable (you can block it entirely)
  const sandbox = await Freestyle.createSandbox({
    runtime: 'node20',
    env: {
      // Only the secrets you want to expose, nothing else
      DATABASE_URL: process.env.SANDBOX_DATABASE_URL,
    },
    network: {
      // You can allowlist specific domains
      allowedHosts: ['api.openai.com']
    }
  })

  return await sandbox.execute(code)
}

In practical terms, that's enormous.

How it maps against my current workflow

My stack today is Next.js, TypeScript, PostgreSQL on Railway, and Claude Code as my main assistant. If you want the full breakdown, it's in how I built juanchi.dev and in the post on the stack I'd choose in 2025.

When I use Claude Code in interactive mode — the one that suggests changes and applies them — there's a constant tension. I give it enough context to be useful, which includes access to the project. But that access, by definition, includes things I don't want it touching automatically.

My current solution is basically manual: I review every change before confirming, I keep git at every step, and I never run suggestions directly on the project connected to the production database. It works. But it's friction.

A sandbox like Freestyle changes that equation. Instead of me supervising every micro-action, the sandbox defines the boundaries structurally. The agent can run whatever it wants inside the sandbox. Outside the sandbox, it doesn't exist.

For someone optimizing production performance, with agents helping generate benchmarks and tests, this is the difference between "I'll let the agent try it" and "I'm scared to let the agent try it."

The common mistakes when you integrate agents without a sandbox

I'll be specific because I lived these.

Mistake 1: Credentials in the agent's context

If your agent runs in the same process as your app, it has access to process.env. All of it. DATABASE_URL, API keys, tokens. If the agent makes an HTTP request — for any reason — it could be exfiltrating those credentials. Not because it's malicious. Because the context isn't delimited.

// ClaudeAgent here is a stand-in for whatever agent wrapper you use

// ❌ This looks harmless but it isn't
const unscopedAgent = new ClaudeAgent({
  cwd: process.cwd(), // Has access to the entire project
  // process.env is implicitly available
})

// ✅ Be explicit about what it has and what it doesn't
const scopedAgent = new ClaudeAgent({
  cwd: '/tmp/sandbox-workspace', // Isolated directory
  env: {
    NODE_ENV: 'test',
    // Only what it needs for the specific task
  }
})
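One way to make "only what it needs" concrete is to build the agent's environment from an allowlist instead of handing over the host's. This is a minimal sketch: `buildAgentEnv` is a helper name I made up, and the values below are placeholders, not real secrets.

```typescript
// Hypothetical helper: copy only allowlisted variables into the agent's environment.
type Env = Record<string, string | undefined>;

function buildAgentEnv(source: Env, allowlist: readonly string[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of allowlist) {
    const value = source[key];
    // Skip variables that are not set instead of passing undefined through.
    if (value !== undefined) {
      result[key] = value;
    }
  }
  return result;
}

// Fake host environment with placeholder values.
const hostEnv: Env = {
  NODE_ENV: "development",
  DATABASE_URL: "postgres://user:pass@prod.db.example.com:5432/app",
  OPENAI_API_KEY: "sk-placeholder",
  PATH: "/usr/bin",
};

const agentEnv = buildAgentEnv(hostEnv, ["PATH", "NODE_ENV"]);
console.log(Object.keys(agentEnv)); // DATABASE_URL and OPENAI_API_KEY are not copied
```

In Node you would pass `process.env` as `source` and hand the result to the `env` option of `child_process.spawn`, which replaces the inherited environment instead of extending it.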

Mistake 2: npm install without control

An agent that can install packages can install anything. There are npm packages with malicious code that executes at install time. If the agent runs on your machine, that code also runs on your machine.
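If you can't isolate installs yet, you can at least stop lifecycle scripts from running. The sketch below assumes your agent's npm calls go through a wrapper you control; `hardenNpmArgs` is a hypothetical name, while `--ignore-scripts` is a real npm flag. It only covers install-time scripts: a malicious package can still run code when it's imported, and packages that need a build step at install time will break.

```typescript
// npm subcommands that download and install packages.
const INSTALL_SUBCOMMANDS = new Set(["install", "i", "ci"]);

// Hypothetical wrapper: append --ignore-scripts to install commands only.
function hardenNpmArgs(args: readonly string[]): string[] {
  const subcommand = args[0];
  if (subcommand === undefined || !INSTALL_SUBCOMMANDS.has(subcommand)) {
    return [...args]; // not an install command, leave it alone
  }
  if (args.includes("--ignore-scripts")) {
    return [...args]; // already hardened
  }
  return [...args, "--ignore-scripts"];
}

console.log(hardenNpmArgs(["install", "left-pad"]).join(" "));
console.log(hardenNpmArgs(["run", "test"]).join(" "));
```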

Mistake 3: Trusting that the agent "will ask for permission"

Some agents have confirmation mechanisms. Great. But that's UI, not security. Security has to live in system-level isolation, not in the model's good intentions.

Mistake 4: Mixing your dev database with production

This always applies, but with agents it becomes critical. If the agent has access to your production connection string — even "just to read" — you're one mistake away from something very expensive. My TypeScript patterns include specific helpers to separate these contexts, but a sandbox solves it at the infrastructure level.
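Until the infrastructure enforces that separation, a cheap guard is to refuse to start an agent session when the connection string points at a known production host. This is a sketch, not the helpers from my own codebase: `isProductionDatabase`, `assertSafeForAgent`, and the hostnames are all made up for illustration.

```typescript
// Hypothetical guard: compare the connection string's host against known production hosts.
function isProductionDatabase(connectionString: string, productionHosts: readonly string[]): boolean {
  // The built-in URL parser handles postgres:// style connection strings.
  const hostname = new URL(connectionString).hostname.toLowerCase();
  return productionHosts.some((host) => host.toLowerCase() === hostname);
}

function assertSafeForAgent(connectionString: string, productionHosts: readonly string[]): void {
  if (isProductionDatabase(connectionString, productionHosts)) {
    // Report only the host: the full string contains credentials.
    const hostname = new URL(connectionString).hostname;
    throw new Error(`Refusing to give an agent access to production host ${hostname}`);
  }
}

const productionHosts = ["prod.db.example.com"];
console.log(isProductionDatabase("postgres://user:pass@localhost:5432/app_dev", productionHosts));
console.log(isProductionDatabase("postgres://user:pass@prod.db.example.com:5432/app", productionHosts));
```

A host list like this is only as good as its upkeep, which is exactly why I'd rather have the boundary enforced by the sandbox than by a check in my code.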

What gives me pause about Freestyle (the critical part)

With all that said — and I mean it genuinely because I believe in the problem it's solving — there are things that still make me ask questions.

The cold start is real. Creating a sandbox per execution has latency. For interactive flows where the agent does many small iterations, that latency compounds. Freestyle mentions startup optimizations, but I haven't measured it in a real workflow yet.

Production pricing is still an open question. Sandboxes as a service have non-trivial infrastructure costs. If your agent does 50 executions per task, per-execution pricing scales very differently from local execution, where an extra run costs you nothing beyond your own machine's time.
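To put rough numbers on how startup latency compounds, here's the back-of-the-envelope math. The 400 ms cold start is a number I invented, not a Freestyle measurement, and reusing one sandbox for several executions is my assumption, not a documented feature; it would also give up some isolation between runs.

```typescript
// Hypothetical estimate: total cold-start time added to one agent task.
function estimateColdStartOverheadMs(
  executionsPerTask: number,
  coldStartMs: number,
  executionsPerSandbox: number = 1,
): number {
  if (executionsPerSandbox < 1) {
    throw new RangeError("executionsPerSandbox must be at least 1");
  }
  // Each new sandbox pays the cold start once.
  const sandboxesNeeded = Math.ceil(executionsPerTask / executionsPerSandbox);
  return sandboxesNeeded * coldStartMs;
}

// 50 executions per task, one fresh sandbox each, invented 400 ms cold start.
console.log(estimateColdStartOverheadMs(50, 400) / 1000, "seconds of waiting");
// Same task if each sandbox could be reused for 10 executions.
console.log(estimateColdStartOverheadMs(50, 400, 10) / 1000, "seconds of waiting");
```

Swap in a cold start you measured yourself before drawing any conclusions from this.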

Integration with Claude Code specifically isn't clearly documented. Or at least I didn't find it when I went looking. For my main workflow, I need to know how this connects to the agent I'm already using, not a new one.

It doesn't solve the output problem. The sandbox isolates execution, but the code the agent produces still ends up in your codebase. The sandbox saves you from damage during the process; code review saves you from damage in the result. You need both.

What I'd do differently: before adopting Freestyle as a primary dependency, I'd first build a basic sandbox myself with Docker — something I covered in the Docker for Node.js post — to understand the trade-offs before delegating that responsibility to an external service.
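For the Docker route, this is roughly where I'd start. It's a minimal sketch, not a hardened setup: `dockerRunArgs` and its options are names I invented, the limits are arbitrary, and containers share the host kernel, so this is weaker isolation than a VM. The flags themselves are standard `docker run` options.

```typescript
interface DockerSandboxOptions {
  image: string;                 // e.g. "node:20-slim"
  workspaceDir: string;          // absolute host path mounted at /workspace
  env?: Record<string, string>;  // only the variables the task needs
  allowNetwork?: boolean;        // defaults to no network at all
  memory?: string;               // docker memory limit, e.g. "512m"
}

// Hypothetical helper: build the argument list for a locked-down `docker run`.
function dockerRunArgs(options: DockerSandboxOptions, command: readonly string[]): string[] {
  const args = [
    "run", "--rm",                              // remove the container when it exits
    "--cap-drop", "ALL",                        // drop Linux capabilities
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",                      // arbitrary cap on process count
    "--memory", options.memory ?? "512m",
    "--read-only",                              // read-only root filesystem
    "--tmpfs", "/tmp",                          // writable scratch space in memory
    "--volume", `${options.workspaceDir}:/workspace`,
    "--workdir", "/workspace",
  ];
  if (!options.allowNetwork) {
    args.push("--network", "none");
  }
  for (const [key, value] of Object.entries(options.env ?? {})) {
    args.push("--env", `${key}=${value}`);
  }
  args.push(options.image, ...command);
  return args;
}

const args = dockerRunArgs(
  { image: "node:20-slim", workspaceDir: "/tmp/sandbox-workspace", env: { NODE_ENV: "test" } },
  ["node", "script.js"],
);
console.log(["docker", ...args].join(" "));
```

You'd pass that array to `child_process.spawn("docker", args)`. With the network off, `npm install` can't fetch anything, so dependencies need their own step with `allowNetwork: true`, and that's exactly the kind of trade-off I want to feel before paying someone else to manage it.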

FAQ: Sandboxes for coding agents

What exactly is a sandbox for coding agents?
It's an isolated execution environment where a coding agent can run commands, install dependencies, and execute scripts without accessing the host system. Think of it as an ephemeral VM: everything that happens inside, dies inside. Your real filesystem, your environment variables, and your database aren't accessible unless you explicitly allow it.

Why isn't running the agent in Docker locally enough?
Docker is a valid option and it's what many of us do today. The specific value of Freestyle is that it abstracts the sandbox infrastructure and exposes it as an API, with lifecycle management, configurable networking, and support for multiple runtimes. Local Docker works, but it has setup friction, and its defaults aren't locked down: a container gets outbound network access unless you restrict it yourself (for example with --network none).

Is Freestyle open source or a service?
It's a service with an SDK. The SDK is open source; the infrastructure running the sandboxes is managed. Similar model to Vercel with Next.js: you can run it yourself, but the managed service is the natural entry point.

Does it work with any coding agent or only specific ones?
Conceptually it works with any agent that needs to execute code — Claude Code, GPT Engineer, Devin, or your own custom agent. The concrete integration depends on how your agent handles execution. If the agent calls a subprocess or an execution API, you can redirect those calls to Freestyle. If the agent has a very tightly coupled execution model, the integration gets more complex.
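The "redirect those calls" part is easier if the agent never spawns processes directly and goes through one interface instead. Here's a sketch of that seam. Every name in it is hypothetical, and `RecordingSandboxExecutor` is a fake that only records commands: a real one would forward them to Docker or to a sandbox service.

```typescript
interface ExecutionResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// The single seam through which the agent runs commands.
interface CommandExecutor {
  run(command: string, args: readonly string[]): Promise<ExecutionResult>;
}

// Fake sandbox-backed executor: records what it was asked to run and reports success.
class RecordingSandboxExecutor implements CommandExecutor {
  readonly calls: string[] = [];

  async run(command: string, args: readonly string[]): Promise<ExecutionResult> {
    this.calls.push([command, ...args].join(" "));
    return { exitCode: 0, stdout: "", stderr: "" };
  }
}

// Agent-side code depends only on the interface, not on where commands execute.
async function runAgentStep(
  executor: CommandExecutor,
  command: string,
  args: readonly string[],
): Promise<boolean> {
  const result = await executor.run(command, args);
  return result.exitCode === 0;
}

const sandbox = new RecordingSandboxExecutor();
runAgentStep(sandbox, "npm", ["test"]).then((ok) => {
  console.log(ok, sandbox.calls);
});
```

If your agent's execution is this loosely coupled, swapping the backend is a one-line change. If it isn't, this is the refactor that the "more complex" integration consists of.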

Does a sandbox solve all the security problems of coding agents?
No. The sandbox solves the problem of unsupervised execution. It doesn't solve: malicious code the agent produces and you deploy, prompt injection if the agent processes external input, or what you do with the sandbox output once you have it. It's a security layer, not complete security.

Is it worth it for personal projects or only for teams?
Depends on how much you use coding agents. If you use Claude Code or similar occasionally for code suggestions that you apply manually, you probably don't need a formal sandbox. If you have agents running autonomously — making commits, running tests, installing deps — a sandbox stops being nice-to-have and becomes necessary. For my current usage it's on the edge. When I move to more autonomous workflows, I'm going straight to the sandbox.

The right fear

The cyber café taught me that the problem isn't what you see running — it's what runs while you're not watching.

Coding agents are incredibly useful. I use them every day and I wouldn't go back. But there's a difference between using them as smart autocomplete and using them as autonomous agents with access to your real environment. That difference matters.

Freestyle is pointing at the right problem. The sandbox isn't paranoia — it's the same principle that led me to have separate databases per environment, to use secrets managers instead of .env in production, and to never run third-party code with more permissions than necessary. Principles that anyone who's worked in real infrastructure understands viscerally.

What I'd do today: if you're already using agents on real projects, check what access they actually have to your environment. If they have access to your full process.env, to your real database, or they run in the same process as your app, you effectively have no sandbox at all. Start there, with Docker or with Freestyle, before the problem becomes more than theoretical.
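A quick way to start that check is to list which variable names in the agent's environment look like secrets. This is a rough heuristic I'm sketching here, not a real audit: `findSensitiveKeys` and the pattern are my own invention, and a secret with an innocent name will slip straight through.

```typescript
// Name patterns that usually indicate a secret. Heuristic only.
const SENSITIVE_NAME = /(SECRET|TOKEN|PASSWORD|PASSWD|CREDENTIAL|API_?KEY|PRIVATE_KEY|DATABASE_URL)/i;

// Hypothetical helper: names of set variables that look sensitive, sorted.
function findSensitiveKeys(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((key) => env[key] !== undefined && SENSITIVE_NAME.test(key))
    .sort();
}

// Placeholder environment standing in for what the agent process can see.
const visibleToAgent = {
  NODE_ENV: "test",
  PATH: "/usr/bin",
  DATABASE_URL: "postgres://user:pass@localhost:5432/app_dev",
  STRIPE_SECRET_KEY: "sk_test_placeholder",
};

console.log(findSensitiveKeys(visibleToAgent)); // the two secret-looking names
```

Run it against the real `process.env` from inside the process your agent uses. If the list isn't empty, you know where to start.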

Next.js App Router taught me that sometimes you spend two weeks angry at the right abstraction. I don't want to repeat that mistake with agent sandboxes.