DEV Community

Hoàn Lương

We built a scripting language just for AI agents. Here's why.

One of our AI agents deleted a directory it was never supposed to touch. The Python it wrote was valid. The model was confident. It did the wrong thing.

The agent was only supposed to query a database. But we gave it a full Python runtime, so it had access to os, shutil, everything. That's when we realized the problem wasn't the model — it was us handing it way too much power.

Why sandboxing is harder than it looks

The usual options aren't great:

  • Full runtime (Python/Node.js): easy to set up, hard to lock down properly. Restricting it after the fact is whack-a-mole.
  • Docker per agent: proper isolation, but ~200ms cold start and 100MB+ RAM each. At 50 concurrent agents that's 5GB just idling.

We wanted something lighter. Not "restricted Python" — something designed from scratch for how AI actually writes code.

AI code has a specific profile

After running a lot of agent scripts in production, the pattern is pretty consistent:

  • Under 100 lines almost always
  • Runs frequently, not once
  • Doesn't need filesystem, network, or OS access
  • Tends to produce infinite loops, wrong types, null accesses

General-purpose languages aren't built for this. So we built Autolang — a small scripting VM where AI can only call functions you explicitly registered. Nothing else is reachable.

How it works

AI writes Autolang script
    → static compiler validates types and scope
        → your registered JS / C++ functions do the actual work

You wrap your existing functions as bindings. The AI calls those. That's it. It can't reach outside what you've registered.
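The "only what you registered is reachable" idea can be sketched in a few lines of plain JavaScript. This is not Autolang's implementation or API; `createBindingRegistry`, `register`, and `call` are hypothetical names used only to illustrate an allowlist of host functions.

```javascript
// Hypothetical sketch of an allowlist-style binding registry.
// Not Autolang's actual API; all names here are illustrative.
function createBindingRegistry() {
  // A Map avoids accidental prototype lookups such as "constructor".
  const bindings = new Map();

  return {
    // The host decides which functions exist at all.
    register(name, fn) {
      if (typeof fn !== "function") {
        throw new TypeError(`Binding "${name}" must be a function`);
      }
      bindings.set(name, fn);
    },
    // Scripts can only reach functions by registered name.
    call(name, ...args) {
      const fn = bindings.get(name);
      if (fn === undefined) {
        // Anything the host did not register is unreachable.
        throw new Error(`Unknown binding: ${name}`);
      }
      return fn(...args);
    },
  };
}

const registry = createBindingRegistry();
registry.register("get_products", () => [
  { name: "Notebook", price: 12, inStock: true },
]);
```

With this shape, `registry.call("get_products")` returns the product list, while a call to any name that was never registered throws instead of falling through to the host runtime.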

Here's a real example — register a database binding:

compiler.registerBuiltInLibrary("company/products", `
  class Product (val name: String, val price: Int, val inStock: Bool)
  class Database {
    @native("get_products")
    static func get_products(): Array<Product>
  }
`, { autoImport: true }, {
  "get_products": () => fetchFromYourDB()
})

The AI then writes something like:

@import("company/products")

val affordable = Database.get_products()
  .filter {|p| p.inStock && p.price <= 30 }

affordable.forEach {|p| println("- ${p.name}: $${p.price}") }

It can't touch anything outside company/products. If it writes an infinite loop, the opcode limit kills it before it hangs your process.
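To make the opcode limit concrete, here is a toy interpreter with an instruction budget. It is a minimal sketch under our own assumptions, not Autolang's VM: the `run` function, the `add` and `jump` instructions, and `BudgetExceededError` are all made up for illustration.

```javascript
// Hypothetical sketch of an instruction budget. Not Autolang's VM.
class BudgetExceededError extends Error {}

// Runs a tiny program of { op, arg } instructions with a step limit.
function run(program, maxSteps) {
  let pc = 0;    // program counter
  let acc = 0;   // single accumulator register
  let steps = 0; // instructions executed so far

  while (pc < program.length) {
    // Check the budget before executing each instruction.
    if (++steps > maxSteps) {
      throw new BudgetExceededError(`Stopped after ${maxSteps} instructions`);
    }
    const { op, arg } = program[pc];
    switch (op) {
      case "add":
        acc += arg;
        pc++;
        break;
      case "jump":
        pc = arg; // unconditional jump, enough to build a loop
        break;
      default:
        throw new Error(`Unknown op: ${op}`);
    }
  }
  return acc;
}

// A program that never terminates on its own.
const infiniteLoop = [
  { op: "add", arg: 1 },
  { op: "jump", arg: 0 },
];
```

Calling `run(infiniteLoop, 1000)` throws `BudgetExceededError` instead of spinning forever, so the host process can catch the error and move on to the next script.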

The numbers

                    Native    npm
Cold start          ~10ms     ~20ms
Warm start          1–2ms     2–4ms
RAM per instance    ~4MB      ~12MB

50 concurrent agents: ~200MB total at ~4MB each (native), or ~600MB at ~12MB each (npm). Docker would be 5GB+.
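The totals are just per-instance memory multiplied by the number of agents. A quick sketch of that arithmetic, using the rounded figures quoted in this post (the `totalMB` helper is only for illustration):

```javascript
// Per-instance RAM in MB, using the rounded figures quoted in this post.
const perInstanceMB = { native: 4, npm: 12, docker: 100 };

// Total idle memory for a given number of concurrent agents.
function totalMB(perInstance, agents) {
  return perInstance * agents;
}

const agents = 50;
for (const [name, mb] of Object.entries(perInstanceMB)) {
  console.log(`${name}: ~${totalMB(mb, agents)} MB for ${agents} agents`);
}
```

Since Docker's figure is "100MB+" per container, its total is a lower bound.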

When it makes sense

Good fit if you're running 5+ concurrent agents, scripts are short and frequent, and you want controlled access to existing functions without rewriting them.

Probably not worth it if you have only a handful of agents, need OS-level security guarantees, need Python bindings (not ready yet), or your AI writes long complex programs.


npm install autolang-compiler

GitHub: https://github.com/hoansdz/Autolang

Philosophy: autolang.vercel.app/docs/philosophy-vision

Live editor: autolang.vercel.app/docs/editor

Curious how others are handling this. What's your current setup for sandboxed agent code?
