Guyoung Studio

Posted on Jun 9

BoxAgnts Tool System (2) — The Security Model of Wasmtime Sandboxing

#ai #agents #rust #webassembly

The core rationale behind BoxAgnts choosing WebAssembly sandboxing: "capability-based injection" rather than "permission reduction."

What exactly does the Wasmtime sandbox isolate? Where are the boundaries of each layer of defense? And why are typical attack vectors ineffective against this model?

Why Traditional Sandboxes Are Patchwork

Take Docker as an example. Its security model relies on Linux namespaces (UTS, PID, mount, network, IPC, user, cgroup) combined with seccomp profiles. This combination works reasonably well at the application level, but for AI Agent tool scenarios, several problems emerge.

The first problem is the inherent weakness of syscall filter lists. Docker's default seccomp profile is technically an allowlist, but a very broad one: it disables approximately 44 syscalls (reboot, kexec_load, add_key, etc.) out of 300+, and everything else stays reachable. If a dangerous call or argument combination isn't covered by the filter, the protection is non-existent. More critically, AI model-driven tool invocation behavior is unpredictable: a human developer wouldn't write code that calls ptrace on other processes, but a bash command generated on the fly by an AI might accidentally trigger an unblocked syscall.

The second problem is the shared kernel attack surface. Namespaces provide view isolation (PID 1 inside the container is not the host's PID 1), but all containers share the same kernel instance. If a containerized tool triggers a kernel vulnerability through some path (e.g., an eBPF-related CVE), the escape risk propagates to the host level.

The WASM sandbox differs here structurally. A WASM program never executes host instructions of its own choosing: it targets a virtual instruction set, and Wasmtime's interpreter/JIT decides what actually runs on the host CPU. The instruction set has no equivalent of the x86 syscall instruction, and the program can only address its own linear memory, not the host's memory pages. Its "operating system" is the WASI interface: a function table explicitly injected by the host program.

// wasm-sandbox/src/run.rs - WASI capability injection
run_common.common.wasi.cli = Some(true);           // allow command-line arguments
run_common.common.wasi.http = Some(true);          // allow HTTP outbound
run_common.common.wasi.inherit_network = Some(true);
run_common.common.wasi.allow_ip_name_lookup = Some(true);
run_common.common.wasi.tcp = Some(true);
run_common.common.wasi.udp = Some(true);

Every Some(true) is an explicit authorization decision. Unauthorized WASI interfaces are completely invisible to the Guest: the situation is not "the call is denied" but "the call target doesn't exist." This is the core of the capability-based injection model.
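To make the "target doesn't exist" point concrete, here is a minimal sketch in plain Rust. It is not the Wasmtime linker API: CapabilityTable, grant, call, and the import names are hypothetical stand-ins. The host registers a function only when the corresponding flag is Some(true), so an ungranted interface has no entry the Guest could resolve.

```rust
use std::collections::HashMap;

/// Signature of a host function exposed to the guest (hypothetical).
type HostFn = fn(&str) -> String;

/// Stand-in for the import table a host builds before instantiation.
#[derive(Default)]
pub struct CapabilityTable {
    imports: HashMap<&'static str, HostFn>,
}

impl CapabilityTable {
    /// Inject a capability only when the host explicitly opted in.
    pub fn grant(&mut self, name: &'static str, enabled: Option<bool>, f: HostFn) {
        if enabled == Some(true) {
            self.imports.insert(name, f);
        }
    }

    /// Guest-side call: an import that was never injected cannot be resolved.
    pub fn call(&self, name: &str, arg: &str) -> Result<String, String> {
        match self.imports.get(name) {
            Some(f) => Ok(f(arg)),
            None => Err(format!("unknown import: {name}")),
        }
    }
}

fn http_get(url: &str) -> String {
    format!("GET {url}")
}

fn udp_send(addr: &str) -> String {
    format!("UDP {addr}")
}

/// Example table: HTTP is granted, UDP is left unset.
pub fn demo_table() -> CapabilityTable {
    let mut table = CapabilityTable::default();
    table.grant("wasi:http", Some(true), http_get);
    table.grant("wasi:udp", None, udp_send); // never injected
    table
}
```

With this table, calling "wasi:http" succeeds, while calling "wasi:udp" fails with an unknown-import error. There is no deny rule anywhere, only a missing entry.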

Filesystem Isolation: Preopen Directory Handles

Traditional path-based whitelist approaches face symbolic link attacks and TOCTOU (time-of-check-time-of-use) problems. WASM's filesystem isolation takes a different path: preopen directory handles.

// wasm-sandbox/src/run.rs
let mut dirs: Vec<(String, String)> = Vec::new();
if let Some(dir) = option.work_dir {
    dirs.push((dir, "/".to_string()));  // host directory → Guest root
}
if let Some(map_dirs) = option.map_dirs {
    for (k, v) in map_dirs {
        dirs.push((k, v));  // custom mapping
    }
}

The principle works like this: during component initialization, Wasmtime passes each host directory into WASI through the preopen_dir interface. What's passed is a directory handle (a file descriptor, fd), not a path string. The Guest's / is the virtual root of this fd. Regardless of how the Guest internally performs cd, open, or readdir, every path is resolved relative to the fd provided by Wasmtime, and resolution that would leave that directory (through .. or a symlink pointing outside it) is refused on the host side using openat-style lookups with O_NOFOLLOW.

This means symbolic link attacks are ineffective within the Guest: it never received a directory fd that contains the symlink target, and any resolution that would escape the preopened directory is cut off on the host side.

TOCTOU is handled similarly. Preopen occurs at WASM component initialization; once the fd is created, later renames or permission changes on the host path don't redirect the already-opened fd. Attackers cannot expand the Guest's visible scope at runtime by swapping out the directory at that path.
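The (host directory, Guest directory) pairs collected in dirs can be pictured with a small lexical sketch. This is only an illustration under stated assumptions: resolve is a hypothetical helper, it assumes Unix-style paths, and it works on path strings, whereas the real isolation comes from fd-relative resolution on the host, which also covers symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Map `guest_path` onto the host directory behind one preopen mapping.
/// Returns None when the path lies outside the guest mount point or
/// tries to climb above it with `..`.
pub fn resolve(host_dir: &str, guest_dir: &str, guest_path: &str) -> Option<PathBuf> {
    // The guest path must live under the guest-side mount point.
    let relative = Path::new(guest_path).strip_prefix(guest_dir).ok()?;

    let mut host_path = PathBuf::from(host_dir);
    let mut depth = 0usize; // how far below the preopened root we are

    for component in relative.components() {
        match component {
            Component::Normal(part) => {
                host_path.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return None; // would escape the preopened root
                }
                host_path.pop();
                depth -= 1;
            }
            // Absolute roots or prefixes inside the relative part are rejected.
            _ => return None,
        }
    }
    Some(host_path)
}
```

For a mapping of /srv/agent/work to /, the Guest path /src/main.rs lands inside the work directory, while /a/../../etc/passwd is rejected because it would step above the root the Guest was given.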

Network ACL: Dual-Channel Validation

BoxAgnts adopted its network control design from the Spin Framework, implementing a whitelist + blacklist dual-channel validation system.

The whitelist (OutboundAllowedHosts) defines which domains WASM tools can connect to:

// Format examples
"https://api.github.com"           // exact match
"https://*.example.com"           // subdomain wildcard
"http://localhost:*"              // any port
"*://*.github.net"                // any protocol + subdomain

The blacklist (BlockedNetworks) blocks specific IP ranges and internal network access:

// block 1.1.1.1/32 (single IP)
// "private" keyword blocks all RFC 1918 addresses + loopback
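The subdomain wildcard in the whitelist formats can be sketched as a small matching function. This is a simplified illustration, not the BoxAgnts or Spin implementation: host_matches is a hypothetical name, it looks at the host part only (scheme and port are ignored), and it assumes that "*.example.com" matches subdomains but not "example.com" itself.

```rust
/// Match a host name against one allow-list host pattern.
/// Supported forms: an exact host, "*" (any host), or "*.suffix" (any subdomain).
pub fn host_matches(pattern: &str, host: &str) -> bool {
    // Host names are case-insensitive, so compare in lowercase.
    let (pattern, host) = (pattern.to_ascii_lowercase(), host.to_ascii_lowercase());

    match pattern.strip_prefix("*.") {
        // "*.example.com": the host must end in ".example.com" with a
        // non-empty label in front, so "evilexample.com" does not match.
        Some(suffix) => host
            .strip_suffix(suffix)
            .map_or(false, |rest| rest.len() > 1 && rest.ends_with('.')),
        // No wildcard prefix: either match-all or exact comparison.
        None => pattern == "*" || pattern == host,
    }
}
```

The check on the remaining prefix is the important detail: a plain ends-with comparison would let a look-alike domain such as evilexample.com slip through a "*.example.com" rule.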

Every time a WASM program initiates a connection through the network interface, a two-step check is triggered:

wasmtime_wasi::socket_addr_check(addr, addr_use, hosts, networks)
  │
  ├── Step 1: BlockedNetworks::is_blocked(ip)
  │     - Hit IP blacklist? (IpNetworkTable longest-prefix match)
  │     - block_private mode: reject all addresses that are not globally routable
  │     - Special handling: IPv4-mapped IPv6 addresses (::ffff:x.x.x.x) reduced to IPv4 before check
  │
  └── Step 2: OutboundAllowedHosts::check_url(url, scheme)
        - Parse URL host portion
        - Match against whitelist entries (supports * wildcards and template variables)

There's a noteworthy implementation detail here. The IPv6 protocol defines the IPv4-mapped address format (::ffff:10.0.0.1). If the blacklist only checks the IPv4 format, an attacker could use http://[::ffff:10.0.0.1] to bypass a 10.0.0.0/8 blacklist rule. BoxAgnts' BlockedNetworks::is_blocked() explicitly handles this case:

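// blocked_networks.rs
if let IpAddr::V6(ipv6) = ip_addr {
    // to_ipv4() returns Some for IPv4-mapped (::ffff:a.b.c.d) and
    // IPv4-compatible (::a.b.c.d) addresses; re-run the check on the embedded IPv4.
    if let Some(ipv4_compat) = ipv6.to_ipv4() {
        return self.is_blocked(&IpAddr::V4(ipv4_compat));
    }
}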
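The bypass and its fix can be reproduced with the standard library alone. The following is a minimal sketch, assuming a hypothetical BlockedV4 type that holds IPv4 CIDR blocks only; it is not the IpNetworkTable-based implementation, but it uses the same Ipv6Addr::to_ipv4() unwrapping step.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical minimal blacklist holding IPv4 CIDR blocks.
pub struct BlockedV4 {
    /// (network address, prefix length); prefix length is assumed to be <= 32.
    nets: Vec<(Ipv4Addr, u8)>,
}

impl BlockedV4 {
    pub fn new(nets: Vec<(Ipv4Addr, u8)>) -> Self {
        Self { nets }
    }

    /// True when `ip` falls inside any blocked CIDR range.
    fn v4_blocked(&self, ip: Ipv4Addr) -> bool {
        self.nets.iter().any(|&(net, len)| {
            let mask = if len == 0 {
                0
            } else {
                u32::MAX << (32 - u32::from(len))
            };
            (u32::from(ip) & mask) == (u32::from(net) & mask)
        })
    }

    /// IPv6 addresses that embed an IPv4 address are unwrapped first,
    /// so ::ffff:10.0.0.1 is judged exactly like 10.0.0.1.
    pub fn is_blocked(&self, ip: IpAddr) -> bool {
        match ip {
            IpAddr::V4(v4) => self.v4_blocked(v4),
            IpAddr::V6(v6) => v6.to_ipv4().map_or(false, |v4| self.v4_blocked(v4)),
        }
    }
}
```

With 10.0.0.0/8 on the list, both 10.1.2.3 and ::ffff:10.0.0.1 are rejected, while ::ffff:8.8.8.8 and an ordinary IPv6 address such as 2001:db8::1 pass this particular check.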

In addition, SocketAddrUse::TcpBind and UdpBind are rejected outright: WASM tools cannot listen on ports or run as servers.
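Putting the pieces together, the order of decisions might look like the sketch below. All names here are hypothetical (AddrUse is a reduced stand-in for SocketAddrUse, and Verdict and check_outbound are invented for illustration); the blacklist and whitelist are passed in as closures so that the example stays self-contained.

```rust
use std::net::IpAddr;

/// Reduced stand-in for the purpose a socket address is used for.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AddrUse {
    TcpBind,
    TcpConnect,
    UdpBind,
    UdpConnect,
}

/// Outcome of the check, with the reason for a denial.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Allow,
    DenyBind,
    DenyBlockedIp,
    DenyHostNotAllowed,
}

/// Bind attempts are refused first, then the IP blacklist is consulted,
/// and only then the host whitelist.
pub fn check_outbound(
    addr_use: AddrUse,
    ip: IpAddr,
    host: &str,
    is_blocked: impl Fn(IpAddr) -> bool,
    is_allowed_host: impl Fn(&str) -> bool,
) -> Verdict {
    if matches!(addr_use, AddrUse::TcpBind | AddrUse::UdpBind) {
        return Verdict::DenyBind;
    }
    if is_blocked(ip) {
        return Verdict::DenyBlockedIp;
    }
    if !is_allowed_host(host) {
        return Verdict::DenyHostNotAllowed;
    }
    Verdict::Allow
}
```

Checking the blacklist before the whitelist means that a whitelisted host name resolving to a blocked address is still refused.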

Instruction-Level Resource Control: wasm_fuel

CPU time limits are typically implemented using timeouts. But timeouts have a granularity problem: if a WASM program pegs a CPU core for a full second, executing on the order of a billion instructions before the timeout kills it, that second of CPU consumption is already a fait accompli. In dense multi-tool concurrency scenarios, this is enough to impact host responsiveness.

Wasmtime provides a finer-grained solution: Fuel Metering.

// run.rs
pub wasm_fuel: Option<u32>,   // initial fuel allocation

Most WASM instructions consume 1 unit of fuel when executed (a few, such as nop, drop, block, and loop, consume 0). When fuel is exhausted, Wasmtime generates a trap; the host catches it, terminates the component, and reclaims all resources.

This mechanism relies on Wasmtime's Config::consume_fuel() (to enable metering) and Store::set_fuel() (to set the budget) under the hood. Wasmtime checks remaining fuel at the entry of each basic block rather than per instruction. This is a performance compromise (per-instruction checking is too expensive), but each basic block typically contains at most a few dozen instructions, so the precision loss is acceptable.
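The effect of block-granular accounting can be shown with a toy model. This is a sketch under simplifying assumptions, not Wasmtime's code: Block, Outcome, and run are hypothetical, and each block's whole cost is charged at its entry, which may differ in detail from how Wasmtime books fuel internally.

```rust
/// A basic block; `cost` is the number of fuel-consuming instructions in it.
pub struct Block {
    pub cost: u64,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Every block ran; some fuel may be left over.
    Finished { fuel_left: u64 },
    /// The budget could not cover the next block (the "trap").
    OutOfFuel { blocks_run: usize },
}

/// Run blocks in order, checking the budget once per block entry.
pub fn run(blocks: &[Block], mut fuel: u64) -> Outcome {
    for (index, block) in blocks.iter().enumerate() {
        match fuel.checked_sub(block.cost) {
            Some(rest) => fuel = rest,
            None => return Outcome::OutOfFuel { blocks_run: index },
        }
    }
    Outcome::Finished { fuel_left: fuel }
}
```

With blocks costing 10, 20, and 30 units and a budget of 35, the first two blocks run and the third triggers the out-of-fuel outcome. The decision is made once per block, so the accounting error is bounded by the size of a single block.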

Combined with wasm_timeout, wasm_max_memory_size, and wasm_max_wasm_stack, BoxAgnts forms a complete two-dimensional (time + space) resource constraint matrix for WASM tools:

Constraint Dimension	Parameter	Violation Behavior
CPU Time	`wasm_timeout`	Timeout → kill component
CPU Instructions	`wasm_fuel`	Fuel exhausted → trap
Heap Memory	`wasm_max_memory_size`	memory.grow fails (returns -1) → allocation error in the Guest
Stack Memory	`wasm_max_wasm_stack`	Stack overflow → trap

These constraints take effect at the Wasmtime Engine level. WASM programs cannot bypass them through any code path — this isn't the sandbox "intercepting," it's the sandbox's core semantics making over-limit operations impossible in the first place.
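The heap row of the table comes down to simple arithmetic on WASM pages (64 KiB each). The sketch below assumes a hypothetical try_grow helper that mirrors the decision a memory limit has to make; it is not the Wasmtime limiter API.

```rust
/// Size of one WebAssembly linear-memory page in bytes.
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Decide whether growing by `delta_pages` stays within `max_bytes`.
/// Returns the previous size in pages on success and None on failure
/// (the guest would see `memory.grow` report failure in that case).
pub fn try_grow(current_pages: u64, delta_pages: u64, max_bytes: u64) -> Option<u64> {
    let new_pages = current_pages.checked_add(delta_pages)?;
    let new_bytes = new_pages.checked_mul(WASM_PAGE_BYTES)?;
    (new_bytes <= max_bytes).then_some(current_pages)
}
```

With a 1 MiB limit (16 pages), growing from 10 pages by 6 succeeds and growing by 7 fails. The failing request never receives memory, so there is nothing to claw back afterwards.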

The PermissionLevel Classification System

Sandboxing ensures security isolation, but another problem remains: users need to set different trust levels for different tools. Giving a web search tool the same permissions as Bash is unreasonable.

BoxAgnts defines a four-level permission classification:

pub enum PermissionLevel {
    None,       // pure information query, no side effects
    ReadOnly,   // read-only operations
    Write,      // write operations
    Execute,    // command execution
}

Permission enforcement happens before tool execution. After run_query_loop() detects a ToolUse request and before calling tool.execute(), it performs a permission cross-check:

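// Pseudocode; actual implementation in gateway/api/tool.rs
// Variants are fully qualified so that None is not confused with Option::None.
match (tool.permission_level(), ctx.permission_mode) {
    // full permission mode, no restrictions
    (_, PermissionMode::Full) => Ok(()),
    // side-effect-free and read-only tools pass in read-only mode
    (PermissionLevel::None | PermissionLevel::ReadOnly, PermissionMode::ReadOnly) => Ok(()),
    // writes and command execution are rejected before tool.execute() runs
    (PermissionLevel::Write | PermissionLevel::Execute, PermissionMode::ReadOnly) => {
        Err("insufficient permission")
    }
}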

Users can configure different PermissionMode settings for different Agents through the Dashboard. For example, create a code-review-only Agent set to ReadOnly — even if it calls the Bash tool, the system will reject it before execution.
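The code-review example can be played through with a self-contained sketch. The two enums follow the names used above, but everything else is an assumption for illustration: only the Full and ReadOnly modes are modeled, tool_level and its tool names are hypothetical, and unknown tools are treated as Execute to stay on the restrictive side.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PermissionLevel {
    None,     // pure information query, no side effects
    ReadOnly, // read-only operations
    Write,    // write operations
    Execute,  // command execution
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PermissionMode {
    Full,
    ReadOnly,
}

/// Hypothetical lookup from tool name to its declared permission level.
pub fn tool_level(name: &str) -> PermissionLevel {
    match name {
        "web_search" => PermissionLevel::None,
        "read_file" => PermissionLevel::ReadOnly,
        "write_file" => PermissionLevel::Write,
        // "bash" and anything unknown get the most restrictive level.
        _ => PermissionLevel::Execute,
    }
}

/// Cross-check performed before a tool is executed.
pub fn check_permission(
    level: PermissionLevel,
    mode: PermissionMode,
) -> Result<(), &'static str> {
    use PermissionLevel as L;
    match (level, mode) {
        (_, PermissionMode::Full) => Ok(()),
        (L::None | L::ReadOnly, PermissionMode::ReadOnly) => Ok(()),
        (L::Write | L::Execute, PermissionMode::ReadOnly) => Err("insufficient permission"),
    }
}
```

An Agent in ReadOnly mode can still use web_search and read_file, while a bash request is turned down before any WASM component is instantiated. The same request passes under Full mode.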

Why This Model Beats AppArmor/seccomp for AI Agents

Let's review the overall structure. The security sequence for traditional approaches is:

Tool code loaded → Execute → Syscall → seccomp check → Allow/Deny
                                        ↑ Risk has already occurred before this point

BoxAgnts' security sequence is:

WASM component loaded → Capability injection (file fd, network whitelist, resource limits) → Execute
                        ↑ All permissions determined before execution, non-extensible

The difference lies in the "timing of security boundary establishment." In traditional approaches, the filter is consulted only once untrusted code is already running natively and has reached a syscall; anything the filter did not anticipate goes straight to the shared kernel, and this gap is the attack surface. In the WASM approach, the execution environment is fully constrained before code ever gains control; the Guest has no means to extend its own permission boundary.

This isn't to say the WASM sandbox is naturally immune to all security issues. Wasmtime itself may have security vulnerabilities (see wasmtime-related entries in the RustSec advisory database), and compiler infrastructure errors could lead to Guest escape. But for AI Agent tool use cases — primarily file operations, network access, database queries — the WASM sandbox's security boundary far exceeds the necessary level.

This also explains why BoxAgnts' built-in bash tool doesn't need an additional sandbox layer. The bash tool is itself compiled to WASM: it doesn't invoke the host shell; it's a complete shell implementation compiled to wasm32-wasi, running inside the sandbox. All the isolation mechanisms described above apply to it equally.

Summary

Wasmtime sandboxing provides BoxAgnts with three capabilities that traditional approaches cannot simultaneously satisfy:

Instruction-level isolation. WASM programs run on a virtual instruction set, with no direct access to host CPU, memory, or syscall interfaces. The security boundary isn't "filtered syscalls" — it's "no syscall invocation path exists."
Capability-based resource control. Filesystem (preopen fd), network (whitelist + blacklist dual-channel), CPU (wasm_fuel + timeout), memory (max_memory_size + max_wasm_stack) — all resources are precisely constrained before component launch and cannot be extended at runtime.
Microsecond-level sandbox startup. Unlike Docker, where cold starts take seconds and warm starts take hundreds of milliseconds, WASM component loading and initialization overhead is in the microsecond range. This speed difference is critical for high-frequency tool invocation in Agent conversations: when the AI repeatedly calls tools in a loop for parameter exploration, sandbox startup latency directly determines user experience.

The IPv4-mapped IPv6 anti-bypass handling and four-level PermissionLevel classification further harden the security boundary: the former closes a bypass of the IP blacklist, and the latter lets users authorize different tool sets to different Agents based on trust levels.