I built a WASM execution firewall for AI agents — here’s why

I’ve been spending a lot of time lately experimenting with autonomous AI agents doing real work—writing code, running tools, coordinating steps. The kind of thing that goes beyond chat and starts touching real systems.

One thing that keeps bothering me is what happens when those agents start generating or modifying code that runs on its own. Not theoretical. Not a hallucination. Just actual code—specifically WASM modules—that gets compiled and executed without human review.

That’s where the idea for Night Core came from.

What I’m building

Night Core is a console for controlling execution of WebAssembly modules, especially when the code comes from agents, remote systems, or any source that isn’t fully trusted.

Before anything runs, it applies a few basic rules (a rough sketch of the flow follows the list):
• Signature verification (Ed25519)
• Optional human approval before execution
• Logging and audit trail
• Sandboxed execution using Wasmtime
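
Here's a minimal sketch of what that verify-then-execute flow could look like in Rust, assuming the wasmtime and ed25519-dalek (v2) crates. The function names, the `_start` entry point, and the fuel budget are illustrative assumptions on my part, not Night Core's actual API:

```rust
// Assumes crates: wasmtime, ed25519-dalek (v2), anyhow.
use anyhow::Result;
use ed25519_dalek::{Signature, Verifier, VerifyingKey};
use wasmtime::{Config, Engine, Linker, Module, Store};

/// Refuse to even compile the module unless the signature checks out.
fn verify_module(pubkey: &VerifyingKey, wasm: &[u8], sig: &Signature) -> Result<()> {
    pubkey.verify(wasm, sig)?; // Ed25519 over the raw module bytes
    Ok(())
}

fn run_sandboxed(wasm: &[u8]) -> Result<()> {
    // Fuel metering bounds how much work the guest can do.
    let mut config = Config::new();
    config.consume_fuel(true);
    let engine = Engine::new(&config)?;
    let module = Module::new(&engine, wasm)?;

    let mut store = Store::new(&engine, ());
    store.set_fuel(1_000_000)?; // hard budget; the guest traps when it's exhausted

    // An empty linker: the guest gets no host functions, no WASI, no I/O.
    let linker = Linker::new(&engine);
    let instance = linker.instantiate(&mut store, &module)?;

    let start = instance.get_typed_func::<(), ()>(&mut store, "_start")?;
    start.call(&mut store, ())?;
    Ok(())
}
```

The ordering is the point: verification happens before the module ever touches the runtime, and the empty linker means that even a verified module starts with no ambient capabilities.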

The architecture separates the worker that enforces policy from the UI that handles approvals and visibility. It's built in Rust, with Tauri and TypeScript for the UI.
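
To make that worker/UI split concrete, here's a hypothetical shape for the decision the worker hands back to the UI. The type and field names are my assumptions, not the project's real types:

```rust
// Hypothetical worker-side decision type; names are illustrative.
#[derive(Debug)]
enum Decision {
    Allow,                               // policy satisfied, run immediately
    PendingApproval { request_id: u64 }, // parked until a human approves in the UI
    Deny { reason: String },             // refused, with a reason for the audit log
}

fn gate(signature_ok: bool, requires_approval: bool, request_id: u64) -> Decision {
    if !signature_ok {
        return Decision::Deny { reason: "invalid Ed25519 signature".into() };
    }
    if requires_approval {
        return Decision::PendingApproval { request_id };
    }
    Decision::Allow
}

fn main() {
    // An unsigned module never reaches the sandbox.
    println!("{:?}", gate(false, true, 42));
    // A signed module still waits for a human when policy says so.
    println!("{:?}", gate(true, true, 42));
}
```

Keeping the decision explicit like this means the UI only ever renders states the worker has already enforced; the approval screen can't accidentally become the enforcement point.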

The code’s open here:
https://github.com/xnfinite/nightcoreapp

Why this matters (to me)

Most of the agent discussion I see is about whether the output is correct. But I’m more interested in what happens when that output becomes an action—especially code execution.

Once something runs, you’re already in response mode. Logs, alerts, and sandboxes are helpful, but they’re all after the fact. That’s what pushed me to treat execution itself as the boundary.

The threat model is simple
• Agents can generate or modify WASM
• The code might be functional but not necessarily safe
• Sandboxing limits damage, but doesn’t eliminate risk
• Repeated execution without review compounds that risk

Still early, but here’s what I’m wondering
• Should AI-generated code always require human review before it runs?
• When does that review become too slow to be useful?
• How do we reason about trust and intent in generated artifacts?

This isn’t finished. It’s a working sketch. But I’d like to hear how others are thinking about this, or if you’re seeing the same edge cases.

You can see the full thread and screenshots here: https://x.com/xnfinitecore/status/2010067157774991457?s=46

GitHub: https://github.com/xnfinite/nightcoreapp
Night Core console showing a pending WASM execution request
