DEV Community

Guyoung Studio


BoxAgnts Runtime (5) — MCP Is Just the Beginning, the Runtime Layer Is What Matters

The emergence of MCP (Model Context Protocol) marks a major milestone for the AI ecosystem. For the first time, the industry is converging around a shared interface for tool interaction—standardizing how models discover tools, invoke capabilities, exchange context, and communicate with external systems.

But MCP also reveals a larger architectural gap: it solves the protocol problem, not the runtime problem. And the runtime problem is becoming increasingly critical.


Protocols Are Not Runtimes

MCP standardizes communication—defining tool discovery, invocation, and resource management. This is valuable. But protocols only define "how systems communicate," not "how systems safely execute."

By analogy: HTTP standardized web communication, but it did not solve application isolation, runtime governance, resource scheduling, or execution security. Those are the responsibilities of operating systems and runtimes.

BoxAgnts' MCP implementation embodies this layering. The file boxagnts/mcp/src/lib.rs handles all protocol-level logic: the JSON-RPC 2.0 message format, the initialize/initialized handshake, tools/list discovery, tools/call invocation, and the stdio and HTTP/SSE transports:

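// Excerpt from the MCP client in boxagnts/mcp/src/lib.rs (surrounding impl block omitted)

// Connect to an MCP server over stdio and wrap the backend in the client type
pub async fn connect_stdio(config: &McpServerConfig) -> anyhow::Result<Self> {
    let backend = RmcpClientBackend::connect_stdio(config).await?;
    Ok(Self::from_backend(Arc::new(backend)))
}

// Invoke a tool by name and return its result; no policy check happens at this layer
pub async fn call_tool(&self, name: &str, arguments: Option<Value>)
    -> anyhow::Result<CallToolResult> {
    self.backend()?.call_tool(name, arguments).await
}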
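To make "protocol-level logic" concrete, here is a minimal, std-only Rust sketch of the handshake ordering and the tools/call framing. It is not BoxAgnts' code, which delegates this work to RmcpClientBackend; `Session`, `Phase`, and `json_escape` are hypothetical names, the `initialize` params are left empty (a real client sends `protocolVersion`, `capabilities`, and `clientInfo`), and tool arguments are assumed to arrive already serialized as JSON.

```rust
use std::fmt::Write;

/// Lifecycle phases of one client connection (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    New,
    InitializeSent,
    Ready,
}

/// Escape a string so it can be embedded in a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out
}

/// Tracks handshake state and request ids for one MCP connection.
pub struct Session {
    phase: Phase,
    next_id: u64,
}

impl Session {
    pub fn new() -> Self {
        Session { phase: Phase::New, next_id: 1 }
    }

    fn take_id(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// Step 1: the `initialize` request (params simplified to an empty object).
    pub fn initialize(&mut self) -> Result<String, String> {
        if self.phase != Phase::New {
            return Err("initialize already sent".to_string());
        }
        self.phase = Phase::InitializeSent;
        let id = self.take_id();
        Ok(format!(r#"{{"jsonrpc":"2.0","id":{},"method":"initialize","params":{{}}}}"#, id))
    }

    /// Step 2: after the server answers, send the `initialized` notification.
    /// Notifications carry no id and expect no response.
    pub fn initialized(&mut self) -> Result<String, String> {
        if self.phase != Phase::InitializeSent {
            return Err("initialize has not been sent".to_string());
        }
        self.phase = Phase::Ready;
        Ok(r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#.to_string())
    }

    /// Step 3: `tools/call`, rejected until the handshake has completed.
    /// `arguments_json` is assumed to be an already-serialized JSON object.
    pub fn call_tool(&mut self, name: &str, arguments_json: &str) -> Result<String, String> {
        if self.phase != Phase::Ready {
            return Err("handshake not complete".to_string());
        }
        let id = self.take_id();
        Ok(format!(
            r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{}}}}}"#,
            id,
            json_escape(name),
            arguments_json
        ))
    }
}
```

Nothing in this layer asks whether a tools/call should be sent at all; it only enforces message order and framing.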

Note, however, that the MCP client is only responsible for calling the tool and returning the result. It is not responsible for deciding whether the tool should be called, or under what constraints the call should execute. That responsibility belongs to the runtime layer.
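That split can be expressed as two layers sharing one interface. In this minimal sketch, `ToolCaller`, `EchoCaller`, and `GovernedCaller` are hypothetical names, and a simple allowlist plus a call budget stand in for whatever policy a real runtime would enforce:

```rust
use std::collections::HashSet;

/// Protocol layer: knows how to call a tool and return the result.
pub trait ToolCaller {
    fn call_tool(&mut self, name: &str, arguments: &str) -> Result<String, String>;
}

/// Stand-in protocol client that echoes the call (illustration only).
pub struct EchoCaller;

impl ToolCaller for EchoCaller {
    fn call_tool(&mut self, name: &str, arguments: &str) -> Result<String, String> {
        Ok(format!("{}({})", name, arguments))
    }
}

/// Runtime layer: decides whether a call may proceed, then delegates.
pub struct GovernedCaller<C: ToolCaller> {
    inner: C,
    allowed: HashSet<String>,
    max_calls: usize,
    calls_made: usize,
}

impl<C: ToolCaller> GovernedCaller<C> {
    pub fn new(inner: C, allowed: &[&str], max_calls: usize) -> Self {
        GovernedCaller {
            inner,
            allowed: allowed.iter().map(|s| s.to_string()).collect(),
            max_calls,
            calls_made: 0,
        }
    }
}

// Implementing the same trait makes the governed caller a drop-in replacement.
impl<C: ToolCaller> ToolCaller for GovernedCaller<C> {
    fn call_tool(&mut self, name: &str, arguments: &str) -> Result<String, String> {
        if !self.allowed.contains(name) {
            return Err(format!("denied: tool '{}' is not allowed", name));
        }
        if self.calls_made >= self.max_calls {
            return Err("denied: call budget exhausted".to_string());
        }
        // Only permitted calls count against the budget and reach the protocol layer.
        self.calls_made += 1;
        self.inner.call_tool(name, arguments)
    }
}
```

Because `GovernedCaller` implements the same trait, code that holds a `ToolCaller` cannot tell a raw protocol client from a governed one, which keeps policy out of the protocol code.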


The Current Agent Stack Is Incomplete

Most AI system architectures look like:

LLM → Prompt Framework → Tool Calling Protocol → Host Execution

A layer is missing in the middle: runtime infrastructure. This layer is responsible for execution isolation, capability boundaries, resource constraints, state persistence, and execution observability.

BoxAgnts' full stack makes this layering explicit:

LLM (api/ layer)
  ↓
Gateway / Query (gateway/ + query/)
  ↓
Tool Interface (tools/ + wasm-tools/)
  ↓
WASM Sandbox (wasm-sandbox/ layer)  ← This is the real runtime
  ↓
Host Resources

MCP sits alongside the Tool Interface layer: it brings external tools into the agent's toolkit but does not change the underlying execution isolation. In BoxAgnts, MCP tools are registered via McpToolWrapper:

// boxagnts/gateway/src/api/mcp.rs
pub struct McpToolWrapper {
    pub tool_def: ToolDefinition,
    pub server_name: String,
    pub manager: Arc<boxagnts_mcp::McpManager>,
}

impl Tool for McpToolWrapper {
    fn permission_level(&self) -> PermissionLevel {
        PermissionLevel::Execute  // MCP tools default to Execute level
    }
    // execute delegates the call to the remote MCP server
}

Once plugged in, MCP tools use the same Tool trait interface as native tools, but they execute on the remote MCP server, outside the protection of BoxAgnts' WASM sandbox. This security-boundary difference must be kept clearly in mind.


Tool Calling ≠ Tool Execution

MCP standardizes tool calling: the model selects a tool name, supplies structured arguments, and issues an execution request. But the harder problems come after invocation. What permissions does the tool receive? Which files can it access? Which network endpoints are allowed? How are resources constrained? How is execution isolated? How is behavior audited?
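Two of these questions, which files and which network endpoints, become simple checks once the runtime holds an explicit grant for each tool. The sketch below assumes a hypothetical `CapabilityPolicy`; it is not BoxAgnts' RunOption, whose fields this article does not show. The path check is purely lexical and does not resolve symlinks, so a real runtime should enforce access at the sandbox boundary rather than rely on path comparison alone:

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical per-tool capability grant (not BoxAgnts' RunOption).
pub struct CapabilityPolicy {
    /// Directories the tool may read; everything else is denied.
    pub readable_roots: Vec<PathBuf>,
    /// Host names the tool may contact (compared case-insensitively).
    pub allowed_hosts: Vec<String>,
}

impl CapabilityPolicy {
    /// Lexically resolve `.` and `..` so "/data/../etc" cannot escape a root.
    /// Returns None if `..` would climb above the filesystem root.
    fn normalize(path: &Path) -> Option<PathBuf> {
        let mut out = PathBuf::new();
        for comp in path.components() {
            match comp {
                Component::ParentDir => {
                    if !out.pop() {
                        return None;
                    }
                }
                Component::CurDir => {}
                other => out.push(other.as_os_str()),
            }
        }
        Some(out)
    }

    /// Only absolute paths under a granted root are readable.
    /// `starts_with` compares whole components, so "/database" is not under "/data".
    pub fn may_read(&self, path: &Path) -> bool {
        match Self::normalize(path) {
            Some(p) if p.is_absolute() => self.readable_roots.iter().any(|root| p.starts_with(root)),
            _ => false,
        }
    }

    pub fn may_connect(&self, host: &str) -> bool {
        self.allowed_hosts.iter().any(|h| h.eq_ignore_ascii_case(host))
    }
}
```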

These are runtime concerns. MCP cannot answer them. BoxAgnts places MCP tools and native WASM tools under the same interface layer but distinguishes their execution paths:

  • WASM Tools: Execute inside the Wasmtime sandbox, fully constrained by RunOption
  • MCP Tools: Delegated through McpToolWrapper; trust boundary is the MCP server itself

This means the security of MCP tools depends on the implementation quality of each MCP server. If an MCP server does not sandbox its tools, calling them is equivalent to executing directly on that server's host.
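One way to keep this trust boundary visible is to record, for every registered tool, where it actually runs. In this sketch, `ExecutionPath`, `ToolEntry`, `eligible`, and the operator-approved server list are assumptions for illustration, not BoxAgnts' API:

```rust
/// Where a tool actually runs, which determines who enforces its limits.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecutionPath {
    /// Runs in the local WASM sandbox; the runtime enforces constraints.
    Sandboxed,
    /// Delegated to a remote MCP server; that server is the trust boundary.
    RemoteMcp { server: String },
}

pub struct ToolEntry {
    pub name: String,
    pub path: ExecutionPath,
}

/// Hypothetical rule: sandboxed tools are always eligible; remote MCP tools
/// are eligible only if their server is on an operator-approved list.
pub fn eligible(tool: &ToolEntry, trusted_servers: &[&str]) -> Result<(), String> {
    match &tool.path {
        ExecutionPath::Sandboxed => Ok(()),
        ExecutionPath::RemoteMcp { server } if trusted_servers.contains(&server.as_str()) => Ok(()),
        ExecutionPath::RemoteMcp { server } => Err(format!(
            "tool '{}' runs on untrusted MCP server '{}' outside the sandbox",
            tool.name, server
        )),
    }
}
```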


Why AI Agents Need Runtime Isolation

Traditional software already assumes applications may fail, dependencies may be compromised, and processes may behave unexpectedly. That's why containers, VMs, and process boundaries exist. AI agents face more severe problems: LLM-driven systems are exposed to prompt injection, adversarial documents, and manipulated context.

BoxAgnts' Connection Manager (boxagnts/mcp/src/connection_manager.rs) demonstrates that even MCP connections need governance:

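// boxagnts/mcp/src/connection_manager.rs (excerpt)
pub async fn connect_all(&self) -> anyhow::Result<()> {
    // `names`: the configured server names (how they are obtained is not shown in this excerpt)
    for name in names {
        // A failed connection is logged and skipped; the loop moves on to the next server
        if let Err(e) = self.connect(&name).await {
            error!(server = %name, error = %e,
                   "MCP server failed to connect during startup");
        }
    }
    Ok(())
}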

Each connection failure is logged and isolated: one MCP server going down does not stop the others from connecting. This seems obvious, but many agent frameworks lack even this layer.
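The same pattern can be sketched synchronously with the standard library only. Instead of just logging, this version returns a report, so the runtime can later tell which servers are unavailable. The `StartupReport` type and the closure standing in for the real async `connect` are assumptions for illustration:

```rust
use std::collections::BTreeMap;

/// Outcome of starting every configured server.
#[derive(Debug, Default)]
pub struct StartupReport {
    pub connected: Vec<String>,
    pub failed: BTreeMap<String, String>,
}

/// Try each server independently: a failure is recorded and the loop moves on,
/// so one broken server never blocks the rest.
pub fn connect_all<F>(names: &[&str], mut connect: F) -> StartupReport
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = StartupReport::default();
    for &name in names {
        match connect(name) {
            Ok(()) => report.connected.push(name.to_string()),
            Err(e) => {
                eprintln!("MCP server {} failed to connect during startup: {}", name, e);
                report.failed.insert(name.to_string(), e);
            }
        }
    }
    report
}
```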


The Industry Standardized the Wrong Layer First

The current ecosystem invests heavily in standardizing model interfaces, tool protocols, prompt formats, and orchestration frameworks. These are useful, but history suggests that infrastructure is ultimately constrained by execution, not by interfaces.

The web did not scale purely because of HTTP; it scaled because of operating systems, process isolation, container orchestration, runtime environments, and scheduling systems. AI infrastructure is no different: tool protocols are necessary but not sufficient. Ultimately, the key differentiator is runtime reliability, not tool invocation syntax.

BoxAgnts' architecture anticipates this: the protocol layer (MCP) sits above and the runtime layer (WASM Sandbox) sits below. New tools can be discovered via the protocol, but their execution constraints are controlled uniformly by the runtime.
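A small sketch of that split, assuming a registry that applies one permission ceiling to every tool regardless of how it was discovered. Only `PermissionLevel::Execute` comes from the code shown earlier; the `ReadOnly` and `Write` variants, the `Registry` type, and the ceiling rule are hypothetical:

```rust
use std::collections::HashMap;

/// Ordered permission levels (derive order: ReadOnly < Write < Execute).
/// Only `Execute` appears in the article; the other variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PermissionLevel {
    ReadOnly,
    Write,
    Execute,
}

pub struct Registry {
    ceiling: PermissionLevel,
    tools: HashMap<String, PermissionLevel>,
}

impl Registry {
    pub fn new(ceiling: PermissionLevel) -> Self {
        Registry { ceiling, tools: HashMap::new() }
    }

    /// Tools found via protocol discovery (e.g. a tools/list response) are
    /// registered at `Execute`, mirroring McpToolWrapper's default.
    pub fn register_discovered(&mut self, names: &[&str]) {
        for &name in names {
            self.tools.insert(name.to_string(), PermissionLevel::Execute);
        }
    }

    pub fn register_native(&mut self, name: &str, level: PermissionLevel) {
        self.tools.insert(name.to_string(), level);
    }

    /// One rule for every tool: unknown tools and tools above the ceiling are refused.
    pub fn may_run(&self, name: &str) -> bool {
        matches!(self.tools.get(name), Some(level) if *level <= self.ceiling)
    }
}
```

Since discovered tools land at `Execute`, the most privileged level in this sketch, a session capped at `Write` cannot run them, and the protocol client never needs to know.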


Runtime Engineering: An Emerging Infrastructure Discipline

Reliable AI systems require deterministic execution, explicit permissions, sandboxed tooling, governed orchestration, bounded side effects, resource accounting, and execution observability. These requirements extend far beyond prompt engineering.

BoxAgnts embodies this direction across several key modules:

  • boxagnts/wasm-sandbox/: Execution isolation and capability constraints
  • boxagnts/tools/: Tool interface and permission model
  • boxagnts/gateway/cron/: Scheduled task execution governance
  • boxagnts/workspace/: State persistence and management
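Resource accounting and execution observability can start very small: charge every step against a fixed budget and record each attempt, allowed or not. This is a hypothetical sketch in the spirit of the fuel metering that WASM engines such as Wasmtime provide; `Meter`, `AuditEntry`, and the abstract cost units are assumptions, not part of BoxAgnts:

```rust
/// One audited step of a tool execution.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditEntry {
    pub tool: String,
    pub cost: u64,
    pub allowed: bool,
}

/// Hypothetical per-task budget: every step is charged against a fixed
/// allowance and recorded, whether or not it was permitted.
pub struct Meter {
    remaining: u64,
    log: Vec<AuditEntry>,
}

impl Meter {
    pub fn new(budget: u64) -> Self {
        Meter { remaining: budget, log: Vec::new() }
    }

    /// Charge `cost` units for running `tool`; refuse if it would exceed the budget.
    /// On success, returns the remaining budget.
    pub fn charge(&mut self, tool: &str, cost: u64) -> Result<u64, String> {
        let allowed = cost <= self.remaining;
        self.log.push(AuditEntry { tool: tool.to_string(), cost, allowed });
        if allowed {
            self.remaining -= cost;
            Ok(self.remaining)
        } else {
            Err(format!("budget exceeded: {} needs {}, {} left", tool, cost, self.remaining))
        }
    }

    /// Full history of attempts, including refused ones, for later inspection.
    pub fn audit_log(&self) -> &[AuditEntry] {
        &self.log
    }
}
```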

The future AI stack should be:

LLM
  ↓
Protocol Layer (MCP)
  ↓
Runtime Layer ← This layer needs massive engineering investment
  ↓
Capability Sandbox (WASM)
  ↓
Execution Infrastructure

MCP Remains Extremely Important

None of the above diminishes MCP's value. Quite the opposite: standardized protocols make runtime innovation easier. A shared tool interface enables portable runtimes, interchangeable orchestration systems, and standardized capability injection.

Protocols simplify integration. Runtimes enforce behavior. Both layers matter, but they must not be conflated.


Conclusion

MCP standardizes how models communicate with external systems, and that is an important milestone. But communication is only half the problem; the harder challenge is execution safety. As agents gain operational authority, production systems need runtime isolation, capability governance, deterministic execution, and sandboxed tooling.

The critical question is no longer "Can the model invoke tools?" but "Can the system execute safely?"

