The Problem: One Tool Call Per Turn Is Expensive
If you've worked with LLMs and tool use, you know the pattern. The model decides it needs to call a tool. It emits a tool call. Your system executes it, returns the result. The model reads the result, reasons about it, and decides it needs another tool call. Repeat.
Every round trip burns tokens. The model re-reads the entire conversation history each time. For workflows that touch 5-10 tools — think "look up the customer, check their subscription, fetch recent invoices, calculate usage, draft a summary" — you're paying for the same context window over and over. The token cost adds up fast, and latency compounds with each turn.
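The compounding is easy to sketch with made-up numbers (a base context of 2,000 tokens and 300 tokens per tool result are assumptions purely for illustration):

```typescript
// Back-of-the-envelope: input tokens re-read across N sequential tool turns.
// Each turn re-reads the full history accumulated so far, whereas a single
// code-orchestrated turn reads all tool results at once.
function sequentialInputTokens(base: number, perResult: number, turns: number): number {
  let total = 0;
  for (let i = 0; i < turns; i++) {
    total += base + i * perResult; // turn i re-reads base context plus i earlier results
  }
  return total;
}

const base = 2000, perResult = 300, turns = 5;
const sequential = sequentialInputTokens(base, perResult, turns); // 13000
const singleTurn = base + turns * perResult;                      // 3500
console.log(sequential, singleTurn);
```

With these invented numbers, five sequential turns read roughly 13,000 input tokens against about 3,500 for one combined turn, which is the kind of gap the savings claim below is pointing at.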
The Solution: Let the LLM Write the Orchestration
Code Mode flips the pattern. Instead of one tool call per LLM turn, the model writes a short JavaScript program that orchestrates multiple tool calls in a single execution. The model gets the results all at once and reasons over the complete picture.
This is inspired by Cloudflare's Code Mode concept. The difference: VoidLLM's implementation is fully self-hosted, runs in a WASM-sandboxed runtime, and integrates with any MCP server you're already running.
In practice, this reduces token usage by 30-80% depending on the complexity of the tool workflow.
Architecture
The execution pipeline:
- The LLM receives auto-generated TypeScript type declarations describing all available MCP tools
- The LLM emits a JavaScript block that calls the tools it needs
- VoidLLM executes the JS inside a QuickJS runtime compiled to WebAssembly
- Tool calls within the JS are dispatched to upstream MCP servers via Streamable HTTP
- Results are collected and returned to the LLM in a single response
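The declarations the model receives might look roughly like this (a hypothetical shape for two servers matching the example further down; the real generator's output will differ in detail):

```typescript
// Hypothetical auto-generated declarations for two upstream MCP servers.
// Tool names and parameter shapes are illustrative, derived from each
// tool's JSON Schema.
declare namespace tools {
  namespace crm {
    interface GetCustomerParams { id: string }
    function get_customer(params: GetCustomerParams): Promise<unknown>;
  }
  namespace billing {
    interface ListInvoicesParams { customer_id: string; limit?: number }
    function list_invoices(params: ListInvoicesParams): Promise<unknown>;
  }
}
```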
The WASM layer is powered by Wazero, a pure Go WebAssembly runtime. No CGO, no external dependencies. VoidLLM stays a single static binary.
Tools are exposed through an ES6 Proxy pattern — the LLM can call any tool by name without per-tool bindings.
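The Proxy approach can be sketched like this (illustrative, not VoidLLM's actual code; `dispatchToolCall` stands in for the host bridge into the proxy layer):

```typescript
// Nested ES6 Proxies expose MCP tools without per-tool bindings:
// any tools.<server>.<tool>(args) access is routed to one generic dispatcher.
type ToolArgs = Record<string, unknown>;
type Dispatch = (server: string, tool: string, args: ToolArgs) => Promise<unknown>;

function makeTools(dispatchToolCall: Dispatch): any {
  return new Proxy({}, {
    get(_target, server) {
      // Each property access on `tools` yields another Proxy for that server.
      return new Proxy({}, {
        get(_t, tool) {
          // Each tool access yields a function forwarding to the dispatcher.
          return (args: ToolArgs = {}) =>
            dispatchToolCall(String(server), String(tool), args);
        },
      });
    },
  });
}
```

With this shape, `tools.crm.get_customer({ id: "cust_8a3f" })` resolves to `dispatchToolCall("crm", "get_customer", { id: "cust_8a3f" })` with no crm-specific code anywhere.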
Code Example
```javascript
const [customer, invoices] = await Promise.all([
  tools.crm.get_customer({ id: "cust_8a3f" }),
  tools.billing.list_invoices({ customer_id: "cust_8a3f", limit: 5 })
]);

let churnRisk = "low";
if (invoices.some(inv => inv.status === "overdue")) {
  const usage = await tools.analytics.get_usage({
    customer_id: "cust_8a3f",
    period: "30d"
  });
  churnRisk = usage.trend === "declining" ? "high" : "medium";
}

console.log(`Evaluated ${invoices.length} invoices`);
return { customer, invoices, churnRisk };
```
Three tool calls, conditional logic, parallel execution — all in one LLM turn instead of three. The console.log output is captured and returned alongside the result.
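A common way to implement that log capture (a sketch, not VoidLLM's actual QuickJS code) is to hand the sandboxed script a shadowed `console` and collect its lines next to the return value:

```typescript
// Sketch: collect console.log output from sandboxed code so it can be
// returned to the caller together with the script's return value.
async function runWithCapturedLogs(
  script: (console: { log: (...args: unknown[]) => void }) => Promise<unknown>
): Promise<{ result: unknown; logs: string[] }> {
  const logs: string[] = [];
  const sandboxConsole = {
    // Mimic console.log's space-separated formatting of its arguments.
    log: (...args: unknown[]) => { logs.push(args.map(String).join(" ")); },
  };
  const result = await script(sandboxConsole);
  return { result, logs };
}
```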
Security
The QuickJS WASM runtime has:
- No filesystem access — cannot read or write files on the host
- No network access — the only outbound traffic is MCP tool calls dispatched through VoidLLM's controlled proxy layer
- No host access — the WASM module runs in an isolated memory space
On top of the sandbox, admins get a per-tool blocklist. You can allow Code Mode access to your CRM tools but block it from calling database.execute_raw_sql. Configuration is managed through VoidLLM's admin API and UI.
Getting Started
Add your MCP servers to voidllm.yaml:
```yaml
mcp_servers:
  - name: AWS Knowledge
    alias: aws
    url: https://knowledge-mcp.global.api.aws
    auth_type: none

settings:
  mcp:
    code_mode:
      enabled: true
      pool_size: 8
      timeout: 30s
      max_tool_calls: 50
```
The Code Mode endpoint lives at /api/v1/mcp. Connect your IDE or coding agent (Claude Code, Cursor, Windsurf) and the LLM will have list_servers, search_tools, and execute_code available alongside your regular tools.
Limitations
Being upfront about what Code Mode doesn't do yet:
- SSE transport not supported — only Streamable HTTP. Deprecated SSE servers are auto-detected and flagged.
- No OAuth for upstream MCP servers — API keys and custom headers only.
- Single instance only — the WASM pool lives in process memory, so it isn't shared across replicas.
Try It
VoidLLM is a lightweight LLM proxy written in Go — less than 2ms overhead, org/team/user hierarchy, key management, and usage tracking. Code Mode is the newest addition.