The Problem: One Tool Call Per Turn Is Expensive
If you've worked with LLMs and tool use, you know the pattern. The model decides it needs to call a tool. It emits a tool call. Your system executes it, returns the result. The model reads the result, reasons about it, and decides it needs another tool call. Repeat.
Every round trip burns tokens. The model re-reads the entire conversation history each time. For workflows that touch 5-10 tools — think "look up the customer, check their subscription, fetch recent invoices, calculate usage, draft a summary" — you're paying for the same context window over and over. The token cost adds up fast, and latency compounds with each turn.
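The compounding is easy to sketch with made-up numbers (a base context of 2,000 tokens and 300 tokens per tool result are assumptions purely for illustration):

```typescript
// Back-of-the-envelope: input tokens re-read across N sequential tool turns.
// Each turn re-reads the full history accumulated so far, whereas a single
// code-orchestrated turn reads all tool results at once.
function sequentialInputTokens(base: number, perResult: number, turns: number): number {
  let total = 0;
  for (let i = 0; i < turns; i++) {
    total += base + i * perResult; // turn i re-reads base context plus i earlier results
  }
  return total;
}

const base = 2000, perResult = 300, turns = 5;
const sequential = sequentialInputTokens(base, perResult, turns); // 13000
const singleTurn = base + turns * perResult;                      // 3500
console.log(sequential, singleTurn);
```

With these invented numbers, five sequential turns read roughly 13,000 input tokens against about 3,500 for one combined turn, which is the kind of gap the savings claim below is pointing at.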
The Solution: Let the LLM Write the Orchestration
Code Mode flips the pattern. Instead of one tool call per LLM turn, the model writes a short JavaScript program that orchestrates multiple tool calls in a single execution. The model gets the results all at once and reasons over the complete picture.
This is inspired by Cloudflare's Code Mode concept. The difference: VoidLLM's implementation is fully self-hosted, runs in a WASM-sandboxed runtime, and integrates with any MCP server you're already running.
In practice, this reduces token usage by 30-80% depending on the complexity of the tool workflow.
Architecture
The execution pipeline:
- The LLM receives auto-generated TypeScript type declarations describing all available MCP tools
- The LLM emits a JavaScript block that calls the tools it needs
- VoidLLM executes the JS inside a QuickJS runtime compiled to WebAssembly
- Tool calls within the JS are dispatched to upstream MCP servers via Streamable HTTP
- Results are collected and returned to the LLM in a single response
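The declarations the model receives might look roughly like this (a hypothetical shape for two servers matching the example further down; the real generator's output will differ in detail):

```typescript
// Hypothetical auto-generated declarations for two upstream MCP servers.
// Tool names and parameter shapes are illustrative, derived from each
// tool's JSON Schema.
declare namespace tools {
  namespace crm {
    interface GetCustomerParams { id: string }
    function get_customer(params: GetCustomerParams): Promise<unknown>;
  }
  namespace billing {
    interface ListInvoicesParams { customer_id: string; limit?: number }
    function list_invoices(params: ListInvoicesParams): Promise<unknown>;
  }
}
```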
The WASM layer is powered by Wazero, a pure Go WebAssembly runtime. No CGO, no external dependencies. VoidLLM stays a single static binary.
Tools are exposed through an ES6 Proxy pattern — the LLM can call any tool by name without per-tool bindings.
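The Proxy approach can be sketched like this (illustrative, not VoidLLM's actual code; `dispatchToolCall` stands in for the host bridge into the proxy layer):

```typescript
// Nested ES6 Proxies expose MCP tools without per-tool bindings:
// any tools.<server>.<tool>(args) access is routed to one generic dispatcher.
type ToolArgs = Record<string, unknown>;
type Dispatch = (server: string, tool: string, args: ToolArgs) => Promise<unknown>;

function makeTools(dispatchToolCall: Dispatch): any {
  return new Proxy({}, {
    get(_target, server) {
      // Each property access on `tools` yields another Proxy for that server.
      return new Proxy({}, {
        get(_t, tool) {
          // Each tool access yields a function forwarding to the dispatcher.
          return (args: ToolArgs = {}) =>
            dispatchToolCall(String(server), String(tool), args);
        },
      });
    },
  });
}
```

With this shape, `tools.crm.get_customer({ id: "cust_8a3f" })` resolves to `dispatchToolCall("crm", "get_customer", { id: "cust_8a3f" })` with no crm-specific code anywhere.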
Code Example
```javascript
const [customer, invoices] = await Promise.all([
  tools.crm.get_customer({ id: "cust_8a3f" }),
  tools.billing.list_invoices({ customer_id: "cust_8a3f", limit: 5 })
]);

let churnRisk = "low";
if (invoices.some(inv => inv.status === "overdue")) {
  const usage = await tools.analytics.get_usage({
    customer_id: "cust_8a3f",
    period: "30d"
  });
  churnRisk = usage.trend === "declining" ? "high" : "medium";
}

console.log(`Evaluated ${invoices.length} invoices`);
return { customer, invoices, churnRisk };
```
Three tool calls, conditional logic, parallel execution — all in one LLM turn instead of three. The console.log output is captured and returned alongside the result.
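A common way to implement that log capture (a sketch, not VoidLLM's actual QuickJS code) is to hand the sandboxed script a shadowed `console` and collect its lines next to the return value:

```typescript
// Sketch: collect console.log output from sandboxed code so it can be
// returned to the caller together with the script's return value.
async function runWithCapturedLogs(
  script: (console: { log: (...args: unknown[]) => void }) => Promise<unknown>
): Promise<{ result: unknown; logs: string[] }> {
  const logs: string[] = [];
  const sandboxConsole = {
    // Mimic console.log's space-separated formatting of its arguments.
    log: (...args: unknown[]) => { logs.push(args.map(String).join(" ")); },
  };
  const result = await script(sandboxConsole);
  return { result, logs };
}
```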
Security
The QuickJS WASM runtime has:
- No filesystem access — cannot read or write files on the host
- No network access — the only outbound traffic is MCP tool calls dispatched through VoidLLM's controlled proxy layer
- No host access — the WASM module runs in an isolated memory space
On top of the sandbox, admins get a per-tool blocklist. You can allow Code Mode access to your CRM tools but block it from calling database.execute_raw_sql. Configuration is managed through VoidLLM's admin API and UI.
Getting Started
Add your MCP servers to voidllm.yaml:
```yaml
mcp_servers:
  - name: AWS Knowledge
    alias: aws
    url: https://knowledge-mcp.global.api.aws
    auth_type: none

settings:
  mcp:
    code_mode:
      enabled: true
      pool_size: 8
      timeout: 30s
      max_tool_calls: 50
```
The Code Mode endpoint lives at /api/v1/mcp. Connect your IDE or coding agent (Claude Code, Cursor, Windsurf) and the LLM will have list_servers, search_tools, and execute_code available alongside your regular tools.
Limitations
Being upfront about what Code Mode doesn't do yet:
- SSE transport not supported — only Streamable HTTP. Deprecated SSE servers are auto-detected and flagged.
- No OAuth for upstream MCP servers — API keys and custom headers only.
- Single instance only — the WASM pool lives in process memory, so it isn't shared across replicas.
Try It
VoidLLM is a lightweight LLM proxy written in Go — less than 2ms overhead, org/team/user hierarchy, key management, and usage tracking. Code Mode is the newest addition.