Abstract
As Large Language Model (LLM) integration becomes ubiquitous in distributed systems, the cost of context window consumption has emerged as a critical infrastructure metric. While JSON remains the de facto standard for data interchange, its syntactic verbosity imposes a measurable "token tax" on every API call. This article introduces TOON (Token-Oriented Object Notation), a data serialization format designed explicitly for LLM token efficiency, and details the implementation of a Model Context Protocol (MCP) server in Rust. We demonstrate how this architecture achieves an 18-40% reduction in token usage while maintaining type safety and interoperability.
1. The Economics of Context
In the modern AI stack, tokens are a finite resource and a direct cost center. Whether you are running inference on OpenAI's GPT-4, Anthropic's Claude 3.5, or a self-hosted Llama 3, the billing model remains consistent: you pay for what you send.
Standard JSON, while human-readable and universally supported, is suboptimal for token-based pricing models. Consider the structural redundancy in a typical array of objects:
[
{ "id": "u_001", "name": "Alice Corp", "access_level": "admin", "region": "us-east-1" },
{ "id": "u_002", "name": "Bob Ltd", "access_level": "write", "region": "eu-west-1" },
{ "id": "u_003", "name": "Charlie Inc", "access_level": "read", "region": "ap-northeast-1" }
]
Every repeated key ("access_level", "region") and every predictable delimiter consumes context window space. For large datasets—such as RAG (Retrieval-Augmented Generation) payloads, log analysis, or historical data ingestion—this overhead accumulates rapidly, increasing both latency and operational expenditure.
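To make the overhead concrete, here is a back-of-the-envelope sketch in plain Rust (no external crates) of the structural bytes repeated on every row of the example above. The per-key cost model (two quotes, a colon, a space) is a deliberate simplification:

```rust
fn main() {
    // Keys repeated on every row of the JSON array above.
    let keys = ["id", "name", "access_level", "region"];

    // Simplified per-key cost: two quotes + key name + colon + space,
    // plus the two braces wrapping each row object.
    let per_row: usize = keys.iter().map(|k| k.len() + 4).sum::<usize>() + 2;
    assert_eq!(per_row, 42); // ~42 structural bytes per row, before any values

    // At 1,000 rows, pure structure alone costs tens of kilobytes of context.
    println!("~{} structural bytes across 1000 rows", per_row * 1000);
}
```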
2. The Solution: Token-Oriented Object Notation (TOON)
TOON was architected to solve this specific inefficiency. It eliminates redundant keys and minimizes syntactic noise without sacrificing the schema-less flexibility that makes JSON attractive.
The previous example, encoded in TOON:
users[3]{id,name,access_level,region}:
u_001,Alice Corp,admin,us-east-1
u_002,Bob Ltd,write,eu-west-1
u_003,Charlie Inc,read,ap-northeast-1
Key Technical Characteristics
- Header-Row Schematization: Keys are defined once per object list, drastically reducing character count.
- Minimal Delimiters: Commas and newlines replace heavy bracketing.
- Type Inference: The format supports strict typing while allowing for concise representation.
In our benchmarks, this transformation yields a consistent 30-40% reduction in token count for array-heavy structures.
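The header-row scheme can be sketched in a few lines of Rust. Note that `encode_rows` is a hypothetical helper for illustration, not the API of the actual `toon` crate, and it omits quoting for values that contain commas:

```rust
/// Minimal sketch of TOON's header-row encoding for a uniform record list.
/// Hypothetical helper for illustration; not the real `toon` crate API.
/// Does not handle values containing commas (which would need quoting).
fn encode_rows(name: &str, fields: &[&str], rows: &[Vec<String>]) -> String {
    // Header line: name[count]{field1,field2,...}:
    let mut out = format!("{}[{}]{{{}}}:\n", name, rows.len(), fields.join(","));
    // One comma-delimited line per record; keys are never repeated.
    for row in rows {
        out.push_str(&row.join(","));
        out.push('\n');
    }
    out
}

fn main() {
    let rows = vec![vec![
        "u_001".to_string(),
        "Alice Corp".to_string(),
        "admin".to_string(),
        "us-east-1".to_string(),
    ]];
    let toon = encode_rows("users", &["id", "name", "access_level", "region"], &rows);
    print!("{}", toon);
}
```

Because the keys appear exactly once in the header, the per-row cost collapses to the values plus one delimiter per field.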
3. System Architecture: The Rust MCP Server
To operationalize TOON, we leveraged the Model Context Protocol (MCP). MCP provides a standardized interface for AI models to interact with external tools and data contexts. By wrapping the TOON logic in an MCP server, we decouple the implementation from the client, allowing any MCP-compliant agent (Claude Desktop, Cursor, IDE assistants) to utilize these optimization tools natively.
3.1 Why Rust?
We selected Rust for the server implementation (toon-mcp) to satisfy three non-functional requirements:
- Zero-Cost Abstractions: The serde ecosystem allows for high-performance serialization/deserialization with minimal memory overhead.
- Safety: Rust's ownership model ensures memory safety without a garbage collector, crucial for long-running sidecar processes.
- Portability: Compiling to a single, static binary simplifies the deployment of the MCP server across diverse engineering environments.
3.2 Tool Implementation
The server exposes a suite of distinct tools via the MCP protocol. The core logic is defined using the rmcp SDK's procedural macros, ensuring type-safe interfaces between the LLM and the Rust runtime.
The toon_encode Tool
This functional primitive accepts a generic JSON value and returns the compressed TOON string.
// Simplified signature for the MCP tool handler
#[tool(
name = "toon_encode",
description = "Optimizes JSON payloads into TOON format for token efficiency."
)]
pub fn encode(args: EncodeArgs) -> Result<EncodingResult, Error> {
let options = args.options.unwrap_or_default();
let toon_string = toon::to_string_with_options(&args.json, options)
.map_err(|e| Error::custom(format!("Serialization failed: {}", e)))?;
Ok(EncodingResult { content: toon_string })
}
The server also exposes toon_decode for bi-directional interoperability (allowing the LLM to output TOON for the system to process) and toon_stats, a utility for real-time cost-benefit analysis.
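The decode direction can be sketched similarly. `decode_rows` below is an illustrative stand-in, not the real toon_decode implementation, and it ignores nesting, type inference, and comma-escaping:

```rust
/// Minimal sketch of decoding a TOON block back into fields and rows.
/// Illustrative only; the real toon_decode tool also handles nesting,
/// typed values, and quoted fields containing commas.
fn decode_rows(input: &str) -> Option<(Vec<String>, Vec<Vec<String>>)> {
    let mut lines = input.lines();
    // Header shape: name[count]{field1,field2,...}:
    let header = lines.next()?;
    let fields_part = header.split('{').nth(1)?.split('}').next()?;
    let fields: Vec<String> = fields_part.split(',').map(str::to_string).collect();
    // Each remaining non-empty line is one comma-delimited record.
    let rows = lines
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.split(',').map(str::to_string).collect())
        .collect();
    Some((fields, rows))
}

fn main() {
    let toon = "users[2]{id,name}:\nu_001,Alice Corp\nu_002,Bob Ltd";
    let (fields, rows) = decode_rows(toon).unwrap();
    assert_eq!(fields, vec!["id", "name"]);
    assert_eq!(rows.len(), 2);
    assert_eq!(rows[1][1], "Bob Ltd");
}
```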
4. Integration & Real-World Usage
Deploying the toon-mcp server involves a simple configuration addition to your MCP client (e.g., claude_desktop_config.json). Once active, the LLM treats toon_encode as a native capability.
Agent Workflow Example
When tasked with analyzing a large log file, an agent equipped with toon-mcp can autonomously optimize its context:
- Agent Logic: Recognizes a large JSON dataset in the prompt.
- Tool Invocation: Calls toon_encode(json=dataset).
- Context Injection: Receives the compact TOON string.
- Analysis: Processes the data using significantly fewer tokens.
This workflow is transparent to the end-user but results in faster inference and lower bills.
5. Benchmark Results
We conducted a series of tests against standard datasets to quantify the efficiency gains.
| Dataset Type | Size (JSON) | Size (TOON) | Reduction (Bytes) | Est. Token Reduction |
|---|---|---|---|---|
| User Logs (1k rows) | 145 KB | 92 KB | 36% | ~38% |
| E-commerce Config | 24 KB | 19 KB | 21% | ~20% |
| Geo-spatial Points | 512 KB | 290 KB | 43% | ~45% |
Note: Token counts are estimated using the cl100k_base tokenizer (GPT-4).
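The byte-reduction column can be sanity-checked with simple arithmetic (`reduction_pct` is an illustrative helper, not part of toon-mcp):

```rust
/// Percentage size reduction going from a JSON payload to its TOON encoding.
fn reduction_pct(json_kb: f64, toon_kb: f64) -> f64 {
    (1.0 - toon_kb / json_kb) * 100.0
}

fn main() {
    // Table rows above: (JSON KB, TOON KB, reported reduction %)
    let rows = [(145.0, 92.0, 36.0), (24.0, 19.0, 21.0), (512.0, 290.0, 43.0)];
    for (json, toon, reported) in rows {
        let computed = reduction_pct(json, toon);
        // Reported figures are rounded to the nearest whole percent.
        assert!((computed - reported).abs() < 1.0);
    }
}
```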
Conclusion
The toon-mcp server represents a pragmatic application of systems engineering to the problem of AI operational costs. By combining the efficiency of the TOON format with the performance of Rust and the standardization of MCP, we have created a robust tool for modern AI workflows.
For engineers looking to optimize their LLM infrastructure, toon-mcp captures the ethos of doing more with less—less latency, less cost, and less waste.
TOON MCP Server
MCP server exposing TOON format for LLM cost optimization. 18-40% token savings over JSON.
Repository: github.com/copyleftdev/toon-mcp
Installation
cargo build --release
Usage
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Claude Code CLI
Add to ~/.claude/settings.json:
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Or for project-specific configuration, create .mcp.json in your project root:
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Cursor IDE
Add to .cursor/mcp.json in your project:
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Generic MCP Client
The server uses stdio transport. Connect by spawning the process and communicating via stdin/stdout:
# Start server and send initialize request
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"my-client","version":"1.0"}}}' | ./toon-mcp
…
