Abstract
As Large Language Model (LLM) integration becomes ubiquitous in distributed systems, the cost of context window consumption has emerged as a critical infrastructure metric. While JSON remains the de facto standard for data interchange, its syntactic verbosity imposes a measurable "token tax" on every API call. This article introduces TOON (Token-Oriented Object Notation), a data serialization format designed explicitly for LLM token efficiency, and details the implementation of a Model Context Protocol (MCP) server in Rust. We demonstrate how this architecture achieves an 18-40% reduction in token usage while maintaining type safety and interoperability.
1. The Economics of Context
In the modern AI stack, tokens are a finite resource and a direct cost center. Whether you are running inference on OpenAI's GPT-4, Anthropic's Claude 3.5, or a self-hosted Llama 3, the billing model remains consistent: you pay for what you send.
Standard JSON, while human-readable and universally supported, is suboptimal for token-based pricing models. Consider the structural redundancy in a typical array of objects:
[
{ "id": "u_001", "name": "Alice Corp", "access_level": "admin", "region": "us-east-1" },
{ "id": "u_002", "name": "Bob Ltd", "access_level": "write", "region": "eu-west-1" },
{ "id": "u_003", "name": "Charlie Inc", "access_level": "read", "region": "ap-northeast-1" }
]
Every repeated key ("access_level", "region") and every predictable delimiter consumes context window space. For large datasets—such as RAG (Retrieval-Augmented Generation) payloads, log analysis, or historical data ingestion—this overhead accumulates rapidly, increasing both latency and operational expenditure.
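To make the overhead concrete, here is a back-of-the-envelope sketch in plain Rust (no external crates) of the structural bytes repeated on every row of the example above. The per-key cost model (two quotes, a colon, a space) is a deliberate simplification:

```rust
fn main() {
    // Keys repeated on every row of the JSON array above.
    let keys = ["id", "name", "access_level", "region"];

    // Simplified per-key cost: two quotes + key name + colon + space,
    // plus the two braces wrapping each row object.
    let per_row: usize = keys.iter().map(|k| k.len() + 4).sum::<usize>() + 2;
    assert_eq!(per_row, 42); // ~42 structural bytes per row, before any values

    // At 1,000 rows, pure structure alone costs tens of kilobytes of context.
    println!("~{} structural bytes across 1000 rows", per_row * 1000);
}
```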
2. The Solution: Token-Oriented Object Notation (TOON)
TOON was architected to solve this specific inefficiency. It eliminates redundant keys and minimizes syntactic noise without sacrificing the schema-less flexibility that makes JSON attractive.
The previous example, encoded in TOON:
users[3]{id,name,access_level,region}:
u_001,Alice Corp,admin,us-east-1
u_002,Bob Ltd,write,eu-west-1
u_003,Charlie Inc,read,ap-northeast-1
Key Technical Characteristics
- Header-Row Schematization: Keys are defined once per object list, drastically reducing character count.
- Minimal Delimiters: Commas and newlines replace heavy bracketing.
- Type Inference: The format supports strict typing while allowing for concise representation.
In our benchmarks, this transformation yields a consistent 30-40% reduction in token count for array-heavy structures.
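The header-row scheme can be sketched in a few lines of Rust. Note that `encode_rows` is a hypothetical helper for illustration, not the API of the actual `toon` crate, and it omits quoting for values that contain commas:

```rust
/// Minimal sketch of TOON's header-row encoding for a uniform record list.
/// Hypothetical helper for illustration; not the real `toon` crate API.
/// Does not handle values containing commas (which would need quoting).
fn encode_rows(name: &str, fields: &[&str], rows: &[Vec<String>]) -> String {
    // Header line: name[count]{field1,field2,...}:
    let mut out = format!("{}[{}]{{{}}}:\n", name, rows.len(), fields.join(","));
    // One comma-delimited line per record; keys are never repeated.
    for row in rows {
        out.push_str(&row.join(","));
        out.push('\n');
    }
    out
}

fn main() {
    let rows = vec![vec![
        "u_001".to_string(),
        "Alice Corp".to_string(),
        "admin".to_string(),
        "us-east-1".to_string(),
    ]];
    let toon = encode_rows("users", &["id", "name", "access_level", "region"], &rows);
    print!("{}", toon);
}
```

Because the keys appear exactly once in the header, the per-row cost collapses to the values plus one delimiter per field.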
3. System Architecture: The Rust MCP Server
To operationalize TOON, we leveraged the Model Context Protocol (MCP). MCP provides a standardized interface for AI models to interact with external tools and data contexts. By wrapping the TOON logic in an MCP server, we decouple the implementation from the client, allowing any MCP-compliant agent (Claude Desktop, Cursor, IDE assistants) to utilize these optimization tools natively.
3.1 Why Rust?
We selected Rust for the server implementation (toon-mcp) to satisfy three non-functional requirements:
- Zero-Cost Abstractions: The serde ecosystem allows for high-performance serialization/deserialization with minimal memory overhead.
- Safety: Rust's ownership model ensures memory safety without a garbage collector, crucial for long-running sidecar processes.
- Portability: Compiling to a single, static binary simplifies the deployment of the MCP server across diverse engineering environments.
3.2 Tool Implementation
The server exposes a suite of distinct tools via the MCP protocol. The core logic is defined using the rmcp SDK's procedural macros, ensuring type-safe interfaces between the LLM and the Rust runtime.
The toon_encode Tool
This functional primitive accepts a generic JSON value and returns the compressed TOON string.
// Simplified signature for the MCP tool handler
#[tool(
name = "toon_encode",
description = "Optimizes JSON payloads into TOON format for token efficiency."
)]
pub fn encode(args: EncodeArgs) -> Result<EncodingResult, Error> {
let options = args.options.unwrap_or_default();
let toon_string = toon::to_string_with_options(&args.json, options)
.map_err(|e| Error::custom(format!("Serialization failed: {}", e)))?;
Ok(EncodingResult { content: toon_string })
}
The server also exposes toon_decode for bi-directional interoperability (allowing the LLM to output TOON for the system to process) and toon_stats, a utility for real-time cost-benefit analysis.
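The decode direction can be sketched similarly. `decode_rows` below is an illustrative stand-in, not the real toon_decode implementation, and it ignores nesting, type inference, and comma-escaping:

```rust
/// Minimal sketch of decoding a TOON block back into fields and rows.
/// Illustrative only; the real toon_decode tool also handles nesting,
/// typed values, and quoted fields containing commas.
fn decode_rows(input: &str) -> Option<(Vec<String>, Vec<Vec<String>>)> {
    let mut lines = input.lines();
    // Header shape: name[count]{field1,field2,...}:
    let header = lines.next()?;
    let fields_part = header.split('{').nth(1)?.split('}').next()?;
    let fields: Vec<String> = fields_part.split(',').map(str::to_string).collect();
    // Each remaining non-empty line is one comma-delimited record.
    let rows = lines
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.split(',').map(str::to_string).collect())
        .collect();
    Some((fields, rows))
}

fn main() {
    let toon = "users[2]{id,name}:\nu_001,Alice Corp\nu_002,Bob Ltd";
    let (fields, rows) = decode_rows(toon).unwrap();
    assert_eq!(fields, vec!["id", "name"]);
    assert_eq!(rows.len(), 2);
    assert_eq!(rows[1][1], "Bob Ltd");
}
```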
4. Integration & Real-World Usage
Deploying the toon-mcp server involves a simple configuration addition to your MCP client (e.g., claude_desktop_config.json). Once active, the LLM treats toon_encode as a native capability.
Agent Workflow Example
When tasked with analyzing a large log file, an agent equipped with toon-mcp can autonomously optimize its context:
- Agent Logic: Recognizes a large JSON dataset in the prompt.
- Tool Invocation: Calls toon_encode(json=dataset).
- Context Injection: Receives the compact TOON string.
- Analysis: Processes the data using significantly fewer tokens.
This workflow is transparent to the end-user but results in faster inference and lower bills.
5. Benchmark Results
We conducted a series of tests against standard datasets to quantify the efficiency gains.
| Dataset Type | Size (JSON) | Size (TOON) | Reduction (Bytes) | Est. Token Reduction |
|---|---|---|---|---|
| User Logs (1k rows) | 145 KB | 92 KB | 36% | ~38% |
| E-commerce Config | 24 KB | 19 KB | 21% | ~20% |
| Geo-spatial Points | 512 KB | 290 KB | 43% | ~45% |
Note: Token counts are estimated using the cl100k_base tokenizer (GPT-4).
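The byte-reduction column can be sanity-checked with simple arithmetic (`reduction_pct` is an illustrative helper, not part of toon-mcp):

```rust
/// Percentage size reduction going from a JSON payload to its TOON encoding.
fn reduction_pct(json_kb: f64, toon_kb: f64) -> f64 {
    (1.0 - toon_kb / json_kb) * 100.0
}

fn main() {
    // Table rows above: (JSON KB, TOON KB, reported reduction %)
    let rows = [(145.0, 92.0, 36.0), (24.0, 19.0, 21.0), (512.0, 290.0, 43.0)];
    for (json, toon, reported) in rows {
        let computed = reduction_pct(json, toon);
        // Reported figures are rounded to the nearest whole percent.
        assert!((computed - reported).abs() < 1.0);
    }
}
```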
Conclusion
The toon-mcp server represents a pragmatic application of systems engineering to the problem of AI operational costs. By combining the efficiency of the TOON format with the performance of Rust and the standardization of MCP, we have created a robust tool for modern AI workflows.
For engineers looking to optimize their LLM infrastructure, toon-mcp captures the ethos of doing more with less—less latency, less cost, and less waste.
TOON MCP Server
MCP server exposing TOON format for LLM cost optimization. 18-40% token savings over JSON.
Repository: github.com/copyleftdev/toon-mcp
Installation
cargo build --release
Usage
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Claude Code CLI
Add to ~/.claude/settings.json:
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Or for project-specific configuration, create .mcp.json in your project root:
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Cursor IDE
Add to .cursor/mcp.json in your project:
{
"mcpServers": {
"toon": {
"command": "/path/to/toon-mcp"
}
}
}
Generic MCP Client
The server uses stdio transport. Connect by spawning the process and communicating via stdin/stdout:
# Start server and send initialize request
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"my-client","version":"1.0"}}}' | ./toon-mcp
…
