Teruo Kunihiro

Assessing TOON Token Savings in an MCP Server

I have been wiring TOON support into this MCP server with toon-token-diff to understand whether converting JSON payloads to TOON meaningfully reduces prompt costs. The short answer: TOON is elegant, but in my test harness it delivered only microscopic savings on real-world workloads.

Environment

  • Project mode: toon-token-diff in libraryMode via npm install toon-token-diff
  • Models monitored: openai (tiktoken GPT-5 profile) and claude
  • Integration strategy: lightweight instrumentation that appends token stats into a JSONL ledger for later analysis
import { estimateAndLog } from "toon-token-diff/libraryMode";

// Inside my MCP tool handler, after the tool produces its JSON result:
// estimate JSON vs TOON token counts and append a record to the ledger.
estimateAndLog(JSON.stringify(result), {
  models: ["openai", "claude"],  // tokenizer profiles to estimate against
  file: "./token-logs.jsonl",    // JSONL ledger for later analysis
  format: "json",                // emit structured records
  label: "mcp_tool_call",        // tag so runs can be grouped downstream
});

This snippet runs after the MCP tool produces a JSON response. It serializes the payload, estimates TOON vs JSON tokens, and emits a structured record to token-logs.jsonl. The rest of the MCP server stays untouched—no need to change transport or business logic.
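
To make the ledger actionable, I read it back with a small script. Here is a minimal sketch, assuming each JSONL record carries a label plus per-model JSON/TOON counts under the field names below (my guess at the record shape, not toon-token-diff's documented schema):

// analyze-logs.ts: the field names are assumptions; adjust them to
// whatever toon-token-diff actually writes into token-logs.jsonl.
import { readFileSync } from "node:fs";

type Counts = { json: number; toon: number };
type LedgerEntry = { label: string; openai: Counts; claude: Counts };

// Relative savings: tokens removed as a share of the JSON total.
const deltaPct = ({ json, toon }: Counts) => ((json - toon) / json) * 100;

const entries = readFileSync("./token-logs.jsonl", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line) as LedgerEntry);

for (const e of entries) {
  console.log(
    `${e.label}: openai ${deltaPct(e.openai).toFixed(4)}%, claude ${deltaPct(e.claude).toFixed(4)}%`
  );
}

Any JSONL-aware tool works here; the point is that the history lives on disk instead of in someone's memory.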

Observations

| Timestamp (UTC) | openai JSON | openai TOON | openai Δ (%) | claude JSON | claude TOON | claude Δ (%) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| 2025-11-19T14:16:54.296Z | 127 | 126 | 0.79 | 130 | 129 | 0.77 |
| 2025-11-19T14:17:15.720Z | 53,703 | 53,702 | 0.0019 | 54,977 | 54,976 | 0.0018 |
| 2025-11-19T14:17:34.988Z | 14 | 12 | 14.29 | 14 | 12 | 14.29 |
| 2025-11-19T14:17:39.246Z | 53,703 | 53,702 | 0.0019 | 54,977 | 54,976 | 0.0018 |
| 2025-11-19T14:17:48.333Z | 29 | 29 | 0.00 | 28 | 28 | 0.00 |
| 2025-11-19T14:18:13.725Z | 91,729 | 91,728 | 0.0011 | 98,607 | 98,606 | 0.0010 |
| 2025-11-19T14:21:19.174Z | 127 | 126 | 0.79 | 130 | 129 | 0.77 |
| 2025-11-19T14:21:23.370Z | 91,729 | 91,728 | 0.0011 | 98,607 | 98,606 | 0.0010 |
| 2025-11-19T14:21:30.314Z | 53,703 | 53,702 | 0.0019 | 54,977 | 54,976 | 0.0018 |

Nine consecutive tool runs told the same story: production payloads barely moved. Only the intentionally tiny sample showed double-digit savings, which is irrelevant for backlog-scale prompts.

Why the Reduction Rate Is Flat

  1. Content dominates token volume – The payload body itself accounts for nearly every token, so TOON's structural tweaks barely register in the total. In the 53,703-token runs above, TOON removed exactly one token; only the 14-token sample, which is almost pure structure, moved by double digits.
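
The table makes the arithmetic concrete. A back-of-envelope check with the numbers above:

// Relative savings = tokens removed / total JSON tokens.
const deltaPct = (json: number, toon: number) => ((json - toon) / json) * 100;

console.log(deltaPct(14, 12).toFixed(2));         // 14.29 (tiny payload, mostly structure)
console.log(deltaPct(53_703, 53_702).toFixed(4)); // 0.0019 (content swamps the savings)

A fixed handful of structural tokens divided by an ever-growing content count trends toward zero; that is the whole story of the flat deltas.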

Practical Guidance

  • Keep TOON handy as a normalization format, but don't promise cost savings without benchmarking your actual payloads.
  • Instrument with the libraryMode snippet above before shipping; it gives you historical evidence of whether TOON helps.
  • If savings are negligible, redirect effort toward higher-impact tactics: pruning unused fields (a sketch follows this list), batching small tool calls, or applying semantic compression upstream.
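
As a sketch of that pruning idea, assuming nothing beyond plain JavaScript objects: the dropped key names below are placeholders for whatever dead weight your own payloads carry.

// Hypothetical pre-serialization trim; replace DROP_KEYS with the fields
// your ledger shows spending tokens without informing the model.
const DROP_KEYS = new Set(["debug", "trace", "rawHtml"]);

function pruneForPrompt(payload: Record<string, unknown>) {
  return Object.fromEntries(
    Object.entries(payload).filter(([key]) => !DROP_KEYS.has(key))
  );
}

// Then measure the pruned payload instead of the raw one:
// estimateAndLog(JSON.stringify(pruneForPrompt(result)), { ... });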

Next Experiments

  • Compare with alternative tokenizers (Gemini, Llama) to see whether non-GPT vocabularies respond differently.
  • Add diff tooling that highlights specific fields TOON shrinks, so we can manually prune them if needed (a crude starting point follows this list).
  • Explore policy-driven trimming (e.g., dropping debug blobs) prior to TOON conversion.
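
For the field-level idea, ranking top-level fields by raw token weight is a crude first pass before worrying about TOON at all. This sketch assumes the tiktoken npm package and uses its gpt-4o encoding as a stand-in profile; it is not toon-token-diff's own diff tooling:

import { encoding_for_model } from "tiktoken";

// Rank top-level fields by token count so the heaviest prune candidates
// surface first. JSON-only; TOON-aware diffing would refine this.
function rankFieldsByTokens(payload: Record<string, unknown>) {
  const enc = encoding_for_model("gpt-4o");
  const ranked = Object.entries(payload)
    .map(([key, value]) => ({
      key,
      tokens: enc.encode(JSON.stringify(value)).length,
    }))
    .sort((a, b) => b.tokens - a.tokens);
  enc.free(); // the WASM encoder must be released explicitly
  return ranked;
}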

TOON remains a clever serialization trick, but as my MCP experiment showed, it is not an automatic token economy lever. Measure, log, and decide based on real numbers.

Top comments (1)

Apple Dev (appledevcanuck)

Really appreciate the hard numbers here—nice reality check on TOON as a “token saver” vs just a clean format. Has anyone seen materially better deltas with different schemas, models, or extreme payload shapes? Any tips on semantic compression that worked well?