I have been wiring TOON support with toon-token-diff into this MCP server to understand whether converting JSON payloads to TOON meaningfully reduces prompt costs. The short answer: TOON is elegant, but in my test harness it delivered microscopic savings for real-world workloads.
Environment
- Project mode: `toon-token-diff` in `libraryMode`, installed via `npm install toon-token-diff`
- Models monitored: `openai` (tiktoken GPT-5 profile) and `claude`
- Integration strategy: lightweight instrumentation that appends token stats to a JSONL ledger for later analysis
import { estimateAndLog } from "toon-token-diff/libraryMode";

// inside my MCP tool handler
estimateAndLog(JSON.stringify(result), {
  models: ["openai", "claude"],
  file: "./token-logs.jsonl",
  format: "json",
  label: "mcp_tool_call",
});
This snippet runs after the MCP tool produces a JSON response. It serializes the payload, estimates TOON vs JSON tokens, and emits a structured record to token-logs.jsonl. The rest of the MCP server stays untouched—no need to change transport or business logic.
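For context, this is roughly where the call sits in a handler. A minimal sketch only: the handler shape and `runQuery` are placeholders for whatever your MCP server already does, not part of the toon-token-diff API.

```ts
import { estimateAndLog } from "toon-token-diff/libraryMode";

// Stand-in for the server's existing business logic.
async function runQuery(args: Record<string, unknown>): Promise<unknown> {
  return { received: args }; // placeholder result
}

// Hypothetical tool handler: only the estimateAndLog call is new; the
// business logic and the response returned to the client are unchanged.
async function handleBacklogTool(args: Record<string, unknown>) {
  const result = await runQuery(args);

  // Log JSON-vs-TOON token estimates for this payload; one JSONL record
  // is appended to the ledger per call.
  estimateAndLog(JSON.stringify(result), {
    models: ["openai", "claude"],
    file: "./token-logs.jsonl",
    format: "json",
    label: "mcp_tool_call",
  });

  return result; // the MCP client still receives the original JSON payload
}
```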
Observations
| Timestamp (UTC) | openai JSON tokens | openai TOON tokens | openai savings (%) | claude JSON tokens | claude TOON tokens | claude savings (%) |
|---|---|---|---|---|---|---|
| 2025-11-19T14:16:54.296Z | 127 | 126 | 0.79 | 130 | 129 | 0.77 |
| 2025-11-19T14:17:15.720Z | 53,703 | 53,702 | 0.0019 | 54,977 | 54,976 | 0.0018 |
| 2025-11-19T14:17:34.988Z | 14 | 12 | 14.29 | 14 | 12 | 14.29 |
| 2025-11-19T14:17:39.246Z | 53,703 | 53,702 | 0.0019 | 54,977 | 54,976 | 0.0018 |
| 2025-11-19T14:17:48.333Z | 29 | 29 | 0.00 | 28 | 28 | 0.00 |
| 2025-11-19T14:18:13.725Z | 91,729 | 91,728 | 0.0011 | 98,607 | 98,606 | 0.0010 |
| 2025-11-19T14:21:19.174Z | 127 | 126 | 0.79 | 130 | 129 | 0.77 |
| 2025-11-19T14:21:23.370Z | 91,729 | 91,728 | 0.0011 | 98,607 | 98,606 | 0.0010 |
| 2025-11-19T14:21:30.314Z | 53,703 | 53,702 | 0.0019 | 54,977 | 54,976 | 0.0018 |
Nine consecutive tool runs told the same story: production payloads barely moved. Only the intentionally tiny sample showed double-digit savings, which is irrelevant for backlog-scale prompts.
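The table was assembled by replaying the ledger. A minimal reader could look like the sketch below; the field names inside each JSONL record (`timestamp`, `models`, `jsonTokens`, `toonTokens`) are my assumptions about the log shape, not documented toon-token-diff output, so check a real record before relying on them.

```ts
import { readFileSync } from "node:fs";

// Assumed shape of one JSONL record written by estimateAndLog.
type ModelStats = { jsonTokens: number; toonTokens: number };
type LedgerRecord = { timestamp: string; label: string; models: Record<string, ModelStats> };

const records: LedgerRecord[] = readFileSync("./token-logs.jsonl", "utf8")
  .split("\n")
  .filter((line) => line.trim() !== "")
  .map((line) => JSON.parse(line));

// Print JSON vs TOON counts and the percentage saved per model per run.
for (const rec of records) {
  for (const [model, s] of Object.entries(rec.models)) {
    const saved = ((s.jsonTokens - s.toonTokens) / s.jsonTokens) * 100;
    console.log(`${rec.timestamp} ${model}: ${s.jsonTokens} -> ${s.toonTokens} (${saved.toFixed(4)}% saved)`);
  }
}
```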
Why the Reduction Rate Is Flat
- Content dominates token volume – The payload body itself accounts for nearly every token, so TOON’s structural tweaks barely register in the total. In the 53,703-token run above, switching to TOON saved exactly one token; the back-of-the-envelope sketch below shows why the ceiling is so low.
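A back-of-the-envelope illustration; every number except the 53,703 total is an assumption chosen for the example, not a measurement.

```ts
// Upper bound on what any purely structural re-encoding can save.
const totalTokens = 53_703;     // JSON token count from one observed run
const structuralTokens = 1_200; // assumed: braces, brackets, quotes, commas
// Tokenizers also merge punctuation into neighbouring text, so even this
// split is generous to the structural side.

const ceiling = (structuralTokens / totalTokens) * 100;
console.log(`best-case structural saving: ${ceiling.toFixed(1)}%`); // ≈ 2.2%
// The observed TOON saving on the same payload was ~0.002%, because TOON
// still has to keep keys, values, and most delimiters.
```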
Practical Guidance
- Keep TOON handy as a normalization format, but don't promise cost savings without benchmarking your actual payloads.
- Instrument with the libraryMode snippet above before you ship; the ledger gives you historical evidence of whether TOON helps your workloads.
- If savings are negligible, redirect the effort toward higher-impact tactics: pruning unused fields (a sketch follows this list), batching small tool calls, or applying semantic compression upstream.
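Pruning is where content-dominated payloads actually shrink. A minimal sketch, assuming the tool response carries fields the model never reads; the field names are hypothetical.

```ts
// Drop fields the model never needs before the payload is serialized.
// "debugTrace", "rawHtml", and "internalIds" are hypothetical examples.
const DROP_FIELDS = new Set(["debugTrace", "rawHtml", "internalIds"]);

function pruneForPrompt(result: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(result).filter(([key]) => !DROP_FIELDS.has(key))
  );
}

// The pruned copy is what gets serialized for the prompt; the full result
// can still be returned to the client or written to the ledger.
const example = { id: 42, title: "Fix login bug", debugTrace: "…5 KB of stack trace…" };
console.log(JSON.stringify(pruneForPrompt(example)));
// -> {"id":42,"title":"Fix login bug"}
```

Unlike a re-encoding, pruning removes whole value bodies, so its effect scales with the content it drops rather than with punctuation.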
Next Experiments
- Compare with alternative tokenizers (Gemini, Llama) to see whether non-GPT vocabularies respond differently.
- Add diff tooling that highlights which fields TOON actually shrinks, so we can prune them by hand if needed (a rough size-based precursor is sketched after this list).
- Explore policy-driven trimming (e.g., dropping debug blobs) prior to TOON conversion.
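As a starting point for that diff tooling, a per-field size report already shows where the bulk lives. Character counts are only a proxy for tokens, and the payload shape here is invented for the example.

```ts
// Rough per-field size report: character counts as a stand-in for tokens.
// A sketch of the idea, not the planned field-level TOON diff itself.
function fieldSizeReport(payload: Record<string, unknown>): void {
  const rows = Object.entries(payload)
    .map(([field, value]) => ({ field, chars: JSON.stringify(value).length }))
    .sort((a, b) => b.chars - a.chars);
  console.table(rows); // biggest fields first: candidates for pruning or trimming
}

// Invented payload shape, roughly mimicking a backlog-sized tool response.
fieldSizeReport({
  issues: new Array(500).fill({ id: 1, title: "placeholder", description: "…" }),
  pagination: { page: 1, perPage: 500 },
  debug: { queryMs: 132, cacheHit: false },
});
```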
TOON remains a clever serialization trick, but as my MCP experiment showed, it is not an automatic token economy lever. Measure, log, and decide based on real numbers.
Top comments (1)
Really appreciate the hard numbers here—nice reality check on TOON as a “token saver” vs just a clean format. Has anyone seen materially better deltas with different schemas, models, or extreme payload shapes? Any tips on semantic compression that worked well?