Last month, I watched a production RAG pipeline burn $1,940 in one weekend. A single 500-row customer table, encoded the usual way in classic JSON, did the damage. The exact same data would have cost $760 in TOON. Same model. Same answers. Same latency. 61% fewer tokens.
You might have felt it yourself. You add one extra field to your context payload. The token counter spikes by hundreds. Suddenly, you trim keys or pray the model reads the structure right. We all patch around it because JSON has been the default for twenty years.
Most developers forget one detail. JSON landed in 2001. Six years before the iPhone. Seventeen years before GPT-1. Douglas Crockford built JSON for Ajax round-trips between browsers and servers, not for trillion-parameter models that bill you per token. Every quoted key. Every repeated field name in an array. Every curly brace made perfect sense in a world without inference pricing.
In 2025, those symbols cost real money.
TOON kills that cost. It preserves the full JSON data model (objects, arrays, strings, numbers, booleans, and nulls) but rewrites the text for the one reader you actually pay for: the LLM itself. It replaces repeated key names with a single header row. Drops unnecessary quotes. Uses indentation instead of braces. Adds explicit length guards so the model never has to guess array sizes.
This article shows exactly why JSON became an accidental tax on AI work, how TOON removes that tax at the syntax level, and how you add it to your code today without rewriting your stack.
If you pay for tokens, keep reading. Your next bill depends on it.
JSON’s Legacy: A Web Standard, Not an AI One
JSON remains the gold standard for general-purpose data interchange. Its quoted keys, braces, brackets, and commas guarantee unambiguous parsing across every programming language and make payloads easy to inspect in browser consoles.
When JSON was created, those properties solved real problems. Bandwidth was the primary constraint, and token-based pricing did not exist.
Today, the constraint has changed. Take a single object:

```json
{
  "id": 1,
  "name": "Alice"
}
```
It uses ~26 tokens instead of the 6–8 that a human would count. Quotes, colons, commas, and braces each become separate subwords in modern BPE tokenizers.
When that object appears in a 500-row array, the key strings and surrounding punctuation repeat hundreds of times. Real-world benchmarks record 11,842 tokens for pretty-printed JSON and 4,617 tokens for the minified version. The language model receives no additional information from those repetitions; they exist solely for syntactic correctness in traditional parsers.
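As a rough, tokenizer-free illustration, the repeated-key overhead is easy to see on a 500-row table. The sketch below compares minified JSON against a hand-built TOON-style tabular rendering; character counts stand in for tokens, so real savings depend on the tokenizer:

```python
import json

# 500-row table: the same three keys repeat on every JSON row
rows = [{"id": i, "name": f"user{i}", "active": True} for i in range(500)]

as_json = json.dumps(rows, separators=(",", ":"))  # minified JSON

# TOON-style tabular form: keys appear once, in a single header line
header = f"rows[{len(rows)}]{{id,name,active}}:"
body = "\n".join(f"{r['id']},{r['name']},{r['active']}" for r in rows)
as_toon = header + "\n" + body

# Character counts as a crude proxy for token counts
print(len(as_json), len(as_toon))
```

The JSON string carries `"id"`, `"name"`, and `"active"` 500 times each, plus braces and quotes; the TOON-style text carries them once.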
JSON remains the best choice for REST APIs, configuration files, and any system where token counting is irrelevant. Inside LLM prompts, however, the same syntax becomes unnecessary overhead, directly increasing costs and reducing available context.
What is TOON?
TOON (Token-Optimized Object Notation) is a drop-in text representation for structured data that preserves the full JSON data model, including objects, arrays, strings, numbers, booleans, and nulls, while removing the punctuation and repetition that inflate token counts inside LLM prompts.
Rather than wrapping every object in braces and repeating keys on every row, TOON:
- Uses indentation instead of `{}` and `[]`
- Declares array structure up front so fields don’t repeat
- Preserves ordering and schema explicitly
- Streams cleanly in line-based form for RAG pipelines
- Round-trips losslessly back to JSON
It is not a new database standard. It is not a compression algorithm. TOON gives models the data they need in the form they prefer: less syntax, more signal, fewer tokens.
How TOON Reduces Token Load Without Changing the Data
When JSON is used as model input, its syntax becomes a tax; the characters required for parsing increase the token count and reduce the available reasoning space.
TOON’s approach is to keep the full expressiveness of JSON while changing how the structure appears on the page. It focuses on the tokenizer as the primary consumer instead of the runtime environment.
Note: TOON optimizes repeated structure extremely well, but it isn’t a universal compressor. Highly nested or schema-less data will see smaller savings.
Below is a closer look at the mechanisms behind that change.
Indentation-Based Hierarchy Instead of Symbol-Based Delimiters
JSON depends on punctuation to express scope. Braces define objects. Brackets define arrays. Commas separate members. Tokenizers break each of these into its own subwords.
TOON moves this structural meaning into whitespace:
- Two spaces represent one nesting level
- Each key begins a new line when introducing a child object
- Context defines interpretation, not braces
Example translation of nested objects:
```json
{
  "user": {
    "profile": {
      "city": "Paris"
    }
  }
}
```

becomes

```text
user:
  profile:
    city: Paris
```
This reduces syntactic characters while preserving deterministic parseability: the parser tracks indentation levels instead of punctuation, which is a simpler signal for models to learn.
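A minimal sketch of that translation, assuming only nested dicts with scalar leaves (the real encoder also handles arrays, quoting, and escaping):

```python
def encode_nested(obj, indent=0):
    """Render nested dicts as indented 'key: value' lines, TOON-style."""
    pad = "  " * indent  # two spaces per nesting level
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")                 # key opens a child scope
            lines.extend(encode_nested(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")          # scalar leaf
    return lines

text = "\n".join(encode_nested({"user": {"profile": {"city": "Paris"}}}))
print(text)
# user:
#   profile:
#     city: Paris
```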
Header-Driven Arrays Replace Repetition With Declarative Structure
Uniform arrays are common in real data. JSON must repeat every field name and punctuation for every element. TOON compresses this by extracting shape into a single declaration:
```text
items[<row count>]{<field order>}:
```

Then come only the values:

```text
items[3]{sku,qty,price}:
A12,4,19.99
B18,1,12.50
C22,3,9.25
```
Under the hood:
- Keys appear once
- Column order is guaranteed
- Rows are fixed-width logical tuples
On 500-row datasets, this structure often cuts the token count by more than half. The improvement scales linearly with array length.
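A simplified encoder sketch for this tabular form (no quoting or escaping, and it assumes every row shares the same key set, which the real encoder verifies first):

```python
def encode_table(name, rows):
    """Emit a uniform array as one header line plus CSV-like value rows."""
    fields = list(rows[0].keys())  # assumed identical across all rows
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

items = [
    {"sku": "A12", "qty": 4, "price": 19.99},
    {"sku": "B18", "qty": 1, "price": 12.50},
    {"sku": "C22", "qty": 3, "price": 9.25},
]
print(encode_table("items", items))
```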
Technical detection logic
The encoder collapses an array when:
- All elements are objects
- They share an identical key set
- Order of keys is stable
- Null fields remain valid inline values
Otherwise, TOON falls back to object-by-object expansion. No ambiguity or silent corruption.
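A sketch of that detection logic in Python, as a simplified stand-in for the real encoder's check:

```python
def is_uniform_table(arr):
    """Collapse check: every element is a dict sharing one ordered key set."""
    if not arr or not all(isinstance(el, dict) for el in arr):
        return False
    first_keys = list(arr[0].keys())
    # Key *order* must match too, so value rows stay aligned with the header
    return all(list(el.keys()) == first_keys for el in arr)
```

Only arrays passing this check are emitted in header-driven form; everything else keeps object-by-object expansion.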
Schema and Cardinality Propagated Into the Prompt
JSON implies structure. TOON exposes it. Models benefit from clearly defined boundaries.
Two design choices matter:
- `[N]` explicitly sets the expected row count
- `{field1,field2,…}` statically enforces column order
These guide extraction tasks in a way punctuation cannot. A model that invents an extra row contradicts the declared cardinality. A misplaced field becomes visibly misaligned.
This reduces hallucination in:
- Table reconstruction
- RAG answer grounding
- Tool responses requiring valid JSON output
Benchmarks show improvements in exact match metrics and fewer malformed outputs when LLMs decode TOON vs JSON.
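As an illustration, a small validator can enforce those declared guards on either side of the model call. This is a hypothetical helper, not part of the TOON library:

```python
import re

# Matches a tabular header such as: items[3]{sku,qty,price}:
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")

def check_table(text):
    """Validate a tabular block: declared row count and column width must hold."""
    head, *rows = text.strip().splitlines()
    m = HEADER.match(head)
    if not m:
        raise ValueError("malformed header")
    declared, fields = int(m.group(2)), m.group(3).split(",")
    if len(rows) != declared:
        raise ValueError(f"expected {declared} rows, got {len(rows)}")
    for row in rows:
        if len(row.split(",")) != len(fields):
            raise ValueError(f"column mismatch in row: {row!r}")
    return True
```

An invented row or dropped field fails loudly here instead of silently corrupting downstream steps.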
Optimized for Tokenizers Rather Than Parsers
BPE and unigram tokenizers do not treat structural characters atomically:
- Quotes often tokenize as `"` plus the first 1–2 characters of the key
- Braces become unique token fragments not reused elsewhere
- Repeated key names are repeatedly segmented across the prompt
TOON leverages linguistic token merging:
- Alphanumeric keys tend to map to single tokens
- Indentation and line breaks fall into low-cost whitespace categories
- CSV-like patterns trigger high tokenizer reuse
Example token comparison for a 100-row table:
JSON minified: ~2,540 tokens
TOON equivalent: ~1,020 tokens
Same semantics, radically different tokenization behavior.
Deterministic Round-Trip and Streaming Support
The encoder is a pure transformation layer. It does not compress or interpret values. Decoding restores original JSON byte-for-byte (excluding whitespace variation in numbers and optional quotes).
Two primary APIs matter:
```js
import { encode, decode, encodeLines } from '@toon-format/toon';

const text = encode(data);  // JSON value → TOON text
const obj = decode(text);   // TOON text → JSON structure

for await (const chunk of encodeLines(largeData)) {
  // Suitable for incremental context injection in RAG
}
```
Large structured payloads can stream without materializing entire documents in memory. This benefits contexts where prompts change on the fly, such as agent pipelines.
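In Python terms, the encodeLines idea maps naturally onto a generator. This hypothetical helper yields one line at a time so prompts can be assembled incrementally:

```python
def encode_rows(name, fields, rows):
    """Yield a TOON-style table line by line, without buffering the whole text.

    Note: the declared count in the header requires knowing len(rows) up front,
    so rows must be a sized sequence rather than an unbounded stream.
    """
    yield f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    for row in rows:
        yield ",".join(str(row[f]) for f in fields)

# Feed lines into a prompt buffer as retrieval produces them
prompt_parts = list(encode_rows("users", ["id", "name"],
                                [{"id": 1, "name": "Alice"},
                                 {"id": 2, "name": "Bob"}]))
```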
Designed to Fail Loudly, Not Silently
JSON tolerates constructs that become fragile once a model is in the loop: missing commas, out-of-order fields, and trailing structure. Model output can look correct while being semantically broken JSON.
TOON’s strict format makes deviations more observable:
- Misindentation breaks structural parse
- Mismatched row counts surface immediately
- Field order mismatch is an error, not a tolerated reordering
Instead of debugging the LLM, the format itself catches the drift.
Why These Choices Matter
LLMs are probability engines, not parsers. They work best when the signal is strong and the requirements are explicit. TOON’s encoding strategy reduces the number of possible interpretations at every structural boundary, while reducing the token cost at the same time.
It is not a new data model. It is simply a more model-literate representation of the one we already use.
Benchmarks From Real Data
The most honest way to judge a data format is to see how it performs when real pipelines and real models are involved. The TOON benchmark suite focuses on everyday workloads that developers already push into prompts: employee directories, order histories, analytics logs, configuration objects, and nested product catalogs.
There are 209 structured extraction tasks in total. Testing covers four current model families: GPT-5 Nano, Gemini Flash, Claude Haiku, and Grok 4. Token counts are measured with the o200k_base tokenizer so the results match real billing.
Here is the average outcome across mixed data shapes:
| Format | Accuracy | Tokens | Score* | Savings vs JSON |
|---|---|---|---|---|
| TOON | 73.9% | 2,744 | 26.9 | 39.6% fewer |
| JSON compact | 70.7% | 3,081 | 22.9 | none |
| YAML | 69.0% | 3,719 | 18.6 | n/a |
| JSON | 69.7% | 4,545 | 15.3 | baseline |
| XML | 67.1% | 5,167 | 13.0 | n/a |

*Score shows correct extractions per 1,000 input tokens, a direct value-for-cost metric.
Uniform arrays show the biggest advantage. A 500-row e-commerce orders dataset that required 11,842 tokens in JSON needed only 4,617 tokens in TOON, a 61 percent reduction. At 1,000 GPT-4o prompts per day, that single workload saves roughly $1,740 every month.
Accuracy improves as well. GPT-5 Nano reconstruction tests rose from 92.5 percent to 99.4 percent. The explicit field alignment and declared row counts help the model avoid dropped or invented entries. Nothing about the underlying information changes. The model simply has less noise to interpret and more room in the context window for data that matters.
How Teams Use TOON in Production
Adopting TOON rarely requires major changes. JSON remains the source of truth in databases and services. The only difference is that data is converted to TOON at the moment it becomes model input. This removes the token overhead that appears only in prompts, not in storage or APIs.
A typical retrieval augmented workflow looks like this:
```python
from toon import encode

records = db.fetch_customers()
prompt = "Answer using this context:\n" + encode(records)
```
The model reads TOON as structured text without special instruction. If the response needs to return to typed objects, the same library converts it back into JSON. This keeps the rest of the stack untouched.
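For intuition, here is a minimal decoder sketch for the tabular form (values stay strings here; the real library restores numbers, booleans, and nulls, and handles quoting):

```python
def decode_table(text):
    """Parse a TOON-style table back into a list of dicts (string values only)."""
    head, *rows = text.strip().splitlines()
    _name, rest = head.split("[", 1)          # "items", "2]{sku,qty}:"
    count, rest = rest.split("]", 1)           # "2", "{sku,qty}:"
    fields = rest.strip("{}:").split(",")      # ["sku", "qty"]
    records = [dict(zip(fields, row.split(","))) for row in rows]
    if len(records) != int(count):
        raise ValueError("row count does not match declared cardinality")
    return records
```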
Agent systems also gain stability. When a tool returns a list of results, TOON’s explicit row counts and column order help the model avoid misalignment errors that would otherwise break the next step in the loop.
Streaming pipelines benefit, too. Because TOON is line-oriented, prompts can be built incrementally without waiting for closing braces or bracket completion. The result is faster handoffs from retrieval to inference.
When TOON Helps and When JSON Still Makes Sense
TOON shows its strengths when models read large collections of records. In those prompts, much of the length comes from formatting rather than data. Removing that formatting gives the model the same information in a smaller space.
Some data does not benefit in the same way. Complex, irregular objects leave little structure that can be simplified, so token totals remain close to JSON. And outside of prompts, JSON continues to be a dependable standard for storage, APIs, and logging where token costs do not apply.
The right approach is to test with your own payloads. Measure how many tokens the model actually sees and how reliably it can reconstruct results. TOON is most helpful where structure repeats predictably and cost pressure is high.
Conclusion
Formats usually reflect the problems they were built to solve. JSON was created when the goal was to move data between browsers and servers with as little friction as possible. Its punctuation and repetition are part of that success story.
When that same format is aimed at a language model, the context changes. Models treat every character as a unit of computation, and punctuation becomes something they must process before they can reason about the information it describes. The result is more tokens consumed and less room for the details that matter.
TOON takes the data we already rely on and presents it in a way that models can read with less effort. It removes structure that exists only for traditional parsers while keeping the meaning intact. That difference shows up quickly in token use, in latency, and in the accuracy of structured extraction.
Better results without changing the data itself. That is the practical opportunity now in front of developers.