When building AI and LLM-based applications, one of the biggest hidden costs often comes from something simple: the format of your data.
Every {}, [], and " inside JSON counts as a token when you send it to a Large Language Model (LLM).
With big payloads or complex structured data, this can burn through tokens (and money) fast.
That's where TOON (Token-Oriented Object Notation) steps in: a format designed specifically for LLMs to make structured data compact, readable, and token-efficient.
## What Is TOON?
TOON stands for Token-Oriented Object Notation: a modern, lightweight data format optimized for LLMs.
Think of it as:
"JSON, reimagined for token efficiency and human readability."
It trims the excess (no curly braces, square brackets, or quotes) and uses indentation plus tabular patterns instead.
The result is a format that models (and humans) can parse easily, while using far fewer tokens.
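To build intuition for the pattern before reaching for the library, here's a hand-rolled sketch of the core idea: declare the keys once in a header, then emit one comma-joined row per object. Note that `sketchEncode` is a hypothetical illustration, not the official `@toon-format/toon` encoder, and it only handles a non-empty, uniform array of flat objects.

```typescript
// Minimal sketch of the TOON idea: keys declared once, then value rows.
// NOT the official encoder - assumes a non-empty array of flat objects
// that all share the same keys.
function sketchEncode(key: string, rows: Record<string, string | number>[]): string {
  const fields = Object.keys(rows[0]);
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  const lines = rows.map((r) => "  " + fields.map((f) => String(r[f])).join(","));
  return [header, ...lines].join("\n");
}

console.log(sketchEncode("users", [
  { id: 1, name: "Alice" },
  { id: 2, name: "Bob" },
]));
// users[2]{id,name}:
//   1,Alice
//   2,Bob
```

Every repeated `"id":` and `"name":` from the JSON version collapses into a single header, which is where the token savings come from.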
## Why TOON Matters
When you send JSON to an LLM:
- Every punctuation mark adds to the token count.
- Repeated keys in long arrays multiply the cost.
- The verbosity doesn't actually help model understanding.
TOON solves this by:
- Declaring keys once per table-like block
- Replacing commas/braces with indentation
- Maintaining data clarity but cutting syntactic noise
The result: 30-60% fewer tokens on average.
## Example: TOON in Action
**JSON**

```json
{
  "users": [
    { "id": 1, "name": "Alice" },
    { "id": 2, "name": "Bob" }
  ]
}
```
**TOON**

```
users[2]{id,name}:
  1,Alice
  2,Bob
```
Same structure.
Same meaning.
Roughly half the tokens.
## Encode JSON → TOON in TypeScript
Try it yourself using the official TOON package.
### Installation

```bash
npm install @toon-format/toon
# or
pnpm add @toon-format/toon
```
### Example Code

```typescript
import { encode, decode } from "@toon-format/toon";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
};

const toon = encode(data);
console.log("TOON Format:\n", toon);

// Decode back to JSON if needed
const parsed = decode(toon);
console.log("Decoded JSON:\n", parsed);
```
### Output

```
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```
## JSON vs TOON
| Feature | JSON | TOON |
|---|---|---|
| Purpose | Universal data format (APIs, configs, storage) | Token-efficient format for LLMs |
| Syntax | Verbose `{}`, `[]`, `"` | Compact indentation, tabular style |
| Readability | Moderate | High (human + model friendly) |
| Token usage | High | Up to 60% fewer |
| Best use case | APIs, persistence | LLM prompts, structured outputs |
| Nested objects | Excellent | Inefficient for deep nesting |
| Ecosystem | Mature, universal | Emerging, growing fast |
## When Not to Use TOON
TOON shines for flat, tabular JSON objects, but it's not ideal for deeply nested structures.
In those cases, the extra indentation and context actually increase tokens.
Example:

```json
{
  "company": {
    "departments": [
      {
        "name": "Engineering",
        "employees": [{ "id": 1, "name": "Alice" }]
      }
    ]
  }
}
```
Converting this to TOON can make the payload longer, not shorter.
### Best suited for
- Flat lists (users, products, messages)
- Prompt templates
- Model training or evaluation datasets
### Avoid for
- Deeply nested hierarchies
- Complex relational data
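As a rough rule of thumb, you can check a value's shape before deciding to encode it as TOON. The helper below is a hypothetical sketch (not part of the `@toon-format/toon` package): it returns `true` only for a non-empty array of flat objects that all share the same keys, which is the shape where TOON's tabular encoding pays off.

```typescript
// Hypothetical helper (not part of @toon-format/toon): true when `value`
// is a non-empty array of flat objects sharing identical keys.
function isToonFriendly(value: unknown): boolean {
  if (!Array.isArray(value) || value.length === 0) return false;

  // An object is "flat" when none of its values are objects or arrays.
  const isFlat = (o: unknown): boolean => {
    if (typeof o !== "object" || o === null || Array.isArray(o)) return false;
    return Object.values(o).every((v) => typeof v !== "object" || v === null);
  };
  if (!value.every(isFlat)) return false;

  // All rows must share the same key set for a single tabular header.
  const keyOf = (o: unknown) => JSON.stringify(Object.keys(o as object).sort());
  return value.every((o) => keyOf(o) === keyOf(value[0]));
}

console.log(isToonFriendly([{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }])); // true
console.log(isToonFriendly([{ company: { departments: [] } }])); // false (nested)
```

A check like this makes a pipeline easy to split: tabular data goes through TOON, everything else stays JSON.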
## Token Efficiency Snapshot
| Dataset | JSON tokens | TOON tokens | Savings |
|---|---|---|---|
| User list | 150 | 82 | 45% |
| Product catalog | 320 | 180 | 44% |
| Nested data | 410 | 435 | -6% (more tokens) |
## TL;DR
TOON (Token-Oriented Object Notation) is a lightweight, token-efficient alternative to JSON, built for AI and LLM workloads.
- ✅ Cleaner syntax
- ✅ Human-readable
- ✅ Up to 60% fewer tokens
But remember: it works best for flat JSON objects, not deeply nested structures.
If you're building LLM pipelines, prompt templates, or structured AI datasets, TOON can save tokens, reduce cost, and keep your data clean.
## Bonus: Benchmark Token Count (JSON vs TOON)
Here's a quick Node.js script you can use to compare token usage between JSON and TOON using OpenAI's tiktoken tokenizer.
### Install Dependencies

```bash
npm install @toon-format/toon tiktoken
```
### Script

```typescript
import { encode } from "@toon-format/toon";
import { encoding_for_model } from "tiktoken";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
    { id: 3, name: "Charlie", role: "editor" },
  ],
};

const jsonData = JSON.stringify(data, null, 2);
const toonData = encode(data);

// Use the GPT-4o-mini tokenizer (you can swap in another model name)
const tokenizer = encoding_for_model("gpt-4o-mini");

const jsonTokens = tokenizer.encode(jsonData).length;
const toonTokens = tokenizer.encode(toonData).length;

console.log("Token Comparison");
console.log("-------------------");
console.log("JSON tokens:", jsonTokens);
console.log("TOON tokens:", toonTokens);
console.log("Savings:", (((jsonTokens - toonTokens) / jsonTokens) * 100).toFixed(2) + "%");

tokenizer.free();
```
### Example Output

```
Token Comparison
-------------------
JSON tokens: 84
TOON tokens: 32
Savings: 61.90%
```
You can tweak this for your own datasets; you'll often see 30-60% token savings for flat, tabular data.
## Final Thoughts
The ecosystem around LLMs is evolving fast, and even small optimizations, like switching from JSON to TOON, can create huge cost and performance improvements at scale.
Try it out, benchmark it, and see how many tokens (and dollars) you save!
Tags: #AI #LLM #PromptEngineering #JSON #TOON #AIOptimization #OpenAI #DataCompression #DeveloperTools