In the world of Large Language Models (LLMs), every token counts. As developers, we're constantly looking for ways to optimize our prompts and reduce token consumption, which translates directly into lower costs and faster response times. While JSON has long been the king of data interchange, a new contender has emerged: TOON (Token-Oriented Object Notation). But what is TOON, and can it really save you tokens? Let's dive in.
What is JSON?
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It's built on two structures:
- A collection of name/value pairs (e.g., an object, record, struct, dictionary, hash table, keyed list, or associative array).
- An ordered list of values (e.g., an array, vector, list, or sequence).
A simple JSON example looks like this:
{
  "name": "John Doe",
  "age": 30,
  "isStudent": false,
  "courses": [
    {
      "title": "History",
      "credits": 3
    },
    {
      "title": "Math",
      "credits": 4
    }
  ]
}
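That ease of parsing is why JSON support is built into virtually every language. In Python, for example, the standard library's json module handles both directions:

```python
import json

# The same profile as the example above, as a Python dict.
profile = {
    "name": "John Doe",
    "age": 30,
    "isStudent": False,
    "courses": [
        {"title": "History", "credits": 3},
        {"title": "Math", "credits": 4},
    ],
}

encoded = json.dumps(profile, indent=2)  # object -> JSON text
decoded = json.loads(encoded)            # JSON text -> object
assert decoded == profile                # lossless round trip
```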
While JSON is ubiquitous and well-supported, its verbosity can be a drawback when working with LLMs. The repeated keys and structural characters (curly braces, brackets, commas, and quotes) all contribute to the total token count.
What is TOON?
TOON (Token-Oriented Object Notation) is a compact, token-efficient data format designed with LLMs in mind. It aims to represent structured data with minimal overhead, borrowing YAML-like indentation for structure and CSV-like rows for uniform arrays, so that repeated keys and structural punctuation largely disappear.
Here's the same data from our JSON example, but in TOON format:
name: John Doe
age: 30
isStudent: false
courses[2]{title,credits}:
  History,3
  Math,4
As you can see, TOON drops the braces, brackets, and quotes entirely: keys are unquoted, indentation conveys nesting, and a uniform array of objects is declared once with its length and field names (courses[2]{title,credits}), after which each element becomes a single comma-separated row. These might seem like small changes, but they can lead to significant token savings, especially with large, uniform datasets.
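For uniform arrays of objects, the real token win comes from the tabular encoding: declare the field names once, then emit one row per element. Here's a minimal hand-rolled sketch of that idea (an illustration only, not the official TOON encoder, which also handles quoting, nesting, and non-uniform arrays):

```python
def encode_table(key, rows):
    """Encode a uniform list of dicts as a TOON-style table:
    field names are declared once in a header, then each element
    becomes one CSV-like row. Assumes scalar values without commas."""
    fields = list(rows[0])
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

courses = [
    {"title": "History", "credits": 3},
    {"title": "Math", "credits": 4},
]
print(encode_table("courses", courses))
# courses[2]{title,credits}:
#   History,3
#   Math,4
```

Note how the field names "title" and "credits" appear once in total, instead of once per element as in JSON.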
The Token-Saving Advantage
The primary benefit of TOON is its ability to reduce the number of tokens required to represent data. Exact counts depend on the tokenizer, but comparing our two examples:
- JSON: every key is quoted, every element of the courses array repeats the keys "title" and "credits", and the braces, brackets, and commas all consume tokens on top of the actual values.
- TOON: keys are unquoted, structural punctuation is minimal, and the field names for the array are written once rather than per element.
The TOON project's own benchmarks report token savings of roughly 30-60% over formatted JSON on uniform arrays of objects. The savings grow with the size of the data: when you send a large array to an LLM, a JSON payload repeats every key for every element, while a TOON table names the fields once and then pays only for the values.
Furthermore, to maximize token savings with TOON, it's often beneficial to flatten nested JSON structures before conversion. Nested objects and arrays, while semantically rich, can introduce redundant keys and structural overhead. By flattening the data into a more linear structure, you can reduce the overall complexity and the number of tokens required to represent the information, making your prompts even more efficient.
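One common way to do that flattening is to collapse nested objects into dotted key paths before encoding. Here's a small sketch (a hand-rolled illustration, not part of any TOON library):

```python
def flatten(obj, prefix=""):
    """Collapse nested dicts into a single level of dotted keys,
    e.g. {"user": {"name": "Ada"}} -> {"user.name": "Ada"}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

nested = {"user": {"name": "Ada", "contact": {"email": "ada@example.com"}}}
print(flatten(nested))
# {'user.name': 'Ada', 'user.contact.email': 'ada@example.com'}
```

The flattened form has no nesting overhead at all, and if many records share the same shape, it also makes them uniform enough for the tabular encoding shown earlier.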
When to Use TOON
While TOON offers clear advantages in token efficiency, it's not a universal replacement for JSON. Here are a few things to consider:
- LLM-Specific Interactions: TOON is ideal for direct communication with LLMs where token count is a primary concern.
- Tooling and Support: JSON is the industry standard and has vast support across languages, libraries, and APIs. TOON is newer and has a smaller ecosystem.
- Readability: While TOON is still human-readable, some developers might find JSON's syntax more familiar and easier to parse visually.
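Before committing to either format, it's worth measuring on your own data. Character count is only a rough proxy for token count (use your model's actual tokenizer, such as tiktoken for OpenAI models, for real numbers), but even a quick stdlib-only comparison shows the structural overhead JSON carries on uniform data (the TOON string below is hand-built for illustration, not produced by the official encoder):

```python
import json

# 100 records with an identical shape: the best case for a tabular format.
records = [{"id": i, "name": f"user{i}", "active": True} for i in range(100)]

as_json = json.dumps(records, indent=2)

# TOON-style table: field names once, then one comma-separated row per record.
rows = "\n".join(f"  {r['id']},{r['name']},{str(r['active']).lower()}" for r in records)
as_toon = f"records[{len(records)}]{{id,name,active}}:\n{rows}"

print(len(as_json), len(as_toon))  # the JSON text is several times longer
```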
Conclusion
TOON presents a compelling alternative to JSON for developers working with LLMs. Its token-efficient design can lead to significant cost and performance improvements. While it may not replace JSON entirely, it's a valuable tool to have in your arsenal for optimizing your LLM-powered applications. As the field of AI continues to evolve, we can expect to see more innovations like TOON that help us build more efficient and powerful applications.