Understanding the Token Economy Problem
Large Language Models (LLMs) work by “reading” information in small pieces called tokens.
Think of tokens like building blocks. These tokens are important because they control two main things:
- Cost: How much you have to pay for API calls or GPU time.
- Memory: How much information the LLM can handle at one time (this is its “context window”).
When you give the LLM structured data such as tool or API outputs, the format you use (for example, JSON) determines how many tokens it takes up.
Common formats like JSON are widely used but aren’t token-efficient. They often use more tokens than necessary, which drives up costs and uses up the LLM’s limited memory.
This is where Token-Oriented Object Notation (TOON) enters the field. Created in 2025, TOON is a compact data serialization format specifically designed for passing structured data to LLMs, with a claimed 30–60% token reduction compared to JSON.[1]
What is TOON?
TOON is a text-based data format that prioritizes token efficiency while maintaining human readability. Unlike JSON, which repeats field names for every object, TOON uses a tabular approach where field names are declared once in a header, and subsequent rows contain only values.[1]
The Core Syntax
TOON combines familiar patterns from YAML (indentation-based structure) and CSV (tabular format) into a cohesive system that LLMs can parse naturally.[1]
users[3]{id,name,role}:
  1,Alice Smith,admin
  2,Bob Jones,user
  3,Carol White,moderator
The syntax breaks down as:
- users — Collection name
- [3] — Number of items (optional but recommended)
- {id,name,role} — Field names declared once
- Subsequent lines contain comma-separated values
- 2-space indentation for nesting
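To make the encoding concrete, here is a minimal, illustrative Python sketch that serializes a list of uniform dictionaries into this header-plus-rows layout. It is not one of the libraries listed later in this article and it skips real TOON concerns such as quoting, escaping, and nesting; it only shows how the tabular structure comes together.

def to_toon_table(name, rows):
    """Serialize a list of dicts with identical keys into a TOON-style tabular block.

    Illustrative sketch only: assumes flat, uniform rows and does no quoting
    or escaping, unlike a real TOON encoder.
    """
    fields = list(rows[0].keys())
    # Header: collection name, item count, and field names declared once
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = [header]
    for row in rows:
        # Each row is just comma-separated values, indented by two spaces
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

users = [
    {"id": 1, "name": "Alice Smith", "role": "admin"},
    {"id": 2, "name": "Bob Jones", "role": "user"},
    {"id": 3, "name": "Carol White", "role": "moderator"},
]
print(to_toon_table("users", users))
# users[3]{id,name,role}:
#   1,Alice Smith,admin
#   2,Bob Jones,user
#   3,Carol White,moderator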
Comparison with JSON
// JSON format (117 tokens):
{
  "items": [
    {
      "sku": "A1",
      "name": "Widget",
      "qty": 2,
      "price": 9.99
    },
    {
      "sku": "B2",
      "name": "Gadget",
      "qty": 1,
      "price": 14.5
    },
    {
      "sku": "C3",
      "name": "Doohickey",
      "qty": 5,
      "price": 7.25
    }
  ]
}
// TOON format (49 tokens):
items[3]{sku,name,qty,price}:
  A1,Widget,2,9.99
  B2,Gadget,1,14.5
  C3,Doohickey,5,7.25
This example demonstrates a 58.1% token reduction (68 fewer tokens) for identical information.[1] The efficiency comes from eliminating repeated field names and reducing structural overhead.
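You can sanity-check numbers like these yourself. The sketch below uses the tiktoken tokenizer (a library choice of this example, not something TOON requires) to compare the two payloads; exact counts vary by model and tokenizer, so treat the 117/49 figures above as illustrative rather than universal.

import json
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

items = [
    {"sku": "A1", "name": "Widget", "qty": 2, "price": 9.99},
    {"sku": "B2", "name": "Gadget", "qty": 1, "price": 14.5},
    {"sku": "C3", "name": "Doohickey", "qty": 5, "price": 7.25},
]

json_payload = json.dumps({"items": items}, indent=2)
toon_payload = (
    "items[3]{sku,name,qty,price}:\n"
    "  A1,Widget,2,9.99\n"
    "  B2,Gadget,1,14.5\n"
    "  C3,Doohickey,5,7.25"
)

json_tokens = len(enc.encode(json_payload))
toon_tokens = len(enc.encode(toon_payload))
print(f"JSON: {json_tokens} tokens, TOON: {toon_tokens} tokens")
print(f"Reduction: {1 - toon_tokens / json_tokens:.1%}")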
How LLMs Understand TOON Without Training
A critical question emerges: if TOON was created in 2025, how can existing LLMs understand it when they weren’t trained on this format?
The answer lies in transfer learning. Even without specific training, LLMs can understand TOON because it borrows familiar patterns from formats they were trained on (specifically YAML and CSV).[1]
“TOON works best when you show the format instead of describing it. The structure is self-documenting — models parse it naturally once they see the pattern. Models treat it like familiar YAML or CSV.”[1]
This means TOON can be adopted immediately, without any fine-tuning.
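In practice, "showing the format" simply means placing the TOON block directly in the prompt, optionally with a one-line hint. A minimal sketch (the prompt wording here is only an example, not a required template):

toon_data = (
    "users[3]{id,name,role}:\n"
    "  1,Alice Smith,admin\n"
    "  2,Bob Jones,user\n"
    "  3,Carol White,moderator"
)

# The data block is embedded as-is; no schema explanation is needed because
# the header line already documents the fields.
prompt = (
    "The following records are in a compact tabular format "
    "(the header declares the fields, each indented line is one record):\n\n"
    f"{toon_data}\n\n"
    "Question: Which users have elevated permissions?"
)
print(prompt)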
Using TOON in Practice
Implementation Libraries
TOON has a number of community-driven, production-ready implementations across multiple programming languages:
- Python: python-toon by xaviviro[7]
- Python: toon-llm by davidpirogov[8]
from toon import parse, stringify

# Parse TOON to Python objects
data = parse("""
products[2]{id,name,price}:
101,Laptop,999.99
102,Mouse,29.99
""")

# Access data
print(data['products'][0]['name'])  # Output: Laptop

# Convert Python objects to TOON
users = {
    'users': [
        {'id': 1, 'name': 'Alice', 'role': 'admin'},
        {'id': 2, 'name': 'Bob', 'role': 'user'}
    ]
}
toon_string = stringify(users)
- JavaScript/TypeScript: Official reference implementation[1]
import { parse, stringify } from 'toon-format'

// Parse TOON string
const data = parse(`
orders[3]{orderId,customer,total}:
ORD-001,John Smith,150.00
ORD-002,Jane Doe,275.50
ORD-003,Bob Wilson,89.99
`)

// Access data
console.log(data.orders[0].customer) // Output: John Smith

// Convert to TOON
const inventory = {
  items: [
    { sku: 'A1', stock: 50, location: 'Warehouse-A' },
    { sku: 'B2', stock: 23, location: 'Warehouse-B' }
  ]
}
const toonString = stringify(inventory)
- PHP: toon-php by HelgeSverre[9]
use Toon\Parser;
use Toon\Formatter;

// Parse TOON
$parser = new Parser();
$data = $parser->parse("
employees[2]{id,name,department}:
1001,Alice Johnson,Engineering
1002,Bob Smith,Marketing
");

// Access data
echo $data['employees'][0]['name']; // Output: Alice Johnson

// Format to TOON
$formatter = new Formatter();
$toonString = $formatter->format([
    'tasks' => [
        ['id' => 1, 'title' => 'Review PR', 'status' => 'pending'],
        ['id' => 2, 'title' => 'Deploy', 'status' => 'complete']
    ]
]);
- Dart: toon by wisamidris77[10]
import 'package:toon_dart/toon_dart.dart';

void main() {
  final data = {
    'user': {
      'id': 123,
      'name': 'Ada',
      'tags': ['reading', 'gaming'],
      'active': true,
      'preferences': []
    }
  };
  print(toonEncode(data));
}
- Go: gotoon by alpkeskin[11]
package main

import (
    "fmt"
    "log"

    "github.com/alpkeskin/gotoon"
)

func main() {
    data := map[string]interface{}{
        "users": []map[string]interface{}{
            {"id": 1, "name": "Alice", "role": "admin"},
            {"id": 2, "name": "Bob", "role": "user"},
        },
    }

    encoded, err := gotoon.Encode(data)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(encoded)
}
When to Use TOON
- Tabular Data with Uniform Objects
- RAG System Contexts [2]
- Analytics Dashboards
- AI-Powered Classification [2]
When NOT to Use TOON
1. Deep Nesting Required
2. Inconsistent Object Structures
# This is awkward in TOON
mixed_data[3]{id,name,email,phone,address}:
  1,Alice,alice@example.com,,  # Missing phone and address
  2,Bob,,555-1234,  # Missing email and address
  3,Carol,carol@example.com,,123 Main St
3. Tool Calling / Function Definitions
TOON is designed as an input format for data, not for defining tool schemas or function signatures.[3] All major LLM providers (OpenAI, Anthropic, Google) expect tool definitions in JSON, typically with JSON Schema-style parameter definitions.[4][5]
Use JSON for:
- Function calling schemas
- API tool definitions
- LLM output when integrating with existing systems
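For contrast, a tool definition stays in JSON. The sketch below shows the kind of JSON Schema-based shape used by OpenAI's Chat Completions tools parameter; the function name and parameters are invented for illustration.

# Shape of an OpenAI-style tool definition (Chat Completions "tools" format).
# The function name and its parameters are made up for this example.
get_order_status_tool = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "Order identifier, e.g. ORD-001",
                }
            },
            "required": ["order_id"],
        },
    },
}
# Keep definitions like this in JSON / JSON Schema; reserve TOON for the
# bulk data you pass inside the prompt itself.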
Conclusion
TOON represents a targeted optimization for a specific problem: efficiently passing structured, tabular data to LLMs. It doesn’t replace JSON across the board, nor does it try to. Instead, it offers a specialized tool that, when applied to appropriate use cases, delivers measurable token reduction and cost savings.
The format’s compatibility with existing LLMs, combined with production-ready implementations across multiple languages, makes it accessible for rapid integration. As the AI ecosystem grows and token efficiency becomes increasingly important, formats like TOON offer a measurable operational improvement.
For teams operating LLM-powered systems at scale, particularly those working with tabular data, RAG systems, or analytics applications, TOON deserves consideration as an optimization strategy.
References
[1]: Johann Schopplich, “TOON — Token-Oriented Object Notation”, GitHub Repository, https://github.com/johannschopplich/toon
[2]: “TOON Format Guide: Reduce LLM Token Usage by 50%”, Nihar Daily, https://www.nihardaily.com/131-token-oriented-object-notation-toon-your-path-to-50-token-savings
[3]: Abdulkader Safi, “TOON: The Token-Efficient Data Format for LLM Applications”, https://abdulkadersafi.com/blog/toon-the-token-efficient-data-format-for-llm-applications-complete-guide-2025
[4]: “Function Calling with LLMs”, Prompt Engineering Guide, https://www.promptingguide.ai/applications/function_calling
[5]: “Tool/function calling”, LangChain Documentation, https://python.langchain.com/v0.1/docs/modules/model_io/chat/function_calling/
[6]: Joyal Saji, “The Hidden Cost of Tokens in LLMs”, Medium, November 2025, https://medium.com/@joyalsaji/the-hidden-cost-of-tokens-in-llms-and-how-toon-smarter-strategies-can-shrink-it-827160c92787
[7]: Xavi Viró, “python-toon”, GitHub Repository, https://github.com/xaviviro/python-toon
[8]: David Pirogov, “toon-llm”, GitHub Repository, https://github.com/davidpirogov/toon-llm
[9]: Helge Sverre, “toon-php”, GitHub Repository, https://github.com/HelgeSverre/toon-php
[10]: Wisam Idris, “toon”, GitHub Repository, https://github.com/wisamidris77/toon
[11]: Alp Keskin, “gotoon”, GitHub Repository, https://github.com/alpkeskin/gotoon