DEV Community

Islam Hafez

TOON: Token-Oriented Object Notation – A Complete Guide for LLM Data Efficiency

Token-Oriented Object Notation (TOON) is a compact, human-readable encoding of the JSON data model, designed to minimize tokens and simplify structure for Large Language Models (LLMs). It is a drop-in, lossless representation of JSON, so developers can keep working with JSON in code and convert to TOON only when passing data to a model.

TOON merges YAML’s indentation-based structure for nested objects with CSV-style tabular arrays for uniform data. Its primary strength is with uniform arrays of objects—multiple fields per row with consistent structure—achieving compactness similar to CSV, while maintaining explicit schema information for reliable LLM parsing. For deeply nested or non-uniform data, standard JSON may remain more efficient.

Why TOON?

Context windows keep expanding, but tokens still cost money, and standard JSON is verbose:

{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320, "companion": "ana", "wasSunny": true},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2, "elevationGain": 540, "companion": "luis", "wasSunny": false},
    {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1, "elevationGain": 180, "companion": "sam", "wasSunny": true}
  ]
}

TOON conveys the same information with fewer tokens, combining YAML-style indentation and CSV-style tabular arrays:

context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true
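
To make the tabular form concrete, here is a minimal sketch of how a uniform array of flat objects could be turned into the `hikes[3]{...}:` table shown above. The helper name `encodeTable` is hypothetical and is not part of any TOON library; it assumes comma delimiters and values that need no quoting, which a real encoder has to handle.

```typescript
// Simplified sketch, not the official encoder.
// Assumption: every value is a primitive that needs no quoting or escaping.
type Primitive = string | number | boolean | null;
type Row = Record<string, Primitive>;

function encodeTable(key: string, rows: Row[]): string {
  // An empty array has no fields to declare, only a zero length.
  if (rows.length === 0) return `${key}[0]:`;

  const fields = Object.keys(rows[0]);

  // Tabular form only applies when every row has exactly the same fields.
  for (const row of rows) {
    const keys = Object.keys(row);
    if (keys.length !== fields.length || !fields.every((f) => f in row)) {
      throw new Error("rows are not uniform; tabular form does not apply");
    }
  }

  // Declare length and field names once, then emit one indented line per row.
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  const lines = rows.map(
    (row) => "  " + fields.map((f) => String(row[f])).join(",")
  );
  return [header, ...lines].join("\n");
}

const hikes: Row[] = [
  { id: 1, name: "Blue Lake Trail", distanceKm: 7.5, elevationGain: 320, companion: "ana", wasSunny: true },
  { id: 2, name: "Ridge Overlook", distanceKm: 9.2, elevationGain: 540, companion: "luis", wasSunny: false },
  { id: 3, name: "Wildflower Loop", distanceKm: 5.1, elevationGain: 180, companion: "sam", wasSunny: true },
];

console.log(encodeTable("hikes", hikes));
```

The field names appear once in the header instead of once per object, which is where most of the savings over JSON come from on uniform data.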

Key Features

  • Token-Efficient & Accurate: In mixed-structure benchmarks, TOON reaches about 74% accuracy versus JSON’s 70% while using roughly 40% fewer tokens.
  • JSON-Compatible: Encodes objects, arrays, and primitives with deterministic, lossless round-trips.
  • LLM-Friendly: Explicit [N] lengths and {fields} headers provide clear schema information for reliable parsing.
  • Minimal Syntax: Indentation instead of braces, minimal quoting, YAML-like readability with CSV compactness.
  • Tabular Arrays: Uniform arrays collapse into tables, declaring fields once and streaming row values line by line.
  • Multi-Language Ecosystem: Implementations exist in TypeScript, Python, Go, Rust, .NET, and more.
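
The explicit `[N]` length and `{fields}` header also give a consumer something to check against. The sketch below shows one way a single tabular block could be validated, for example when a model returns TOON. The names `parseTableHeader` and `validateTable` are hypothetical, and the code assumes comma delimiters and unquoted values, so it illustrates the idea and is not a parser.

```typescript
interface TableHeader {
  key: string;
  length: number;
  fields: string[];
}

// Parses a header line such as "users[2]{id,name,role}:".
// Assumption: simple word-like keys and comma-separated field names.
function parseTableHeader(line: string): TableHeader | null {
  const match = /^(\w+)\[(\d+)\]\{([^}]*)\}:$/.exec(line.trim());
  if (!match) return null;
  return {
    key: match[1],
    length: Number(match[2]),
    fields: match[3].split(","),
  };
}

// Returns a list of problems; an empty list means the block is consistent
// with its own declared length and field count.
function validateTable(toon: string): string[] {
  const [headerLine, ...rowLines] = toon.split("\n");
  const header = parseTableHeader(headerLine);
  if (!header) return ["first line is not a tabular header"];

  const problems: string[] = [];
  const rows = rowLines.filter((line) => line.trim() !== "");

  if (rows.length !== header.length) {
    problems.push(`declared ${header.length} rows, found ${rows.length}`);
  }
  rows.forEach((row, i) => {
    const cells = row.trim().split(",");
    if (cells.length !== header.fields.length) {
      problems.push(
        `row ${i + 1}: expected ${header.fields.length} values, found ${cells.length}`
      );
    }
  });
  return problems;
}

console.log(validateTable("users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob"));
```

A truncated table is caught twice here: once by the row count and once by the short row, which JSON's closing brackets alone would not tell you.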

Media Type & File Extension

  • File extension: .toon
  • Media type: text/toon
  • Always UTF-8 encoded.

When Not to Use TOON

  • Deeply nested or non-uniform structures: Compact (minified) JSON may use fewer tokens.
  • Semi-uniform arrays: Token savings are reduced.
  • Pure tabular data: CSV may remain slightly smaller than TOON.
  • Latency-critical applications: Test performance against your specific model and setup.

Benchmarks

Across four major LLMs, TOON reduces token usage relative to JSON while improving comprehension:

  • Efficiency Score: (Accuracy % ÷ Tokens) × 1,000
  • Mixed-Structure Track: TOON uses 39.6% fewer tokens while improving accuracy over standard JSON.
  • Flat-Only Track: TOON uses about 6% more tokens than CSV, the cost of its added structure and reliability.

Detailed per-model benchmarks show TOON outperforms JSON, YAML, and XML across varied datasets while remaining competitive with CSV on flat tabular data.
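
As a worked example of the efficiency score, the sketch below applies the formula to the 70% and 74% accuracy figures quoted earlier. The token counts (5,000 and 3,000) are hypothetical values chosen only to illustrate a 40% saving; they are not benchmark data, and the function names are made up for this example.

```typescript
// Efficiency score as defined above: (accuracy % / tokens) * 1,000.
function efficiencyScore(accuracyPercent: number, tokens: number): number {
  if (tokens <= 0) throw new RangeError("tokens must be positive");
  return (accuracyPercent / tokens) * 1000;
}

// Relative token savings of a candidate format versus a baseline, in percent.
function tokenSavingsPercent(baselineTokens: number, candidateTokens: number): number {
  if (baselineTokens <= 0) throw new RangeError("baselineTokens must be positive");
  return ((baselineTokens - candidateTokens) / baselineTokens) * 100;
}

// Hypothetical token counts for illustration only; not measured results.
const jsonTokens = 5000;
const toonTokens = 3000;

console.log(efficiencyScore(70, jsonTokens).toFixed(1)); // 14.0
console.log(efficiencyScore(74, toonTokens).toFixed(1)); // 24.7
console.log(tokenSavingsPercent(jsonTokens, toonTokens).toFixed(1)); // 40.0
```

Because tokens sit in the denominator, a format can score higher by being cheaper even when its accuracy is about the same, so it is worth reading the score alongside the raw accuracy.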

Installation & Quick Start

CLI (no installation required):

npx @toon-format/cli input.json -o output.toon
echo '{"name": "Ada", "role": "dev"}' | npx @toon-format/cli

TypeScript Library:

npm install @toon-format/toon

Example usage:

import { encode } from '@toon-format/toon'

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' }
  ]
}

console.log(encode(data))
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user

Playgrounds & Editor Support

  • Official Playground: Convert JSON to TOON in real time, compare token counts, share experiments.
  • Editor Support: VS Code extension, Tree-sitter grammar, Neovim plugin, and YAML highlighting for other editors.

Using TOON with LLMs

TOON’s structure is self-documenting. When prompting LLMs:

  • Wrap data in TOON code blocks.
  • Keep the [N] lengths and {fields} headers so the model can see the expected shape of the data.
  • Consider tab delimiters, which can be more token-efficient than commas.
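
One possible way to follow these tips is to wrap already-encoded TOON text in a labeled code block before appending the question. The helper `buildToonPrompt` and the wording of the instruction line are assumptions made for illustration, not a prescribed prompt format.

```typescript
// Hypothetical helper: wraps TOON text in a fenced block labeled "toon"
// and appends the user's question. The fence is built with repeat() so the
// example itself stays easy to embed in a code block.
function buildToonPrompt(toon: string, question: string): string {
  const fence = "`".repeat(3);
  return [
    "The data below is in TOON format: indentation shows nesting, and arrays declare their [N] length and {fields}.",
    "",
    `${fence}toon`,
    toon,
    fence,
    "",
    question,
  ].join("\n");
}

const toonData = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user";
console.log(buildToonPrompt(toonData, "Which user has the admin role?"));
```

Keeping the header line intact inside the block is the important part; the surrounding sentence is optional context for the model.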

Other Implementations

  • Official: .NET, Dart, Go, Java, Julia, Python, Rust, Swift
  • Community: Apex, C++, Clojure, Crystal, Elixir, Scala, Lua, OCaml, Perl, PHP, R, Ruby, Kotlin

TOON is stable but still evolving. Contributions, feedback, and experimentation are encouraged.


Summary:
TOON provides a token-efficient, readable, LLM-friendly alternative to JSON, especially for uniform arrays of objects. It reduces token costs, increases parsing reliability, and is easy to integrate with existing JSON workflows. For developers working with LLMs at scale, TOON is a powerful addition to the toolset.
