Akash Thakur
TOON vs JSON: The New Format Designed for AI

How a novel data format is saving developers 30-60% on LLM token costs

If you've been working with Large Language Models, you've probably noticed something: feeding data to AI isn't free. Every JSON object you pass through an API costs tokens, and those tokens add up fast. Enter TOON (Token-Oriented Object Notation), a new serialization format designed specifically to solve this problem.

The Token Tax Problem

Let's start with a real example. Imagine you're building an app that sends employee data to an LLM for analysis:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin", "salary": 75000 },
    { "id": 2, "name": "Bob", "role": "user", "salary": 65000 },
    { "id": 3, "name": "Charlie", "role": "user", "salary": 70000 }
  ]
}

This JSON snippet consumes 257 tokens. Now look at the same data in TOON:

users[3]{id,name,role,salary}:
1,Alice,admin,75000
2,Bob,user,65000
3,Charlie,user,70000

Just 166 tokens — a 35% reduction. For this small example, the savings might seem trivial. But scale that to hundreds of API calls with thousands of records, and suddenly you're looking at real cost reductions.

What Makes TOON Different?

TOON borrows the best ideas from existing formats and optimizes them for LLM consumption:

1. Tabular Arrays: Declare Once, Use Many

The core insight behind TOON is simple: when you have uniform arrays of objects (same fields, same types), why repeat the keys for every single object?

JSON's Approach (repetitive):

[
  { "sku": "A1", "qty": 2, "price": 9.99 },
  { "sku": "B2", "qty": 1, "price": 14.50 }
]

TOON's Approach (efficient):

[2]{sku,qty,price}:
A1,2,9.99
B2,1,14.5

The schema is declared once in the header {sku,qty,price}, then each row is just CSV-style values. This is where TOON shines brightest.
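To make the idea concrete, here is a deliberately simplified sketch of what tabular encoding does. The `encodeTabular` helper is hypothetical and only handles the flat, uniform case; the real `@toon-format/toon` encoder also deals with quoting, nesting, and non-uniform arrays.

```javascript
// Simplified sketch of TOON's tabular encoding for uniform arrays.
// Declares the field names once in the header, then emits one
// CSV-style row of values per record.
function encodeTabular(name, rows) {
  const fields = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${fields.join(',')}}:`;
  const lines = rows.map(row => fields.map(f => row[f]).join(','));
  return [header, ...lines].join('\n');
}

const toon = encodeTabular('items', [
  { sku: 'A1', qty: 2, price: 9.99 },
  { sku: 'B2', qty: 1, price: 14.5 }
]);
console.log(toon);
// items[2]{sku,qty,price}:
// A1,2,9.99
// B2,1,14.5
```

The savings come entirely from the header: keys that JSON would repeat in every record appear exactly once.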

2. Smart Quoting

TOON only quotes strings when absolutely necessary:

  • hello world → No quotes needed (inner spaces are fine)
  • hello 👋 world → No quotes (Unicode is safe)
  • "hello, world" → Quotes required (contains comma delimiter)
  • " padded " → Quotes required (leading/trailing spaces)

This minimal-quoting approach saves tokens while keeping the data unambiguous.
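The examples above boil down to a small decision rule. The sketch below is an assumed simplification of that rule, not the spec's full logic (the real format also considers colons, values that could be mistaken for numbers or booleans, and the active delimiter):

```javascript
// Rough sketch of a minimal-quoting rule: quote only when the raw
// value would be ambiguous inside a delimited row.
function needsQuotes(value, delimiter = ',') {
  return (
    value !== value.trim() ||      // leading/trailing whitespace
    value.includes(delimiter) ||   // contains the active delimiter
    value.includes('"') ||         // contains a quote character
    value.includes('\n')           // contains a newline
  );
}

console.log(needsQuotes('hello world'));    // false
console.log(needsQuotes('hello 👋 world')); // false
console.log(needsQuotes('hello, world'));   // true
console.log(needsQuotes(' padded '));       // true
```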

3. Indentation Over Brackets

Like YAML, TOON uses indentation instead of curly braces for nested structures:

JSON:

{
  "user": {
    "id": 123,
    "profile": {
      "name": "Ada"
    }
  }
}

TOON:

user:
  id: 123
  profile:
    name: Ada

Cleaner, more readable, and fewer tokens.
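The indentation rule is easy to see in code. This `encodeNested` helper is a hypothetical illustration covering only plain nested objects; the real encoder also handles arrays and quoting:

```javascript
// Minimal sketch of indentation-based encoding: two spaces per
// nesting level, no braces. Handles nested plain objects only.
function encodeNested(obj, indent = 0) {
  const pad = '  '.repeat(indent);
  return Object.entries(obj)
    .map(([k, v]) =>
      v && typeof v === 'object'
        ? `${pad}${k}:\n${encodeNested(v, indent + 1)}`
        : `${pad}${k}: ${v}`
    )
    .join('\n');
}

console.log(encodeNested({ user: { id: 123, profile: { name: 'Ada' } } }));
// user:
//   id: 123
//   profile:
//     name: Ada
```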

4. Explicit Array Lengths

TOON includes the array length in brackets ([N]), which actually helps LLMs understand and validate the structure:

tags[3]: admin,ops,dev

This explicit metadata reduces parsing errors when LLMs are generating or interpreting structured data.
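The declared length also gives consumers something to check against. As a sketch (the `checkInlineArray` helper below is hypothetical, not part of the official library), a parser can reject a row whose item count doesn't match the `[N]` in the header:

```javascript
// Sketch: validate an inline TOON array against its declared [N] length.
function checkInlineArray(line) {
  const match = line.match(/^(\w+)\[(\d+)\]:\s*(.*)$/);
  if (!match) throw new Error('not an inline TOON array');
  const [, key, declared, rest] = match;
  const values = rest.split(',');
  if (values.length !== Number(declared)) {
    throw new Error(`${key}: expected ${declared} items, got ${values.length}`);
  }
  return values;
}

console.log(checkInlineArray('tags[3]: admin,ops,dev'));
// [ 'admin', 'ops', 'dev' ]
```

The same check works in the other direction: when an LLM generates TOON output, a mismatched count is an immediate signal that items were dropped or hallucinated.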

Real-World Benchmarks

The TOON project ran comprehensive benchmarks across different data types and LLM models. Here's what they found:

Token Savings by Dataset

Dataset                      | JSON Tokens | TOON Tokens | Savings
GitHub Repos (100 records)   | 15,145      | 8,745       | 42.3%
Analytics (180 days)         | 10,977      | 4,507       | 58.9%
E-commerce Orders            | 257         | 166         | 35.4%

The sweet spot? Uniform tabular data — records with consistent schemas across many rows. The more repetitive your JSON keys, the more TOON can optimize.

LLM Comprehension

But token efficiency doesn't matter if the LLM can't understand the format. The benchmarks tested 4 different models (GPT-5 Nano, Claude Haiku, Gemini Flash, Grok) on 154 data retrieval questions:

  • TOON accuracy: 70.1%
  • JSON accuracy: 65.4%
  • Token reduction: 46.3%

TOON not only saves tokens but actually improves LLM accuracy. The explicit structure (array lengths, field declarations) helps models parse and validate data more reliably.

When Should You Use TOON?

TOON isn't meant to replace JSON everywhere. Think of it as a specialized tool for a specific job.

Use TOON When:

  • Sending large datasets to LLMs (hundreds or thousands of records)
  • Working with uniform data structures (database query results, CSV exports, analytics)
  • Token costs are a significant concern
  • You're making frequent LLM API calls with structured data

Stick With JSON When:

  • Building traditional REST APIs
  • Storing data in databases
  • Working with deeply nested or non-uniform data
  • You need universal compatibility with existing tools

As the TOON documentation puts it: "Use JSON programmatically, convert to TOON for LLM input."

How to Get Started

TOON is available as an npm package with a simple API:

import { encode, decode } from '@toon-format/toon'

const data = {
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ]
}

// Convert to TOON
const toon = encode(data)
console.log(toon)
// items[2]{sku,qty,price}:
// A1,2,9.99
// B2,1,14.5

// Convert back to JSON
const restored = decode(toon)

There's also a CLI tool for quick conversions:

# Encode JSON to TOON
npx @toon-format/cli data.json -o data.toon

# Decode TOON to JSON
npx @toon-format/cli data.toon -o data.json

# Show token savings
npx @toon-format/cli data.json --stats

Alternative Delimiters

For even more token efficiency, you can use tab or pipe delimiters instead of commas:

// Tab-separated (often more token-efficient)
encode(data, { delimiter: '\t' })

// Pipe-separated
encode(data, { delimiter: '|' })

The Growing Ecosystem

While TOON is relatively new, the community is already building implementations across multiple languages:

  • Official: JavaScript/TypeScript, Python (in dev), Rust (in dev)
  • Community: PHP, Ruby, Go, Swift, Elixir, C++, Java, and more

The project maintains a comprehensive specification and conformance test suite to ensure compatibility across implementations.

The Bottom Line

TOON represents a shift in thinking about data formats. For decades, we've optimized for human readability and machine interoperability. Now, with LLMs consuming massive amounts of structured data, we need formats optimized for token efficiency and AI comprehension.

Is TOON going to replace JSON? No. But for the specific use case of feeding structured data to LLMs, it offers compelling advantages:

  • 30-60% token savings on uniform tabular data
  • Better LLM accuracy thanks to explicit structure
  • Drop-in conversion from existing JSON workflows
  • Growing ecosystem with multi-language support

If you're building AI-powered applications that consume significant amounts of structured data, TOON is worth exploring. Your token budget will thank you.



Have you tried TOON in your projects? What kind of token savings are you seeing? Share your experience in the comments.

Top comments (5)

铁泪Tiě Lèi

Cobol is that you?

IDENTIFICATION DIVISION.
       PROGRAM-ID. PRODUCT-LIST.

       DATA DIVISION.
       WORKING-STORAGE SECTION.

       *> 1. Define the raw data for the table.
       *>    Each record is 18 bytes:
       *>    ID    (2) = 01
       *>    Name (10) = "Laptop    "
       *>    Price (6) = 399990  (for 9(4)V99)
       01 WS-PRODUCT-TABLE-DATA.
           05 FILLER PIC X(18) VALUE "01Laptop    399990".
           05 FILLER PIC X(18) VALUE "02Mouse     014990".
           05 FILLER PIC X(18) VALUE "03Headset   049900".

       *> 2. Redefine that block of memory as a structured COBOL table.
       *>    This maps the fields to the raw data above.
       01 WS-PRODUCT-TABLE REDEFINES WS-PRODUCT-TABLE-DATA.
           05 WS-PRODUCT-ENTRY OCCURS 3 TIMES.
               10 WS-PRODUCT-ID    PIC 9(02).
               10 WS-PRODUCT-NAME  PIC X(10).
               10 WS-PRODUCT-PRICE PIC 9(04)V99. *> Implied decimal

       *> 3. Define helper variables for looping and display
       01 WS-INDEX           PIC 9(01).
       01 WS-DISPLAY-PRICE   PIC Z,ZZ9.99. *> For formatting the output

       PROCEDURE DIVISION.
       MAIN-PROCEDURE.

           *> Loop through the table (from 1 to 3) and display each record
           PERFORM VARYING WS-INDEX FROM 1 BY 1 UNTIL WS-INDEX > 3

               DISPLAY "--------------------------"
               DISPLAY "Product Record: " WS-INDEX

               *> Access data using the index: WS-FIELD-NAME(WS-INDEX)
               DISPLAY "  ID:    " WS-PRODUCT-ID(WS-INDEX)
               DISPLAY "  Name:  " WS-PRODUCT-NAME(WS-INDEX)

                *> Move the computational price (9(4)V99) to a display-ready
                *> field (Z,ZZ9.99) to insert the comma and decimal point
               MOVE WS-PRODUCT-PRICE(WS-INDEX) TO WS-DISPLAY-PRICE
               DISPLAY "  Price: " WS-DISPLAY-PRICE

           END-PERFORM.

           DISPLAY "--------------------------".
           STOP RUN.
david duymelinck

Why not use YAML if you want a more condensed format?

Marcu Loreto

I gather that TOON is one more compact data serialization format, similar in purpose to YAML.

david duymelinck

True, but why the need to invent a new format?
Most languages have mature YAML and CSV libraries if you need to condense the text that is sent to an AI.

The main reason for TOON is probably that you can feed it content that is sometimes better compacted as YAML and sometimes as CSV, instead of writing a function yourself to switch the output between the two formats.
Out of curiosity, I asked an AI to create that function.

/**
 * Detects whether a JSON object represents a tree‑like structure
 * or a tabular (array‑of‑objects) structure, then converts it to
 * YAML or CSV accordingly.
 *
 * @param {object|array} data - Parsed JSON data.
 * @returns {string} YAML string for tree‑like data or CSV string for tabular data.
 */
function convertJson(data) {
  // Helper: check if value is a plain object (not array, not null)
  const isObject = v => v && typeof v === 'object' && !Array.isArray(v);

  // Detect tabular: an array where every element is an object
  // and each object has the same set of primitive keys.
  const isTabular = arr => {
    if (!Array.isArray(arr) || arr.length === 0) return false;
    // First element must be a plain object; its keys define the schema.
    // (Object.keys(null) would throw, so guard before reading keys.)
    if (!isObject(arr[0])) return false;
    const firstKeys = Object.keys(arr[0]);
    if (firstKeys.length === 0) return false;

    return arr.every(item => {
      if (!isObject(item)) return false;
      const keys = Object.keys(item);
      // same keys as first row
      if (keys.length !== firstKeys.length) return false;
      for (let k of firstKeys) {
        if (!keys.includes(k)) return false;
        // values should be primitive (string, number, boolean, null)
        const v = item[k];
        if (v && typeof v === 'object') return false;
      }
      return true;
    });
  };

  // Convert tabular data to CSV
  const toCsv = arr => {
    const headers = Object.keys(arr[0]);
    const escape = v => {
      if (v == null) return '';
      const s = String(v);
      return s.includes(',') || s.includes('"') || s.includes('\n')
        ? `"${s.replace(/"/g, '""')}"`
        : s;
    };
    const rows = arr.map(row => headers.map(h => escape(row[h])).join(','));
    return [headers.join(','), ...rows].join('\n');
  };

  // Convert any object/array to YAML (simple implementation)
  const toYaml = obj => {
    const yaml = require('js-yaml'); // assumes js-yaml is available
    return yaml.dump(obj, { noRefs: true, indent: 2 });
  };

  // Main logic
  if (Array.isArray(data) && isTabular(data)) {
    return toCsv(data);
  } else {
    // For tree‑like structures we fall back to YAML
    // If js-yaml is not available, a minimal serializer could be used.
    try {
      return toYaml(data);
    } catch (e) {
      // Minimal fallback YAML serializer
      const serialize = (value, indent = 0) => {
        const pad = ' '.repeat(indent);
        if (Array.isArray(value)) {
          return value.map(v => `${pad}- ${serialize(v, indent + 2).trimStart()}`).join('\n');
        } else if (isObject(value)) {
          return Object.entries(value)
            .map(([k, v]) => `${pad}${k}: ${serialize(v, indent + 2).trimStart()}`)
            .join('\n');
        } else {
          return `${value}`;
        }
      };
      return serialize(data);
    }
  }
}
Ketan Gupta

Thanks for sharing this. Built a quick tool for anyone wanting to test TOON formatting: bestaitools.tech/tools/json-to-toon Runs client-side, no data sent to servers 🛠️