How a novel data format is saving developers 30-60% on LLM token costs
If you've been working with Large Language Models, you've probably noticed something: feeding data to AI isn't free. Every JSON object you pass through an API costs tokens, and those tokens add up fast. Enter TOON (Token-Oriented Object Notation), a new serialization format designed specifically to solve this problem.
The Token Tax Problem
Let's start with a real example. Imagine you're building an app that sends employee data to an LLM for analysis:
```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin", "salary": 75000 },
    { "id": 2, "name": "Bob", "role": "user", "salary": 65000 },
    { "id": 3, "name": "Charlie", "role": "user", "salary": 70000 }
  ]
}
```
This JSON snippet consumes 257 tokens. Now look at the same data in TOON:
```
users[3]{id,name,role,salary}:
  1,Alice,admin,75000
  2,Bob,user,65000
  3,Charlie,user,70000
```
Just 166 tokens — a 35% reduction. For this small example, the savings might seem trivial. But scale that to hundreds of API calls with thousands of records, and suddenly you're looking at real cost reductions.
What Makes TOON Different?
TOON borrows the best ideas from existing formats and optimizes them for LLM consumption:
1. Tabular Arrays: Declare Once, Use Many
The core insight behind TOON is simple: when you have uniform arrays of objects (same fields, same types), why repeat the keys for every single object?
JSON's Approach (repetitive):
```json
[
  { "sku": "A1", "qty": 2, "price": 9.99 },
  { "sku": "B2", "qty": 1, "price": 14.50 }
]
```
TOON's Approach (efficient):
```
[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
```
The schema is declared once in the header {sku,qty,price}, then each row is just CSV-style values. This is where TOON shines brightest.
2. Smart Quoting
TOON only quotes strings when absolutely necessary:
- `hello world` → no quotes needed (inner spaces are fine)
- `hello 👋 world` → no quotes (Unicode is safe)
- `"hello, world"` → quotes required (contains the comma delimiter)
- `" padded "` → quotes required (leading/trailing spaces)
This minimal-quoting approach saves tokens while keeping the data unambiguous.
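The decision can be sketched as a small predicate. This is my reading of the rules above, not the official implementation, and it only covers the cases the examples mention plus the obvious ambiguity sources (quote characters and newlines):

```javascript
// Sketch of a minimal-quoting rule: quote only when the raw string would
// be ambiguous. Assumes the active delimiter is known at encode time.
function needsQuotes(s, delimiter = ',') {
  return (
    s !== s.trim() ||          // leading/trailing whitespace would be lost
    s.includes(delimiter) ||   // would be split into extra fields
    s.includes('"') ||         // would confuse the quoting itself
    s.includes('\n')           // would break the row
  );
}

console.log(needsQuotes('hello world'));   // false — inner spaces are fine
console.log(needsQuotes('hello, world'));  // true  — contains the comma delimiter
console.log(needsQuotes(' padded '));      // true  — leading/trailing spaces
```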
3. Indentation Over Brackets
Like YAML, TOON uses indentation instead of curly braces for nested structures:
JSON:
```json
{
  "user": {
    "id": 123,
    "profile": {
      "name": "Ada"
    }
  }
}
```
TOON:
```
user:
  id: 123
  profile:
    name: Ada
```
Cleaner, more readable, and fewer tokens.
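The nesting rule is simple enough to sketch in a few lines. Again, this is a toy illustration, not the official encoder: it handles only plain nested objects with scalar leaves, ignoring arrays and quoting entirely.

```javascript
// Illustrative sketch of indentation-based nesting (NOT the official codec):
// objects recurse one indent level deeper; scalars print inline.
function encodeNested(obj, indent = 0) {
  const pad = '  '.repeat(indent);
  return Object.entries(obj)
    .map(([key, value]) =>
      typeof value === 'object' && value !== null
        ? `${pad}${key}:\n${encodeNested(value, indent + 1)}`
        : `${pad}${key}: ${value}`
    )
    .join('\n');
}

console.log(encodeNested({ user: { id: 123, profile: { name: 'Ada' } } }));
// user:
//   id: 123
//   profile:
//     name: Ada
```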
4. Explicit Array Lengths
TOON includes the array length in brackets ([N]), which actually helps LLMs understand and validate the structure:
```
tags[3]: admin,ops,dev
```
This explicit metadata reduces parsing errors when LLMs are generating or interpreting structured data.
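The length marker works like a lightweight checksum: whichever side consumes the data can reject output whose item count doesn't match the declared `[N]`. Here is a hypothetical round-trip for the single-line list form above (the `encodeList`/`parseList` names are mine, not the library's):

```javascript
// Sketch of the explicit-length idea: write [N] on encode, verify it on parse.
function encodeList(name, values) {
  return `${name}[${values.length}]: ${values.join(',')}`;
}

// Hypothetical parser that treats the declared length as a validation check.
function parseList(line) {
  const match = line.match(/^(\w+)\[(\d+)\]: (.*)$/);
  if (!match) throw new Error('not a list line');
  const [, name, declared, rest] = match;
  const values = rest.split(',');
  if (values.length !== Number(declared)) {
    throw new Error(`length mismatch: declared ${declared}, got ${values.length}`);
  }
  return { name, values };
}

console.log(encodeList('tags', ['admin', 'ops', 'dev']));
// tags[3]: admin,ops,dev
console.log(parseList('tags[3]: admin,ops,dev').values);
// [ 'admin', 'ops', 'dev' ]
```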
Real-World Benchmarks
The TOON project ran comprehensive benchmarks across different data types and LLM models. Here's what they found:
Token Savings by Dataset
| Dataset | JSON Tokens | TOON Tokens | Savings |
|---|---|---|---|
| GitHub Repos (100 records) | 15,145 | 8,745 | 42.3% |
| Analytics (180 days) | 10,977 | 4,507 | 58.9% |
| E-commerce Orders | 257 | 166 | 35.4% |
The sweet spot? Uniform tabular data — records with consistent schemas across many rows. The more repetitive your JSON keys, the more TOON can optimize.
LLM Comprehension
But token efficiency doesn't matter if the LLM can't understand the format. The benchmarks tested 4 different models (GPT-5 Nano, Claude Haiku, Gemini Flash, Grok) on 154 data retrieval questions:
- TOON accuracy: 70.1%
- JSON accuracy: 65.4%
- Token reduction: 46.3%
On this benchmark, TOON not only saved tokens but also improved accuracy. The explicit structure (array lengths, field declarations) appears to help models parse and validate data more reliably.
When Should You Use TOON?
TOON isn't meant to replace JSON everywhere. Think of it as a specialized tool for a specific job.
✅ Use TOON When:
- Sending large datasets to LLMs (hundreds or thousands of records)
- Working with uniform data structures (database query results, CSV exports, analytics)
- Token costs are a significant concern
- You're making frequent LLM API calls with structured data
❌ Stick With JSON When:
- Building traditional REST APIs
- Storing data in databases
- Working with deeply nested or non-uniform data
- You need universal compatibility with existing tools
As the TOON documentation puts it: "Use JSON programmatically, convert to TOON for LLM input."
How to Get Started
TOON is available as an npm package with a simple API:
```javascript
import { encode, decode } from '@toon-format/toon'

const data = {
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ]
}

// Convert to TOON
const toon = encode(data)
console.log(toon)
// items[2]{sku,qty,price}:
//   A1,2,9.99
//   B2,1,14.5

// Convert back to JSON
const restored = decode(toon)
```
There's also a CLI tool for quick conversions:
```shell
# Encode JSON to TOON
npx @toon-format/cli data.json -o data.toon

# Decode TOON to JSON
npx @toon-format/cli data.toon -o data.json

# Show token savings
npx @toon-format/cli data.json --stats
```
Alternative Delimiters
For even more token efficiency, you can use tab or pipe delimiters instead of commas:
```javascript
// Tab-separated (often more token-efficient)
encode(data, { delimiter: '\t' })

// Pipe-separated
encode(data, { delimiter: '|' })
```
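As a toy illustration of what the delimiter choice changes (the `{ delimiter }` option above is the library's documented API; `formatRow` below is just a hypothetical helper, and the token-cost claim is the article's, not something this snippet measures):

```javascript
// Hypothetical helper showing how the delimiter choice changes a row.
// Single characters like tab or pipe can tokenize more cheaply than commas
// in some tokenizers, which is why the option exists.
function formatRow(values, delimiter) {
  return values.map(String).join(delimiter);
}

const row = ['A1', 2, 9.99];
console.log(formatRow(row, ','));   // A1,2,9.99
console.log(formatRow(row, '\t'));  // A1	2	9.99
console.log(formatRow(row, '|'));   // A1|2|9.99
```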
The Growing Ecosystem
While TOON is relatively new, the community is already building implementations across multiple languages:
- Official: JavaScript/TypeScript, Python (in dev), Rust (in dev)
- Community: PHP, Ruby, Go, Swift, Elixir, C++, Java, and more
The project maintains a comprehensive specification and conformance test suite to ensure compatibility across implementations.
The Bottom Line
TOON represents a shift in thinking about data formats. For decades, we've optimized for human readability and machine interoperability. Now, with LLMs consuming massive amounts of structured data, we need formats optimized for token efficiency and AI comprehension.
Is TOON going to replace JSON? No. But for the specific use case of feeding structured data to LLMs, it offers compelling advantages:
- 30-60% token savings on uniform tabular data
- Better LLM accuracy thanks to explicit structure
- Drop-in conversion from existing JSON workflows
- Growing ecosystem with multi-language support
If you're building AI-powered applications that consume significant amounts of structured data, TOON is worth exploring. Your token budget will thank you.
Resources:
Have you tried TOON in your projects? What kind of token savings are you seeing? Share your experience in the comments.
Top comments (20)
Thank you for the article, would you mind adding our JSON to TOON tool in your article?
scalevise.com/json-toon-converter
Cobol is that you?
Why not use YAML if you want a more condensed format?
Our system processes billions of tokens each month from the database alone, so we can't use YAML or JSON; they are too token-heavy. For flat results (like a database query) we simply return CSV, which is even ~30% fewer tokens than TOON. We haven't adopted TOON yet (we have our own format for structured objects), but it's definitely more token-friendly than YAML. At the scale we operate, YAML is expensive.
I based my statement on the examples I have seen at that moment. And I agree if you use YAML for tabular data it is expensive. That is why in my other comment I mentioned a switch based on the shape of the data. CSV for tabular data and YAML for hierarchical data.
You can even have CSV in YAML.
If that wasn't possible, I would go all in for TOON for those mixed cases.
For hierarchical data, TOON is essentially YAML, so it doesn't reduce tokens there. And as you mention, for tabular data CSV is better.
If you can show where TOON saves tokens over the smart use of YAML and CSV, I'm glad to stand corrected.
I got that TOON is one more compact data serialization format, similar in purpose to YAML
True, but why the need to invent a new format?
Most languages have mature YAML and CSV libraries if you need to condense the text that is sent to an AI.
The main reason for TOON is probably that you can feed it mixed content: the parts that compact better as YAML and the parts that compact better as CSV, instead of writing a function yourself to switch the output between the two formats.
Out of curiosity I asked an AI to create that function.
I’ve been experimenting with this new “code execution with MCP” concept and recently implemented it in my OSS mcpproxy project. The current version can generate a JSON → TOON converter on the fly for any MCP tool within an agent session, which makes data format conversion and reuse incredibly easy.
Prompt example: Read this post: dev.to/akki907/toon-vs-json-the-new-format-designed-for-ai-nk5. Then implement and run code that converts the tool's JSON output to TOON format, using the tool. Output ONLY TOON data in code execution response
Surely XML would do better on peak accuracy, because a closing tag explicitly matches an opening tag?
Surely Markdown and YAML offer similar compactness and readability to TOON, with better support?
It would be good to see the article amended to test against JSON, XML, YAML, Markdown, and CSV (or better still, another delimited format like tab-delimited) to get a better idea of how TOON compares against a fuller range of options.
I'll just leave this here for people still taking this seriously 👀
How it is different from TONL github.com/tonl-dev/tonl ?
Thank you for the article)
Thanks for sharing this. Built a quick tool for anyone wanting to test TOON formatting: bestaitools.tech/tools/json-to-toon Runs client-side, no data sent to servers 🛠️
Thanks for sharing.