TOON File Format Anatomy: Schema-Once, Data-Many for LLM Pipelines 🎯📄

If you work with RAG pipelines, agent tools, or LLM APIs, you've probably noticed something frustrating: sometimes the biggest cost in a prompt is not the data itself; it's the repeated JSON structure wrapped around it.

That is exactly the problem TOON tries to solve.

TOON (Token-Oriented Object Notation) is a compact, human-readable encoding of the JSON data model designed for LLM prompts. It keeps the same logical structure as JSON, but reduces token overhead by declaring structure once and streaming the data in a denser format.

In this post, we'll break down the anatomy of the TOON format, explain where it fits in modern AI pipelines, and compare it with JSON, Arrow, and Parquet so you know when it is a smart choice and when it is not.

Why TOON matters ⚡

In many LLM workflows, especially RAG, the bottleneck is not storage size on disk. It is prompt size, token cost, and how much useful context you can fit into the model window.

JSON is great for APIs and interoperability, but it becomes repetitive fast when you are passing arrays of objects. If every retrieved chunk repeats keys like id, title, source, score, and text, the model spends tokens reading syntax that carries very little new information.

TOON tackles that by using a simple idea: declare structure once, stream values many times.

Start with the big picture 🗂️

The easiest way to understand TOON is to think of it as a hybrid of:

  • JSON's data model.
  • YAML-style readability and indentation.
  • CSV-style compact rows for uniform arrays.

That combination gives TOON a very specific sweet spot: uniform arrays of objects with primitive-valued fields.

So instead of writing something like this in JSON:

[
  {"id": 1, "name": "Alice", "role": "admin"},
  {"id": 2, "name": "Bob", "role": "user"}
]

TOON can express the same structure much more compactly like this:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

That is the core TOON mental model right there: length + fields + rows.
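To make that mental model concrete, here is a minimal sketch of a JSON-to-TOON encoder for the tabular case. The `to_toon` helper and its quoting rule are assumptions for illustration, not an official TOON library API:

```python
# Illustrative sketch only: a minimal JSON -> TOON tabular encoder.
# to_toon and its quoting rule are assumptions for this post,
# not an official TOON library API.

def format_value(value) -> str:
    """Render one primitive value as a TOON cell."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Quote strings that would be ambiguous in a comma-separated row.
        return f'"{value}"' if any(ch in value for ch in ',"\n') else value
    return str(value)  # ints and floats

def to_toon(name: str, rows: list[dict]) -> str:
    """Encode a uniform list of flat dicts as one TOON tabular block."""
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(format_value(r[f]) for f in fields) for r in rows]
    return "\n".join([header, *body])

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(to_toon("users", users))
```

Running this prints the `users[2]{id,name,role}:` block shown above: one header line, then one data row per object.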

The core anatomy 🧱

At a high level, a TOON tabular section is made of three important parts:

  1. Array length using [N].
  2. Field declaration using {field1,field2,...}.
  3. Data rows that follow the declared field order.

Here is a simplified view:

[Diagram: a TOON tabular block, showing the array length, then the field declaration, then the data rows]

This is one of the most important design ideas in TOON. Instead of repeating object keys on every row, the schema is declared once and every subsequent line becomes mostly pure data.

Array length: small detail, big impact 🔢

The [N] part is more useful than it first appears. TOON documentation explicitly notes that the array length helps models answer dataset-size questions and detect truncation or malformed output.

That makes TOON interesting not just for compactness, but also for LLM guardrails. If a model was supposed to emit 50 rows and only returns 32, the mismatch becomes immediately visible.

This is a subtle but powerful improvement over plain CSV snippets in prompts, because CSV usually has no built-in count or schema declaration at the array boundary.
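A sketch of that guardrail, assuming the header syntax shown in this post (this is not an official TOON parser, and the header regex is my own approximation):

```python
import re

# Illustrative guardrail: compare the declared [N] count in a TOON
# tabular header with the number of data rows actually present.
# The header pattern is an assumption based on this post's examples.
HEADER_RE = re.compile(r"^(\w+)\[(\d+)\]\{[^}]*\}:\s*$")

def check_row_count(block: str) -> tuple[int, int]:
    """Return (declared, actual) row counts for one tabular block."""
    lines = block.strip().splitlines()
    match = HEADER_RE.match(lines[0])
    if match is None:
        raise ValueError("first line is not a TOON tabular header")
    return int(match.group(2)), len(lines) - 1  # every other line is a row

# A model that was asked for 50 rows but stopped after 2:
declared, actual = check_row_count(
    "users[50]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"
)
if declared != actual:
    print(f"possible truncation: expected {declared} rows, got {actual}")
```

With plain CSV there is nothing to compare against; here the mismatch is detectable in a couple of lines.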

Field declaration: schema once 🏷️

The {fields} header is where TOON behaves a little like a lightweight schema language. It defines the expected columns and the order in which row values must appear.

That matters for both humans and models. Humans can scan the header once and understand the shape of the data; models can use that header as a structural constraint when interpreting each row.

For uniform, tabular payloads, this gives TOON a "column header + rows" feel that is much denser than JSON without losing meaning.
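Here is a hedged sketch of the consumer side: treating the declared header as a schema and mapping each row back into a dict. The `parse_tabular` helper is illustrative and deliberately naive; a plain `split(",")` would break on quoted values containing commas:

```python
import re

# Illustrative decoder sketch: treat the {fields} header as a schema
# and zip each row's values against it. Not an official TOON parser;
# the naive split(",") ignores quoting rules, and values stay strings.

def parse_tabular(block: str) -> dict:
    lines = block.strip().splitlines()
    match = re.match(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$", lines[0])
    if match is None:
        raise ValueError("not a TOON tabular header")
    name, count = match.group(1), int(match.group(2))
    fields = match.group(3).split(",")
    rows = []
    for line in lines[1:]:
        values = line.strip().split(",")
        if len(values) != len(fields):
            raise ValueError(f"row has {len(values)} values, expected {len(fields)}")
        rows.append(dict(zip(fields, values)))
    if len(rows) != count:
        raise ValueError(f"declared {count} rows, found {len(rows)}")
    return {name: rows}

parsed = parse_tabular("users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user")
print(parsed["users"][0])
```

The field count check is the structural constraint mentioned above: any row that does not match the header is rejected instead of silently misaligned.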

Rows: where the token savings come from 🚀

The real token savings show up in the rows. Once the field names are declared, every additional object no longer needs repeated key names, braces, quotes, and punctuation-heavy JSON structure.

This is why TOON benchmarks often report savings in the 30% to 60% range compared with JSON for suitable payloads, especially arrays of similarly structured objects used in RAG or tool outputs.

It is not magic. It is just removing repeated syntax and shifting the payload closer to "schema once, values many."
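You can see the effect with a rough back-of-the-envelope check, using character counts as a crude stand-in for tokens (actual savings depend on the tokenizer and the payload shape):

```python
import json

# Rough size comparison: 50 uniform records as JSON vs. a TOON
# tabular block. Character counts are only a proxy for tokens.
rows = [{"id": i, "name": f"user{i}", "role": "member"} for i in range(1, 51)]

as_json = json.dumps(rows)
as_toon = "users[50]{id,name,role}:\n" + "\n".join(
    f"  {r['id']},{r['name']},{r['role']}" for r in rows
)

saving = 1 - len(as_toon) / len(as_json)
print(f"JSON: {len(as_json)} chars, TOON: {len(as_toon)} chars, saved {saving:.0%}")
```

The JSON version repeats `"id"`, `"name"`, and `"role"` fifty times each; the TOON version states them once, which is where the entire difference comes from.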

Why TOON fits RAG especially well 🧠

RAG systems often retrieve multiple chunks with repeated metadata fields like chunk id, document id, title, source, section, score, and text. That is exactly the kind of repeated-object structure where JSON becomes noisy and expensive.

A practical pattern is this:

  1. Store source data in a durable format such as Parquet or a database table.
  2. Retrieve relevant rows or chunks for a query.
  3. Convert the final prompt payload from JSON-like objects into TOON before sending it to the LLM.

That means TOON is usually not your storage layer. It is your LLM-facing delivery layer.

A practical RAG example 🧪

Imagine your retriever returns five chunks like this:

[
  {"chunk_id": 101, "doc": "policy.pdf", "section": "refunds", "score": 0.93, "text": "Customers can request refunds within 30 days..."},
  {"chunk_id": 205, "doc": "policy.pdf", "section": "cancellations", "score": 0.90, "text": "Cancellation fees apply after processing..."}
]

The same payload in TOON could look like this:

chunks[2]{chunk_id,doc,section,score,text}:
  101,policy.pdf,refunds,0.93,"Customers can request refunds within 30 days..."
  205,policy.pdf,cancellations,0.90,"Cancellation fees apply after processing..."

Same information, less repeated scaffolding. That usually means you can fit more retrieved chunks inside the same context window, which is one of the most practical reasons TOON is interesting for RAG.
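Here is a hedged sketch of that final-mile conversion, with an illustrative quoting rule (quote strings that contain commas or quotes) so free-text fields stay unambiguous. `chunks_to_toon` is a hypothetical helper for this post, not part of any TOON library:

```python
# Illustrative final-mile conversion for RAG: retrieved chunks as
# dicts, re-encoded as a TOON block just before prompt assembly.
# The quoting rule here is an assumption, not the official spec.

def toon_value(value) -> str:
    if isinstance(value, str) and ("," in value or '"' in value):
        return '"' + value.replace('"', '\\"') + '"'
    return str(value)

def chunks_to_toon(chunks: list[dict]) -> str:
    fields = ["chunk_id", "doc", "section", "score", "text"]
    header = f"chunks[{len(chunks)}]{{{','.join(fields)}}}:"
    rows = ["  " + ",".join(toon_value(c[f]) for f in fields) for c in chunks]
    return "\n".join([header, *rows])

chunks = [
    {"chunk_id": 101, "doc": "policy.pdf", "section": "refunds",
     "score": 0.93, "text": "Customers can request refunds within 30 days..."},
    {"chunk_id": 205, "doc": "policy.pdf", "section": "cancellations",
     "score": 0.9, "text": "Fees apply after processing, except for errors..."},
]
print(chunks_to_toon(chunks))
```

Only the second chunk's text gets quoted here, because it contains a comma; everything else streams as bare values after the schema-once header.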

TOON is not Parquet or Arrow 🚫

This is the most important framing if you want to write about TOON alongside Parquet and Arrow.

TOON is not a binary analytical file format. It is not trying to replace Parquet for storage or Arrow for in-memory interchange. It is a prompt-optimized text representation for structured data.

That means TOON belongs closer to the LLM boundary, while Parquet and Arrow belong deeper in the data platform stack.

A simple mental model is:

  • Parquet stores analytical data efficiently on disk.
  • Arrow moves typed columnar data efficiently between systems and memory spaces.
  • TOON presents structured data efficiently to language models.

A useful pipeline mental model 🔄

For a data engineer, the most realistic production story is not "TOON everywhere." It is something more like this:

[Diagram: TOON conversion pipeline, from Parquet storage through Arrow-backed processing to a TOON payload sent to the LLM]

This architecture lets each format do what it is best at. Parquet stays the durable analytical format, Arrow can still be the fast in-memory interchange layer inside your engine, and TOON becomes the compact final-mile representation sent to the model.

Comparison with JSON, Arrow, and Parquet ⚖️

Here is the practical difference between the formats:

  • JSON: general-purpose structured interchange. Best for APIs, config, and documents. Strength: ubiquitous and flexible. Limitation: repeats keys heavily in prompt payloads.
  • TOON: token-efficient structured prompt representation. Best for RAG context, tool outputs, and LLM inputs. Strength: compact, human-readable, schema-once row encoding. Limitation: best on uniform arrays; less compelling for irregular nested data.
  • Arrow: in-memory columnar interchange. Best for dataframes, engines, and cross-language analytics. Strength: typed, fast, buffer-oriented interchange. Limitation: not human-readable and not meant as a prompt text format.
  • Parquet: compressed analytical storage. Best for data lake and warehouse storage. Strength: efficient on-disk analytics and selective reads. Limitation: not prompt-friendly and not human-readable in raw form.

If you are explaining this to readers in one sentence, the short version is: JSON is universal, TOON is LLM-friendly, Arrow is execution-friendly, and Parquet is storage-friendly.

Where TOON shines most ✨

TOON shines when your payload is dominated by repeated records that share the same shape. That is common in retrieval results, catalog-like datasets, logs, evaluation samples, classification inputs, and agent tool outputs.

It is especially attractive when every token matters, because of context window limits, API cost, or the need to fit more relevant examples into one prompt.

In other words, TOON is most compelling when the structure is repetitive and the consumer is an LLM.

Where TOON is weaker ⚠️

TOON is not a universal replacement for JSON. Its strongest form is the tabular encoding for uniform arrays, and that means its benefits are smaller for deeply nested, irregular, or highly heterogeneous payloads.

It is also still early in ecosystem maturity compared with JSON, Arrow, or Parquet. That means you should think of it as a targeted optimization layer rather than a default foundation for every application format.

Final mental model 🧠

If you only remember one thing, remember this:

  • JSON repeats structure with every object.
  • TOON declares structure once and streams rows compactly.
  • Arrow optimizes typed in-memory interchange.
  • Parquet optimizes durable analytical storage.

That is why TOON is interesting. It is not competing with Parquet at the storage layer or Arrow at the execution layer. It is optimizing the final stretch where structured data becomes prompt context for a model.

If your stack already stores data in Parquet and processes it with Arrow-backed tools, TOON can be a neat final-mile format for presenting retrieved rows to an LLM with less token overhead and clearer structure.

Want to convert between JSON and TOON file formats? Check out my tool here: https://databro.dev/tools/toon-json-converter/
