Andrei Fedoseev

I Replaced JSON with TOON in My LLM Prompts and Saved 40% on Tokens.

Hey! I'm Andrey, a frontend developer at Cloud.ru, and I write about frontend and AI on my blog and Telegram channel.

I work with LLM APIs every day. And every day I send structured data into context: product lists, logs, users, metrics. All of it - JSON. All of it - money.

At some point I calculated how many tokens in my prompts go to curly braces, quotes, and repeated keys. Turns out - a lot. Way too much.

Then I tried TOON. Here's what happened.

The Problem: JSON Is a Generous Format

Take a typical case. You're building a RAG system or an AI assistant that analyzes data. Your prompt pulls in a list of 50 records. Here's one record in JSON:

{"id": 2001, "timestamp": "2025-11-18T08:14:23Z", "level": "error", "service": "auth-api", "ip": "172.16.4.21", "message": "Auth failed for user", "code": "AUTH_401"}

Now multiply by 50. Each record repeats 7 keys: "id", "timestamp", "level", "service", "ip", "message", "code". Plus quotes around every key and string value. Plus curly braces. Plus commas.

Across 50 records that's ~350 redundant key repetitions and hundreds of characters of syntactic overhead. The model tokenizes all of it. You pay for all of it.
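The arithmetic behind that estimate can be sketched in a few lines (illustrative character counts for the 7-key record above, not tokenizer output):

```python
# Rough estimate of JSON syntactic overhead for N uniform records.
# Illustrative arithmetic only; actual token counts depend on the tokenizer.
keys = ["id", "timestamp", "level", "service", "ip", "message", "code"]
records = 50

# Each record repeats every key name plus its two quotes and a colon,
# and adds a pair of braces plus commas between fields.
per_record_key_chars = sum(len(k) + 3 for k in keys)  # "key": -> name + quotes + colon
per_record_syntax = 2 + (len(keys) - 1)               # {} plus separating commas

total_overhead = records * (per_record_key_chars + per_record_syntax)
print(f"~{records * len(keys)} key repetitions, ~{total_overhead} overhead characters")
```

That's where the "~350 redundant key repetitions" figure comes from: 50 records times 7 keys, before you even count quotes and braces.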

The Fix: TOON in 30 Seconds

TOON (Token-Oriented Object Notation) encodes the same data with the same structure, but without repetition. Keys are declared once in a header, then it's values only:

logs[3]{id,timestamp,level,service,ip,message,code}:
 2001,2025-11-18T08:14:23Z,error,auth-api,172.16.4.21,Auth failed for user,AUTH_401
 2002,2025-11-18T08:14:24Z,warn,payment,172.16.4.22,Timeout on payment gateway,PAY_TIMEOUT
 2003,2025-11-18T08:14:25Z,info,user-svc,172.16.4.23,User profile updated,USR_200

The header logs[3]{id,timestamp,level,service,ip,message,code}: declares an array of 3 elements and lists its fields once. That's it. Then come rows of comma-separated values. No quotes around keys, no {} per object, no duplication.

JSON -> TOON -> JSON conversion is lossless, 1:1. It's not a different data model - it's a different encoding of the same model.
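For intuition, the flat tabular case above can be sketched in a few lines of Python (illustrative only; the real toon-format packages handle quoting, nesting, and types per the spec, and `encode_uniform` is a made-up name):

```python
# Minimal sketch of TOON's tabular encoding for a uniform array of dicts.
# Covers only the flat case from the example; not a spec-compliant encoder.
def encode_uniform(name: str, rows: list[dict]) -> str:
    fields = list(rows[0].keys())                      # keys declared once, in the header
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [" " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

logs = [
    {"id": 2001, "level": "error", "code": "AUTH_401"},
    {"id": 2002, "level": "warn", "code": "PAY_TIMEOUT"},
]
print(encode_uniform("logs", logs))
# logs[2]{id,level,code}:
#  2001,error,AUTH_401
#  2002,warn,PAY_TIMEOUT
```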

Counting Tokens: A Real Test

I took a dataset of 50 log entries (7 fields each) and ran it through a tokenizer:

Python:

import json
import toon_format  # pip install toon-format
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

with open("logs.json") as f:
    data = json.load(f)

json_str = json.dumps(data, indent=2)
json_compact = json.dumps(data)
toon_str = toon_format.encode(data)

print(f"JSON (formatted):  {len(enc.encode(json_str))} tokens")
print(f"JSON (compact):    {len(enc.encode(json_compact))} tokens")
print(f"TOON:              {len(enc.encode(toon_str))} tokens")

TypeScript:

import { encode as toToon } from "@toon-format/toon";
import { encode as tokenize } from "gpt-3-encoder"; // GPT-3 BPE; counts differ slightly from gpt-4o's o200k_base
import fs from "fs";

const data = JSON.parse(fs.readFileSync("./logs.json", "utf8"));

const jsonFormatted = JSON.stringify(data, null, 2);
const jsonCompact = JSON.stringify(data);
const toonStr = toToon(data);

console.log(`JSON (formatted):  ${tokenize(jsonFormatted).length} tokens`);
console.log(`JSON (compact):    ${tokenize(jsonCompact).length} tokens`);
console.log(`TOON:              ${tokenize(toonStr).length} tokens`);

Results on real data (from TOON benchmarks):

| Format | Tokens | Savings vs JSON |
|---|---|---|
| JSON (formatted) | 379 | - |
| JSON (compact) | 236 | -37.7% |
| TOON | 150 | -60.4% |

60% savings. On a single prompt. Not hypothetically - measured by tokenizer.

Counting Money: How Much You're Overpaying

Now the fun part. Current API prices (April 2026):

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4.1 | $2.00 | $8.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |

Say you make 10,000 requests per day, each containing an array of 100 objects (typical RAG/analytics). Let's calculate for GPT-4o:

| | JSON | TOON | Difference |
|---|---|---|---|
| Tokens per request | ~3,200 | ~1,850 | -42% |
| Tokens per day | 32M | 18.5M | -13.5M |
| Cost per day | $80 | $46.25 | -$33.75 |
| Per month | $2,400 | $1,387 | -$1,013 |
| Per year | $28,800 | $16,650 | -$12,150 |
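The GPT-4o numbers are straightforward to reproduce (assuming a 30-day month and 360-day year, as the table does):

```python
# Reproducing the GPT-4o cost estimate: 10,000 requests/day at input pricing.
PRICE_PER_M = 2.50            # GPT-4o input price per 1M tokens
REQUESTS_PER_DAY = 10_000

def yearly_cost(tokens_per_request: int) -> float:
    tokens_per_day = tokens_per_request * REQUESTS_PER_DAY
    return tokens_per_day / 1_000_000 * PRICE_PER_M * 360  # 360-day year

json_cost = yearly_cost(3_200)   # -> 28800.0
toon_cost = yearly_cost(1_850)   # -> 16650.0
print(f"Saved per year: ${json_cost - toon_cost:,.0f}")  # Saved per year: $12,150
```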

On Claude Opus 4.6 (input $5/1M) the savings are even bigger:

| | JSON | TOON | Difference |
|---|---|---|---|
| Per month | $4,800 | $2,775 | -$2,025 |
| Per year | $57,600 | $33,300 | -$24,300 |

$12-24K per year - on input tokens alone, on a single endpoint. If you have multiple pipelines - multiply accordingly.

Integration: 5 Minutes, 4 Lines of Code

You don't need to rewrite your architecture. TOON plugs in as a layer before sending to the API:

Python + OpenAI:

import openai
import toon_format

def analyze_with_llm(data: list[dict]) -> str:
    toon_str = toon_format.encode({"records": data})  # JSON -> TOON

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Analyze this data and find anomalies:\n\n{toon_str}"
        }]
    )
    return response.choices[0].message.content

TypeScript + Anthropic:

import Anthropic from "@anthropic-ai/sdk";
import { encode as toToon } from "@toon-format/toon";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function analyzeData(records: any[]) {
  const toonData = toToon({ records });

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-6-20250514",
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: `Analyze this data and find anomalies:\n\n${toonData}`
    }]
  });
  return response.content[0].text;
}

One line - toon_format.encode() - and you save 40-60% of tokens. The model responds in its usual format, nothing to change on the output side.

The Big Comparison: TOON vs Everything Else

No format is perfect for every case. Here's an honest breakdown:

| Criteria | JSON | JSON compact | YAML | CSV | TOON | TRON |
|---|---|---|---|---|---|---|
| Tokens (tabular data) | 100% | ~63% | ~72% | ~38% | ~40% | ~55% |
| Tokens (nested data) | 100% | ~78% | ~85% | n/a | ~67% | ~75% |
| LLM accuracy | 75.0% | 73.7% | 74.5% | ~72% | 76.4% | - |
| Nested structures | excellent | excellent | good | none | medium | good |
| Pipeline compatibility | everywhere | everywhere | wide | wide | needs SDK | JSON-compatible |
| LLM familiarity (training data) | huge | huge | large | large | minimal | minimal |
| Lossless round-trip with JSON | yes | yes | caveats | no | yes | yes |

Key takeaways:

  • TOON vs CSV: CSV is ~5-6% more compact for flat tables but doesn't support nesting or types. TOON adds minimal overhead but the model parses data more accurately.
  • TOON vs YAML: TOON saves 48% tokens on tabular data. YAML is better for deeply nested configs.
  • TOON vs JSON compact: Even minified JSON loses to TOON by 35% on tables. On nested data the gap is smaller (~15%).
  • TOON vs TRON: TRON is JSON-compatible (parseable with any JSON parser). TOON is more compact but requires a dedicated parser. Choose TRON if you don't want to change your toolchain.

When to Use TOON (and When Not To)

Use TOON when:

  • Uniform arrays of objects - user lists, products, logs, metrics. 40-60% savings.
  • RAG pipelines - dozens of same-structure documents pulled into context.
  • Batch processing - thousands of requests per day, every percentage of savings = real money.
  • Long contexts - when data doesn't fit the context window and shrinking it is critical.

Don't use TOON when:

  • Deeply nested structures (4+ levels). LLM accuracy drops to 43% on nested data. JSON is more reliable.
  • Data goes to a regular service, not an LLM. TOON is a prompt format, not for REST APIs or databases.
  • Flat tables with no nesting. CSV is 5-6% more compact and needs no SDK.
  • You need JSON Schema validation. TOON is a different syntax - existing validators won't work.
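The rules above boil down to a simple gate you can put in front of your encoder. Here's a hypothetical heuristic (the `pick_format` helper and the depth threshold are my assumptions, not part of any TOON package):

```python
# Hypothetical heuristic for the rules above: use TOON for uniform, shallow
# arrays of objects, fall back to JSON otherwise. Thresholds are assumptions.
def depth(value, level=1):
    """Nesting depth: a flat list of dicts with scalar values counts as 2."""
    if isinstance(value, dict):
        return max((depth(v, level + 1) for v in value.values()), default=level)
    if isinstance(value, list):
        return max((depth(v, level) for v in value), default=level)
    return level

def pick_format(records: list) -> str:
    if not records:
        return "json"
    uniform = all(
        isinstance(r, dict) and r.keys() == records[0].keys() for r in records
    )
    return "toon" if uniform and depth(records) <= 3 else "json"

print(pick_format([{"id": 1, "level": "info"}, {"id": 2, "level": "warn"}]))  # toon
```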

Ecosystem: What Already Works

| Language | Package | Status |
|---|---|---|
| TypeScript | @toon-format/toon | Reference implementation |
| Python | toon-format / python-toon | Stable |
| Go | toon-format/go-toon | In development |
| Rust | toon-format/toon-rs | In development |
| .NET | toon-format/toon-dotnet | In development |
| CLI | npx @toon-format/cli | Works |

Quick start:

# TypeScript
npm install @toon-format/toon

# Python
pip install toon-format

# Convert a file via CLI
npx @toon-format/cli data.json -o data.toon
npx @toon-format/cli data.toon -o data.json  # and back

The spec is open, ABNF grammar is documented, test fixtures are available: toon-format/spec.

Bottom Line

TOON is not a JSON replacement. JSON will remain the standard for APIs, configs, and storage. But if you're sending structured data to an LLM - you're literally throwing money at syntactic overhead.

Four lines of code. Five minutes to integrate. Minus 40-60% tokens. Minus $12-24K per year at moderate load.

Try it on one endpoint. Measure. Calculate. Your API budget will thank you. Or at least stop quietly sobbing at night.


If this was useful - I write about frontend, AI, and practical dev stuff on my blog and Telegram channel. Come say hi!

