Andrei Fedoseev

I Replaced JSON with TOON in My LLM Prompts and Saved 40% on Tokens.

Hey! I'm Andrey, a frontend developer at Cloud.ru, and I write about frontend and AI on my blog and Telegram channel.

I work with LLM APIs every day. And every day I send structured data into context: product lists, logs, users, metrics. All of it - JSON. All of it - money.

At some point I calculated how many tokens in my prompts go to curly braces, quotes, and repeated keys. Turns out - a lot. Way too much.

Then I tried TOON. Here's what happened.

The Problem: JSON Is a Generous Format

Take a typical case. You're building a RAG system or an AI assistant that analyzes data. Your prompt pulls in a list of 50 records. Here's one record in JSON:

{"id": 2001, "timestamp": "2025-11-18T08:14:23Z", "level": "error", "service": "auth-api", "ip": "172.16.4.21", "message": "Auth failed for user", "code": "AUTH_401"}

Now multiply by 50. Each record repeats 7 keys: "id", "timestamp", "level", "service", "ip", "message", "code". Plus quotes around every key and string value. Plus curly braces. Plus commas.

Across 50 records that's ~350 redundant key repetitions and hundreds of characters of syntactic overhead. The model tokenizes all of it. You pay for all of it.
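The arithmetic behind that estimate can be sketched in a few lines (illustrative character counts for the 7-key record above, not tokenizer output):

```python
# Rough estimate of JSON syntactic overhead for N uniform records.
# Illustrative arithmetic only; actual token counts depend on the tokenizer.
keys = ["id", "timestamp", "level", "service", "ip", "message", "code"]
records = 50

# Each record repeats every key name plus its two quotes and a colon,
# and adds a pair of braces plus commas between fields.
per_record_key_chars = sum(len(k) + 3 for k in keys)  # "key": -> name + quotes + colon
per_record_syntax = 2 + (len(keys) - 1)               # {} plus separating commas

total_overhead = records * (per_record_key_chars + per_record_syntax)
print(f"~{records * len(keys)} key repetitions, ~{total_overhead} overhead characters")
```

That's where the "~350 redundant key repetitions" figure comes from: 50 records times 7 keys, before you even count quotes and braces.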

The Fix: TOON in 30 Seconds

TOON (Token-Oriented Object Notation) encodes the same data with the same structure, but without repetition. Keys are declared once in a header, then it's values only:

logs[3]{id,timestamp,level,service,ip,message,code}:
 2001,2025-11-18T08:14:23Z,error,auth-api,172.16.4.21,Auth failed for user,AUTH_401
 2002,2025-11-18T08:14:24Z,warn,payment,172.16.4.22,Timeout on payment gateway,PAY_TIMEOUT
 2003,2025-11-18T08:14:25Z,info,user-svc,172.16.4.23,User profile updated,USR_200

The header logs[3]{id,timestamp,level,service,ip,message,code}: declares an array of 3 elements and lists its fields once. That's it. Then come rows of comma-separated values. No quotes around keys, no {} per object, no duplication.

JSON -> TOON -> JSON conversion is lossless, 1:1. It's not a different data model - it's a different encoding of the same model.
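For intuition, the flat tabular case above can be sketched in a few lines of Python (illustrative only; the real toon-format packages handle quoting, nesting, and types per the spec, and `encode_uniform` is a made-up name):

```python
# Minimal sketch of TOON's tabular encoding for a uniform array of dicts.
# Covers only the flat case from the example; not a spec-compliant encoder.
def encode_uniform(name: str, rows: list[dict]) -> str:
    fields = list(rows[0].keys())                      # keys declared once, in the header
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [" " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

logs = [
    {"id": 2001, "level": "error", "code": "AUTH_401"},
    {"id": 2002, "level": "warn", "code": "PAY_TIMEOUT"},
]
print(encode_uniform("logs", logs))
# logs[2]{id,level,code}:
#  2001,error,AUTH_401
#  2002,warn,PAY_TIMEOUT
```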

Counting Tokens: A Real Test

I took a dataset of 50 log entries (7 fields each) and ran it through a tokenizer:

Python:

import json
import toon_format  # pip install toon-format
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

with open("logs.json") as f:
    data = json.load(f)

json_str = json.dumps(data, indent=2)
json_compact = json.dumps(data)
toon_str = toon_format.encode(data)

print(f"JSON (formatted):  {len(enc.encode(json_str))} tokens")
print(f"JSON (compact):    {len(enc.encode(json_compact))} tokens")
print(f"TOON:              {len(enc.encode(toon_str))} tokens")

TypeScript:

import { encode as toToon } from "@toon-format/toon";
import { encode as tokenize } from "gpt-3-encoder"; // GPT-3 BPE; counts differ slightly from gpt-4o's o200k_base
import fs from "fs";

const data = JSON.parse(fs.readFileSync("./logs.json", "utf8"));

const jsonFormatted = JSON.stringify(data, null, 2);
const jsonCompact = JSON.stringify(data);
const toonStr = toToon(data);

console.log(`JSON (formatted):  ${tokenize(jsonFormatted).length} tokens`);
console.log(`JSON (compact):    ${tokenize(jsonCompact).length} tokens`);
console.log(`TOON:              ${tokenize(toonStr).length} tokens`);

Results on real data (from TOON benchmarks):

| Format | Tokens | Savings vs JSON |
|---|---|---|
| JSON (formatted) | 379 | - |
| JSON (compact) | 236 | -37.7% |
| TOON | 150 | -60.4% |

60% savings. On a single prompt. Not hypothetically - measured by tokenizer.

Counting Money: How Much You're Overpaying

Now the fun part. Current API prices (April 2026):

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4.1 | $2.00 | $8.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |

Say you make 10,000 requests per day, each containing an array of 100 objects (typical RAG/analytics). Let's calculate for GPT-4o:

| | JSON | TOON | Difference |
|---|---|---|---|
| Tokens per request | ~3,200 | ~1,850 | -42% |
| Tokens per day | 32M | 18.5M | -13.5M |
| Cost per day | $80 | $46.25 | -$33.75 |
| Per month | $2,400 | $1,387 | -$1,013 |
| Per year | $28,800 | $16,650 | -$12,150 |
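The GPT-4o numbers are straightforward to reproduce (assuming a 30-day month and 360-day year, as the table does):

```python
# Reproducing the GPT-4o cost estimate: 10,000 requests/day at input pricing.
PRICE_PER_M = 2.50            # GPT-4o input price per 1M tokens
REQUESTS_PER_DAY = 10_000

def yearly_cost(tokens_per_request: int) -> float:
    tokens_per_day = tokens_per_request * REQUESTS_PER_DAY
    return tokens_per_day / 1_000_000 * PRICE_PER_M * 360  # 360-day year

json_cost = yearly_cost(3_200)   # -> 28800.0
toon_cost = yearly_cost(1_850)   # -> 16650.0
print(f"Saved per year: ${json_cost - toon_cost:,.0f}")  # Saved per year: $12,150
```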

On Claude Opus 4.6 (input $5/1M) the savings are even bigger:

| | JSON | TOON | Difference |
|---|---|---|---|
| Per month | $4,800 | $2,775 | -$2,025 |
| Per year | $57,600 | $33,300 | -$24,300 |

$12-24K per year - on input tokens alone, on a single endpoint. If you have multiple pipelines - multiply accordingly.

Integration: 5 Minutes, 4 Lines of Code

You don't need to rewrite your architecture. TOON plugs in as a layer before sending to the API:

Python + OpenAI:

import openai
import toon_format

def analyze_with_llm(data: list[dict]) -> str:
    toon_str = toon_format.encode({"records": data})  # JSON -> TOON

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Analyze this data and find anomalies:\n\n{toon_str}"
        }]
    )
    return response.choices[0].message.content

TypeScript + Anthropic:

import Anthropic from "@anthropic-ai/sdk";
import { encode as toToon } from "@toon-format/toon";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function analyzeData(records: any[]) {
  const toonData = toToon({ records });

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-6-20250514",
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: `Analyze this data and find anomalies:\n\n${toonData}`
    }]
  });
  return response.content[0].text;
}

One line - toon_format.encode() - and you save 40-60% of tokens. The model responds in its usual format, nothing to change on the output side.

The Big Comparison: TOON vs Everything Else

No format is perfect for every case. Here's an honest breakdown:

| Criteria | JSON | JSON compact | YAML | CSV | TOON | TRON |
|---|---|---|---|---|---|---|
| Tokens (tabular data) | 100% | ~63% | ~72% | ~38% | ~40% | ~55% |
| Tokens (nested data) | 100% | ~78% | ~85% | n/a | ~67% | ~75% |
| LLM accuracy | 75.0% | 73.7% | 74.5% | ~72% | 76.4% | - |
| Nested structures | excellent | excellent | good | none | medium | good |
| Pipeline compatibility | everywhere | everywhere | wide | wide | needs SDK | JSON-compatible |
| LLM familiarity (training data) | huge | huge | large | large | minimal | minimal |
| Lossless round-trip with JSON | yes | yes | caveats | no | yes | yes |

Key takeaways:

  • TOON vs CSV: CSV is ~5-6% more compact for flat tables but doesn't support nesting or types. TOON adds minimal overhead but the model parses data more accurately.
  • TOON vs YAML: TOON saves 48% tokens on tabular data. YAML is better for deeply nested configs.
  • TOON vs JSON compact: Even minified JSON loses to TOON by 35% on tables. On nested data the gap is smaller (~15%).
  • TOON vs TRON: TRON is JSON-compatible (parseable with any JSON parser). TOON is more compact but requires a dedicated parser. Choose TRON if you don't want to change your toolchain.

When to Use TOON (and When Not To)

Use TOON when:

  • Uniform arrays of objects - user lists, products, logs, metrics. 40-60% savings.
  • RAG pipelines - dozens of same-structure documents pulled into context.
  • Batch processing - thousands of requests per day, every percentage of savings = real money.
  • Long contexts - when data doesn't fit the context window and shrinking it is critical.

Don't use TOON when:

  • Deeply nested structures (4+ levels). LLM accuracy drops to 43% on nested data. JSON is more reliable.
  • Data goes to a regular service, not an LLM. TOON is a prompt format, not for REST APIs or databases.
  • Flat tables with no nesting. CSV is 5-6% more compact and needs no SDK.
  • You need JSON Schema validation. TOON is a different syntax - existing validators won't work.
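The rules above boil down to a simple gate you can put in front of your encoder. Here's a hypothetical heuristic (the `pick_format` helper and the depth threshold are my assumptions, not part of any TOON package):

```python
# Hypothetical heuristic for the rules above: use TOON for uniform, shallow
# arrays of objects, fall back to JSON otherwise. Thresholds are assumptions.
def depth(value, level=1):
    """Nesting depth: a flat list of dicts with scalar values counts as 2."""
    if isinstance(value, dict):
        return max((depth(v, level + 1) for v in value.values()), default=level)
    if isinstance(value, list):
        return max((depth(v, level) for v in value), default=level)
    return level

def pick_format(records: list) -> str:
    if not records:
        return "json"
    uniform = all(
        isinstance(r, dict) and r.keys() == records[0].keys() for r in records
    )
    return "toon" if uniform and depth(records) <= 3 else "json"

print(pick_format([{"id": 1, "level": "info"}, {"id": 2, "level": "warn"}]))  # toon
```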

Ecosystem: What Already Works

| Language | Package | Status |
|---|---|---|
| TypeScript | @toon-format/toon | Reference implementation |
| Python | toon-format / python-toon | Stable |
| Go | toon-format/go-toon | In development |
| Rust | toon-format/toon-rs | In development |
| .NET | toon-format/toon-dotnet | In development |
| CLI | npx @toon-format/cli | Works |

Quick start:

# TypeScript
npm install @toon-format/toon

# Python
pip install toon-format

# Convert a file via CLI
npx @toon-format/cli data.json -o data.toon
npx @toon-format/cli data.toon -o data.json  # and back

The spec is open, ABNF grammar is documented, test fixtures are available: toon-format/spec.

Bottom Line

TOON is not a JSON replacement. JSON will remain the standard for APIs, configs, and storage. But if you're sending structured data to an LLM - you're literally throwing money at syntactic overhead.

Four lines of code. Five minutes to integrate. Minus 40-60% tokens. Minus $12-24K per year at moderate load.

Try it on one endpoint. Measure. Calculate. Your API budget will thank you. Or at least stop quietly sobbing at night.


If this was useful - I write about frontend, AI, and practical dev stuff on my blog and Telegram channel. Come say hi!

