DEV Community

SEN LLC
SEN LLC

Posted on

Try the Tech Radar #1 — TOON Cuts JSON Token Cost by 71% for LLM Context

Thoughtworks Technology Radar Vol 34 (April 2026) put TOON (Token-Oriented Object Notation) in the Assess ring. It's a JSON alternative designed for the moments when "fewer tokens" matters more than "more conventional" — typically LLM context windows. I built a 500-line vanilla JS JSON ⇔ TOON converter with a side-by-side token estimator to see what's actually doing the work. Spoiler: for typical API-response shapes, −70% is normal. Here's the breakdown.

🌐 Demo: https://sen.ltd/portfolio/toon-converter/
📦 GitHub: https://github.com/sen-ltd/toon-converter

Screenshot

What's the problem?

When you feed an API response to an LLM ("parse this and tell me what changed"), the JSON token cost is worse than you'd guess. Ten users:

{
  "results": [
    { "id": 101, "name": "Alice Tanaka", "role": "admin", "active": true },
    { "id": 102, "name": "Bob Yamada",   "role": "user",  "active": true },
    // ... 8 more
  ]
}
Enter fullscreen mode Exit fullscreen mode

The thing chewing tokens isn't the data — it's "id", "name", "role", "active" showing up ten times. Each key repetition costs the BPE tokenizer 4–8 tokens, so 40+ tokens go to column names that you, the LLM, and the reader could all have agreed on once.

What TOON does

TOON formats uniform arrays of objects as a CSV-like table:

results[10]{id,name,role,active}:
  101,Alice Tanaka,admin,true
  102,Bob Yamada,user,true
  ...
Enter fullscreen mode Exit fullscreen mode
  • A header line declares columns once: results[10]{id,name,role,active}:
  • Each row is comma-separated raw values
  • Strings get quoted only when they have to (commas inside, special chars)

545 JSON tokens → 159 TOON tokens. −71% on this payload. Scale to 1000 rows and the ratio gets sharper, not worse.

The implementation hinge — isUniformObjectArray

The whole conversion gates on one predicate: can this array be rendered as a table?

function isUniformObjectArray(arr) {
  if (arr.length < 1) return false;
  if (!arr.every((v) => v !== null && typeof v === "object" && !Array.isArray(v))) {
    return false;
  }
  const cols = Object.keys(arr[0]);
  if (cols.length === 0) return false;
  for (const row of arr) {
    const k = Object.keys(row);
    if (k.length !== cols.length) return false;
    for (let i = 0; i < cols.length; i++) {
      if (k[i] !== cols[i]) return false;            // same keys, same order
      const v = row[cols[i]];
      if (v !== null && typeof v === "object") return false;  // scalars only
    }
  }
  return true;
}
Enter fullscreen mode Exit fullscreen mode

Three rules:

  1. Every element is an object (no nulls, no arrays mixed in).
  2. Every row has the exact same keys in the exact same order.
  3. Every cell value is a scalar — you can't fit a nested object into a CSV row.

If any rule fails, fall back to a regular indented block per element. In practice the typical API response sails through.

Table render

function tableArray(key, arr, indent) {
  const pad = INDENT.repeat(indent);
  const cols = Object.keys(arr[0]);
  const head = key
    ? `${pad}${formatKey(key)}[${arr.length}]{${cols.join(",")}}:`
    : `${pad}[${arr.length}]{${cols.join(",")}}:`;
  const rowPad = INDENT.repeat(indent + 1);
  const rows = arr.map((row) => {
    const cells = cols.map((c) => formatCell(row[c]));
    return `${rowPad}${cells.join(",")}`;
  });
  return [head, ...rows].join("\n");
}
Enter fullscreen mode Exit fullscreen mode

{col1,col2,...} is the schema declaration, the [N] count is a hint to the LLM that N rows follow. Every row is just cells.join(",") — the structural noise of {, }, ", : is gone.

Cell-level quote elision

For string cells, drop the quotes if the value looks safe:

function formatCell(v) {
  if (v === null) return "";
  if (typeof v === "string") {
    if (v === "") return '""';
    if (/^[A-Za-z0-9_\-./@+ ]+$/.test(v) && !v.includes(",")) return v;
    return JSON.stringify(v);
  }
  return formatScalar(v);
}
Enter fullscreen mode Exit fullscreen mode

"admin"admin, "Alice Tanaka"Alice Tanaka. With BPE, dropping the opening and closing " saves 2 tokens per quoted value. Ten rows × two string columns × 2 tokens = 40 tokens. The "hello, world" case stays quoted because the comma would split the row.

null cells become empty (,,). JSON's literal null is 4 chars (~1 token); empty is 0.

Token estimation without bundling a tokenizer

A real BPE counter like gpt-tokenizer needs ~1 MB of vocabulary data. That's the wrong cost profile for a "paste JSON, see the savings" tool. So I went with a heuristic:

export function estimateTokens(text) {
  let total = 0;
  let i = 0;
  while (i < text.length) {
    const c = text[i];
    if (/[A-Za-z0-9]/.test(c)) {
      // Alphanumeric run: roughly 1 token per 4 chars
      let j = i;
      while (j < text.length && /[A-Za-z0-9]/.test(text[j])) j++;
      total += Math.max(1, Math.ceil((j - i) / 4));
      i = j;
    } else if (c === " " || c === "\t") {
      // Whitespace run: 1 token
      while (i < text.length && (text[i] === " " || text[i] === "\t")) i++;
      total += 1;
    } else if (c === "\n") {
      total += 1; i++;
    } else {
      total += 1; i++; // punctuation: usually 1 token each
    }
  }
  return total;
}
Enter fullscreen mode Exit fullscreen mode

Against real GPT-4o / Claude tokenizers on JSON-shaped text this is accurate to about ±5–10%. The verdict to surface isn't "your prompt will cost exactly N cents" — it's "this format is ~70% cheaper than the other for the same payload." That comparison stays reliable inside the error bar.

When TOON doesn't win

The tool ships a "Mixed types" preset to show the failure mode:

{
  "title": "Mixed type sample",
  "counts": [1, 2, 3, 5, 8, 13, 21],
  "flags": { "ready": true, "locked": false }
}
Enter fullscreen mode Exit fullscreen mode

Savings on this: 2–5%. The reasons are clear once you see the algorithm:

  • Arrays are short → key repetition wasn't the cost
  • Nesting is shallow → no structural noise to compress
  • No object arrays → no table form to use

TOON's leverage is uniform arrays of length ≥ ~5. Enterprise API responses, log streams, extraction results, search hits — all great. App config, package.json, CMS settings — barely worth converting.

Architecture

toon.js     ← Pure JSON → TOON converter (22 tests)
tokens.js   ← Heuristic token estimator (8 tests)
presets.js  ← 5 sample payloads
app.js      ← UI glue
Enter fullscreen mode Exit fullscreen mode

Neither toon.js nor tokens.js touches document or window. 30 unit tests under node --test cover scalars, flat objects, nested objects, primitive arrays (inline + multiline), uniform table arrays (basic, embedded commas, null cells, non-uniform fallback), complex API-shape payloads, and the estimator itself.

Try it

Pick "User list" or "Log lines" to see the big savings. Pick "Mixed types" to see why TOON isn't a universal answer.

Takeaways

  • JSON's token cost is dominated by key repetition in arrays of records. Cut the repetition and most of the bill goes away.
  • TOON wins by declaring columns once. Uniform arrays of objects compress 50–80%. Everything else compresses a bit or not at all.
  • Cell-level quote elision is a small but real additional win (~2 tokens per safe string).
  • Heuristic token counting beats bundling a 1 MB BPE vocabulary if all you need is relative comparison.
  • Test the pure code at the seam. 30 Node tests gave a deterministic floor under the converter before any browser rendering existed.

This is OSS portfolio #247 from SEN LLC (Tokyo), the first entry in our "Try the Tech Radar" series — picking blips from the Thoughtworks Technology Radar and shipping a small demo for each. Next up: Typst (Trial). We ship continuously: https://sen.ltd/portfolio/

Top comments (0)