Thoughtworks Technology Radar Vol 34 (April 2026) put TOON (Token-Oriented Object Notation) in the Assess ring. It's a JSON alternative designed for the moments when "fewer tokens" matters more than "more conventional" — typically LLM context windows. I built a 500-line vanilla JS JSON ⇔ TOON converter with a side-by-side token estimator to see what's actually doing the work. Spoiler: for typical API-response shapes, −70% is normal. Here's the breakdown.
🌐 Demo: https://sen.ltd/portfolio/toon-converter/
📦 GitHub: https://github.com/sen-ltd/toon-converter
What's the problem?
When you feed an API response to an LLM ("parse this and tell me what changed"), the JSON token cost is worse than you'd guess. Ten users:
{
"results": [
{ "id": 101, "name": "Alice Tanaka", "role": "admin", "active": true },
{ "id": 102, "name": "Bob Yamada", "role": "user", "active": true },
// ... 8 more
]
}
The thing chewing tokens isn't the data — it's "id", "name", "role", "active" showing up ten times. Each key repetition costs the BPE tokenizer 4–8 tokens, so 40+ tokens go to column names that you, the LLM, and the reader could all have agreed on once.
What TOON does
TOON formats uniform arrays of objects as a CSV-like table:
results[10]{id,name,role,active}:
101,Alice Tanaka,admin,true
102,Bob Yamada,user,true
...
- A header line declares columns once:
results[10]{id,name,role,active}: - Each row is comma-separated raw values
- Strings get quoted only when they have to (commas inside, special chars)
545 JSON tokens → 159 TOON tokens. −71% on this payload. Scale to 1000 rows and the ratio gets sharper, not worse.
The implementation hinge — isUniformObjectArray
The whole conversion gates on one predicate: can this array be rendered as a table?
function isUniformObjectArray(arr) {
if (arr.length < 1) return false;
if (!arr.every((v) => v !== null && typeof v === "object" && !Array.isArray(v))) {
return false;
}
const cols = Object.keys(arr[0]);
if (cols.length === 0) return false;
for (const row of arr) {
const k = Object.keys(row);
if (k.length !== cols.length) return false;
for (let i = 0; i < cols.length; i++) {
if (k[i] !== cols[i]) return false; // same keys, same order
const v = row[cols[i]];
if (v !== null && typeof v === "object") return false; // scalars only
}
}
return true;
}
Three rules:
- Every element is an object (no nulls, no arrays mixed in).
- Every row has the exact same keys in the exact same order.
- Every cell value is a scalar — you can't fit a nested object into a CSV row.
If any rule fails, fall back to a regular indented block per element. In practice the typical API response sails through.
Table render
function tableArray(key, arr, indent) {
const pad = INDENT.repeat(indent);
const cols = Object.keys(arr[0]);
const head = key
? `${pad}${formatKey(key)}[${arr.length}]{${cols.join(",")}}:`
: `${pad}[${arr.length}]{${cols.join(",")}}:`;
const rowPad = INDENT.repeat(indent + 1);
const rows = arr.map((row) => {
const cells = cols.map((c) => formatCell(row[c]));
return `${rowPad}${cells.join(",")}`;
});
return [head, ...rows].join("\n");
}
{col1,col2,...} is the schema declaration, the [N] count is a hint to the LLM that N rows follow. Every row is just cells.join(",") — the structural noise of {, }, ", : is gone.
Cell-level quote elision
For string cells, drop the quotes if the value looks safe:
function formatCell(v) {
if (v === null) return "";
if (typeof v === "string") {
if (v === "") return '""';
if (/^[A-Za-z0-9_\-./@+ ]+$/.test(v) && !v.includes(",")) return v;
return JSON.stringify(v);
}
return formatScalar(v);
}
"admin" → admin, "Alice Tanaka" → Alice Tanaka. With BPE, dropping the opening and closing " saves 2 tokens per quoted value. Ten rows × two string columns × 2 tokens = 40 tokens. The "hello, world" case stays quoted because the comma would split the row.
null cells become empty (,,). JSON's literal null is 4 chars (~1 token); empty is 0.
Token estimation without bundling a tokenizer
A real BPE counter like gpt-tokenizer needs ~1 MB of vocabulary data. That's the wrong cost profile for a "paste JSON, see the savings" tool. So I went with a heuristic:
export function estimateTokens(text) {
let total = 0;
let i = 0;
while (i < text.length) {
const c = text[i];
if (/[A-Za-z0-9]/.test(c)) {
// Alphanumeric run: roughly 1 token per 4 chars
let j = i;
while (j < text.length && /[A-Za-z0-9]/.test(text[j])) j++;
total += Math.max(1, Math.ceil((j - i) / 4));
i = j;
} else if (c === " " || c === "\t") {
// Whitespace run: 1 token
while (i < text.length && (text[i] === " " || text[i] === "\t")) i++;
total += 1;
} else if (c === "\n") {
total += 1; i++;
} else {
total += 1; i++; // punctuation: usually 1 token each
}
}
return total;
}
Against real GPT-4o / Claude tokenizers on JSON-shaped text this is accurate to about ±5–10%. The verdict to surface isn't "your prompt will cost exactly N cents" — it's "this format is ~70% cheaper than the other for the same payload." That comparison stays reliable inside the error bar.
When TOON doesn't win
The tool ships a "Mixed types" preset to show the failure mode:
{
"title": "Mixed type sample",
"counts": [1, 2, 3, 5, 8, 13, 21],
"flags": { "ready": true, "locked": false }
}
Savings on this: 2–5%. The reasons are clear once you see the algorithm:
- Arrays are short → key repetition wasn't the cost
- Nesting is shallow → no structural noise to compress
- No object arrays → no table form to use
TOON's leverage is uniform arrays of length ≥ ~5. Enterprise API responses, log streams, extraction results, search hits — all great. App config, package.json, CMS settings — barely worth converting.
Architecture
toon.js ← Pure JSON → TOON converter (22 tests)
tokens.js ← Heuristic token estimator (8 tests)
presets.js ← 5 sample payloads
app.js ← UI glue
Neither toon.js nor tokens.js touches document or window. 30 unit tests under node --test cover scalars, flat objects, nested objects, primitive arrays (inline + multiline), uniform table arrays (basic, embedded commas, null cells, non-uniform fallback), complex API-shape payloads, and the estimator itself.
Try it
Pick "User list" or "Log lines" to see the big savings. Pick "Mixed types" to see why TOON isn't a universal answer.
Takeaways
- JSON's token cost is dominated by key repetition in arrays of records. Cut the repetition and most of the bill goes away.
- TOON wins by declaring columns once. Uniform arrays of objects compress 50–80%. Everything else compresses a bit or not at all.
- Cell-level quote elision is a small but real additional win (~2 tokens per safe string).
- Heuristic token counting beats bundling a 1 MB BPE vocabulary if all you need is relative comparison.
- Test the pure code at the seam. 30 Node tests gave a deterministic floor under the converter before any browser rendering existed.
This is OSS portfolio #247 from SEN LLC (Tokyo), the first entry in our "Try the Tech Radar" series — picking blips from the Thoughtworks Technology Radar and shipping a small demo for each. Next up: Typst (Trial). We ship continuously: https://sen.ltd/portfolio/

Top comments (0)