My own benchmark across three Claude tiers (Haiku, Sonnet, Opus): 120 data files, 8 real-world scenarios, 5 formats. Tokens, cost, and accuracy – numbers, not opinions.
You Are Overpaying for Prompts
Every time you send data to the Claude API, the format of that data determines how many tokens you spend. The same 200-product catalog in JSON costs 15,879 tokens. In Markdown, it costs 7,814. In TOON, 6,088. That is a 62% difference.
A 120-task list? JSON consumes 8,500 tokens. TOON uses 2,267. Savings: 73%.
The problem is that every existing benchmark focuses on GPT, Gemini, and Llama. There has not been a public benchmark for Claude. I decided to fix that.
I ran 450 API calls on Claude Haiku 4.5, tested Sonnet 4.6 and Opus 4.6, and counted tokens across 120 files using Anthropic’s production tokenizer. Eight real-world scenarios, five formats. This article covers the results, the conclusions, and specific recommendations.
Five Formats at a Glance
JSON (JavaScript Object Notation)
- Year created: 2001; ECMA-404 standard (2013)
- Author: Douglas Crockford
- Primary use case: APIs, data exchange between systems, configuration files
- Key characteristic: strict typing, nesting via `{}` and `[]`, mandatory quotes
JSON is the lingua franca of programmatic interfaces. Every API speaks JSON, and every language can parse it. But that universality comes at a price in an LLM context: quotes, braces, and commas all consume tokens. They carry syntactic weight, but not semantic meaning.
```json
{"products": [{"id": 1, "name": "Mouse", "price": 29.99, "in_stock": true}]}
```
YAML (YAML Ain't Markup Language)
- Year created: 2001; YAML 1.2 standard (2009)
- Authors: Clark Evans, Ingy döt Net, Oren Ben-Kiki
- Primary use case: configuration files (Docker Compose, Kubernetes, GitHub Actions)
- Key characteristic: indentation-based structure, minimal punctuation
YAML is the de facto standard of the DevOps world. It reads like pseudocode and usually does not require quotes. The trade-off is that repeating keys for every array item eats up much of the punctuation savings.
```yaml
products:
  - id: 1
    name: Mouse
    price: 29.99
    in_stock: true
```
Markdown
- Year created: 2004
- Author: John Gruber (with Aaron Swartz)
- Primary use case: documentation, READMEs, blogs, wikis
- Key characteristic: human-first syntax – headings with `#`, tables with `|`, lists with `-`
Markdown is the most “native” format for LLMs. Models have been trained on billions of READMEs and wiki pages. GitHub, Notion, Obsidian – all rely on Markdown. It is a communication format, not a data format.
```markdown
## Products

| ID | Name  | Price | In Stock |
|----|-------|-------|----------|
| 1  | Mouse | 29.99 | Yes      |
```
Plain Text
- Primary use case: human communication – emails, notes, instructions
- Key characteristic: no syntax, no markup, maximum flexibility
Plain text with no markup. It minimizes token overhead, but it provides no explicit structure for programmatic data extraction.
```text
Products: Mouse (ID 1, $29.99, in stock)
```
TOON (Token-Oriented Object Notation)
- Year created: 2025 (v1.0 – November 2025, MIT license)
- Author: open-source community (GitHub)
- Primary use case: token optimization in LLM prompts, replacing JSON in AI workflows
- Key characteristic: a YAML + CSV hybrid (indentation for objects, row-style encoding for arrays)
The newest format in this comparison. TOON was created for one purpose: minimize tokens while preserving lossless JSON round-tripping. For arrays of homogeneous objects, field names are declared once and values are written as CSV-style rows. On GPT-5 Nano, it showed 99.4% accuracy with 46% token savings. Before this benchmark, it had not been tested on Claude.
```text
products[1]{id,name,price,in_stock}:
  1,Mouse,29.99,true
```
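The row encoding above is easy to reproduce for flat, homogeneous arrays. A minimal encoder sketch (my own illustration, not the official TOON SDK; it ignores quoting, escaping, and nested objects, which the real spec handles):

```python
def to_toon(name: str, items: list[dict]) -> str:
    """Encode a flat, homogeneous list of dicts as a TOON array block.

    Field names are declared once in the header; each item becomes one
    CSV-style row, indented under the header.
    """
    fields = list(items[0].keys())  # assumes every item has the same keys
    header = f"{name}[{len(items)}]{{{','.join(fields)}}}:"

    def fmt(value):
        if isinstance(value, bool):  # lowercase to match JSON-style literals
            return "true" if value else "false"
        return str(value)

    rows = ["  " + ",".join(fmt(item[f]) for f in fields) for item in items]
    return "\n".join([header] + rows)

products = [{"id": 1, "name": "Mouse", "price": 29.99, "in_stock": True}]
print(to_toon("products", products))
# products[1]{id,name,price,in_stock}:
#   1,Mouse,29.99,true
```

The one-time header is exactly where the savings come from: with 200 products, the field names appear once instead of 200 times.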
Methodology
What I Tested
Eight scenarios, each in three sizes (S / M / L), each in five formats. Total: 120 data files.
| # | Scenario | Data type | S | M | L |
|---|---|---|---|---|---|
| 1 | System prompt / instructions | Rules, sections | 10 rules | 30 rules | 60 rules |
| 2 | Product catalog | Tabular data | 20 products | 100 products | 200 products |
| 3 | Roadmap / tasks | Statuses, dependencies | 15 tasks | 50 tasks | 120 tasks |
| 4 | Business rules | Conditional logic | 8 rules | 25 rules | 50 rules |
| 5 | Few-shot classification | Input-output examples | 5 examples | 15 examples | 40 examples |
| 6 | Organizational hierarchy | 3 levels of nesting | 12 people | 60 people | 150 people |
| 7 | API documentation | Endpoints, parameters | 5 endpoints | 15 endpoints | 30 endpoints |
| 8 | Output format | Requesting data in a given format | 10 countries | 50 countries | 100 countries |
Few-shot (scenario 5) is a prompting technique in which several “input → output” examples are included directly in the prompt so the model can infer the task from a pattern. For example:
"Great product!" → positive, "Terrible quality" → negative, then the question "Love it!" → ?. Zero examples is zero-shot, one example is one-shot, several examples is few-shot. The format of those examples directly affects cost: 40 pairs in JSON take 2,131 tokens; in TOON, 996.
For scenarios 2, 3, 6, and 7, I prepared questions with precomputed correct answers (ground truth). For scenarios 1, 4, and 5, scoring was manual and rubric-based. For scenario 8, I measured output tokens and format compliance.
Models and Pricing
| Model | Tier | Input ($/1M) | Output ($/1M) |
|---|---|---|---|
| Claude Haiku 4.5 | Fast | $0.80 | $4 |
| Claude Sonnet 4.6 | Mid | $3 | $15 |
| Claude Opus 4.6 | Premium | $15 | $75 |
Accuracy was measured across all three tiers on the S and M sizes; the L size was used only for token counts.
Clean-Test Principle
All requests were sent directly via the anthropic Python SDK: plain client.messages.create() with temperature=0. No MCP servers, IDE plugins, or agent frameworks.
Token counting was done with client.messages.count_tokens() – Anthropic’s production tokenizer, i.e. the same numbers used for billing. The tokenizer is the same across all Claude tiers – so the token-count data applies to all Claude models.
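For reference, the counting call is available directly in the SDK. A sketch (the model ID string is a placeholder – substitute the ID you actually bill against; the characters ÷ 3.5 heuristic used later for the Sonnet/Opus output estimates is included as an offline fallback):

```python
def count_input_tokens(prompt: str, model: str = "claude-haiku-4-5") -> int:
    """Exact input-token count from Anthropic's production tokenizer.

    Needs ANTHROPIC_API_KEY in the environment. The model ID above is a
    placeholder -- use the ID you actually bill against.
    """
    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()
    result = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return result.input_tokens


def rough_token_estimate(text: str) -> int:
    """Offline fallback: the characters / 3.5 heuristic used for the
    Sonnet/Opus output estimates later in this article."""
    return round(len(text) / 3.5)
```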
Benchmark code: github.com/webmaster-ramos/yaml-vs-md-benchmark
Input-Token Efficiency
These numbers apply to all Claude tiers – Haiku, Sonnet, and Opus all use the same tokenizer. The only cost difference comes from the price per token.
Summary Table: Average Input Tokens Across All Scenarios
| Format | Average tokens | vs JSON |
|---|---|---|
| JSON | 3,252 | baseline |
| YAML | 2,208 | -32% |
| Markdown | 1,514 | -53% |
| Plain Text | 1,391 | -57% |
| TOON | 1,226 | -62% |
TOON saves 62% of input tokens on average versus JSON. Markdown saves 53%. YAML, despite its minimal punctuation, saves only 32% – because of repeated keys and indentation overhead.
Breakdown by Scenario (% Savings vs JSON, L-size)
| Scenario | YAML | MD | TXT | TOON |
|---|---|---|---|---|
| Instructions | -22% | -29% | -24% | -24% |
| Products | -29% | -51% | -53% | -62% |
| Tasks | -35% | -63% | -69% | -73% |
| Business Rules | -28% | -52% | -48% | -63% |
| Few-shot | -31% | -45% | -37% | -53% |
| Hierarchy | -37% | -61% | -67% | -68% |
| API Docs | -35% | -45% | -59% | -53% |
(Charts: per-scenario savings vs JSON at L-size for YAML, MD, TXT, and TOON.)
Detailed Charts by Scenario
(Charts: input tokens by scenario – Instructions, Products, Tasks, Rules, Few-shot, Hierarchy, API Docs.)
Key Observations
TOON is the clear leader for tabular data. Product catalogs, task lists, few-shot examples – anything that looks like an array of homogeneous objects. Savings: 62–73% versus JSON.
Markdown is the best all-purpose format. A stable 50–65% reduction across all data types. It is the only format that performs consistently well across tables, instructions, and hierarchies.
YAML is underwhelming. Many people expect YAML to be much more compact than JSON. In practice, the savings are only 14–41%. The reason is repeated keys for every array element.
Plain Text wins on API docs. For technical specifications, plain text is more efficient than TOON (59% vs 53%). Without extra syntax, descriptive text compresses better.
Scale barely affects the percentage savings. The difference between S and L is under 2 percentage points. Format drives efficiency more than data volume does.
Haiku 4.5: When Format Matters
Haiku is the most format-sensitive tier. In 35% of questions, it produced different answers depending on the input format, and the accuracy spread between the best and worst format reached as high as 36 percentage points.
Accuracy by Scenario
(Charts: Haiku accuracy by format for Products, Tasks, Hierarchy, and API Docs.)
| Scenario | JSON | YAML | MD | TXT | TOON | Best |
|---|---|---|---|---|---|---|
| Products | 63.4% | 61.4% | 69.2% | 70.2% | 66.2% | TXT |
| Tasks | 71.0% | 65.7% | 66.7% | 56.7% | 65.3% | JSON |
| Hierarchy | 85.7% | 92.9% | 85.7% | 78.2% | 85.7% | YAML |
| API Docs | 85.7% | 85.7% | 57.1% | 78.6% | 85.7% | JSON/YAML/TOON |
Hierarchy favors indentation: YAML leads at 92.9%, ahead of Plain Text at 78.2%. Tree-like structures are clearly easier for Haiku to parse in an indentation-based format.
API Docs shows the sharpest gap within a single scenario: Markdown scores just 57.1% versus 85.7% for JSON, YAML, and TOON – a 28.6-point spread. For technical specifications with parameters and types, explicit structure matters more than compactness.
Accuracy by Size (Haiku)
| Size | Accuracy |
|---|---|
| S (small data) | 80.3% |
| M (medium data) | 67.2% |
Scale matters more than format. Accuracy drops by 13 points when moving from S to M – more than the average difference between formats (5.7 points). The implication is straightforward: reduce data volume first, then optimize format.
Cost: Haiku
| Format | Avg tokens | Cost / request | 100K requests / month |
|---|---|---|---|
| JSON | 3,252 | $0.0026 | $260 |
| YAML | 2,208 | $0.0018 | $177 |
| MD | 1,514 | $0.0012 | $121 |
| TXT | 1,391 | $0.0011 | $111 |
| TOON | 1,226 | $0.0010 | $98 |
| JSON -> TOON | - | -62% | $162/month |
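The per-request numbers above are simple arithmetic: input tokens × price per million. A sketch with prices hardcoded from the pricing table in this article (verify against current Anthropic pricing before relying on them):

```python
# $ per 1M input tokens, from the pricing table in this article
INPUT_PRICE = {"haiku": 0.80, "sonnet": 3.00, "opus": 15.00}

def input_cost(tokens: int, tier: str, requests: int = 1) -> float:
    """Input cost in dollars for `requests` calls of `tokens` tokens each."""
    return tokens * INPUT_PRICE[tier] / 1_000_000 * requests

# Average JSON prompt (3,252 tokens) vs TOON (1,226) on Haiku,
# at 100K requests per month:
json_monthly = input_cost(3_252, "haiku", 100_000)  # ~$260
toon_monthly = input_cost(1_226, "haiku", 100_000)  # ~$98
print(f"saved: ${json_monthly - toon_monthly:,.0f}/month")  # saved: $162/month
```

Swap the tier key to see the same token savings scale with price: the identical JSON → TOON switch is worth roughly 19× more on Opus than on Haiku.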
Output Format: Haiku
(Charts: output tokens at S-size and M-size for Haiku, Sonnet, and Opus.)
| Requested format | S (10 countries) | M (50 countries) | Savings vs JSON |
|---|---|---|---|
| JSON | 465 | 1,985 | baseline |
| YAML | 296 | 1,352 | -32% to -36% |
| Markdown | 165 | 1,125 | -43% to -65% |
| Plain Text | 294 | 1,381 | -30% to -37% |
| TOON | 342 | 1,369 | -26% to -31% |
Markdown is the cheapest output format on Haiku. 165 vs 465 tokens on S-size – a 65% reduction. At $4 per 1M output tokens, that matters.
Important: TOON loses on output. Haiku does not know the TOON format and, instead of producing compact CSV-like rows, tends to emit verbose plain text that only vaguely resembles TOON. A few-shot example improves TOON output quality, but it still trails Markdown in efficiency.
Output-Format Choice: Technical Requirements
Output cost is not the only thing that matters. Often, Claude’s response must be processed programmatically – parsed, inserted into a database, or passed to another service. The best output format depends on who or what is going to read it.
| Usage scenario | Recommendation | Why |
|---|---|---|
| User-facing answer in UI | Markdown | Renders natively, lowest token cost |
| Backend parsing | JSON | Reliable, universal, guaranteed structure |
| Config / YAML pipeline | YAML | Human-readable + machine-parsable |
| Rows for CSV / spreadsheet | TXT | Minimal overhead, structure via delimiters |
| Compact output for TOON SDK | TOON | Only if using Opus, or with a few-shot example |
Rule of thumb: if a human reads the output, use Markdown. If code reads it, use JSON or YAML. Do not optimize output cost at the expense of parsing reliability in production.
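A note on the machine-parsed case: model output sometimes arrives wrapped in a markdown fence or preceded by a sentence of prose, so production parsers should be defensive. A minimal sketch of the pattern I mean (my own utility, not part of any SDK):

```python
import json
import re

def extract_json(response_text: str):
    """Parse JSON from a model response, tolerating markdown fences
    and surrounding prose. Raises ValueError if nothing parses."""
    # Prefer the contents of a ```json ... ``` fence if one is present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", response_text, re.DOTALL)
    candidate = fenced.group(1) if fenced else response_text
    # Fall back to the first {...} or [...] span
    for text in (candidate, response_text):
        match = re.search(r"[\[{].*[\]}]", text, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                continue
    raise ValueError("no parseable JSON in response")

data = extract_json('Here you go:\n```json\n{"id": 1, "name": "Mouse"}\n```')
```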
Recommendations for Haiku
| Data type | Best input | Accuracy | Best output |
|---|---|---|---|
| System prompts | MD | stable | MD |
| Catalogs, lists | TXT | 70.2% | MD |
| Tasks / roadmap | JSON | 71.0% | MD or JSON |
| Hierarchies | YAML | 92.9% | YAML |
| API documentation | JSON or YAML | 85.7% | JSON |
| Few-shot examples | TOON | 65.3% (-0.5% vs JSON) | MD |
On Haiku, format matters – especially for hierarchies and API documentation. Use TOON on input where token savings are worth a small accuracy trade-off, but do not use TOON on output without a few-shot example.
Sonnet 4.6: Format Affects Cost, Not Quality
Sonnet 4.6 produced identical answers across all five formats. In 100% of questions, the result was the same regardless of how the data was represented. For Sonnet, format optimization is pure cost reduction with no quality trade-off.
Accuracy: Format-Invariant
(Chart: accuracy by model and format.)
| Format | Sonnet 4.6 |
|---|---|
| JSON | 89.4% |
| YAML | 89.4% |
| Markdown | 89.4% |
| Plain Text | 89.4% |
| TOON | 89.4% |
The answers are completely identical across all formats. Switching from JSON to TOON saves 62% of input tokens while preserving the same output.
Cost: Sonnet
| Format | Avg tokens | Cost / request | 100K requests / month |
|---|---|---|---|
| JSON | 3,252 | $0.0098 | $975 |
| YAML | 2,208 | $0.0066 | $663 |
| MD | 1,514 | $0.0045 | $454 |
| TXT | 1,391 | $0.0042 | $417 |
| TOON | 1,226 | $0.0037 | $368 |
| JSON -> TOON | - | -62% | $607/month |
At 100K requests per month, switching from JSON to TOON saves $607/month. On Sonnet, output costs $15 per 1M tokens, so output optimization also matters.
Output Format: Sonnet
Output tokens for Sonnet (estimated as characters ÷ 3.5 chars/token):
| Format | S (10 countries) | M (50 countries) |
|---|---|---|
| JSON | ~210 | ~1,120 |
| YAML | ~195 | ~1,023 |
| Markdown | ~143 | ~746 |
| Plain Text | ~103 | ~549 |
| TOON | ~86 | ~414 |
(Charts: output-token comparison across all three models, S-size and M-size.)
On Sonnet, TOON output requires a few-shot example. Without extra context, Sonnet interprets “TOON format” literally – as an abbreviation connected to cartoons – and returns an irrelevant answer. With a format example in the prompt, it generates correct TOON.
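The scaffold does not need to be elaborate – one format example in the prompt is enough. A sketch (the wording and the example rows are illustrative; adapt them to your task):

```python
TOON_SCAFFOLD = """\
Respond in TOON format (Token-Oriented Object Notation), a compact
tabular encoding. Example of the exact shape expected:

countries[2]{name,capital,population_m}:
  France,Paris,68.2
  Japan,Tokyo,123.9

Output only the TOON block, nothing else.
"""

def with_toon_scaffold(task: str) -> str:
    """Prepend the TOON format example to a task prompt so Sonnet/Haiku
    emit TOON rows instead of guessing what the format means."""
    return f"{TOON_SCAFFOLD}\n{task}"

prompt = with_toon_scaffold("List the 10 most populous countries.")
```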
Technical requirements for output on Sonnet are the same as on Haiku: if a downstream system parses the response programmatically, use JSON or YAML. If a human is going to read it, use Markdown.
Recommendations for Sonnet
On Sonnet, format choice is a pure cost optimization. The logic is simple:
- Input data: use TOON (for tables) or MD (for instructions / hierarchies)
- Human-readable output: Markdown (-65% vs JSON)
- Machine-parsed output: JSON (most reliable) or YAML (more compact, still parseable)
- TOON output: add a few-shot example to the prompt; otherwise the answer may be incorrect
Optimal prompt design: MD for instructions + TOON for data + a request for MD/JSON output.
Opus 4.6: Maximum Capability, Also Format-Invariant
Opus 4.6 is the strongest model and the most expensive one. Like Sonnet, it is completely insensitive to input format. But Opus has one unique advantage: it knows TOON “out of the box.”
Accuracy: Format-Invariant
| Format | Opus 4.6 |
|---|---|
| JSON | 93.5% |
| YAML | 93.5% |
| Markdown | 93.5% |
| Plain Text | 93.5% |
| TOON | 93.5% |
The answers are 100% identical across all formats. Changing format affects only cost.
Cost: Opus
| Format | Avg tokens | Cost / request | 100K requests / month |
|---|---|---|---|
| JSON | 3,252 | $0.0488 | $4,878 |
| YAML | 2,208 | $0.0331 | $3,312 |
| MD | 1,514 | $0.0227 | $2,271 |
| TXT | 1,391 | $0.0209 | $2,087 |
| TOON | 1,226 | $0.0184 | $1,839 |
| JSON -> TOON | - | -62% | $3,039/month |
On Opus, switching from JSON to TOON saves over $3,000/month at 100K requests. Output costs $75 per 1M tokens – so format optimization has the largest financial impact here.
Output Format: Opus
Output tokens for Opus (estimated as characters ÷ 3.5 chars/token):
| Format | S (10 countries) | M (50 countries) |
|---|---|---|
| JSON | ~254 | ~1,271 |
| YAML | ~286 | ~1,414 |
| Markdown | ~177 | ~814 |
| Plain Text | ~194 | ~986 |
| TOON | ~106 | ~543 |
(Charts: output-token comparison across all three models, S-size and M-size.)
Opus generates TOON without hints. That is the key difference from Sonnet and Haiku. Opus knows the format and produces valid TOON output on the first try.
Can Claude generate valid TOON output?
| Model | Without example in prompt | With few-shot example |
|---|---|---|
| Opus 4.6 | Valid TOON | Valid TOON |
| Sonnet 4.6 | Cartoon / irrelevant | Valid TOON |
| Haiku 4.5 | Verbose plain text | Closer to TOON, but still inaccurate |
In practical terms, this means: if you need TOON output and want it to work reliably without prompt scaffolding, use Opus.
Technical Requirements for Output: When Parsing Matters More Than Cost
On Opus, output costs $75 per 1M tokens – so output-format savings are highly relevant. But the requirements of the downstream system still take priority:
Scenarios where output must be parsed programmatically:
- The response goes into a database or structured store – use JSON
- Another LLM or service consumes the response through an API – use JSON or YAML
- The response is part of a pipeline (the next step processes the data) – use JSON
- The response is rendered in the UI as text or a document – use Markdown (lowest token cost)
- You need compact machine-readable output and already have a TOON SDK – use TOON (only Opus works reliably without prompt help)
The key point: output on Opus costs $75 per 1M – five times more than input. A 65% output reduction (Markdown vs JSON) can matter even more than input savings. But do not trade away parse reliability just to cut cost.
Recommendations for Opus
- Input: TOON for tabular data (-62%), MD for instructions (-53%)
- Human-readable output: Markdown (-65% output tokens)
- Machine-parsed output: JSON – reliable and universal
- TOON output: works without few-shot – Opus’s unique advantage
- Do not use JSON on input: it is the most expensive format with no accuracy benefit
Summary Results
Accuracy Across All Models and Formats
| Format | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| JSON | 75.3% | 89.4% | 93.5% |
| YAML | 75.1% | 89.4% | 93.5% |
| Markdown | 69.6% | 89.4% | 93.5% |
| Plain Text | 70.6% | 89.4% | 93.5% |
| TOON | 74.8% | 89.4% | 93.5% |
For Sonnet and Opus, format does not affect accuracy. For Haiku, it matters materially – especially for hierarchies and documentation.
Decision Matrix: Input Format
| Data type | Haiku | Sonnet / Opus |
|---|---|---|
| System prompts / instructions | MD (-29%) | TOON or MD |
| Catalogs, lists | TXT (70.2%) | TOON (-62%) |
| Tasks / roadmap | JSON (71.0%) | TOON (-73%) |
| Business rules | JSON (stable) | TOON (-63%) |
| Few-shot examples | TOON (≈JSON) | TOON (-53%) |
| Hierarchies | YAML (92.9%) | TOON or MD |
| API documentation | JSON/YAML (85.7%) | TXT (-59%) |
Decision Matrix: Output Format
| Output consumer | Recommendation | Haiku | Sonnet | Opus |
|---|---|---|---|---|
| UI / end user | Markdown | native | native | native |
| API / JSON parser | JSON | reliable | reliable | reliable |
| YAML pipeline | YAML | reliable | reliable | reliable |
| TOON SDK | TOON | with few-shot* | with few-shot* | native |
| CSV / spreadsheet | TXT | with template | with template | with template |
*Requires a few-shot example in the prompt
Benchmark Limitations
- Accuracy was measured only on S+M sizes. L-size includes token counts only. Accuracy may degrade more sharply on larger data.
- The data is synthetic. Catalogs and tasks were script-generated. Real-world data may be messier (missing fields, Unicode, long descriptions).
- Automatic scoring covers 4 of 8 cases. Cases 1, 4, and 5 require rubric-based evaluation. The accuracy numbers here cover cases 2, 3, 6, and 7.
- Sonnet / Opus were tested via subscription (subagents). Output-token counts are estimated, not directly measured. Haiku was tested via API.
- No A/B test on live traffic. This is a laboratory benchmark. The impact on a production product must be validated separately.
The code and data are open – reproduce it, extend it, challenge it.
What Surprised Me
Opus and Sonnet are completely insensitive to format. I expected a 3–5% gap. I got 0%. For the higher tiers, format is pure cost optimization.
YAML is not as efficient as many assume. The expectation is usually “YAML is more compact than JSON.” In practice, the savings are only 32%. Repeated keys wipe out much of the benefit from removing braces.
TOON works on Claude without special training. Claude may not have seen much TOON in training data, yet all three tiers parse it correctly – essentially on par with JSON.
Opus knows TOON; Sonnet does not. Opus generates valid TOON output without hints. Sonnet interpreted “TOON format” as “cartoon” and produced an irrelevant answer. With a few-shot example, both work correctly.
Markdown is the best output format. The gap in output tokens between JSON and Markdown is 65%. At $75 per 1M on Opus, that is significant. It is also the only format every tier generates natively without extra prompting.
On Haiku, scale matters more than format. Accuracy drops from 80.3% (S) to 67.2% (M) – a 13-point drop. The average difference between formats is 5.7 points. On Sonnet and Opus, scale is much less of an issue.
FAQ
Q: Do these results apply to other models (GPT, Gemini)?
The trends are similar, but the numbers differ. Every model has its own tokenizer. On GPT-5 Nano, YAML shows 62% accuracy on nested data (ImprovingAgents); on Claude Haiku, it reaches 93%. Use these results for Claude, and other benchmarks for other models.
Q: How were tokens counted?
Using client.messages.count_tokens() – the standard Anthropic SDK method and production tokenizer. These are the same numbers used for billing. The tokenizer is the same across all tiers.
Q: Why not test XML?
XML is rarely used in modern LLM workflows. Existing benchmarks (ShShell) suggest that XML is significantly more expensive than Markdown in token terms, with comparable or worse accuracy.
Q: Is TOON a serious format or just hype?
TOON v1.0 was released in November 2025 under MIT, and there are SDKs in 6+ languages. For tabular data, the savings are real – 62% on Claude with JSON-level accuracy. Opus generates TOON output without prompting. Other tiers require a few-shot example.
Q: Does the input format affect the output format?
Partially. If you provide data in YAML, Claude is more likely to structure its answer with indentation. But an explicit instruction such as “Return as a Markdown table” overrides that tendency.
Q: Is it worth converting all prompts away from JSON?
At 100K requests/month on Sonnet, moving from JSON to TOON saves $607/month. On Opus, it saves $3,039/month. For hobby projects with 1K requests, the difference is around $6. Run the math on your own usage.
Q: Can you combine formats in one prompt?
Yes – and that is usually the recommended approach. Markdown for instructions + TOON for data + a request for output in the format you need. Claude handles multi-format prompts well.
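As a concrete sketch, that hybrid prompt can be assembled like this (the structure is illustrative, not taken from the benchmark repo):

```python
def build_prompt(instructions_md: str, data_toon: str, output_format: str) -> str:
    """Combine Markdown instructions, TOON-encoded data, and an
    explicit output-format request into one prompt."""
    return (
        f"{instructions_md}\n\n"
        f"## Data (TOON format)\n{data_toon}\n\n"
        f"Return your answer as {output_format}."
    )

prompt = build_prompt(
    instructions_md="## Task\nFind products under $50 that are in stock.",
    data_toon=(
        "products[2]{id,name,price,in_stock}:\n"
        "  1,Mouse,29.99,true\n"
        "  2,Monitor,199.99,false"
    ),
    output_format="a Markdown table",
)
```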
Q: Where is the benchmark source code?
github.com/webmaster-ramos/yaml-vs-md-benchmark. All 120 data files, 51 questions, ground truth, runner, and scorer are open for reproduction.
Conclusion
Data format in a prompt is not a cosmetic choice. On the Claude API, the gap between JSON and TOON is 62% on input tokens. Markdown saves 65% on output tokens. At 100K requests/month on Opus, that means $3,039 saved on input and even more on output.
But the main finding is not about tokens. Claude Sonnet 4.6 and Opus 4.6 are completely insensitive to format. They produced 100% identical answers on JSON, YAML, Markdown, Plain Text, and TOON. For the higher tiers, format optimization is pure savings with no quality trade-off.
Only Haiku 4.5 is meaningfully format-sensitive – and only there does the choice of format affect accuracy (by up to 36 percentage points). On Haiku, format should be matched to data type: YAML for hierarchies, JSON for tasks with dependencies.
Beyond cost, there are technical requirements: if the output must be parsed programmatically, JSON is more reliable than Markdown. If a human reads the answer, Markdown is cheaper. Opus is the only tier that generates TOON natively; Sonnet and Haiku require a few-shot example.
TL;DR by tier:
|  | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| Does format affect accuracy? | Yes, by up to 36 points | No | No |
| Best input (data) | YAML/JSON/TXT by data type | TOON | TOON |
| Best input (instructions) | MD | MD | MD |
| Best output (human-readable) | MD | MD | MD |
| Best output (parsing) | JSON | JSON | JSON |
| TOON output without prompt help | No | No | Yes |
| JSON -> TOON savings | $162 / 100K | $607 / 100K | $3,039 / 100K |
Benchmark run in April 2026 on Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5.
120 data files, 8 scenarios, 3 sizes, 5 formats, 3 models.
All code and data: github.com/webmaster-ramos/yaml-vs-md-benchmark