DEV Community

Webmaster Ramos

Posted on • Originally published at webmaster-ramos.com

YAML vs Markdown vs JSON vs TOON: Which Format Is Most Efficient for the Claude API?

My own benchmark across three Claude tiers (Haiku, Sonnet, Opus): 120 data files, 8 real-world scenarios, 5 formats. Tokens, cost, and accuracy – numbers, not opinions.


You Are Overpaying for Prompts

Every time you send data to the Claude API, the format of that data determines how many tokens you spend. The same 200-product catalog in JSON costs 15,879 tokens. In Markdown, it costs 7,814. In TOON, 6,088. That is a 62% difference.

A 120-task list? JSON consumes 8,500 tokens. TOON uses 2,267. Savings: 73%.

The problem is that every existing benchmark focuses on GPT, Gemini, and Llama. There has not been a public benchmark for Claude. I decided to fix that.

I ran 450 API calls on Claude Haiku 4.5, tested Sonnet 4.6 and Opus 4.6, and counted tokens across 120 files using Anthropic's production tokenizer. Eight real-world scenarios, five formats. This article covers the results, the conclusions, and specific recommendations.


Five Formats at a Glance

JSON (JavaScript Object Notation)

  • Year created: 2001; ECMA-404 standard (2013)
  • Author: Douglas Crockford
  • Primary use case: APIs, data exchange between systems, configuration files
  • Key characteristic: strict typing, nesting via {} and [], mandatory quotes

JSON is the lingua franca of programmatic interfaces. Every API speaks JSON, and every language can parse it. But that universality comes at a price in an LLM context: quotes, braces, and commas all consume tokens. They carry syntactic weight, but not semantic meaning.

{"products": [{"id": 1, "name": "Mouse", "price": 29.99, "in_stock": true}]}

YAML (YAML Ain't Markup Language)

  • Year created: 2001; YAML 1.2 standard (2009)
  • Authors: Clark Evans, Ingy döt Net, Oren Ben-Kiki
  • Primary use case: configuration files (Docker Compose, Kubernetes, GitHub Actions)
  • Key characteristic: indentation-based structure, minimal punctuation

YAML is the de facto standard of the DevOps world. It reads like pseudocode and usually does not require quotes. The trade-off is that repeating keys for every array item eats up much of the punctuation savings.

products:
  - id: 1
    name: Mouse
    price: 29.99
    in_stock: true

Markdown

  • Year created: 2004
  • Author: John Gruber (with Aaron Swartz)
  • Primary use case: documentation, READMEs, blogs, wikis
  • Key characteristic: human-first syntax – headings #, tables |, lists -

Markdown is the most “native” format for LLMs. Models have been trained on billions of READMEs and wiki pages. GitHub, Notion, Obsidian – all rely on Markdown. It is a communication format, not a data format.

## Products

| ID | Name  | Price | In Stock |
|----|-------|-------|----------|
| 1  | Mouse | 29.99 | Yes      |

Plain Text

  • Primary use case: human communication – emails, notes, instructions
  • Key characteristic: no syntax, no markup, maximum flexibility

Plain text with no markup. It minimizes token overhead, but it provides no explicit structure for programmatic data extraction.

Products: Mouse (ID 1, $29.99, in stock)

TOON (Token-Oriented Object Notation)

  • Year created: 2025 (v1.0 – November 2025, MIT license)
  • Author: open-source community (GitHub)
  • Primary use case: token optimization in LLM prompts, replacing JSON in AI workflows
  • Key characteristic: a YAML + CSV hybrid (indentation for objects, row-style encoding for arrays)

The newest format in this comparison. TOON was created for one purpose: minimize tokens while preserving lossless JSON round-tripping. For arrays of homogeneous objects, field names are declared once and values are written as CSV-style rows. On GPT-5 Nano, it showed 99.4% accuracy with 46% token savings. Before this benchmark, it had not been tested on Claude.

products[1]{id,name,price,in_stock}:
1,Mouse,29.99,true
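
To make the row-style encoding concrete, here is a minimal Python sketch of a TOON-style encoder for arrays of homogeneous objects. This is an illustration of the idea, not the official TOON SDK; it skips the spec's quoting and escaping rules.

```python
def encode_toon_array(name, items):
    """Encode a list of same-shaped dicts as a TOON-style block:
    field names are declared once, values become CSV-style rows."""
    fields = list(items[0].keys())
    header = f"{name}[{len(items)}]{{{','.join(fields)}}}:"
    rows = [",".join(str(item[f]).lower() if isinstance(item[f], bool)
                     else str(item[f]) for f in fields)
            for item in items]
    return "\n".join([header] + rows)

products = [{"id": 1, "name": "Mouse", "price": 29.99, "in_stock": True}]
print(encode_toon_array("products", products))
# products[1]{id,name,price,in_stock}:
# 1,Mouse,29.99,true
```

The savings come from the header: for 200 products, the field names appear once instead of 200 times.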

Methodology

What I Tested

Eight scenarios, each in three sizes (S / M / L), each in five formats. Total: 120 data files.

| # | Scenario | Data type | S | M | L |
|---|----------|-----------|---|---|---|
| 1 | System prompt / instructions | Rules, sections | 10 rules | 30 rules | 60 rules |
| 2 | Product catalog | Tabular data | 20 products | 100 products | 200 products |
| 3 | Roadmap / tasks | Statuses, dependencies | 15 tasks | 50 tasks | 120 tasks |
| 4 | Business rules | Conditional logic | 8 rules | 25 rules | 50 rules |
| 5 | Few-shot classification | Input–output examples | 5 examples | 15 examples | 40 examples |
| 6 | Organizational hierarchy | 3 levels of nesting | 12 people | 60 people | 150 people |
| 7 | API documentation | Endpoints, parameters | 5 endpoints | 15 endpoints | 30 endpoints |
| 8 | Output format | Requesting data in a given format | 10 countries | 50 countries | 100 countries |

Few-shot (scenario 5) is a prompting technique in which several “input → output” examples are included directly in the prompt so the model can infer the task from a pattern. For example: "Great product!" → positive, "Terrible quality" → negative, then the question "Love it!" → ?. Zero examples is zero-shot, one example is one-shot, several examples is few-shot. The format of those examples directly affects cost: 40 pairs in JSON take 2,131 tokens; in TOON, 996.
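
To see why the serialization of few-shot pairs matters, here is a small sketch comparing the same examples in JSON and in TOON-style rows. Character counts are only a rough proxy for tokens (the benchmark used Anthropic's production tokenizer), but the direction of the effect is the same.

```python
import json

# Three "input -> label" few-shot pairs from the sentiment example above
pairs = [("Great product!", "positive"),
         ("Terrible quality", "negative"),
         ("Love it!", "positive")]

# JSON repeats the keys and quoting for every example
as_json = json.dumps([{"input": text, "label": label} for text, label in pairs])

# TOON-style: field names declared once, one CSV-style row per example
as_toon = "\n".join([f"examples[{len(pairs)}]{{input,label}}:"]
                    + [f"{text},{label}" for text, label in pairs])

print(len(as_json), len(as_toon))  # character counts as a rough size proxy
```

With 40 pairs instead of 3, the repeated-key overhead in JSON grows linearly, which is where the 2,131 vs 996 token gap comes from.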

For scenarios 2, 3, 6, and 7, I prepared questions with precomputed correct answers (ground truth). For scenarios 1, 4, and 5, scoring was manual and rubric-based. For scenario 8, I measured output tokens and format compliance.

Models and Pricing

| Model | Tier | Input ($/1M) | Output ($/1M) |
|---|---|---|---|
| Claude Haiku 4.5 | Fast | $0.80 | $4 |
| Claude Sonnet 4.6 | Mid | $3 | $15 |
| Claude Opus 4.6 | Premium | $15 | $75 |

Accuracy was measured across all three tiers. Sizes S and M were tested for accuracy. L-size was used only for token counts.

Clean-Test Principle

All requests were sent directly via the anthropic Python SDK: plain client.messages.create() with temperature=0. No MCP servers, IDE plugins, or agent frameworks.

Token counting was done with client.messages.count_tokens() – Anthropic’s production tokenizer, i.e. the same numbers used for billing. The tokenizer is the same across all Claude tiers – so the token-count data applies to all Claude models.

Benchmark code: github.com/webmaster-ramos/yaml-vs-md-benchmark


Input-Token Efficiency

These numbers apply to all Claude tiers – Haiku, Sonnet, and Opus all use the same tokenizer. The only cost difference comes from the price per token.

Summary Table: Average Input Tokens Across All Scenarios

| Format | Average tokens | vs JSON |
|---|---|---|
| JSON | 3,252 | baseline |
| YAML | 2,208 | -32% |
| Markdown | 1,514 | -53% |
| Plain Text | 1,391 | -57% |
| TOON | 1,226 | -62% |

TOON saves 62% of input tokens on average versus JSON. Markdown saves 53%. YAML, despite its minimal punctuation, saves only 32% – because of repeated keys and indentation overhead.
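
The "vs JSON" column is simply the token ratio against the JSON baseline; you can reproduce it from the averages in the table:

```python
# Average input tokens across all scenarios (from the summary table above)
AVG_TOKENS = {"JSON": 3252, "YAML": 2208, "Markdown": 1514,
              "Plain Text": 1391, "TOON": 1226}

baseline = AVG_TOKENS["JSON"]
for fmt, tokens in AVG_TOKENS.items():
    saving = round((1 - tokens / baseline) * 100)
    print(f"{fmt}: {tokens} tokens, -{saving}% vs JSON")
```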

Breakdown by Scenario (% Savings vs JSON, L-size)

| Scenario | YAML | MD | TXT | TOON |
|---|---|---|---|---|
| Instructions | -22% | -29% | -24% | -24% |
| Products | -29% | -51% | -53% | -62% |
| Tasks | -35% | -63% | -69% | -73% |
| Business Rules | -28% | -52% | -48% | -63% |
| Few-shot | -31% | -45% | -37% | -53% |
| Hierarchy | -37% | -61% | -67% | -68% |
| API Docs | -35% | -45% | -59% | -53% |

[Charts: YAML, MD, TXT, and TOON savings vs JSON by scenario (%, L-size)]

Detailed Charts by Scenario

[Charts: input tokens by scenario – Instructions, Products, Tasks, Rules, Few-shot, Hierarchy, API Docs]

Key Observations

  1. TOON is the clear leader for tabular data. Product catalogs, task lists, few-shot examples – anything that looks like an array of homogeneous objects. Savings: 62–73% versus JSON.

  2. Markdown is the best all-purpose format. A stable 50–65% reduction across all data types. It is the only format that performs consistently well across tables, instructions, and hierarchies.

  3. YAML is underwhelming. Many people expect YAML to be much more compact than JSON. In practice, the savings are only 14–41%. The reason is repeated keys for every array element.

  4. Plain Text wins on API docs. For technical specifications, plain text is more efficient than TOON (59% vs 53%). Without extra syntax, descriptive text compresses better.

  5. Scale barely affects the percentage savings. The difference between S and L is under 2 percentage points. Format drives efficiency more than data volume does.


Haiku 4.5: When Format Matters

Haiku is the most format-sensitive tier. In 35% of questions, it produced different answers depending on the input format. Accuracy spread reached as high as 36 percentage points between the best and worst format within the same scenario.

Accuracy by Scenario

[Charts: Haiku accuracy by format – Products, Tasks, Hierarchy, API Docs]

| Scenario | JSON | YAML | MD | TXT | TOON | Best |
|---|---|---|---|---|---|---|
| Products | 63.4% | 61.4% | 69.2% | 70.2% | 66.2% | TXT |
| Tasks | 71.0% | 65.7% | 66.7% | 56.7% | 65.3% | JSON |
| Hierarchy | 85.7% | 92.9% | 85.7% | 78.2% | 85.7% | YAML |
| API Docs | 85.7% | 85.7% | 57.1% | 78.6% | 85.7% | JSON/YAML/TOON |

Hierarchy shows the sharpest gap: YAML (92.9%) vs Markdown (57.1%) – a 36-point difference. Tree-like structures are clearly easier for Haiku to parse in an indentation-based format.

API Docs: Markdown performs unexpectedly poorly – 57.1% vs 85.7% for JSON. For technical specifications with parameters and types, explicit structure matters more than compactness.

Accuracy by Size (Haiku)

| Size | Accuracy |
|---|---|
| S (small data) | 80.3% |
| M (medium data) | 67.2% |

Scale matters more than format. Accuracy drops by 13 points when moving from S to M – more than the average difference between formats (5.7 points). The implication is straightforward: reduce data volume first, then optimize format.

Cost: Haiku

| Format | Avg tokens | Cost / request | 100K requests / month |
|---|---|---|---|
| JSON | 3,252 | $0.0026 | $260 |
| YAML | 2,208 | $0.0018 | $177 |
| MD | 1,514 | $0.0012 | $121 |
| TXT | 1,391 | $0.0011 | $111 |
| TOON | 1,226 | $0.0010 | $98 |
| JSON → TOON | – | -62% | $162/month saved |

Output Format: Haiku

[Charts: output tokens at S-size (10 countries) and M-size (50 countries) – Haiku, Sonnet, Opus]

| Requested format | S (10 countries) | M (50 countries) | Savings vs JSON |
|---|---|---|---|
| JSON | 465 | 1,985 | baseline |
| YAML | 296 | 1,352 | -32 to -36% |
| Markdown | 165 | 1,125 | -43 to -65% |
| Plain Text | 294 | 1,381 | -30 to -37% |
| TOON | 342 | 1,369 | -26 to -31% |

Markdown is the cheapest output format on Haiku. 165 vs 465 tokens on S-size – a 65% reduction. At $4 per 1M output tokens, that matters.

Important: TOON loses on output. Haiku does not know the TOON format and, instead of producing compact CSV-like rows, tends to emit verbose plain text that only vaguely resembles TOON. A few-shot example improves TOON output quality, but it still trails Markdown in efficiency.

Output-Format Choice: Technical Requirements

Output cost is not the only thing that matters. Often, Claude’s response must be processed programmatically – parsed, inserted into a database, or passed to another service. The best output format depends on who or what is going to read it.

| Usage scenario | Recommendation | Why |
|---|---|---|
| User-facing answer in UI | Markdown | Renders natively, lowest token cost |
| Backend parsing | JSON | Reliable, universal, guaranteed structure |
| Config / YAML pipeline | YAML | Human-readable + machine-parsable |
| Rows for CSV / spreadsheet | TXT | Minimal overhead, structure via delimiters |
| Compact output for TOON SDK | TOON | Only if using Opus, or with a few-shot example |

Rule of thumb: if a human reads the output, use Markdown. If code reads it, use JSON or YAML. Do not optimize output cost at the expense of parsing reliability in production.
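
For the backend-parsing case, a common defensive pattern is to strip an optional Markdown code fence before parsing, since models sometimes wrap JSON output in one. This is a generic sketch, not part of any SDK:

```python
import json
import re

def parse_model_json(text):
    """Extract JSON from a model response that may wrap it in a ```json fence."""
    match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

print(parse_model_json('```json\n{"ok": true}\n```'))  # {'ok': True}
print(parse_model_json('{"bare": 1}'))                 # {'bare': 1}
```

Combined with a clear "Return only valid JSON" instruction, this covers most parsing failures without retries.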

Recommendations for Haiku

| Data type | Best input | Accuracy | Best output |
|---|---|---|---|
| System prompts | MD | stable | MD |
| Catalogs, lists | TXT | 70.2% | MD |
| Tasks / roadmap | JSON | 71.0% | MD or JSON |
| Hierarchies | YAML | 92.9% | YAML |
| API documentation | JSON or YAML | 85.7% | JSON |
| Few-shot examples | TOON | 65.3% (-0.5% vs JSON) | MD |

On Haiku, format matters – especially for hierarchies and API documentation. Use TOON on input where token savings are worth a small accuracy trade-off, but do not use TOON on output without a few-shot example.


Sonnet 4.6: Format Affects Cost, Not Quality

Sonnet 4.6 produced identical answers across all five formats. In 100% of questions, the result was the same regardless of how the data was represented. For Sonnet, format optimization is pure cost reduction with no quality trade-off.

Accuracy: Format-Invariant

[Chart: accuracy by model and format]

| Format | Sonnet 4.6 |
|---|---|
| JSON | 89.4% |
| YAML | 89.4% |
| Markdown | 89.4% |
| Plain Text | 89.4% |
| TOON | 89.4% |

The answers are completely identical across all formats. Switching from JSON to TOON saves 62% of input tokens while preserving the same output.

Cost: Sonnet

| Format | Avg tokens | Cost / request | 100K requests / month |
|---|---|---|---|
| JSON | 3,252 | $0.0098 | $975 |
| YAML | 2,208 | $0.0066 | $663 |
| MD | 1,514 | $0.0045 | $454 |
| TXT | 1,391 | $0.0042 | $417 |
| TOON | 1,226 | $0.0037 | $368 |
| JSON → TOON | – | -62% | $607/month saved |

At 100K requests per month, switching from JSON to TOON saves $607/month. On Sonnet, output costs $15 per 1M tokens, so output optimization also matters.

Output Format: Sonnet

Output tokens for Sonnet (estimated as characters ÷ 3.5 chars/token):

| Format | S (10 countries) | M (50 countries) |
|---|---|---|
| JSON | ~210 | ~1,120 |
| YAML | ~195 | ~1,023 |
| Markdown | ~143 | ~746 |
| Plain Text | ~103 | ~549 |
| TOON | ~86 | ~414 |

[Charts: output tokens across all three models, S-size and M-size]

On Sonnet, TOON output requires a few-shot example. Without extra context, Sonnet interprets “TOON format” literally – as an abbreviation connected to cartoons – and returns an irrelevant answer. With a format example in the prompt, it generates correct TOON.

Technical requirements for output on Sonnet are the same as on Haiku: if a downstream system parses the response programmatically, use JSON or YAML. If a human is going to read it, use Markdown.

Recommendations for Sonnet

On Sonnet, format choice is a pure cost optimization. The logic is simple:

  • Input data: use TOON (for tables) or MD (for instructions / hierarchies)
  • Human-readable output: Markdown (-65% vs JSON)
  • Machine-parsed output: JSON (most reliable) or YAML (more compact, still parseable)
  • TOON output: add a few-shot example to the prompt; otherwise the answer may be incorrect

Optimal prompt design: MD for instructions + TOON for data + a request for MD/JSON output.
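
A multi-format prompt along these lines can be assembled as plain string composition; the section headers and wording here are illustrative, not a prescribed template:

```python
# Instructions in Markdown: the format Claude reads most natively
instructions_md = """## Task
Answer questions about the product catalog below.
Return the answer as a Markdown table."""

# Data in TOON: field names declared once, CSV-style rows
data_toon = """products[2]{id,name,price,in_stock}:
1,Mouse,29.99,true
2,Keyboard,49.99,false"""

prompt = (f"{instructions_md}\n\n## Data (TOON)\n{data_toon}\n\n"
          "Question: Which products are in stock?")
print(prompt)
```

The one prompt then combines MD's instruction clarity with TOON's data compactness, while the explicit output instruction keeps the response in Markdown.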


Opus 4.6: Maximum Capability, Also Format-Invariant

Opus 4.6 is the strongest model and the most expensive one. Like Sonnet, it is completely insensitive to input format. But Opus has one unique advantage: it knows TOON “out of the box.”

Accuracy: Format-Invariant

| Format | Opus 4.6 |
|---|---|
| JSON | 93.5% |
| YAML | 93.5% |
| Markdown | 93.5% |
| Plain Text | 93.5% |
| TOON | 93.5% |

The answers are 100% identical across all formats. Changing format affects only cost.

Cost: Opus

| Format | Avg tokens | Cost / request | 100K requests / month |
|---|---|---|---|
| JSON | 3,252 | $0.0488 | $4,878 |
| YAML | 2,208 | $0.0331 | $3,312 |
| MD | 1,514 | $0.0227 | $2,271 |
| TXT | 1,391 | $0.0209 | $2,087 |
| TOON | 1,226 | $0.0184 | $1,839 |
| JSON → TOON | – | -62% | $3,039/month saved |

On Opus, switching from JSON to TOON saves over $3,000/month at 100K requests. Output costs $75 per 1M tokens – so format optimization has the largest financial impact here.

Output Format: Opus

Output tokens for Opus (estimated as characters ÷ 3.5 chars/token):

| Format | S (10 countries) | M (50 countries) |
|---|---|---|
| JSON | ~254 | ~1,271 |
| YAML | ~286 | ~1,414 |
| Markdown | ~177 | ~814 |
| Plain Text | ~194 | ~986 |
| TOON | ~106 | ~543 |

[Charts: output tokens across all three models, S-size and M-size]

Opus generates TOON without hints. That is the key difference from Sonnet and Haiku. Opus knows the format and produces valid TOON output on the first try.

Can Claude generate valid TOON output?

[Chart: TOON output generation across models]

| Model | Without example in prompt | With few-shot example |
|---|---|---|
| Opus 4.6 | Valid TOON | Valid TOON |
| Sonnet 4.6 | Cartoon / irrelevant | Valid TOON |
| Haiku 4.5 | Verbose plain text | Closer to TOON, but still inaccurate |

In practical terms, this means: if you need TOON output and want it to work reliably without prompt scaffolding, use Opus.

Technical Requirements for Output: When Parsing Matters More Than Cost

On Opus, output costs $75 per 1M tokens – so output-format savings are highly relevant. But the requirements of the downstream system still take priority:

Scenarios where output must be parsed programmatically:

  • The response goes into a database or structured store – use JSON
  • Another LLM or service consumes the response through an API – use JSON or YAML
  • The response is part of a pipeline (the next step processes the data) – use JSON
  • The response is rendered in the UI as text or a document – use Markdown (lowest token cost)
  • You need compact machine-readable output and already have a TOON SDK – use TOON (only Opus works reliably without prompt help)

The key point: output on Opus costs $75 per 1M – five times more than input. A 65% output reduction (Markdown vs JSON) can matter even more than input savings. But do not trade away parse reliability just to cut cost.

Recommendations for Opus

  • Input: TOON for tabular data (-62%), MD for instructions (-53%)
  • Human-readable output: Markdown (-65% output tokens)
  • Machine-parsed output: JSON – reliable and universal
  • TOON output: works without few-shot – Opus’s unique advantage
  • Do not use JSON on input: it is the most expensive format with no accuracy benefit

Summary Results

Accuracy Across All Models and Formats

| Format | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| JSON | 75.3% | 89.4% | 93.5% |
| YAML | 75.1% | 89.4% | 93.5% |
| Markdown | 69.6% | 89.4% | 93.5% |
| Plain Text | 70.6% | 89.4% | 93.5% |
| TOON | 74.8% | 89.4% | 93.5% |

For Sonnet and Opus, format does not affect accuracy. For Haiku, it matters materially – especially for hierarchies and documentation.

Decision Matrix: Input Format

| Data type | Haiku | Sonnet / Opus |
|---|---|---|
| System prompts / instructions | MD (-29%) | TOON or MD |
| Catalogs, lists | TXT (70.2%) | TOON (-62%) |
| Tasks / roadmap | JSON (71.0%) | TOON (-73%) |
| Business rules | JSON (stable) | TOON (-63%) |
| Few-shot examples | TOON (≈JSON) | TOON (-53%) |
| Hierarchies | YAML (92.9%) | TOON or MD |
| API documentation | JSON/YAML (85.7%) | TXT (-59%) |

Decision Matrix: Output Format

| Output consumer | Recommendation | Haiku | Sonnet | Opus |
|---|---|---|---|---|
| UI / end user | Markdown | native | native | native |
| API / JSON parser | JSON | reliable | reliable | reliable |
| YAML pipeline | YAML | reliable | reliable | reliable |
| TOON SDK | TOON | with few-shot* | with few-shot* | native |
| CSV / spreadsheet | TXT | with template | with template | with template |

*Requires a few-shot example in the prompt


Benchmark Limitations

  • Accuracy was measured only on S+M sizes. L-size includes token counts only. Accuracy may degrade more sharply on larger data.
  • The data is synthetic. Catalogs and tasks were script-generated. Real-world data may be messier (missing fields, Unicode, long descriptions).
  • Automatic scoring covers 4 of 8 cases. Cases 1, 4, and 5 require rubric-based evaluation. The accuracy numbers here cover cases 2, 3, 6, and 7.
  • Sonnet / Opus were tested via subscription (subagents). Output-token counts are estimated, not directly measured. Haiku was tested via API.
  • No A/B test on live traffic. This is a laboratory benchmark. The impact on a production product must be validated separately.

The code and data are open – reproduce it, extend it, challenge it.


What Surprised Me

  1. Opus and Sonnet are completely insensitive to format. I expected a 3–5% gap. I got 0%. For the higher tiers, format is pure cost optimization.

  2. YAML is not as efficient as many assume. The expectation is usually “YAML is more compact than JSON.” In practice, the savings are only 32%. Repeated keys wipe out much of the benefit from removing braces.

  3. TOON works on Claude without special training. Claude may not have seen much TOON in training data, yet all three tiers parse it correctly – essentially on par with JSON.

  4. Opus knows TOON; Sonnet does not. Opus generates valid TOON output without hints. Sonnet interpreted “TOON format” as “cartoon” and produced an irrelevant answer. With a few-shot example, both work correctly.

  5. Markdown is the best output format. The gap in output tokens between JSON and Markdown is 65%. At $75 per 1M on Opus, that is significant. It is also the only format every tier generates natively without extra prompting.

  6. On Haiku, scale matters more than format. Accuracy drops from 80.3% (S) to 67.2% (M) – a 13-point drop. The average difference between formats is 5.7 points. On Sonnet and Opus, scale is much less of an issue.


FAQ

Q: Do these results apply to other models (GPT, Gemini)?

The trends are similar, but the numbers differ. Every model has its own tokenizer. On GPT-5 Nano, YAML shows 62% accuracy on nested data (ImprovingAgents); on Claude Haiku, it reaches 93%. Use these results for Claude, and other benchmarks for other models.

Q: How were tokens counted?

Using client.messages.count_tokens() – the standard Anthropic SDK method and production tokenizer. These are the same numbers used for billing. The tokenizer is the same across all tiers.

Q: Why not test XML?

XML is rarely used in modern LLM workflows. Existing benchmarks (ShShell) suggest that XML is significantly more expensive than Markdown in token terms, with comparable or worse accuracy.

Q: Is TOON a serious format or just hype?

TOON v1.0 was released in November 2025 under MIT, and there are SDKs in 6+ languages. For tabular data, the savings are real – 62% on Claude with JSON-level accuracy. Opus generates TOON output without prompting. Other tiers require a few-shot example.

Q: Does the input format affect the output format?

Partially. If you provide data in YAML, Claude is more likely to structure its answer with indentation. But an explicit instruction such as “Return as a Markdown table” overrides that tendency.

Q: Is it worth converting all prompts away from JSON?

At 100K requests/month on Sonnet, moving from JSON to TOON saves $607/month. On Opus, it saves $3,039/month. For hobby projects with 1K requests, the difference is around $6. Run the math on your own usage.
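
The dollar figures follow directly from the token averages and per-model input prices, so you can plug in your own volume (rounding differs from the article's figures by at most a dollar):

```python
def monthly_savings(tokens_from, tokens_to, price_per_mtok, requests=100_000):
    """Monthly input-cost savings when switching formats at a given volume."""
    saved_tokens = (tokens_from - tokens_to) * requests
    return saved_tokens / 1_000_000 * price_per_mtok

# JSON (3,252 avg input tokens) -> TOON (1,226) at 100K requests/month
print(monthly_savings(3252, 1226, 0.80))  # Haiku:  ~$162
print(monthly_savings(3252, 1226, 3.00))  # Sonnet: ~$608
print(monthly_savings(3252, 1226, 15.0))  # Opus:   ~$3,039
```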

Q: Can you combine formats in one prompt?

Yes – and that is usually the recommended approach. Markdown for instructions + TOON for data + a request for output in the format you need. Claude handles multi-format prompts well.

Q: Where is the benchmark source code?

github.com/webmaster-ramos/yaml-vs-md-benchmark. All 120 data files, 51 questions, ground truth, runner, and scorer are open for reproduction.


Conclusion

Data format in a prompt is not a cosmetic choice. On the Claude API, the gap between JSON and TOON is 62% on input tokens. Markdown saves 65% on output tokens. At 100K requests/month on Opus, that means $3,039 saved on input and even more on output.

But the main finding is not about tokens. Claude Sonnet 4.6 and Opus 4.6 are completely insensitive to format. They produced 100% identical answers on JSON, YAML, Markdown, Plain Text, and TOON. For the higher tiers, format optimization is pure savings with no quality trade-off.

Only Haiku 4.5 is meaningfully format-sensitive – and only there does the choice of format affect accuracy (by up to 36 percentage points). On Haiku, format should be matched to data type: YAML for hierarchies, JSON for tasks with dependencies.

Beyond cost, there are technical requirements: if the output must be parsed programmatically, JSON is more reliable than Markdown. If a human reads the answer, Markdown is cheaper. Opus is the only tier that generates TOON natively; Sonnet and Haiku require a few-shot example.

TL;DR by tier:

| | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| Does format affect accuracy? | Yes, by up to 36 points | No | No |
| Best input (data) | YAML/JSON/TXT by data type | TOON | TOON |
| Best input (instructions) | MD | MD | MD |
| Best output (human-readable) | MD | MD | MD |
| Best output (parsing) | JSON | JSON | JSON |
| TOON output without prompt help | No | No | Yes |
| JSON → TOON savings | $162 / 100K | $607 / 100K | $3,039 / 100K |

Benchmark run in April 2026 on Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5.
120 data files, 8 scenarios, 3 sizes, 5 formats, 3 models.
All code and data: github.com/webmaster-ramos/yaml-vs-md-benchmark
