Every time your Spring AI tool call returns a JSON result to the LLM, roughly half the tokens are syntactic noise — braces, brackets, quotes, and repeated key names. You're paying for punctuation.
I maintain json-io (listed on json.org). When I saw the excellent TOON articles here on Dev.to and noticed Java was underserved, I added full TOON support. Here's what it looks like, why it matters, and how to drop it into your Spring AI stack today.
## The Problem: JSON's Token Tax
When an LLM calls a tool and gets back structured data, that data is serialized — almost always as JSON. The LLM tokenizes it and feeds it into its context window. Consider this tool call result:
```json
{"employees":[{"name":"Alice Johnson","age":28,"department":"Engineering","salary":95000},{"name":"Bob Smith","age":34,"department":"Marketing","salary":78000},{"name":"Charlie Brown","age":22,"department":"Engineering","salary":72000}]}
```
Every key is quoted and repeated on every row, every object gets braces, the array gets brackets. The LLM doesn't need any of that structure to understand the data.
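To make the tax concrete, here's a rough stdlib-only sketch (no json-io involved; the class and method names are just for illustration) that counts how much of that payload is pure punctuation:

```java
public class TokenTax {
    /** Count characters that exist only for JSON's benefit: braces, brackets, quotes, commas, colons. */
    static long structuralChars(String json) {
        return json.chars().filter(c -> "{}[]\",:".indexOf(c) >= 0).count();
    }

    public static void main(String[] args) {
        String json = "{\"employees\":[{\"name\":\"Alice Johnson\",\"age\":28,"
                + "\"department\":\"Engineering\",\"salary\":95000},"
                + "{\"name\":\"Bob Smith\",\"age\":34,"
                + "\"department\":\"Marketing\",\"salary\":78000},"
                + "{\"name\":\"Charlie Brown\",\"age\":22,"
                + "\"department\":\"Engineering\",\"salary\":72000}]}";
        long structural = structuralChars(json);
        System.out.println("total=" + json.length() + " structural=" + structural
                + " (" + (100 * structural / json.length()) + "%)");
    }
}
```

That's before even counting the three repeated copies of each key name, which a tokenizer also has to pay for.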
## What TOON Looks Like
TOON (Token-Oriented Object Notation) is an open format (spec v3.0, Wikipedia) designed specifically for LLM token optimization. The same employee data:
```toon
employees[3]{name,age,department,salary}:
  Alice Johnson,28,Engineering,95000
  Bob Smith,34,Marketing,78000
  Charlie Brown,22,Engineering,72000
```
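The tabular layout is simple enough to sketch by hand. This toy encoder (illustration only, not json-io's implementation) shows how the header states the field names once and each row carries only values:

```java
import java.util.*;
import java.util.stream.Collectors;

public class MiniToon {
    /** Render a uniform list of rows in TOON's tabular layout (sketch, not the real encoder). */
    static String tabular(String name, List<String> fields, List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder();
        // Header: name[count]{field1,field2,...}:
        sb.append(name).append('[').append(rows.size()).append(']')
          .append('{').append(String.join(",", fields)).append("}:\n");
        // Each row: indented, comma-separated values in header order
        for (Map<String, Object> row : rows) {
            sb.append("  ").append(fields.stream()
                    .map(f -> String.valueOf(row.get(f)))
                    .collect(Collectors.joining(","))).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.of("name", "Alice Johnson", "age", 28),
                Map.of("name", "Bob Smith", "age", 34));
        System.out.print(tabular("employees", List.of("name", "age"), rows));
    }
}
```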
What's different:
- No braces or brackets: structure comes from indentation
- No quotes on keys or values (unless a value contains special characters)
- Column headers stated once: `{name,age,department,salary}` replaces repeating every key on every row
- Full fidelity: this round-trips back to the same Java objects
For nested objects, TOON uses indentation:
```toon
name: Alice Johnson
address:
  street: 123 Main St
  city: Denver
  state: CO
```
With key folding enabled, it can flatten one level for even more token savings:
```toon
name: Alice Johnson
address.street: 123 Main St
address.city: Denver
address.state: CO
```
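Key folding is essentially a one-level flatten of nested objects into dotted keys. A minimal sketch of the idea (hypothetical helper, not json-io's actual code):

```java
import java.util.*;

public class KeyFolding {
    /** Fold one level of nesting into dotted keys, mirroring TOON key folding (sketch). */
    static Map<String, Object> foldOneLevel(Map<String, Object> obj) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : obj.entrySet()) {
            if (e.getValue() instanceof Map<?, ?> nested) {
                // Replace the nested object with dotted parent.child keys
                for (Map.Entry<?, ?> n : nested.entrySet()) {
                    out.put(e.getKey() + "." + n.getKey(), n.getValue());
                }
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "Alice Johnson");
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "123 Main St");
        address.put("city", "Denver");
        person.put("address", address);
        foldOneLevel(person).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Fewer indentation levels means fewer whitespace tokens, which is where the extra savings come from.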
## The Numbers: Measured Token Savings
We measured token counts using OpenAI's o200k_base tokenizer (GPT-4o/4.1) across different payload sizes:
| Payload | JSON Tokens | TOON Tokens | Savings |
|---|---|---|---|
| 3 employees, 4 fields | 46 | 35 | 24% |
| 10 employees, 4 fields | 206 | 121 | 41% |
| 20 employees, 4 fields | 413 | 231 | 44% |
| 25 products, 5 fields | 680 | 459 | 33% |
The pattern: savings scale with repetition. A single object saves ~5%. But tool calls rarely return a single object — they return lists of records, search results, database rows. That's where TOON's tabular format delivers 30-50% reduction in token costs.
## Java Usage: Two Lines of Code
```java
// Any Java object -> TOON
String toon = JsonIo.toToon(employee);

// TOON -> typed Java object
Employee emp = JsonIo.fromToon(toon).asClass(Employee.class);

// Works with generics
List<Employee> team = JsonIo.fromToon(toon)
        .asType(new TypeHolder<List<Employee>>() {});

// TOON -> Maps (no class definition needed)
Map map = JsonIo.fromToon(toon).asMap();
```
## Spring AI Integration: One Annotation
If you're using Spring AI for LLM tool calling, you can reduce token costs without changing any application logic. Add the dependency:
```xml
<dependency>
    <groupId>com.cedarsoftware</groupId>
    <artifactId>json-io-spring-ai-toon</artifactId>
    <version>4.98.0</version>
</dependency>
```
Then annotate your tools:
```java
@Tool(description = "Look up employees by department",
      resultConverter = ToonToolCallResultConverter.class)
List<Employee> getEmployees(String department) {
    return employeeRepository.findByDepartment(department);
}
```
That's it. The `ToonToolCallResultConverter` serializes the return value to TOON before it enters the LLM's context. Every tool call result is now 30-50% smaller.
Programmatic registration works too:
```java
FunctionToolCallback.builder("getEmployees", this::getEmployees)
        .toolCallResultConverter(new ToonToolCallResultConverter())
        .build();
```
## Structured Output Parsing
TOON works in both directions. When you want the LLM to respond with structured data, `ToonBeanOutputConverter` teaches it to respond in TOON instead of JSON:
```java
ToonBeanOutputConverter<Person> converter =
        new ToonBeanOutputConverter<>(Person.class);

Person person = chatClient.prompt()
        .user("Get info about John")
        .call()
        .entity(converter);
```
The converter generates format instructions for the LLM and parses TOON responses back into typed Java objects. The LLM produces fewer output tokens too — you save on both sides.
For generic types:
```java
ToonBeanOutputConverter<List<Person>> converter =
        new ToonBeanOutputConverter<>(new TypeHolder<List<Person>>() {});
```
## Configuration
```yaml
spring:
  json-io:
    ai:
      tool-call:
        key-folding: true    # Flatten nested keys (default: true)
      output:
        strict-toon: false   # Permissive parsing for LLM output (default)
```
## Spring Boot REST Integration
json-io also has a Spring Boot starter for REST APIs that support JSON, JSON5, and TOON via content negotiation:
```xml
<dependency>
    <groupId>com.cedarsoftware</groupId>
    <artifactId>json-io-spring-boot-starter</artifactId>
    <version>4.98.0</version>
</dependency>
```
Your existing REST controllers automatically gain TOON support:
```shell
# Request TOON format
curl -H "Accept: application/vnd.toon" http://localhost:8080/api/employees
```
No controller changes needed. Supports both Spring MVC and WebFlux.
## But Can LLMs Actually Read TOON?
Yes. LLMs don't parse JSON structurally — they understand it from training data. TOON's indentation-based format is similar enough to YAML, Python, and other whitespace-significant formats that current LLMs (GPT-4o, Claude, Gemini) handle it without issues. A FreeCodeCamp article covers the accuracy testing — TOON actually achieved slightly higher accuracy (74%) vs JSON (70%) in mixed-structure benchmarks across 4 models.
## Where the Savings Come From
It's worth understanding why the savings vary:
- Single objects: ~5% savings. Keys are stated once in both formats; TOON just drops the quotes and braces.
- Lists of uniform objects: 30-44% savings. This is the sweet spot. JSON repeats every key name on every object. TOON's tabular format states them once as column headers.
- Deeply nested structures: 3-10% savings. Indentation overhead partially offsets bracket/brace savings.
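These three regimes can be sanity-checked with a rough character-count model (chars rather than tokens, so the exact percentages differ from the table above; all names here are illustrative):

```java
public class SavingsScaling {
    /** Approximate JSON size for n copies of the article's 4-field employee row. */
    static int jsonChars(int n) {
        String row = "{\"name\":\"Alice Johnson\",\"age\":28,"
                + "\"department\":\"Engineering\",\"salary\":95000}";
        // wrapper + n rows + (n-1) separating commas
        return "{\"employees\":[]}".length() + n * row.length() + Math.max(0, n - 1);
    }

    /** Approximate TOON size: one header plus n value-only rows. */
    static int toonChars(int n) {
        String header = "employees[" + n + "]{name,age,department,salary}:\n";
        String row = "  Alice Johnson,28,Engineering,95000\n";
        return header.length() + n * row.length();
    }

    public static void main(String[] args) {
        for (int n : new int[]{1, 3, 10, 100}) {
            System.out.printf("n=%d json=%d toon=%d saved=%.0f%%%n",
                    n, jsonChars(n), toonChars(n),
                    100.0 * (jsonChars(n) - toonChars(n)) / jsonChars(n));
        }
    }
}
```

The one-time header cost dominates at n=1 and amortizes to nothing as the row count grows, which is why savings climb with list length.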
Tool call results are overwhelmingly lists of records — exactly where TOON performs best. If you're running LLM agents or RAG pipelines that pass structured data through context windows, TOON reduces both cost-per-token spend and context window consumption.
## Why json-io?
json-io handles things Jackson and Gson can't — cyclic object graphs, automatic polymorphic types, 25+ annotations (including Jackson annotation compatibility with zero compile-time dependency) — all with zero configuration. 60+ built-in Java types, JDK 8-24, ~1MB total footprint. It reads JSON, JSON5, and TOON, and writes all three formats.
## Try It
```xml
<dependency>
    <groupId>com.cedarsoftware</groupId>
    <artifactId>json-io</artifactId>
    <version>4.98.0</version>
</dependency>
```
- json-io on GitHub
- TOON format specification
- Baeldung: TOON Format Libraries in Java
- Spring Integration Guide
- FreeCodeCamp: How TOON Could Change How AI Sees Data
If you're running Spring AI tool calls in production, I'd be curious to hear what token savings you see on your actual payloads. The 40-50% range has been consistent in our benchmarks, but real-world data is always more interesting.
Happy to answer questions in the comments!