You're not paying for what your LLM thinks. You're paying for what your enterprise API sends.
## The Token Crisis Nobody Talks About
Everyone talks about prompt engineering, context windows, and model selection when optimizing LLM costs. But there's a silent killer hiding in plain sight — the raw JSON payloads from enterprise APIs.
When you connect an AI agent to systems like IBM Maximo, ServiceNow, or SAP, you're not just getting data back. You're getting infrastructure noise dressed up as data.
Here's what a single incident record from a Maximo OSLC API actually looks like:
```json
{
  "owner": "PRIYA.N",
  "status_description": "In Progress",
  "slarecords_collectionref": "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/slarecords",
  "labtrans_collectionref": "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/labtrans",
  "ticketprop_collectionref": "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/ticketprop",
  "relatedrecord_collectionref": "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/relatedrecord",
  "_rowstamp": "26338737",
  "accumulatedholdtime": 0,
  "class_description": "Service Request",
  "class": "SR",
  "changeby": "PRIYA.N",
  "createdby": "MAXUSER1",
  "ownergroup": "PLANTOPS",
  "origfromalert": false,
  ...
}
```
That's one record. Now multiply it by 15–20 records per page. The pattern becomes painfully clear:
- `slarecords_collectionref`, `labtrans_collectionref`, `ticketprop_collectionref` — pagination handles the LLM can never call
- `_rowstamp` — a database concurrency token the LLM has no use for
- `PRIYA.N`, `SR`, `MAXUSER1`, `PLANTOPS` — repeated identically across every single record
- `class_description: "Service Request"` alongside `class: "SR"` — the same thing, twice, on every row
On a real-world test of 15 Maximo incident records: ~5,400 tokens consumed. The LLM asked for incident data. It got database plumbing.
## The Insight: Enterprise APIs Weren't Built for LLMs
Enterprise systems like Maximo were designed for browser UIs and system integrations — not token-efficient AI consumption. Their REST/OSLC APIs are built to be complete and self-describing. That's great for a UI developer. It's expensive for an LLM.
The data your agent actually needs is maybe 40–50% of what gets sent. The rest is:
- Repeated strings — owner names, status codes, class labels duplicated on every record
- Collection refs — internal pagination handles the LLM can't follow
- Internal metadata — rowstamps, localrefs, origfromalert flags
- Verbose field names — `description_longdescription`, `status_description`, `class_description` eating characters
- HTML markup — embedded in long description fields, consuming tokens on angle brackets
Every token of this noise costs money. On high-frequency agentic workflows — think a help desk AI processing hundreds of incident queries per hour — this adds up fast.
## Introducing lean-normalizer
We built lean-normalizer — a pre-processing layer that sits between your enterprise API and your LLM tool call. It encodes the response into LEAN format: a compact, human-readable, fully reversible wire format specifically designed for LLM consumption.
LEAN stands for Lossless Enterprise API Normalization.
The key word is lossless. Nothing is dropped. Every field value is preserved. The LLM can reconstruct any original value without any client-side code.
### What It Does
LEAN encoding applies three compression strategies in two passes:
**1. Dictionary deduplication**

Any string that repeats across records gets stored once in a `### DICT` block and referenced as a `*N` pointer inline. `PRIYA.N` appearing 13 times across 15 records? Stored once. Referenced as `*0` everywhere else.
**2. Schema key shortening**

Long field names like `status_description`, `description_longdescription`, `accumulatedholdtime` get replaced with short base-36 keys (`d`, `i`, `0`). The mapping lives in a `### SCHEMA` block the LLM reads once.
**3. Noise suppression**

`_rowstamp`, `*_collectionref` fields, `localref`, empty strings, HTML markup — stripped entirely via adapter-specific rules. The LLM never asked for them.
Two things are never compressed: `href` values and ISO dates. These are always emitted raw so agents can pass them directly to follow-up tool calls (PATCH, GET) without any decoding step.
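To make the two compression passes concrete, here is a simplified sketch of the idea in TypeScript. This is illustrative only, not the library's actual implementation: it omits noise suppression, the href/date exemption, and child-table handling, and the function name `leanSketch` is mine.

```typescript
type Rec = Record<string, string | number | boolean>;

// Sketch of passes 1 and 2: dictionary deduplication + schema key shortening.
function leanSketch(records: Rec[]): { dict: string[]; schema: string[]; rows: string[] } {
  // Pass 1: count string values to find repeats worth deduplicating.
  const counts = new Map<string, number>();
  for (const r of records)
    for (const v of Object.values(r))
      if (typeof v === "string") counts.set(v, (counts.get(v) ?? 0) + 1);

  const dict: string[] = [];
  const dictIndex = new Map<string, number>();
  for (const [s, n] of counts)
    if (n > 1) { dictIndex.set(s, dict.length); dict.push(s); }

  // Pass 2: assign short base-36 keys to field names, then emit compact rows.
  const schema: string[] = [];
  const keyIndex = new Map<string, string>();
  for (const r of records)
    for (const k of Object.keys(r))
      if (!keyIndex.has(k)) {
        keyIndex.set(k, schema.length.toString(36));
        schema.push(k);
      }

  const rows = records.map((r, i) =>
    `_id:${i} ` +
    Object.entries(r)
      .map(([k, v]) => {
        const short = keyIndex.get(k);
        if (typeof v === "string" && dictIndex.has(v))
          return `${short}:"*${dictIndex.get(v)}"`; // repeated string -> dict pointer
        return `${short}:${JSON.stringify(v)}`;      // unique value stays inline
      })
      .join(" ")
  );
  return { dict, schema, rows };
}
```

The payoff is visible even at this toy scale: a value like an owner name is stored once and referenced by a two-character pointer everywhere else.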
### The Output Format
```
### LEAN FORMAT v1
### DICT
*0=PRIYA.N
*1=SR
*2=Service Request
*3=MAXUSER1
*4=QUEUED
### SCHEMA
0=accumulatedholdtime
1=changeby
c=status
d=status_description
f=ticketid
h=description
i=description_longdescription
### DATA: member
_id:0 0:0 1:"*0" 3:"*1" 4:"*2" 5:"*3" c:"*4" d:"Queued" f:100171 ...
_id:1 0:0 1:"*0" 3:"*1" 4:"*2" 5:"*3" h:"High Priority: Cooling Tower..." c:"*9" ...
```
The LLM reads `### SCHEMA` and `### DICT` once, then processes each `### DATA` row. Nested child objects (like `relatedrecord` arrays) are decomposed into named child tables linked by `_p` parent references — flat, indexed, readable.
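To show that the format is mechanically unambiguous, here is an illustrative decoder for the header blocks, roughly what the model does "mentally". This is not part of the library (agents don't need client-side decoding), and the exact token grammar below is my assumption inferred from the sample output, not a published spec.

```typescript
// Parse a LEAN payload back into plain records: read DICT and SCHEMA once,
// then expand each DATA row.
function decodeLean(text: string): Array<Record<string, string>> {
  const dict: Record<string, string> = {};
  const schema: Record<string, string> = {};
  const rows: Array<Record<string, string>> = [];
  let section = "";
  for (const line of text.split("\n")) {
    if (line.startsWith("### DICT")) { section = "dict"; continue; }
    if (line.startsWith("### SCHEMA")) { section = "schema"; continue; }
    if (line.startsWith("### DATA")) { section = "data"; continue; }
    if (line.startsWith("###") || !line.trim()) continue;
    const eq = line.indexOf("=");
    if (section === "dict") {
      dict[line.slice(0, eq)] = line.slice(eq + 1);     // e.g. "*0=PRIYA.N"
    } else if (section === "schema") {
      schema[line.slice(0, eq)] = line.slice(eq + 1);   // e.g. "d=status_description"
    } else if (section === "data") {
      const rec: Record<string, string> = {};
      // Tokens look like key:value; values may be quoted, *N expands via dict.
      for (const m of line.matchAll(/(\w+):("(?:[^"\\]|\\.)*"|\S+)/g)) {
        const key = schema[m[1]] ?? m[1];               // short key -> field name
        let val = m[2];
        if (val.startsWith('"')) val = JSON.parse(val); // unquote string values
        if (val.startsWith("*") && dict[val]) val = dict[val]; // expand pointer
        rec[key] = val;
      }
      rows.push(rec);
    }
  }
  return rows;
}
```

Two dictionary lookups and a key map: that's the entire decoding burden the format places on the model.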
## Real-World Test Results
We tested against a real IBM Maximo instance, querying active incidents via the MXAPIINCIDENT OSLC Object Structure.
**Dataset:** 13 incident records, full `oslc.select=*` payload including doclinks and related records.
| Mode | Payload Size | Tokens (approx) |
|---|---|---|
| Raw JSON (`useLean=false`) | 18,407 chars | ~4,600 |
| LEAN encoded (`useLean=true`) | 9,921 chars | ~2,480 |
| **Saving** | 8,486 chars | ~2,120 tokens (~46%) |
Compression ratio: 0.539 — consistently in the 46–54% range across multiple runs on the same dataset.
The circuit breaker was not triggered (it fires only when encoding would make the payload larger, which can happen with very small or highly unique datasets).
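The token columns in the table above line up with the common chars-divided-by-four heuristic. The helper below is my own approximation (real tokenizer counts vary by model); it just shows the figures are internally consistent.

```typescript
// Rough token estimate: ~4 characters per token, rounded to the nearest ten.
// This is a heuristic, not a real tokenizer.
const approxTokens = (chars: number): number => Math.round(chars / 4 / 10) * 10;
```

Running it on the measured payload sizes reproduces both table entries: 18,407 chars comes out at ~4,600 tokens and 9,921 chars at ~2,480.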
### Was Any Data Lost?
We cross-validated both responses field by field:
- All 13 records returned in both modes ✅
- All ticket IDs, descriptions, priorities, statuses matched ✅
- Related records and linked work orders preserved ✅
- Attachment/doclink metadata intact ✅
- `actualfinish` dates on resolved tickets present in both ✅
Zero data loss. 46% token reduction.
## LLM Compatibility
A natural question: can modern LLMs actually decode this format reliably?
We tested with Claude (Anthropic) and OpenAI tool calling. Both models handle LEAN decoding correctly — they read ### SCHEMA to resolve short keys, read ### DICT to expand *N pointers, and process ### DATA rows without confusion.
The format is intentionally designed to be self-documenting. There's no magic — it's structured text with a two-block header. Any enterprise-grade LLM that can follow instructions can decode it. The tool description tells the model the format exists upfront:
```typescript
description:
  'Returns open Maximo incidents. When the response contains "### LEAN FORMAT v1", ' +
  'it is LEAN-encoded. Use ### SCHEMA to map short keys back to field names, ' +
  'and ### DICT to expand *N pointer values. ' +
  'href values and ISO dates are always emitted raw.'
```
That's all the LLM needs. No client SDK. No parsing library on the model side.
## Using It in Your MCP Tool
Installation is a single package:
```bash
npm install @soumyaprasadrana/lean-normalizer
```
Drop it into any MCP tool that retrieves enterprise data:
```typescript
import { LeanEncoder, MaximoAdapter } from '@soumyaprasadrana/lean-normalizer';

const encoder = new LeanEncoder({ adapter: new MaximoAdapter() });

server.tool('get_incidents', { ... }, async ({ status }) => {
  const raw = await maximo.getIncidents({ status });
  const result = encoder.encode(raw);
  return {
    content: [{ type: 'text', text: result.encoded }],
    _meta: {
      lean_compressed: result.compressed,
      lean_ratio: result.ratio,
      lean_original_bytes: result.originalSize,
      lean_encoded_bytes: result.encodedSize,
    },
  };
});
```
The _meta block is optional but useful for monitoring compression stats in tool call traces.
The circuit breaker handles edge cases automatically — if encoding a small payload would make it larger, the library returns raw JSON unchanged with `compressed: false`. Your agent code doesn't need to handle this specially.
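Conceptually, the circuit breaker is a simple size comparison. The sketch below is my assumption of the behavior described above, not the library's source:

```typescript
// If the encoded form isn't actually smaller, fall back to the raw payload.
function withCircuitBreaker(rawJson: string, encoded: string) {
  const compressed = encoded.length < rawJson.length;
  return {
    encoded: compressed ? encoded : rawJson, // agent always gets the smaller form
    compressed,
    ratio: encoded.length / rawJson.length,
  };
}
```

This is why very small or highly unique datasets pass through untouched: a dictionary and schema header have fixed overhead, and on a 1-to-3-record payload that overhead can exceed the savings.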
## Built-in Adapters
The library ships adapters for three enterprise systems out of the box:
**IBM Maximo (OSLC / REST API)**

Detects the root array at `payload.member`, derives table names from OSLC Object Structure hrefs, suppresses `_rowstamp` / `*_collectionref` fields, strips `spi:` / `rdf:` namespace prefixes, strips HTML from long descriptions.
**ServiceNow (Table API)**

Detects root at `payload.result`, drops the link half of reference objects (`{ link, value }` → keeps `value`), suppresses `sys_class_name`, `sys_domain`, `sys_domain_path`.
**SAP OData (v2 and v4)**

Detects root at `payload.d.results` or `payload.value`, strips `__metadata` and `__deferred`, converts `/Date(ms)/` timestamps to ISO-8601.
Writing a new adapter is ~30 lines of TypeScript — implement the `LeanAdapter` interface, add a fixture JSON, add a test.
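As a rough illustration of what an adapter does, here is a ServiceNow-flavored sketch. The real `LeanAdapter` interface isn't shown in this post, so the interface shape and method names below (`findRoot`, `shouldSuppress`) are hypothetical, inferred from the behaviors the built-in adapters are described as implementing:

```typescript
// Hypothetical shape of an adapter: locate the record array and decide
// which fields are noise. The real interface may differ.
interface LeanAdapterSketch {
  findRoot(payload: unknown): unknown[] | undefined; // where the records live
  shouldSuppress(field: string): boolean;            // adapter-specific noise rules
}

// ServiceNow-style rules: root at payload.result, drop sys_* domain noise.
const serviceNowSketch: LeanAdapterSketch = {
  findRoot(payload) {
    const p = payload as { result?: unknown[] };
    return Array.isArray(p?.result) ? p.result : undefined;
  },
  shouldSuppress(field) {
    return ["sys_class_name", "sys_domain", "sys_domain_path"].includes(field);
  },
};
```

Everything else (dictionary building, key shortening, row emission) is generic, which is why a new system only costs an adapter this size plus a fixture and a test.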
## The Economics
Let's make this concrete. Suppose your AI agent processes 500 Maximo incident queries per day — a moderate load for a help desk automation.
At ~4,600 tokens per raw query vs ~2,480 tokens with LEAN:
| | Raw JSON | LEAN |
|---|---|---|
| Tokens per query | ~4,600 | ~2,480 |
| Daily tokens (500 queries) | ~2,300,000 | ~1,240,000 |
| Daily saving | — | ~1,060,000 tokens |
At typical enterprise LLM pricing, that's a meaningful cost line — and it compounds directly with query volume. The heavier your agentic workload, the more LEAN pays for itself.
And this is just the input token side. Smaller context also means faster responses — the model processes less before it can start reasoning.
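The daily-saving figure is straightforward to reproduce from the per-query numbers:

```typescript
// Savings arithmetic from the table above: per-query delta times query volume.
const rawTokens = 4600;
const leanTokens = 2480;
const queriesPerDay = 500;
const dailySaving = (rawTokens - leanTokens) * queriesPerDay;
```

Scale `queriesPerDay` to your own workload to estimate the effect; the saving is linear in query volume.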
## When to Use It (and When Not To)
Good fit:
- MCP tools connecting to Maximo, ServiceNow, SAP
- Any enterprise API that returns repetitive, field-heavy records
- Agentic workflows with high query volume
- Situations where you're approaching context window limits
Not the right tool:
- Very small payloads (1–3 records) — the circuit breaker will likely return raw JSON anyway
- Streaming responses — LEAN is a complete-payload format
- Cases where the LLM needs raw JSON for downstream tool arguments (though `href` and dates are always raw)
## What's Next
The library is experimental and actively maintained. A few things on the roadmap:
- Selective field encoding — let adapters specify which fields are agent-critical vs suppressible, giving finer control than the current skip/keep binary
- Streaming-safe chunked mode — for APIs that support server-sent events
- More adapters — Oracle EBS, Salesforce, Dynamics 365 are obvious next targets
- Benchmarks dashboard — a public comparison of compression ratios across different enterprise API response shapes
Contributions are welcome. If you're connecting an AI agent to any enterprise system and hitting token ceilings, this library might be worth a look.
## Links
- 📦 npm: npmjs.com/package/@soumyaprasadrana/lean-normalizer
- 💻 GitHub: github.com/soumyaprasadrana/lean-normalizer
Built while wiring up an IBM Maximo incident planning agent. The token bills were the motivation. The 46% reduction was the result.