Soumya Prasad
Stop Bleeding Tokens: How We Cut Enterprise API Costs by 46% with a Lossless JSON Encoder

You're not paying for what your LLM thinks. You're paying for what your enterprise API sends.


The Token Crisis Nobody Talks About

Everyone talks about prompt engineering, context windows, and model selection when optimizing LLM costs. But there's a silent killer hiding in plain sight — the raw JSON payloads from enterprise APIs.

When you connect an AI agent to systems like IBM Maximo, ServiceNow, or SAP, you're not just getting data back. You're getting infrastructure noise dressed up as data.

Here's what a single incident record from a Maximo OSLC API actually looks like:

{
  "owner": "PRIYA.N",
  "status_description": "In Progress",
  "slarecords_collectionref": "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/slarecords",
  "labtrans_collectionref":   "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/labtrans",
  "ticketprop_collectionref": "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/ticketprop",
  "relatedrecord_collectionref": "api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/relatedrecord",
  "_rowstamp": "26338737",
  "accumulatedholdtime": 0,
  "class_description": "Service Request",
  "class": "SR",
  "changeby": "PRIYA.N",
  "createdby": "MAXUSER1",
  "ownergroup": "PLANTOPS",
  "origfromalert": false,
  ...
}

That's one record. Now multiply it by 15–20 records per page. The pattern becomes painfully clear:

  • slarecords_collectionref, labtrans_collectionref, ticketprop_collectionref — pagination handles the LLM can never follow
  • _rowstamp — a database concurrency token the LLM has no use for
  • PRIYA.N, SR, MAXUSER1, PLANTOPS — repeated identically across every single record
  • class_description: "Service Request" alongside class: "SR" — the same thing, twice, on every row

On a real-world test of 15 Maximo incident records: ~5,400 tokens consumed. The LLM asked for incident data. It got database plumbing.


The Insight: Enterprise APIs Weren't Built for LLMs

Enterprise systems like Maximo were designed for browser UIs and system integrations — not token-efficient AI consumption. Their REST/OSLC APIs are built to be complete and self-describing. That's great for a UI developer. It's expensive for an LLM.

The data your agent actually needs is maybe 40–50% of what gets sent. The rest is:

  • Repeated strings — owner names, status codes, class labels duplicated on every record
  • Collection refs — internal pagination handles the LLM can't follow
  • Internal metadata — rowstamps, localrefs, origfromalert flags
  • Verbose field names — description_longdescription, status_description, class_description eating characters on every record
  • HTML markup — embedded in long description fields, consuming tokens on angle brackets
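To make the noise-suppression idea concrete, here is a minimal sketch (illustrative only, not the library's internals) of what stripping these categories from a single record looks like:

```typescript
// Drop internal metadata, empty strings, and HTML markup from a record
// before it ever reaches the LLM. Field patterns follow the Maximo
// examples above; real adapters carry system-specific rule sets.
const NOISE_KEYS = /^(_rowstamp|localref)$|_collectionref$/;

function stripNoise(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    if (NOISE_KEYS.test(key)) continue;           // internal metadata
    if (value === "") continue;                   // empty strings carry no signal
    if (typeof value === "string") {
      out[key] = value.replace(/<[^>]+>/g, "");   // strip embedded HTML markup
    } else {
      out[key] = value;
    }
  }
  return out;
}
```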

Every token of this noise costs money. On high-frequency agentic workflows — think a help desk AI processing hundreds of incident queries per hour — this adds up fast.


Introducing lean-normalizer

We built lean-normalizer — a pre-processing layer that sits between your enterprise API and your LLM tool call. It encodes the response into LEAN format: a compact, human-readable, fully reversible wire format specifically designed for LLM consumption.

LEAN stands for Lossless Enterprise API Normalization.

The key word is lossless. Nothing is dropped. Every field value is preserved. The LLM can reconstruct any original value without any client-side code.

What It Does

LEAN encoding applies three compression strategies in two passes:

1. Dictionary deduplication
Any string that repeats across records gets stored once in a ### DICT block and referenced as a *N pointer inline. PRIYA.N appearing 13 times across 15 records? Stored once. Referenced as *0 everywhere else.

2. Schema key shortening
Long field names like status_description, description_longdescription, accumulatedholdtime get replaced with short base-36 keys (d, i, 0). The mapping lives in a ### SCHEMA block the LLM reads once.

3. Noise suppression
_rowstamp, *_collectionref fields, localref, empty strings, HTML markup — stripped entirely via adapter-specific rules. The LLM never asked for them.

Two things are never compressed: href values and ISO dates. These are always emitted raw so agents can pass them directly to follow-up tool calls (PATCH, GET) without any decoding step.
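The dictionary pass can be sketched in a few lines. This is an assumption-laden simplification (string-valued fields only, no key shortening), but it shows the core mechanic — count repeats, assign pointers, leave hrefs and ISO dates raw:

```typescript
// Sketch of the dictionary pass: strings repeated across records become
// *N pointers; href values and ISO dates are always left untouched.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}/;

function buildDict(records: Array<Record<string, string>>): Map<string, string> {
  const counts = new Map<string, number>();
  for (const rec of records) {
    for (const [key, value] of Object.entries(rec)) {
      if (key === "href" || ISO_DATE.test(value)) continue; // always emitted raw
      counts.set(value, (counts.get(value) ?? 0) + 1);
    }
  }
  const dict = new Map<string, string>();
  let n = 0;
  for (const [value, count] of counts) {
    if (count > 1) dict.set(value, `*${n++}`); // only repeated strings qualify
  }
  return dict;
}
```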

The Output Format

### LEAN FORMAT v1

### DICT
*0=PRIYA.N
*1=SR
*2=Service Request
*3=MAXUSER1
*4=QUEUED

### SCHEMA
0=accumulatedholdtime
1=changeby
c=status
d=status_description
f=ticketid
h=description
i=description_longdescription

### DATA: member
_id:0 0:0 1:"*0" 3:"*1" 4:"*2" 5:"*3" c:"*4" d:"Queued" f:100171 ...
_id:1 0:0 1:"*0" 3:"*1" 4:"*2" 5:"*3" h:"High Priority: Cooling Tower..." c:"*9" ...

The LLM reads ### SCHEMA and ### DICT once, then processes each ### DATA row. Nested child objects (like relatedrecord arrays) are decomposed into named child tables linked by _p parent references — flat, indexed, readable.
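To underline that this is plain structured text, here is a toy decoder (illustrative, not shipped code — it handles only simple scalar values and ignores child tables and _p references):

```typescript
// Parse a `key=value` header block (### DICT or ### SCHEMA) into a map.
function parseBlock(text: string, header: string): Map<string, string> {
  const body = text.split(`### ${header}`)[1]?.split("###")[0] ?? "";
  const map = new Map<string, string>();
  for (const line of body.trim().split("\n")) {
    const eq = line.indexOf("=");
    if (eq > 0) map.set(line.slice(0, eq), line.slice(eq + 1));
  }
  return map;
}

// Expand one ### DATA row: resolve short keys via SCHEMA, *N pointers via DICT.
function expandRow(
  row: string,
  dict: Map<string, string>,
  schema: Map<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const m of row.matchAll(/(\w+):(?:"([^"]*)"|(\S+))/g)) {
    const raw = m[2] ?? m[3];
    out[schema.get(m[1]) ?? m[1]] = dict.get(raw) ?? raw;
  }
  return out;
}
```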


Real-World Test Results

We tested against a real IBM Maximo instance, querying active incidents via the MXAPIINCIDENT OSLC Object Structure.

Dataset: 13 incident records, full oslc.select=* payload including doclinks and related records.

Mode                         Payload Size   Tokens (approx)
Raw JSON (useLean=false)     18,407 chars   ~4,600
LEAN encoded (useLean=true)  9,921 chars    ~2,480
Saving                       8,486 chars    ~2,120 tokens (~46%)

Compression ratio: 0.539 — consistently in the 46–54% range across multiple runs on the same dataset.
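The token estimates above are consistent with the common ~4 characters-per-token heuristic (an approximation — real tokenizers vary by model):

```typescript
// Back-of-envelope check of the table's figures using chars / 4.
const rawChars = 18_407;
const leanChars = 9_921;
const ratio = +(leanChars / rawChars).toFixed(3); // 0.539
const rawTokens = Math.round(rawChars / 4);       // ~4,600
const leanTokens = Math.round(leanChars / 4);     // ~2,480
```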

The circuit breaker was not triggered (it fires only when encoding would make the payload larger, which can happen with very small or highly unique datasets).

Was Any Data Lost?

We cross-validated both responses field by field:

  • All 13 records returned in both modes ✅
  • All ticket IDs, descriptions, priorities, statuses matched ✅
  • Related records and linked work orders preserved ✅
  • Attachment/doclink metadata intact ✅
  • actualfinish dates on resolved tickets present in both ✅

Zero data loss. 46% token reduction.


LLM Compatibility

A natural question: can modern LLMs actually decode this format reliably?

We tested with Claude (Anthropic) and OpenAI tool calling. Both models handle LEAN decoding correctly — they read ### SCHEMA to resolve short keys, read ### DICT to expand *N pointers, and process ### DATA rows without confusion.

The format is intentionally designed to be self-documenting. There's no magic — it's structured text with a two-block header. Any enterprise-grade LLM that can follow instructions can decode it. The tool description tells the model the format exists upfront:

description:
  'Returns open Maximo incidents. When the response contains "### LEAN FORMAT v1", ' +
  'it is LEAN-encoded. Use ### SCHEMA to map short keys back to field names, ' +
  'and ### DICT to expand *N pointer values. ' +
  'href values and ISO dates are always emitted raw.'

That's all the LLM needs. No client SDK. No parsing library on the model side.


Using It in Your MCP Tool

Installation is a single package:

npm install @soumyaprasadrana/lean-normalizer

Drop it into any MCP tool that retrieves enterprise data:

import { LeanEncoder, MaximoAdapter } from '@soumyaprasadrana/lean-normalizer';

const encoder = new LeanEncoder({ adapter: new MaximoAdapter() });

server.tool('get_incidents', { ... }, async ({ status }) => {
  const raw    = await maximo.getIncidents({ status });
  const result = encoder.encode(raw);

  return {
    content: [{ type: 'text', text: result.encoded }],
    _meta: {
      lean_compressed:     result.compressed,
      lean_ratio:          result.ratio,
      lean_original_bytes: result.originalSize,
      lean_encoded_bytes:  result.encodedSize,
    },
  };
});

The _meta block is optional but useful for monitoring compression stats in tool call traces.

The circuit breaker handles edge cases automatically — if encoding a small payload would make it larger, the library returns raw JSON unchanged with compressed: false. Your agent code doesn't need to handle this specially.
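Conceptually the circuit breaker is just a size comparison. A sketch (not the library's actual implementation — field names mirror the result object shown above):

```typescript
// If the encoded form isn't smaller, return the raw JSON unchanged
// and flag compressed: false so monitoring can see the fallback.
function encodeWithBreaker(raw: object, encode: (o: object) => string) {
  const original = JSON.stringify(raw);
  const candidate = encode(raw);
  return candidate.length < original.length
    ? { encoded: candidate, compressed: true, ratio: candidate.length / original.length }
    : { encoded: original, compressed: false, ratio: 1 };
}
```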


Built-in Adapters

The library ships adapters for three enterprise systems out of the box:

IBM Maximo (OSLC / REST API)
Detects the root array at payload.member, derives table names from OSLC Object Structure hrefs, suppresses _rowstamp / *_collectionref fields, strips spi: / rdf: namespace prefixes, strips HTML from long descriptions.

ServiceNow (Table API)
Detects root at payload.result, drops the link half of reference objects ({ link, value } → keeps value), suppresses sys_class_name, sys_domain, sys_domain_path.

SAP OData (v2 and v4)
Detects root at payload.d.results or payload.value, strips __metadata and __deferred, converts /Date(ms)/ timestamps to ISO-8601.

Writing a new adapter is ~30 lines of TypeScript — implement the LeanAdapter interface, add a fixture JSON, add a test.
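The shape of such an adapter might look like this. Note this is a standalone sketch — the real LeanAdapter interface lives in the package and may differ; the interface, method names, and the Jira-like example here are all illustrative assumptions:

```typescript
// Guessed-at minimal adapter surface: find the record array, decide
// which fields are noise. Real adapters also name tables and child refs.
interface SketchAdapter {
  detectRoot(payload: any): unknown[] | null; // locate the record array
  isNoiseField(key: string): boolean;         // fields to suppress entirely
}

// Hypothetical adapter for a Jira-like API (field names are examples only).
const jiraLikeAdapter: SketchAdapter = {
  detectRoot: (payload) => (Array.isArray(payload?.issues) ? payload.issues : null),
  isNoiseField: (key) => key === "expand" || key.startsWith("_"),
};
```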


The Economics

Let's make this concrete. Suppose your AI agent processes 500 Maximo incident queries per day — a moderate load for a help desk automation.

At ~4,600 tokens per raw query vs ~2,480 tokens with LEAN:

                             Raw JSON      LEAN
Tokens per query             ~4,600        ~2,480
Daily tokens (500 queries)   ~2,300,000    ~1,240,000
Daily saving (tokens)        ~1,060,000
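The daily-saving row is straightforward arithmetic — per-query saving times query volume:

```typescript
// Sanity check of the table above.
const perQuerySaving = 4_600 - 2_480;     // ~2,120 tokens saved per query
const dailySaving = perQuerySaving * 500; // 500 queries/day → ~1,060,000 tokens
```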

At typical enterprise LLM pricing, that's a meaningful cost line — and it compounds directly with query volume. The heavier your agentic workload, the more LEAN pays for itself.

And this is just the input token side. Smaller context also means faster responses — the model processes less before it can start reasoning.


When to Use It (and When Not To)

Good fit:

  • MCP tools connecting to Maximo, ServiceNow, SAP
  • Any enterprise API that returns repetitive, field-heavy records
  • Agentic workflows with high query volume
  • Situations where you're approaching context window limits

Not the right tool:

  • Very small payloads (1–3 records) — the circuit breaker will likely return raw JSON anyway
  • Streaming responses — LEAN is a complete-payload format
  • Cases where the LLM needs raw JSON for downstream tool arguments (though href and dates are always raw)

What's Next

The library is experimental and actively maintained. A few things on the roadmap:

  • Selective field encoding — let adapters specify which fields are agent-critical vs suppressible, giving finer control than the current skip/keep binary
  • Streaming-safe chunked mode — for APIs that support server-sent events
  • More adapters — Oracle EBS, Salesforce, Dynamics 365 are obvious next targets
  • Benchmarks dashboard — a public comparison of compression ratios across different enterprise API response shapes

Contributions are welcome. If you're connecting an AI agent to any enterprise system and hitting token ceilings, this library might be worth a look.



Built while wiring up an IBM Maximo incident planning agent. The token bills were the motivation. The 46% reduction was the result.

