Makroumi

Posted on Jun 1

Why JSON is Becoming a Bottleneck for AI Agents

#ai #llm #agents #machinelearning

The AI industry is racing toward larger context windows.

Models now accept hundreds of thousands or even millions of tokens. Agent frameworks coordinate dozens of specialized workers. Memory systems store increasingly large traces. Tool execution histories continue to grow.

Yet almost all of this infrastructure still relies on JSON.

JSON was designed for web applications.

It was not designed for autonomous AI systems.

The Problem

Consider a typical agent workflow.

A planner creates tasks.

An executor calls tools.

A memory layer stores observations.

A retrieval system injects context.

Every step serializes and deserializes structured data.

In many systems, JSON is processed thousands of times during a single workflow.

The format works.

The cost accumulates.

Repeated keys increase payload size.

Token counts grow unnecessarily.

Semantic mistakes pass validation because a JSON parser checks only syntax, not meaning.

The result is more bandwidth, more storage, more context consumption, and more opportunity for agent failures.
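To make the repeated-key cost concrete, here is a minimal sketch using only Python's standard `json` module. The records and field names are hypothetical, and the "fields plus rows" layout is just one simple way to state keys once. It is not the ULMEN format.

```python
import json

# Hypothetical tool-call records; the field names are illustrative only.
records = [
    {"step": i, "tool": "search", "status": "ok", "latency_ms": 100 + i}
    for i in range(1000)
]

# Compact JSON: every record repeats all four keys.
row_json = json.dumps(records, separators=(",", ":"))

# The same data with the keys stated once, followed by rows of values.
fields = ["step", "tool", "status", "latency_ms"]
columnar = {"fields": fields, "rows": [[r[f] for f in fields] for r in records]}
columnar_json = json.dumps(columnar, separators=(",", ":"))

def key_overhead_bytes(recs):
    """Bytes spent on quoted key names plus the colon, summed over all records."""
    return sum(len(json.dumps(key)) + 1 for rec in recs for key in rec)

# The columnar layout is lossless: the original records can be rebuilt from it.
restored = [dict(zip(columnar["fields"], row)) for row in columnar["rows"]]

print("row-oriented bytes:", len(row_json))
print("keys-once bytes:   ", len(columnar_json))
print("bytes spent on keys:", key_overhead_bytes(records))
```

The gap between the two sizes is roughly the key overhead, and it grows linearly with the number of records.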

Existing Formats Solve Different Problems

Protocol Buffers enforce schema contracts.

MessagePack reduces payload size with a binary encoding.

Apache Arrow optimizes in-memory analytics.

Parquet optimizes columnar storage on disk.

None were designed around the constraints of modern agent systems:

Context budgets.

Token efficiency.

Agent communication.

Semantic validation.

Model-generated structured output.

These constraints are becoming increasingly important.

Introducing ULMEN

ULMEN, Ultra Lightweight Minimal Encoding Notation, was built specifically for AI and agent infrastructure.

Instead of treating LLMs as an afterthought, ULMEN treats them as a primary consumer of data.

The format provides four complementary surfaces:

LUMB for highly compact binary transport.

ULMEN Text for human-readable workflows.

ULMEN LLM for token-efficient model interaction.

ULMEN AGENT for semantically validated agent communication.

The goal is not simply to compress data.

The goal is to reduce the cost and complexity of moving structured information through AI systems.

What Makes It Different

ULMEN applies several techniques that traditional formats typically do not combine:

Shared string pools.

Column-aware encoding.

Typed, self-describing headers.

Exact token counting.

Context compression primitives.

Semantic validation for agent traces.

For example, ULMEN AGENT can reject invalid workflows such as:

Tool calls without matching results.

Invalid record types.

Broken step ordering.

Malformed agent traces.

JSON will happily serialize all of these.

ULMEN can detect them before they reach the model.
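The kind of check described above can be sketched in plain Python. This is not ULMEN's implementation or API. The record fields (`type`, `step`, `call_id`), the set of allowed types, and the `validate_trace` helper are all hypothetical, chosen only to illustrate the difference between syntactic and semantic validation.

```python
import json

# Hypothetical record types for an agent trace.
ALLOWED_TYPES = {"plan", "tool_call", "tool_result", "observation"}

def validate_trace(records):
    """Return a list of problems found in a trace (an empty list means valid)."""
    errors = []
    pending = set()   # call ids still waiting for a result
    last_step = None
    for i, rec in enumerate(records):
        rtype = rec.get("type")
        if rtype not in ALLOWED_TYPES:
            errors.append(f"record {i}: invalid record type {rtype!r}")
            continue
        step = rec.get("step")
        # Steps must be integers in strictly increasing order.
        if not isinstance(step, int) or (last_step is not None and step <= last_step):
            errors.append(f"record {i}: step {step!r} is out of order")
        else:
            last_step = step
        if rtype == "tool_call":
            pending.add(rec.get("call_id"))
        elif rtype == "tool_result":
            if rec.get("call_id") in pending:
                pending.discard(rec.get("call_id"))
            else:
                errors.append(f"record {i}: result without a matching tool call")
    for call_id in sorted(pending, key=str):
        errors.append(f"tool call {call_id!r} has no matching result")
    return errors

trace = [
    {"type": "plan", "step": 1},
    {"type": "tool_call", "step": 2, "call_id": "c1"},
    {"type": "observation", "step": 3},
]

# JSON round-trips the trace without complaint.
assert json.loads(json.dumps(trace)) == trace

print(validate_trace(trace))
# → ["tool call 'c1' has no matching result"]
```

The trace is valid JSON, yet the tool call `c1` never receives a result. Only a check that understands the trace's semantics can catch that.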

Benchmark Results

In benchmark workloads consisting of 1,000 mixed records:

ULMEN LLM reduced token usage by approximately 44 percent compared to compact JSON.

ULMEN Binary reduced payload size to roughly 22 percent of the equivalent JSON payload.

The Rust implementation delivered performance competitive with optimized JSON libraries while producing significantly smaller outputs.

More importantly, the savings compound.

In large-scale agent deployments, serialization efficiency directly affects model costs, storage requirements, network traffic, and latency.
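As a rough back-of-the-envelope illustration of how a per-payload saving scales across a workflow, the sketch below applies the 44 percent figure quoted above to a hypothetical workload. The step count and tokens per step are made-up inputs, and the reduction measured on one benchmark may not carry over to other data.

```python
def tokens_after_reduction(json_tokens: int, reduction: float = 0.44) -> int:
    """Tokens remaining after applying a fractional reduction to a JSON baseline."""
    return round(json_tokens * (1 - reduction))

# Hypothetical workload: 200 agent steps, each injecting 5,000 tokens of JSON context.
steps = 200
tokens_per_step = 5_000

baseline = steps * tokens_per_step          # 1,000,000 tokens as compact JSON
reduced = tokens_after_reduction(baseline)  # 560,000 tokens at a 44 percent reduction
saved = baseline - reduced                  # 440,000 tokens saved per workflow run

print(baseline, reduced, saved)
```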

Example

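from ulmen import encode_ulmen_llm

# Hypothetical sample input; the record shape is an assumption, not taken from the ULMEN docs.
records = [{"id": 1, "tool": "search", "status": "ok"}]

payload = encode_ulmen_llm(records)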

The resulting payload contains a typed schema header and compact record representation designed for efficient model consumption.

Why This Matters

The industry has spent years optimizing models.

The next wave of gains may come from optimizing the infrastructure around them.

As agent systems become more complex, serialization is no longer just a storage concern.

It becomes part of the intelligence stack itself.

That is the problem ULMEN was built to solve.

DEV Community