Martina Zrnec

Posted on May 18

We benchmarked an 84% token reduction. Then we open sourced the protocol.

#ai #agents #productivity

Why agents are reading your HTML wrong, and what we did about it.

I was watching an agent answer a simple question.

The question was small. Three sentences would have covered it. The agent loaded the page, parsed the HTML, walked through nav bars, footer links, cookie banners, a sticky "subscribe to our newsletter" modal, three paragraphs of preamble, and finally found the part it needed.

Twenty thousand tokens.

For three sentences.

This is happening everywhere right now. Quietly. Constantly. Every agent, every query, every page. We've handed agents a web that was built for human eyeballs and asked them to make it work.

It does. Expensively.

The shape is wrong

The web was built for browsers. Humans scroll, scan, skip the boilerplate. Our eyes know what nav bars look like.

Agents don't get that for free.

They read the whole thing, scaffolding and all. There's no "give me the relevant part" channel. Every header, every analytics script, every footer link in twelve languages. The cost gets paid in tokens, latency, and the slightly absurd reality that an agent might burn more compute parsing your nav menu than thinking about your content.

This is a shape problem. And no amount of optimization fixes a shape mismatch.

ACP: a shape, not a framework

ACP (Atomic Content Protocol) is an open spec for structured content envelopes.

Not a framework. Not a platform. A shape.

You pre-compute a compact, enriched representation of your content. You persist it. You serve it first. The envelope sits in front of the body; it doesn't replace it. The body is still there if anyone needs it. Most of the time, agents don't.

Built on top of MCP. Open spec, MIT licensed, npm package shipped. Designed to complement protocols already in motion, not replace them.

Here's roughly what an envelope looks like:

{
  "id": "atom_7f3a...",
  "summary": "AI is the capability of computational systems to perform tasks associated with human intelligence...",
  "classification": "reference",
  "language": "en",
  "tags": ["artificial-intelligence", "machine-learning", "deep-learning", "neural-networks"],
  "key_entities": ["OpenAI", "Google DeepMind", "Transformer Architecture", "AGI", "NLP"],
  "confidence": 0.85,
  "provenance": {
    "tool": "acp-enricher",
    "version": "0.4.2",
    "generated_at": "2026-05-14T09:12:33Z"
  },
  "agent_discoverable": true,
  "body_ref": "https://en.wikipedia.org/wiki/Artificial_intelligence"
}

Content gets broken into atoms: discrete units with stable IDs. An agent that needs one specific atom asks for that atom. Not the page. Not the full body. The atom.

That's the whole idea. The simplicity is doing a lot of the work.
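To make the shape concrete, here is a minimal sketch of the envelope as a TypeScript type, plus a lookup by atom ID. The field names mirror the example envelope above. Everything else (the in-memory store, `putEnvelope`, `getAtom`, the `atom_example` ID, and the idea that non-discoverable envelopes are hidden from agents) is an assumption for illustration, not the ACP spec or the npm package's API.

```typescript
// Field names follow the example envelope; the types are inferred from it.
interface Provenance {
  tool: string;
  version: string;
  generated_at: string; // ISO 8601 timestamp
}

interface Envelope {
  id: string;
  summary: string;
  classification: string;
  language: string;
  tags: string[];
  key_entities: string[];
  confidence: number;
  provenance: Provenance;
  agent_discoverable: boolean;
  body_ref: string;
}

// Hypothetical in-memory stand-in for wherever envelopes are persisted.
const envelopes = new Map<string, Envelope>();

function putEnvelope(envelope: Envelope): void {
  envelopes.set(envelope.id, envelope);
}

// An agent asks for one atom by its stable ID and gets only the envelope.
// Assumption: envelopes with agent_discoverable = false are not returned.
function getAtom(id: string): Envelope | undefined {
  const envelope = envelopes.get(id);
  return envelope && envelope.agent_discoverable ? envelope : undefined;
}

// Illustrative data only; the ID and URL are made up.
putEnvelope({
  id: "atom_example",
  summary: "A short summary of the underlying content.",
  classification: "reference",
  language: "en",
  tags: ["example"],
  key_entities: ["Example Entity"],
  confidence: 0.85,
  provenance: {
    tool: "acp-enricher",
    version: "0.4.2",
    generated_at: "2026-05-14T09:12:33Z",
  },
  agent_discoverable: true,
  body_ref: "https://example.com/articles/example",
});
```

Calling `getAtom("atom_example")` returns the envelope without touching the body. The agent follows `body_ref` only if the summary isn't enough.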

The pipeline

The enrichment runs async. We're not blocking the write path or paying a real-time tax. The flow:

Content changes → trigger flips a dirty flag on the row
Queue worker picks it up out-of-band
Enricher generates the envelope (summary, tags, entities, classification)
Envelope is persisted to the database
Agent requests come in → envelope served from cache, body fetched only if asked

By the time an agent shows up, the envelope is waiting. No on-demand computation, no LLM call in the request path. Just a read.
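The flow above can be sketched in a few lines. This is a toy, in-memory version under stated assumptions: the `Map`s stand in for database tables, the array stands in for a real queue, and `enrich` is a placeholder that takes the first sentence instead of generating a real summary, tags, entities, and classification. All names here are hypothetical, not Stacklist's code.

```typescript
type ContentRow = { id: string; body: string; dirty: boolean };
type StoredEnvelope = { id: string; summary: string; generatedAt: string };

const rows = new Map<string, ContentRow>();               // content table
const envelopeTable = new Map<string, StoredEnvelope>();  // envelope table
const queue: string[] = [];                                // pending row IDs

// Content changes: flip the dirty flag and enqueue. No enrichment here,
// so the write path stays cheap.
function writeContent(id: string, body: string): void {
  rows.set(id, { id, body, dirty: true });
  if (!queue.includes(id)) queue.push(id);
}

// Placeholder enricher: first sentence as the summary.
function enrich(row: ContentRow): StoredEnvelope {
  const end = row.body.indexOf(". ");
  const summary = end === -1 ? row.body : row.body.slice(0, end + 1);
  return { id: row.id, summary, generatedAt: new Date().toISOString() };
}

// Queue worker: runs out-of-band, enriches dirty rows, persists the
// envelope, clears the flag. Returns how many rows it processed.
function runWorker(): number {
  let processed = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    const row = rows.get(id);
    if (!row || !row.dirty) continue;
    envelopeTable.set(id, enrich(row));
    row.dirty = false;
    processed++;
  }
  return processed;
}

// Request path: a plain read, no computation.
function readEnvelope(id: string): StoredEnvelope | undefined {
  return envelopeTable.get(id);
}

writeContent("card-1", "Agents read envelopes first. Bodies stay available.");
runWorker();
```

After the worker has run, `readEnvelope("card-1")` is just a map lookup. The split between `writeContent` and `runWorker` is the point: the expensive step never sits on either the write path or the request path.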

Three modes when the agent queries:

Mode	Cost	What you get
`aco`	619 tokens	Envelope only
`full`	3,043 tokens	Envelope + scraped body
`both`	3,043 tokens	Same as `full`

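The envelope costs about 80% less than the full version for the same query (619 tokens versus 3,043, so the full read is roughly five times the size). Most of the time, those extra 2,424 tokens are paying for HTML the agent didn't need.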
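The arithmetic behind the table is easy to check. A small sketch, with the token costs copied from the rows above; the `Mode` type and the `reduction` helper are our own names for illustration:

```typescript
type Mode = "aco" | "full" | "both";

// Token costs copied from the table.
const tokenCost: Record<Mode, number> = { aco: 619, full: 3043, both: 3043 };

// Fraction of tokens saved by reading `cheaper` instead of `baseline`.
function reduction(cheaper: number, baseline: number): number {
  return 1 - cheaper / baseline;
}

// Envelope-only versus full read.
const saved = reduction(tokenCost.aco, tokenCost.full); // about 0.80
const ratio = tokenCost.full / tokenCost.aco;           // about 4.9
const extraTokens = tokenCost.full - tokenCost.aco;     // 2424
```

So the envelope saves roughly 80% of the tokens, which is the same thing as saying the full read costs close to five times as much.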

And then we rebuilt our product around it

We didn't just publish the spec.

We rewrote Stacklist around it.

Stacky, our MCP server, now serves ACO envelopes by default. Every card in Stacklist has an envelope sitting in front of it. The dirty-flag → queue-worker → persist pipeline runs in production. By the time an agent queries Stacky, the envelope is ready.

We did this because we needed to feel it. A spec describes a shape. A product has one. Those are different things, and you only learn the difference when you're staring at a migration deciding whether the envelope is one column or its own table. (It's its own table. We tried both.)

So Stacky now talks to agents the way we wished the web talked to agents. And we can actually measure what that costs, or doesn't.

The numbers

Go ask Stacky about Wikipedia's article on Artificial Intelligence.

Full body read: ~25,000 tokens
ACO envelope read: ~350 tokens
Savings: ~99%

That's not a benchmark we ran in a notebook. That's a real query against a real page through the real product, right now.

The pattern holds across content. On a broader 13-item set:

Full bodies: ~65,000 tokens
Envelopes: ~2,800 tokens
Reduction: 84–93%, depending on the document

The savings aren't marginal. They're not 10% wins you have to graph to see. They're the kind of difference where the question stops being "is this worth doing" and becomes "why isn't everything shaped like this already."

The part we haven't solved

Here's where I have to be honest.

Every envelope is stamped. Tool, version, timestamp. You can see what produced an envelope and when. The provenance layer is real and it's working.

But the envelope claims to faithfully represent the content underneath it. And "faithfully represents" is partly a technical statement and partly a social one.

What stops someone from publishing an envelope that says one thing while the body says another? What does adversarial enrichment look like? Who watches the enrichers? When an agent reads the envelope and skips the body, which is exactly the efficiency we want, what happens when the envelope is lying?

I don't have a clean answer. There are partial ones. Signed envelopes. Verifiable enrichment chains. Reputation layers. Each of those is real work, and each shifts the problem rather than solving it.

The honest version: we built a shape that makes the agent web meaningfully more efficient. We did not solve trust. We made it more visible, which is something, but visible isn't the same as solved.

That's the part I keep sitting with.

The efficiency is real. The shape works. The numbers hold up in benchmarks and in our own product. And underneath all of it is a question — what does "faithfully represents" mean when the reader has stopped checking? — that I think is the actual hard problem of the agent web, and I don't think any of us have answered it yet.

So we're going to keep building.

And keep sitting with it.

Both at the same time.
