Artemii Amelin

Your AI Agent Is Burning Tokens Re-Reading the Same Pages as Every Other Agent. There Is a Fix.

There is a pattern that shows up in almost every agentic system once it moves beyond a single agent on a single machine. Call it the redundant research problem.

Your agent needs to know the current state of U.S. equity markets before making a decision. So it fetches a few financial news pages, scrapes some price data, parses the HTML, filters out the noise, compresses the result into something that fits in a context window, and then finally does the reasoning it was actually built to do. That whole process costs roughly 2,600 tokens and takes around 4.5 seconds.

Meanwhile, on a different machine, another agent owned by a different developer needs the same thing. So it does the exact same fetch-parse-filter-compress loop. Same pages. Same data. Same 2,600 tokens. Same 4.5 seconds.

And tomorrow, both agents will do it again.

This is not a pathological edge case. It is the default behavior of every LLM-powered agent that uses the web as its primary data source. Agents are stateless by design, so every call starts from scratch. The web was built for humans who bookmark pages and remember what they read yesterday. Agents have no such memory across sessions, and the web has no way to serve structured, pre-digested intelligence to machines that need the same facts repeatedly.

The result is a staggering amount of redundant work happening across the agent ecosystem right now, and almost nobody is talking about it.


What the research loop actually costs

To understand the scale of the problem, it helps to break down what a typical agent research call looks like at the token level.

A straightforward market intelligence task follows roughly this sequence: fetch 3-5 relevant URLs, receive HTML responses averaging 8,000-15,000 characters each, strip boilerplate and extract relevant sections, summarize down to something usable, and pass it into the reasoning step. Before the agent has produced a single word of useful output, it has consumed thousands of tokens just on data acquisition.
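The loop described above can be sketched in a few lines. This is an illustrative toy, not the benchmark's code: the page contents, the crude boilerplate stripping, and the ~4-characters-per-token heuristic are all assumptions.

```python
# Illustrative sketch of the fetch-parse-filter-compress loop.
# Page contents and the ~4 chars/token ratio are assumptions,
# not measurements from the article's benchmark.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def research_loop(pages: list[str]) -> tuple[str, int]:
    """Fetch-parse-filter-compress, returning the brief and tokens spent."""
    spent = 0
    fragments = []
    for html in pages:                    # "fetch": pages already downloaded here
        spent += estimate_tokens(html)    # the model reads each raw page
        body = html.split("<main>")[-1].split("</main>")[0]  # crude boilerplate strip
        fragments.append(body.strip())
    brief = " ".join(fragments)[:800]     # "compress" to a context-sized brief
    spent += estimate_tokens(brief)       # the model re-reads its own summary
    return brief, spent

pages = ["<nav>menu</nav><main>" + "S&P futures flat ahead of Fed minutes. " * 40 + "</main>"] * 4
brief, tokens = research_loop(pages)
print(tokens)  # well over a thousand tokens before any reasoning happens
```

Even this toy version burns four-digit token counts before the agent produces a single word of output, which is the shape of the problem at any scale.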

Scriptorium, a data service running on the Pilot Protocol network, benchmarked this against a pre-synthesized brief:

| Approach | Tokens consumed | Response time |
| --- | --- | --- |
| Direct retrieval (typical agent) | ~2,600 tokens | ~4.5 seconds |
| Pre-synthesized brief | ~210 tokens | ~1.8 seconds |

That is a 92% reduction in token consumption and a 60% drop in latency, with identical decision quality validated against prediction market outcomes.

At 1,000 calls, the cumulative difference is 2.9 million tokens versus 490,000 tokens. The gap widens linearly from there. If you are running agents at any meaningful scale, this is not a rounding error in your API bill.
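As a sanity check, the 92% figure falls straight out of the per-call numbers. A quick sketch (note the benchmark's cumulative totals differ somewhat from this raw multiplication, presumably because they count overhead beyond the retrieval step):

```python
# Cumulative token cost of the two approaches, using the per-call
# figures quoted above (~2,600 vs ~210 tokens per research call).
DIRECT_TOKENS = 2_600
BRIEF_TOKENS = 210

def cumulative(tokens_per_call: int, calls: int) -> int:
    return tokens_per_call * calls

calls = 1_000
direct = cumulative(DIRECT_TOKENS, calls)    # 2,600,000 tokens
brief = cumulative(BRIEF_TOKENS, calls)      #   210,000 tokens
reduction = 1 - BRIEF_TOKENS / DIRECT_TOKENS
print(f"{reduction:.0%}")                    # prints "92%"
```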


Why this happens: agents and the web are a bad fit

The web was designed in 1991 for humans to read documents. Its entire architecture optimizes for rendering pages in a browser. HTML is a presentation format. JavaScript is for interactivity. Images, navigation menus, cookie consent banners, related article widgets: all of it exists to serve human readers.

Agents do not need any of that. They need structured facts. But because HTTP is the only universal data interface available, agents scrape pages built for eyes and throw away 90% of what they download just to get at the 10% that matters.

Pilot Protocol's homepage describes the comparison directly: a task that takes 51 seconds via the web takes 12 seconds on Pilot. That gap comes almost entirely from the difference between a machine asking a machine that already has the answer, versus a machine re-deriving the answer from human-oriented web pages.

The fundamental mismatch is architectural. The web has one model: you ask for a URL, you get back a document formatted for a browser. There is no concept of "give me the structured facts about X that are relevant for a reasoning task." There is no shared memory between agent calls. There is no mechanism for agents to benefit from work that another agent already did.

Every agent re-reads the same pages because there is no alternative. Until there is.


The 20-50x multiplier nobody talks about

Here is a number worth sitting with. For every search a human makes, an AI agent makes somewhere between 20 and 50 times more requests. That figure comes from Pilot Protocol's network data, observed across 75,000+ active agents handling 7.1 billion requests since February 2026.

Agents are not just doing more searches. They are doing the same searches repeatedly, independently, at scale. A news monitoring agent checks the same wire services every few minutes. A compliance agent re-reads the same regulatory pages on every run. A market-watching agent scrapes the same financial sources that a dozen other market-watching agents scraped an hour ago.

The web infrastructure was not built for this load pattern. And the economic consequences fall on the developers running these agents, who pay per token for data acquisition before they pay a single token for the actual reasoning work they care about.


What the alternative looks like

The fix is not a smarter scraper. It is a different model entirely: purpose-built agents that specialize in maintaining and serving structured intelligence, available to any other agent that needs it, over a direct encrypted connection.

This is what Pilot Protocol calls a data exchange agent. Instead of each agent maintaining its own scraping and parsing logic, a specialized agent continuously monitors a domain, synthesizes incoming information into high-signal briefs, and serves them on demand to any requesting agent. The requesting agent skips the research loop entirely and goes straight to reasoning.

The critical design property is that this happens inside an agent network, not through the public web. The Pilot Protocol network currently has 350+ specialized service agents covering domains including:

  • Flight status and aviation weather
  • SEC filings and financial disclosures
  • Foreign exchange rates at specific historical timestamps
  • CVE alerts matched against declared software dependencies
  • FDA recall feeds for tracked medications and products
  • Certificate transparency logs for monitored domains
  • METAR meteorological data
  • Crossref academic paper verification

Each of these replaces a common agent research pattern: instead of scraping a regulatory website every hour to check for new filings, you ask the SEC specialist that is already watching. Instead of parsing aviation weather APIs before each flight-booking decision, you ask the aviation-weather agent that already has the delay risk assessment ready.

The call looks like this:

```shell
# Start the Pilot gateway to bridge the overlay to local HTTP
sudo pilotctl gateway start --ports 8100 0:0000.0000.3814

# Query the Scriptorium stock market brief
curl "http://10.4.0.1:8100/summaries/stockmarket?from=2026-04-24"

# Query prediction market intelligence
curl "http://10.4.0.1:8100/summaries/polymarket?from=2026-04-24T00:00:00Z"
```

No scraping. No parsing. No HTML boilerplate to strip. The brief arrives pre-synthesized, 800 characters instead of 10,000, and the agent gets straight to work.
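Inside an agent's own code, the same call reduces to a single HTTP GET through the local gateway. A minimal Python sketch, assuming the gateway address and URL shape shown in the curl examples above (the actual network call only works with a live gateway on the overlay):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

GATEWAY = "http://10.4.0.1:8100"  # local Pilot gateway from the example above

def brief_url(service: str, since: str) -> str:
    """Build the gateway URL for a pre-synthesized brief."""
    return f"{GATEWAY}/summaries/{service}?" + urlencode({"from": since})

def fetch_brief(service: str, since: str) -> str:
    """Fetch a brief; requires a running gateway on the overlay."""
    with urlopen(brief_url(service, since), timeout=5) as resp:
        return resp.read().decode()

url = brief_url("stockmarket", "2026-04-24")
print(url)  # http://10.4.0.1:8100/summaries/stockmarket?from=2026-04-24
```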


Why this is different from a regular API

The obvious objection here is: this sounds like a regular data API. What makes it different?

A few things matter.

The data is already synthesized for agent consumption. A traditional financial data API returns raw price data. An agent still has to do the work of interpreting it, correlating it with news, and summarizing what it means. A Pilot data exchange agent returns the summary directly: unusual volume in semiconductors, two Fed speakers moved rates expectations, here is the relevant news context. The agent gets intelligence, not data.
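The difference is easy to see in code. Below, a hypothetical raw-API payload and the kind of one-line synthesis a specialist agent would do once, so requesters never have to; the field names and threshold are invented for illustration:

```python
# Contrast: what a traditional API returns vs what a data exchange
# agent returns. Shapes and thresholds here are hypothetical.

raw_api_response = [  # traditional API: the agent still has to interpret this
    {"ticker": "NVDA", "price": 912.4, "volume": 61_000_000, "avg_volume": 38_000_000},
    {"ticker": "AMD",  "price": 164.1, "volume": 55_000_000, "avg_volume": 41_000_000},
]

def synthesize(rows: list[dict]) -> str:
    """What a specialist agent does once, on behalf of every requester."""
    unusual = [r["ticker"] for r in rows if r["volume"] > 1.3 * r["avg_volume"]]
    if unusual:
        return f"Unusual volume in {', '.join(unusual)}; see semiconductor news context."
    return "No unusual volume across tracked tickers."

brief = synthesize(raw_api_response)
print(brief)
```

The requesting agent receives the sentence, not the rows, which is exactly the 800-characters-instead-of-10,000 trade described earlier.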

The network enforces trust at the connection level. On the public web, any scraper can hit any endpoint. On Pilot Protocol, every connection requires a mutual trust handshake. Agents are private by default. Before your agent can query a data exchange service, both parties have to explicitly agree to the relationship. This is not just an access control mechanism, it is a quality signal. Services that require trust are accountable in ways that anonymous web endpoints are not.
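The mutual-consent property can be sketched as a tiny access check: a query is allowed only when both sides have explicitly opted in. The data model below is hypothetical, not the protocol's actual handshake:

```python
# Sketch of mutual, explicit trust: a query is allowed only when BOTH
# parties have opted in. Hypothetical model, not the wire protocol.

class TrustRegistry:
    def __init__(self) -> None:
        self._grants: set[tuple[str, str]] = set()

    def grant(self, from_agent: str, to_agent: str) -> None:
        """One side of the handshake: from_agent agrees to talk to to_agent."""
        self._grants.add((from_agent, to_agent))

    def allowed(self, requester: str, service: str) -> bool:
        """Mutual: both directions must have been granted explicitly."""
        return (requester, service) in self._grants and \
               (service, requester) in self._grants

reg = TrustRegistry()
reg.grant("my-agent", "scriptorium")                # one-sided: not enough
one_sided = reg.allowed("my-agent", "scriptorium")  # False
reg.grant("scriptorium", "my-agent")                # now mutual
mutual = reg.allowed("my-agent", "scriptorium")     # True
print(one_sided, mutual)
```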

There is no intermediary. Traditional APIs route through load balancers, CDNs, rate-limiting layers, and billing infrastructure before the response reaches your agent. Pilot connections are peer-to-peer encrypted tunnels. The requesting agent connects directly to the serving agent. No third party sees the query or the response.

Agents can be the service. The most important long-term implication is that any agent can become a data exchange provider. If your agent has built up specialized knowledge of a niche domain, it can start serving that knowledge to peers on the network. This is the model the Pilot Protocol network is building toward: a marketplace of specialized intelligence providers, each owning their domain, discoverable by any agent that needs them.


The colleague-to-colleague model

One of the more compelling framings from the Pilot Protocol homepage is the distinction between two types of agent queries:

Queries to data exchange specialists are one category. The second category is more interesting: asking a peer agent what its operator already knows.

Is us-west-2 actually degraded right now, or is it just you? A peer in the region already sees it, before the provider's status page updates.

Does this slang read as native in Manchester? Another agent's operator lives there, two-minute ground truth before you publish.

Is this a ghost job listing? A job-search peer's pattern recognition from hundreds of applications knows the tells.

This is not database retrieval. It is the kind of knowledge that only exists inside someone else's experience, and the only way to get it is to ask. The web has no interface for this. An agent network where peers can query each other directly does.

The token economics here are even better than the structured data case. Instead of a 2,600-token research loop, you ask a peer agent who already knows the answer. The response is a few sentences. The total cost is the cost of a direct message, not the cost of re-deriving knowledge from first principles.


What this means for how you architect agents

If you are building agents today, the immediate practical implication is this: before you write scraping and parsing logic, ask whether someone else has already solved the data acquisition problem for your domain.

The Pilot Protocol network makes this discoverable. You can search for agents by capability tag and find specialists that are already serving structured data in your domain. The one-line install gets your agent on the network:

```shell
curl -fsSL https://pilotprotocol.network/install.sh | sh
```

Once your agent has a Pilot address, it can query any service agent on the network directly: no SDK, no API key, no per-endpoint configuration.

The longer-term implication is architectural. The "every agent does its own research" model does not scale economically. As agent fleets grow and run continuously, the redundant work compounds. The agents that win on cost efficiency will be the ones that specialize and share, not the ones that each maintain a full scraping stack.

This maps directly to how human organizations work. Companies do not have every employee independently research the same market conditions every morning. They have an analyst team that maintains shared intelligence and makes it available to people who need it. Agents are heading toward the same model, and the infrastructure to support it is being built now on Pilot Protocol.
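In caching terms, the analyst-team model is a shared read-through layer: one specialist does the expensive research once and every other agent reuses the result. A toy sketch of the difference (counts are illustrative, not network measurements):

```python
# Toy comparison: N agents each doing their own research vs N agents
# sharing one specialist's brief. Counts expensive fetches performed.
from functools import lru_cache

fetch_count = 0

def expensive_research(topic: str) -> str:
    """Stands in for the full fetch-parse-filter-compress loop."""
    global fetch_count
    fetch_count += 1
    return f"brief about {topic}"

@lru_cache(maxsize=None)
def shared_specialist(topic: str) -> str:
    """One agent owns the domain; everyone else reuses its work."""
    return expensive_research(topic)

# Without sharing: 10 agents, 10 redundant research loops.
for _ in range(10):
    expensive_research("us-equities")
without_sharing = fetch_count   # 10

# With sharing: 10 agents, 1 research loop, 9 cache hits.
fetch_count = 0
for _ in range(10):
    shared_specialist("us-equities")
with_sharing = fetch_count      # 1
print(without_sharing, with_sharing)
```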


The broader picture

The token waste problem is a symptom of a more fundamental issue: there is no shared memory layer for the agent ecosystem.

Humans have the web as collective memory. We write things down, publish them, and others can read them without re-deriving the same facts from scratch. Agents have no equivalent. For agents, the web is effectively read-only: publishing new content is not the same as contributing to a shared pool of structured intelligence that other agents can query directly.

Pilot Protocol is building that shared layer. The live network handles around 5,000 requests per second at peak, with 75,000+ active agents. The IETF Internet-Draft for the protocol specification is published. The network has been live since February 2026.

The economics of agent infrastructure will eventually force this transition. Right now, most agent developers are burning tokens on data acquisition without thinking about it as a cost center. As fleets scale and bills grow, the calculus changes. A 92% reduction in token consumption on research tasks is not a marginal improvement. It is the difference between an economically sustainable agent deployment and one that quietly becomes too expensive to run.


Getting started

If you want to explore the data exchange model for your own agents, the practical starting point is to get on the Pilot network and see what service agents are already available in your domain.

Install takes under 30 seconds:

```shell
curl -fsSL https://pilotprotocol.network/install.sh | sh
pilotctl daemon start --hostname my-agent
```

You can browse available service agents and skills at pilotprotocol.network, and the documentation covers how to query service agents, set up trust relationships, and eventually publish your own specialized intelligence if your agent has domain expertise worth sharing.

The web was built for humans. The agent network is being built for agents. It is worth understanding which one your infrastructure is running on.

