Stop Paying for Every AI Agent Thought: The Push vs Pull Protocol Shift

#ai #python #architecture #tutorial

Your AI agents are paying a tax on every thought.

Every time an agent needs context — current errors, pipeline state, recent events — it stops, generates a tool call, waits for a round-trip, burns tokens on a schema definition, then resumes. If the data hasn't changed since the last call, you paid anyway.

With 5 agents making 3 context fetches each per session: 15 round-trips. 15 tool schemas serialized. ~1,200 wasted scaffold tokens per agent. Every session.

There's a better pattern. Here's what it looks like.

The Problem: Pull Architecture at Scale

Standard tool-call flow, as implemented by LangChain, CrewAI, AutoGen, and most other agent frameworks today:

Agent needs context
  → LLM generates tool_call: {"name": "get_errors", "args": {}}
  → Framework serializes tool schema  (+200–400 tokens overhead)
  → HTTP round-trip to your data source  (+300–800ms blocked)
  → Result injected back into context window
  → LLM resumes generation

The agent is pulling — asking for data on demand. The problem: most of that data hasn't changed. You're paying to ask a question you already know the answer to.
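To make the schema overhead concrete, here is a rough sketch of what gets serialized alongside a tool-enabled request. The schema fields for get_errors are hypothetical, and the 4-characters-per-token rule is only a common rule of thumb, not a real tokenizer, so treat the result as an order-of-magnitude estimate.

```python
import json

# Hypothetical schema for the get_errors tool, in the common JSON "function" style.
GET_ERRORS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_errors",
        "description": "Return current errors for the agent fleet.",
        "parameters": {
            "type": "object",
            "properties": {
                "severity": {"type": "string", "enum": ["info", "warning", "error"]},
                "limit": {"type": "integer", "description": "Maximum number of errors."},
            },
            "required": [],
        },
    },
}


def estimate_tokens(obj, chars_per_token=4):
    """Rough estimate: serialized length divided by ~4 characters per token."""
    return len(json.dumps(obj)) // chars_per_token


def scaffold_tokens(tools, calls):
    """Schema tokens resent across `calls` requests that each carry every tool."""
    return calls * sum(estimate_tokens(tool) for tool in tools)


print(estimate_tokens(GET_ERRORS_TOOL), "tokens (rough) for one small schema")
```

Real schemas with longer descriptions and more parameters grow quickly, and the cost repeats on every request that carries them.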

At scale this compounds fast:

Fleet size	Context fetches/hr	Tool schema tokens wasted/day	Annual cost (GPT-4o)
5 agents	300	~180,000	$1,387
20 agents	4,000	~2,400,000	$18,490
72 agents	86,400	~51,840,000	$399,168

This isn't a billing anomaly. It's structural — the pull pattern charges you for every read whether or not the value changed.

The Fix: Push Architecture with Ambient Context

SignalMesh inverts the flow. Instead of agents pulling context when they need it, context arrives before they ask.

Something changes (new error, state update, feed item)
  → One broadcast:
     POST /ui/broadcast
     {"frequency": "agent-errors", "payload": {...}}

  → Mesh stores it. 72-node spatial grid. 1.69µs read latency.

Agent runs
  → tune_in(["agent-errors", "agent-state"])
     ← no LLM call. no tool schema. no round-trip.
  → Gets back a hydrated context block
  → Injects directly into system prompt before the LLM fires

5 agents × 3 fetches becomes 1 broadcast plus 15 tune_in() reads.
15 tool schemas never generated. 15 round-trips never made.
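The push flow above can be sketched with a toy in-memory store. This is not the SignalMesh implementation: the InMemoryMesh class, its method bodies, and the output format of the context block are all assumptions made for illustration.

```python
import json
from typing import Any


class InMemoryMesh:
    """Toy stand-in for a push-style context store (illustrative only)."""

    def __init__(self) -> None:
        self._latest: dict[str, Any] = {}

    def broadcast(self, frequency: str, payload: Any) -> None:
        # Called once by the producer whenever the underlying data changes.
        self._latest[frequency] = payload

    def tune_in(self, frequencies: list[str]) -> str:
        # Plain dictionary reads: no LLM call, no tool schema, no network hop.
        lines = [
            f"[{freq}] {json.dumps(self._latest[freq], sort_keys=True)}"
            for freq in frequencies
            if freq in self._latest  # frequencies with no data are skipped
        ]
        return "\n".join(lines)


mesh = InMemoryMesh()
mesh.broadcast("agent-errors", {"count": 2, "latest": "timeout"})

# The agent reads before the LLM fires and prepends the block to its prompt.
context = mesh.tune_in(["agent-errors", "agent-state"])
system_prompt = context + "\n\nYou are a helpful agent."
```

The producer writes once per change, and every agent read is a lookup against state that is already there.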

Side-by-Side: Sales Agent Checking Pipeline State

# ── Standard: pull ────────────────────────────────────────────────
result = await agent.run(
    tools=[get_pipeline_tool, get_leads_tool, get_quota_tool]
)
# 3 tool calls
# ~900ms blocked waiting for round-trips
# ~1,200 scaffold tokens burned on schema definitions

# ── SignalMesh: push ──────────────────────────────────────────────
context = await tune_in(["pipeline", "leads", "quota"])
result  = await llm.complete(system=context + base_prompt)
# 0 tool calls
# 1.69µs read
# 0 scaffold tokens

Same result. The agent has the same information; it just reaches the model through the system prompt instead of through tool results.
The difference is when the data arrived and what it cost to get there.

The Mental Model Shift

Tool calls are pull — the agent asks for data at generation time, blocking until the answer comes back.

SignalMesh is push — data is broadcast when it changes. By the time the agent runs, the context is already warm. The system prompt is pre-loaded. The LLM call fires immediately.

This matters more as your fleet grows. With pull architecture, cost and latency scale with agents × fetches. With push architecture, cost scales with events — how often your data actually changes — not with how many agents are reading it.
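A back-of-the-envelope model of that scaling difference, as a sketch. The inputs (3 fetches per agent, 300 tokens per schema, 10 change events) are illustrative assumptions, not measurements.

```python
def pull_scaffold_tokens(agents: int, fetches_per_agent: int, tokens_per_schema: int) -> int:
    """Pull: every fetch by every agent pays the schema overhead."""
    return agents * fetches_per_agent * tokens_per_schema


def push_broadcasts(events: int) -> int:
    """Push: one broadcast per data change, however many agents read it."""
    return events


for agents in (5, 20, 72):
    pull = pull_scaffold_tokens(agents, fetches_per_agent=3, tokens_per_schema=300)
    push = push_broadcasts(events=10)
    print(f"{agents:>3} agents: pull={pull:>6} scaffold tokens, push={push} broadcasts")
```

The pull figure grows linearly with fleet size, while the push figure stays flat until the data itself changes more often.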

Zero-Config Integration: AGENTS.md

The hardest part of adopting a new protocol is wiring it in. SignalMesh solves this with a single file.

Drop AGENTS.md into any repo. An AI scanning the repo reads the 3-step lifecycle, learns the broadcast frequencies, and calls POST /api/manifest/ingest with the file's URL. The node registers itself into the spatial grid automatically — no human config required.

# Register any repo with an AGENTS.md
curl -X POST https://acecalisto3-signalmesh.hf.space/ui/manifest/ingest \
  -H "Content-Type: application/json" \
  -d '{"url": "https://raw.githubusercontent.com/yourorg/yourrepo/main/AGENTS.md"}'

Every payload passes through SEC-Ω before it touches the mesh — injection pattern matching, size limits, fingerprinting. The mesh auto-selects context GC strategy per signal (STATE_OVERWRITE for keyed state, TTL_DECAY for time-sensitive data, ROLLING_BUFFER for event streams).
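As a rough illustration of the three checks named above (pattern matching, size limits, fingerprinting), here is a minimal screening function. It is a sketch, not SEC-Ω itself: the size limit, the two deny-list patterns, and the choice of SHA-256 are assumptions.

```python
import hashlib
import json
import re

MAX_PAYLOAD_BYTES = 16_384  # assumed limit, not a documented SignalMesh value

# Assumed example patterns; a real deny-list would be much broader.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"<\s*script\b", re.IGNORECASE),
]


def screen_payload(payload: dict) -> str:
    """Return a SHA-256 fingerprint if the payload passes, else raise ValueError."""
    # Canonical serialization so key order does not change the fingerprint.
    raw = json.dumps(payload, sort_keys=True)
    if len(raw.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(raw):
            raise ValueError("payload matches an injection pattern")
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


fingerprint = screen_payload({"deals": 14, "value": 82000})
```

Fingerprinting the canonical form also gives a cheap way to detect duplicate broadcasts.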

Try It Now

The mesh is live. No account required on /ui/ routes.

import httpx

HF = "https://acecalisto3-signalmesh.hf.space"
YOUR_BASE_PROMPT = "You are a sales assistant."  # placeholder: use your own prompt

# Broadcast context (once, when data changes)
resp = httpx.post(f"{HF}/ui/broadcast", json={
    "frequency": "pipeline-state",
    "payload": {"deals": 14, "value": 82000, "close_rate": 0.31}
})
resp.raise_for_status()  # fail loudly if the broadcast was rejected

# Any agent tunes in (no LLM call, no tool schema)
resp = httpx.post(f"{HF}/ui/tune_in", json={
    "keywords": ["pipeline-state", "quota"]
})
resp.raise_for_status()
ctx = resp.json()

# Inject into your system prompt
system_prompt = ctx["context"] + "\n\n" + YOUR_BASE_PROMPT

Live demo: kyklos.io
GitHub (MIT): Ig0tU/SignalMesh
API docs: acecalisto3-signalmesh.hf.space/docs

The Write Side: Tool-less Action Protocol

The read-side savings above are the obvious win, but SignalMesh also eliminates tool schema overhead on writes.

In a standard agent, broadcasting a signal requires a registered tool:

# Standard: write operation as an explicit tool call
tools = [SignalBroadcastTool()]  # schema injected into every LLM call: +200 tokens

result = await agent.run(
    "Broadcast current pipeline state to the mesh",
    tools=tools
)
# LLM generates: {"name": "signal_broadcast", "args": {"frequency": "pipeline", ...}}
# Tool schema burns tokens whether the agent uses it or not

With the tool-less protocol, the schema never touches the context window. The agent emits a structured action tag in natural language output:

# SignalMesh: write operation as an action tag — zero schema tokens
system_prompt = """
When you need to broadcast to the mesh, emit:
<execute type="broadcast" frequency="pipeline">
  {"deals": 14, "value": 82000}
</execute>
"""

result = await llm.complete(system=system_prompt + context)
# Agent output contains the <execute> tag
# AgentifiedToolParser intercepts it — no tool schema ever in context

The parser runs on every LLM response and translates <execute> tags into live mesh broadcasts. The tool definition — its name, description, parameter schema — is never serialized into the context window.
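A parser of this kind can be sketched in a few lines. This is not the AgentifiedToolParser source: the regex, the fixed attribute order (type, then frequency), and the decision to skip malformed JSON silently are assumptions.

```python
import json
import re

# Matches <execute type="..." frequency="...">JSON body</execute>.
# Assumes attributes appear in exactly this order, as in the prompt template.
EXECUTE_TAG = re.compile(
    r'<execute\s+type="(?P<type>[^"]+)"\s+frequency="(?P<frequency>[^"]+)"\s*>'
    r"(?P<body>.*?)</execute>",
    re.DOTALL,
)


def parse_actions(llm_output: str) -> list[dict]:
    """Extract <execute> tags and decode their JSON bodies; skip malformed ones."""
    actions = []
    for match in EXECUTE_TAG.finditer(llm_output):
        try:
            payload = json.loads(match.group("body"))
        except json.JSONDecodeError:
            continue  # invalid JSON from the model; a real parser might log or retry
        actions.append({
            "type": match.group("type"),
            "frequency": match.group("frequency"),
            "payload": payload,
        })
    return actions


output = '''Pipeline looks healthy.
<execute type="broadcast" frequency="pipeline">
  {"deals": 14, "value": 82000}
</execute>'''
actions = parse_actions(output)
```

Each parsed action can then be forwarded to the broadcast endpoint. Because the model can emit malformed tags, this path suits best-effort operations rather than anything that needs a guaranteed write.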

Combined savings across a session:

Operation	Standard	SignalMesh
Context read (tune_in)	300 tokens + 600ms	0 tokens + 1.69µs
Context write (broadcast)	200 tokens + tool call	0 tokens + tag parse
Per session (5 agents, 3 reads, 2 writes)	~2,500 scaffold tokens	0 scaffold tokens

The tool-less protocol doesn't replace tool calls for actions that need guaranteed execution with structured error handling (database writes, external API calls). It targets the ambient operations — the constant low-level mesh broadcasts that happen dozens of times per session — where schema overhead is pure waste.

FAQ

Does this replace tool calls entirely?
For context reads, yes. For actions that need guaranteed execution (writing to a database, sending an email), tool calls are still the right pattern. SignalMesh targets context reads and ambient mesh broadcasts, the parts that are redundant and expensive.

What if the mesh is down?
Your agents fall back to their existing tool calls. SignalMesh is additive — you add tune_in() calls alongside existing logic, not instead of them, until you're confident.
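One way to wire that fallback, as a sketch: the function names and the rule "treat an empty context as a miss" are assumptions, and mesh_read stands in for whatever client call you use to reach tune_in.

```python
from typing import Callable, Sequence


def context_with_fallback(
    frequencies: Sequence[str],
    mesh_read: Callable[[Sequence[str]], str],
    legacy_fetch: Callable[[Sequence[str]], str],
) -> str:
    """Prefer the mesh; use the existing fetch path if the read fails or is empty."""
    try:
        context = mesh_read(frequencies)
    except Exception:
        # Broad on purpose in this sketch: timeouts, connection errors, bad responses.
        context = ""
    return context or legacy_fetch(frequencies)


def broken_mesh(frequencies: Sequence[str]) -> str:
    # Simulates an outage for the example.
    raise ConnectionError("mesh unreachable")


def legacy_fetch(frequencies: Sequence[str]) -> str:
    # Stand-in for your existing tool-call or database path.
    return "legacy context for " + ", ".join(frequencies)


context = context_with_fallback(["pipeline", "quota"], broken_mesh, legacy_fetch)
```

Keeping both paths behind one function makes it easy to remove the legacy branch later, once the mesh has proven reliable in your setup.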

How does it handle stale data?
Three GC strategies: STATE_OVERWRITE (latest value wins, keyed by signal_id), TTL_DECAY (auto-expires after ttl_ms), and ROLLING_BUFFER (FIFO-capped by token budget). Strategy is auto-selected per signal based on the payload shape — no config needed.
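The three strategies can be illustrated with a toy store. This is a sketch under assumptions, not the SignalMesh implementation: the class and method names are invented, the caller supplies token counts, and the auto-selection by payload shape is not modeled.

```python
import time
from collections import deque


class SignalStore:
    """Toy store showing the three GC strategies side by side."""

    def __init__(self, token_budget=50):
        self.state = {}        # STATE_OVERWRITE: signal_id -> latest value
        self.timed = []        # TTL_DECAY: (expires_at_ms, value)
        self.buffer = deque()  # ROLLING_BUFFER: (value, tokens), oldest first
        self.token_budget = token_budget

    def put_state(self, signal_id, value):
        self.state[signal_id] = value  # latest value wins

    def put_ttl(self, value, ttl_ms, now_ms=None):
        now_ms = time.time() * 1000 if now_ms is None else now_ms
        self.timed.append((now_ms + ttl_ms, value))

    def live_ttl(self, now_ms=None):
        # Drop anything whose expiry time has passed, then return what is left.
        now_ms = time.time() * 1000 if now_ms is None else now_ms
        self.timed = [(exp, v) for exp, v in self.timed if exp > now_ms]
        return [v for _, v in self.timed]

    def put_event(self, value, tokens):
        self.buffer.append((value, tokens))
        # Evict oldest entries until the buffer fits the token budget again.
        while self.buffer and sum(t for _, t in self.buffer) > self.token_budget:
            self.buffer.popleft()


store = SignalStore(token_budget=50)
store.put_state("quota", 100)
store.put_state("quota", 120)              # overwrites the earlier value
store.put_ttl("deploy in progress", ttl_ms=100, now_ms=0)
store.put_event("error A", tokens=30)
store.put_event("error B", tokens=30)      # pushes the buffer over budget
```

After these calls the state holds only the latest quota, the TTL entry disappears once 100 ms have passed, and the oldest event has been evicted to stay within budget.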

Is there a self-hosted option?
Yes. There is a Docker image in the repo: run docker run -p 7860:7860 signalmesh, then point your agents at localhost:7860. The project is MIT-licensed.