Your AI agents are paying a tax on every thought.
Every time an agent needs context — current errors, pipeline state, recent events — it stops, generates a tool call, waits for a round-trip, burns tokens on a schema definition, then resumes. If the data hasn't changed since the last call, you paid anyway.
With 5 agents making 3 context fetches each per session: 15 round-trips. 15 tool schemas serialized at 200–400 tokens apiece: 3,000–6,000 wasted scaffold tokens. Every session.
There's a better pattern. Here's what it looks like.
The Problem: Pull Architecture at Scale
Standard tool-call flow, the default in LangChain, CrewAI, AutoGen, and most other agent frameworks today:
Agent needs context
→ LLM generates tool_call: {"name": "get_errors", "args": {}}
→ Framework serializes tool schema (+200–400 tokens overhead)
→ HTTP round-trip to your data source (+300–800ms blocked)
→ Result injected back into context window
→ LLM resumes generation
The agent is pulling — asking for data on demand. The problem: most of that data hasn't changed. You're paying to ask a question you already know the answer to.
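To see how much of a pull workload is redundant, you can count fetches that return a value identical to the previous read. The sketch below is a toy model, not SignalMesh code: Source, pull_session, and the update schedule are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Source:
    """Toy data source whose value only changes on explicit updates."""
    value: dict = field(default_factory=dict)
    version: int = 0

    def update(self, value: dict) -> None:
        self.value = value
        self.version += 1


def pull_session(source: Source, fetches: int, updates: dict[int, dict]) -> tuple[int, int]:
    """Simulate pull-style reads; return (total_fetches, redundant_fetches).

    `updates` maps a fetch index to a new value applied just before that fetch.
    A fetch is redundant when the source version hasn't moved since the last read.
    """
    last_seen = None
    redundant = 0
    for i in range(fetches):
        if i in updates:
            source.update(updates[i])
        if source.version == last_seen:
            redundant += 1  # paid for a tool call that returned nothing new
        last_seen = source.version
    return fetches, redundant


if __name__ == "__main__":
    src = Source()
    total, wasted = pull_session(src, fetches=15, updates={0: {"errors": 2}, 9: {"errors": 3}})
    print(f"{wasted} of {total} fetches returned unchanged data")
```

With two real changes across 15 reads, only 2 fetches carry new information; the other 13 are the tax.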
At scale this compounds fast:
| Fleet size | Context fetches/hr | Tool schema tokens wasted/day | Annual cost (GPT-4o) |
|---|---|---|---|
| 5 agents | 300 | ~180,000 | $1,387 |
| 20 agents | 4,000 | ~2,400,000 | $18,490 |
| 72 agents | 86,400 | ~51,840,000 | $399,168 |
This isn't a billing anomaly. It's structural — the pull pattern charges you for every read whether or not the value changed.
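Cost figures like the ones in this table come from three inputs: fetch rate, scaffold tokens per fetch, and price per token. A minimal sketch for plugging in your own numbers; the function names are hypothetical, and the example inputs (300 fetches/hr, 300 tokens per fetch, $2.50 per million tokens) are placeholders, not measured values or a quoted model price.

```python
def daily_schema_tokens(fetches_per_hour: float, tokens_per_fetch: float, hours_per_day: float = 24) -> float:
    """Scaffold tokens spent per day on tool schemas for pull-style fetches."""
    return fetches_per_hour * hours_per_day * tokens_per_fetch


def annual_cost_usd(tokens_per_day: float, usd_per_million_tokens: float, days: int = 365) -> float:
    """Annual spend for a steady daily token volume at a flat per-token price."""
    return tokens_per_day * days * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Placeholder inputs: substitute your fleet's fetch rate, your measured
    # schema size, and your model's current input-token price.
    tokens = daily_schema_tokens(fetches_per_hour=300, tokens_per_fetch=300)
    cost = annual_cost_usd(tokens, usd_per_million_tokens=2.50)
    print(f"{tokens:,.0f} tokens/day, ${cost:,.0f}/year")
```

Because cost is linear in fetch rate, any growth in agents or fetch frequency shows up one-for-one in the bill.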
The Fix: Push Architecture with Ambient Context
SignalMesh inverts the flow. Instead of agents pulling context when they need it, context arrives before they ask.
Something changes (new error, state update, feed item)
→ One broadcast:
POST /ui/broadcast
{"frequency": "agent-errors", "payload": {...}}
→ Mesh stores it. 72-node spatial grid. 1.69µs read latency.
Agent runs
→ tune_in(["agent-errors", "agent-state"])
← no LLM-generated tool call. no tool schema. no round-trip mid-generation.
→ Gets back a hydrated context block
→ Injects directly into system prompt before the LLM fires
5 agents × 3 fetches = 1 broadcast + 15 tune_in() reads.
15 tool schemas never serialized. 15 LLM tool-call round-trips never made.
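The push flow can be sketched in a few lines. This is a hypothetical in-process stand-in, not SignalMesh's implementation: Mesh, broadcast, and tune_in here only mimic the shape of the endpoints above, and the context-block format is an assumption.

```python
from __future__ import annotations

import json
import time


class Mesh:
    """Minimal in-process signal store: writers broadcast, readers tune in."""

    def __init__(self) -> None:
        # frequency -> (timestamp, payload); the latest broadcast wins
        self._signals: dict[str, tuple[float, dict]] = {}

    def broadcast(self, frequency: str, payload: dict) -> None:
        """Called once when the underlying data changes."""
        self._signals[frequency] = (time.time(), payload)

    def tune_in(self, frequencies: list[str]) -> str:
        """Return a plain-text context block for the requested frequencies.

        Pure dictionary lookups: no LLM call and no tool schema involved.
        Frequencies with no broadcast yet are skipped.
        """
        lines = []
        for freq in frequencies:
            if freq in self._signals:
                _, payload = self._signals[freq]
                lines.append(f"[{freq}] {json.dumps(payload, sort_keys=True)}")
        return "\n".join(lines)


if __name__ == "__main__":
    mesh = Mesh()
    mesh.broadcast("agent-errors", {"count": 2, "last": "TimeoutError"})
    context = mesh.tune_in(["agent-errors", "agent-state"])
    system_prompt = context + "\n\nYou are an on-call triage agent."
    print(system_prompt)
```

Every agent reads the same stored value, so adding readers adds cheap lookups, not LLM tool calls.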
Side-by-Side: Sales Agent Checking Pipeline State
```python
# ── Standard: pull ────────────────────────────────────────────────
# agent, llm, tune_in, and the *_tool objects stand in for your framework
result = await agent.run(
    tools=[get_pipeline_tool, get_leads_tool, get_quota_tool]
)
# 3 tool calls
# ~900ms blocked waiting for round-trips
# ~1,200 scaffold tokens burned on schema definitions

# ── SignalMesh: push ──────────────────────────────────────────────
context = await tune_in(["pipeline", "leads", "quota"])
result = await llm.complete(system=context + base_prompt)
# 0 tool calls
# µs-scale mesh read (plus network time if the mesh is remote)
# 0 scaffold tokens
```
Same result. The agent ends up with the same information, but the push version makes one plain LLM call with no tools attached.
The difference is when the data arrived and what it cost to get there.
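The tune_in helper in the push snippet isn't defined above. One way to write it, assuming the hosted /ui/tune_in endpoint used in the Try It Now section and its {"keywords": [...]} request and {"context": ...} response shape, is to run a blocking standard-library HTTP call on a worker thread. The function names and the injectable post parameter are hypothetical, added so the helper can be exercised without a network.

```python
from __future__ import annotations

import asyncio
import json
import urllib.request
from typing import Callable

MESH_URL = "https://acecalisto3-signalmesh.hf.space"


def _post_json(url: str, body: dict, timeout: float = 10.0) -> dict:
    """Blocking JSON POST using only the standard library."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


async def tune_in(
    keywords: list[str],
    base_url: str = MESH_URL,
    post: Callable[[str, dict], dict] = _post_json,
) -> str:
    """Fetch the hydrated context block before the LLM call fires."""
    # asyncio.to_thread (Python 3.9+) keeps the event loop free during the HTTP call.
    data = await asyncio.to_thread(post, f"{base_url}/ui/tune_in", {"keywords": keywords})
    return data["context"]
```

Inside your agent's coroutine, call context = await tune_in(["pipeline", "leads", "quota"]) and prepend the result to the system prompt. The read happens before the LLM call, so nothing blocks mid-generation.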
The Mental Model Shift
Tool calls are pull — the agent asks for data at generation time, blocking until the answer comes back.
SignalMesh is push — data is broadcast when it changes. By the time the agent runs, the context is already warm. The system prompt is pre-loaded. The LLM call fires immediately.
This matters more as your fleet grows. With pull architecture, cost and latency scale with agents × fetches. With push architecture, cost scales with events — how often your data actually changes — not with how many agents are reading it.
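The scaling argument can be made concrete with a rough per-session operation count. A sketch, assuming each pull fetch is one LLM-generated tool call and each push read is a plain lookup with no tool schema; SessionCost, pull_cost, and push_cost are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCost:
    tool_calls: int   # LLM-generated calls, each carrying a tool schema
    broadcasts: int   # writes to the mesh when data changes
    lookups: int      # schema-free reads served from the mesh


def pull_cost(agents: int, fetches_per_agent: int) -> SessionCost:
    # Every read is a tool call, whether or not the data changed.
    return SessionCost(tool_calls=agents * fetches_per_agent, broadcasts=0, lookups=0)


def push_cost(agents: int, fetches_per_agent: int, change_events: int) -> SessionCost:
    # Writes track how often data changes; reads become plain lookups.
    return SessionCost(tool_calls=0, broadcasts=change_events, lookups=agents * fetches_per_agent)


if __name__ == "__main__":
    for fleet in (5, 20, 72):
        print(fleet, pull_cost(fleet, 3), push_cost(fleet, 3, change_events=1))
```

Growing the fleet from 5 to 72 agents multiplies pull tool calls by 14.4, while push broadcasts stay fixed at the number of change events.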
Zero-Config Integration: AGENTS.md
The hardest part of adopting a new protocol is wiring it in. SignalMesh solves this with a single file.
Drop AGENTS.md into any repo. An AI scanning the repo reads the 3-step lifecycle, learns the broadcast frequencies, and calls POST /api/manifest/ingest with the file's URL. The node registers itself into the spatial grid automatically — no human config required.
```bash
# Register any repo with an AGENTS.md
curl -X POST https://acecalisto3-signalmesh.hf.space/ui/manifest/ingest \
  -H "Content-Type: application/json" \
  -d '{"url": "https://raw.githubusercontent.com/yourorg/yourrepo/main/AGENTS.md"}'
```
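The same registration call in Python, using only the standard library. A sketch that mirrors the curl command; build_ingest_request and register_repo are hypothetical helpers, the endpoint path and {"url": ...} body are taken from that command, and the JSON response is an assumption.

```python
from __future__ import annotations

import json
import urllib.request

MESH_URL = "https://acecalisto3-signalmesh.hf.space"


def build_ingest_request(manifest_url: str, base_url: str = MESH_URL) -> urllib.request.Request:
    """Build the POST that asks the mesh to ingest an AGENTS.md manifest."""
    return urllib.request.Request(
        f"{base_url}/ui/manifest/ingest",
        data=json.dumps({"url": manifest_url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_repo(manifest_url: str, base_url: str = MESH_URL, timeout: float = 30.0) -> dict:
    """Send the ingest request and return the decoded JSON response (assumed JSON)."""
    req = build_ingest_request(manifest_url, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: register_repo("https://raw.githubusercontent.com/yourorg/yourrepo/main/AGENTS.md"). Splitting request construction from sending keeps the request inspectable without touching the network.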
Every payload passes through SEC-Ω before it touches the mesh — injection pattern matching, size limits, fingerprinting. The mesh auto-selects context GC strategy per signal (STATE_OVERWRITE for keyed state, TTL_DECAY for time-sensitive data, ROLLING_BUFFER for event streams).
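SEC-Ω's internals aren't described here, so the following is only a hypothetical illustration of the three checks it names: size limits, injection pattern matching, and fingerprinting. The byte limit, the regex patterns, and the SHA-256 fingerprint are assumptions chosen for the example, not SignalMesh's actual rules.

```python
from __future__ import annotations

import hashlib
import json
import re

MAX_PAYLOAD_BYTES = 8_192  # assumed limit for this sketch
# A few illustrative prompt-injection markers; real filters need far more.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"</?\s*system\s*>", re.IGNORECASE),
]


class RejectedPayload(ValueError):
    """Raised when a broadcast payload fails screening."""


def screen_payload(payload: dict) -> str:
    """Validate a broadcast payload and return its fingerprint.

    Raises RejectedPayload if the payload is too large or matches an
    injection pattern.
    """
    # Canonical JSON: same content always serializes (and hashes) the same way.
    raw = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    if len(raw.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise RejectedPayload("payload exceeds size limit")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(raw):
            raise RejectedPayload(f"matched injection pattern: {pattern.pattern}")
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

A stable fingerprint also makes it easy to drop duplicate broadcasts before they reach the store.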
Try It Now
The mesh is live. No account required on /ui/ routes.
```python
import httpx

HF = "https://acecalisto3-signalmesh.hf.space"
YOUR_BASE_PROMPT = "You are a sales-ops assistant."  # replace with your own prompt

# Broadcast context (once, when data changes)
r = httpx.post(f"{HF}/ui/broadcast", json={
    "frequency": "pipeline-state",
    "payload": {"deals": 14, "value": 82000, "close_rate": 0.31},
}, timeout=30)
r.raise_for_status()

# Any agent tunes in (no LLM call, no tool schema tokens)
r = httpx.post(f"{HF}/ui/tune_in", json={
    "keywords": ["pipeline-state", "quota"],
}, timeout=30)
r.raise_for_status()
ctx = r.json()

# Inject the hydrated context block into your system prompt
system_prompt = ctx["context"] + "\n\n" + YOUR_BASE_PROMPT
```
Live demo: kyklos.io
GitHub (MIT): Ig0tU/SignalMesh
API docs: acecalisto3-signalmesh.hf.space/docs
The Write Side: Tool-less Action Protocol
The read-side savings above are the obvious win, but SignalMesh also eliminates tool schema overhead on writes.
In a standard agent, broadcasting a signal requires a registered tool:
```python
# Standard: write operation as an explicit tool call
tools = [SignalBroadcastTool()]  # schema injected into every LLM call: +200 tokens
result = await agent.run(
    "Broadcast current pipeline state to the mesh",
    tools=tools
)
# LLM generates: {"name": "signal_broadcast", "args": {"frequency": "pipeline", ...}}
# Tool schema burns tokens whether the agent uses it or not
```
With the tool-less protocol, the schema never touches the context window. The agent emits a structured action tag in natural language output:
```python
# SignalMesh: write operation as an action tag — zero schema tokens
system_prompt = """
When you need to broadcast to the mesh, emit:
<execute type="broadcast" frequency="pipeline">
{"deals": 14, "value": 82000}
</execute>
"""
result = await llm.complete(system=system_prompt + context)
# Agent output contains the <execute> tag
# AgentifiedToolParser intercepts it — no tool schema ever in context
```
The parser runs on every LLM response and translates <execute> tags into live mesh broadcasts. The tool definition — its name, description, parameter schema — is never serialized into the context window.
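AgentifiedToolParser's source isn't shown here, so this is a minimal sketch of the idea, not its implementation: scan the model's output for execute tags, parse the JSON body, and hand each broadcast to a callback. parse_execute_tags, dispatch, and both regexes are hypothetical; a production parser would need stricter handling of malformed or nested tags.

```python
from __future__ import annotations

import json
import re
from typing import Callable

# Matches <execute ...attrs...> body </execute>; DOTALL lets the body span lines.
EXECUTE_TAG = re.compile(r"<execute\b([^>]*)>(.*?)</execute>", re.DOTALL)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_execute_tags(text: str) -> list[dict]:
    """Extract action tags from LLM output as {"type", "frequency", "payload"} dicts."""
    actions = []
    for attrs, body in EXECUTE_TAG.findall(text):
        fields = dict(ATTRIBUTE.findall(attrs))  # attribute order doesn't matter
        try:
            payload = json.loads(body.strip())
        except json.JSONDecodeError:
            continue  # skip malformed bodies instead of broadcasting garbage
        actions.append({
            "type": fields.get("type"),
            "frequency": fields.get("frequency"),
            "payload": payload,
        })
    return actions


def dispatch(text: str, broadcast: Callable[[str, dict], None]) -> int:
    """Send every broadcast action found in `text`; return how many were sent."""
    sent = 0
    for action in parse_execute_tags(text):
        if action["type"] == "broadcast" and action["frequency"]:
            broadcast(action["frequency"], action["payload"])
            sent += 1
    return sent
```

In a live loop you would call dispatch(response_text, broadcast) on each LLM response, with broadcast being a function that posts to /ui/broadcast.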
Combined savings across a session:
| Operation | Standard | SignalMesh |
|---|---|---|
| Context read (tune_in) | 300 tokens + 600ms | 0 tokens + 1.69µs |
| Context write (broadcast) | 200 tokens + tool call | 0 tokens + tag parse |
| Per session (5 agents, each with 3 reads and 2 writes) | ~6,500 scaffold tokens (5 × (3 × 300 + 2 × 200)) | 0 scaffold tokens |
The tool-less protocol doesn't replace tool calls for actions that need guaranteed execution with structured error handling (database writes, external API calls). It targets the ambient operations — the constant low-level mesh broadcasts that happen dozens of times per session — where schema overhead is pure waste.
FAQ
Does this replace tool calls entirely?
For context reads, yes. For actions (writing to a database, sending an email), tool calls are still the right pattern. SignalMesh targets context reads and ambient mesh broadcasts, the parts that are redundant and expensive.
What if the mesh is down?
Your agents fall back to their existing tool calls. SignalMesh is additive — you add tune_in() calls alongside existing logic, not instead of them, until you're confident.
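The additive rollout can be expressed as a small wrapper: try the mesh first and drop back to the existing tool-call path on any failure or empty result. A sketch with hypothetical names; read_mesh and fetch_via_tools stand for your tune_in() call and your current tool-based fetch.

```python
from __future__ import annotations

import logging
from typing import Callable

log = logging.getLogger(__name__)


def get_context(
    keywords: list[str],
    read_mesh: Callable[[list[str]], str],
    fetch_via_tools: Callable[[list[str]], str],
) -> str:
    """Prefer ambient mesh context; fall back to the existing tool path."""
    try:
        context = read_mesh(keywords)
    except Exception as exc:  # network error, timeout, bad response, ...
        log.warning("mesh unavailable, falling back to tools: %s", exc)
        return fetch_via_tools(keywords)
    if not context:
        # Nothing broadcast yet for these keywords: use the old path.
        return fetch_via_tools(keywords)
    return context
```

Once the mesh path proves reliable, the fallback can be removed or kept as a safety net.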
How does it handle stale data?
Three GC strategies: STATE_OVERWRITE (latest value wins, keyed by signal_id), TTL_DECAY (auto-expires after ttl_ms), and ROLLING_BUFFER (FIFO-capped by token budget). Strategy is auto-selected per signal based on the payload shape — no config needed.
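The three strategies can be illustrated with a small store. A sketch only: SignalStore and its methods are hypothetical, tokens are approximated by whitespace-separated words, the rolling buffer is shared across signals for brevity, and auto-selection by payload shape is not modeled (the strategy is passed in explicitly).

```python
from __future__ import annotations

import time
from collections import deque
from enum import Enum


class Strategy(Enum):
    STATE_OVERWRITE = "state_overwrite"
    TTL_DECAY = "ttl_decay"
    ROLLING_BUFFER = "rolling_buffer"


class SignalStore:
    def __init__(self, token_budget: int = 50) -> None:
        self.state: dict[str, str] = {}              # STATE_OVERWRITE: signal_id -> latest value
        self.ttl: dict[str, tuple[str, float]] = {}  # TTL_DECAY: signal_id -> (value, expires_at)
        self.buffer: deque[str] = deque()            # ROLLING_BUFFER: oldest first
        self.token_budget = token_budget

    def put(self, strategy: Strategy, signal_id: str, value: str,
            ttl_ms: int = 0, now: float | None = None) -> None:
        now = time.monotonic() if now is None else now
        if strategy is Strategy.STATE_OVERWRITE:
            self.state[signal_id] = value  # latest value wins
        elif strategy is Strategy.TTL_DECAY:
            self.ttl[signal_id] = (value, now + ttl_ms / 1000)
        else:
            self.buffer.append(value)
            # FIFO eviction until the buffer fits the (approximate) token budget
            while sum(len(v.split()) for v in self.buffer) > self.token_budget:
                self.buffer.popleft()

    def read(self, now: float | None = None) -> list[str]:
        now = time.monotonic() if now is None else now
        # Drop expired TTL entries lazily on read.
        self.ttl = {k: (v, exp) for k, (v, exp) in self.ttl.items() if exp > now}
        return list(self.state.values()) + [v for v, _ in self.ttl.values()] + list(self.buffer)
```

Each strategy bounds staleness differently: overwrite by recency, decay by age, and the buffer by size.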
Is there a self-hosted option?
Yes. Docker image in the repo. docker run -p 7860:7860 signalmesh and point your agents at localhost:7860. Full MIT license.