AURA: Handshake the Structure, Then Send the Change

#ai #agents #buildinpublic #opensource

Agent traffic has a strange property: almost every byte is a repeat. Two AI systems exchanging MCP tool calls, A2A task updates, or OpenAI-style function calls send jsonrpc, method, params, trace_id, task_id, and the same schema fragments thousands of times per minute. The values change. The structure barely does.

AURA is an experimental, protocol-aware data-movement toolkit built around that observation. Its main path is AIWire: a negotiated structure side channel that lets two peers agree on message structure once, then move compact deltas over ordinary TCP, WebSocket, HTTP, or broker links instead of re-sending whole JSON frames.

The steady state AIWire aims for is not "send a whole frame more cheaply." It is "handshake the structure, then send the change."

Why stateless compression leaves so much on the table

The obvious fix for verbose JSON is gzip or zlib per message. That works, but it has two structural problems for agent traffic:

Every frame pays setup cost. Stateless compression treats each message as unrelated text and rediscovers the same patterns every time.
History is thrown away. Frame 4,000 of a session looks almost identical to frame 3,999, but a per-frame codec cannot use that.

AIWire keeps a live compression stream per direction across the whole session, seeds it with a static dictionary of common AI protocol fields, and lets peers negotiate session-specific templates on top. After the handshake, the hot path carries only what changed against structure both sides already share.
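The difference between per-frame and session-long compression can be reproduced with nothing but Python's standard `zlib` module. The sketch below is not AIWire's codec: the seed dictionary, the frame shape, and the helper names (`make_frames`, `stateless_sizes`, `stateful_roundtrip`) are all assumptions made for illustration. It only shows the general technique of one long-lived deflate stream seeded with a preset dictionary and flushed at each frame boundary.

```python
import json
import zlib

# Illustrative seed dictionary (an assumption, not AURA's real static dictionary).
SEED_DICT = (
    b'{"jsonrpc":"2.0","id":,"method":"tools/call","params":{"name":"",'
    b'"arguments":{"uri":""}},"trace_id":"","task_id":""}'
)


def make_frames(n):
    """Build n JSON-RPC-shaped frames: same structure, different values."""
    frames = []
    for i in range(n):
        msg = {
            "jsonrpc": "2.0",
            "id": i,
            "method": "tools/call",
            "params": {"name": "read_file",
                       "arguments": {"uri": f"repo://service/file_{i}.py"}},
            "trace_id": f"trace-{i:06d}",
            "task_id": "task-0001",
        }
        frames.append(json.dumps(msg, separators=(",", ":")).encode())
    return frames


def stateless_sizes(frames):
    """Compress every frame on its own; no history survives between frames."""
    return [len(zlib.compress(f, 6)) for f in frames]


def stateful_roundtrip(frames):
    """Compress all frames through one stream, flushing after each frame."""
    comp = zlib.compressobj(level=6, zdict=SEED_DICT)
    decomp = zlib.decompressobj(zdict=SEED_DICT)
    sizes, restored = [], []
    for f in frames:
        # Z_SYNC_FLUSH emits a decodable chunk but keeps the history window.
        chunk = comp.compress(f) + comp.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(chunk))
        restored.append(decomp.decompress(chunk))
    return sizes, restored


frames = make_frames(50)
sizes, restored = stateful_roundtrip(frames)
print("raw bytes:      ", sum(len(f) for f in frames))
print("zlib per frame: ", sum(stateless_sizes(frames)))
print("stateful stream:", sum(sizes))
```

The price of the stateful path is visible in the code: the decompressor must see every chunk, in order, or its history window no longer matches the sender's.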

The three-lane model

The part of the design I find most interesting is that AIWire refuses to treat a connection as one undifferentiated pipe. It splits AI traffic into three logical lanes over whatever transport you already have:

The semantic/message lane carries the actual agent messages: MCP tool calls, JSON-RPC requests and responses, A2A task and artifact updates, traces, handoffs, results. This is the lane the dictionary, session templates, and stateful delta stream optimize.

The control/session lane carries the machinery that keeps the semantic lane safe: handshakes, template discovery, dictionary diffs, ACK/NACK, resume negotiation, heartbeats, and reset signals. The spec requires that control messages stay decodable without decompressing (inflating) the semantic stream. If the compressed stream is resyncing or has failed, you can still read the control lane and recover. Your ops path never depends on the health of the compression state it is trying to fix.

The blob descriptor lane handles the things that should never go through a structured-message codec at all: media, tensor chunks, model artifacts, log archives. The bytes move over a normal blob or file transport. AIWire carries the metadata: content type, SHA-256 digests, chunk manifests, route, priority, and transfer status. A receiver can schedule, verify, and account for a 2 GB artifact without ever pulling it through the message path, and a semantic-lane reset does not invalidate a completed digest-verified transfer.
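To make the descriptor idea concrete, here is a minimal sketch using only `hashlib` and `dataclasses`. The `BlobDescriptor` class, its field names, and the helper functions are hypothetical and are not taken from the AIWire spec; the sketch only shows how a whole-blob digest plus a per-chunk manifest lets a receiver verify bytes that arrived over some other transport.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlobDescriptor:
    """Hypothetical descriptor: metadata only, never the blob bytes."""
    content_type: str
    total_size: int
    sha256: str                      # digest of the whole blob
    chunk_size: int
    chunk_sha256: List[str] = field(default_factory=list)  # per-chunk manifest
    route: str = "default"           # illustrative routing hint
    priority: int = 0                # illustrative scheduling hint


def describe_blob(data: bytes, content_type: str,
                  chunk_size: int = 1 << 20) -> BlobDescriptor:
    """Build a descriptor with a whole-blob digest and a chunk manifest."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return BlobDescriptor(
        content_type=content_type,
        total_size=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        chunk_size=chunk_size,
        chunk_sha256=[hashlib.sha256(c).hexdigest() for c in chunks],
    )


def verify_chunk(desc: BlobDescriptor, index: int, chunk: bytes) -> bool:
    """Check one received chunk against the manifest entry at that index."""
    return hashlib.sha256(chunk).hexdigest() == desc.chunk_sha256[index]


def verify_blob(desc: BlobDescriptor, data: bytes) -> bool:
    """Check the reassembled blob against the size and whole-blob digest."""
    return (len(data) == desc.total_size
            and hashlib.sha256(data).hexdigest() == desc.sha256)
```

Nothing in the descriptor depends on compression state, so a verification like this remains valid even if the semantic lane has to reset.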

The separation is a safety argument as much as a performance one. Under congestion, control messages get priority over bulk bytes. Blob descriptors are forbidden from mutating the session dictionary. Each lane fails independently.
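One way to picture "control gets priority under congestion" is a small priority queue in front of the transport. This is a sketch, not AURA's scheduler: the `Lane` enum, the `LaneScheduler` class, and in particular the relative ordering of the semantic and blob descriptor lanes are assumptions. The only rule taken from the text is that control traffic drains first.

```python
import heapq
import itertools
from enum import IntEnum


class Lane(IntEnum):
    """Lower value drains first. Only CONTROL-first comes from the text;
    the ordering of the other two lanes is an assumption."""
    CONTROL = 0
    SEMANTIC = 1
    BLOB_DESCRIPTOR = 2


class LaneScheduler:
    """Hypothetical outbound queue: strict lane priority, FIFO within a lane."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves arrival order

    def enqueue(self, lane: Lane, payload: bytes) -> None:
        heapq.heappush(self._heap, (int(lane), next(self._seq), payload))

    def dequeue(self):
        lane, _, payload = heapq.heappop(self._heap)
        return Lane(lane), payload

    def __len__(self):
        return len(self._heap)


sched = LaneScheduler()
sched.enqueue(Lane.BLOB_DESCRIPTOR, b"artifact-manifest")
sched.enqueue(Lane.SEMANTIC, b"tools/call #1")
sched.enqueue(Lane.CONTROL, b"heartbeat")
sched.enqueue(Lane.SEMANTIC, b"tools/call #2")

while len(sched):
    lane, payload = sched.dequeue()
    print(lane.name, payload)
```

The sequence counter matters for the semantic lane in particular: a stateful stream cannot tolerate reordering, so frames within a lane must leave in the order they arrived.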

Fail closed, by contract

Shared compression state is dangerous if the two sides ever disagree, so the AIWire v1 spec is aggressive about verification:

The handshake compares static dictionary SHA-256 and byte size, template hashes and counts, and zlib parameters. Any mismatch fails closed; falling back to raw or per-frame zlib happens only if the application explicitly allowed it.
Session dictionary growth is append-only, epoch-numbered, and proposed through diffs that carry previous and next state hashes, a fresh nonce, a diff identity hash, and an optional HMAC-SHA256 tag. A sender may not encode against new structure until the matching ACK is verified.
Resume handshakes let a client reconnect against a cached dictionary state, but only if the receiver actually holds one of the offered state hashes.
Any inflate error, hash mismatch, or ordering violation means stop, rehandshake, or fall back. The spec's phrasing: peers must not continue sending compact deltas against uncertain structure.
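The dictionary-diff rules above can be sketched with `hashlib` and `hmac` from the standard library. Everything here is an illustrative assumption: the field names, the JSON canonicalization, the way state is hashed, and the function names are not the AIWire wire format. What the sketch does follow from the text is the shape of the contract: append-only growth, epoch numbering, previous and next state hashes, a fresh nonce, a diff identity hash, an HMAC-SHA256 tag, and no use of new structure until a matching ACK is verified.

```python
import hashlib
import hmac
import json
import os


class DiffRejected(Exception):
    """Any failed check: the caller should stop, rehandshake, or fall back."""


def state_hash(entries):
    """Hash an ordered entry list; length prefixes keep boundaries unambiguous."""
    h = hashlib.sha256()
    for entry in entries:
        data = entry.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


def _canonical(body):
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def propose_diff(entries, additions, epoch, key):
    """Sender side: describe an append-only step from the current state."""
    body = {
        "epoch": epoch + 1,
        "prev_hash": state_hash(entries),
        "next_hash": state_hash(entries + list(additions)),
        "additions": list(additions),
        "nonce": os.urandom(16).hex(),
    }
    payload = _canonical(body)
    return {**body,
            "diff_id": hashlib.sha256(payload).hexdigest(),
            "tag": hmac.new(key, payload, hashlib.sha256).hexdigest()}


def apply_diff(entries, epoch, diff, key, seen_nonces):
    """Receiver side: verify everything, then append. Returns (entries, epoch, ack)."""
    fields = ("epoch", "prev_hash", "next_hash", "additions", "nonce")
    body = {k: diff[k] for k in fields}
    payload = _canonical(body)
    expected_tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_tag, diff["tag"]):
        raise DiffRejected("bad HMAC tag")
    if hashlib.sha256(payload).hexdigest() != diff["diff_id"]:
        raise DiffRejected("diff identity mismatch")
    if body["nonce"] in seen_nonces:
        raise DiffRejected("replayed nonce")
    if body["epoch"] != epoch + 1:
        raise DiffRejected("epoch out of order")
    if body["prev_hash"] != state_hash(entries):
        raise DiffRejected("previous state mismatch")
    new_entries = entries + body["additions"]  # append-only growth
    if state_hash(new_entries) != body["next_hash"]:
        raise DiffRejected("next state mismatch")
    seen_nonces.add(body["nonce"])
    ack = {"diff_id": diff["diff_id"], "state_hash": body["next_hash"]}
    return new_entries, body["epoch"], ack


def ack_matches(diff, ack):
    """Sender side: only encode against the new structure if this is True."""
    return (ack.get("diff_id") == diff["diff_id"]
            and ack.get("state_hash") == diff["next_hash"])
```

Every failure path raises instead of guessing, which is the fail-closed behavior in miniature: the receiver's dictionary is left untouched unless all checks pass.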

The metric is exchanges, not ratio

AURA's docs are explicit that compression ratio alone is the wrong scoreboard. The question is how many verified semantic exchanges fit through a link once bandwidth, p95 latency, and codec CPU are accounted for.

On a modeled 10 Mbps link with protocol-shaped request/response traffic (native C++ backend, 2026-07-04):

Codec	Bytes/exchange	Bandwidth-capped ex/s	Gain over raw
raw JSON	1,177	1,756	1.00x
zlib per frame	696	2,992	1.70x
AIWire	157	11,017	6.28x
AIToken + AIWire	125	12,948	7.38x

A live TCP replay of the committed public session corpus, with 64 concurrent logical agents and SHA-256 verification of every response, pushed further: AIWire averaged 45.6 bytes per exchange for a 24x bandwidth gain, and the combined AIToken + AIWire path hit 32.3 bytes per exchange, a 34x gain with 97.1% of bytes saved. At that point the modeled link was no longer the bottleneck; the runtime could not keep enough requests in flight to fill the headroom.

That last detail is the honest core of the project. Smaller frames only matter if your system has enough concurrent work to use the room they create. AURA ships the extrapolation tooling to reason about exactly that: given a bandwidth, a p95 latency, and a per-agent window, how many agents does it take to saturate the link.
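The saturation question can be approximated on the back of an envelope. The sketch below is not AURA's extrapolation tooling, and it is not claimed to reproduce the table above: it is a deliberately simple bytes-only model with hypothetical function names and made-up inputs. It assumes each agent keeps at most `window` exchanges in flight and that each exchange takes one p95 latency to complete, so per-agent throughput follows Little's law.

```python
import math


def link_cap_exchanges_per_s(bandwidth_bps: float, bytes_per_exchange: float) -> float:
    """Exchanges per second the link can carry, counting payload bytes only."""
    return bandwidth_bps / 8 / bytes_per_exchange


def per_agent_exchanges_per_s(window: int, p95_latency_s: float) -> float:
    """Little's law: throughput = in-flight exchanges / time per exchange."""
    return window / p95_latency_s


def agents_to_saturate(bandwidth_bps: float, bytes_per_exchange: float,
                       p95_latency_s: float, window: int) -> int:
    """Smallest agent count whose combined demand reaches the link cap."""
    cap = link_cap_exchanges_per_s(bandwidth_bps, bytes_per_exchange)
    return math.ceil(cap / per_agent_exchanges_per_s(window, p95_latency_s))


# Made-up inputs: 1 Mbps link, 125 bytes per exchange, 125 ms p95, window of 4.
print(agents_to_saturate(1_000_000, 125, 0.125, 4))  # prints 32
```

The useful property of even this crude model is the direction it points: shrinking bytes per exchange raises the link cap, so the number of agents needed to fill the link goes up, not down.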

Where it fits

AURA is for situations where you control both ends of the link and the traffic has repeated structure:

Multi-agent request/response loops. Orchestrators, workers, and reviewers exchanging thousands of small task, status, and result messages.
MCP and JSON-RPC tool traffic. Tool calls and tool results are the canonical case of stable structure with changing values.
Local AI clusters and edge links. The repo's LAN benchmark runs a Mac against a Z6 workstation and Jetson Nano-class boards; a bandwidth-limited edge mesh is exactly where an 86 to 97% byte reduction converts into headroom for telemetry, media, and retries.
Structured logs and traces. Repeated field names, session-stable shapes, high volume.
Binary payload routing. Agents that need to schedule, verify, and track opaque artifacts by digest without moving the bytes through the message path.

What it is not

The README is unusually direct about limits, and it is worth repeating them. AURA is not a drop-in replacement for gzip, zstd, TLS, or a message broker. It does not define transport security, retries, or backpressure; those stay at the transport layer. The stateful stream means frames cannot be reordered or dropped inside a session, so lossy transports need their own recovery layer. And it is not production-ready: it is a prototyping and measurement toolkit with a working Python path, a native C++ backend, deterministic public fixtures, and reproducible benchmark harnesses.

That fixture corpus deserves a mention. The repo commits a synthetic public session corpus covering MCP, A2A, OpenAI Responses, traces, handoffs, and memory writes, wrapped in the full side-channel lifecycle: forced handshake, template update, authenticated dictionary diff, ACK, and resume. Anyone can replay the exact benchmark and check the numbers.

Trying it

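```python
from aura_compression import AIWireSessionEncoder, AIWireSessionDecoder

# A typical MCP tool call: stable structure, changing values.
message = {
    "protocol": "mcp",
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"uri": "repo://service/path.py"}},
}

# The encoder and decoder each hold one side of the session state.
with AIWireSessionEncoder(level=3) as encoder, AIWireSessionDecoder() as decoder:
    delta = encoder.compress_message(message)     # compact delta for the wire
    restored = decoder.decompress_message(delta)  # rebuilt on the receiving side

# The round trip must be lossless.
assert restored == message
```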

The repo includes transport examples for length-prefixed TCP, WebSocket, HTTP with Server-Sent Events, and a local broker, plus the full benchmark harness used for the numbers above.

Agent-to-agent traffic is growing faster than the links it runs on, and most of it is the same structure sent again and again. AURA's bet is that the fix belongs in a negotiated session protocol, not a per-frame codec. The three-lane model, the fail-closed handshake contract, and the exchanges-per-second scoreboard are what make it worth watching.

AURA is Apache 2.0 licensed. Code, spec, fixtures, and benchmark reports: github.com/H-XX-D/AURA.