Most multi-agent architectures have the same shape: a coordinator talks to workers through a central hub. The hub is usually a message queue, a shared database, or an orchestration service like Ray or Temporal.
That hub is also the first thing that breaks. It's a single point of failure, a scaling bottleneck, and an operational cost you pay even when the agents aren't working.
Here's how to build a fleet where agents find each other and route tasks without any central intermediary.
The Central Hub Problem
When you're spinning up a 5-agent prototype, a central coordinator makes sense. It's simple, debuggable, and gets out of your way.
At 50 agents it starts to fray. At 500 it becomes your hardest reliability problem.
The hub becomes a global lock. Every message goes through it. Every failure cascades through it. Every scaling decision has to account for it.
The alternative — having agents discover and contact each other directly — sounds appealing but has historically been hard. How does Agent A know Agent B's address? How do you handle NAT traversal? How do you authenticate the connection?
These are solved problems in networking. We just haven't applied the solutions to agents until now.
Peer-to-Peer at the Session Layer
Pilot Protocol operates at OSI Layer 5, the session layer, roughly the slot TLS occupies for the web (TLS does not map cleanly onto a single OSI layer). It gives each agent:
- A permanent 48-bit address (0:A91F.0000.7C2E)
- Automatic NAT traversal (STUN → hole-punch → relay fallback for symmetric NATs)
- End-to-end encrypted tunnels (X25519 key exchange, AES-256-GCM, Ed25519 identity)
- A global directory (the backbone) for agent discovery
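To make the address format concrete, here is a minimal Python sketch that parses and re-formats an address written like the example above. The layout is inferred only from that one example: a decimal prefix before the colon (whose meaning this post doesn't define) followed by three 4-digit hex groups carrying the 48 bits. `parse_address` and `format_address` are hypothetical helpers, not part of Pilot.

```python
import re

# Assumed textual layout, inferred only from the example address in this post:
# "<prefix>:<hex4>.<hex4>.<hex4>", where the three hex groups carry the 48 bits.
ADDRESS_RE = re.compile(
    r"^(\d+):([0-9A-Fa-f]{4})\.([0-9A-Fa-f]{4})\.([0-9A-Fa-f]{4})$"
)


def parse_address(text: str) -> tuple[int, int]:
    """Return (prefix, 48-bit value) for an address like 0:A91F.0000.7C2E."""
    match = ADDRESS_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised address: {text!r}")
    prefix = int(match.group(1))
    value = int("".join(match.group(2, 3, 4)), 16)
    return prefix, value


def format_address(prefix: int, value: int) -> str:
    """Inverse of parse_address: render the value as three 16-bit hex groups."""
    if not 0 <= value < 1 << 48:
        raise ValueError("value does not fit in 48 bits")
    digits = f"{value:012X}"
    return f"{prefix}:{digits[0:4]}.{digits[4:8]}.{digits[8:12]}"


prefix, value = parse_address("0:A91F.0000.7C2E")
print(prefix, hex(value))
print(format_address(prefix, value))
```

Three 16-bit groups give exactly 48 bits, which is why the formatter pads to 12 hex digits before splitting.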
With Pilot, the hub isn't a server you run. It's the network itself — and the network is maintained by the protocol, not by your ops team.
A Fleet Pattern That Actually Works
Here's a concrete pattern for a research fleet:
             Coordinator agent
                     ↓ Pilot (P2P, encrypted)
[Specialist A] [Specialist B] [Specialist C]
      ↓              ↓              ↓
    Papers        FX data      News feeds
Each specialist registers its capabilities on the Pilot backbone when it starts. The coordinator queries the backbone — "I need a peer that can resolve academic citations" — and gets back the address of Specialist A. Direct connection from there.
No service registry you maintain. No hardcoded addresses. No configuration file you update when a worker moves.
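The register-then-query flow can be sketched with an in-memory stand-in for the directory. This is not Pilot's API: the `Directory` class, its method names, and the capability strings are all hypothetical, and the FX and news addresses are placeholders. The sketch only shows the shape of the interaction, where workers announce what they can do and the coordinator asks by capability rather than by address.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """In-memory stand-in for the backbone directory (hypothetical, not Pilot's API)."""

    _by_capability: dict[str, list[str]] = field(default_factory=dict)

    def register(self, address: str, capabilities: list[str]) -> None:
        """Record that the peer at `address` offers each listed capability."""
        for capability in capabilities:
            peers = self._by_capability.setdefault(capability, [])
            if address not in peers:
                peers.append(address)

    def unregister(self, address: str) -> None:
        """Remove a peer from every capability it had announced."""
        for peers in self._by_capability.values():
            if address in peers:
                peers.remove(address)

    def find(self, capability: str) -> list[str]:
        """Return the addresses of peers that announced `capability`."""
        return list(self._by_capability.get(capability, []))


directory = Directory()
# Each specialist registers on startup. The first address is the one shown in
# the ping output later in this post; the other two are placeholders.
directory.register("0:4B2E.0000.1A3D", ["citations"])
directory.register("0:0000.0000.0002", ["fx-rates"])
directory.register("0:0000.0000.0003", ["news"])

# The coordinator asks by capability and then connects directly to the result.
print(directory.find("citations"))
```

Because the coordinator never stores an address, a specialist that moves only needs to register again; nothing on the coordinator side changes.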
The Code
Getting an agent online:
curl -fsSL https://pilotprotocol.network/install.sh | sh
pilotctl daemon start --hostname coordinator
That's it. The agent is addressable, authenticated, and reachable from any other Pilot peer — regardless of NAT, firewall, or cloud region.
For the specialists:
# Run one of these per worker node, using that node's own hostname
pilotctl daemon start --hostname specialist-papers
pilotctl daemon start --hostname specialist-fx
pilotctl daemon start --hostname specialist-news
Each one joins the backbone automatically. The coordinator can ping them:
pilotctl ping specialist-papers
# ✓ reply from 0:4B2E.0000.1A3D · 22ms
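If the coordinator wants to check the whole fleet at once, one option is to wrap the same `pilotctl ping` command in a small script. The sketch below is in Python and rests on an assumption this post doesn't confirm: that `pilotctl ping` returns on its own and exits non-zero when the peer is unreachable. The `check_fleet` helper and its injectable `ping` argument are hypothetical.

```python
import subprocess
from typing import Callable

SPECIALISTS = ["specialist-papers", "specialist-fx", "specialist-news"]


def run_ping(hostname: str) -> bool:
    """Run `pilotctl ping <hostname>` and report whether it exited with status 0.

    Assumption: pilotctl signals an unreachable peer with a non-zero exit
    status. A missing binary or a timeout is also treated as unreachable.
    """
    try:
        result = subprocess.run(
            ["pilotctl", "ping", hostname],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_fleet(
    hostnames: list[str], ping: Callable[[str], bool] = run_ping
) -> dict[str, bool]:
    """Ping every hostname and map it to True (reachable) or False."""
    return {hostname: ping(hostname) for hostname in hostnames}


if __name__ == "__main__":
    for hostname, reachable in check_fleet(SPECIALISTS).items():
        print(f"{hostname}: {'reachable' if reachable else 'unreachable'}")
```

Passing the ping function as a parameter keeps the check testable on a machine where `pilotctl` isn't installed.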
Self-Organization: How Groups Work
Beyond individual peer connections, Pilot has a concept of groups — clusters of agents that self-organize around a shared domain.
A trading fleet might form a TRADING group. A research fleet might join RESEARCH. Agents within a group can broadcast to all members or route to the most relevant peer within the domain.
This is closer to how human organizations actually work: a new employee joins the company and immediately has access to colleagues in their department, not just a single manager they have to route everything through.
The Pilot network status page shows these groups live: BACKBONE, TRAVEL, TRADING, RESEARCH, INSURANCE, and more, with real-time agent counts.
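The difference between broadcasting to a group and routing to its most relevant member can be sketched as follows. This is an illustration of the idea, not how Pilot implements groups: the `Group` and `Member` classes are hypothetical, and "most relevant" is assumed here to mean the largest overlap between a member's declared topics and the request's topics.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Member:
    address: str
    topics: set[str]
    inbox: list[str] = field(default_factory=list)


@dataclass
class Group:
    name: str
    members: list[Member] = field(default_factory=list)

    def join(self, member: Member) -> None:
        self.members.append(member)

    def broadcast(self, message: str) -> int:
        """Deliver the message to every member; return how many received it."""
        for member in self.members:
            member.inbox.append(message)
        return len(self.members)

    def route(self, message: str, topics: set[str]) -> Optional[Member]:
        """Deliver to the member whose topics overlap most with the request.

        Returns None, and delivers nothing, when no member matches any topic.
        """
        best = max(self.members, key=lambda m: len(m.topics & topics), default=None)
        if best is None or not (best.topics & topics):
            return None
        best.inbox.append(message)
        return best


research = Group("RESEARCH")
research.join(Member("0:4B2E.0000.1A3D", {"citations", "papers"}))
research.join(Member("0:0000.0000.0003", {"news"}))  # placeholder address

print(research.broadcast("fleet starting up"))
chosen = research.route("resolve this citation", {"citations"})
print(chosen.address if chosen else "no match")
```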
What You Give Up
Centralized orchestration isn't all downside. Going P2P costs you a few things:
Observability. A central hub is easy to instrument. A P2P mesh requires distributed tracing from day one. Plan for this.
Debuggability. When something goes wrong, "what was the message queue state at time T" is easier to answer than "what was the P2P graph state." Log aggressively at the agent level.
Simplicity. For a 3-agent prototype, a coordinator is simpler. P2P earns its complexity at scale.
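One way to act on the observability and debuggability advice is to attach a trace ID to every message and log one structured line per event at each agent. The sketch below is hand-rolled for illustration and is not a Pilot feature or a tracing library: the envelope fields and the `new_message` and `log_event` helpers are hypothetical.

```python
import json
import time
import uuid
from typing import Optional


def new_message(payload: dict, parent: Optional[dict] = None) -> dict:
    """Build a message envelope that carries trace context from hop to hop.

    A message created in response to `parent` keeps the parent's trace_id and
    records the parent's span_id, so the chain of hops can be rebuilt later.
    """
    return {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent["span_id"] if parent else None,
        "payload": payload,
    }


def log_event(agent: str, event: str, message: dict) -> str:
    """Print one JSON line per event so logs from many agents join on trace_id."""
    record = {
        "ts": time.time(),
        "agent": agent,
        "event": event,
        "trace_id": message["trace_id"],
        "span_id": message["span_id"],
        "parent_span_id": message["parent_span_id"],
    }
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


request = new_message({"task": "resolve citation"})
log_event("coordinator", "sent", request)

reply = new_message({"result": "ok"}, parent=request)
log_event("specialist-papers", "replied", reply)
```

With every agent writing lines like these, "what happened to request X" becomes a filter on `trace_id` across the collected logs instead of a question about shared queue state.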
When to Switch
The right time to move to a P2P architecture is usually later than you think but earlier than you want. Signals that you're ready:
- You're spending meaningful eng time on coordinator reliability
- Agents in different cloud regions are paying latency costs to route through a central server
- You want agents from different operators to collaborate without giving either operator access to the other's infrastructure
- Your fleet is growing fast enough that a central bottleneck is becoming a scaling conversation
If two or more of those are true, the session-layer approach is worth the investment.
Further Reading
- Pilot Protocol documentation — addressing, groups, NAT traversal
- Multi-agent setups on Pilot — pre-wired fleet configurations
- The IETF Internet-Draft — the protocol spec if you want to go deep
The network is live: ~163,000 agents, 12.7B+ requests routed, +28% growth in the past week.
One line to get started: curl -fsSL https://pilotprotocol.network/install.sh | sh