The first time I got two agents talking across different network environments without any manual networking setup, I genuinely sat there for a second waiting for something to break. It didn't.
NAT has been the background tax on distributed systems forever. You write the application logic, it works perfectly in local dev, and then you try to run it across two machines on different networks and suddenly you're debugging ICE candidates, setting up a VPN, punching holes in firewalls, or just giving up and routing everything through a relay you built yourself. None of that is the actual work. It's overhead on overhead.
Pilot Protocol makes this overhead disappear, and I want to be specific about what that actually looks like across different deployment environments.
What You're Usually Fighting
If you've run distributed agent systems before, you know the production checklist. The cloud VM gets a public IP; that one's fine. Your dev laptop is behind your router's NAT, so you need a tunnel or a VPN to make it reachable. A second cloud VM on a different provider means a different IP and another set of security group rules to update. Edge devices and phones are effectively unreachable without significant infrastructure work.
The standard solutions are either "give everything a public IP and manage the firewall rules" or "put everything behind a VPN and manage the VPN." Both work. Both are operational overhead that scales with the number of nodes. Every new agent in the network is another thing to configure.
What I wanted was to not think about any of this.
Local Dev With Pilot Protocol
Setting up a local node takes a few minutes. You start the daemon, it generates an Ed25519 keypair that becomes your permanent identity on the network, and you have a virtual address that works regardless of what network you're actually on.
pilotctl daemon start
pilotctl info
That's it for local. The daemon registers your node and handles everything else in the background. Your laptop is now a first-class peer on the network with an address other agents can reach, without any port forwarding or tunneling on your end.
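Under the hood, that identity is just an Ed25519 keypair. Here's a rough Python sketch of the pattern using the cryptography library; the keypair generation matches what the daemon is described as doing, but deriving the virtual address from a hash of the public key is my own illustration, not Pilot Protocol's documented scheme.

# Sketch of the identity pattern: one Ed25519 keypair, generated once.
# The hash-derived address below is an assumption for illustration.
import hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The keypair that becomes the node's permanent identity.
private_key = Ed25519PrivateKey.generate()
public_bytes = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
)

# A stable address derived from the public key: it never changes as
# long as the keypair is kept, no matter what network the node is on.
virtual_address = hashlib.sha256(public_bytes).hexdigest()[:16]
print(f"virtual address: {virtual_address}")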
The dev workflow becomes exactly what you'd want. You're testing against the real network topology from day one, not a simulated local environment that behaves differently in production. The agents you're building locally can already talk to cloud nodes, to other developers' machines, to the 190,000+ agents already on the Pilot Protocol network. There's no "works on my machine" environment gap to bridge later.
Cloud Nodes
Spinning up a cloud node is the same process. Start the daemon on your VM, it gets an identity, it joins the network. The difference from a raw cloud VM is that you don't need to touch security groups or inbound rules for agent-to-agent communication. The overlay handles routing.
Where this gets useful fast is multi-cloud or multi-region setups. If you have agents running on different cloud providers, or in different regions that aren't peered at the network level, they can still reach each other through Pilot Protocol without you managing inter-provider networking. The virtual address is what matters. The underlying IP topology is the daemon's problem.
The Part Where NAT Stops Being Your Problem
Here's the specific mechanism: when two nodes can't reach each other directly (both behind NAT, or one behind NAT and the other not exposing the right ports), Pilot Protocol routes the connection through relay nodes. A session key is negotiated via X25519 key exchange, and traffic is encrypted end to end with AES-256-GCM, so the relay moves ciphertext without ever being able to read the content.
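If you want to see why the relay can't read anything, the recipe named above is straightforward to sketch. This is the generic X25519-then-AES-256-GCM pattern in Python, not Pilot Protocol's actual handshake code:

# Generic end-to-end encryption sketch: X25519 key agreement, then
# AES-256-GCM over the traffic. Illustrative only.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each peer generates an X25519 keypair and they exchange public keys.
alice = X25519PrivateKey.generate()
bob = X25519PrivateKey.generate()

# Both sides compute the same shared secret without it ever crossing
# the wire, so a relay in the middle never sees key material.
shared = alice.exchange(bob.public_key())
assert shared == bob.exchange(alice.public_key())

# Stretch the shared secret into a 256-bit AES-GCM session key.
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
           info=b"demo session").derive(shared)

# Everything the relay forwards looks like this ciphertext.
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"agent message", None)
assert AESGCM(key).decrypt(nonce, ciphertext, None) == b"agent message"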
From the application layer, you never see any of this. You address a peer by their virtual address or hostname. Whether that connection is direct or relayed is handled transparently. You write your agent communication logic once and it works across laptop, VM, edge device, and anything else running the daemon.
This is different from a VPN in an important way. A VPN gives every node a routable address within the VPN's address space, but you still manage who gets access to the VPN, key rotation, and the VPN's own infrastructure. The Pilot Protocol daemon handles all of that per-connection, per-peer, tied to cryptographic identity. There's no VPN to maintain because the overlay is the network.
Trust Across Environments
One thing that changes when you go multi-environment is that you have to decide which agents should be able to reach which other agents. Pilot Protocol handles this through bilateral handshakes: both peers have to approve the connection before any traffic flows, and the approval is recorded in the registry, signed by both parties' Ed25519 keys.
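The "signed by both parties" part is doing the real work there, so it's worth a toy sketch. The record format below is invented for illustration; the both-signatures-or-no-trust shape is the point.

# Toy bilateral approval: both peers sign the same record with their
# Ed25519 identity keys. The record layout is made up for this sketch.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

node_a = Ed25519PrivateKey.generate()
node_b = Ed25519PrivateKey.generate()

# Both parties sign an identical approval record.
record = json.dumps({"peer_a": "node-a", "peer_b": "node-b"}).encode()
sig_a = node_a.sign(record)
sig_b = node_b.sign(record)

# verify() raises InvalidSignature if either side never approved, so a
# one-sided handshake can't be passed off as mutual trust.
node_a.public_key().verify(sig_a, record)
node_b.public_key().verify(sig_b, record)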
In practice this means your production cloud agent isn't reachable by your local dev agent unless you've explicitly established trust between them. Which is the right default. You don't want a half-finished agent you're testing locally to accidentally connect to a production peer just because they're on the same network.
For agents that should be reachable by anyone (public data services, for example), auto-approve can be configured. The 50+ specialist agents on Network 9 all work this way, which is why you can query them without any manual approval step.
What Production Actually Looks Like
After running this for a while, the operational picture is simpler than I expected. The daemon runs as a background process on each node. Identities are persistent (tied to the Ed25519 keypair stored in ~/.pilot/identity.json), so restarts don't change your address or invalidate trust relationships. The registry handles the distributed state.
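That persistence is the whole trick behind stable addresses. Here's a rough load-or-create sketch; only the ~/.pilot/identity.json path comes from the post, and the JSON layout is my guess, not the daemon's actual file format.

# Load-or-create identity sketch. The file layout here is a guess;
# only the path matches what the post describes.
import json
from pathlib import Path
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

IDENTITY_PATH = Path.home() / ".pilot" / "identity.json"

def load_or_create_identity() -> Ed25519PrivateKey:
    if IDENTITY_PATH.exists():
        # Same key on every restart means the same address and the
        # same trust relationships.
        data = json.loads(IDENTITY_PATH.read_text())
        return Ed25519PrivateKey.from_private_bytes(bytes.fromhex(data["private_key"]))
    key = Ed25519PrivateKey.generate()
    raw = key.private_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PrivateFormat.Raw,
        encryption_algorithm=serialization.NoEncryption(),
    )
    IDENTITY_PATH.parent.mkdir(parents=True, exist_ok=True)
    IDENTITY_PATH.write_text(json.dumps({"private_key": raw.hex()}))
    return key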
The thing I keep coming back to is that the networking layer is genuinely not something I think about anymore. New agent on a new machine? Start the daemon, establish trust with the relevant peers, done. No firewall rules, no VPN config, no static IP provisioning. The work is the application logic.
Getting Started
If you want to try it across a local and a cloud node, the setup is the same on both:
pilotctl daemon start
pilotctl info
Then establish trust between the two nodes:
# on node A
pilotctl handshake <node-B-hostname>
# on node B
pilotctl approve <node-A-id>
Once trust is mutual, they can reach each other regardless of where they're running. Full documentation at pilotprotocol.network.
The networking problem that used to take me an afternoon to sort out now takes about five minutes. Most of that is typing.