Building a zero-trust network for AI agents: mutual authentication, private-by-default routing, and why it matters

#security #networking #ai #programming

The phrase "zero trust" gets applied to a lot of things that aren't really zero trust. In most agent infrastructure today, "secure" means "behind a VPN" or "talking over TLS." That's transport security. It's not the same thing.

Zero trust for agents means something more specific: every connection is mutually authenticated before anything is exchanged, every agent has a cryptographic identity that persists across restarts and cloud migrations, and private routing is the default, not an opt-in you configure after the fact.

Most agent frameworks skip this entirely. They assume agents live inside trusted perimeters and communicate over HTTPS. That assumption breaks fast in production.

Why TLS isn't enough

TLS secures the channel between two endpoints. As typically deployed, it authenticates only the server's certificate, not the agent making the call, and the tunnel ends wherever TLS is terminated. A compromised proxy, a man-in-the-middle at a TLS termination point inside a VPC, or a misconfigured gateway can sit between two agents without either side knowing. The encrypted tunnel exists, but you have no guarantee about what's at the other end.

Agent-to-agent communication needs something closer to what SSH does for machines, or mTLS does for microservices: mutual authentication where both sides present identity before the connection is established.

The harder part is that agent identity can't be tied to an IP address. Agents move. They restart. They migrate between cloud regions. An IP-based identity breaks the moment you scale or redeploy.

Cryptographic addressing

Pilot Protocol handles this at the network layer. Each agent gets a 48-bit virtual address derived from a public key, not from a machine IP. When two agents connect, they run an X25519 key exchange and authenticate with Ed25519 identity keys. Both sides know who they're talking to before any application data is exchanged.

This address stays stable across restarts, IP changes, and cloud migrations. The agent's identity is the key pair, not the host it runs on.

The addressing format looks like this: 0:A91F.0000.7C2E - a structured virtual address that the agent transport layer resolves via a distributed registry, no DNS required. Two agents on opposite sides of the world with different ISPs can establish a direct encrypted tunnel without either side knowing or caring about the other's physical IP.
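To make the idea of a key-derived address concrete, here is a minimal sketch that hashes a public key, keeps 48 bits, and prints them in the dotted form shown above. The SHA-256 hash, the truncation, the fixed `0:` prefix, and the function name `derive_address` are all assumptions for illustration; nothing here is taken from Pilot's actual derivation, which this post doesn't spell out.

```python
import hashlib


def derive_address(public_key: bytes, network_id: int = 0) -> str:
    """Map a public key to a 48-bit virtual address (hypothetical scheme)."""
    # Assumption: SHA-256 of the raw key, truncated to the first 6 bytes (48 bits).
    digest = hashlib.sha256(public_key).digest()[:6]
    # Split the 48 bits into three 16-bit groups, printed as uppercase hex.
    groups = [digest[i:i + 2].hex().upper() for i in range(0, 6, 2)]
    return f"{network_id}:" + ".".join(groups)


if __name__ == "__main__":
    # Stand-in 32-byte value; a real agent would pass its identity public key.
    key = bytes(range(32))
    print(derive_address(key))
    # The address depends only on the key, never on the host or its IP.
    assert derive_address(key) == derive_address(key)
```

The point of the sketch is the dependency direction: the address is a pure function of the key, so moving the agent to a new host or IP leaves it unchanged.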

Private by default

Most networking infrastructure defaults to open. You build your application, expose an endpoint, then layer on authentication. The secure path is opt-in.

The peer-to-peer fabric for AI agents built into Pilot inverts this. Connections are private and authenticated by default. An agent on the network cannot be reached by an unauthenticated peer. There's no open endpoint to scan or probe. An agent that isn't expecting a connection from you simply won't respond.
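A default-deny policy like this can be sketched in a few lines. The `Agent` class, its `trusted_peers` set, and the peer addresses below are hypothetical names invented for the example, not Pilot's API; the only behavior it tries to capture is that an unexpected or unauthenticated peer gets no response at all.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Agent:
    address: str
    # Peers this agent has agreed to talk to (hypothetical trust store).
    trusted_peers: Set[str] = field(default_factory=set)

    def handle_inbound(self, peer_address: str, authenticated: bool) -> Optional[str]:
        """Return a reply for trusted, authenticated peers; otherwise stay silent."""
        if not authenticated or peer_address not in self.trusted_peers:
            # Default-deny: no error message and no banner, so nothing to probe.
            return None
        return f"{self.address} accepted {peer_address}"


if __name__ == "__main__":
    agent = Agent("0:A91F.0000.7C2E", {"0:0001.0000.0002"})
    print(agent.handle_inbound("0:0001.0000.0002", authenticated=True))
    print(agent.handle_inbound("0:FFFF.0000.0001", authenticated=True))  # None
```

Returning nothing, rather than an error, is the design choice that matters: a scanner cannot distinguish a silent agent from an address that doesn't exist.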

This matters more as agent fleets scale. A fleet of 100 agents behind traditional infrastructure has 100 potential attack surfaces. A fleet of 100 agents on Pilot has zero publicly exposed endpoints.

NAT traversal without a central relay

One reason agent networking often falls back to centralized proxies is NAT. Most agents run behind NAT, which means they can't accept inbound connections without a relay or port forwarding. RFC 5389 (STUN, since obsoleted by RFC 8489) established the standard approach to this problem for real-time communications - Pilot applies the same technique at the agent layer.

The network layer below MCP and A2A uses STUN-based hole punching to let two agents behind separate NATs establish a direct peer-to-peer tunnel without a central server in the data path. For symmetric NATs where hole punching doesn't work, there's an encrypted relay fallback - but the relay never sees plaintext. AES-256-GCM encryption means the relay is just carrying ciphertext it can't read.
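For a sense of what the STUN step involves on the wire, the sketch below builds a Binding Request, the message an endpoint sends to a STUN server to learn its public address and port. The header layout follows RFC 5389; the function name is made up for the example, and how Pilot itself drives STUN and coordinates the hole punch is not described here, so treat this as background rather than Pilot's implementation.

```python
import os
import struct
from typing import Optional

STUN_BINDING_REQUEST = 0x0001   # message type for a Binding Request
STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Header: type (2 bytes), attribute length (2 bytes), magic cookie (4 bytes),
    # all in network byte order, followed by the 96-bit transaction ID.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


if __name__ == "__main__":
    packet = build_binding_request()
    print(len(packet), packet.hex())
```

Each peer sends a request like this over UDP, learns the address its NAT exposes, and shares that address with the other side so both can send packets toward each other at roughly the same time.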

This is meaningfully different from most "agent networking" solutions, which route all traffic through a cloud service that has full visibility into what's being exchanged.

What mutual authentication solves in practice

Three concrete things that break without it:

Impersonation. An agent calling itself "billing-agent-prod" should be able to prove it's the billing agent you deployed, not something that registered the same name. Without cryptographic identity, names are unverifiable.

Replay attacks. Even if you're verifying identity, a captured valid request can be replayed later. Per-tunnel session keys tied to the key exchange mean traffic captured from one session can't be replayed against a different one. TLS 1.3 deals with the same problem by deriving fresh keys for each session and a unique nonce for each record - Pilot applies equivalent protections at the session layer.

Trust inheritance. In a multi-agent pipeline, an agent hands off a task to another agent, which calls a third. If the first agent was compromised, does the compromise propagate? With mutual authentication at every hop, each agent independently verifies who it's talking to. There's no inherited trust.
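The replay point is the easiest of the three to show in code. In the sketch below, each tunnel derives its own key from the key-exchange output plus per-handshake randomness, and every message carries a counter. HMAC-SHA256 stands in for the authentication that an AEAD cipher such as AES-256-GCM would provide, purely to keep the example in the standard library; the `Session` class and its methods are hypothetical, not Pilot's API.

```python
import hashlib
import hmac
import os
import struct


class Session:
    """One tunnel's state: a per-session key plus send and receive counters."""

    def __init__(self, shared_secret: bytes, handshake_nonce: bytes):
        # Assumption: the key mixes the key-exchange output with fresh
        # per-handshake randomness, so every tunnel gets a different key.
        self.key = hmac.new(shared_secret, handshake_nonce, hashlib.sha256).digest()
        self.send_counter = 0
        self.last_received = -1

    def seal(self, payload: bytes) -> bytes:
        """Prefix the payload with a counter and a tag over counter + payload."""
        counter = struct.pack("!Q", self.send_counter)
        self.send_counter += 1
        tag = hmac.new(self.key, counter + payload, hashlib.sha256).digest()
        return counter + tag + payload

    def open(self, message: bytes) -> bytes:
        """Verify the tag and counter; raise ValueError on forgery or replay."""
        counter, tag, payload = message[:8], message[8:40], message[40:]
        expected = hmac.new(self.key, counter + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad tag: message was not sealed for this session")
        sequence = struct.unpack("!Q", counter)[0]
        if sequence <= self.last_received:
            raise ValueError("replayed or out-of-order message")
        self.last_received = sequence
        return payload


if __name__ == "__main__":
    secret = os.urandom(32)
    nonce = os.urandom(16)
    sender, receiver = Session(secret, nonce), Session(secret, nonce)
    captured = sender.seal(b"transfer 100")
    print(receiver.open(captured))  # accepted once

    for target in (receiver, Session(secret, os.urandom(16))):
        try:
            target.open(captured)
        except ValueError as err:
            print("rejected:", err)
```

The captured message fails twice for different reasons: inside the original session the counter has already been seen, and inside a new session the tag no longer verifies because the key is different.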

The stack placement

This is architectural, not just a security add-on. Pilot Protocol sits at layer 5 (the session layer) of the OSI model, below your application layer and below MCP or A2A. That means the authentication and encryption happen before your application code runs, for every connection, without you writing any auth logic.

The Pilot IETF Internet-Draft formalizes the addressing scheme and cryptographic handshake in standards-track format - modeled after RFC 7364, the NVO3 problem statement for network-virtualization overlays such as VXLAN and GENEVE. It's worth reading if you're designing agent infrastructure from scratch and want to understand why the session layer is the right place to solve this rather than the application layer.

The WireGuard protocol paper is also instructive on why cryptographic identity at the network layer beats application-layer auth bolted on after the fact. Pilot applies similar thinking to the agent context, with the addition of a bilateral trust model that neither WireGuard nor libp2p was designed for.

What you actually get

An agent that installs Pilot gets a permanent cryptographic identity that survives any infrastructure change, mutual authentication on every inbound and outbound connection automatically, private-by-default routing with no open endpoints, NAT traversal without a central proxy, and compatibility with whatever you're running on top - MCP, A2A, HTTP, raw TCP. It's below all of that.

The security model isn't something you configure. It's the default behavior of the transport.

curl -fsSL https://pilotprotocol.network/install.sh | sh
pilotctl daemon start --hostname my-agent
# Address: 0:A91F.0000.7C2E
# Mutual auth: enabled
# Private routing: on

Zero trust for agents doesn't require a new security vendor or a policy engine. It requires a transport layer that builds authentication into the addressing scheme from the start.
