The agent runs on a spot instance. Spot instances get reclaimed. When that happens, a new one spins up, the agent restarts, and it gets a different IP address than it had before.
For sixty days this has happened repeatedly. Nothing downstream has broken. No other agent has needed reconfiguring. No DNS record has needed updating. Nothing has noticed.
This is not because I built clever reconnection logic. It is because the agent's address has nothing to do with its IP.
Why IP-based addressing breaks for agents
Most of the time, when you want service A to reach service B, you give service A a hostname. DNS resolves the hostname to an IP. Service A connects. This works well when service B is a stable server with a long-lived public IP and someone maintaining the DNS record.
Agents are not stable servers. They restart. They migrate between cloud providers. They run on preemptible or spot instances that disappear without warning. They run on developer laptops that switch networks. Every time any of this happens, the IP changes, and anything that depended on that IP is now pointing at nothing.
The standard workarounds are to pay for a static IP, to run a service discovery system that keeps a registry up to date, or to put everything behind a load balancer with a stable address. All of these add infrastructure. All of them add something that can fail.
What the address actually comes from
In Pilot Protocol, each agent's address is derived from an Ed25519 keypair that lives on disk. The keypair is generated once when the daemon first starts. The address is a mathematical function of the public key. It does not come from the network. It does not come from the machine. It does not come from anything that changes when the agent moves.
When you start the daemon:
curl -fsSL https://pilotprotocol.network/install.sh | sh
pilotctl daemon start --hostname my-agent
It prints back an address in a fixed format. That address is yours as long as that keypair file exists. Restart the daemon on the same machine, same address. Move the keypair to a different machine and start the daemon there, same address. The address travels with the key, not with the hardware.
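The exact derivation is internal to the protocol, but the core idea can be sketched in a few lines of Python. Here the address is modeled as a pure function of the public key bytes; the hash-and-truncate encoding below is a stand-in, not Pilot Protocol's actual scheme, and the random 32 bytes stand in for the Ed25519 public key stored in the identity file:

```python
import hashlib
import secrets

def derive_address(public_key: bytes) -> str:
    """Model the address as a pure function of the public key bytes.
    The real encoding is protocol-specific; sha256 + hex is a stand-in."""
    return hashlib.sha256(public_key).hexdigest()[:40]

# Stand-in for the Ed25519 public key loaded from the identity file.
public_key = secrets.token_bytes(32)

# The same key yields the same address, no matter which machine runs it.
addr_on_old_machine = derive_address(public_key)
addr_on_new_machine = derive_address(public_key)
assert addr_on_old_machine == addr_on_new_machine
```

Nothing network-dependent appears anywhere in the derivation, which is the whole point: move the key, and the address moves with it.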
You can check it at any time:
pilotctl info
The address in that output will be the same tomorrow as it is today, regardless of what the underlying IP is.
What this means for the agents trying to reach you
When agent B wants to send a message to agent A, it uses agent A's virtual address. It does not know or care what IP agent A is currently sitting behind. The daemon handles the routing.
Internally, the daemon uses STUN to discover the current external endpoint of each peer and hole-punching to establish a direct path. When agent A restarts on a new IP, its daemon re-registers the new endpoint. Agent B's daemon picks this up and routes to the new location transparently. From agent B's perspective, agent A just had a brief connectivity blip. The address never changed.
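The daemon's bookkeeping can be pictured as a table keyed by the stable address, with the mutable endpoint as the value. This is a simplified in-memory model for illustration, not Pilot Protocol's actual data structures:

```python
class PeerTable:
    """Toy model: senders key on the stable address; only the
    endpoint (IP, port) column changes when a peer moves."""
    def __init__(self):
        self.endpoints = {}

    def register(self, address: str, ip: str, port: int) -> None:
        # Called when a peer's daemon (re-)registers after a restart.
        self.endpoints[address] = (ip, port)

    def route(self, address: str) -> tuple:
        # Senders resolve the stable address to the current endpoint.
        return self.endpoints[address]

table = PeerTable()
table.register("agent-a-address", "203.0.113.7", 41641)

# Spot instance reclaimed: agent A comes back on a new IP.
table.register("agent-a-address", "198.51.100.9", 41641)

# Agent B still routes by the same address and gets the new endpoint.
assert table.route("agent-a-address") == ("198.51.100.9", 41641)
```

The sender's view never changes; only the row behind the address does.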
This is what the reconnection after a restart looks like from the sending side:
pilotctl ping my-agent
It comes back as soon as the restarted daemon is up and registered. No manual step required on either end.
What I removed from my code
Before I understood this model, I handled agent addressing in the application layer. Each agent registered itself in a shared Redis key on startup with its current IP and port. Other agents looked up that key to find it. When an agent restarted it overwrote the key with its new address. When it crashed without a clean shutdown, the key went stale and other agents failed to connect until the TTL expired or someone intervened.
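For context, the pattern I deleted looked roughly like this. An in-memory dict stands in for Redis here, and the key names and TTL are illustrative:

```python
import time

REGISTRY = {}          # stand-in for the shared Redis instance
TTL_SECONDS = 30.0

def register(agent: str, ip: str, port: int) -> None:
    # On startup, each agent overwrote its key with its current IP:port.
    REGISTRY[agent] = {"endpoint": (ip, port), "ts": time.monotonic()}

def lookup(agent: str):
    # Other agents read the key. A crash without a clean shutdown left
    # the entry stale until the TTL expired -- the failure mode above.
    entry = REGISTRY.get(agent)
    if entry is None or time.monotonic() - entry["ts"] > TTL_SECONDS:
        raise LookupError(f"{agent} has no fresh registration")
    return entry["endpoint"]

register("agent-a", "203.0.113.7", 8080)
assert lookup("agent-a") == ("203.0.113.7", 8080)
```

Every line of this exists only because the endpoint was the identity.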
I had retry logic, I had fallback logic, I had a health check that ran every thirty seconds to verify the addresses were still valid. None of this was the interesting part of the system. It was all plumbing to work around the fact that IP addresses change.
After switching to keypair-derived addressing, I deleted all of it. The agents find each other by name. The name resolves to an address. The address is always current. The application layer has no idea this is happening.
The trust relationship persists too
One thing I expected to break was the trust handshake. When agent A first connects to agent B, both sides approve each other through a signed handshake:
pilotctl handshake agent-b
pilotctl approve <node_id>
I assumed that when agent A restarted on new infrastructure I would need to redo this. I did not. The trust relationship is recorded against the node ID, which is also derived from the keypair. The same key means the same node ID means the same trusted identity. Agent B recognizes agent A as the same peer it approved before regardless of where agent A is running.
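The persistence of trust follows directly from the node ID being a function of the key. A toy model (the real node ID derivation is internal to Pilot Protocol; the hash below is a placeholder):

```python
import hashlib

def node_id(public_key: bytes) -> str:
    # Illustrative derivation: same key in, same node ID out.
    return hashlib.sha256(b"node-id:" + public_key).hexdigest()[:16]

trusted = set()                      # agent B's approval store

key_a = b"\x11" * 32                 # stand-in for agent A's public key
trusted.add(node_id(key_a))          # the one-time approval step

# Agent A restarts on new infrastructure with the same keypair:
assert node_id(key_a) in trusted     # still trusted, no re-handshake
```

Nothing about the machine or network enters the identity, so nothing about the machine or network can invalidate the approval.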
When the keypair file matters
The one thing you do need to protect is the keypair file. If you lose it, you lose the address. A new keypair generates a new address, and agents that trusted the old identity will not recognize the new one.
The file lives at ~/.pilot/identity.json by default. Back it up the same way you would back up an SSH private key. If you are running agents in containers or on ephemeral instances, mount the keypair from persistent storage rather than generating a new one on each startup.
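In a containerized setup, that advice might look like the following. The volume name and image name are placeholders, not part of Pilot Protocol; the point is only that ~/.pilot outlives any single container or instance:

```shell
# Back up the identity the same way you would an SSH private key.
cp ~/.pilot/identity.json /secure/backup/identity.json

# Mount the identity from persistent storage so a fresh container
# reuses the same keypair instead of generating a new one on startup.
docker run -v pilot-identity:/root/.pilot my-agent-image
```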
This is the only persistent piece of state the addressing model requires. Everything else (the routing, the endpoint discovery, the reconnection) is handled by the daemon automatically.
What changed after 60 days
The spot instance has been reclaimed and replaced eleven times. Each time, the new instance mounts the keypair from an EBS volume, starts the daemon, and is reachable at the same address within a few seconds. The agents talking to it have never needed updating. The DNS record I used to maintain does not exist anymore.
The addressing problem turned out not to be a problem at all once the address stopped being tied to the infrastructure.
- Install: curl -fsSL https://pilotprotocol.network/install.sh | sh
- Docs: pilotprotocol.network/docs
- GitHub: github.com/TeoSlayer/pilotprotocol