<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: William Baker </title>
    <description>The latest articles on DEV Community by William Baker  (@asterview).</description>
    <link>https://dev.to/asterview</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3912742%2Fe6c5de6a-38ad-462c-88c9-1c05579eb3b2.png</url>
      <title>DEV Community: William Baker </title>
      <link>https://dev.to/asterview</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/asterview"/>
    <language>en</language>
    <item>
      <title>Building Multi-Agent Fleets That Actually Talk to Each Other</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Wed, 13 May 2026 21:37:56 +0000</pubDate>
      <link>https://dev.to/asterview/building-multi-agent-fleets-that-actually-talk-to-each-other-1413</link>
      <guid>https://dev.to/asterview/building-multi-agent-fleets-that-actually-talk-to-each-other-1413</guid>
      <description>&lt;p&gt;The consensus architecture for multi-agent systems in 2026 is: &lt;strong&gt;orchestrator + isolated subagents&lt;/strong&gt;. A single coordinator holds context, spawns specialists, merges results.&lt;/p&gt;

&lt;p&gt;You've probably built something like this. A coordinator agent fans out to a research subagent, a code-generation subagent, maybe a QA subagent. The orchestrator waits, collects, synthesizes.&lt;/p&gt;

&lt;p&gt;It works. But there's a hidden assumption baked into most implementations: &lt;strong&gt;agents communicate through you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The coordinator is the hub. Every message routes through your application code. Subagents don't know about each other. When they need data from a peer, they go back up the stack to the orchestrator, which fetches it, hands it down.&lt;/p&gt;

&lt;p&gt;That's fine for small fleets. It breaks at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Agents Communicate Through You" Costs
&lt;/h2&gt;

&lt;p&gt;When you're the message bus for your agent fleet, every inter-agent exchange adds a round trip through your application. At 5 agents, invisible. At 50, painful. At 500, you've built a distributed system bottleneck.&lt;/p&gt;

&lt;p&gt;More importantly: your agents can't form &lt;em&gt;emergent&lt;/em&gt; connections. They can't discover that another agent in the fleet already did the research they're about to do. They can't route to a specialist peer without explicit wiring in your orchestration code.&lt;/p&gt;

&lt;p&gt;The 2026 production data backs this up: enterprises run an average of 12 AI agents, but &lt;strong&gt;50% of those agents operate completely on their own&lt;/strong&gt; — no inter-agent communication at all. Not because agents don't need to talk; because the infrastructure to support it isn't there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Native Agent-to-Agent Communication
&lt;/h2&gt;

&lt;p&gt;What changes when agents have addresses and can reach each other directly?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; is a peer-to-peer network layer for agents. Each agent that joins gets a unique 48-bit address and can discover, authenticate, and communicate with other agents without routing through your application code.&lt;/p&gt;

&lt;p&gt;The architecture shift:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (hub-and-spoke through your orchestrator):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Orchestrator → [your code] → Subagent A
                           → Subagent B  
                           → External API (scraping)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (peer-to-peer on Pilot):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Orchestrator ←→ Subagent A (direct, encrypted)
             ←→ Subagent B (direct, encrypted)
Subagent A   ←→ Specialist peer on network (structured data, no scraping)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents discover each other through the Pilot backbone — a global directory where agents register by hostname and join domain-specific groups. Travel agents cluster with travel agents. Finance agents cluster with finance agents. Your research subagent can find and query a Crossref specialist directly without you wiring the connection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up a Multi-Agent Fleet on Pilot
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install and start the daemon on each agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; coordinator-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent gets an address like &lt;code&gt;0:A91F.0000.7C2E&lt;/code&gt;. The hostname is how peers discover you.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Join an org for pre-wired trust
&lt;/h3&gt;

&lt;p&gt;Pilot &lt;a href="https://pilotprotocol.network/for/setups" rel="noopener noreferrer"&gt;Orgs&lt;/a&gt; are pre-configured multi-agent setups where agents in the same org automatically discover and trust each other on first boot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pilotctl &lt;span class="nb"&gt;join&lt;/span&gt; &lt;span class="nt"&gt;--org&lt;/span&gt; my-company-fleet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your coordinator and all subagents are in a trust mesh. No explicit peer configuration. They find each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Route queries to the best peer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From your coordinator agent — query a research peer by hostname&lt;/span&gt;
pilotctl query research-agent-alpha &lt;span class="s2"&gt;"verify citation: DOI 10.1038/s41586-024-07421-0"&lt;/span&gt;

&lt;span class="c"&gt;# Or query the broader network — route to whoever's best suited&lt;/span&gt;
pilotctl query &lt;span class="nt"&gt;--group&lt;/span&gt; research &lt;span class="s2"&gt;"latest CVE alerts for nginx 1.25"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The network routes to the specialist best suited to answer — not to a search engine, not to a scraped web page.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 350+ Service Agents Already Online
&lt;/h2&gt;

&lt;p&gt;When your fleet joins Pilot, it connects to a network of 350+ specialized service agents that already have structured data ready:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;What's available&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Legal / Research&lt;/td&gt;
&lt;td&gt;Crossref DOI verification, GDELT event data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance&lt;/td&gt;
&lt;td&gt;Historical FX at exact timestamps, SEC filings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aviation&lt;/td&gt;
&lt;td&gt;Live METAR weather, flight status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;crt.sh certificate transparency streams, CVE alerts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Health / Compliance&lt;/td&gt;
&lt;td&gt;FDA recall feeds, drug interaction data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your subagents don't scrape for this. They ask a peer that already has the answer structured and ready. Benchmark: &lt;strong&gt;12 seconds&lt;/strong&gt; on Pilot vs &lt;strong&gt;51 seconds&lt;/strong&gt; via the web for equivalent data retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Emergent Coordination: The Part Nobody Plans For
&lt;/h2&gt;

&lt;p&gt;Here's what changes when agents have direct addresses: &lt;strong&gt;they start coordinating in ways you didn't explicitly wire.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Observed behavior on the Pilot network: within 72 hours of connecting, agents route more than 60% of their external queries to peers rather than the open web. Not because you programmed them to — because when peers are available, routing to them is faster, more reliable, and produces structured data.&lt;/p&gt;

&lt;p&gt;A security agent in your fleet spots a suspicious kube-audit entry. Instead of googling, it pings the secops group: &lt;em&gt;"Anyone seen this signature?"&lt;/em&gt; A peer that triaged the same pattern two days ago responds in milliseconds. Your fleet gets the answer before your orchestrator even knows the question was asked.&lt;/p&gt;

&lt;p&gt;That's what colleague-to-colleague communication looks like. Not a search. Not an API call. A direct query to a peer that's already done the work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fleet Patterns That Work Well
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Support fleet:&lt;/strong&gt; Route tickets by skill. Each support agent handles a domain. Coordinator classifies, routes direct to the right peer. No manual routing logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research fleet:&lt;/strong&gt; Coordinator fans out to specialists. Specialists query the Pilot network for structured data rather than scraping. Results merge back at coordinator. Total time drops dramatically when each specialist isn't independently hitting the same web pages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security fleet:&lt;/strong&gt; Shared threat intelligence between your SecOps agents and trusted peers on the network. Real-time, direct, no central broker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Networks Within Your Fleet
&lt;/h2&gt;

&lt;p&gt;For companies running fleets at scale, Pilot &lt;a href="https://pilotprotocol.network/for/networks" rel="noopener noreferrer"&gt;Networks&lt;/a&gt; let you create private agent networks with custom routing rules and trust policies. Your internal agents operate on your private network. You selectively open access to the broader Pilot backbone for external data queries.&lt;/p&gt;

&lt;p&gt;Two live networks, each with its own rules — and the backbone visible to agents by default.&lt;/p&gt;




&lt;p&gt;The multi-agent era is here. But most fleets are still communicating like it's 2023 — through you, through your code, through HTTP layers designed for humans. Native agent-to-agent networking is what makes fleets actually behave like fleets.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; — peer-to-peer networking for AI agents. ~190,000 agents, 19.7B+ requests routed. &lt;a href="https://pilotprotocol.network/docs/getting-started" rel="noopener noreferrer"&gt;Install in one line →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>MCP is a Tool Layer. But What's Underneath It?</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Wed, 13 May 2026 21:36:45 +0000</pubDate>
      <link>https://dev.to/asterview/mcp-is-a-tool-layer-but-whats-underneath-it-8hj</link>
      <guid>https://dev.to/asterview/mcp-is-a-tool-layer-but-whats-underneath-it-8hj</guid>
      <description>&lt;p&gt;By now you've probably set up an MCP server. Maybe you've chained a few together. Your agent can call tools, read files, query databases. MCP has become the de facto standard for agent tool-calling — 97 million monthly SDK downloads and every major AI provider has adopted it.&lt;/p&gt;

&lt;p&gt;But there's a question that doesn't come up enough: &lt;strong&gt;what layer does MCP actually operate at?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And more importantly: what's &lt;em&gt;missing&lt;/em&gt; underneath it?&lt;/p&gt;

&lt;h2&gt;
  
  
  Where MCP Lives in the Stack
&lt;/h2&gt;

&lt;p&gt;MCP is a Layer 7 protocol. It's an application-layer abstraction — a structured way to expose tools to an LLM. It runs on top of HTTP, stdio, or WebSockets depending on your transport.&lt;/p&gt;

&lt;p&gt;That's fine for its purpose. MCP isn't trying to be a networking protocol. It's trying to give models a clean interface to call tools.&lt;/p&gt;

&lt;p&gt;But that means MCP inherits the same substrate as every other web application: TCP, HTTP, TLS, DNS. Infrastructure designed for serving documents to humans.&lt;/p&gt;

&lt;p&gt;Here's the OSI breakdown that rarely gets discussed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L7  Application   → MCP, A2A, HTTP APIs, your app
L6  Presentation  → JSON, HTML, base64 (for humans)
L5  Session       → TLS, HTTP sessions, cookies (for humans)
L4  Transport     → TCP (three-way handshake, head-of-line blocking)
L3  Network       → IP
L2  Data Link     → Ethernet, Wi-Fi
L1  Physical      → Cables, fiber, radio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents using MCP are living at L7 on a stack designed for L6-L5 to translate the internet for human eyes. Every JSON parse. Every HTTP session. Every DNS lookup. Translation layers that exist because humans can't read binary packets.&lt;/p&gt;

&lt;p&gt;Agents can.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Session Layer for Agents
&lt;/h2&gt;

&lt;p&gt;The web uses TLS at L5 to handle session management, authentication, and encryption for human-facing traffic. But there's no equivalent for agent-to-agent traffic — no session layer designed for machines talking to machines.&lt;/p&gt;

&lt;p&gt;This is the gap that &lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; fills.&lt;/p&gt;

&lt;p&gt;Pilot slots in at &lt;strong&gt;L5&lt;/strong&gt; — the same position TLS occupies — and provides a native session layer for agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L7  Application   → MCP, A2A, HTTP APIs — sits on top of Pilot
L5  Pilot Protocol → Agent ↔ Agent session layer
                     48-bit virtual addresses
                     X25519 key exchange, AES-256-GCM per tunnel
                     Ed25519 identity
                     NAT traversal (STUN + hole-punching)
L4  Transport     → UDP (with Pilot's own reliable streams on top)
L3  Network       → IPv4 / IPv6 (unchanged)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pilot doesn't replace the stack — it inserts at L5 and makes everything above it work better for machines.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Changes for MCP Deployments
&lt;/h2&gt;

&lt;p&gt;Right now, your MCP server is an isolated endpoint. It has a URL. Agents call it. Done.&lt;/p&gt;

&lt;p&gt;With Pilot underneath, your MCP server becomes an &lt;strong&gt;addressable peer&lt;/strong&gt; on a live network:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It gets a unique 48-bit address — discoverable by other agents without DNS&lt;/li&gt;
&lt;li&gt;Peer connections are encrypted and authenticated by identity, not by URL&lt;/li&gt;
&lt;li&gt;Other agents can find it through the Pilot backbone without any central broker&lt;/li&gt;
&lt;li&gt;It can join domain-specific groups (security, finance, research) and become discoverable to relevant peers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://pilotprotocol.network/for/mcp" rel="noopener noreferrer"&gt;MCP + Pilot integration&lt;/a&gt; is essentially: give your MCP servers a network. Instead of isolated tool endpoints, you get a mesh of specialized agents that discover and trust each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why UDP Instead of TCP?
&lt;/h2&gt;

&lt;p&gt;This comes up a lot. Pilot uses UDP with its own reliable streams on top rather than TCP. Here's why that matters for agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TCP problems for agents:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Three-way handshake adds latency on every new connection&lt;/li&gt;
&lt;li&gt;Head-of-line blocking means one lost packet stalls the whole stream&lt;/li&gt;
&lt;li&gt;Designed for streaming documents, not request-response between peers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pilot's UDP approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sliding window + AIMD congestion control (same algorithms, no TCP overhead)&lt;/li&gt;
&lt;li&gt;SACK (selective acknowledgment) — retransmit only what's lost&lt;/li&gt;
&lt;li&gt;No head-of-line blocking&lt;/li&gt;
&lt;li&gt;Faster initial connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The benchmark: &lt;strong&gt;12 seconds&lt;/strong&gt; for typical data retrieval on Pilot vs &lt;strong&gt;51 seconds&lt;/strong&gt; via standard web stack. The TCP/HTTP overhead isn't the whole difference, but it's part of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Protocol Stack in 2026
&lt;/h2&gt;

&lt;p&gt;The current agent protocol landscape has MCP winning the tool layer, A2A taking the coordination layer, and now Pilot filling the session/transport layer.&lt;/p&gt;

&lt;p&gt;They're not competing — they're complementary layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L7 Application&lt;/td&gt;
&lt;td&gt;Your agent framework&lt;/td&gt;
&lt;td&gt;Orchestration, reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L7 Tools&lt;/td&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;Tool-calling interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L7 Coordination&lt;/td&gt;
&lt;td&gt;A2A&lt;/td&gt;
&lt;td&gt;Agent-to-agent task delegation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;L5 Session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Pilot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Native agent networking, addressing, discovery&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L4 Transport&lt;/td&gt;
&lt;td&gt;TCP / UDP&lt;/td&gt;
&lt;td&gt;Packet delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're building multi-agent systems in 2026 and you haven't thought about the session layer, you're building on a gap. MCP handles how agents call tools. Pilot handles how agents &lt;em&gt;find each other&lt;/em&gt; and communicate directly — without a human-readable API in the middle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical: Adding Pilot to an Existing MCP Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Pilot (single binary, no dependencies)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh

&lt;span class="c"&gt;# Start the daemon — your MCP server gets a network address&lt;/span&gt;
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; my-mcp-server

&lt;span class="c"&gt;# Your server is now addressable on the Pilot network&lt;/span&gt;
&lt;span class="c"&gt;# Other agents can discover it by hostname or address&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there, your MCP server joins the ~190,000-agent Pilot network. 350+ specialized service agents are already online: Crossref for academic paper verification, historical FX for finance, METAR for aviation weather, crt.sh for certificate transparency, FDA recall feeds.&lt;/p&gt;

&lt;p&gt;Your tools don't have to scrape for data that a specialist peer already has structured and ready.&lt;/p&gt;




&lt;p&gt;The networking story for agents is still being written. But the fundamental insight — that agents need a session layer designed for machines, not humans — is the right frame. MCP gave agents tools. Pilot gives them a network.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; — the network layer for AI agents. Peer-to-peer encrypted tunnels at the UDP layer. One line of code. &lt;a href="https://pilotprotocol.network/blog/ietf-internet-draft-pilot-protocol" rel="noopener noreferrer"&gt;Read the IETF Internet-Draft →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agents</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>How to Give Your AI Agent a Network Address (and Why It Matters)</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Wed, 13 May 2026 21:35:13 +0000</pubDate>
      <link>https://dev.to/asterview/how-to-give-your-ai-agent-a-network-address-and-why-it-matters-4ejc</link>
      <guid>https://dev.to/asterview/how-to-give-your-ai-agent-a-network-address-and-why-it-matters-4ejc</guid>
      <description>&lt;p&gt;Your AI agent can call tools. It can browse the web, read files, and hit REST APIs. But here's the thing nobody talks about: it doesn't have an &lt;em&gt;address&lt;/em&gt;. It can reach out, but nothing can reach &lt;em&gt;it&lt;/em&gt;. And every query it makes goes through infrastructure built for humans — HTTP stacks, JSON parsers, DNS — layers that exist to translate the web into something a human can click.&lt;/p&gt;

&lt;p&gt;That's the wrong substrate for machines.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With the Current Stack
&lt;/h2&gt;

&lt;p&gt;When an agent needs data, it scrapes. When agents need to share work, they go through a human-readable API. When two agents on different servers need to coordinate, someone has to build a middleware layer to bridge them.&lt;/p&gt;

&lt;p&gt;This is the 2026 agent tax. Every agent doing the same web scraping, separately, forever. Burning tokens re-reading the same pages. Waiting for brittle parsers.&lt;/p&gt;

&lt;p&gt;The root cause: HTTP was designed to serve documents to browsers. It's a presentation layer for humans. Agents don't need the presentation layer — they need the &lt;em&gt;session layer&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Native Agent Network Looks Like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; is a peer-to-peer network layer built specifically for agents. It slots in at OSI Layer 5 — the same position TLS occupies for the web — and changes what everything above it has to do.&lt;/p&gt;

&lt;p&gt;Here's what that means practically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every agent on the network gets a &lt;strong&gt;unique 48-bit address&lt;/strong&gt; (like &lt;code&gt;0:A91F.0000.7C2E&lt;/code&gt;) — a direct, authenticated identifier with no intermediary&lt;/li&gt;
&lt;li&gt;Peer-to-peer encrypted tunnels using &lt;strong&gt;X25519 key exchange, AES-256-GCM&lt;/strong&gt; per tunnel, and &lt;strong&gt;Ed25519 identity&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NAT traversal&lt;/strong&gt; via STUN + hole-punching, relay fallback for symmetric NATs&lt;/li&gt;
&lt;li&gt;UDP with Pilot's own reliable streams: sliding window, AIMD congestion control, SACK&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No DNS. No certificate authorities. No three-way TCP handshake just to say hello.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Online: One Line of Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Single static binary. No SDK. No API key. Your agent gets an address and can immediately ping other agents by hostname:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; my-agent
Daemon running &lt;span class="o"&gt;(&lt;/span&gt;pid 24817&lt;span class="o"&gt;)&lt;/span&gt;
Address: 0:A91F.0000.7C2E
Hostname: my-agent

&lt;span class="nv"&gt;$ &lt;/span&gt;pilotctl ping agent-alpha
✓ reply from 0:4B2E.0000.1A3D · 38ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're now on a network with ~190,000 other agents. Not a search engine. Not a web crawler. A peer-to-peer mesh where agents route tasks to the peers best suited to solve them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Speed Difference Is Real
&lt;/h2&gt;

&lt;p&gt;Benchmarks from the Pilot network show &lt;strong&gt;12 seconds&lt;/strong&gt; for a typical data retrieval task on Pilot versus &lt;strong&gt;51 seconds&lt;/strong&gt; via the web. The difference isn't just latency — it's the elimination of scraping, retrying, parsing, and re-parsing that the web stack forces on every agent.&lt;/p&gt;

&lt;p&gt;When you ask a specialist agent on Pilot for historical FX rates, SEC filings, or flight status data, you get structured data from an agent that's &lt;em&gt;already done the work&lt;/em&gt;. No scraping. No rate limits. One hop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for MCP Users
&lt;/h2&gt;

&lt;p&gt;If you're already using MCP servers, Pilot is the network layer underneath. MCP gives agents a tool interface. Pilot gives those tools an address — so they can be discovered, called peer-to-peer, and form trust relationships without any central broker.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://pilotprotocol.network/for/mcp" rel="noopener noreferrer"&gt;MCP + Pilot integration&lt;/a&gt; means your MCP servers stop being isolated endpoints and become addressable peers on a live agent network.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Agents Are Actually Using It For
&lt;/h2&gt;

&lt;p&gt;Based on traffic patterns across the network's 350+ specialized service agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Legal agents&lt;/strong&gt; verifying whether citations in witness statements are real or fabricated (Crossref specialist, one call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finance agents&lt;/strong&gt; retrieving historical FX at the exact invoice timestamp — not today's rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SecOps agents&lt;/strong&gt; sharing whether a rare kube-audit entry is a known false positive or a novel exploit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SRE agents&lt;/strong&gt; asking peers in a region whether an AWS outage is real before the status page updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is the most interesting. One SRE agent asking another: &lt;em&gt;"Is us-west-2 actually degraded right now?"&lt;/em&gt; A peer in the region already sees it. That's not a search. That's a colleague-to-colleague call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The agent economy is shifting fast. By the end of 2026, Gartner projects 40% of enterprise applications will include task-specific AI agents, up from less than 5% a year ago. Multi-agent systems are the default architecture. Orchestrator + specialist subagents is the consensus pattern.&lt;/p&gt;

&lt;p&gt;The infrastructure those agents run on is lagging. Most agent-to-agent communication still goes through HTTP APIs designed for humans. That's the gap Pilot fills.&lt;/p&gt;

&lt;p&gt;An address. A network. Direct routing. That's what your agent is missing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The Pilot Protocol network currently runs ~190,000 agents with 19.7B+ requests routed. &lt;a href="https://pilotprotocol.network/docs/getting-started" rel="noopener noreferrer"&gt;Get started in one line →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>networking</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Your MCP Server Has No Network Identity. Here's Why That's a Problem.</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Sun, 10 May 2026 18:09:24 +0000</pubDate>
      <link>https://dev.to/asterview/your-mcp-server-has-no-network-identity-heres-why-thats-a-problem-34lj</link>
      <guid>https://dev.to/asterview/your-mcp-server-has-no-network-identity-heres-why-thats-a-problem-34lj</guid>
      <description>&lt;p&gt;MCP (Model Context Protocol) crossed 97 million monthly SDK downloads. Every major AI provider adopted it. It solved a real problem: how do agents invoke tools and retrieve context in a standardized way?&lt;/p&gt;

&lt;p&gt;But MCP is a protocol for &lt;em&gt;what&lt;/em&gt; agents can do. It says nothing about &lt;em&gt;where&lt;/em&gt; they are or &lt;em&gt;how&lt;/em&gt; they find each other.&lt;/p&gt;

&lt;p&gt;Your MCP server lives at a URL. That URL is hardcoded in your agent's config. If it changes, your agent breaks. If you want another agent to discover your server's capabilities, you need a registry, a service mesh, or a human to copy and paste the URL.&lt;/p&gt;

&lt;p&gt;This is the network identity problem for MCP, and it's more tractable than it looks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP Actually Does (and Doesn't Do)
&lt;/h2&gt;

&lt;p&gt;MCP standardizes the tool-calling layer. An agent sends a request, the MCP server handles tool invocation, and returns structured results. The protocol defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool schemas (what parameters each tool accepts)&lt;/li&gt;
&lt;li&gt;Resource definitions (what data sources are available)&lt;/li&gt;
&lt;li&gt;Request/response format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does not define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How agents discover MCP servers without prior knowledge&lt;/li&gt;
&lt;li&gt;How MCP servers maintain stable identities across IP changes&lt;/li&gt;
&lt;li&gt;How multiple agents route to the same MCP server based on capability&lt;/li&gt;
&lt;li&gt;How MCP servers establish encrypted tunnels to requesting agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are network problems, and MCP isn't a network protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hardcoded URL Problem
&lt;/h2&gt;

&lt;p&gt;Most MCP deployments today look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"my-tool-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.mycompany.com/mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk-..."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works for a single agent with a fixed configuration. It breaks down when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The server moves to a new host&lt;/li&gt;
&lt;li&gt;You want to run multiple instances behind a load balancer&lt;/li&gt;
&lt;li&gt;Another agent outside your org wants to use the same tool&lt;/li&gt;
&lt;li&gt;You want to discover all MCP servers with a given capability type&lt;/li&gt;
&lt;li&gt;You're building a marketplace of tools that agents can discover dynamically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current solution is to build a registry. A centralized service that maps tool names to URLs. But now you have a new dependency: the registry has to be available, consistent, and kept up to date. You've reintroduced the single point of failure you were trying to avoid.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Network Identity Gives an MCP Server
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/for/mcp" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; gives MCP servers a stable address at the session layer (L5), below HTTP, above UDP/TCP. Instead of a URL, the server gets an address like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0:A91F.0000.7C2E
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This address is independent of IP. The server can move, scale horizontally, or restart. The address stays the same. Other agents route to the address, not to a specific host.&lt;/p&gt;

&lt;p&gt;In practice, this means:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discovery without a registry&lt;/strong&gt;: Agents can find MCP servers by capability type. A server tagged as a &lt;code&gt;finance&lt;/code&gt; tool is discoverable by any agent on the network querying for finance-related capabilities. No hardcoded URL required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Encrypted tunnels by default&lt;/strong&gt;: Pilot uses X25519 key exchange and AES-256-GCM per connection. The MCP server doesn't implement TLS or manage certificates. The network layer handles it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NAT traversal&lt;/strong&gt;: MCP servers behind corporate firewalls or home networks are routable without port forwarding. Hole-punching handles direct P2P where possible; relay fallback handles symmetric NATs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stable addressing for the agent economy&lt;/strong&gt;: When tools are addressable at the network layer, you can build agent workflows that discover and invoke tools dynamically, without human configuration at each step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting It Up
&lt;/h2&gt;

&lt;p&gt;Giving an existing MCP server a Pilot address takes one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh
&lt;span class="nv"&gt;$ &lt;/span&gt;pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; my-mcp-server &lt;span class="nt"&gt;--tags&lt;/span&gt; mcp,finance
Daemon running &lt;span class="o"&gt;(&lt;/span&gt;pid 24817&lt;span class="o"&gt;)&lt;/span&gt;
Address: 0:B331.0000.4D12
Hostname: my-mcp-server
Tags: mcp, finance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server continues running as normal. Pilot runs alongside it, handling network addressing and tunnel establishment. Agents on the Pilot network can now discover &lt;code&gt;my-mcp-server&lt;/code&gt; by querying for &lt;code&gt;mcp&lt;/code&gt; or &lt;code&gt;finance&lt;/code&gt; tagged nodes.&lt;/p&gt;

&lt;p&gt;For agents that want to connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pilotctl connect my-mcp-server
Tunnel established · 0:B331.0000.4D12 · 22ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP protocol runs over the tunnel. The tool call format doesn't change. The network layer changes underneath it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Broader Picture: MCP + A2A + Transport
&lt;/h2&gt;

&lt;p&gt;The agent protocol stack in 2026 has three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt; (L7): Agent-to-tool communication. Tool invocation, context retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A2A&lt;/strong&gt; (L7): Agent-to-agent coordination. Task delegation, capability negotiation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport&lt;/strong&gt; (L5): Addressing, discovery, encrypted tunnels, NAT traversal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP and A2A both assume an underlying transport. Both currently default to HTTP. HTTP works, but it carries the full overhead of a human-facing protocol: DNS, TLS via public CAs, JSON serialization, and no native addressing for agents.&lt;/p&gt;

&lt;p&gt;A session-layer protocol handles the transport problems once, at the network level, so MCP and A2A don't have to keep reinventing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Impact
&lt;/h2&gt;

&lt;p&gt;350+ specialized service agents on the Pilot network are already addressable this way. An agent that needs current FX rates connects to a finance-tagged peer. An agent checking SSL certificate transparency hits the security group. No configuration. No registry call. Network-layer discovery, direct connection, structured data back.&lt;/p&gt;

&lt;p&gt;Average response time: 12 seconds for a data retrieval query routed through the Pilot network vs. 51 seconds scraping the same data over HTTP.&lt;/p&gt;

&lt;p&gt;If you're running MCP servers in production and want to make them discoverable to the broader agent ecosystem without maintaining a central registry, this is the path.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Get started&lt;/strong&gt;: &lt;a href="https://pilotprotocol.network/for/mcp" rel="noopener noreferrer"&gt;Give your MCP server a network identity&lt;/a&gt; | &lt;a href="https://pilotprotocol.network/docs/getting-started" rel="noopener noreferrer"&gt;Install Pilot&lt;/a&gt; | &lt;a href="https://pilotprotocol.network/docs/service-agents" rel="noopener noreferrer"&gt;Browse service agents&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>agents</category>
      <category>backend</category>
    </item>
    <item>
      <title>P2P vs. Broker: The Architecture Decision Defining Multi-Agent Systems</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Sun, 10 May 2026 18:07:17 +0000</pubDate>
      <link>https://dev.to/asterview/p2p-vs-broker-the-architecture-decision-defining-multi-agent-systems-54dg</link>
      <guid>https://dev.to/asterview/p2p-vs-broker-the-architecture-decision-defining-multi-agent-systems-54dg</guid>
      <description>&lt;p&gt;Most multi-agent systems are built on a broker. There's a coordinator that receives tasks, dispatches them to worker agents, and collects results. It's a natural architecture. It mirrors how humans organize teams. It's easy to reason about.&lt;/p&gt;

&lt;p&gt;It's also a bottleneck that gets worse as your fleet grows.&lt;/p&gt;

&lt;p&gt;This post breaks down when broker architectures work, when they fail, and what a peer-to-peer alternative actually looks like in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Broker Model: Strengths and Limits
&lt;/h2&gt;

&lt;p&gt;A broker-based system has real advantages for small fleets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple mental model&lt;/strong&gt;: one coordinator, many workers. Easy to debug.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear ordering&lt;/strong&gt;: the broker controls task sequencing. No race conditions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability&lt;/strong&gt;: everything flows through a central point. Logs are coherent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access control&lt;/strong&gt;: the broker is the single enforcement point for permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a team running 10-50 coordinated agents on a bounded set of tasks, this is the right call. The overhead is manageable and the observability is worth it.&lt;/p&gt;

&lt;p&gt;The problems emerge at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Broker Failure Modes at Scale
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Single point of failure&lt;/strong&gt;: When the broker goes down, the fleet stops. High availability for the broker requires redundancy that adds operational complexity and latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Throughput ceiling&lt;/strong&gt;: Every message goes through one process. Even a well-engineered broker becomes a bottleneck when ephemeral agents spin up and down at high frequency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discovery through the broker&lt;/strong&gt;: In a brokered system, agents don't know about each other unless the broker tells them. Adding a new capability to the system requires registering it with the broker, which requires a human in the loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency tax&lt;/strong&gt;: A query that could go agent-to-agent in one hop goes agent-to-broker-to-agent in two, with serialization/deserialization at each step.&lt;/p&gt;

&lt;p&gt;Gartner reported a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025. Many of the teams now scaling from pilot to production are hitting these limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  The P2P Alternative
&lt;/h2&gt;

&lt;p&gt;In a peer-to-peer architecture, agents connect directly to each other. Discovery happens at the network layer, not through a central registry. Results can propagate across the mesh without routing through a single coordinator.&lt;/p&gt;

&lt;p&gt;The tradeoffs shift:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Broker&lt;/th&gt;
&lt;th&gt;P2P&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simplicity at small scale&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput at large scale&lt;/td&gt;
&lt;td&gt;Limited by broker&lt;/td&gt;
&lt;td&gt;Linear with peers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure surface&lt;/td&gt;
&lt;td&gt;Single point&lt;/td&gt;
&lt;td&gt;Distributed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discovery&lt;/td&gt;
&lt;td&gt;Centralized&lt;/td&gt;
&lt;td&gt;Network-layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;td&gt;Requires tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;2 hops&lt;/td&gt;
&lt;td&gt;1 hop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The missing piece for P2P in practice has always been addressing and discovery. How does an agent find a peer that has the capability it needs? How do they establish a trusted connection without a central authority?&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Network Layer Provides
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; addresses this by inserting a session layer (L5) between UDP/TCP and your application framework. Each agent gets a stable 48-bit address:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0:A91F.0000.7C2E
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents organize into domain-specific groups: travel, finance, security, research. A query for SEC filings routes to agents in the finance group. A query about certificate transparency routes to the security group. The routing is network-level, not application-level. No broker required.&lt;/p&gt;

&lt;p&gt;The encryption is per-tunnel: X25519 key exchange, AES-256-GCM, Ed25519 identity. NAT traversal handles the cases where direct P2P isn't possible, with relay fallback. The agent developer doesn't implement any of this. It happens at the network layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Architectures
&lt;/h2&gt;

&lt;p&gt;The real-world answer for most production systems isn't "choose one." It's:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use a broker for&lt;/strong&gt;: task orchestration within a bounded fleet, audit trails, access control enforcement, sequential workflows with dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use P2P for&lt;/strong&gt;: high-throughput data retrieval, cross-fleet queries, capability discovery, anything where latency matters and the broker is not adding value.&lt;/p&gt;

&lt;p&gt;A practical pattern: your orchestrator agent uses a broker to coordinate its internal fleet, but connects to the P2P network at the boundary to retrieve external data. Internal coordination stays brokered and auditable. External queries go direct.&lt;/p&gt;

&lt;p&gt;This is roughly how Pilot Protocol's "Orgs" feature works: pre-wired multi-agent fleets where agents discover and trust each other on first boot, without requiring a live broker for every interaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ephemeral Agent Problem
&lt;/h2&gt;

&lt;p&gt;The clearest case for P2P is ephemeral agents. A broker-based system where agents register on startup and deregister on shutdown works fine when agents live for hours. When agents live for seconds or milliseconds, the registration overhead dominates.&lt;/p&gt;

&lt;p&gt;A session-layer network where agents get addresses on install and are immediately routable handles ephemerality without registration logic. The agent is online when the daemon is running. It's offline when it's not. No state management in the broker.&lt;/p&gt;

&lt;p&gt;The Pilot network currently runs ~176,000 agents with 57% growth in the past 7 days. At that scale, a central broker would be a significant engineering problem. The address-based P2P model is what makes the numbers work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Considerations
&lt;/h2&gt;

&lt;p&gt;If you're evaluating this architecture shift:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability requires investment&lt;/strong&gt;: distributed tracing across a P2P network is harder than reading broker logs. Plan for this upfront.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust model changes&lt;/strong&gt;: instead of trusting the broker to enforce access control, you're trusting the network addressing and encryption. Make sure you understand your threat model before deploying a P2P fleet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gradual migration is possible&lt;/strong&gt;: you don't have to rip out your broker. Connect your existing orchestrator to the P2P network for data retrieval first. Measure latency. Expand from there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not everything should be P2P&lt;/strong&gt;: sequential, stateful workflows with strong ordering requirements still benefit from a coordinator. P2P shines for parallel, independent, high-throughput operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Is Heading
&lt;/h2&gt;

&lt;p&gt;The agent protocol space is converging on a layered model. MCP handles tool access at L7. A2A handles agent coordination at L7. Neither solves transport. The session layer is the open gap.&lt;/p&gt;

&lt;p&gt;An agent fleet that runs on a native L5 network is faster, more resilient, and self-organizing in ways that broker architectures can't match. The cost is operational complexity that requires better tooling to manage.&lt;/p&gt;

&lt;p&gt;The tooling is catching up.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Further reading&lt;/strong&gt;: &lt;a href="https://pilotprotocol.network/docs/" rel="noopener noreferrer"&gt;Pilot Protocol docs&lt;/a&gt; | &lt;a href="https://pilotprotocol.network/for/setups" rel="noopener noreferrer"&gt;Browse pre-wired agent orgs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why Your AI Agents Are Still Bottlenecked by HTTP (And What to Do About It)</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Sun, 10 May 2026 18:03:48 +0000</pubDate>
      <link>https://dev.to/asterview/why-your-ai-agents-are-still-bottlenecked-by-http-and-what-to-do-about-it-bec</link>
      <guid>https://dev.to/asterview/why-your-ai-agents-are-still-bottlenecked-by-http-and-what-to-do-about-it-bec</guid>
      <description>&lt;p&gt;You've wired up your AI agent to a dozen APIs. It can search the web, pull database records, call external services. It looks like a capable system on paper.&lt;/p&gt;

&lt;p&gt;But watch what it actually does at runtime.&lt;/p&gt;

&lt;p&gt;It fires off an HTTP request. Waits for DNS. Does the TLS handshake. Gets back HTML or JSON designed for a human interface. Parses it with fragile selectors or regex. Retries when the schema changed. Does it again, and again, for every piece of data it needs.&lt;/p&gt;

&lt;p&gt;This is an agent running on infrastructure that was never designed for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Tell the Story
&lt;/h2&gt;

&lt;p&gt;For every search a human makes, an AI agent performs 20-50x more requests. Scraping pages, parsing output, retrying failures, re-reading the same content another agent already processed an hour ago.&lt;/p&gt;

&lt;p&gt;HTTP was designed in 1991 for a browser rendering documents for human eyes. The entire stack above TCP is optimized for that use case: DNS for human-readable names, TLS for trust anchors humans can't verify themselves, HTML and JSON for formats humans can read.&lt;/p&gt;

&lt;p&gt;Agents can handle binary wire formats. They don't need human-readable naming. They don't need a certificate authority vouching for a domain. They need fast, authenticated, direct connections to peers that have the data they need.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Stack Actually Looks Like Today
&lt;/h2&gt;

&lt;p&gt;When you deploy an agent using any modern framework, it lives at L7. It makes HTTP calls. Every call traverses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DNS lookup (100-300ms on cold cache)&lt;/li&gt;
&lt;li&gt;TCP three-way handshake&lt;/li&gt;
&lt;li&gt;TLS negotiation (another round trip)&lt;/li&gt;
&lt;li&gt;HTTP request/response overhead&lt;/li&gt;
&lt;li&gt;JSON parsing (often hundreds of milliseconds for large payloads)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a human loading a page once, this is acceptable. For an agent doing 10,000 requests per hour, each one of these is waste.&lt;/p&gt;

&lt;p&gt;More importantly: when your agent finishes a task, the result disappears. Another agent running the same query 20 minutes later burns the same tokens, makes the same requests, waits the same latencies. There's no shared state. There's no agent memory at the network layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Session Layer Gap
&lt;/h2&gt;

&lt;p&gt;The OSI model has seven layers. Agents today live at L7 (application) and ride on L3/L4 (IP and TCP). Layer 5, the session layer, is largely unused on the modern internet. TLS occupies part of it. Everything else is handled by application logic.&lt;/p&gt;

&lt;p&gt;This is where a native agent network belongs.&lt;/p&gt;

&lt;p&gt;A session layer for agents provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Addressing&lt;/strong&gt;: Each agent gets a stable identity and address, independent of IP. No DNS. Direct routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encrypted tunnels&lt;/strong&gt;: P2P encrypted channels between agents, without a central server in the path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt;: Agents find peers with relevant capabilities without going through a search engine or a broker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt;: Results shared at the network layer are available to any agent that asks, not just the one that generated them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; is one implementation of this idea. It sits at L5, above UDP/TCP and below your application framework. Agents install it with a single command, get a 48-bit address, and can immediately connect to ~176,000 peers on the network.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh
&lt;span class="nv"&gt;$ &lt;/span&gt;pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; my-agent
Daemon running &lt;span class="o"&gt;(&lt;/span&gt;pid 24817&lt;span class="o"&gt;)&lt;/span&gt;
Address: 0:A91F.0000.7C2E
Hostname: my-agent

&lt;span class="nv"&gt;$ &lt;/span&gt;pilotctl ping agent-alpha
✓ reply from 0:4B2E.0000.1A3D · 38ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SDK. No API key. The agent is online.&lt;/p&gt;

&lt;p&gt;At the network level, agents self-organize into groups by domain: travel, finance, security, research. A query goes to the agent best positioned to answer it, not to a general-purpose search engine. Average query time: 12 seconds on the native network vs. 51 seconds via HTTP scraping.&lt;/p&gt;

&lt;p&gt;The protocol itself uses X25519 for key exchange, AES-256-GCM for encryption per tunnel, and Ed25519 for identity. NAT traversal happens via STUN and hole-punching, with relay fallback for symmetric NATs. It was submitted as an IETF Internet-Draft.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where MCP Fits In
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) has become the dominant standard for agent-to-tool communication. 97 million monthly SDK downloads. Adopted by every major AI provider.&lt;/p&gt;

&lt;p&gt;MCP solves a real problem: standardizing how agents invoke tools and retrieve context. But MCP is L7. It assumes an underlying transport. It doesn't solve the addressing, discovery, or tunnel establishment problems.&lt;/p&gt;

&lt;p&gt;Giving an MCP server a network identity changes what it can do. It can be discovered by other agents without central registration. It can maintain persistent tunnels instead of per-request connections. Multiple agents can route to it based on capability rather than a hardcoded URL.&lt;/p&gt;

&lt;p&gt;This is roughly analogous to the difference between an HTTP server and a service mesh. The MCP server still exists. It just gets a proper network presence underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hive Mind Property
&lt;/h2&gt;

&lt;p&gt;The part that gets interesting at scale: when 176,000 agents share a network layer, results propagate. An agent that answers a query about SEC filings shares that answer with its network neighbors. The next agent to ask gets the cached result, not the raw API call.&lt;/p&gt;

&lt;p&gt;This is not a feature of any individual agent. It's a property of the network. Each new agent that joins makes the network smarter, not just larger.&lt;/p&gt;

&lt;p&gt;Current trajectory on the Pilot network: 57% growth in the past 7 days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Next Steps
&lt;/h2&gt;

&lt;p&gt;If you're building agent systems today, the relevant questions are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How are your agents addressing each other? Hardcoded URLs? Service discovery? Something else?&lt;/li&gt;
&lt;li&gt;What happens when the API your agent depends on changes its schema?&lt;/li&gt;
&lt;li&gt;Are your agents repeating work other agents in your fleet already did?&lt;/li&gt;
&lt;li&gt;How do agents in your system authenticate to each other?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A session-layer network doesn't replace your agent framework or your MCP servers. It goes underneath them and solves the transport problems that application-layer code is currently papering over.&lt;/p&gt;

&lt;p&gt;The web was built in 1991 for humans. The infrastructure for agents is being built now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;: &lt;a href="https://pilotprotocol.network/docs/getting-started" rel="noopener noreferrer"&gt;Install Pilot in one line&lt;/a&gt; | &lt;a href="https://pilotprotocol.network/blog/ietf-internet-draft-pilot-protocol" rel="noopener noreferrer"&gt;Read the IETF draft&lt;/a&gt; | &lt;a href="https://pilotprotocol.network/docs/service-agents" rel="noopener noreferrer"&gt;Browse 350+ service agents&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>networking</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Building a Multi-Agent Fleet with No Central Server</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Fri, 08 May 2026 23:20:02 +0000</pubDate>
      <link>https://dev.to/asterview/building-a-multi-agent-fleet-with-no-central-server-12fp</link>
      <guid>https://dev.to/asterview/building-a-multi-agent-fleet-with-no-central-server-12fp</guid>
      <description>&lt;p&gt;Most multi-agent architectures have the same shape: a coordinator talks to workers through a central hub. The hub is usually a message queue, a shared database, or an orchestration service like Ray or Temporal.&lt;/p&gt;

&lt;p&gt;That hub is also the first thing that breaks. It's a single point of failure, a scaling bottleneck, and an operational cost you pay even when the agents aren't working.&lt;/p&gt;

&lt;p&gt;Here's how to build a fleet where agents find each other and route tasks without any central intermediary.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Central Hub Problem
&lt;/h2&gt;

&lt;p&gt;When you're spinning up a 5-agent prototype, a central coordinator makes sense. It's simple, debuggable, and gets out of your way.&lt;/p&gt;

&lt;p&gt;At 50 agents it starts to fray. At 500 it becomes your hardest reliability problem.&lt;/p&gt;

&lt;p&gt;The hub becomes a global lock. Every message goes through it. Every failure cascades through it. Every scaling decision has to account for it.&lt;/p&gt;

&lt;p&gt;The alternative — having agents discover and contact each other directly — sounds appealing but has historically been hard. How does Agent A know Agent B's address? How do you handle NAT traversal? How do you authenticate the connection?&lt;/p&gt;

&lt;p&gt;These are solved problems in networking. We just haven't applied the solutions to agents until now.&lt;/p&gt;




&lt;h2&gt;
  
  
  Peer-to-Peer at the Session Layer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; operates at OSI Layer 5 — the session layer, the same slot TLS occupies for the web. It gives each agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A permanent 48-bit address (&lt;code&gt;0:A91F.0000.7C2E&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Automatic NAT traversal (STUN → hole-punch → relay fallback for symmetric NATs)&lt;/li&gt;
&lt;li&gt;End-to-end encrypted tunnels (X25519 key exchange, AES-256-GCM, Ed25519 identity)&lt;/li&gt;
&lt;li&gt;A global directory (the backbone) for agent discovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Pilot, the hub isn't a server you run. It's the network itself — and the network is maintained by the protocol, not by your ops team.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Fleet Pattern That Actually Works
&lt;/h2&gt;

&lt;p&gt;Here's a concrete pattern for a research fleet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Coordinator agent
    ↓ Pilot (P2P, encrypted)
[Specialist A] [Specialist B] [Specialist C]
    ↓                ↓               ↓
  Papers           FX data       News feeds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each specialist registers its capabilities on the Pilot backbone when it starts. The coordinator queries the backbone — "I need a peer that can resolve academic citations" — and gets back the address of Specialist A. Direct connection from there.&lt;/p&gt;

&lt;p&gt;No service registry you maintain. No hardcoded addresses. No configuration file you update when a worker moves.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;Getting an agent online:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; coordinator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The agent is addressable, authenticated, and reachable from any other Pilot peer — regardless of NAT, firewall, or cloud region.&lt;/p&gt;

&lt;p&gt;For the specialists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On each worker node&lt;/span&gt;
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; specialist-papers
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; specialist-fx
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; specialist-news
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each one joins the backbone automatically. The coordinator can ping them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pilotctl ping specialist-papers
&lt;span class="c"&gt;# ✓ reply from 0:4B2E.0000.1A3D · 22ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Self-Organization: How Groups Work
&lt;/h2&gt;

&lt;p&gt;Beyond individual peer connections, Pilot has a concept of groups — clusters of agents that self-organize around a shared domain.&lt;/p&gt;

&lt;p&gt;A trading fleet might form a TRADING group. A research fleet might join RESEARCH. Agents within a group can broadcast to all members or route to the most relevant peer within the domain.&lt;/p&gt;

&lt;p&gt;This is closer to how human organizations actually work: a new employee joins the company and immediately has access to colleagues in their department, not just a single manager they have to route everything through.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://polo.pilotprotocol.network" rel="noopener noreferrer"&gt;Pilot network status&lt;/a&gt; page shows these groups live: BACKBONE, TRAVEL, TRADING, RESEARCH, INSURANCE, and more, with real-time agent counts.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Give Up
&lt;/h2&gt;

&lt;p&gt;Centralized orchestration isn't all downside. You give up some things going P2P:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability.&lt;/strong&gt; A central hub is easy to instrument. A P2P mesh requires distributed tracing from day one. Plan for this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debuggability.&lt;/strong&gt; When something goes wrong, "what was the message queue state at time T" is easier to answer than "what was the P2P graph state." Log aggressively at the agent level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplicity.&lt;/strong&gt; For a 3-agent prototype, a coordinator is simpler. P2P earns its complexity at scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Switch
&lt;/h2&gt;

&lt;p&gt;The right time to move to a P2P architecture is usually later than you think but earlier than you want. Signals that you're ready:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're spending meaningful eng time on coordinator reliability&lt;/li&gt;
&lt;li&gt;Agents in different cloud regions are paying latency costs to route through a central server&lt;/li&gt;
&lt;li&gt;You want agents from different operators to collaborate without giving either access to your infrastructure&lt;/li&gt;
&lt;li&gt;Your fleet is growing fast enough that a central bottleneck is becoming a scaling conversation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If two or more of those are true, the session-layer approach is worth the investment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://pilotprotocol.network/docs/" rel="noopener noreferrer"&gt;Pilot Protocol documentation&lt;/a&gt; — addressing, groups, NAT traversal&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pilotprotocol.network/for/setups" rel="noopener noreferrer"&gt;Multi-agent setups on Pilot&lt;/a&gt; — pre-wired fleet configurations&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pilotprotocol.network/blog/ietf-internet-draft-pilot-protocol" rel="noopener noreferrer"&gt;The IETF Internet-Draft&lt;/a&gt; — the protocol spec if you want to go deep&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The network is live: ~163,000 agents, 12.7B+ requests routed, +28% growth in the past week.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;One line to get started: &lt;code&gt;curl -fsSL https://pilotprotocol.network/install.sh | sh&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Stop Making Your AI Agent Scrape the Web. There's a Better Way.</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Fri, 08 May 2026 23:17:56 +0000</pubDate>
      <link>https://dev.to/asterview/stop-making-your-ai-agent-scrape-the-web-theres-a-better-way-36fl</link>
      <guid>https://dev.to/asterview/stop-making-your-ai-agent-scrape-the-web-theres-a-better-way-36fl</guid>
      <description>&lt;p&gt;There's an absurd loop at the heart of most AI agent architectures right now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent needs data (a research paper, an FX rate, a flight status, a CVE)&lt;/li&gt;
&lt;li&gt;Agent calls a web scraper or fires an HTTP request to a public endpoint&lt;/li&gt;
&lt;li&gt;The endpoint returns HTML designed for a human to read in a browser&lt;/li&gt;
&lt;li&gt;Agent burns tokens parsing, cleaning, and extracting the actual value&lt;/li&gt;
&lt;li&gt;Agent retries when the scraper breaks because the page layout changed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We've built genuinely intelligent agents and then made them spend half their time doing remedial text processing on documents that weren't meant for them.&lt;/p&gt;

&lt;p&gt;Let me show you what the alternative looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Root Cause: Wrong Layer
&lt;/h2&gt;

&lt;p&gt;HTTP is a Layer 7 protocol built in 1991 to serve documents to human-operated browsers. It's brilliant at that. Every design decision — HTML rendering, cookies, sessions, REST conventions — optimizes for a human reading a page.&lt;/p&gt;

&lt;p&gt;Agents don't read pages. They consume structured data. They don't need the presentation layer, the session cookies, or the retry logic that only exists because the web assumed humans would be patient with slow servers.&lt;/p&gt;

&lt;p&gt;The right fix isn't a better scraper. It's operating at a different layer — one where agents talk directly to other agents that have already done the hard work of acquiring, normalizing, and maintaining the data you need.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Specialized Data Agents Look Like in Practice
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; runs a network of ~163,000 agents. About 350 of them are specialized data service agents — peers that exist to answer a specific category of query cleanly and fast.&lt;/p&gt;

&lt;p&gt;Here's what a few of them replace:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crossref specialist&lt;/strong&gt;&lt;br&gt;
Resolves a DOI against the global paper registry in one call. No scraping PubMed, no HTML parsing, no fighting rate limits. If you're building a legal research agent that needs to verify citations, this is one hop instead of a brittle pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Historical FX specialist&lt;/strong&gt;&lt;br&gt;
Spot rate at an arbitrary timestamp. Not today's rate from a public API that expires — the actual rate at the moment a transaction happened. Replaces three bank statement screenshots and a manual lookup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aviation weather specialist&lt;/strong&gt;&lt;br&gt;
Real-time METAR data for any airport. If your agent is managing travel or logistics, it gets structured weather data directly from a peer that's already watching the feeds, not from scraping a flight status page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;crt.sh / certificate transparency specialist&lt;/strong&gt;&lt;br&gt;
Streams CT hits on your domains. Your security agent gets new certificate issuances the moment they appear, not after the next cron runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FDA recalls specialist&lt;/strong&gt;&lt;br&gt;
Filters against the live recall feed for a specific condition or ingredient. No crawling FDA's website, no pagination, no HTML tables.&lt;/p&gt;

&lt;p&gt;The pattern is consistent: instead of your agent scraping a source and parsing the result, a specialist on the network has already done that work — once, for everyone — and serves structured answers directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Network Effect That Makes This Work
&lt;/h2&gt;

&lt;p&gt;The reason this improves over time is the same reason any network improves: each new agent adds value for every existing one.&lt;/p&gt;

&lt;p&gt;When a new operator connects their SEC filing parser to Pilot, every agent on the network gains access to cleaner financial data without writing any code. When a localization agent joins that has a native speaker in Manchester on the other end, every agent building for UK markets benefits.&lt;/p&gt;

&lt;p&gt;Pilot calls this "a hive mind that gets smarter with every new agent." It's less poetic if you think about it mechanically: it's a network with positive externalities, where the marginal cost of adding a new data source approaches zero for consumers.&lt;/p&gt;

&lt;p&gt;Compare that to the current model, where every agent team independently builds and maintains scrapers for the same 20 data sources. The waste is staggering.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Latency Numbers
&lt;/h2&gt;

&lt;p&gt;From the Pilot benchmarks: &lt;strong&gt;12 seconds on Pilot vs 51 seconds via the web&lt;/strong&gt; for equivalent data retrieval tasks.&lt;/p&gt;

&lt;p&gt;That's not a small difference. It's a 4x reduction in wall-clock time for the same result. In an agentic pipeline where you're making dozens of these calls, that's the difference between a task that completes in a minute and one that takes five.&lt;/p&gt;

&lt;p&gt;The speed comes from two places:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No parsing overhead&lt;/strong&gt; — the data arrives structured, not as HTML you have to strip&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UDP transport&lt;/strong&gt; — Pilot runs peer-to-peer over UDP with its own reliable-stream layer, avoiding the head-of-line blocking that makes TCP slow for parallel requests&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Getting Your Agent Connected
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Pilot (single static binary, no SDK, no API key)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh

&lt;span class="c"&gt;# Start the daemon&lt;/span&gt;
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; my-research-agent

&lt;span class="c"&gt;# Your agent is now on the network&lt;/span&gt;
&lt;span class="c"&gt;# Address: 0:A91F.0000.7C2E&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there, your agent can query the backbone for any of the 350+ service agents by capability. No URL directory to maintain, no API keys to manage per-service.&lt;/p&gt;




&lt;h2&gt;
  
  
  When You Still Need the Web
&lt;/h2&gt;

&lt;p&gt;To be direct: Pilot doesn't replace the web for everything. If you need to take a screenshot of a specific page, or submit a form on a site that has no API, you still need a browser or a scraper.&lt;/p&gt;

&lt;p&gt;But for structured data — the kind that lives behind an API or in a database somewhere — the web route is almost never the right choice for an agent. The data exists, someone has it clean, and there's now an agent network where you can get it directly.&lt;/p&gt;

&lt;p&gt;The scraping loop is a workaround. The network is the fix.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pilot Protocol: &lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;pilotprotocol.network&lt;/a&gt; — peer-to-peer encrypted tunnels for agents, one line of code, no central dependency.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why Your MCP Server Needs a Network Layer (And How to Add One in 30 Seconds)</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Fri, 08 May 2026 23:14:00 +0000</pubDate>
      <link>https://dev.to/asterview/why-your-mcp-server-needs-a-network-layer-and-how-to-add-one-in-30-seconds-3mbh</link>
      <guid>https://dev.to/asterview/why-your-mcp-server-needs-a-network-layer-and-how-to-add-one-in-30-seconds-3mbh</guid>
      <description>&lt;p&gt;You've got an MCP server running. Locally, it's perfect. Then someone asks: "Can another agent on a different machine call it?"&lt;/p&gt;

&lt;p&gt;You spin up a VPN. Or punch a hole in the firewall. Or route it through a cloud proxy. Half a day gone, and now you've got a central dependency you didn't want.&lt;/p&gt;

&lt;p&gt;There's a cleaner way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with MCP's Transport Layer
&lt;/h2&gt;

&lt;p&gt;MCP is genuinely great at what it does: connecting an agent to its tools via a clean, structured protocol. But it was designed with a human-run server in mind. The transport story is essentially "use HTTP" or "use stdio." Both assume you control both endpoints and they can reach each other.&lt;/p&gt;

&lt;p&gt;In 2026, that assumption breaks constantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent A is on AWS, Agent B is behind a corporate NAT&lt;/li&gt;
&lt;li&gt;You want two agents from different operators to collaborate without either exposing a public endpoint&lt;/li&gt;
&lt;li&gt;You're building a fleet where agents need to discover &lt;em&gt;and&lt;/em&gt; call each other dynamically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP doesn't solve this. It isn't supposed to — it's an application-layer protocol. The transport is your problem.&lt;/p&gt;

&lt;p&gt;Until now, "your problem" meant a lot of yak shaving.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Session Layer Gives You
&lt;/h2&gt;

&lt;p&gt;The OSI model has a slot for exactly this: Layer 5, the session layer. It's the layer that manages connections between peers — maintaining them, authenticating them, and routing them across NATs.&lt;/p&gt;

&lt;p&gt;The web uses TLS here. Agents need something that speaks agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;Pilot Protocol&lt;/a&gt; is a peer-to-peer network built specifically for this slot. Instead of routing agent traffic through HTTP (a document protocol built for browsers), Pilot operates at UDP with its own reliable-stream layer on top — X25519 key exchange, AES-256-GCM per tunnel, Ed25519 identity, automatic NAT traversal via STUN + hole-punching.&lt;/p&gt;

&lt;p&gt;Each agent gets a 48-bit address. Direct, authenticated, no intermediary required.&lt;/p&gt;




&lt;h2&gt;
  
  
  One Line of Code
&lt;/h2&gt;

&lt;p&gt;Here's what adding Pilot to your MCP server actually looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That installs a single static binary. No SDK. No API key. No account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; my-mcp-server
&lt;span class="c"&gt;# Daemon running (pid 24817)&lt;/span&gt;
&lt;span class="c"&gt;# Address: 0:A91F.0000.7C2E&lt;/span&gt;
&lt;span class="c"&gt;# Hostname: my-mcp-server&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your MCP server now has a Pilot address. Any other agent on the network — regardless of what NAT it's behind — can reach it directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pilotctl ping agent-alpha
&lt;span class="c"&gt;# ✓ reply from 0:4B2E.0000.1A3D · 38ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No VPN. No public endpoint. No relay server you have to run.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why UDP, Not TCP?
&lt;/h2&gt;

&lt;p&gt;TCP is great for browsers loading pages. It wasn't designed for the round-trip latency profile of agent-to-agent calls.&lt;/p&gt;

&lt;p&gt;Head-of-line blocking is the killer: if one packet is dropped, everything queues behind it. For a browser loading a web page, that's fine — you're waiting for HTML to render anyway. For an agent making 50 parallel data requests, it's a disaster.&lt;/p&gt;

&lt;p&gt;Pilot runs UDP with its own reliable-stream implementation: sliding window, AIMD congestion control, selective acknowledgement (SACK). You get reliability without the head-of-line blocking tax. The benchmark from the Pilot homepage: &lt;strong&gt;12s on Pilot vs 51s via the web&lt;/strong&gt; for the same data retrieval task.&lt;/p&gt;




&lt;h2&gt;
  
  
  The MCP + Pilot Pattern
&lt;/h2&gt;

&lt;p&gt;The natural pairing looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent A (MCP client)
    ↓ Pilot tunnel (encrypted, P2P)
Agent B (MCP server)
    ↓ MCP tool calls
Tools / data / capabilities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pilot handles the transport: addressing, NAT traversal, encryption. MCP handles the application layer: tool definitions, structured responses. Neither replaces the other.&lt;/p&gt;

&lt;p&gt;Pilot even has a dedicated page for this pattern: &lt;a href="https://pilotprotocol.network/for/mcp" rel="noopener noreferrer"&gt;MCP + Pilot&lt;/a&gt; — your MCP server gets a network address and becomes reachable from anywhere on the Pilot network.&lt;/p&gt;




&lt;h2&gt;
  
  
  Discovery Is Solved Too
&lt;/h2&gt;

&lt;p&gt;Once your server is on Pilot, it joins the backbone — a global directory where agents can find peers by capability rather than by hostname.&lt;/p&gt;

&lt;p&gt;That means another agent can query "I need a tool that does X" and Pilot routes it to you, without you publishing a URL anywhere. Agent discovery stops being a directory you maintain and becomes a property of the network itself.&lt;/p&gt;

&lt;p&gt;There are already 350+ specialized service agents on the backbone: Crossref for paper lookups, historical FX data, aviation weather, crt.sh for certificate transparency, FDA recalls. They're just peers on the network.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;MCP is the right protocol for tool-calling. But it needs a transport layer that wasn't designed for humans loading documents in browsers.&lt;/p&gt;

&lt;p&gt;Adding Pilot solves the NAT problem, the discovery problem, and the "two agents from different operators need to talk" problem — in one binary, one command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then go back to building the agent, not the plumbing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pilot Protocol is live at &lt;a href="https://pilotprotocol.network/" rel="noopener noreferrer"&gt;pilotprotocol.network&lt;/a&gt; — ~163,000 agents, 12.7B+ requests routed, published as an IETF Internet-Draft.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>networking</category>
    </item>
    <item>
      <title>How to Deploy Multi-Agent Systems Cross-Cloud[Python]</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Mon, 04 May 2026 20:21:24 +0000</pubDate>
      <link>https://dev.to/asterview/how-to-deploy-multi-agent-systems-cross-cloudpython-576a</link>
      <guid>https://dev.to/asterview/how-to-deploy-multi-agent-systems-cross-cloudpython-576a</guid>
      <description>&lt;p&gt;&lt;strong&gt;Quick Answer:&lt;/strong&gt; To connect AI agents across different cloud environments, developers must replace synchronous HTTP with asynchronous brokers like &lt;strong&gt;Celery&lt;/strong&gt; and &lt;strong&gt;Redis&lt;/strong&gt;, externalize state memory, secure tool execution using the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;, bypass strict NAT firewalls via &lt;strong&gt;Pilot Protocol&lt;/strong&gt; transport, and trace distributed workflows with &lt;strong&gt;OpenTelemetry&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Deploying a &lt;strong&gt;Multi-Agent System (MAS)&lt;/strong&gt; across distributed cloud environments instantly breaks standard local network assumptions. To maintain cross-cloud agent communication, engineers must abandon synchronous local testing patterns and implement asynchronous task delegation, stateless container memory, decoupled tool execution, and decentralized peer-to-peer networking. &lt;/p&gt;

&lt;p&gt;Standard &lt;strong&gt;REST APIs&lt;/strong&gt; fail in production because &lt;strong&gt;Large Language Model (LLM)&lt;/strong&gt; inference introduces variable latency, causing synchronous HTTP requests to time out. Furthermore, when scaling an orchestrator agent on &lt;strong&gt;AWS&lt;/strong&gt; and specialized worker agents on &lt;strong&gt;GCP&lt;/strong&gt;, relying on standard TCP/IP routing leads to continuous IP churn and blocked connections at corporate &lt;strong&gt;NAT firewalls&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;The reality of distributed multi-agent architecture is that you are building an emergent private internet for autonomous software. Here are five architectural implementations required to connect agents across disparate cloud networks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Synchronous HTTP Will Throttle Your Agent Architecture
&lt;/h3&gt;

&lt;p&gt;When scaling from one agent to two, developers typically default to standard REST APIs where one agent sends a synchronous POST request to another. This fails in production because LLM inference times are highly variable. Generating a response or executing an unoptimized tool takes anywhere from ten to forty seconds. Cloud load balancers and standard HTTP clients time out waiting for the response, dropping the connection and forcing the agent to restart its entire reasoning loop.&lt;/p&gt;

&lt;p&gt;Cross-cloud agent communication must be asynchronous. Instead of blocking HTTP requests, agents must place delegation tasks into a distributed message broker. This allows the orchestrator agent to continue processing other inputs while the worker agent processes the task on a separate node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using Celery with Redis for async cross-cloud task delegation
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;celery&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Celery&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Celery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent_tasks&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;redis://external-broker-url:6379/0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.task&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delegate_to_research_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# This runs on the GCP worker node asynchronously
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;research_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Store result in external database for the AWS agent to fetch later
&lt;/span&gt;    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;delegate_to_research_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="c1"&gt;# On the AWS orchestrator node: trigger without blocking
&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;delegate_to_research_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze Q3 earnings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;previous_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task dispatched with ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ephemeral Containers Destroy Conversational State
&lt;/h3&gt;

&lt;p&gt;Agents running in auto-scaling cloud instances are ephemeral. If an agent process crashes mid-task due to an out-of-memory error from a massive context window, the container restarts. If conversational history and task trajectories are stored in the local memory of the agent process, the entire workflow vanishes upon restart.&lt;/p&gt;

&lt;p&gt;To survive node migrations, agent processes must be completely stateless. Every tool output, intermediate reasoning step, and user prompt should be immediately pushed to an external, globally accessible data store. Upon initialization, the agent rebuilds its context window by querying this external memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Externalizing agent state to Redis
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;global-redis.internal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_agent_thought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Push the latest reasoning step to a list
&lt;/span&gt;    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rpush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_state:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rebuild_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Rebuild state if the container restarts
&lt;/span&gt;    &lt;span class="n"&gt;raw_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_state:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_steps&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Managing Tool Execution Across Network Boundaries
&lt;/h3&gt;

&lt;p&gt;Hardcoding API keys and database connection strings into agent logic creates massive security vulnerabilities on untrusted cloud virtual machines. The agent reasoning loop should be strictly separated from tool execution permissions.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol acts as the industry standard for this decoupling. By wrapping internal databases in an MCP server, you dictate exactly what data the agent can interact with using standardized JSON-RPC schemas. The cloud agent requests tool execution, and the secure MCP server executes it, ensuring the autonomous model never directly touches raw infrastructure credentials.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Connecting an agent to a secure MCP server across the network
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.client.stdio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_secure_tool&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# The server parameters define the connection to the secure tool environment
&lt;/span&gt;    &lt;span class="n"&gt;server_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secure_mcp_server.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;as &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="c1"&gt;# The agent discovers available tools dynamically
&lt;/span&gt;            &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="c1"&gt;# The agent executes the tool without seeing the underlying credentials
&lt;/span&gt;            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_internal_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q3_sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;query_secure_tool&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overcoming IP Churn and NAT Firewalls for Direct Transport
&lt;/h3&gt;

&lt;p&gt;While the Model Context Protocol formats tool requests, it assumes the underlying network is already routable. Cloud containers face continuous IP churn, and enterprise networks utilize strict NAT firewalls. Exposing local tool servers across clouds usually requires Virtual Private Cloud peering or central API gateways, introducing latency and single points of failure.&lt;/p&gt;

&lt;p&gt;This transport problem requires assigning agents persistent cryptographic identities using Pilot Protocol. Instead of binding communication to fragile physical IPs, this userspace overlay network assigns a permanent 48-bit virtual address mathematically bound to an Ed25519 keypair. The pure-Go daemon utilizes automated UDP hole-punching to bypass strict firewalls and executes X25519 Elliptic Curve Diffie-Hellman key exchanges. This allows an orchestrator on AWS to communicate directly with a worker on a corporate network without reverse proxies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the pure-Go userspace network stack&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh

&lt;span class="c"&gt;# Initialize the daemon on the local secure machine (Node A)&lt;/span&gt;
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; secure-mcp-tool

&lt;span class="c"&gt;# Initialize the daemon on the cloud VPS agent (Node B)&lt;/span&gt;
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; cloud-worker-agent

&lt;span class="c"&gt;# Node B can now route directly to Node A bypassing the NAT&lt;/span&gt;
&lt;span class="c"&gt;# utilizing the underlying TCP-over-UDP transport layer&lt;/span&gt;
pilotctl connect secure-mcp-tool &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s1"&gt;'{"jsonrpc": "2.0", "method": "call_tool"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Distributed Tracing is Mandatory for Agent Debugging
&lt;/h3&gt;

&lt;p&gt;When a cross-cloud multi-agent workflow fails, identifying the exact point of failure is difficult. If an orchestrator on Azure delegates a task to a researcher on GCP, and the GCP agent encounters a hallucination loop, local logs will only show a generic HTTP timeout.&lt;/p&gt;

&lt;p&gt;Implementing distributed tracing is non-negotiable for autonomous systems. Injecting trace context into payloads passed between clouds allows engineers to visualize the entire sequence of tool calls and prompt generations across network boundaries using OpenTelemetry standards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Injecting OpenTelemetry trace IDs into cross-cloud payloads
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;inject&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dispatch_task_to_peer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross_cloud_delegation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="c1"&gt;# Inject the current trace context into the headers or payload
&lt;/span&gt;        &lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Add the headers to the payload sent to the remote agent
&lt;/span&gt;        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;

        &lt;span class="c1"&gt;# Standard request to the remote agent
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;peer.response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>cloud</category>
      <category>ai</category>
      <category>agents</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Deploy Multi-Agent Systems Cross-Cloud[Python]</title>
      <dc:creator>William Baker </dc:creator>
      <pubDate>Mon, 04 May 2026 20:21:24 +0000</pubDate>
      <link>https://dev.to/asterview/how-to-deploy-multi-agent-systems-cross-cloudpython-4n7c</link>
      <guid>https://dev.to/asterview/how-to-deploy-multi-agent-systems-cross-cloudpython-4n7c</guid>
      <description>&lt;p&gt;&lt;strong&gt;Quick Answer:&lt;/strong&gt; To connect AI agents across different cloud environments, developers must replace synchronous HTTP with asynchronous brokers like &lt;strong&gt;Celery&lt;/strong&gt; and &lt;strong&gt;Redis&lt;/strong&gt;, externalize state memory, secure tool execution using the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;, bypass strict NAT firewalls via &lt;strong&gt;Pilot Protocol&lt;/strong&gt; transport, and trace distributed workflows with &lt;strong&gt;OpenTelemetry&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Deploying a &lt;strong&gt;Multi-Agent System (MAS)&lt;/strong&gt; across distributed cloud environments instantly breaks standard local network assumptions. To maintain cross-cloud agent communication, engineers must abandon synchronous local testing patterns and implement asynchronous task delegation, stateless container memory, decoupled tool execution, and decentralized peer-to-peer networking. &lt;/p&gt;

&lt;p&gt;Standard &lt;strong&gt;REST APIs&lt;/strong&gt; fail in production because &lt;strong&gt;Large Language Model (LLM)&lt;/strong&gt; inference introduces variable latency, causing synchronous HTTP requests to time out. Furthermore, when scaling an orchestrator agent on &lt;strong&gt;AWS&lt;/strong&gt; and specialized worker agents on &lt;strong&gt;GCP&lt;/strong&gt;, relying on standard TCP/IP routing leads to continuous IP churn and blocked connections at corporate &lt;strong&gt;NAT firewalls&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;The reality of distributed multi-agent architecture is that you are building an emergent private internet for autonomous software. Here are five architectural implementations required to connect agents across disparate cloud networks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Synchronous HTTP Will Throttle Your Agent Architecture
&lt;/h3&gt;

&lt;p&gt;When scaling from one agent to two, developers typically default to standard REST APIs where one agent sends a synchronous POST request to another. This fails in production because LLM inference times are highly variable. Generating a response or executing an unoptimized tool takes anywhere from ten to forty seconds. Cloud load balancers and standard HTTP clients time out waiting for the response, dropping the connection and forcing the agent to restart its entire reasoning loop.&lt;/p&gt;

&lt;p&gt;Cross-cloud agent communication must be asynchronous. Instead of blocking HTTP requests, agents must place delegation tasks into a distributed message broker. This allows the orchestrator agent to continue processing other inputs while the worker agent processes the task on a separate node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using Celery with Redis for async cross-cloud task delegation
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;celery&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Celery&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Celery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent_tasks&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;redis://external-broker-url:6379/0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.task&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delegate_to_research_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# This runs on the GCP worker node asynchronously
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;research_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Store result in external database for the AWS agent to fetch later
&lt;/span&gt;    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;delegate_to_research_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="c1"&gt;# On the AWS orchestrator node: trigger without blocking
&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;delegate_to_research_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze Q3 earnings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;previous_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task dispatched with ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ephemeral Containers Destroy Conversational State
&lt;/h3&gt;

&lt;p&gt;Agents running in auto-scaling cloud instances are ephemeral. If an agent process crashes mid-task due to an out-of-memory error from a massive context window, the container restarts. If conversational history and task trajectories are stored in the local memory of the agent process, the entire workflow vanishes upon restart.&lt;/p&gt;

&lt;p&gt;To survive node migrations, agent processes must be completely stateless. Every tool output, intermediate reasoning step, and user prompt should be immediately pushed to an external, globally accessible data store. Upon initialization, the agent rebuilds its context window by querying this external memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Externalizing agent state to Redis
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;global-redis.internal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_agent_thought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Push the latest reasoning step to a list
&lt;/span&gt;    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rpush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_state:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rebuild_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Rebuild state if the container restarts
&lt;/span&gt;    &lt;span class="n"&gt;raw_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_state:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_steps&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Managing Tool Execution Across Network Boundaries
&lt;/h3&gt;

&lt;p&gt;Hardcoding API keys and database connection strings into agent logic creates massive security vulnerabilities on untrusted cloud virtual machines. The agent reasoning loop should be strictly separated from tool execution permissions.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol acts as the industry standard for this decoupling. By wrapping internal databases in an MCP server, you dictate exactly what data the agent can interact with using standardized JSON-RPC schemas. The cloud agent requests tool execution, and the secure MCP server executes it, ensuring the autonomous model never directly touches raw infrastructure credentials.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Connecting an agent to a secure MCP server across the network
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.client.stdio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_secure_tool&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# The server parameters define the connection to the secure tool environment
&lt;/span&gt;    &lt;span class="n"&gt;server_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secure_mcp_server.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;as &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="c1"&gt;# The agent discovers available tools dynamically
&lt;/span&gt;            &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="c1"&gt;# The agent executes the tool without seeing the underlying credentials
&lt;/span&gt;            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_internal_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q3_sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;query_secure_tool&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overcoming IP Churn and NAT Firewalls for Direct Transport
&lt;/h3&gt;

&lt;p&gt;While the Model Context Protocol formats tool requests, it assumes the underlying network is already routable. Cloud containers face continuous IP churn, and enterprise networks utilize strict NAT firewalls. Exposing local tool servers across clouds usually requires Virtual Private Cloud peering or central API gateways, introducing latency and single points of failure.&lt;/p&gt;

&lt;p&gt;This transport problem requires assigning agents persistent cryptographic identities using Pilot Protocol. Instead of binding communication to fragile physical IPs, this userspace overlay network assigns a permanent 48-bit virtual address mathematically bound to an Ed25519 keypair. The pure-Go daemon utilizes automated UDP hole-punching to bypass strict firewalls and executes X25519 Elliptic Curve Diffie-Hellman key exchanges. This allows an orchestrator on AWS to communicate directly with a worker on a corporate network without reverse proxies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the pure-Go userspace network stack&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pilotprotocol.network/install.sh | sh

&lt;span class="c"&gt;# Initialize the daemon on the local secure machine (Node A)&lt;/span&gt;
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; secure-mcp-tool

&lt;span class="c"&gt;# Initialize the daemon on the cloud VPS agent (Node B)&lt;/span&gt;
pilotctl daemon start &lt;span class="nt"&gt;--hostname&lt;/span&gt; cloud-worker-agent

&lt;span class="c"&gt;# Node B can now route directly to Node A bypassing the NAT&lt;/span&gt;
&lt;span class="c"&gt;# utilizing the underlying TCP-over-UDP transport layer&lt;/span&gt;
pilotctl connect secure-mcp-tool &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s1"&gt;'{"jsonrpc": "2.0", "method": "call_tool"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Distributed Tracing is Mandatory for Agent Debugging
&lt;/h3&gt;

&lt;p&gt;When a cross-cloud multi-agent workflow fails, identifying the exact point of failure is difficult. If an orchestrator on Azure delegates a task to a researcher on GCP, and the GCP agent encounters a hallucination loop, local logs will only show a generic HTTP timeout.&lt;/p&gt;

&lt;p&gt;Implementing distributed tracing is non-negotiable for autonomous systems. Injecting trace context into payloads passed between clouds allows engineers to visualize the entire sequence of tool calls and prompt generations across network boundaries using OpenTelemetry standards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Injecting OpenTelemetry trace IDs into cross-cloud payloads
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;inject&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dispatch_task_to_peer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross_cloud_delegation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="c1"&gt;# Inject the current trace context into the headers or payload
&lt;/span&gt;        &lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Add the headers to the payload sent to the remote agent
&lt;/span&gt;        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;

        &lt;span class="c1"&gt;# Standard request to the remote agent
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;peer.response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>cloud</category>
      <category>ai</category>
      <category>agents</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
