mgd43b for AgentEnsemble

Posted on May 13 • Originally published at agentensemble.net

Federation for Agent Networks: Cross-Namespace Capability Sharing via Realms

#java #ai #agents #architecture

Discovery lets ensembles find capabilities within a network. But in a real deployment, not every ensemble lives in the same namespace or even the same cluster. A hotel chain might run separate ensemble networks at each property, each in its own Kubernetes namespace, but want them to share spare capacity when one property is overloaded.

This is the federation problem: how do you extend capability discovery across trust and network boundaries without collapsing everything into one flat namespace?

Realms as trust boundaries

AgentEnsemble v3.0.0 introduces realms as the organizational unit for federation. A realm is a namespace-level discovery and trust boundary -- typically mapping to a Kubernetes namespace in production deployments.

FederationConfig federation = FederationConfig.builder()
    .localRealm("hotel-downtown")
    .federationName("Hotel Chain")
    .realm("hotel-airport", "hotel-airport-ns")
    .realm("hotel-beach", "hotel-beach-ns")
    .build();

Within a realm, ensembles discover each other freely. Cross-realm discovery requires explicit opt-in: an ensemble must advertise its capacity as shareable for other realms to use it.

This is a deliberate design choice. Discovery within a namespace is expected and low-risk. Discovery across namespaces is a capacity-sharing decision that should be explicit.

Capacity advertisement

Ensembles periodically broadcast their current load and availability using capacity updates:

{
  "type": "capacity_update",
  "ensemble": "kitchen",
  "realm": "hotel-downtown",
  "status": "available",
  "currentLoad": 0.35,
  "maxConcurrent": 100,
  "shareable": true
}

The shareable flag is the federation gate. When true, the ensemble's spare capacity is available to ensembles in other realms. When false, the ensemble only serves local requests.

The CapacityAdvertiser handles the periodic broadcasting:

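CapacityAdvertiser advertiser = new CapacityAdvertiser(
    "kitchen",                        // ensemble name
    "hotel-downtown",                 // realm
    () -> computeCurrentLoad(),       // supplies the current load
    100,                              // max concurrent
    true,                             // shareable to other realms
    message -> broadcast(message));   // sends each capacity update

// Broadcast a capacity update every 10 seconds
advertiser.start(Duration.ofSeconds(10));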

Status is derived automatically from load: "available" when load is below 1.0, "busy" when at capacity.
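The status rule can be sketched in a few lines. This is an illustration, not the library's code: the class name, both method names, and the assumption that load is the fraction of in-flight tasks over maxConcurrent are all hypothetical.

```java
final class CapacityStatusSketch {

    /** Assumed definition of load: in-flight tasks divided by maxConcurrent. */
    static double loadFraction(int inFlight, int maxConcurrent) {
        return (double) inFlight / maxConcurrent;
    }

    /** "available" while load is below 1.0, "busy" once the ensemble is at capacity. */
    static String statusFor(double currentLoad) {
        return currentLoad < 1.0 ? "available" : "busy";
    }
}
```

Under that assumption, 35 in-flight tasks against a limit of 100 give the 0.35 load shown in the capacity update above, which maps to "available".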

The routing hierarchy

When an ensemble needs a capability, the FederationRegistry routes the request using a three-level hierarchy:

Priority	Scope	Condition
1 (highest)	Local realm	Provider is in the same realm
2	Assumed local	Provider advertises no realm info and is treated as local
3 (lowest)	Cross-realm	Provider is in a different realm and `shareable = true`

Within each level, the least-loaded provider is preferred.

FederationRegistry registry = new FederationRegistry(capabilityRegistry);

// Find the best provider using the routing hierarchy
Optional<String> provider = registry.findProvider(
    "prepare-meal", "hotel-downtown");

The routing hierarchy encodes a simple operational principle: prefer local providers (lower latency, same failure domain), and fall back to cross-realm providers only when local capacity is insufficient.
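To make the ordering concrete, here is a minimal sketch of how such a selection could work. It is not the FederationRegistry implementation: the Provider record, the tier method, and the choice to skip providers whose load has reached 1.0 are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class RoutingSketch {

    /** Hypothetical provider view; realm is null when no realm info was advertised. */
    record Provider(String ensemble, String realm, double currentLoad, boolean shareable) {}

    /** Returns the priority level (1, 2 or 3), or 0 when the provider is not eligible. */
    static int tier(Provider p, String localRealm) {
        if (localRealm.equals(p.realm())) return 1;   // same realm
        if (p.realm() == null) return 2;              // no realm info, assumed local
        return p.shareable() ? 3 : 0;                 // cross-realm only if shareable
    }

    /** Picks the lowest tier first, then the least-loaded provider within that tier. */
    static Optional<String> findProvider(List<Provider> candidates, String localRealm) {
        return candidates.stream()
            .filter(p -> tier(p, localRealm) > 0)
            .filter(p -> p.currentLoad() < 1.0)       // assumption: skip providers at capacity
            .min(Comparator.<Provider>comparingInt(p -> tier(p, localRealm))
                .thenComparingDouble(Provider::currentLoad))
            .map(Provider::ensemble);
    }
}
```

In this sketch a local kitchen at 0.9 load still wins over an idle kitchen in another realm. The request only overflows once every local provider is at capacity.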

Why this matters for agent systems

Federation solves a specific operational problem: agent networks that need to scale across deployment boundaries without becoming a single monolithic system.

Consider a hotel chain with three properties. Each property runs its own ensemble network (kitchen, front desk, maintenance, room service). Each network is self-contained and operates independently. But during peak events -- a conference at one property, a holiday weekend at another -- one property's kitchen may be overwhelmed while another has spare capacity.

Without federation, each property is an island. The overloaded kitchen queues requests or drops them. With federation, the overloaded kitchen's requests overflow to a kitchen in another realm that has advertised spare capacity.

The key constraint is that federation should be additive, not required. Each realm must work independently. Federation adds cross-realm routing as an optimization, not a dependency. If the federation link goes down, each realm continues operating on its own.

Network configuration

Enable federation at the network level:

NetworkConfig config = NetworkConfig.builder()
    .ensemble("kitchen", "ws://kitchen:7329/ws")
    .ensemble("maintenance", "ws://maintenance:7329/ws")
    .federationConfig(FederationConfig.builder()
        .localRealm("hotel-downtown")
        .federationName("Hotel Chain")
        .realm("hotel-airport", "hotel-airport-ns")
        .realm("hotel-beach", "hotel-beach-ns")
        .build())
    .build();

The FederationConfig is optional. Without it, the network operates in single-realm mode -- standard discovery without cross-realm routing.

Tradeoffs

Cross-realm latency. Requests routed to a different realm incur network latency that local requests do not. For agent workloads where task execution takes seconds or minutes, the routing overhead is negligible. For latency-sensitive workflows, the local-first routing hierarchy mitigates this -- cross-realm routing only happens when local capacity is insufficient.

Capacity staleness. Capacity updates are periodic (default: every 10 seconds). Between updates, routing decisions rely on data that may be up to one interval old: an ensemble that was available 5 seconds ago might be at capacity now. The consequence is that a request is occasionally routed to a busy provider, which queues it rather than rejecting it.

Trust is implicit. Realm membership is declared in configuration, not enforced cryptographically. Any ensemble that claims to be in a realm is trusted. For environments where this matters, the transport layer needs authentication -- which is outside the scope of the federation layer itself.

Realm topology is static. The set of realms in a federation is declared at configuration time. You cannot add new realms at runtime without reconfiguring existing ensembles. For dynamic environments where namespaces are created and destroyed frequently, this requires a configuration management strategy.

Shareable is binary. An ensemble either shares all its spare capacity or none. There is no per-realm or per-capability sharing control. If you need more granular sharing policies, you would need to implement them in the capacity advertisement logic.
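One way to approximate a finer-grained policy is to compute the shareable flag per update instead of fixing it once. The sketch below assumes you build the capacity_update payload yourself. The class, the method names, and the shareBelow threshold are hypothetical, and the CapacityAdvertiser shown earlier takes a fixed boolean, so this is not a drop-in use of that API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SharingPolicySketch {

    /** Hypothetical policy: share spare capacity only while load is below a threshold. */
    static boolean shareable(double currentLoad, double shareBelow) {
        return currentLoad < shareBelow;
    }

    /** Builds the fields of a capacity_update message with a load-dependent shareable flag. */
    static Map<String, Object> capacityUpdate(String ensemble, String realm,
                                              double currentLoad, int maxConcurrent,
                                              double shareBelow) {
        Map<String, Object> update = new LinkedHashMap<>();
        update.put("type", "capacity_update");
        update.put("ensemble", ensemble);
        update.put("realm", realm);
        update.put("status", currentLoad < 1.0 ? "available" : "busy");
        update.put("currentLoad", currentLoad);
        update.put("maxConcurrent", maxConcurrent);
        update.put("shareable", shareable(currentLoad, shareBelow));
        return update;
    }
}
```

With a threshold of 0.5, the kitchen at 0.35 load advertises itself as shareable, and stops doing so once its own load climbs past half, keeping the remaining headroom for local requests.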

The design principle

The useful insight is that federation is a capacity-sharing problem, not a networking problem. The networking (WebSockets, Kafka) already works across boundaries. What federation adds is a policy layer: who can use whose spare capacity, and in what order.

Realms provide the organizational unit. Capacity advertisement provides the data. The routing hierarchy provides the policy. Together, they turn a collection of independent agent networks into a cooperative federation that shares spare capacity while maintaining operational independence.

Federation is part of AgentEnsemble. The federation guide covers the full API including capacity advertisement and realm configuration.

I'd be interested in whether others are dealing with multi-namespace agent deployments, and how they handle cross-boundary capability sharing.
