mgd43b for AgentEnsemble

Posted on May 11 • Originally published at agentensemble.net

Dynamic Discovery in Agent Networks: From Hardcoded Routes to Capability Catalogs

#java #ai #agents #architecture

The simplest way to connect two agent ensembles is a direct reference: ensemble A knows ensemble B's address and calls it. This works when you have two or three ensembles with stable relationships.

It stops working when you have ten ensembles, or when ensembles come and go, or when the same capability is provided by multiple ensembles and you want the caller to use whichever one is available. At that point, you need discovery -- a way for ensembles to find capabilities without knowing in advance who provides them.

The static wiring problem

In a statically wired agent network, every cross-ensemble call requires knowing the provider's identity and address:

// Static: caller knows exactly who to call
NetworkTask mealTask = NetworkTask.of("kitchen", "prepare-meal",
    "ws://kitchen:7329/ws");

This creates coupling. If the kitchen ensemble moves to a different port, every caller needs updating. If you add a second kitchen for capacity, callers need load-balancing logic. If the kitchen goes down, callers need fallback logic.

The fundamental issue is that callers should care about what they need (a meal preparation capability), not who provides it or where it runs.

Capability advertisement with tags

AgentEnsemble v3.0.0 introduces capability discovery. Ensembles advertise their shared tasks and tools with optional tags, and other ensembles discover providers at runtime.

Advertising capabilities

When building an ensemble, declare what it shares with the network:

Ensemble kitchen = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.of("Manage kitchen operations"))
    .shareTool("check-inventory", inventoryTool, "food", "stock")
    .shareTask("prepare-meal", mealTask, "food", "cooking")
    .build();

kitchen.start(7329);

The shareTool and shareTask methods register capabilities in the network's capability registry. The trailing string arguments are tags -- metadata that classifies the capability for filtered discovery.

Discovering capabilities

Another ensemble can discover providers without knowing their identity:

// Discover by capability name
NetworkTool inventoryCheck = NetworkTool.discover("check-inventory", registry);

// Discover by tag
List<CapabilityInfo> foodCapabilities = registry.findByTag("food");

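The registry returns a provider that currently offers the requested capability. If several providers offer the same capability, the registry has to pick one using a selection strategy such as round-robin, least-loaded, or affinity-based routing. The Tradeoffs section below notes which of these the current implementation supports.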
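To make the selection step concrete, here is a minimal round-robin sketch. It is not AgentEnsemble code: RoundRobinSelector is a hypothetical helper, and providers are represented as plain strings for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the AgentEnsemble API: rotates through
// the providers that advertise the same capability.
final class RoundRobinSelector {
    private final AtomicInteger next = new AtomicInteger();

    /** Returns the next provider in rotation; safe to call from several threads. */
    String select(List<String> providers) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider offers this capability");
        }
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(next.getAndIncrement(), providers.size());
        return providers.get(index);
    }
}
```

With two providers in the list, successive calls alternate between them, so the caller never has to carry its own load-balancing logic.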

Tag-based catalogs

Tags turn the capability registry into a searchable catalog. Rather than querying for specific capability names, you can query for categories:

// Find all capabilities tagged with "food"
List<CapabilityInfo> food = registry.findByTag("food");

// Find all capabilities tagged with both "food" and "stock"
List<CapabilityInfo> stockChecks = registry.findByTags("food", "stock");

Each CapabilityInfo includes the capability name, type (task or tool), provider ensemble name, and tags:

CapabilityInfo info = food.get(0);
String name = info.name();           // "check-inventory"
String type = info.type();           // "TOOL"
String provider = info.provider();   // "kitchen"
Set<String> tags = info.tags();      // ["food", "stock"]

This is useful for building dynamic agent systems where an orchestrating ensemble does not know in advance what capabilities are available. It can discover capabilities at runtime, filter by category, and wire them into its workflow dynamically.
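How a multi-tag query might behave is easier to see in a small in-memory sketch. TagCatalog and CatalogEntry are hypothetical names, not the library's classes, and the sketch assumes that a query with several tags means "all of these tags" (AND semantics).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative stand-in for the article's CapabilityInfo; the real type may differ.
record CatalogEntry(String name, String type, String provider, Set<String> tags) {}

// Hypothetical in-memory catalog; a multi-tag query uses AND semantics.
final class TagCatalog {
    private final List<CatalogEntry> entries = new CopyOnWriteArrayList<>();

    void register(String name, String type, String provider, String... tags) {
        // Set.copyOf tolerates duplicate tags and yields an immutable set
        entries.add(new CatalogEntry(name, type, provider, Set.copyOf(Arrays.asList(tags))));
    }

    /** Returns every entry that carries all of the requested tags. */
    List<CatalogEntry> findByTags(String... wanted) {
        List<String> required = List.of(wanted);
        List<CatalogEntry> matches = new ArrayList<>();
        for (CatalogEntry entry : entries) {
            if (entry.tags().containsAll(required)) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

An orchestrator could iterate over the returned entries and decide per entry, based on type and provider, how to wire the capability into its workflow.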

The registry abstraction

The capability registry is part of the transport SPI, which means it has pluggable implementations:

Implementation	Backing	Use case
In-memory	`ConcurrentHashMap`	Development, testing
WebSocket-broadcast	Network messages	Multi-process, simple mode
Kafka-backed	Kafka topics	Production, durable

In development, capabilities are registered and discovered within a single process or across WebSocket connections. In production, the registry can be backed by Kafka for durability and horizontal scaling.

The application code that registers and discovers capabilities does not change between implementations.
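The reason application code stays unchanged is that it depends on an interface rather than on a backing store. A rough sketch of that shape follows. ProviderDirectory, InMemoryProviderDirectory, and MealPlanner are hypothetical names chosen for illustration, and the real SPI may look quite different.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical SPI shape; the real AgentEnsemble interface may look different.
interface ProviderDirectory {
    void register(String capability, String provider);
    List<String> providersOf(String capability);
}

// Development-style backing: everything lives in one process.
final class InMemoryProviderDirectory implements ProviderDirectory {
    private final Map<String, List<String>> byCapability = new ConcurrentHashMap<>();

    @Override
    public void register(String capability, String provider) {
        byCapability.computeIfAbsent(capability, k -> new CopyOnWriteArrayList<>()).add(provider);
    }

    @Override
    public List<String> providersOf(String capability) {
        // Return an immutable snapshot so callers cannot mutate registry state
        return List.copyOf(byCapability.getOrDefault(capability, List.of()));
    }
}

// Application code sees only the interface, so the backing can be swapped.
final class MealPlanner {
    private final ProviderDirectory directory;

    MealPlanner(ProviderDirectory directory) {
        this.directory = directory;
    }

    boolean canCook() {
        return !directory.providersOf("prepare-meal").isEmpty();
    }
}
```

A broadcast-backed or Kafka-backed implementation would implement the same two methods, and MealPlanner would not need to be recompiled against it.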

Dynamic vs. static wiring

The choice between static and dynamic wiring is not binary. A practical network often uses both:

Static wiring for well-known, stable relationships (the front desk always calls the kitchen)
Dynamic discovery for capabilities that may be provided by different ensembles depending on deployment, capacity, or availability

// Static: known relationship
NetworkTask knownMealTask = NetworkTask.of("kitchen", "prepare-meal",
    "ws://kitchen:7329/ws");

// Dynamic: discover at runtime
NetworkTool discovered = NetworkTool.discover("check-inventory", registry);

The two approaches coexist. Statically wired tasks and tools bypass the registry entirely. Dynamically discovered ones resolve their provider through the registry. The agent using the task or tool does not know which approach was used to create it.

Capability lifecycle

Capabilities have a lifecycle that mirrors the ensemble lifecycle:

Registration -- when the ensemble starts and calls shareTask or shareTool
Discovery -- when other ensembles query the registry
Deregistration -- when the ensemble stops or the capability is removed

In simple mode, deregistration is implicit: when the ensemble process exits, the in-memory registry disappears with it. With WebSocket transport, the ensemble broadcasts a deregistration message on shutdown. With Kafka, a tombstone record is produced.

The lifecycle matters for production systems. A stale registry entry (pointing to an ensemble that no longer exists) causes request failures. The registry needs to handle stale entries through explicit deregistration, heartbeat-based expiry, or health-check-based cleanup.
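Heartbeat-based expiry is the least obvious of those three options, so here is a minimal sketch of the idea. HeartbeatTracker is a hypothetical class, not the AgentEnsemble implementation, and it takes the current time as a parameter so the behavior is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of heartbeat-based expiry; not the AgentEnsemble implementation.
final class HeartbeatTracker {
    private final Duration ttl;
    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

    HeartbeatTracker(Duration ttl) {
        this.ttl = ttl;
    }

    /** Called on registration and on every heartbeat from a provider. */
    void heartbeat(String provider, Instant now) {
        lastSeen.put(provider, now);
    }

    /** Explicit deregistration on clean shutdown. */
    void deregister(String provider) {
        lastSeen.remove(provider);
    }

    /** Drops providers whose last heartbeat is older than the TTL, then lists the rest. */
    List<String> liveProviders(Instant now) {
        lastSeen.values().removeIf(seen -> Duration.between(seen, now).compareTo(ttl) > 0);
        return lastSeen.keySet().stream().sorted().toList();
    }
}
```

A provider that crashes without deregistering simply stops sending heartbeats and drops out of the live list once the TTL has passed.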

Tradeoffs

Discovery adds a lookup step. Every dynamically discovered capability requires a registry query. In practice, the result is cached: the first lookup queries the registry, and subsequent uses of the same capability reuse the resolved provider. The initial resolution still adds latency, and a cached provider can go stale if it disappears after being resolved.
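The caching behavior can be sketched in a few lines. CachingResolver is a hypothetical wrapper, and the lookup function stands in for whatever call actually queries the registry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical caching wrapper around any lookup function (capability name -> provider).
final class CachingResolver {
    private final Function<String, String> lookup;
    private final Map<String, String> resolved = new ConcurrentHashMap<>();

    CachingResolver(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Queries the registry on the first call per capability, then reuses the answer. */
    String resolve(String capability) {
        return resolved.computeIfAbsent(capability, lookup);
    }

    /** Forgets a cached provider, for example after a call to it has failed. */
    void invalidate(String capability) {
        resolved.remove(capability);
    }
}
```

The invalidate method is the important design choice here: without some way to drop an entry, a cached provider that has gone away would keep being returned.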

Tag semantics are convention-based. There is no schema for tags. If one ensemble tags a capability as "food" and another tags it as "cuisine", they will not discover each other. Tag conventions need to be agreed upon across teams.
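One lightweight way to enforce a convention is to route every tag through a shared vocabulary before registering or querying. The sketch below is an assumption about how a team might do this, not a library feature. TagVocabulary and its alias table are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical shared vocabulary: maps team-specific spellings to one canonical tag.
final class TagVocabulary {
    private static final Map<String, String> ALIASES = Map.of(
            "cuisine", "food",
            "inventory", "stock");

    /** Trims, lower-cases, and resolves aliases so " Cuisine " and "food" match. */
    static String canonical(String tag) {
        String normalized = tag.trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(normalized, normalized);
    }
}
```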

Multiple providers create ambiguity. When two ensembles offer the same capability, the registry needs a selection strategy. The current implementation supports least-loaded selection (when capacity information is available), but more sophisticated strategies (affinity, cost-based, latency-based) would need to be built.
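Least-loaded selection reduces to picking the candidate with the smallest load figure. The sketch below assumes load is reported as a count of active requests. ProviderLoad and LeastLoadedSelector are hypothetical names, and the real registry may measure capacity differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical load report; the fields are assumptions for illustration.
record ProviderLoad(String provider, int activeRequests) {}

final class LeastLoadedSelector {
    /** Picks the provider with the fewest active requests, or empty if there are none. */
    static Optional<String> select(List<ProviderLoad> candidates) {
        return candidates.stream()
                .min(Comparator.comparingInt(ProviderLoad::activeRequests))
                .map(ProviderLoad::provider);
    }
}
```

Swapping in a different strategy, such as latency-based selection, would mean changing only the comparator.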

Registry availability is a dependency. If the registry is unavailable, dynamic discovery fails. Static wiring works regardless of registry state. For critical paths, consider falling back to static wiring when discovery is unavailable.
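A fallback of that kind might look like the following sketch. FallbackResolver is hypothetical, the discovery function stands in for a real registry call, and the sketch assumes an unavailable registry surfaces as a runtime exception.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical resolver: tries discovery first, falls back to a fixed address.
final class FallbackResolver {
    private final Function<String, Optional<String>> discovery;
    private final String staticAddress;

    FallbackResolver(Function<String, Optional<String>> discovery, String staticAddress) {
        this.discovery = discovery;
        this.staticAddress = staticAddress;
    }

    /** Returns the discovered address, or the static one if discovery finds nothing or fails. */
    String resolve(String capability) {
        try {
            return discovery.apply(capability).orElse(staticAddress);
        } catch (RuntimeException registryUnavailable) {
            // Assumption: an unreachable registry shows up as a runtime exception
            return staticAddress;
        }
    }
}
```

Catching every RuntimeException is deliberately coarse for a sketch; production code would likely catch only the exceptions that signal a registry outage.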

The design principle

The useful abstraction is separating what from who. An ensemble that needs a meal preparation capability should express that need ("I need prepare-meal") without specifying the provider ("specifically from the kitchen ensemble at ws://kitchen:7329/ws").

This separation enables the network to evolve. New providers can come online. Existing providers can be replaced. Capacity can be redistributed. The callers do not need to change.

Discovery is the mechanism. Tags make it searchable. The transport SPI makes it portable across deployment environments.

Capability discovery is part of AgentEnsemble. The discovery guide covers the full API including tag-based filtering.

I'd be interested in how others handle capability discovery in multi-agent systems -- whether you use service registries, hardcoded routes, or something else entirely.
