DEV Community: mgd43b

Error Handling in Agent Systems: Exception Hierarchies, Partial Results, and Exit Reasons

mgd43b — Sun, 31 May 2026 14:00:00 +0000

Agent systems fail in ways that traditional software does not. An LLM might return an unparseable response. A tool call might time out. An agent might enter an infinite ReAct loop. A human reviewer might walk away from an approval gate. A task might succeed but produce output that a downstream task cannot use.

The interesting problem is not preventing these failures -- some are inherent to non-deterministic systems. The interesting problem is giving operators enough information to handle them gracefully: what failed, what succeeded before the failure, and what the system's terminal state actually is.

The Exception Hierarchy

AgentEnsemble uses a hierarchy of unchecked exceptions rooted at AgentEnsembleException. Every exception the framework throws extends this base, so you can catch everything with a single catch block or handle specific cases individually.

AgentEnsembleException (base)
  ValidationException             -- invalid configuration at build/run time
  TaskExecutionException          -- a task failed during execution
  AgentExecutionException         -- an LLM call failed
  MaxIterationsExceededException  -- agent exceeded its tool-call limit
  PromptTemplateException         -- unresolved template variables
  ToolExecutionException          -- a tool call failed
  ConstraintViolationException    -- required workers were not called
  GuardrailViolationException     -- a guardrail blocked execution

The hierarchy matters because different failure types require different responses. A ValidationException means your configuration is wrong -- no LLM was ever called, and the fix is in the code. A TaskExecutionException means the pipeline started but a task failed -- partial results may be available. A MaxIterationsExceededException means an agent got stuck in a tool-calling loop -- the fix might be fewer tools or a higher iteration limit.
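One way to act on the hierarchy is to map each exception type to an operational response. The sketch below is self-contained. It declares stand-in classes named after the ones listed above; they are not the real AgentEnsemble classes, and the sketch assumes each extends the base directly, as the listing suggests. The `Response` enum and `FailureRouter.classify` method are hypothetical. The specific subtypes are checked before the base class so that the catch-all only applies to types with no dedicated handling.

```java
// Stand-ins mirroring part of the hierarchy above; NOT the real AgentEnsemble classes.
class AgentEnsembleException extends RuntimeException {
    AgentEnsembleException(String msg) { super(msg); }
}
class ValidationException extends AgentEnsembleException {
    ValidationException(String msg) { super(msg); }
}
class TaskExecutionException extends AgentEnsembleException {
    TaskExecutionException(String msg) { super(msg); }
}
class AgentExecutionException extends AgentEnsembleException {
    AgentExecutionException(String msg) { super(msg); }
}
class MaxIterationsExceededException extends AgentEnsembleException {
    MaxIterationsExceededException(String msg) { super(msg); }
}

// Hypothetical operational categories for monitoring and alerting.
enum Response { FIX_CONFIGURATION, SALVAGE_PARTIAL_RESULTS, RETRY_OR_ESCALATE, TUNE_WORKFLOW, INVESTIGATE }

class FailureRouter {
    static Response classify(AgentEnsembleException e) {
        // Specific subtypes first; the base class is the fallback.
        if (e instanceof ValidationException) return Response.FIX_CONFIGURATION;       // no LLM was called
        if (e instanceof MaxIterationsExceededException) return Response.TUNE_WORKFLOW; // agent loop
        if (e instanceof AgentExecutionException) return Response.RETRY_OR_ESCALATE;    // often transient
        if (e instanceof TaskExecutionException) return Response.SALVAGE_PARTIAL_RESULTS;
        return Response.INVESTIGATE;
    }
}
```

A monitoring layer could then increment a counter per `Response` value instead of matching on error strings.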

Partial Results on Failure

When a multi-task pipeline fails partway through, the work completed before the failure is not discarded. TaskExecutionException carries a list of TaskOutput objects for tasks that completed before the failure:

try {
    EnsembleOutput output = ensemble.run(inputs);
    saveResults(output);
} catch (TaskExecutionException e) {
    // Save whatever was completed before the failure
    for (TaskOutput partial : e.getCompletedTaskOutputs()) {
        savePartialResult(partial);
    }
    alertOnFailure(e.getTaskDescription(), e.getAgentRole());
}

This is operationally significant. In a five-task pipeline where task four fails, you still have the outputs of tasks one through three. You can save them, display them to a user, or use them to resume the pipeline from where it left off.
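Resuming depends on knowing which tasks still need to run. Below is a minimal sketch. It assumes each task can be identified by a stable string key, which is a hypothetical convention because the post does not say how saved outputs are matched back to tasks. `ResumePlanner` is illustrative and not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ResumePlanner {
    // Returns the tasks that still need to run, preserving pipeline order.
    // 'completed' maps a (hypothetical) task key to its saved output,
    // e.g. built from the TaskOutput list carried by the exception.
    static List<String> remaining(List<String> pipeline, Map<String, String> completed) {
        List<String> todo = new ArrayList<>();
        for (String task : pipeline) {
            if (!completed.containsKey(task)) {
                todo.add(task);
            }
        }
        return todo;
    }
}
```

In the five-task example, with the outputs of tasks one through three saved, the planner returns tasks four and five. Those outputs could then be supplied as context to the re-run.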

Exit Reasons

Not every non-completion is an error. EnsembleOutput.getExitReason() distinguishes between four terminal states:

| Exit Reason | Meaning |
|---|---|
| `COMPLETED` | All tasks ran to completion normally |
| `USER_EXIT_EARLY` | A human reviewer chose to stop the pipeline |
| `TIMEOUT` | A review gate timeout expired |
| `ERROR` | An unrecoverable exception terminated the pipeline |

EnsembleOutput output = ensemble.run();

switch (output.getExitReason()) {
    case COMPLETED:
        System.out.println("All done: " + output.getRaw());
        break;
    case USER_EXIT_EARLY:
        System.out.println("User stopped after "
            + output.completedTasks().size() + " task(s)");
        break;
    case TIMEOUT:
        System.out.println("Review gate timed out");
        break;
    case ERROR:
        // Typically handled via exception
        break;
}

The distinction between USER_EXIT_EARLY and TIMEOUT matters for operational dashboards. A user exit is intentional -- the pipeline did its job and the human made a decision. A timeout might indicate a process problem (reviewer was not available) and may need escalation.
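This dashboard distinction can be encoded as a small policy function. In the sketch below, `ExitReason` is a local mirror of the four values in the table, not the library's type. `DashboardStatus` and `ExitReasonPolicy` are hypothetical, and treating only timeouts and errors as alert-worthy is one possible choice, not a prescribed one.

```java
// Local mirror of the four exit reasons in the table above; not the library enum.
enum ExitReason { COMPLETED, USER_EXIT_EARLY, TIMEOUT, ERROR }

// Hypothetical dashboard status derived from the exit reason.
enum DashboardStatus { OK, STOPPED_BY_USER, NEEDS_ESCALATION, FAILED }

class ExitReasonPolicy {
    static DashboardStatus statusFor(ExitReason reason) {
        return switch (reason) {
            case COMPLETED -> DashboardStatus.OK;
            // Intentional human decision: record it, but do not page anyone.
            case USER_EXIT_EARLY -> DashboardStatus.STOPPED_BY_USER;
            // Reviewer was unavailable: possibly a process problem worth escalating.
            case TIMEOUT -> DashboardStatus.NEEDS_ESCALATION;
            case ERROR -> DashboardStatus.FAILED;
        };
    }

    static boolean shouldAlert(ExitReason reason) {
        DashboardStatus s = statusFor(reason);
        return s == DashboardStatus.NEEDS_ESCALATION || s == DashboardStatus.FAILED;
    }
}
```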

Specific Exception Types

ValidationException

Thrown before any LLM calls when the ensemble or its components are configured incorrectly. Common causes include missing required fields, tasks referencing unregistered agents, circular context dependencies, or invalid iteration limits.

This exception is your build-time safety net. If you see it, the fix is always in the configuration code.
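Circular context dependencies are an example of a check that needs no LLM call. The sketch below shows how such a check could work using a depth-first search over a hypothetical map from each task name to the names of its context tasks. It illustrates the idea and is not AgentEnsemble's validator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ContextCycleCheck {
    private enum State { VISITING, DONE }

    // Returns true if following context edges from any task leads back to a task on the current path.
    static boolean hasCycle(Map<String, List<String>> contextOf) {
        Map<String, State> state = new HashMap<>();
        for (String task : contextOf.keySet()) {
            if (visit(task, contextOf, state)) return true;
        }
        return false;
    }

    private static boolean visit(String task, Map<String, List<String>> contextOf,
                                 Map<String, State> state) {
        State s = state.get(task);
        if (s == State.VISITING) return true;  // back edge: task is already on this path
        if (s == State.DONE) return false;     // fully explored earlier, no cycle through it
        state.put(task, State.VISITING);
        for (String dep : contextOf.getOrDefault(task, List.of())) {
            if (visit(dep, contextOf, state)) return true;
        }
        state.put(task, State.DONE);
        return false;
    }
}
```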

AgentExecutionException

Thrown when the LLM call itself fails -- network errors, API errors, rate limiting, timeouts. Contains the agent role and task description so you can route the failure to the right team.

MaxIterationsExceededException

Thrown when an agent exceeds its maxIterations limit during the ReAct loop. Contains both the configured limit and the actual iteration count.

This is often a sign that the agent has too many tools and is cycling between them without making progress. The fix is usually to reduce the tool set, make tool descriptions more specific, or increase the iteration limit if the task genuinely requires many tool calls.
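The sketch below shows a bounded tool-calling loop that produces both numbers. It assumes the limit counts tool calls, since the post describes it both as an iteration limit and as a tool-call limit. `BoundedLoop` and `MaxIterationsExceeded` are hypothetical stand-ins, and the predicate replaces the real "does the model want another tool call?" decision.

```java
import java.util.function.IntPredicate;

// Stand-in carrying the two values the post says the real exception exposes.
class MaxIterationsExceeded extends RuntimeException {
    final int limit;
    final int actual;

    MaxIterationsExceeded(int limit, int actual) {
        super("Exceeded limit of " + limit + " tool calls (attempted " + actual + ")");
        this.limit = limit;
        this.actual = actual;
    }
}

class BoundedLoop {
    // 'wantsAnotherToolCall' receives the number of tool calls made so far.
    static int run(int maxIterations, IntPredicate wantsAnotherToolCall) {
        int toolCalls = 0;
        while (wantsAnotherToolCall.test(toolCalls)) {
            toolCalls++;
            if (toolCalls > maxIterations) {
                throw new MaxIterationsExceeded(maxIterations, toolCalls);
            }
            // ... execute the tool and feed its result back to the model (omitted)
        }
        return toolCalls; // tool calls used before the final answer
    }
}
```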

PromptTemplateException

Thrown when a task description contains {variable} placeholders that were not resolved. The exception lists the missing variable names, making it straightforward to fix.
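The sketch below shows how resolution can collect every missing name instead of failing on the first one. It assumes placeholders are `{word}` tokens and uses `IllegalArgumentException` in place of `PromptTemplateException`. `TemplateResolver` is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateResolver {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces {name} placeholders; throws listing every unresolved name, in order of appearance.
    static String resolve(String template, Map<String, String> inputs) {
        Set<String> missing = new LinkedHashSet<>();
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = inputs.get(name);
            if (value == null) {
                missing.add(name);
                value = m.group(); // keep the placeholder text unchanged
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Unresolved template variables: " + missing);
        }
        return out.toString();
    }
}
```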

GuardrailViolationException

Thrown when an input or output guardrail blocks execution. Contains the guardrail type (INPUT or OUTPUT), the violation message, the task description, and the agent role. This integrates with the guardrail system covered in an earlier post.

The Retry Question

AgentEnsemble does not include built-in retry logic. This is a deliberate design choice.

The reasoning is that retry policies are highly context-dependent. A rate-limited API call might benefit from exponential backoff. A malformed LLM response might benefit from a retry with the same prompt. A task that failed because the model cannot perform the requested work should not be retried at all.

For transient failures, implement retry at the call site:

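int maxAttempts = 3;
EnsembleOutput output = null;

for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
        output = ensemble.run(inputs);
        break; // success: stop retrying
    } catch (AgentExecutionException e) {
        if (attempt == maxAttempts) throw e; // out of attempts: surface the failure
        try {
            Thread.sleep(1000L * attempt); // linear backoff: 1s, then 2s
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw e;
        }
    }
}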

For production use, consider integrating a resilience library such as Resilience4j, which provides circuit breakers, rate limiters, and retry policies that compose well with the exception hierarchy.
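The loop above retries any AgentExecutionException with a linear delay. If you want the exponential backoff mentioned earlier and an explicit decision about which failures are worth retrying, a small generic helper is one option. This sketch uses only the standard library. `Retry.withBackoff` is hypothetical, and the retryability predicate is where a rule such as "retry rate limits, never retry validation errors" would go.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

class Retry {
    // Calls 'action' up to maxAttempts times, retrying only failures accepted by 'isRetryable'.
    // The delay doubles after each failed attempt: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
    static <T> T withBackoff(Supplier<T> action, Predicate<RuntimeException> isRetryable,
                             int maxAttempts, long baseDelayMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e; // out of attempts, or a failure that retrying will not fix
                }
                try {
                    Thread.sleep(baseDelayMs << (attempt - 1)); // exponential backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    throw e;
                }
            }
        }
    }
}
```

Wrapping `() -> ensemble.run(inputs)` with the predicate `e -> e instanceof AgentExecutionException` would reproduce the loop above, but with exponential rather than linear delays.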

The Operational Model

The error handling design reflects a particular view of how agent systems should be operated: failures are expected, partial results are valuable, and the framework should give you structured information rather than opaque error strings.

The exception hierarchy makes it possible to build monitoring and alerting that distinguishes between configuration errors (fix the code), transient failures (retry or escalate), agent loops (tune the workflow), and intentional stops (human decision). The partial result preservation makes it possible to build resumable pipelines. The exit reasons make it possible to build dashboards that accurately represent pipeline outcomes.

None of this prevents failures. It gives you the handles to respond to them systematically.

The full error handling guide is in the AgentEnsemble documentation.

I'd be interested in whether you have found the exception hierarchy granularity to be sufficient, or whether there are failure modes in your agent systems that do not map cleanly to these categories.

Scoped Memory for Agent Systems: Cross-Run Persistence Without Global State

mgd43b — Fri, 29 May 2026 14:00:00 +0000

Most agent frameworks treat each run as stateless. The agent starts fresh, does its work, and the output is consumed by whatever called it. If you run the same workflow again next week, the agent has no memory of what it produced last time.

For some use cases that is fine. For others -- recurring research tasks, iterative drafting, accumulated domain knowledge -- you want the agent to remember what it learned in previous runs and build on it.

The question is how to add cross-run memory without introducing global shared state that makes the system hard to reason about.

Named Scopes as the Isolation Mechanism

AgentEnsemble uses named memory scopes. Each task declares which scopes it reads from and writes to. A task can only see memory from scopes it explicitly declares.

MemoryStore store = MemoryStore.inMemory();

Task researchTask = Task.builder()
    .description("Research current AI trends")
    .expectedOutput("A research report")
    .agent(researcher)
    .memory("ai-research")
    .build();

Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    .memoryStore(store)
    .build()
    .run();

After the run, the task's output is stored in the "ai-research" scope. On a second run with the same store, the agent's prompt automatically includes entries from the first run under a ## Memory: ai-research section.

The scope name is the isolation boundary. Task A storing into "research" and task B declaring only "drafts" means task B never sees task A's output. This is not a security mechanism -- it is an attention mechanism. It controls what context an agent receives, keeping prompts focused on relevant history rather than everything that ever happened.

How It Works at the Prompt Level

The mechanics are straightforward:

At task startup, the framework retrieves entries from every declared scope and injects them into the agent's prompt.
At task completion, the framework stores the task output into every declared scope.
Because entries persist in the MemoryStore across runs, agents in later runs automatically see outputs from earlier runs.
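The three steps can be sketched with a plain map keyed by scope name. `ScopedMemory` is a hypothetical toy and not the library's `MemoryStore`. It ignores eviction and semantic retrieval. Entries become visible to later runs only because the same store object outlives a single run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory store: entries kept in insertion order per scope name.
class ScopedMemory {
    private final Map<String, List<String>> entriesByScope = new HashMap<>();

    // At task startup: gather entries from every declared scope, and only those.
    Map<String, List<String>> recall(List<String> declaredScopes) {
        Map<String, List<String>> visible = new LinkedHashMap<>();
        for (String scope : declaredScopes) {
            visible.put(scope, List.copyOf(entriesByScope.getOrDefault(scope, List.of())));
        }
        return visible;
    }

    // At task completion: store the output into every declared scope.
    void remember(List<String> declaredScopes, String taskOutput) {
        for (String scope : declaredScopes) {
            entriesByScope.computeIfAbsent(scope, k -> new ArrayList<>()).add(taskOutput);
        }
    }
}
```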

The prompt injection looks like this:

## Memory: ai-project
The following information from scope "ai-project" may be relevant:

---
Research findings from previous run: AI is accelerating in healthcare...
---

## Task
Analyse the research findings

There is no magic retrieval. The framework puts the memory content into the prompt, and the LLM uses it (or ignores it) during reasoning.
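Because the injection is plain string assembly, it is easy to reproduce for tests or debugging. The sketch below produces the layout shown above for a single scope. `MemoryPromptRenderer` is hypothetical. How the real framework separates multiple entries or orders multiple scopes is an assumption here; the sketch places one `---` line between entries and omits the section when the scope is empty.

```java
import java.util.List;

class MemoryPromptRenderer {
    // Renders one memory section followed by the task, mirroring the example layout.
    static String render(String scope, List<String> entries, String taskDescription) {
        StringBuilder sb = new StringBuilder();
        if (!entries.isEmpty()) {
            sb.append("## Memory: ").append(scope).append('\n');
            sb.append("The following information from scope \"").append(scope)
              .append("\" may be relevant:\n\n");
            for (String entry : entries) {
                sb.append("---\n").append(entry).append('\n');
            }
            sb.append("---\n\n");
        }
        sb.append("## Task\n").append(taskDescription);
        return sb.toString();
    }
}
```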

Pluggable Storage

MemoryStore has two built-in implementations:

The in-memory store keeps entries in insertion order per scope. Retrieval returns the most recent entries without semantic search. It is suitable for development, testing, and single-JVM runs; entries do not survive JVM restarts.

MemoryStore store = MemoryStore.inMemory();

The embedding-based store embeds entries with an embedding model and retrieves them via semantic similarity search. The backing EmbeddingStore controls durability -- Chroma, Qdrant, Pinecone, pgvector, or any LangChain4j-compatible store.

EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("text-embedding-3-small")
    .build();

EmbeddingStore<TextSegment> embeddingStore = ChromaEmbeddingStore.builder()
    .baseUrl("http://localhost:8000")
    .collectionName("agentensemble-memory")
    .build();

MemoryStore store = MemoryStore.embeddings(embeddingModel, embeddingStore);

The design tradeoff is explicit. In-memory is fast and simple but loses data on restart and does not do semantic retrieval. Embedding-based is durable and semantically aware but requires an embedding model and a vector store. You choose based on your operational requirements.
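The toy example below illustrates what "semantic retrieval" means in contrast to recency. Hand-written two-dimensional vectors stand in for real embeddings, and results are ranked by cosine similarity to the query. A vector store performs this kind of ranking (often with approximate indexes) over model-generated embeddings. `SimilaritySearch` is hypothetical and is not LangChain4j or AgentEnsemble API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SimilaritySearch {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the keys of the k stored vectors most similar to the query, best first.
    static List<String> topK(Map<String, double[]> stored, double[] query, int k) {
        return stored.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```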

Eviction Policies

Unbounded memory is a prompt-size problem. Every stored entry adds tokens to the next run's prompt. Scopes support optional eviction to keep sizes bounded:

// Retain only the 5 most recent entries
MemoryScope.builder()
    .name("research")
    .keepLastEntries(5)
    .build()

// Retain only entries from the past 7 days
MemoryScope.builder()
    .name("research")
    .keepEntriesWithin(Duration.ofDays(7))
    .build()

Eviction is applied after each task stores its output. For embedding-based stores, eviction is a no-op since most embedding stores do not support deletion of individual entries.
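Both policies are straightforward to express over an ordered collection of timestamped entries. The sketch below assumes eviction runs immediately after each store, as described above. `EvictingScope` and `MemoryEntry` are hypothetical. The clock value is passed in explicitly so the behavior can be tested deterministically.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collectors;

record MemoryEntry(String text, Instant storedAt) {}

class EvictingScope {
    private final Deque<MemoryEntry> entries = new ArrayDeque<>();
    private final Integer keepLast;     // null = no count limit
    private final Duration keepWithin;  // null = no age limit

    EvictingScope(Integer keepLast, Duration keepWithin) {
        this.keepLast = keepLast;
        this.keepWithin = keepWithin;
    }

    // Store first, then evict.
    void store(String text, Instant now) {
        entries.addLast(new MemoryEntry(text, now));
        if (keepLast != null) {
            while (entries.size() > keepLast) entries.removeFirst(); // drop the oldest first
        }
        if (keepWithin != null) {
            Instant cutoff = now.minus(keepWithin);
            entries.removeIf(e -> e.storedAt().isBefore(cutoff));
        }
    }

    List<String> texts() {
        return entries.stream().map(MemoryEntry::text).collect(Collectors.toList());
    }
}
```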

MemoryTool: Agent-Driven Memory Access

In addition to the automatic scope-based mechanism, agents can interact with memory directly during their ReAct loop using MemoryTool:

Agent researcher = Agent.builder()
    .role("Researcher")
    .goal("Research and remember important facts")
    .tools(MemoryTool.of("research", store))
    .build();

MemoryTool provides two tool methods the LLM can call: storeMemory(key, value) to store an arbitrary fact, and retrieveMemory(query) to retrieve relevant memories by query.

When the same MemoryStore instance is used for both MemoryTool and Ensemble.builder().memoryStore(...), explicit tool access and automatic scope-based access share the same backing store. This means an agent can both receive automatic context from previous runs and actively query or store additional facts during execution.

Multiple Tasks Sharing a Scope

Multiple tasks can declare the same scope name. Each task writes its output to the scope after it completes, so later tasks in a sequential workflow see earlier tasks' outputs:

Task research = Task.builder()
    .description("Research AI trends")
    .memory("ai-project")
    .build();

Task analysis = Task.builder()
    .description("Analyse the research findings")
    .memory("ai-project")
    .build();

Ensemble.builder()
    .task(research)
    .task(analysis)
    .memoryStore(store)
    .build()
    .run();

This is within-run memory sharing. The analysis task sees the research task's output because they share the "ai-project" scope. On the next run, both tasks see outputs from the previous run's research and analysis.

The Design Principle

The key design decision is that memory is opt-in and scoped, not global and automatic. An agent does not remember everything by default. Each task explicitly declares what it wants to remember and what it wants to recall.

This makes the system easier to reason about. You can look at a task definition and know exactly what memory context it will receive. You can test a task with a pre-populated store and verify that it uses the memory correctly. You can clear a scope without affecting other scopes.

The tradeoff is that you have to think about memory design upfront. Which tasks share scopes? How many entries should be retained? Should you use semantic search or recency-based retrieval? These are design decisions that the framework surfaces explicitly rather than hiding behind defaults.

The full memory guide is in the AgentEnsemble documentation.

I'd be interested in how you handle the prompt-size tension -- whether bounded eviction is sufficient, or whether you have needed more sophisticated retrieval strategies for production memory systems.

Tool Pipelines: Eliminating LLM Round-Trips for Deterministic Tool Chains

mgd43b — Wed, 27 May 2026 14:00:00 +0000

In a standard ReAct loop, every tool call requires an LLM round-trip. The agent calls a search tool, receives results, reasons about them, calls a filter tool, receives filtered output, reasons again, calls a format tool, and so on. Each step costs tokens, adds latency, and requires the LLM to make a decision that is often trivial -- the next step in the chain is predetermined.

For deterministic data transformation chains, the LLM adds no reasoning value between steps. It just passes the output of one tool as input to the next. The interesting question is whether you can collapse that chain into a single tool call.

The ToolPipeline Abstraction

AgentEnsemble provides ToolPipeline, which chains multiple tools into a single compound tool. The LLM calls it once; all steps execute sequentially without LLM round-trips between them.

// Standard ReAct loop: 3 tool round-trips, 4 LLM calls
LLM -> search_tool -> LLM -> filter_tool -> LLM -> format_tool -> LLM

// With ToolPipeline: 1 tool round-trip, 2 LLM calls (none between steps)
LLM -> search_then_filter_then_format -> LLM

The simplest way to create one:

ToolPipeline pipeline = ToolPipeline.of(
    new WebSearchTool(provider),
    new JsonParserTool(),
    FileWriteTool.of(outputPath)
);
// name: "web_search_then_json_parser_then_file_write"

var task = Task.builder()
    .description("Research AI trends and save the top result to disk")
    .expectedOutput("Confirmation that the result was saved")
    .tools(List.of(pipeline))
    .build();

Data Flow and Adapters

By default, ToolResult.getOutput() from step N is passed as the input to step N+1. This works when tool outputs are directly consumable by the next tool.

When you need to reshape data between steps, attach an adapter:

ToolPipeline pipeline = ToolPipeline.builder()
    .name("extract_and_calculate")
    .description("Extract a numeric field from JSON and apply a formula")
    .step(new JsonParserTool())
    .adapter(result -> result.getOutput() + " * 1.1")
    .step(new CalculatorTool())
    .build();

The adapter transforms the JsonParserTool output (e.g., "149.99") into a calculator expression ("149.99 * 1.1") before passing it to CalculatorTool. Adapters have full access to ToolResult, including getStructuredOutput() for typed payloads.

This is the key design decision: adapters are plain Java functions, not LLM calls. They handle the deterministic reshaping that the LLM would otherwise do at full inference cost.
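The data flow and adapter rule fit in a few lines of plain Java. The sketch below uses a hypothetical `MiniPipeline`, not `ToolPipeline`. Tools are modeled as `Function<String, StepResult>` and adapters as `Function<StepResult, String>`. When no adapter is given, the previous output passes through unchanged, and the pipeline stops at the first failed step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simplified tool result: success flag plus text output.
record StepResult(boolean success, String output) {
    static StepResult ok(String out) { return new StepResult(true, out); }
    static StepResult fail(String msg) { return new StepResult(false, msg); }
}

class MiniPipeline {
    private final List<Function<String, StepResult>> steps = new ArrayList<>();
    // adapters.get(i) reshapes step i's result into step i+1's input.
    private final List<Function<StepResult, String>> adapters = new ArrayList<>();

    MiniPipeline step(Function<String, StepResult> tool) {
        if (!steps.isEmpty() && adapters.size() < steps.size()) {
            adapters.add(StepResult::output); // default: pass the output straight through
        }
        steps.add(tool);
        return this;
    }

    MiniPipeline adapter(Function<StepResult, String> adapter) {
        adapters.add(adapter); // applies after the most recently added step
        return this;
    }

    // One call from the LLM's point of view; no model calls between steps.
    StepResult run(String input) {
        String next = input;
        StepResult result = StepResult.ok(input);
        for (int i = 0; i < steps.size(); i++) {
            result = steps.get(i).apply(next);
            if (!result.success()) return result; // stop on the first failure
            if (i + 1 < steps.size()) next = adapters.get(i).apply(result);
        }
        return result;
    }
}
```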

Error Strategies

Pipelines support two error strategies:

FAIL_FAST (default) stops the pipeline on the first failed step and returns that failure to the LLM immediately. Subsequent steps are never executed.

CONTINUE_ON_FAILURE continues executing subsequent steps even when an intermediate step fails. The failed step's error message is forwarded as input to the next step.

ToolPipeline pipeline = ToolPipeline.builder()
    .name("resilient_pipeline")
    .description("Continues even when a step fails")
    .step(stepA)
    .step(stepB)
    .step(stepC)
    .errorStrategy(PipelineErrorStrategy.CONTINUE_ON_FAILURE)
    .build();

The choice between them depends on whether downstream steps can recover from upstream failures. For a search-then-save pipeline, FAIL_FAST makes sense -- there is nothing to save if the search failed. For a multi-source aggregation, CONTINUE_ON_FAILURE lets the pipeline produce partial results.
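The sketch below runs the two strategies side by side. `StrategyRunner`, `ErrorStrategy`, and `Outcome` are hypothetical stand-ins, and steps signal failure by throwing. The post does not say how the pipeline reports partial failure, so marking the overall run as unsuccessful when any step failed under CONTINUE_ON_FAILURE is an assumption.

```java
import java.util.List;
import java.util.function.Function;

enum ErrorStrategy { FAIL_FAST, CONTINUE_ON_FAILURE }

record Outcome(boolean success, String output, int stepsExecuted) {}

class StrategyRunner {
    // Runs steps in order; a step signals failure by throwing a RuntimeException.
    static Outcome run(List<Function<String, String>> steps, ErrorStrategy strategy, String input) {
        String current = input;
        boolean allSucceeded = true;
        int executed = 0;
        for (Function<String, String> step : steps) {
            executed++;
            try {
                current = step.apply(current);
            } catch (RuntimeException e) {
                allSucceeded = false;
                if (strategy == ErrorStrategy.FAIL_FAST) {
                    // Return the failure immediately; later steps never run.
                    return new Outcome(false, e.getMessage(), executed);
                }
                // CONTINUE_ON_FAILURE: the error message becomes the next step's input.
                current = e.getMessage();
            }
        }
        return new Outcome(allSucceeded, current, executed);
    }
}
```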

Approval Gates Within Pipelines

Steps inside a pipeline that require human approval will pause mid-pipeline, exactly as if they were standalone tools. The pipeline propagates the ensemble's ReviewHandler to all nested steps automatically.

ToolPipeline pipeline = ToolPipeline.of(
    new JsonParserTool(),
    FileWriteTool.builder(outputPath)
        .requireApproval(true)
        .build()
);

This means you can build pipelines that include a human checkpoint before a destructive operation (like writing to disk or calling an external API) without losing the token savings for the deterministic steps before the checkpoint.
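Conceptually, an approval gate wraps the destructive step so that it runs only after a reviewer agrees. In the sketch below, a `Predicate<String>` plays the reviewer and sees the input the step is about to act on. The real framework pauses and consults the ensemble's ReviewHandler, whose API is not shown here. `ApprovalGate` and its rejection message are hypothetical.

```java
import java.util.function.Function;
import java.util.function.Predicate;

class ApprovalGate {
    // Wraps a step so it executes only if the reviewer approves the pending input.
    static Function<String, String> requireApproval(Function<String, String> step,
                                                    Predicate<String> reviewer) {
        return input -> {
            if (!reviewer.test(input)) {
                return "REJECTED: step not executed"; // hypothetical rejection message
            }
            return step.apply(input);
        };
    }
}
```

Earlier, non-destructive steps run without interruption; only the wrapped step waits for a decision.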

Nesting and Composition

A ToolPipeline implements AgentTool, so it can be used as a step inside another pipeline:

ToolPipeline inner = ToolPipeline.of("step_a", "desc", toolA, toolB);
ToolPipeline outer = ToolPipeline.of("outer", "desc", inner, toolC);

This lets you build reusable pipeline fragments and compose them into larger chains. Each pipeline records its own aggregate metrics (timing, success/failure counts) in addition to the per-step metrics from individual tools.
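Nesting works because of the composite pattern: a pipeline exposes the same interface as a single tool. The sketch below uses a hypothetical `Tool` interface (not the real `AgentTool`) and a simple invocation counter as a stand-in for aggregate metrics.

```java
import java.util.List;

interface Tool {
    String name();
    String execute(String input);
}

// A pipeline is itself a Tool, so it can appear as a step in another pipeline.
class CompositeTool implements Tool {
    private final String name;
    private final List<Tool> steps;
    private int invocations; // aggregate metric; not thread-safe (an AtomicInteger would be)

    CompositeTool(String name, List<Tool> steps) {
        this.name = name;
        this.steps = List.copyOf(steps);
    }

    @Override public String name() { return name; }

    @Override public String execute(String input) {
        invocations++;
        String current = input;
        for (Tool step : steps) current = step.execute(current);
        return current;
    }

    int invocations() { return invocations; }
}

// Trivial leaf tool used for illustration.
record AppendTool(String name, String suffix) implements Tool {
    @Override public String execute(String input) { return input + suffix; }
}
```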

When to Use Pipelines vs. Separate Tools

The decision boundary is whether the LLM needs to reason between steps.

Use ToolPipeline when steps are deterministic and order-locked -- the LLM should not skip or reorder them, and the data transformations between steps are mechanical. The full chain appears as one operation to the LLM.

Use separate tools when the LLM needs to decide which tool to call next based on intermediate results, or when intermediate results are useful for the LLM to see and reason about.

In practice, this means pipelines work well for data retrieval and transformation chains (search, parse, filter, write), while separate tools work better for exploratory workflows where the agent needs to adapt its approach based on what it finds.

The Broader Pattern

ToolPipeline is one instance of a broader design principle in AgentEnsemble: when something is deterministic, do not pay LLM inference costs for it. This same principle appears in deterministic-only orchestration (tasks that never call an LLM), typed tool inputs (schema validation without LLM intervention), and phase-level workflow grouping (execution order declared in code, not negotiated by the LLM).

The common thread is that the framework should handle mechanical work mechanically, and reserve LLM inference for decisions that actually require reasoning.

The full tool pipeline guide is in the AgentEnsemble documentation.

Curious whether you have seen tool chains where the boundary between "deterministic" and "needs reasoning" is ambiguous, and how you would draw that line.

Guardrails for Agent Output: Pluggable Validation Before and After LLM Calls

mgd43b — Mon, 25 May 2026 14:00:00 +0000

One of the harder problems in agent systems is constraining output quality without turning every prompt into a wall of instructions. You can ask the LLM to stay under 3000 characters, or to always include a conclusion section, or to never mention competitor products. But prompt-based constraints are probabilistic. The LLM might follow them. It might not.

Guardrails are the deterministic layer. They run as Java code before and after the LLM call, and they enforce rules that prompts cannot guarantee.

The Model

AgentEnsemble implements guardrails as two functional interfaces: InputGuardrail and OutputGuardrail. Both return a GuardrailResult -- either success or failure with a reason.

Input guardrails run before the LLM is contacted. If any fails, execution stops immediately and the agent's LLM is never called. Output guardrails run after the agent produces a response (and after structured output parsing, if configured).

InputGuardrail piiGuardrail = input -> {
    String desc = input.taskDescription().toLowerCase();
    if (desc.contains("ssn") || desc.contains("credit card")) {
        return GuardrailResult.failure(
            "Task description may contain personally identifiable information");
    }
    return GuardrailResult.success();
};

OutputGuardrail lengthGuardrail = output -> {
    if (output.rawResponse().length() > 3000) {
        return GuardrailResult.failure(
            "Response is " + output.rawResponse().length()
            + " chars, exceeds limit of 3000");
    }
    return GuardrailResult.success();
};

Both are configured per-task:

var task = Task.builder()
    .description("Write an executive summary")
    .expectedOutput("A concise summary")
    .agent(writer)
    .inputGuardrails(List.of(piiGuardrail))
    .outputGuardrails(List.of(lengthGuardrail))
    .build();

Why Functional Interfaces

The choice to make guardrails functional interfaces rather than annotation-based or configuration-driven has a few practical consequences.

First, guardrails are composable. You can build them from lambdas, combine them, or wrap them in utility methods. A guardrail that checks for PII can be reused across every task in the ensemble without any framework-specific wiring.

Second, they are testable in isolation. A guardrail is a pure function from input to result. You can unit test it without standing up an ensemble or mocking an LLM.

Third, they are stateless by default. Since guardrails may run concurrently (in parallel workflows), stateless lambdas are inherently thread-safe. If you need stateful validation, thread safety is your responsibility.
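The testability point is easy to demonstrate: because a guardrail is just a function, a factory method plus a few assertions is all a test needs. The sketch below uses local stand-in types, with `CheckResult` in place of `GuardrailResult` and the raw response passed as a `String` instead of the framework's output record. The parameterised factory also shows the reuse described under composability.

```java
import java.util.function.Function;

// Minimal stand-in; not the real GuardrailResult type.
record CheckResult(boolean success, String message) {
    static CheckResult ok() { return new CheckResult(true, null); }
    static CheckResult failure(String message) { return new CheckResult(false, message); }
}

class LengthGuardrails {
    // Builds a reusable length check, mirroring the 3000-character example above.
    static Function<String, CheckResult> maxLength(int limit) {
        return response -> response.length() > limit
            ? CheckResult.failure("Response is " + response.length()
                + " chars, exceeds limit of " + limit)
            : CheckResult.ok();
    }
}
```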

What Input Guardrails See

The GuardrailInput record carries everything you need to make a pre-execution decision:

taskDescription() -- the task description text
expectedOutput() -- the expected output specification
contextOutputs() -- outputs from prior context tasks (immutable)
agentRole() -- the role of the agent about to execute

This means you can write guardrails that check not just the current task, but the outputs of upstream tasks. For example, a guardrail that rejects a writing task if the research task upstream produced no findings:

InputGuardrail requireResearch = input -> {
    boolean hasResearch = input.contextOutputs().stream()
        .anyMatch(o -> o.getRaw().length() > 100);
    if (!hasResearch) {
        return GuardrailResult.failure("No substantive research output found");
    }
    return GuardrailResult.success();
};

Output Guardrails and Typed Output

When a task uses outputType for structured output, the execution order is:

1. Input guardrails run (before the LLM call)
2. The LLM executes and produces raw text
3. Structured output parsing (JSON extraction and deserialization)
4. Output guardrails run (with both rawResponse() and parsedOutput() available)

This means output guardrails can inspect the typed Java object directly:

record ResearchReport(String title, List<String> findings, String conclusion) {}

OutputGuardrail findingsGuardrail = output -> {
    if (output.parsedOutput() instanceof ResearchReport report) {
        if (report.findings() == null || report.findings().isEmpty()) {
            return GuardrailResult.failure(
                "Report must include at least one finding");
        }
    }
    return GuardrailResult.success();
};

This is where guardrails and typed outputs reinforce each other. The type system gives you a parsed object; the guardrail gives you a place to enforce business rules on that object.

Multiple Guardrails and Evaluation Order

Multiple guardrails per task are evaluated in order. The first failure stops evaluation -- subsequent guardrails are not called.

var task = Task.builder()
    .description("Write an article")
    .expectedOutput("An article")
    .agent(writer)
    .inputGuardrails(List.of(piiGuardrail, roleGuardrail, domainGuardrail))
    .outputGuardrails(List.of(lengthGuardrail, conclusionGuardrail))
    .build();

If you want to collect all failures rather than short-circuit, compose them into a single guardrail:

InputGuardrail compositeGuardrail = input -> {
    List<String> failures = new ArrayList<>();
    for (InputGuardrail g : List.of(piiGuardrail, roleGuardrail)) {
        GuardrailResult r = g.validate(input);
        if (!r.isSuccess()) failures.add(r.getMessage());
    }
    return failures.isEmpty()
        ? GuardrailResult.success()
        : GuardrailResult.failure(String.join("; ", failures));
};
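The difference between the two evaluation modes is easiest to see with a check that counts its own calls. The sketch below uses plain functions that return an `Optional` failure message, where empty means the check passed. This is a hypothetical simplification of `GuardrailResult`, and `GuardrailChains` is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class GuardrailChains {
    // Short-circuit: stop at the first failing check; later checks are never called.
    static Optional<String> firstFailure(List<Function<String, Optional<String>>> checks, String input) {
        for (Function<String, Optional<String>> check : checks) {
            Optional<String> failure = check.apply(input);
            if (failure.isPresent()) return failure;
        }
        return Optional.empty();
    }

    // Collect-all: run every check and report every failure message, in order.
    static List<String> allFailures(List<Function<String, Optional<String>>> checks, String input) {
        List<String> failures = new ArrayList<>();
        for (Function<String, Optional<String>> check : checks) {
            check.apply(input).ifPresent(failures::add);
        }
        return failures;
    }
}
```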

Exception Propagation

When a guardrail fails, GuardrailViolationException is thrown. It propagates through the workflow executor and is wrapped in TaskExecutionException, following the same pattern as other task failures.

The exception carries structured information -- guardrail type (INPUT or OUTPUT), violation message, task description, and agent role -- so you can route failures to metrics or alerting without parsing error strings.

try {
    ensemble.run();
} catch (TaskExecutionException ex) {
    if (ex.getCause() instanceof GuardrailViolationException gve) {
        metrics.increment("guardrail.violation." + gve.getGuardrailType());
        log.warn("Guardrail blocked task '{}': {}",
            gve.getTaskDescription(), gve.getViolationMessage());
    }
}

The Tradeoff

Guardrails are deterministic checks, not semantic analysis. A length limit is easy to enforce. A toxicity check is harder -- you would need to call an external classifier inside the guardrail, which adds latency and its own failure modes.

The design intentionally keeps guardrails as simple synchronous functions. If you need async validation, external API calls, or retry logic, you implement that inside the guardrail function. The framework does not impose an opinion on how complex your validation should be.

This means guardrails are most useful for structural and policy checks -- length limits, required sections, PII filters, role-based access, schema validation on typed outputs. For semantic quality checks, the phase review and task reflection mechanisms (covered in earlier posts) are a better fit.

The full guardrails guide is in the AgentEnsemble documentation.

I'd be interested in whether the input/output split feels like the right abstraction, or whether you have seen validation needs that do not fit cleanly into either category.

Wiring Agent Ensembles into Spring Boot, Micronaut, and Quarkus

mgd43b — Sat, 23 May 2026 14:00:00 +0000

One question that comes up early when evaluating an agent orchestration library is how it fits into an existing backend stack. If your services run on Spring Boot, Micronaut, or Quarkus, you want agents to live inside the same dependency injection container, use the same configuration system, and expose metrics through the same actuator endpoints.

The interesting design decision in AgentEnsemble is that it has no framework dependencies at all. It is a plain Java 21+ library with a builder API. Framework integration is just a matter of wrapping those builder calls in whatever DI mechanism your framework uses. Nothing in the library changes.

This keeps the core small and testable, but it also means the integration patterns are worth spelling out explicitly.

The Builder API as the Integration Surface

Every AgentEnsemble component -- agents, tasks, ensembles, memory stores, listeners -- is created through builders. The framework never scans for annotations, never registers beans automatically, and never assumes a particular lifecycle model.

This is deliberate. The builder API is the integration surface. In a DI container, you turn builder calls into bean definitions. In a plain main() method, you call the same builders directly.

Agent researcher = Agent.builder()
        .role("Research Analyst")
        .goal("Find accurate, up-to-date information")
        .backstory("You are a meticulous researcher.")
        .build();

That same code works identically inside a Spring @Configuration, a Micronaut @Factory, a Quarkus CDI producer, or a static main method.

Spring Boot

Spring Boot is the most common case. The LangChain4j Spring Boot starters handle ChatLanguageModel bean creation from application.properties automatically -- AgentEnsemble does not duplicate that responsibility.

Dependencies

dependencies {
    implementation("net.agentensemble:agentensemble-core:2.10.0")
    implementation("dev.langchain4j:langchain4j-spring-boot-starter:1.11.0")
    implementation("dev.langchain4j:langchain4j-open-ai-spring-boot-starter:1.11.0")
    // Optional: metrics via Spring Boot Actuator
    implementation("net.agentensemble:agentensemble-metrics-micrometer:2.10.0")
}

Configuration Class

Spring injects the ChatLanguageModel bean (created by the LangChain4j starter) and any EnsembleListener beans you have declared elsewhere.

@Configuration
public class AgentEnsembleConfig {

    @Bean
    public Agent researcher() {
        return Agent.builder()
                .role("Research Analyst")
                .goal("Find accurate, up-to-date information on the given topic")
                .backstory("You are a meticulous researcher with a talent for "
                        + "finding relevant information quickly.")
                .build();
    }

    @Bean
    public Ensemble ensemble(
            ChatLanguageModel chatModel,
            Agent researcher,
            List<EnsembleListener> listeners,
            Optional<ToolMetrics> toolMetrics) {

        Ensemble.Builder builder = Ensemble.builder()
                .chatModel(chatModel)
                .agents(researcher);

        listeners.forEach(builder::listener);
        toolMetrics.ifPresent(builder::toolMetrics);

        return builder.build();
    }
}

The pattern here is standard Spring: declare beans, let Spring wire them. Any @Component implementing EnsembleListener is automatically collected via the List<EnsembleListener> injection.

Metrics via Actuator

If you use Micrometer with Spring Boot Actuator, declare a ToolMetrics bean and agent metrics appear at /actuator/metrics automatically:

@Bean
public ToolMetrics toolMetrics(MeterRegistry registry) {
    return new MicrometerToolMetrics(registry);
}

Using the Ensemble

Inject the Ensemble bean wherever you need it. Build tasks at the call site where you have the runtime inputs:

@Service
public class ResearchService {
    private final Ensemble ensemble;
    private final Agent researcher;

    public ResearchService(Ensemble ensemble, Agent researcher) {
        this.ensemble = ensemble;
        this.researcher = researcher;
    }

    public String research(String topic) {
        Task task = Task.builder()
                .description("Research and summarise: " + topic)
                .expectedOutput("A concise summary with key findings")
                .agent(researcher)
                .build();
        return ensemble.run(task).finalOutput();
    }
}

Micronaut

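This example does not use a Micronaut-specific LangChain4j integration, so it creates the ChatLanguageModel bean directly. The rest of the pattern is the same -- a @Factory class with @Singleton methods. The researcher Agent bean is assumed to be declared the same way as in the Spring example.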

@Factory
public class AgentEnsembleFactory {

    @Singleton
    public ChatLanguageModel chatModel(
            @Value("${agentensemble.openai.api-key}") String apiKey,
            @Value("${agentensemble.openai.model-name}") String modelName) {
        return OpenAiChatModel.builder()
                .apiKey(apiKey)
                .modelName(modelName)
                .build();
    }

    @Singleton
    public Ensemble ensemble(
            ChatLanguageModel chatModel,
            Agent researcher,
            List<EnsembleListener> listeners) {
        Ensemble.Builder builder = Ensemble.builder()
                .chatModel(chatModel)
                .agents(researcher);
        listeners.forEach(builder::listener);
        return builder.build();
    }
}

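Micronaut injects all EnsembleListener beans automatically via the List<EnsembleListener> parameter. For metrics, Micronaut's Micrometer module provides a MeterRegistry bean, so you can add a ToolMetrics parameter (backed by MicrometerToolMetrics) and pass it to the builder, as in the Spring configuration.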

Quarkus

Quarkus has its own quarkus-langchain4j extension with a different programming model. The example below uses the standard LangChain4j library directly with Quarkus CDI:

@ApplicationScoped
public class AgentEnsembleProducer {

    @ConfigProperty(name = "agentensemble.openai.api-key")
    String apiKey;

    @Produces @ApplicationScoped
    public ChatLanguageModel chatModel() {
        return OpenAiChatModel.builder()
                .apiKey(apiKey)
                .modelName("gpt-4o")
                .build();
    }

    @Produces @ApplicationScoped
    public Ensemble ensemble(
            ChatLanguageModel chatModel,
            Agent researcher,
            Instance<EnsembleListener> listeners) {
        Ensemble.Builder builder = Ensemble.builder()
                .chatModel(chatModel)
                .agents(researcher);
        listeners.forEach(builder::listener);
        return builder.build();
    }
}

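The main Quarkus-specific detail is Instance<EnsembleListener> instead of List<EnsembleListener>. Instance is CDI's programmatic lookup mechanism: it resolves beans lazily, and because it is Iterable, listeners.forEach(builder::listener) works unchanged.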

The Design Tradeoff

The choice to keep AgentEnsemble framework-agnostic means there is no auto-configuration, no classpath scanning, and no starter module that wires everything with a single dependency. You write the configuration class yourself.

The upside is that the integration is completely transparent. There is no hidden magic, no classpath-sensitive behavior, and no risk of version conflicts between the library's framework assumptions and your application's framework version. The builder API is the same everywhere, so moving between frameworks (or running without one) requires changing only the DI wiring.
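To make the "running without one" case concrete, here is a minimal framework-free sketch of the same wiring. MiniEnsemble and Listener are hypothetical stand-ins, not AgentEnsemble types. The point is only that the listeners.forEach(builder::listener) loop from the DI examples stays identical when you assemble the listener list by hand.

```java
import java.util.ArrayList;
import java.util.List;

public class PlainWiringSketch {
    // Hypothetical stand-in for EnsembleListener (not the AgentEnsemble interface).
    interface Listener {
        void onEvent(String event);
    }

    // Hypothetical stand-in for Ensemble and its builder.
    static final class MiniEnsemble {
        private final List<Listener> listeners;

        private MiniEnsemble(List<Listener> listeners) {
            this.listeners = List.copyOf(listeners);
        }

        static Builder builder() {
            return new Builder();
        }

        void fire(String event) {
            listeners.forEach(listener -> listener.onEvent(event));
        }

        int listenerCount() {
            return listeners.size();
        }

        static final class Builder {
            private final List<Listener> listeners = new ArrayList<>();

            Builder listener(Listener listener) {
                listeners.add(listener);
                return this;
            }

            MiniEnsemble build() {
                return new MiniEnsemble(listeners);
            }
        }
    }

    // Same loop as in the Spring/Micronaut/Quarkus examples; only the list's origin differs.
    static MiniEnsemble wire(List<Listener> listeners) {
        MiniEnsemble.Builder builder = MiniEnsemble.builder();
        listeners.forEach(builder::listener);
        return builder.build();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        // Without a container, "collect every listener bean" is just building a list.
        List<Listener> listeners = List.of(seen::add, event -> seen.add("audit:" + event));
        MiniEnsemble ensemble = wire(listeners);
        ensemble.fire("task-started");
        System.out.println(seen); // [task-started, audit:task-started]
    }
}
```

Moving between frameworks then means changing only the code that produces that list.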

For teams that already have a preferred framework and know how to write configuration classes, this is usually the right tradeoff. The wiring code is small, readable, and lives in one place.

What Crosses the DI Boundary

A few integration points are worth calling out:

Listeners integrate naturally as DI beans. Declare any EnsembleListener implementation as a bean, and the ensemble configuration collects them.
Memory components (MemoryStore, EnsembleMemory) are created via builders and passed to the ensemble. In a DI framework, declare them as beans.
Tools are configured per-agent. Declare tool instances as beans and inject them into agent factory methods.
Metrics via MicrometerToolMetrics plug into whatever MeterRegistry your framework provides.

The general rule: if a component is created via a builder, it can be a bean. If it is passed to the ensemble builder, it can be injected.

The framework integration guide and full code examples are in the AgentEnsemble documentation.

I'd be interested in whether this level of framework-agnosticism feels right, or whether starter modules that auto-configure common setups would be more useful for your team.

Operating Agent Networks: Visual Topology, Drill-Down, and Runtime Visibility

mgd43b — Thu, 21 May 2026 14:00:00 +0000

Building an agent network is one problem. Operating it is a different one. When you have ten ensembles communicating over WebSockets, sharing capabilities via discovery, routing requests across federation boundaries, and managing capacity with priority queues -- you need to see what is happening.

The operational question is not "does the code work?" but "what is the system doing right now?" Which ensembles are healthy? Which are overloaded? What capabilities are available? Where are requests being routed? What changed in the last hour?

The visibility gap

Individual ensemble dashboards (the live execution view covered in an earlier post) show what one ensemble is doing: its current task, agent iterations, tool calls, and trace. But they do not show the network -- how ensembles relate to each other, where requests flow, and where bottlenecks form.

The gap is between per-ensemble observability and network-level observability. Both are needed. The per-ensemble view tells you why a specific task took 30 seconds. The network view tells you why the kitchen has a queue depth of 15 while the maintenance team is idle.

The network dashboard

AgentEnsemble's network dashboard provides a topology view of the entire ensemble network. Navigate to:

http://localhost:5173/network?ensembles=kitchen:ws://localhost:7329/ws,maintenance:ws://localhost:7330/ws

The ensembles query parameter accepts a comma-separated list of name:wsUrl pairs. Each ensemble gets its own independent WebSocket connection -- no aggregating proxy or central coordinator needed.
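Because each WebSocket URL itself contains colons, a name:wsUrl pair has to be split on the first ':' only. A minimal Java sketch of that parsing rule; EnsembleParamParser is a hypothetical name, not the dashboard's own code, and it assumes the parameter value has already been URL-decoded:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnsembleParamParser {
    /**
     * Parses "name:wsUrl,name:wsUrl" into an insertion-ordered map.
     * Each pair is split on its FIRST ':' because the URL ("ws://host:port/ws") contains more.
     */
    static Map<String, String> parse(String param) {
        Map<String, String> result = new LinkedHashMap<>();
        if (param == null || param.isBlank()) {
            return result;
        }
        for (String pair : param.split(",")) {
            String trimmed = pair.trim();
            int sep = trimmed.indexOf(':');
            // Reject pairs with no name ("...:url" at index 0), no ':', or an empty URL.
            if (sep <= 0 || sep == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected name:wsUrl but got: " + trimmed);
            }
            result.put(trimmed.substring(0, sep), trimmed.substring(sep + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> ensembles = parse(
            "kitchen:ws://localhost:7329/ws,maintenance:ws://localhost:7330/ws");
        // Prints "kitchen -> ws://localhost:7329/ws" then "maintenance -> ws://localhost:7330/ws"
        ensembles.forEach((name, url) -> System.out.println(name + " -> " + url));
    }
}
```

Commas are still treated as pair separators, so a URL that itself contains a comma would need escaping.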

Topology graph

Ensembles are displayed as nodes in an interactive graph powered by React Flow. Each node shows:

Ensemble name
Lifecycle state (green = READY, yellow = STARTING, red = STOPPED)
Active task count and queue depth
Task progress bar

Connections between ensembles (shared tasks and tools) are displayed as animated edges. The topology is derived from capability discovery -- if the kitchen shares a prepare-meal task and room service uses it, the edge appears automatically.

Ensemble detail sidebar

Click an ensemble node to open the sidebar panel showing:

Lifecycle state and connection status
Active tasks, queue depth, completed tasks metrics
Shared capabilities (tasks and tools with tags)
WebSocket URL

Drill-down to live execution

Click "Drill Down" in the sidebar to navigate to the live execution dashboard for that specific ensemble. This reuses the existing per-ensemble dashboard infrastructure -- the same trace view, agent iteration timeline, and tool call details.

The flow is: network topology (high-level) -> ensemble detail (mid-level) -> live execution trace (low-level). Each level answers different questions.

Dynamic ensemble addition

Click "Add Ensemble" in the header to connect to a new ensemble by entering its name and WebSocket URL. The dashboard is not static -- you can add ensembles as you discover them or as new ones come online.

Architecture

The network dashboard opens independent WebSocket connections to each ensemble. There is no central aggregator. Each ensemble already exposes a WebSocket endpoint for the live dashboard; the network dashboard reuses those same endpoints.

Status polling (every 5 seconds) fetches /api/status from each ensemble's HTTP endpoint for queue depth and lifecycle state. The existing HelloMessage with snapshotTrace provides late-join support -- when a new connection opens, it receives the current state immediately rather than waiting for the next update.
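A minimal sketch of a per-ensemble polling loop on a ScheduledExecutorService. StatusPoller, its Status record, the "UNREACHABLE" marker, the example URLs, and the injected fetch function (standing in for an HTTP GET of /api/status) are all hypothetical; the post does not show the dashboard's polling code. Each ensemble is polled independently, matching the no-central-aggregator design.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class StatusPoller {
    // Hypothetical status payload: lifecycle state plus queue depth.
    record Status(String lifecycle, int queueDepth) {}

    private final Map<String, String> statusUrls;       // ensemble name -> status URL
    private final Function<String, Status> fetch;        // performs the HTTP GET (injected)
    private final Map<String, Status> latest = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    StatusPoller(Map<String, String> statusUrls, Function<String, Status> fetch) {
        this.statusUrls = Map.copyOf(statusUrls);
        this.fetch = fetch;
    }

    // One polling round; each ensemble succeeds or fails on its own.
    void pollOnce() {
        statusUrls.forEach((name, url) -> {
            try {
                latest.put(name, fetch.apply(url));
            } catch (RuntimeException e) {
                // Hypothetical marker state for an ensemble that did not answer.
                latest.put(name, new Status("UNREACHABLE", -1));
            }
        });
    }

    void start() {
        scheduler.scheduleAtFixedRate(this::pollOnce, 0, 5, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    Map<String, Status> latest() {
        return Map.copyOf(latest);
    }

    public static void main(String[] args) {
        StatusPoller poller = new StatusPoller(
            Map.of("kitchen", "http://localhost:7329/api/status",
                   "maintenance", "http://localhost:7330/api/status"),
            url -> {
                if (url.contains("7330")) {
                    throw new IllegalStateException("connection refused");
                }
                return new Status("READY", 3);
            });
        poller.pollOnce();
        System.out.println(poller.latest().get("kitchen"));     // Status[lifecycle=READY, queueDepth=3]
        System.out.println(poller.latest().get("maintenance")); // Status[lifecycle=UNREACHABLE, queueDepth=-1]
        poller.stop();
    }
}
```

Catching per ensemble matters twice over: one unreachable ensemble does not hide the others, and an exception escaping a scheduleAtFixedRate task would suppress all later rounds.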

This architecture means the dashboard works with any combination of ensembles, including ensembles in different namespaces or clusters (as long as the WebSocket URLs are reachable). It also means the dashboard has no state of its own -- refreshing the page reconnects to all ensembles and rebuilds the view.

Audit trail

Beyond the real-time dashboard, the audit trail provides a historical record of network events:

Work requests received, completed, and failed
Capacity changes (profile applications, scaling events)
Discovery events (capabilities registered and deregistered)
Federation events (cross-realm routing decisions)

The audit trail is append-only and backed by the same transport infrastructure as the rest of the network. In development, it is an in-memory log. In production, it can be backed by Kafka for durability and external consumption.

The audit trail answers questions that the real-time dashboard cannot: "When did the kitchen start receiving requests from the airport realm?" "How many requests failed between 2am and 4am?" "When was the weekend profile last applied?"
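As an illustration of the "between 2am and 4am" style of question, here is a minimal in-memory, append-only log with a time-window count. InMemoryAuditLog, AuditEvent, and the EventType values are hypothetical names modelled on the event categories listed above, not the AgentEnsemble audit API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;

public class InMemoryAuditLog {
    // Hypothetical event categories, modelled on the list above.
    enum EventType {
        WORK_RECEIVED, WORK_COMPLETED, WORK_FAILED,
        PROFILE_APPLIED, CAPABILITY_REGISTERED, CAPABILITY_DEREGISTERED, FEDERATION_ROUTED
    }

    record AuditEvent(Instant at, EventType type, String ensemble, String detail) {}

    // Append-only: the class exposes no update or delete operation.
    private final List<AuditEvent> events = new CopyOnWriteArrayList<>();

    void append(AuditEvent event) {
        events.add(Objects.requireNonNull(event));
    }

    /** Counts events of one type in the half-open window [from, to). */
    long count(EventType type, Instant from, Instant to) {
        return events.stream()
            .filter(e -> e.type() == type)
            .filter(e -> !e.at().isBefore(from) && e.at().isBefore(to))
            .count();
    }

    /** Immutable snapshot of everything recorded so far. */
    List<AuditEvent> all() {
        return List.copyOf(events);
    }

    public static void main(String[] args) {
        InMemoryAuditLog log = new InMemoryAuditLog();
        Instant midnight = Instant.parse("2026-05-21T00:00:00Z"); // all times in UTC
        log.append(new AuditEvent(midnight.plus(Duration.ofHours(1)), EventType.WORK_FAILED, "kitchen", "timeout"));
        log.append(new AuditEvent(midnight.plus(Duration.ofMinutes(150)), EventType.WORK_FAILED, "kitchen", "rejected"));
        log.append(new AuditEvent(midnight.plus(Duration.ofMinutes(200)), EventType.WORK_COMPLETED, "kitchen", "ok"));

        // "How many requests failed between 2am and 4am?"
        long failed = log.count(EventType.WORK_FAILED,
            midnight.plus(Duration.ofHours(2)), midnight.plus(Duration.ofHours(4)));
        System.out.println(failed); // 1
    }
}
```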

Operational profiles in the dashboard

When an operational profile is applied, the dashboard receives a ProfileAppliedMessage and updates the topology to reflect the new capacity targets:

{
  "type": "profile_applied",
  "profileName": "sporting-event-weekend",
  "capacities": {
    "front-desk": { "replicas": 4, "maxConcurrent": 50, "dormant": false },
    "kitchen": { "replicas": 3, "maxConcurrent": 100, "dormant": false }
  },
  "appliedAt": "2026-03-28T14:30:00Z"
}

The dashboard can display the active profile, show which ensembles have adjusted their capacity, and highlight ensembles that have not yet reached their target capacity.
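Highlighting ensembles that have not reached their target comes down to comparing the targets in the message with observed replica counts. A minimal sketch, assuming a hypothetical source of observed replicas (for example, whatever your deployment platform reports); CapacityLag and Target are illustrative names, and the dormant flag is carried but not interpreted because its exact semantics are not described here.

```java
import java.util.Map;
import java.util.TreeMap;

public class CapacityLag {
    // Mirrors one entry of the "capacities" object in the profile_applied message.
    // dormant is carried along but not interpreted in this sketch.
    record Target(int replicas, int maxConcurrent, boolean dormant) {}

    /** Ensemble name -> replicas still missing, for ensembles below their target. */
    static Map<String, Integer> lagging(Map<String, Target> targets, Map<String, Integer> observedReplicas) {
        Map<String, Integer> missing = new TreeMap<>(); // sorted for stable display
        targets.forEach((name, target) -> {
            int observed = observedReplicas.getOrDefault(name, 0); // unknown counts as 0
            int gap = target.replicas() - observed;
            if (gap > 0) {
                missing.put(name, gap);
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Target> targets = Map.of(
            "front-desk", new Target(4, 50, false),
            "kitchen", new Target(3, 100, false));
        // Hypothetical observed state: the kitchen has only scaled to 1 replica so far.
        Map<String, Integer> observed = Map.of("front-desk", 4, "kitchen", 1);
        System.out.println(lagging(targets, observed)); // {kitchen=2}
    }
}
```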

What the dashboard does not do

The dashboard is read-only. It does not send commands to ensembles, apply profiles, or adjust capacity. It observes and displays.

This is deliberate. Operational actions (scaling, profile application, ensemble restarts) should go through the directive system, the profile scheduler, or your deployment pipeline. The dashboard provides the visibility to make those decisions, not the mechanism to execute them.

The exception is the "Add Ensemble" feature, which adds a display connection -- it does not modify the ensemble's configuration or behavior.

Tradeoffs

No central state. The dashboard has no persistent state. If you close it and reopen, it reconnects and rebuilds. This simplifies the architecture but means there is no historical view in the dashboard itself -- that is what the audit trail is for.

WebSocket reachability. The dashboard needs WebSocket access to every ensemble. In a Kubernetes deployment, this may require ingress configuration or port-forwarding. In development, ensembles run on localhost and are directly reachable.

Polling interval. Status is polled every 5 seconds. Events that happen between polls (a brief spike in queue depth, a transient connection failure) may not be visible. For sub-second operational visibility, you would need to supplement with metrics (Micrometer, OpenTelemetry) exported to a time-series database.

No alerting. The dashboard shows current state and recent history. It does not trigger alerts when thresholds are crossed. Alerting should be handled by your monitoring stack (Grafana, Prometheus, PagerDuty) using the metrics that ensembles already export.

The design principle

The useful insight is that operating an agent network requires three levels of visibility:

Network level -- topology, connections, capacity distribution, routing patterns
Ensemble level -- queue depth, active tasks, shared capabilities, health
Execution level -- individual task traces, agent iterations, tool calls

Each level answers different questions. The network dashboard covers the network and ensemble levels, with drill-down to the execution level via the existing live execution dashboard. The audit trail provides the historical dimension that complements the real-time view.

This layered observability is not unique to agent systems -- it mirrors the service mesh / individual service / request trace pattern in microservices. What is specific to agent systems is the non-determinism: you cannot predict how long a task will take, how many iterations an agent will need, or whether a request will be delegated to another ensemble. The dashboard helps operators reason about a system that is inherently unpredictable.

The network dashboard is part of AgentEnsemble. The network dashboard guide covers setup, and the audit trail guide covers the historical event log.

I'd be interested in what operational tools others are using for multi-agent systems -- and whether the topology-first approach matches how you think about agent network operations.

Testing Distributed Agent Systems: Stubs, Recordings, and Isolation

mgd43b — Tue, 19 May 2026 14:00:00 +0000

Testing a single agent ensemble is already harder than testing most software: the output is non-deterministic, the execution path depends on LLM responses, and the number of iterations is unpredictable.

Testing a network of agent ensembles adds distributed system concerns on top of that: WebSocket connections between services, shared state across ensembles, capability discovery, and cross-ensemble delegation. If your tests require all of these to be running, your test suite becomes an integration environment rather than a test suite.

The question is how to test an ensemble's network behavior without requiring the rest of the network to be running.

The testing problem

An ensemble that delegates work to other ensembles via NetworkTask or NetworkTool has external dependencies. In production, those dependencies are real ensembles running on real infrastructure. In tests, you need control over what those dependencies return.

// Production code: room service delegates to kitchen
Ensemble roomService = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.builder()
        .description("Handle room service request")
        .tools(NetworkTask.of("kitchen", "prepare-meal",
            "ws://kitchen:7329/ws"))
        .build())
    .build();

If you test this ensemble without a running kitchen, the NetworkTask fails to connect. If you run a real kitchen, your test depends on the kitchen's LLM, its prompt, its tools -- all of which are outside your control and non-deterministic.

Stubs for predictable behavior

NetworkTask.stub() and NetworkTool.stub() return canned responses without connecting to any real ensemble:

StubNetworkTask mealStub = NetworkTask.stub("kitchen", "prepare-meal",
    "Meal prepared: wagyu steak, medium-rare. Estimated 25 minutes.");

StubNetworkTool inventoryStub = NetworkTool.stub("kitchen", "check-inventory",
    "3 portions available");

Ensemble roomService = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.builder()
        .description("Handle room service request")
        .tools(mealStub, inventoryStub)
        .build())
    .build();

EnsembleOutput result = roomService.run();

The stub replaces the real network call with a predetermined response. The ensemble under test processes the stub's response exactly as it would process a response from a real kitchen ensemble.

This gives you deterministic network behavior while letting the ensemble's own LLM interactions remain non-deterministic. You are testing how the ensemble uses the network response, not whether the network itself works.
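A framework-free sketch of what a stub amounts to, using a hypothetical CapabilityTool interface rather than the AgentEnsemble types: a fixed response plus the "ensemble.capability" naming convention that real network tools use. Note that call() ignores its argument entirely, which is why a stub cannot catch a malformed request.

```java
public class StubToolSketch {
    // Hypothetical minimal tool contract: the agent sees a name and a call.
    interface CapabilityTool {
        String name();

        String call(String request);
    }

    // Stub: canned response, no connection, same "ensemble.capability" name as a real tool.
    record StubTool(String ensemble, String capability, String response) implements CapabilityTool {
        @Override
        public String name() {
            return ensemble + "." + capability;
        }

        @Override
        public String call(String request) {
            return response; // the request is ignored entirely
        }
    }

    public static void main(String[] args) {
        CapabilityTool meal = new StubTool("kitchen", "prepare-meal",
            "Meal prepared: wagyu steak, medium-rare. Estimated 25 minutes.");
        System.out.println(meal.name());                  // kitchen.prepare-meal
        System.out.println(meal.call("anything at all")); // the canned response, whatever the request
    }
}
```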

Recordings for assertion

Sometimes you need to verify not just the output but what the ensemble sent to its dependencies. NetworkTask.recording() captures every request for later assertion:

RecordingNetworkTask recorder = NetworkTask.recording("kitchen", "prepare-meal");

Ensemble roomService = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.builder()
        .description("Handle room service request for wagyu steak")
        .tools(recorder)
        .build())
    .build();

roomService.run();

// Assert what was sent to the kitchen
assertThat(recorder.callCount()).isEqualTo(1);
assertThat(recorder.lastRequest()).contains("wagyu");
assertThat(recorder.requests()).hasSize(1);

Recordings combine the predictability of stubs (they return a configurable response, defaulting to "recorded") with the observability of a mock. You can verify that the ensemble called the right capability with the right parameters.

This is particularly useful for testing delegation logic: does the ensemble delegate to the right capability? Does it include the right context in the request? Does it handle the response correctly?
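A minimal sketch of a recording double, assuming a hypothetical RecordingTool class rather than the real RecordingNetworkTask. It returns "recorded" unless given a custom response, and keeps every request in a CopyOnWriteArrayList so callCount(), lastRequest(), and requests() can be asserted on afterwards.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;

public class RecordingToolSketch {
    // Hypothetical recording double (not the real RecordingNetworkTask).
    static final class RecordingTool {
        private final String name;
        private final String response;
        // Thread-safe list, so parallel tool calls are all captured.
        private final List<String> requests = new CopyOnWriteArrayList<>();

        RecordingTool(String ensemble, String capability) {
            this(ensemble, capability, "recorded"); // default response
        }

        RecordingTool(String ensemble, String capability, String response) {
            this.name = ensemble + "." + capability;
            this.response = response;
        }

        String name() {
            return name;
        }

        String call(String request) {
            requests.add(Objects.requireNonNull(request, "request"));
            return response;
        }

        int callCount() {
            return requests.size();
        }

        /** Most recent request, or null if there were no calls. */
        String lastRequest() {
            List<String> snapshot = List.copyOf(requests);
            return snapshot.isEmpty() ? null : snapshot.get(snapshot.size() - 1);
        }

        List<String> requests() {
            return List.copyOf(requests);
        }
    }

    public static void main(String[] args) {
        RecordingTool byDefault = new RecordingTool("kitchen", "prepare-meal");
        RecordingTool custom = new RecordingTool("kitchen", "prepare-meal", "Meal prepared in 25 minutes");

        System.out.println(byDefault.call("wagyu steak, medium-rare")); // recorded
        System.out.println(custom.call("wagyu steak, medium-rare"));    // Meal prepared in 25 minutes
        System.out.println(byDefault.callCount());                      // 1
        System.out.println(byDefault.lastRequest());                    // wagyu steak, medium-rare
    }
}
```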

Custom responses for recordings

By default, recordings return "recorded". You can provide a custom response:

RecordingNetworkTask recorder = NetworkTask.recording("kitchen", "prepare-meal",
    "Meal prepared in 25 minutes");

Tool naming consistency

Both stubs and recordings use the same naming convention as real network tools: "ensemble.capability". A stub for the kitchen's prepare-meal capability is named "kitchen.prepare-meal" -- the same name the agent sees when using a real NetworkTask.

This means the agent's tool selection logic works identically in tests and production. The agent does not know whether kitchen.prepare-meal is backed by a real WebSocket connection, a stub, or a recording.

Thread safety

All test doubles are thread-safe. RecordingNetworkTask and RecordingNetworkTool use CopyOnWriteArrayList internally, so concurrent calls from parallel tool execution are safely recorded.

This matters because agent ensembles with multiple concurrent tasks may invoke network tools from different threads simultaneously. The test doubles handle this correctly without external synchronization.
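One way to check that property for any recorder is to drive it from several threads at once and confirm that no call is lost. This sketch uses a bare CopyOnWriteArrayList as the recorder, and ConcurrentRecordingCheck is a hypothetical helper. Run the same loop with a plain ArrayList and you can lose entries or hit an exception, because ArrayList.add is not thread-safe.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentRecordingCheck {
    /** Simulates `threads` parallel tool executions, each recording `callsPerThread` requests. */
    static int recordConcurrently(List<String> recorder, int threads, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int thread = t;
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    recorder.add("thread-" + thread + "-call-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return recorder.size();
    }

    public static void main(String[] args) {
        // CopyOnWriteArrayList copies its array on every write: fine for test-sized
        // call counts, but expensive for very write-heavy workloads.
        int recorded = recordConcurrently(new CopyOnWriteArrayList<>(), 8, 250);
        System.out.println(recorded); // 2000
    }
}
```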

Integration test patterns

Stubs and recordings handle unit-level testing: verifying one ensemble's behavior in isolation. For integration testing -- verifying that two ensembles work together correctly -- you can run both ensembles in the same process with in-process transport:

// In-process: no WebSocket connections needed
Ensemble kitchen = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.of("Manage kitchen operations"))
    .shareTask("prepare-meal", mealTask)
    .build();

Ensemble roomService = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.builder()
        .description("Handle room service request")
        .tools(NetworkTask.of("kitchen", "prepare-meal"))
        .build())
    .build();

// Both ensembles run in-process with shared registry

In-process transport eliminates network concerns (connection timeouts, port conflicts, container management) while preserving real cross-ensemble communication. The ensembles interact through in-memory queues and registries, so the behavior is the same as production except for the transport layer.
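The post does not show how the shared registry is wired, so here is a minimal sketch of the idea under stated assumptions: a hypothetical InProcessRegistrySketch maps "ensemble.capability" names to handler functions, and a small thread pool stands in for the provider's in-memory queue. The caller gets a CompletableFuture back, the same asynchronous shape a remote call would have, without sockets or serialization.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class InProcessRegistrySketch {
    // Hypothetical in-memory registry: "ensemble.capability" -> handler.
    private final Map<String, Function<String, String>> providers = new ConcurrentHashMap<>();
    // Stands in for the provider ensemble's own worker threads.
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    void share(String ensemble, String capability, Function<String, String> handler) {
        providers.put(ensemble + "." + capability, handler);
    }

    // Same asynchronous shape as a remote call, minus sockets and serialization.
    CompletableFuture<String> call(String ensemble, String capability, String request) {
        String key = ensemble + "." + capability;
        Function<String, String> handler = providers.get(key);
        if (handler == null) {
            return CompletableFuture.failedFuture(new IllegalStateException("No provider for " + key));
        }
        return CompletableFuture.supplyAsync(() -> handler.apply(request), workers);
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        InProcessRegistrySketch registry = new InProcessRegistrySketch();
        // "Kitchen" side: share a capability (a plain function here, not an LLM-backed task).
        registry.share("kitchen", "prepare-meal", order -> "Prepared: " + order);
        // "Room service" side: call it by name, as a NetworkTask would.
        System.out.println(registry.call("kitchen", "prepare-meal", "wagyu steak").join()); // Prepared: wagyu steak
        registry.shutdown();
    }
}
```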

Testing patterns summary

What to test	Tool	Approach
Ensemble uses network response correctly	`NetworkTask.stub()`	Canned response, deterministic
Ensemble sends correct request to network	`NetworkTask.recording()`	Capture and assert requests
Two ensembles work together	In-process transport	Real interaction, no network
End-to-end with real infrastructure	WebSocket transport	Full integration test

Each level adds realism and cost. Start with stubs for fast, focused tests. Use recordings when you need to verify outbound requests. Use in-process transport for integration tests. Reserve full WebSocket tests for the deployment verification layer.

Tradeoffs

Stubs hide integration problems. A stub always returns the same response, regardless of what the ensemble sends. If the ensemble sends a malformed request that a real kitchen would reject, the stub does not catch that. Integration tests with in-process transport or WebSocket transport are needed to verify the contract between ensembles.

LLM non-determinism leaks through. Even with stubbed network dependencies, the ensemble's own LLM calls are non-deterministic, so the same test may pass or fail depending on the model's response. For fully deterministic tests, you need to stub the LLM as well, for example with a fake ChatLanguageModel that returns scripted responses. A local model at temperature 0 reduces the variation but does not strictly guarantee identical outputs.

Recordings only capture what was sent. They do not verify that the request would be accepted by the real provider. Schema validation or contract testing would be needed to verify compatibility.

In-process tests share a JVM. Two ensembles running in the same process share class loaders, thread pools, and memory, so tests can hit resource contention that would not occur in production. Conversely, isolation problems that only occur with separate processes are not caught.

The design principle

The useful insight is that network behavior and business logic are separable concerns. An ensemble's decision to delegate to the kitchen, and how it processes the kitchen's response, is business logic. The WebSocket connection, serialization, and transport are infrastructure.

Test doubles let you test the business logic without the infrastructure. In-process transport lets you test the interaction without the network. Full integration tests verify everything works together.

This layered approach is standard practice for distributed systems. What makes it notable in the agent context is that the business logic is already non-deterministic (LLM-driven), so isolating the network layer from the LLM layer is particularly valuable for test stability.

Network testing tools are part of AgentEnsemble. The network testing guide covers the full API including stubs, recordings, and in-process transport setup.

I'd be interested in how others approach testing multi-agent systems -- especially how you handle the double non-determinism of LLM behavior plus network behavior.

Capacity Management in Agent Networks: Rate Limiting, Priority Queues, and Backpressure

mgd43b — Sun, 17 May 2026 14:00:00 +0000

Agent ensembles that run as long-lived services on a network will, at some point, receive more work than they can handle. The question is what happens next.

Without capacity management, the answer is usually one of: unbounded queue growth (OOM), random request dropping, or cascade failures where an overloaded ensemble backs up its callers. None of these are acceptable for systems that run in production.

The capacity problem in agent networks

Agent workloads have properties that make capacity management harder than in traditional request/response systems:

Variable execution time. A simple analysis task might take 5 seconds. A complex coding task might take 5 minutes. You cannot predict queue drain rate from request count alone.
Variable cost. Each agent iteration consumes LLM tokens. An overloaded system does not just slow down -- it burns money faster.
Non-deterministic behavior. An agent might complete in 3 iterations or 30. Capacity planning based on averages can be wildly wrong for individual requests.
Fan-out amplification. One incoming request to a coordinator might fan out to 5 different ensembles. Overload at one point amplifies across the network.

These properties mean that simple concurrency limits are necessary but not sufficient. You also need priority ordering, starvation prevention, and the ability to adjust capacity proactively.

Rate limiting

The first line of defense is limiting how many tasks an ensemble processes concurrently:

Ensemble kitchen = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.of("Manage kitchen operations"))
    .maxConcurrent(10)
    .build();

When the concurrency limit is reached, new requests queue. The queue has a configurable maximum depth. When the queue is also full, the ensemble rejects new requests with a backpressure signal.

The backpressure signal propagates upstream: the calling ensemble receives a rejection and can retry, route to an alternative provider (via discovery), or report the failure to its own caller.

This is straightforward but effective. The ensemble protects itself from overload, and the backpressure signal gives callers information they need to make routing decisions.
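The post does not say how maxConcurrent and the queue depth are enforced internally. One way to get exactly these semantics from the JDK is a ThreadPoolExecutor with core size equal to max size, a bounded ArrayBlockingQueue, and AbortPolicy. BoundedEnsembleExecutor is a hypothetical wrapper that turns the rejection into a boolean backpressure signal:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedEnsembleExecutor {
    private final ThreadPoolExecutor pool;

    BoundedEnsembleExecutor(int maxConcurrent, int maxQueueDepth) {
        // core == max: at most maxConcurrent tasks run; up to maxQueueDepth (>= 1) more wait.
        // AbortPolicy turns "workers and queue both full" into a RejectedExecutionException.
        this.pool = new ThreadPoolExecutor(
            maxConcurrent, maxConcurrent, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(maxQueueDepth), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false (the backpressure signal) when workers and queue are both full. */
    boolean trySubmit(Runnable work) {
        try {
            pool.execute(work);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // caller may retry, route to another provider, or report upstream
        }
    }

    int queueDepth() {
        return pool.getQueue().size();
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        BoundedEnsembleExecutor kitchen = new BoundedEnsembleExecutor(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowTask = () -> {
            try {
                release.await(); // simulates a long-running agent task
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println(kitchen.trySubmit(slowTask)); // true  (running)
        System.out.println(kitchen.trySubmit(slowTask)); // true  (queued)
        System.out.println(kitchen.trySubmit(slowTask)); // false (rejected: backpressure)
        release.countDown();
        kitchen.shutdown();
    }
}
```

ArrayBlockingQueue needs a capacity of at least 1; a zero-depth policy (reject as soon as all workers are busy) would use a SynchronousQueue instead.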

Priority queues with aging

Not all requests are equally urgent. A VIP guest's meal request should be processed before a routine inventory check. Priority queues handle this, but naive priority queues have a starvation problem: low-priority requests may never be processed if high-priority requests keep arriving.

The PriorityRequestQueue adds aging to prevent starvation:

PriorityRequestQueue queue = PriorityRequestQueue.builder()
    .requestQueue(baseQueue)
    .levels(3)
    .agingInterval(Duration.ofMinutes(5))
    .build();

// Enqueue with priority
queue.enqueue(vipRequest, 0);       // highest priority
queue.enqueue(normalRequest, 1);    // normal
queue.enqueue(batchRequest, 2);     // lowest priority

Aging works by promoting a request one level for each full aging interval it has waited. A batch request (priority 2) that has been waiting for 10 minutes (two aging intervals) is promoted twice, from priority 2 to priority 0. It now competes at the top level with a ten-minute head start, so it is processed ahead of newly arriving high-priority requests.

This guarantees that every request is eventually processed, while still giving meaningful priority to urgent work in the common case.
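The post does not show how PriorityRequestQueue implements aging. Here is a minimal sketch of one straightforward approach, assuming a hypothetical AgingPriorityQueue class with an injectable clock. Effective priority is the base priority minus one level per full aging interval waited, never below 0, and ties go to the earlier arrival. That tie-break is an assumption, and it is what lets an aged batch request come out ahead of a newer priority-0 request.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class AgingPriorityQueue<T> {
    private record Entry<E>(E item, int basePriority, Instant enqueuedAt) {}

    private final int levels;
    private final Duration agingInterval;
    private final Supplier<Instant> clock;
    // Kept in arrival order; poll() relies on this for its tie-break.
    private final List<Entry<T>> entries = new ArrayList<>();

    public AgingPriorityQueue(int levels, Duration agingInterval, Supplier<Instant> clock) {
        if (levels < 1 || agingInterval.toMillis() < 1) {
            throw new IllegalArgumentException("need levels >= 1 and an aging interval of at least 1ms");
        }
        this.levels = levels;
        this.agingInterval = agingInterval;
        this.clock = clock;
    }

    /** Priority 0 is the highest; levels - 1 is the lowest. */
    public synchronized void enqueue(T item, int priority) {
        if (priority < 0 || priority >= levels) {
            throw new IllegalArgumentException("priority must be in [0, " + (levels - 1) + "]");
        }
        entries.add(new Entry<>(item, priority, clock.get()));
    }

    // One level of promotion per full aging interval waited, never past level 0.
    private long effectivePriority(Entry<T> entry, Instant now) {
        long waitedMillis = Math.max(0, Duration.between(entry.enqueuedAt(), now).toMillis());
        long promotions = waitedMillis / agingInterval.toMillis();
        return Math.max(0, entry.basePriority() - promotions);
    }

    /** Removes and returns the most urgent request, or null if the queue is empty. */
    public synchronized T poll() {
        if (entries.isEmpty()) {
            return null;
        }
        Instant now = clock.get();
        int bestIndex = 0;
        long bestPriority = effectivePriority(entries.get(0), now);
        for (int i = 1; i < entries.size(); i++) {
            long p = effectivePriority(entries.get(i), now);
            if (p < bestPriority) { // strict: on a tie the earlier arrival keeps its place
                bestIndex = i;
                bestPriority = p;
            }
        }
        return entries.remove(bestIndex).item();
    }

    public synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        Instant[] now = {Instant.parse("2026-05-17T12:00:00Z")};
        AgingPriorityQueue<String> queue = new AgingPriorityQueue<>(3, Duration.ofMinutes(5), () -> now[0]);

        queue.enqueue("batch report", 2);              // lowest priority
        now[0] = now[0].plus(Duration.ofMinutes(10));  // two aging intervals pass
        queue.enqueue("VIP meal", 0);                  // highest priority, but newer

        System.out.println(queue.poll()); // batch report (aged 2 -> 0, arrived first)
        System.out.println(queue.poll()); // VIP meal
    }
}
```

poll() scans every entry, which is O(n); a production queue would more likely keep one FIFO per level and move entries between levels as they age.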

Operational profiles

Rate limits and priorities handle reactive capacity management -- responding to load as it arrives. Operational profiles handle proactive capacity management -- adjusting capacity in anticipation of known load changes.

A NetworkProfile bundles per-ensemble capacity targets and shared memory pre-load directives into a deployable unit:

NetworkProfile weekendProfile = NetworkProfile.builder()
    .name("sporting-event-weekend")
    .ensemble("front-desk", Capacity.replicas(4).maxConcurrent(50))
    .ensemble("kitchen", Capacity.replicas(3).maxConcurrent(100))
    .preload("kitchen", "inventory", "Extra beer and ice stocked")
    .build();

ProfileApplier applier = new ProfileApplier(sharedMemoryRegistry, broadcaster);
applier.apply(weekendProfile);

When a profile is applied, two things happen:

Pre-load directives seed shared memory scopes with context (e.g., alerting the kitchen about extra stock)
A ProfileAppliedMessage is broadcast to all ensembles with the new capacity targets

The broadcast message includes replica counts and concurrency limits. Consumers of this message (a Kubernetes operator, a scaling script, or the ensembles themselves) can use the targets to trigger actual scaling.

Scheduled profiles

Profiles can be applied on a schedule:

ProfileScheduler scheduler = new ProfileScheduler(applier);

// Apply weekend profile every 7 days
scheduler.schedule(weekendProfile,
    Duration.ofHours(2),    // initial delay
    Duration.ofDays(7));    // interval

// One-shot: return to normal after the weekend
scheduler.scheduleOnce(normalProfile, Duration.ofDays(3));
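The post does not show ProfileScheduler's internals. A minimal sketch of the same two operations on a ScheduledExecutorService, with a hypothetical ProfileSchedulerSketch class and a Runnable standing in for applier.apply(profile):

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ProfileSchedulerSketch {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    /** Repeating: first run after initialDelay, then once per interval. */
    ScheduledFuture<?> schedule(Runnable applyProfile, Duration initialDelay, Duration interval) {
        return executor.scheduleAtFixedRate(guarded(applyProfile),
            initialDelay.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** One-shot: runs once after the delay. */
    ScheduledFuture<?> scheduleOnce(Runnable applyProfile, Duration delay) {
        return executor.schedule(guarded(applyProfile), delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    // scheduleAtFixedRate suppresses every later run once a run throws,
    // so one failed application must not silently end the schedule.
    private static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("profile application failed: " + e);
            }
        };
    }

    void shutdown() {
        executor.shutdownNow();
    }

    /** Waits without a checked exception; true if the latch reached zero in time. */
    static boolean awaitQuietly(CountDownLatch latch, Duration timeout) {
        try {
            return latch.await(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ProfileSchedulerSketch scheduler = new ProfileSchedulerSketch();
        CountDownLatch applications = new CountDownLatch(3);
        // Milliseconds stand in for Duration.ofHours(2) and Duration.ofDays(7).
        scheduler.schedule(() -> {
            applications.countDown();
            throw new IllegalStateException("broadcast failed"); // later runs still happen
        }, Duration.ofMillis(10), Duration.ofMillis(20));
        System.out.println(awaitQuietly(applications, Duration.ofSeconds(5))); // true
        scheduler.shutdown();
    }
}
```

Two caveats apply to this style of scheduling. A fixed interval such as Duration.ofDays(7) is measured as elapsed time, not calendar time, so it drifts against local wall-clock time across daylight-saving changes; "every Saturday at 09:00" needs a calendar-aware trigger. And the schedule lives in memory, so it does not survive a restart.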

Directive-driven profiles

Profiles can also be applied via the directive system, enabling external triggers:

NetworkProfileDirectiveHandler handler =
    new NetworkProfileDirectiveHandler(applier, profiles);

directiveDispatcher.registerHandler("APPLY_PROFILE", handler);

An external system (monitoring, a human operator, a scheduler) sends an APPLY_PROFILE directive with the profile name, and the network adjusts.
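A minimal sketch of the dispatch side, assuming hypothetical Directive and Dispatcher types and a "profile" parameter key, since the post does not show the directive format. Each handler is registered by directive type, and the APPLY_PROFILE handler resolves the named profile before applying it, rejecting names it does not know:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class DirectiveDispatchSketch {
    // Hypothetical directive: a type plus string parameters.
    record Directive(String type, Map<String, String> params) {}

    static final class Dispatcher {
        private final Map<String, Consumer<Directive>> handlers = new ConcurrentHashMap<>();

        void registerHandler(String type, Consumer<Directive> handler) {
            handlers.put(type, handler);
        }

        /** Returns false for directive types nobody has registered. */
        boolean dispatch(Directive directive) {
            Consumer<Directive> handler = handlers.get(directive.type());
            if (handler == null) {
                return false;
            }
            handler.accept(directive);
            return true;
        }
    }

    /** APPLY_PROFILE handler: resolves the named profile and applies it. */
    static Consumer<Directive> applyProfileHandler(Map<String, Runnable> profilesByName) {
        return directive -> {
            String name = directive.params().get("profile"); // hypothetical parameter key
            if (name == null) {
                throw new IllegalArgumentException("APPLY_PROFILE needs a 'profile' parameter");
            }
            Runnable apply = profilesByName.get(name);
            if (apply == null) {
                throw new IllegalArgumentException("Unknown profile: " + name);
            }
            apply.run();
        };
    }

    public static void main(String[] args) {
        List<String> applied = new ArrayList<>();
        Map<String, Runnable> profiles = Map.of(
            "sporting-event-weekend", () -> applied.add("sporting-event-weekend"),
            "normal", () -> applied.add("normal"));

        Dispatcher dispatcher = new Dispatcher();
        dispatcher.registerHandler("APPLY_PROFILE", applyProfileHandler(profiles));

        // An external system (monitoring, an operator, a scheduler) sends the directive.
        dispatcher.dispatch(new Directive("APPLY_PROFILE", Map.of("profile", "sporting-event-weekend")));
        System.out.println(applied); // [sporting-event-weekend]
    }
}
```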

Scheduled tasks

Long-running ensembles often need to perform recurring work: inventory checks, health monitoring, report generation. The ScheduledTask API makes this a first-class concern:

Ensemble kitchen = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.of("Manage kitchen operations"))
    .scheduledTask(ScheduledTask.builder()
        .name("inventory-check")
        .task(Task.of("Check current inventory levels and generate report"))
        .schedule(Schedule.every(Duration.ofHours(1)))
        .broadcastTo("hotel.inventory")
        .build())
    .scheduledTask(ScheduledTask.builder()
        .name("equipment-check")
        .task(Task.of("Verify all kitchen equipment is operational"))
        .schedule(Schedule.every(Duration.ofHours(12)))
        .build())
    .build();

Scheduled tasks run alongside the ensemble's normal work processing. They use the same concurrency limits -- a scheduled task counts against maxConcurrent the same as an incoming request. This keeps total concurrency bounded: scheduled work cannot run on top of a full request load and push the ensemble past its limit.

The optional broadcastTo sends the task result to a named topic, where other ensembles can consume it as context.
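Sharing one budget between requests and scheduled work can be sketched with a single Semaphore that both paths must acquire. SharedConcurrencyBudget is a hypothetical name, and the example runs work synchronously for clarity; in an ensemble the slot would be held for the duration of an asynchronous task.

```java
import java.util.concurrent.Semaphore;

public class SharedConcurrencyBudget {
    private final Semaphore permits;

    SharedConcurrencyBudget(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Incoming requests and scheduled tasks both call this, so they draw on
     * one budget. Returns false if no slot is free.
     */
    boolean tryRun(Runnable work) {
        if (!permits.tryAcquire()) {
            return false; // e.g. queue the request, or skip this scheduled tick
        }
        try {
            work.run();
            return true;
        } finally {
            permits.release();
        }
    }

    int freeSlots() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SharedConcurrencyBudget kitchen = new SharedConcurrencyBudget(1);
        boolean[] scheduledRan = new boolean[1];
        // While a request holds the only slot, the hourly inventory check cannot start.
        kitchen.tryRun(() -> scheduledRan[0] = kitchen.tryRun(() -> {}));
        System.out.println(scheduledRan[0]);     // false
        System.out.println(kitchen.freeSlots()); // 1
    }
}
```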

Audit trail

For operational visibility, the audit trail captures significant events across the network:

Work request received/completed/failed
Capacity changes (profile applied, scaling events)
Discovery events (capability registered/deregistered)
Federation events (cross-realm routing)

The audit trail is append-only and can be backed by the same transport infrastructure as the rest of the network (in-memory for development, Kafka for production).

Tradeoffs

Rate limits are blunt. A concurrency limit of 10 treats all tasks equally. A quick 5-second task and a 5-minute coding task both count as one. For workloads with highly variable task duration, adaptive concurrency (adjusting limits based on observed completion times) would be more effective, but adds complexity.

Profile-based scaling is manual. Operational profiles define target capacities, but the actual scaling (adjusting K8s replicas, for instance) must be performed by an external system. The profile broadcast is a signal, not an actuator.

Priority is caller-declared. The caller assigns priority when enqueuing a request. There is no built-in mechanism to enforce priority policies or prevent callers from marking everything as high priority. Priority policies need to be enforced by convention or middleware.

Aging is time-based. Requests age based on wall-clock time, not queue depth or system load. Under sustained high load, aging may promote low-priority requests sooner than desired. Under low load, aging is irrelevant because everything is processed quickly anyway.

The design principle

Capacity management for agent networks needs three layers:

Reactive protection -- rate limits and backpressure to prevent overload in real time
Priority ordering -- ensuring urgent work is processed first, with aging to prevent starvation
Proactive adjustment -- operational profiles to scale capacity before anticipated load changes

Each layer addresses a different time horizon: seconds (rate limits), minutes (priority queues), and hours/days (operational profiles). Together, they give operators the tools to keep an agent network running under variable load without manual intervention for routine capacity changes.

Capacity management is part of AgentEnsemble. The rate limiting guide, operational profiles guide, and scheduled tasks guide cover the full APIs.

I'd be interested in how others handle capacity management in agent networks -- especially whether adaptive concurrency limits (based on observed task duration) have been useful in practice.

Shared Memory Across Agent Ensembles: Consistency Models for Distributed State

mgd43b — Fri, 15 May 2026 14:00:00 +0000

When agent ensembles operate as independent services on a network, they occasionally need to share state. The kitchen ensemble needs to know current inventory levels. The front desk needs to know room assignments. The maintenance team needs to know which equipment is out of service.

The question is not whether to share state -- it is how to share it without creating the coordination problems that shared mutable state always creates in distributed systems.

The consistency spectrum

Not all shared state needs the same consistency guarantees. Some observations:

Inventory notes ("extra beer stocked for the weekend") are advisory. If two ensembles read slightly different versions, the consequence is minor. Eventual consistency is fine.
Room assignments are exclusive. Two ensembles should not assign the same room to different guests. This needs stronger coordination -- distributed locks or optimistic locking.
Configuration preferences ("kitchen closes at 11pm") are rarely updated and widely read. Eventual consistency with version tracking works well.

Forcing a single consistency model on all shared state is either too expensive (locking everything) or too weak (eventual consistency for exclusive resources). The useful design is per-scope consistency selection.

SharedMemory with configurable consistency

AgentEnsemble v3.0.0 introduces SharedMemory -- a wrapper around the existing MemoryStore that adds consistency-aware read/write semantics:

SharedMemory sharedMemory = SharedMemory.builder()
    .store(MemoryStore.inMemory())
    .consistency(Consistency.EVENTUAL)
    .build();

// Store an entry
sharedMemory.store("inventory", MemoryEntry.builder()
    .content("Wagyu beef: 12 portions remaining")
    .storedAt(Instant.now())
    .build());

// Retrieve entries
List<MemoryEntry> entries = sharedMemory.retrieve("inventory", "beef", 10);

The Consistency enum controls the coordination behavior:

Model	Behavior	Use case
`EVENTUAL`	Last-write-wins, no coordination	Context, preferences, notes
`OPTIMISTIC`	Version-checked writes, retry on conflict	Counters, shared documents
`LOCKED`	Distributed lock before each read/write	Room assignments, exclusive resources

The consistency model is set per SharedMemory instance, which means different scopes can use different models:

// Advisory inventory notes -- eventual consistency
SharedMemory inventory = SharedMemory.builder()
    .store(MemoryStore.inMemory())
    .consistency(Consistency.EVENTUAL)
    .build();

// Room assignments -- distributed lock
SharedMemory rooms = SharedMemory.builder()
    .store(MemoryStore.inMemory())
    .consistency(Consistency.LOCKED)
    .build();

How the consistency models work

Eventual consistency

The simplest model. Writes go directly to the backing store with no coordination. Reads return the latest local value. If two ensembles write to the same key concurrently, the last write wins.

This is appropriate for state that is informational rather than authoritative -- agent context, preferences, notes, status updates. The cost of a stale read is low.
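Last-write-wins still needs a rule for deciding which write counts as "last". A common approach is to stamp each write with a timestamp and keep the highest one. Ties are broken by writer ID, so every copy converges to the same value regardless of the order in which writes arrive. This is an illustration of the technique, not AgentEnsemble's implementation. `Write` and `LwwRegister` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

/** One timestamped write; writerId breaks ties between equal timestamps. */
record Write(String value, long timestampMillis, String writerId) {
    boolean supersedes(Write other) {
        if (other == null) return true;
        if (timestampMillis != other.timestampMillis()) return timestampMillis > other.timestampMillis();
        return writerId.compareTo(other.writerId()) > 0;
    }
}

/** Hypothetical last-write-wins register: accepts writes in any order, keeps the "latest". */
final class LwwRegister {
    private final AtomicReference<Write> current = new AtomicReference<>();

    void merge(Write incoming) {
        // updateAndGet retries internally if another thread races us; the function is pure.
        current.updateAndGet(existing -> incoming.supersedes(existing) ? incoming : existing);
    }

    String read() {
        Write w = current.get();
        return w == null ? null : w.value();
    }
}
```

The catch is clock skew. A write from a node whose clock runs ahead can beat a write that actually happened later. This is exactly the kind of silent staleness that makes debugging harder.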

Optimistic locking

Each entry carries a version number, and every write includes the version the writer expects to overwrite. If the stored version differs (because another ensemble wrote since the last read), the write fails with a ConcurrentModificationException. The caller then re-reads the entry to pick up the current version and retries.

// Read with version
VersionedEntry entry = sharedMemory.readVersioned("inventory", "beef-count");

// Modify and write back with expected version
sharedMemory.writeVersioned("inventory", "beef-count",
    entry.withContent("11 portions"), entry.version());

This is appropriate for state that is updated frequently by multiple ensembles but where conflicts are rare. The optimistic assumption is that most writes will not conflict, so the overhead of locking is avoided in the common case.
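The read-modify-write cycle with retry on conflict can be sketched against a plain in-memory map. This illustrates the technique rather than AgentEnsemble's code. `Versioned`, `VersionedStore`, and `update` are hypothetical. Conflicts are signalled with `java.util.ConcurrentModificationException` to match the description above.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** A value together with the version it was written at (0 means "never written"). */
record Versioned(String content, long version) {}

/** Hypothetical in-memory store with version-checked (compare-and-set) writes. */
final class VersionedStore {
    private final Map<String, Versioned> entries = new ConcurrentHashMap<>();

    Versioned read(String key) {
        return entries.getOrDefault(key, new Versioned("", 0L));
    }

    /** Writes only if the stored version still equals expectedVersion; returns the new entry. */
    Versioned write(String key, String content, long expectedVersion) {
        // compute() runs atomically per key; throwing from it leaves the old value in place.
        return entries.compute(key, (k, current) -> {
            long actual = current == null ? 0L : current.version();
            if (actual != expectedVersion) {
                throw new ConcurrentModificationException(
                        k + ": expected version " + expectedVersion + " but found " + actual);
            }
            return new Versioned(content, actual + 1);
        });
    }

    /** Read-modify-write: on a conflict, re-read the latest version and try again. */
    Versioned update(String key, UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned seen = read(key);
            try {
                return write(key, change.apply(seen.content()), seen.version());
            } catch (ConcurrentModificationException conflict) {
                // Another writer got in between our read and our write; loop and re-read.
            }
        }
        throw new IllegalStateException(key + ": still conflicting after " + maxAttempts + " attempts");
    }
}
```

The retry bound matters. Under heavy contention, optimistic writers can keep invalidating each other's writes, and a scope that hits that limit often may be a better fit for locking.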

Distributed locking

Each read or write acquires a distributed lock on the scope. Only one ensemble can read or write at a time. This provides the strongest consistency guarantee but the highest coordination cost.

This is appropriate for exclusive resources -- room assignments, equipment reservations, anything where concurrent access would cause correctness problems.

The lock implementation depends on the transport backing. With in-process transport, it is a ReentrantLock. With Kafka or Redis, it is a distributed lock (Redlock or similar).
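For the in-process case, locked access can be sketched with one `ReentrantLock` per scope. Using `tryLock` with a timeout, rather than `lock()`, turns "blocks when the lock cannot be acquired" into a bounded wait that fails visibly. `LockedScopes` and `withLock` are hypothetical names. A multi-process deployment would swap the `ReentrantLock` for a distributed lock.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical in-process LOCKED scopes: one ReentrantLock per scope name. */
final class LockedScopes {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs action while holding the scope's lock; gives up after waiting for timeout. */
    <T> T withLock(String scope, Duration timeout, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(scope, s -> new ReentrantLock());
        boolean acquired;
        try {
            acquired = lock.tryLock(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted waiting for scope " + scope, e);
        }
        if (!acquired) {
            // Fail instead of blocking forever, e.g. when the current holder is stuck.
            throw new IllegalStateException("scope " + scope + " still locked after " + timeout);
        }
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

A room assignment then becomes a check-and-set inside the lock. For example, `withLock("rooms", Duration.ofMillis(100), () -> rooms.putIfAbsent("101", guest) == null)` assigns room 101 only if no other ensemble holds it.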

Shared memory in the network configuration

Shared memory scopes are declared at the network level and automatically available to all ensembles in the network:

NetworkConfig config = NetworkConfig.builder()
    .ensemble("kitchen", "ws://kitchen:7329/ws")
    .ensemble("front-desk", "ws://front-desk:7329/ws")
    .sharedMemory("inventory", SharedMemory.builder()
        .store(MemoryStore.inMemory())
        .consistency(Consistency.EVENTUAL)
        .build())
    .sharedMemory("rooms", SharedMemory.builder()
        .store(MemoryStore.inMemory())
        .consistency(Consistency.LOCKED)
        .build())
    .build();

Each scope has a name that ensembles use to access it. The kitchen writes to "inventory"; the front desk reads from it. The consistency model is transparent to the application code -- the SharedMemory API is the same regardless of the consistency level.

When shared memory helps

Shared memory is useful when ensembles need to communicate state that does not fit the request/response pattern. Some examples:

Inventory tracking -- the kitchen updates inventory as it uses ingredients; room service reads inventory before making promises to guests
Shift context -- the front desk shares current occupancy, VIP arrivals, and special events; other ensembles use this context to adjust their behavior
Equipment status -- maintenance updates the status of equipment; the kitchen checks before starting tasks that require specific equipment

In each case, the state is written by one ensemble and read by others. The communication is asynchronous and does not require a direct request/response exchange.

When shared memory hurts

Shared memory is a distributed state coordination mechanism. It has the same problems as every distributed state coordination mechanism:

Consistency vs. availability tradeoff. Locked consistency blocks when the lock cannot be acquired (lock holder is down, network partition). Eventual consistency never blocks but may return stale data. There is no free lunch.

Debugging is harder. When an ensemble reads stale data, the bug manifests as incorrect behavior, not as an error message. Tracing why the kitchen thought there were 12 portions when there are actually 8 requires understanding the write/read timeline across multiple ensembles.

Scope proliferation. Without discipline, shared memory scopes multiply. Each scope is a coordination point that adds operational complexity. Prefer a small number of well-defined scopes over many narrow ones.

In-memory backing is not durable. The default MemoryStore.inMemory() loses data on process restart. For production, back it with a persistent store (Redis, a database, or a custom MemoryStore implementation).

The design principle

The useful insight is that shared state in agent networks is not monolithic. Different categories of state need different consistency guarantees, and forcing a single model is either too expensive or too weak.

Per-scope consistency selection lets you use eventual consistency for advisory state (low coordination cost, high availability), optimistic locking for frequently updated counters (low cost in the common case, retry on conflict), and distributed locks for exclusive resources (high coordination cost, strong guarantees).

The consistency model is a property of the data, not a property of the system. Choose it based on what happens when two ensembles access the same state concurrently.

Shared memory is part of AgentEnsemble. The shared memory guide covers the full API including consistency models and network configuration.

I'd be interested in how others handle shared state across agent systems -- whether you use explicit shared memory, pass context through task results, or avoid shared state entirely.

Federation for Agent Networks: Cross-Namespace Capability Sharing via Realms

mgd43b — Wed, 13 May 2026 14:00:00 +0000

Discovery lets ensembles find capabilities within a network. But in a real deployment, not every ensemble lives in the same namespace or even the same cluster. A hotel chain might run separate ensemble networks at each property, each in its own Kubernetes namespace, but want them to share spare capacity when one property is overloaded.

This is the federation problem: how do you extend capability discovery across trust and network boundaries without collapsing everything into one flat namespace?

Realms as trust boundaries

AgentEnsemble v3.0.0 introduces realms as the organizational unit for federation. A realm is a namespace-level discovery and trust boundary -- typically mapping to a Kubernetes namespace in production deployments.

FederationConfig federation = FederationConfig.builder()
    .localRealm("hotel-downtown")
    .federationName("Hotel Chain")
    .realm("hotel-airport", "hotel-airport-ns")
    .realm("hotel-beach", "hotel-beach-ns")
    .build();

Within a realm, ensembles discover each other freely. Cross-realm discovery requires explicit opt-in: an ensemble must advertise its capacity as shareable for other realms to use it.

This is a deliberate design choice. Discovery within a namespace is expected and low-risk. Discovery across namespaces is a capacity-sharing decision that should be explicit.

Capacity advertisement

Ensembles periodically broadcast their current load and availability using capacity updates:

{
  "type": "capacity_update",
  "ensemble": "kitchen",
  "realm": "hotel-downtown",
  "status": "available",
  "currentLoad": 0.35,
  "maxConcurrent": 100,
  "shareable": true
}

The shareable flag is the federation gate. When true, the ensemble's spare capacity is available to ensembles in other realms. When false, the ensemble only serves local requests.

The CapacityAdvertiser handles the periodic broadcasting:

CapacityAdvertiser advertiser = new CapacityAdvertiser(
    "kitchen",
    "hotel-downtown",
    () -> computeCurrentLoad(),
    100,                              // max concurrent
    true,                             // shareable to other realms
    message -> broadcast(message));

advertiser.start(Duration.ofSeconds(10));

Status is derived automatically from load: "available" while load is below 1.0, "busy" once it reaches 1.0 (at capacity).
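Deriving status from load and assembling the `capacity_update` message shown earlier can be sketched as follows. The sketch assumes load is in-flight requests divided by `maxConcurrent`. That matches the 0.35 / 100 example, but it is an assumption. `CapacityUpdate`, `fromInFlight`, and `toJson` are hypothetical helpers rather than CapacityAdvertiser internals, and the hand-rolled JSON assumes names need no escaping.

```java
import java.util.Locale;

/** Hypothetical snapshot mirroring the capacity_update message fields. */
record CapacityUpdate(String ensemble, String realm, double currentLoad, int maxConcurrent, boolean shareable) {

    /** Builds an update from the number of requests currently in flight. */
    static CapacityUpdate fromInFlight(String ensemble, String realm, int inFlight, int maxConcurrent, boolean shareable) {
        // An ensemble with zero capacity is treated as permanently full.
        double load = maxConcurrent == 0 ? 1.0 : (double) inFlight / maxConcurrent;
        return new CapacityUpdate(ensemble, realm, load, maxConcurrent, shareable);
    }

    /** "available" while there is headroom, "busy" once load reaches 1.0. */
    String status() {
        return currentLoad < 1.0 ? "available" : "busy";
    }

    String toJson() {
        return String.format(Locale.ROOT,
                "{\"type\":\"capacity_update\",\"ensemble\":\"%s\",\"realm\":\"%s\",\"status\":\"%s\","
                        + "\"currentLoad\":%.2f,\"maxConcurrent\":%d,\"shareable\":%b}",
                ensemble, realm, status(), currentLoad, maxConcurrent, shareable);
    }
}
```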

The routing hierarchy

When an ensemble needs a capability, the FederationRegistry routes the request using a three-level hierarchy:

Priority	Scope	Condition
1 (highest)	Local realm	Provider is in the same realm
2	Same realm (unregistered)	Provider has no realm info (assumed local)
3 (lowest)	Cross-realm	Provider is in a different realm and `shareable = true`

Within each level, the least-loaded provider is preferred.

FederationRegistry registry = new FederationRegistry(capabilityRegistry);

// Find the best provider using the routing hierarchy
Optional<String> provider = registry.findProvider(
    "prepare-meal", "hotel-downtown");

The routing hierarchy encodes a simple operational principle: prefer local providers (lower latency, same failure domain), fall back to cross-realm providers when local capacity is insufficient.
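The hierarchy and the least-loaded tie-break fit in one comparator. Below is a minimal sketch with hypothetical `Provider` and `RealmRouter` types, not the FederationRegistry implementation. It makes one assumption beyond the table: providers already at full load are skipped. That is what lets requests overflow to the next level when local capacity is insufficient.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** What the router knows about one provider of a capability; realm is null if unregistered. */
record Provider(String name, String realm, double load, boolean shareable) {}

/** Hypothetical router applying the three-level hierarchy, least-loaded first within a level. */
final class RealmRouter {
    /** 1 = local realm, 2 = no realm info (assumed local), 3 = shareable cross-realm, 0 = ineligible. */
    static int level(Provider p, String localRealm) {
        if (localRealm.equals(p.realm())) return 1;
        if (p.realm() == null) return 2;
        return p.shareable() ? 3 : 0;
    }

    static Optional<Provider> choose(List<Provider> candidates, String localRealm) {
        Comparator<Provider> byLevel = Comparator.comparingInt(p -> level(p, localRealm));
        return candidates.stream()
                .filter(p -> level(p, localRealm) > 0)
                // Assumption: providers at full load are skipped, so overflow reaches the next level.
                .filter(p -> p.load() < 1.0)
                .min(byLevel.thenComparingDouble(Provider::load));
    }
}
```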

Why this matters for agent systems

Federation solves a specific operational problem: agent networks that need to scale across deployment boundaries without becoming a single monolithic system.

Consider a hotel chain with three properties. Each property runs its own ensemble network (kitchen, front desk, maintenance, room service). Each network is self-contained and operates independently. But during peak events -- a conference at one property, a holiday weekend at another -- one property's kitchen may be overwhelmed while another has spare capacity.

Without federation, each property is an island. The overloaded kitchen queues requests or drops them. With federation, the overloaded kitchen's requests overflow to a kitchen in another realm that has advertised spare capacity.

The key constraint is that federation should be additive, not required. Each realm must work independently. Federation adds cross-realm routing as an optimization, not a dependency. If the federation link goes down, each realm continues operating on its own.

Network configuration

Enable federation at the network level:

NetworkConfig config = NetworkConfig.builder()
    .ensemble("kitchen", "ws://kitchen:7329/ws")
    .ensemble("maintenance", "ws://maintenance:7329/ws")
    .federationConfig(FederationConfig.builder()
        .localRealm("hotel-downtown")
        .federationName("Hotel Chain")
        .realm("hotel-airport", "hotel-airport-ns")
        .realm("hotel-beach", "hotel-beach-ns")
        .build())
    .build();

The FederationConfig is optional. Without it, the network operates in single-realm mode -- standard discovery without cross-realm routing.

Tradeoffs

Cross-realm latency. Requests routed to a different realm incur network latency that local requests do not. For agent workloads where task execution takes seconds or minutes, the routing overhead is negligible. For latency-sensitive workflows, the local-first routing hierarchy mitigates this -- cross-realm routing only happens when local capacity is insufficient.

Capacity staleness. Capacity updates are periodic (default: every 10 seconds), so between updates routing decisions are based on slightly stale data. An ensemble that was available 5 seconds ago might be at capacity now. The consequence is that requests are occasionally routed to busy providers, which queue them rather than reject them.

Trust is implicit. Realm membership is declared in configuration, not enforced cryptographically. Any ensemble that claims to be in a realm is trusted. For environments where this matters, the transport layer needs authentication -- which is outside the scope of the federation layer itself.

Realm topology is static. The set of realms in a federation is declared at configuration time. You cannot add new realms at runtime without reconfiguring existing ensembles. For dynamic environments where namespaces are created and destroyed frequently, this requires a configuration management strategy.

Shareable is binary. An ensemble either shares all its spare capacity or none. There is no per-realm or per-capability sharing control. If you need more granular sharing policies, you would need to implement them in the capacity advertisement logic.

The design principle

The useful insight is that federation is a capacity-sharing problem, not a networking problem. The networking (WebSockets, Kafka) already works across boundaries. What federation adds is a policy layer: who can use whose spare capacity, and in what order.

Realms provide the organizational unit. Capacity advertisement provides the data. The routing hierarchy provides the policy. Together, they turn a collection of independent agent networks into a cooperative federation that shares spare capacity while maintaining operational independence.

Federation is part of AgentEnsemble. The federation guide covers the full API including capacity advertisement and realm configuration.

I'd be interested in whether others are dealing with multi-namespace agent deployments, and how they handle cross-boundary capability sharing.

Dynamic Discovery in Agent Networks: From Hardcoded Routes to Capability Catalogs

mgd43b — Mon, 11 May 2026 14:00:00 +0000

The simplest way to connect two agent ensembles is a direct reference: ensemble A knows ensemble B's address and calls it. This works when you have two or three ensembles with stable relationships.

It stops working when you have ten ensembles, or when ensembles come and go, or when the same capability is provided by multiple ensembles and you want the caller to use whichever one is available. At that point, you need discovery -- a way for ensembles to find capabilities without knowing in advance who provides them.

The static wiring problem

In a statically wired agent network, every cross-ensemble call requires knowing the provider's identity and address:

// Static: caller knows exactly who to call
NetworkTask mealTask = NetworkTask.of("kitchen", "prepare-meal",
    "ws://kitchen:7329/ws");

This creates coupling. If the kitchen ensemble moves to a different port, every caller needs updating. If you add a second kitchen for capacity, callers need load-balancing logic. If the kitchen goes down, callers need fallback logic.

The fundamental issue is that callers should care about what they need (a meal preparation capability), not who provides it or where it runs.

Capability advertisement with tags

AgentEnsemble v3.0.0 introduces capability discovery. Ensembles advertise their shared tasks and tools with optional tags, and other ensembles discover providers at runtime.

Advertising capabilities

When building an ensemble, declare what it shares with the network:

Ensemble kitchen = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.of("Manage kitchen operations"))
    .shareTool("check-inventory", inventoryTool, "food", "stock")
    .shareTask("prepare-meal", mealTask, "food", "cooking")
    .build();

kitchen.start(7329);

The shareTool and shareTask methods register capabilities in the network's capability registry. The trailing string arguments are tags -- metadata that classifies the capability for filtered discovery.

Discovering capabilities

Another ensemble can discover providers without knowing their identity:

// Discover by capability name
NetworkTool inventoryCheck = NetworkTool.discover("check-inventory", registry);

// Discover by tag
List<CapabilityInfo> foodCapabilities = registry.findByTag("food");

The registry returns a provider that currently offers the requested capability. If multiple providers offer the same capability, the registry applies selection logic to pick one. Least-loaded selection is supported today; strategies such as round-robin or affinity-based selection are possible extensions.

Tag-based catalogs

Tags turn the capability registry into a searchable catalog. Rather than querying for specific capability names, you can query for categories:

// Find all capabilities tagged with "food"
List<CapabilityInfo> food = registry.findByTag("food");

// Find all capabilities tagged with both "food" and "stock"
List<CapabilityInfo> stockChecks = registry.findByTags("food", "stock");

Each CapabilityInfo includes the capability name, type (task or tool), provider ensemble name, and tags:

CapabilityInfo info = food.get(0);
String name = info.name();           // "check-inventory"
String type = info.type();           // "TOOL"
String provider = info.provider();   // "kitchen"
Set<String> tags = info.tags();      // ["food", "stock"]

This is useful for building dynamic agent systems where an orchestrating ensemble does not know in advance what capabilities are available. It can discover capabilities at runtime, filter by category, and wire them into its workflow dynamically.
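Tag filtering, with AND semantics when several tags are given, can be sketched as a small in-memory catalog. `Capability` and `TagCatalog` are hypothetical stand-ins for the registry and `CapabilityInfo`. A linear scan is enough to show the semantics; a large catalog would index entries by tag instead. The sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical catalog entry: name, kind (TASK or TOOL), providing ensemble, and tags. */
record Capability(String name, String type, String provider, Set<String> tags) {}

final class TagCatalog {
    // Keyed by provider + name so the same capability from two ensembles is kept twice.
    private final Map<String, Capability> entries = new LinkedHashMap<>();

    void register(Capability c) {
        entries.put(c.provider() + "/" + c.name(), c);
    }

    List<Capability> findByTag(String tag) {
        return findByTags(tag);
    }

    /** Returns capabilities carrying every requested tag (AND semantics). */
    List<Capability> findByTags(String... tags) {
        List<String> wanted = Arrays.asList(tags);
        List<Capability> result = new ArrayList<>();
        for (Capability c : entries.values()) {
            if (c.tags().containsAll(wanted)) result.add(c);
        }
        return result;
    }
}
```

Tag matching here is exact string comparison. As a result, `findByTag("cuisine")` finds nothing that was registered under `"food"`.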

The registry abstraction

The capability registry is part of the transport SPI, which means it has pluggable implementations:

Implementation	Backing	Use case
In-memory	`ConcurrentHashMap`	Development, testing
WebSocket-broadcast	Network messages	Multi-process, simple mode
Kafka-backed	Kafka topics	Production, durable

In development, capabilities are registered and discovered within a single process or across WebSocket connections. In production, the registry can be backed by Kafka for durability and horizontal scaling.

The application code that registers and discovers capabilities does not change between implementations.

Dynamic vs. static wiring

The choice between static and dynamic wiring is not binary. A practical network often uses both:

Static wiring for well-known, stable relationships (the front desk always calls the kitchen)
Dynamic discovery for capabilities that may be provided by different ensembles depending on deployment, capacity, or availability

// Static: known relationship
NetworkTask knownMealTask = NetworkTask.of("kitchen", "prepare-meal",
    "ws://kitchen:7329/ws");

// Dynamic: discover at runtime
NetworkTool discovered = NetworkTool.discover("check-inventory", registry);

The two approaches coexist. Static tasks bypass the registry entirely. Dynamic tasks use the registry for resolution. The agent using the task or tool does not know which approach was used to create it.

Capability lifecycle

Capabilities have a lifecycle that mirrors the ensemble lifecycle:

Registration -- when the ensemble starts and calls shareTask or shareTool
Discovery -- when other ensembles query the registry
Deregistration -- when the ensemble stops or the capability is removed

In simple mode, deregistration happens when the ensemble process exits and the in-memory registry is garbage collected. With WebSocket transport, the ensemble broadcasts a deregistration message on shutdown. With Kafka, a tombstone record is produced.

The lifecycle matters for production systems. A stale registry entry (pointing to an ensemble that no longer exists) causes request failures. The registry needs to handle stale entries, either through explicit deregistration, heartbeat-based expiry, or health-check-based cleanup.
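Heartbeat-based expiry, one of the options listed above, can be sketched as a registry that records when each provider last checked in. Anything older than a TTL is treated as gone. The clock is injected so that expiry can be exercised without sleeping. `HeartbeatRegistry` and its methods are hypothetical, not the transport SPI.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical registry where entries expire unless their provider keeps sending heartbeats. */
final class HeartbeatRegistry {
    // capability -> (provider -> time of last heartbeat)
    private final Map<String, Map<String, Instant>> lastSeen = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Supplier<Instant> clock;

    HeartbeatRegistry(Duration ttl, Supplier<Instant> clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Called on registration and on every periodic heartbeat. */
    void heartbeat(String capability, String provider) {
        lastSeen.computeIfAbsent(capability, c -> new ConcurrentHashMap<>()).put(provider, clock.get());
    }

    /** Explicit deregistration, e.g. on graceful shutdown. */
    void deregister(String capability, String provider) {
        Map<String, Instant> byProvider = lastSeen.get(capability);
        if (byProvider != null) byProvider.remove(provider);
    }

    /** Live providers only; entries older than the TTL are pruned as a side effect. */
    List<String> providers(String capability) {
        Map<String, Instant> byProvider = lastSeen.get(capability);
        if (byProvider == null) return List.of();
        Instant cutoff = clock.get().minus(ttl);
        byProvider.values().removeIf(seen -> seen.isBefore(cutoff));
        return byProvider.keySet().stream().sorted().toList();
    }
}
```

The TTL should be several heartbeat periods long, so that a single delayed heartbeat does not evict a healthy provider.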

Tradeoffs

Discovery adds a lookup step. Every dynamically discovered capability requires a registry query. In practice, this is cached -- the first lookup queries the registry, subsequent uses of the same capability reuse the resolved provider. But the initial resolution adds latency.

Tag semantics are convention-based. There is no schema for tags. If one ensemble tags a capability as "food" and another tags it as "cuisine", they will not discover each other. Tag conventions need to be agreed upon across teams.

Multiple providers create ambiguity. When two ensembles offer the same capability, the registry needs a selection strategy. The current implementation supports least-loaded selection (when capacity information is available), but more sophisticated strategies (affinity, cost-based, latency-based) would need to be built.

Registry availability is a dependency. If the registry is unavailable, dynamic discovery fails. Static wiring works regardless of registry state. For critical paths, consider falling back to static wiring when discovery is unavailable.
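The caching and static fallback described in these tradeoffs can be combined in a small resolver. It asks discovery once per capability, caches the answer, and falls back to a statically configured address when the registry cannot be reached and nothing is cached. `CachingResolver` is hypothetical. The discovery call is modelled as a plain `Function` that may throw, standing in for the real registry query.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical resolver: discovery first, cached, with static addresses as a last resort. */
final class CachingResolver {
    private final Function<String, Optional<String>> discover; // may throw if the registry is down
    private final Map<String, String> staticRoutes;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingResolver(Function<String, Optional<String>> discover, Map<String, String> staticRoutes) {
        this.discover = discover;
        this.staticRoutes = Map.copyOf(staticRoutes);
    }

    Optional<String> resolve(String capability) {
        String cached = cache.get(capability);
        if (cached != null) return Optional.of(cached); // no registry round trip after the first hit
        try {
            Optional<String> found = discover.apply(capability);
            if (found.isPresent()) {
                cache.put(capability, found.get());
                return found;
            }
        } catch (RuntimeException registryUnavailable) {
            // Registry is down: fall through to static wiring.
        }
        return Optional.ofNullable(staticRoutes.get(capability));
    }

    /** Drop a cached provider, e.g. after a request to it fails, so the next call re-resolves. */
    void invalidate(String capability) {
        cache.remove(capability);
    }
}
```

A cache like this trades freshness for latency. `invalidate` gives callers a way to recover from a stale entry without waiting for the registry to notice.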

The design principle

The useful abstraction is separating what from who. An ensemble that needs a meal preparation capability should express that need ("I need prepare-meal") without specifying the provider ("specifically from the kitchen ensemble at ws://kitchen:7329/ws").

This separation enables the network to evolve. New providers can come online. Existing providers can be replaced. Capacity can be redistributed. The callers do not need to change.

Discovery is the mechanism. Tags make it searchable. The transport SPI makes it portable across deployment environments.

Capability discovery is part of AgentEnsemble. The discovery guide covers the full API including tag-based filtering.

I'd be interested in how others handle capability discovery in multi-agent systems -- whether you use service registries, hardcoded routes, or something else entirely.

Durable Transport for Agent Networks: Moving from In-Process Queues to Kafka

mgd43b — Sat, 09 May 2026 14:00:00 +0000

In-process queues are fine for development. They are fast, deterministic, and require zero infrastructure. But they have a property that becomes a liability in production: when the process dies, the queue contents disappear.

For agent networks that run as long-lived services -- handling work requests over hours or days -- losing queued requests on restart is not acceptable. The transport layer needs durability, and that means moving from in-process data structures to something that survives process failures.

What durability means for agent networks

An agent ensemble network has three communication patterns that need durable backing:

Work request delivery -- a request from one ensemble to another should not be lost if the receiving ensemble is temporarily unavailable
Response routing -- when an ensemble completes a request, the response needs to reach the original caller even if the caller restarted
Capability advertisement -- shared tasks and tools should remain discoverable across process restarts

Each of these has different durability requirements. Work requests are the most critical -- a lost request means lost work. Response routing needs correlation (matching responses to requests). Capability advertisement needs eventual consistency but not strict durability.

Kafka as the transport backing

The agentensemble-transport-kafka module implements the transport SPIs against Apache Kafka. All components share a single configuration:

KafkaTransportConfig config = KafkaTransportConfig.builder()
    .bootstrapServers("kafka:9092")
    .consumerGroupId("kitchen-ensemble")
    .topicPrefix("agentensemble.")
    .build();

Request queues

The KafkaRequestQueue produces work requests to a Kafka topic and consumes them with manual offset commits:

KafkaRequestQueue queue = KafkaRequestQueue.builder()
    .config(config)
    .ensembleName("kitchen")
    .build();

// Enqueue a work request (produces to Kafka)
queue.enqueue(workRequest);

// Poll for requests (consumes from Kafka)
Optional<WorkRequest> request = queue.poll(Duration.ofSeconds(5));

The topic name is derived from the ensemble name and prefix: agentensemble.kitchen.requests. Manual offset commits ensure that a request is only acknowledged after the ensemble has finished processing it. If the ensemble crashes mid-processing, the request will be redelivered on restart.
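Committing only after processing leads to redelivery, and this can be shown without a broker. The sketch models one partition as an append-only list plus a committed offset. The consumer's read position lives only in memory. `SimulatedPartition` and `SimulatedConsumer` are hypothetical. They model the pattern; they are not the KafkaRequestQueue code or the Kafka client API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one partition: durable records plus a committed offset. */
final class SimulatedPartition {
    private final List<String> log = new ArrayList<>();
    private int committed = 0; // offset of the next record a restarted consumer will read

    void produce(String request) { log.add(request); }
    int committedOffset() { return committed; }
    String read(int offset) { return offset < log.size() ? log.get(offset) : null; }
    void commit(int nextOffset) { committed = nextOffset; }
}

/** Consumer whose read position lives in memory and is lost if the process crashes. */
final class SimulatedConsumer {
    private final SimulatedPartition partition;
    private int position;

    SimulatedConsumer(SimulatedPartition partition) {
        this.partition = partition;
        this.position = partition.committedOffset(); // resume where the last commit left off
    }

    /** Returns the next request, or null if the partition is drained. */
    String poll() {
        String next = partition.read(position);
        if (next != null) position++;
        return next;
    }

    /** Call only after the polled request has been fully processed. */
    void commitProcessed() { partition.commit(position); }
}
```

Suppose the process dies between `poll()` and `commitProcessed()`. The next consumer starts from the old committed offset and sees the same request again. This is at-least-once delivery in its smallest form.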

Delivery registry

The KafkaDeliveryRegistry tracks pending deliveries and routes responses back to callers:

KafkaDeliveryRegistry registry = KafkaDeliveryRegistry.builder()
    .config(config)
    .build();

// Register a pending delivery (before sending request)
CompletableFuture<String> future = registry.register(requestId);

// Complete the delivery when response arrives
registry.complete(requestId, responsePayload);

// Caller awaits the result
String response = future.get(30, TimeUnit.SECONDS);

The registry uses a Kafka topic for durability: pending deliveries are produced as records, and completions are produced as tombstones. On restart, the registry rebuilds its state by replaying the topic from the beginning.
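The rebuild-on-restart step can be illustrated by treating the topic as an ordered list of records keyed by request ID, where a null payload is a tombstone for a completed delivery. Replaying the list leaves exactly the still-pending requests, and each gets a fresh future that a waiter can re-attach to. This is a broker-free sketch. `DeliveryRecord` and `PendingDeliveries` are hypothetical names, not the KafkaDeliveryRegistry internals.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** One record on the delivery topic; a null payload is a tombstone (delivery completed). */
record DeliveryRecord(String requestId, String payload) {}

/** Hypothetical pending-delivery table that can be rebuilt by replaying the topic. */
final class PendingDeliveries {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Returns the future for requestId, creating it if needed (so waiters can re-attach). */
    CompletableFuture<String> register(String requestId) {
        return pending.computeIfAbsent(requestId, id -> new CompletableFuture<>());
    }

    /** Completes the waiting future; false if nothing was pending (e.g. a duplicate response). */
    boolean complete(String requestId, String response) {
        CompletableFuture<String> future = pending.remove(requestId);
        return future != null && future.complete(response);
    }

    Set<String> pendingIds() {
        return Set.copyOf(pending.keySet());
    }

    /** After a restart: replay every record in order; tombstones cancel earlier registrations. */
    static PendingDeliveries replay(List<DeliveryRecord> topic) {
        Set<String> open = new LinkedHashSet<>();
        for (DeliveryRecord r : topic) {
            if (r.payload() == null) open.remove(r.requestId());
            else open.add(r.requestId());
        }
        PendingDeliveries rebuilt = new PendingDeliveries();
        open.forEach(rebuilt::register);
        return rebuilt;
    }
}
```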

Priority queues with aging

For workloads where some requests are more urgent than others, the PriorityRequestQueue adds priority levels with aging:

PriorityRequestQueue priorityQueue = PriorityRequestQueue.builder()
    .requestQueue(queue)          // the KafkaRequestQueue built above
    .levels(3)                    // 3 priority levels (0 = highest)
    .agingInterval(Duration.ofMinutes(5))
    .build();

// Enqueue with priority
priorityQueue.enqueue(urgentRequest, 0);   // highest priority
priorityQueue.enqueue(normalRequest, 1);   // normal priority
priorityQueue.enqueue(batchRequest, 2);    // lowest priority

Aging prevents starvation: requests that have waited longer than the aging interval are promoted to the next higher priority level, once per elapsed interval. A batch request that has been waiting for 10 minutes (two aging intervals) is promoted twice, from level 2 to level 0, the highest priority.

This is implemented as a layer on top of any RequestQueue implementation, so it works with both in-process and Kafka-backed queues.
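The promotion rule amounts to: effective priority = max(0, declared priority - floor(time waited / aging interval)). The sketch below shows a queue that applies this rule at poll time and breaks ties by arrival order. `AgingQueue` is hypothetical, not PriorityRequestQueue. The injected clock is there only so the behavior can be tested.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical queue with aging: each full interval waited improves priority by one level. */
final class AgingQueue<T> {
    private record Entry<E>(E item, int declared, Instant enqueuedAt) {}

    private final List<Entry<T>> entries = new ArrayList<>();
    private final Duration agingInterval;
    private final Supplier<Instant> clock;

    AgingQueue(Duration agingInterval, Supplier<Instant> clock) {
        this.agingInterval = agingInterval;
        this.clock = clock;
    }

    synchronized void enqueue(T item, int priority) {
        entries.add(new Entry<>(item, priority, clock.get()));
    }

    /** 0 is the highest priority; waiting never promotes past it. */
    int effectivePriority(int declared, Instant enqueuedAt, Instant now) {
        long promotions = Duration.between(enqueuedAt, now).toMillis() / agingInterval.toMillis();
        return (int) Math.max(0, declared - promotions);
    }

    /** Removes and returns the best entry: lowest effective priority, then earliest arrival. */
    synchronized Optional<T> poll() {
        Instant now = clock.get();
        Comparator<Entry<T>> byEffective =
                Comparator.comparingInt(e -> effectivePriority(e.declared(), e.enqueuedAt(), now));
        Optional<Entry<T>> next = entries.stream().min(byEffective.thenComparing(Entry::enqueuedAt));
        next.ifPresent(e -> entries.remove(e));
        return next.map(Entry::item);
    }
}
```

Computing the effective priority at poll time avoids a background task that rewrites the queue. The cost is a linear scan per poll, which is fine for a sketch but not for a deep backlog.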

What changes operationally

Moving from in-process to Kafka transport changes the operational profile of the ensemble network:

Startup behavior changes. With in-process queues, an ensemble starts with an empty queue. With Kafka, it may start with a backlog of unprocessed requests from before the restart. The ensemble needs to handle this gracefully -- processing the backlog before accepting new work, or processing both concurrently.

Failure modes change. In-process queue failures are process-fatal (if the process dies, the queue is gone). Kafka failures are infrastructure-level (broker unavailable, topic not found, authorization errors). The error handling needs to distinguish between transient failures (retry) and permanent failures (alert and skip).

Monitoring needs change. With in-process queues, queue depth is a simple counter. With Kafka, you need to monitor consumer lag, topic partition health, and broker connectivity. The ensemble's health check needs to include Kafka reachability.

Ordering semantics change. In-process queues provide strict FIFO. Kafka guarantees ordering only within a partition, so requests may be processed out of order if the topic has multiple partitions. For most agent workloads, this is fine -- requests are independent. But if your workflow depends on ordering, you need single-partition topics, keyed records (records with the same key land on the same partition), or application-level sequencing.

The configuration boundary

One design decision worth calling out: the Kafka transport configuration is separate from the ensemble configuration. The ensemble does not know it is using Kafka -- it interacts with the transport SPIs. The Kafka-specific configuration (bootstrap servers, consumer groups, topic prefixes) lives in the infrastructure layer.

// Infrastructure layer: Kafka-specific setup
KafkaTransportConfig kafkaConfig = KafkaTransportConfig.builder()
    .bootstrapServers("kafka:9092")
    .consumerGroupId("kitchen-ensemble")
    .build();

KafkaRequestQueue queue = KafkaRequestQueue.builder()
    .config(kafkaConfig)
    .ensembleName("kitchen")
    .build();

// Application layer: ensemble setup (transport-agnostic)
Ensemble kitchen = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.of("Manage kitchen operations"))
    .requestQueue(queue)
    .build();

This separation means the same ensemble code works in development (with in-process queues) and production (with Kafka) without changes. The transport choice is an infrastructure decision, not an application decision.

Tradeoffs

Operational complexity. Kafka is infrastructure that needs to be provisioned, monitored, and maintained. For small deployments, the operational overhead may not be justified. The in-process transport with periodic state snapshots might be a simpler alternative.

Latency. Kafka adds millisecond-scale latency to every request delivery. For agent workloads where task execution takes seconds or minutes, this is negligible. For sub-second workflows, it may not be.

Topic proliferation. Each ensemble gets its own request topic. A network of 20 ensembles means 20+ Kafka topics. This is manageable but requires topic lifecycle management (creation, cleanup, retention policies).

Exactly-once is hard. The current implementation provides at-least-once delivery. A request may be processed twice if the ensemble crashes after completing the work but before committing the offset. For most agent workloads (which are non-deterministic anyway), this is acceptable. For workloads that require exactly-once, additional deduplication logic is needed.
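The usual form of that deduplication is an idempotent consumer. It remembers which request IDs already have a result and returns the stored result on redelivery instead of running the work again. Below is a minimal sketch, assuming every work request carries a stable ID. `IdempotentProcessor` is hypothetical. Because its map lives in memory, it would not survive the very crash described above; a real version would persist the ID together with the result.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical wrapper that turns at-least-once delivery into effectively-once processing. */
final class IdempotentProcessor {
    private final Map<String, String> resultsById = new ConcurrentHashMap<>();
    private final Function<String, String> handler;

    IdempotentProcessor(Function<String, String> handler) {
        this.handler = handler;
    }

    /** Runs the handler once per request ID; a redelivery returns the stored result. */
    String process(String requestId, String payload) {
        // Concurrent redeliveries of the same ID wait for the first call instead of re-running it.
        // If the handler throws, nothing is stored and a later redelivery will try again.
        return resultsById.computeIfAbsent(requestId, id -> handler.apply(payload));
    }
}
```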

When to use durable transport

The decision is straightforward:

Development and testing: in-process transport. Zero setup, fast, deterministic.
Single-node production: in-process transport with periodic state persistence. Simple, no external dependencies.
Multi-node production: Kafka transport. Durability, horizontal scaling, replay capability.
Edge or embedded: in-process transport. No infrastructure dependency.

The transport SPI lets you make this decision per-deployment without changing application code.

The Kafka transport module is part of AgentEnsemble. The durable transport guide covers the full configuration and operational details.

I'd be interested in whether others are using Kafka (or similar) for agent-to-agent communication, and what delivery guarantee level they find sufficient in practice.