<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: mgd43b</title>
    <description>The latest articles on DEV Community by mgd43b (@mgd43b).</description>
    <link>https://dev.to/mgd43b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817656%2Faab70ef8-8405-452d-aec5-004f74c0316b.png</url>
      <title>DEV Community: mgd43b</title>
      <link>https://dev.to/mgd43b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mgd43b"/>
    <language>en</language>
    <item>
      <title>Dynamic Discovery in Agent Networks: From Hardcoded Routes to Capability Catalogs</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Mon, 11 May 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/dynamic-discovery-in-agent-networks-from-hardcoded-routes-to-capability-catalogs-1mh</link>
      <guid>https://dev.to/agentensemble/dynamic-discovery-in-agent-networks-from-hardcoded-routes-to-capability-catalogs-1mh</guid>
      <description>&lt;p&gt;The simplest way to connect two agent ensembles is a direct reference: ensemble A knows ensemble B's address and calls it. This works when you have two or three ensembles with stable relationships.&lt;/p&gt;

&lt;p&gt;It stops working when you have ten ensembles, or when ensembles come and go, or when the same capability is provided by multiple ensembles and you want the caller to use whichever one is available. At that point, you need discovery -- a way for ensembles to find capabilities without knowing in advance who provides them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The static wiring problem
&lt;/h2&gt;

&lt;p&gt;In a statically wired agent network, every cross-ensemble call requires knowing the provider's identity and address:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Static: caller knows exactly who to call&lt;/span&gt;
&lt;span class="nc"&gt;NetworkTask&lt;/span&gt; &lt;span class="n"&gt;mealTask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NetworkTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"prepare-meal"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"ws://kitchen:7329/ws"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates coupling. If the kitchen ensemble moves to a different port, every caller needs updating. If you add a second kitchen for capacity, callers need load-balancing logic. If the kitchen goes down, callers need fallback logic.&lt;/p&gt;

&lt;p&gt;The fundamental issue is that callers should care about &lt;em&gt;what&lt;/em&gt; they need (a meal preparation capability), not &lt;em&gt;who&lt;/em&gt; provides it or &lt;em&gt;where&lt;/em&gt; it runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability advertisement with tags
&lt;/h2&gt;

&lt;p&gt;AgentEnsemble v3.0.0 introduces capability discovery. Ensembles advertise their shared tasks and tools with optional tags, and other ensembles discover providers at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advertising capabilities
&lt;/h3&gt;

&lt;p&gt;When building an ensemble, declare what it shares with the network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Ensemble&lt;/span&gt; &lt;span class="n"&gt;kitchen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Manage kitchen operations"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;shareTool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"check-inventory"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inventoryTool&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"stock"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;shareTask&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"prepare-meal"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mealTask&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"cooking"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;kitchen&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7329&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;shareTool&lt;/code&gt; and &lt;code&gt;shareTask&lt;/code&gt; methods register capabilities in the network's capability registry. The trailing string arguments are tags -- metadata that classifies the capability for filtered discovery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Discovering capabilities
&lt;/h3&gt;

&lt;p&gt;Another ensemble can discover providers without knowing their identity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Discover by capability name&lt;/span&gt;
&lt;span class="nc"&gt;NetworkTool&lt;/span&gt; &lt;span class="n"&gt;inventoryCheck&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NetworkTool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;discover&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"check-inventory"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Discover by tag&lt;/span&gt;
&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CapabilityInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;foodCapabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findByTag&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The registry resolves the name to a provider that currently offers the requested capability. If multiple providers offer the same capability, the registry applies selection logic to choose one; the current implementation supports least-loaded selection when capacity information is available.&lt;/p&gt;
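&lt;p&gt;As a rough illustration of least-loaded selection, the sketch below picks the candidate with the fewest in-flight requests. The &lt;code&gt;Provider&lt;/code&gt; record and &lt;code&gt;LeastLoadedSelector&lt;/code&gt; class are hypothetical names invented for this example; they are not part of the AgentEnsemble API, and the real registry may weigh capacity differently.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical view of one provider of a capability (not an AgentEnsemble type).
record Provider(String name, int activeRequests) {}

final class LeastLoadedSelector {
    // Returns the provider with the fewest in-flight requests, or null when there are none.
    static Provider select(Provider... candidates) {
        return Arrays.stream(candidates)
                .min(Comparator.comparingInt(Provider::activeRequests))
                .orElse(null);
    }
}
```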

&lt;h2&gt;
  
  
  Tag-based catalogs
&lt;/h2&gt;

&lt;p&gt;Tags turn the capability registry into a searchable catalog. Rather than querying for specific capability names, you can query for categories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Find all capabilities tagged with "food"&lt;/span&gt;
&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CapabilityInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;food&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findByTag&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Find all capabilities tagged with both "food" and "stock"&lt;/span&gt;
&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CapabilityInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;stockChecks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findByTags&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"stock"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
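&lt;p&gt;The two-tag query above implies AND semantics: a capability matches only if it carries every requested tag. A minimal sketch of that matching rule, assuming a hypothetical &lt;code&gt;Capability&lt;/code&gt; record that stands in for a registry entry (the actual registry may store and index tags differently):&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical stand-in for a registry entry; the trailing arguments are its tags.
record Capability(String name, String... tags) {
    // True only when every required tag is present (AND semantics).
    boolean hasAllTags(String... required) {
        return Arrays.asList(tags).containsAll(Arrays.asList(required));
    }
}
```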



&lt;p&gt;Each &lt;code&gt;CapabilityInfo&lt;/code&gt; includes the capability name, type (task or tool), provider ensemble name, and tags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;CapabilityInfo&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;food&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;           &lt;span class="c1"&gt;// "check-inventory"&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;           &lt;span class="c1"&gt;// "TOOL"&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;   &lt;span class="c1"&gt;// "kitchen"&lt;/span&gt;
&lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;      &lt;span class="c1"&gt;// ["food", "stock"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful for building dynamic agent systems where an orchestrating ensemble does not know in advance what capabilities are available. It can discover capabilities at runtime, filter by category, and wire them into its workflow dynamically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The registry abstraction
&lt;/h2&gt;

&lt;p&gt;The capability registry is part of the transport SPI, which means it has pluggable implementations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;th&gt;Backing&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;In-memory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ConcurrentHashMap&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Development, testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebSocket-broadcast&lt;/td&gt;
&lt;td&gt;Network messages&lt;/td&gt;
&lt;td&gt;Multi-process, simple mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka-backed&lt;/td&gt;
&lt;td&gt;Kafka topics&lt;/td&gt;
&lt;td&gt;Production, durable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In development, capabilities are registered and discovered within a single process or across WebSocket connections. In production, the registry can be backed by Kafka for durability and horizontal scaling.&lt;/p&gt;

&lt;p&gt;The application code that registers and discovers capabilities does not change between implementations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic vs. static wiring
&lt;/h2&gt;

&lt;p&gt;The choice between static and dynamic wiring is not binary. A practical network often uses both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static wiring&lt;/strong&gt; for well-known, stable relationships (the front desk always calls the kitchen)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic discovery&lt;/strong&gt; for capabilities that may be provided by different ensembles depending on deployment, capacity, or availability
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Static: known relationship&lt;/span&gt;
&lt;span class="nc"&gt;NetworkTask&lt;/span&gt; &lt;span class="n"&gt;knownMealTask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NetworkTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"prepare-meal"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"ws://kitchen:7329/ws"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Dynamic: discover at runtime&lt;/span&gt;
&lt;span class="nc"&gt;NetworkTool&lt;/span&gt; &lt;span class="n"&gt;discovered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NetworkTool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;discover&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"check-inventory"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two approaches coexist. Statically wired tasks bypass the registry entirely. Dynamically discovered tasks and tools use the registry for resolution. The agent using the task or tool does not know which approach was used to create it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability lifecycle
&lt;/h2&gt;

&lt;p&gt;Capabilities have a lifecycle that mirrors the ensemble lifecycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Registration&lt;/strong&gt; -- when the ensemble starts and calls &lt;code&gt;shareTask&lt;/code&gt; or &lt;code&gt;shareTool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt; -- when other ensembles query the registry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deregistration&lt;/strong&gt; -- when the ensemble stops or the capability is removed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With the in-memory registry, deregistration is implicit: registered capabilities disappear when the ensemble process exits. With WebSocket transport, the ensemble broadcasts a deregistration message on shutdown. With Kafka, a tombstone record is produced.&lt;/p&gt;

&lt;p&gt;The lifecycle matters for production systems. A stale registry entry (one pointing to an ensemble that no longer exists) causes request failures. The registry needs to handle stale entries through explicit deregistration, heartbeat-based expiry, or health-check-based cleanup.&lt;/p&gt;
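&lt;p&gt;Heartbeat-based expiry, one of the options above, comes down to a time-to-live check on each entry. The sketch below assumes a hypothetical &lt;code&gt;RegistryEntry&lt;/code&gt; record that stores the time of the provider's last heartbeat; neither the type nor its fields come from AgentEnsemble.&lt;/p&gt;

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical registry record plus the time its provider last sent a heartbeat.
record RegistryEntry(String capability, String provider, Instant lastHeartbeat) {

    // An entry is stale once no heartbeat has arrived within the time-to-live.
    boolean isStale(Instant now, Duration timeToLive) {
        return lastHeartbeat.plus(timeToLive).isBefore(now);
    }
}
```

&lt;p&gt;A cleanup pass would drop every entry for which &lt;code&gt;isStale&lt;/code&gt; returns true; the time-to-live should be comfortably longer than the heartbeat period so that one delayed heartbeat does not evict a healthy provider.&lt;/p&gt;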

&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Discovery adds a lookup step.&lt;/strong&gt; Every dynamically discovered capability requires a registry query. In practice, the result is cached: the first lookup queries the registry, and subsequent uses of the same capability reuse the resolved provider. The initial resolution still adds latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tag semantics are convention-based.&lt;/strong&gt; There is no schema for tags. If one ensemble tags a capability as "food" and another tags it as "cuisine", they will not discover each other. Tag conventions need to be agreed upon across teams.&lt;/p&gt;
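&lt;p&gt;One lightweight way to enforce a tag convention is a shared constants class that every team imports instead of typing tag strings by hand. The &lt;code&gt;CapabilityTags&lt;/code&gt; class below is a hypothetical sketch, not something AgentEnsemble ships; the tag values are the ones used in the examples above.&lt;/p&gt;

```java
import java.util.Locale;

// Hypothetical shared vocabulary, published in a module every team depends on.
final class CapabilityTags {
    static final String FOOD = "food";
    static final String STOCK = "stock";
    static final String COOKING = "cooking";

    private CapabilityTags() {}

    // Normalizes free-form input so that " Food " and "food" map to the same tag.
    static String normalize(String tag) {
        return tag.trim().toLowerCase(Locale.ROOT);
    }
}
```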

&lt;p&gt;&lt;strong&gt;Multiple providers create ambiguity.&lt;/strong&gt; When two ensembles offer the same capability, the registry needs a selection strategy. The current implementation supports least-loaded selection (when capacity information is available), but more sophisticated strategies (affinity, cost-based, latency-based) would need to be built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Registry availability is a dependency.&lt;/strong&gt; If the registry is unavailable, dynamic discovery fails. Static wiring works regardless of registry state. For critical paths, consider falling back to static wiring when discovery is unavailable.&lt;/p&gt;
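&lt;p&gt;One possible shape for that fallback is a resolver that tries discovery first and returns a statically configured address when the lookup fails. &lt;code&gt;CapabilityResolver&lt;/code&gt; and &lt;code&gt;FallbackResolver&lt;/code&gt; are hypothetical names; the sketch also assumes that an unavailable registry surfaces as a &lt;code&gt;RuntimeException&lt;/code&gt;, which may not match the actual API.&lt;/p&gt;

```java
// Hypothetical abstraction: maps a capability name to a provider address.
interface CapabilityResolver {
    String resolve(String capability);
}

final class FallbackResolver implements CapabilityResolver {
    private final CapabilityResolver dynamic;
    private final String staticAddress;

    FallbackResolver(CapabilityResolver dynamic, String staticAddress) {
        this.dynamic = dynamic;
        this.staticAddress = staticAddress;
    }

    @Override
    public String resolve(String capability) {
        try {
            // Preferred path: ask the registry.
            return dynamic.resolve(capability);
        } catch (RuntimeException registryUnavailable) {
            // Assumption: lookup failures are unchecked; fall back to the wired address.
            return staticAddress;
        }
    }
}
```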

&lt;h2&gt;
  
  
  The design principle
&lt;/h2&gt;

&lt;p&gt;The useful abstraction is separating &lt;em&gt;what&lt;/em&gt; from &lt;em&gt;who&lt;/em&gt;. An ensemble that needs a meal preparation capability should express that need ("I need prepare-meal") without specifying the provider ("specifically from the kitchen ensemble at ws://kitchen:7329/ws").&lt;/p&gt;

&lt;p&gt;This separation enables the network to evolve. New providers can come online. Existing providers can be replaced. Capacity can be redistributed. The callers do not need to change.&lt;/p&gt;

&lt;p&gt;Discovery is the mechanism. Tags make it searchable. The transport SPI makes it portable across deployment environments.&lt;/p&gt;




&lt;p&gt;Capability discovery is part of &lt;a href="https://github.com/AgentEnsemble/agentensemble" rel="noopener noreferrer"&gt;AgentEnsemble&lt;/a&gt;. The &lt;a href="https://agentensemble.net/guides/discovery/" rel="noopener noreferrer"&gt;discovery guide&lt;/a&gt; covers the full API including tag-based filtering.&lt;/p&gt;

&lt;p&gt;I'd be interested in how others handle capability discovery in multi-agent systems -- whether you use service registries, hardcoded routes, or something else entirely.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Durable Transport for Agent Networks: Moving from In-Process Queues to Kafka</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Sat, 09 May 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/durable-transport-for-agent-networks-moving-from-in-process-queues-to-kafka-43bi</link>
      <guid>https://dev.to/agentensemble/durable-transport-for-agent-networks-moving-from-in-process-queues-to-kafka-43bi</guid>
      <description>&lt;p&gt;In-process queues are fine for development. They are fast, deterministic, and require zero infrastructure. But they have a property that becomes a liability in production: when the process dies, the queue contents disappear.&lt;/p&gt;

&lt;p&gt;For agent networks that run as long-lived services -- handling work requests over hours or days -- losing queued requests on restart is not acceptable. The transport layer needs durability, and that means moving from in-process data structures to something that survives process failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  What durability means for agent networks
&lt;/h2&gt;

&lt;p&gt;An agent ensemble network has three communication patterns that need durable backing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Work request delivery&lt;/strong&gt; -- a request from one ensemble to another should not be lost if the receiving ensemble is temporarily unavailable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response routing&lt;/strong&gt; -- when an ensemble completes a request, the response needs to reach the original caller even if the caller restarted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability advertisement&lt;/strong&gt; -- shared tasks and tools should remain discoverable across process restarts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each of these has different durability requirements. Work requests are the most critical -- a lost request means lost work. Response routing needs its correlation state (the mapping from responses back to pending requests) to survive a caller restart. Capability advertisement needs eventual consistency but not strict durability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kafka as the transport backing
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;agentensemble-transport-kafka&lt;/code&gt; module implements the transport SPIs against Apache Kafka. All components share a single configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;KafkaTransportConfig&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaTransportConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;bootstrapServers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kafka:9092"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;consumerGroupId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen-ensemble"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;topicPrefix&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"agentensemble."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Request queues
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;KafkaRequestQueue&lt;/code&gt; produces work requests to a Kafka topic and consumes them with manual offset commits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;KafkaRequestQueue&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaRequestQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensembleName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Enqueue a work request (produces to Kafka)&lt;/span&gt;
&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enqueue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workRequest&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Poll for requests (consumes from Kafka)&lt;/span&gt;
&lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;WorkRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The topic name is derived from the ensemble name and prefix: &lt;code&gt;agentensemble.kitchen.requests&lt;/code&gt;. Manual offset commits ensure that a request is only acknowledged after the ensemble has finished processing it. If the ensemble crashes mid-processing, the offset is never committed, so the request is redelivered on restart. This is at-least-once delivery: a request can be processed more than once, so request handling should be idempotent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Delivery registry
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;KafkaDeliveryRegistry&lt;/code&gt; tracks pending deliveries and routes responses back to callers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;KafkaDeliveryRegistry&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaDeliveryRegistry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Register a pending delivery (before sending request)&lt;/span&gt;
&lt;span class="nc"&gt;CompletableFuture&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;register&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requestId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Complete the delivery when response arrives&lt;/span&gt;
&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requestId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;responsePayload&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Caller awaits the result&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SECONDS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The registry uses a Kafka topic for durability: pending deliveries are produced as records, and completions are produced as tombstones. On restart, the registry rebuilds its state by replaying the topic from the beginning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Priority queues with aging
&lt;/h3&gt;

&lt;p&gt;For workloads where some requests are more urgent than others, the &lt;code&gt;PriorityRequestQueue&lt;/code&gt; adds priority levels with aging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;PriorityRequestQueue&lt;/span&gt; &lt;span class="n"&gt;priorityQueue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PriorityRequestQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requestQueue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaQueue&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;levels&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                    &lt;span class="c1"&gt;// 3 priority levels (0 = highest)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;agingInterval&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMinutes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Enqueue with priority&lt;/span&gt;
&lt;span class="n"&gt;priorityQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enqueue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urgentRequest&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// highest priority&lt;/span&gt;
&lt;span class="n"&gt;priorityQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enqueue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalRequest&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// normal priority&lt;/span&gt;
&lt;span class="n"&gt;priorityQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enqueue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batchRequest&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// lowest priority&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aging prevents starvation: each time a request has waited a full aging interval, it is promoted to the next higher priority level. With the 5-minute interval above, a batch request enqueued at level 2 that has been waiting for 10 minutes (two aging intervals) has been promoted twice and sits at level 0, the highest priority.&lt;/p&gt;
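&lt;p&gt;The promotion arithmetic can be sketched as follows, assuming one promotion per full aging interval waited and level 0 as the ceiling. &lt;code&gt;PriorityAging&lt;/code&gt; and &lt;code&gt;effectiveLevel&lt;/code&gt; are hypothetical names for illustration; they do not describe the internals of &lt;code&gt;PriorityRequestQueue&lt;/code&gt;.&lt;/p&gt;

```java
import java.time.Duration;

final class PriorityAging {
    // Level 0 is the highest priority; each full aging interval waited promotes one level.
    static int effectiveLevel(int enqueuedLevel, Duration waited, Duration agingInterval) {
        // Whole number of aging intervals contained in the waiting time.
        long promotions = waited.dividedBy(agingInterval);
        // Never promote past level 0.
        return (int) Math.max(0L, enqueuedLevel - promotions);
    }
}
```

&lt;p&gt;With these assumptions, a level 2 request that has waited 10 minutes under a 5-minute interval evaluates to level 0, and one that has waited 7 minutes evaluates to level 1.&lt;/p&gt;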

&lt;p&gt;This is implemented as a layer on top of any &lt;code&gt;RequestQueue&lt;/code&gt; implementation, so it works with both in-process and Kafka-backed queues.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changes operationally
&lt;/h2&gt;

&lt;p&gt;Moving from in-process to Kafka transport changes the operational profile of the ensemble network:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Startup behavior changes.&lt;/strong&gt; With in-process queues, an ensemble starts with an empty queue. With Kafka, it may start with a backlog of unprocessed requests from before the restart. The ensemble needs to handle this gracefully -- processing the backlog before accepting new work, or processing both concurrently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure modes change.&lt;/strong&gt; An in-process queue fails only when its process does: if the process dies, the queue is gone. Kafka failures are infrastructure-level (broker unavailable, topic not found, authorization errors). The error handling needs to distinguish between transient failures (retry) and permanent failures (alert and skip).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitoring needs change.&lt;/strong&gt; With in-process queues, queue depth is a simple counter. With Kafka, you need to monitor consumer lag, topic partition health, and broker connectivity. The ensemble's health check needs to include Kafka reachability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ordering semantics change.&lt;/strong&gt; In-process queues provide strict FIFO. Kafka provides per-partition ordering, which means requests may be processed out of order if the topic has multiple partitions. For most agent workloads, this is fine -- requests are independent. But if your workflow depends on ordering, you need single-partition topics or application-level sequencing.&lt;/p&gt;
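&lt;p&gt;If only related requests need to stay in order, a common Kafka technique is to give them the same message key: records with the same key are routed to the same partition (as long as the partition count does not change), so they keep their relative order. The sketch below illustrates the idea only. &lt;code&gt;partitionFor&lt;/code&gt; is a hypothetical helper based on &lt;code&gt;String.hashCode&lt;/code&gt;; Kafka's default partitioner hashes the serialized key with murmur2 instead, and nothing here reflects how AgentEnsemble assigns keys.&lt;/p&gt;

```java
/** Hypothetical sketch of key-based partition selection. */
public class PartitionKeySketch {

    /** Maps a key to a partition index; a simplified stand-in for a real partitioner. */
    static int partitionFor(String key, int partitionCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 6; // assumed topic size

        // Two requests that belong to the same workflow share a key ...
        int first = partitionFor("workflow-42", partitions);
        int second = partitionFor("workflow-42", partitions);

        // ... so they always land on the same partition and stay ordered.
        System.out.println(first == second); // prints true
    }
}
```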

&lt;h2&gt;
  
  
  The configuration boundary
&lt;/h2&gt;

&lt;p&gt;One design decision worth calling out: the Kafka transport configuration is separate from the ensemble configuration. The ensemble does not know it is using Kafka -- it interacts with the transport SPIs. The Kafka-specific configuration (bootstrap servers, consumer groups, topic prefixes) lives in the infrastructure layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Infrastructure layer: Kafka-specific setup&lt;/span&gt;
&lt;span class="nc"&gt;KafkaTransportConfig&lt;/span&gt; &lt;span class="n"&gt;kafkaConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaTransportConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;bootstrapServers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kafka:9092"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;consumerGroupId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen-ensemble"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;KafkaRequestQueue&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaRequestQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaConfig&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensembleName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Application layer: ensemble setup (transport-agnostic)&lt;/span&gt;
&lt;span class="nc"&gt;Ensemble&lt;/span&gt; &lt;span class="n"&gt;kitchen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Manage kitchen operations"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requestQueue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation means the same ensemble code works in development (with in-process queues) and production (with Kafka) without changes. The transport choice is an infrastructure decision, not an application decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Operational complexity.&lt;/strong&gt; Kafka is infrastructure that needs to be provisioned, monitored, and maintained. For small deployments, the operational overhead may not be justified. The in-process transport with periodic state snapshots might be a simpler alternative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency.&lt;/strong&gt; Kafka adds millisecond-scale latency to every request delivery. For agent workloads where task execution takes seconds or minutes, this is negligible. For sub-second workflows, it may not be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topic proliferation.&lt;/strong&gt; Each ensemble gets its own request topic. A network of 20 ensembles means 20+ Kafka topics. This is manageable but requires topic lifecycle management (creation, cleanup, retention policies).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exactly-once is hard.&lt;/strong&gt; The current implementation provides at-least-once delivery. A request may be processed twice if the ensemble crashes after completing the work but before committing the offset. For most agent workloads (which are non-deterministic anyway), this is acceptable. For workloads that require exactly-once processing, for example because a task has external side effects, additional deduplication logic is needed.&lt;/p&gt;
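&lt;p&gt;One possible shape for that deduplication logic is to track request ids and ignore ones that have already been handled. This is a minimal sketch under stated assumptions: &lt;code&gt;RequestHandler&lt;/code&gt; and &lt;code&gt;deduplicating&lt;/code&gt; are hypothetical names, every request is assumed to carry a unique id, and the set of seen ids lives in memory, so it would not survive the crash that triggers a redelivery. A real deployment would need a durable store for the ids.&lt;/p&gt;

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of deduplication on top of at-least-once delivery. */
public class DedupSketch {

    /** Processes one request, identified by a unique request id. */
    interface RequestHandler {
        void handle(String requestId);
    }

    /** Wraps a handler so that a redelivered request id is ignored. */
    static RequestHandler deduplicating(RequestHandler delegate) {
        // Ids already seen. In-memory only; see the caveat in the text above.
        var seen = ConcurrentHashMap.newKeySet();
        return requestId -> {
            // add() returns false if the id was already present.
            if (seen.add(requestId)) {
                delegate.handle(requestId);
            } else {
                System.out.println("duplicate ignored: " + requestId);
            }
        };
    }

    public static void main(String[] args) {
        RequestHandler handler = deduplicating(id -> System.out.println("processing " + id));
        handler.handle("req-1");
        handler.handle("req-1"); // simulated redelivery
        handler.handle("req-2");
    }
}
```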

&lt;h2&gt;
  
  
  When to use durable transport
&lt;/h2&gt;

&lt;p&gt;The decision is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Development and testing:&lt;/strong&gt; in-process transport. Zero setup, fast, deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-node production:&lt;/strong&gt; in-process transport with periodic state persistence. Simple, no external dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-node production:&lt;/strong&gt; Kafka transport. Durability, horizontal scaling, replay capability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge or embedded:&lt;/strong&gt; in-process transport. No infrastructure dependency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The transport SPI lets you make this decision per-deployment without changing application code.&lt;/p&gt;




&lt;p&gt;The Kafka transport module is part of &lt;a href="https://github.com/AgentEnsemble/agentensemble" rel="noopener noreferrer"&gt;AgentEnsemble&lt;/a&gt;. The &lt;a href="https://agentensemble.net/guides/durable-transport/" rel="noopener noreferrer"&gt;durable transport guide&lt;/a&gt; covers the full configuration and operational details.&lt;/p&gt;

&lt;p&gt;I'd be interested in whether others are using Kafka (or similar) for agent-to-agent communication, and what delivery guarantee level they find sufficient in practice.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Transport SPI: Making Agent Network Infrastructure Pluggable</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Thu, 07 May 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/transport-spi-making-agent-network-infrastructure-pluggable-26pg</link>
      <guid>https://dev.to/agentensemble/transport-spi-making-agent-network-infrastructure-pluggable-26pg</guid>
      <description>&lt;p&gt;When agent ensembles become long-running services that communicate over a network, the communication layer becomes infrastructure. And infrastructure has a property that application code should not: it varies by deployment environment.&lt;/p&gt;

&lt;p&gt;Development uses in-process queues. Staging might use Redis. Production runs Kafka. The application code -- the agents, tasks, workflows -- should not change between these environments. The question is where to draw the abstraction line.&lt;/p&gt;

&lt;h2&gt;
  
  
  The transport problem
&lt;/h2&gt;

&lt;p&gt;An ensemble network needs several communication primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Request queues&lt;/strong&gt; -- how work requests arrive at an ensemble&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delivery registries&lt;/strong&gt; -- how responses get routed back to the requester&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability registries&lt;/strong&gt; -- how ensembles advertise and discover shared tasks and tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capacity tracking&lt;/strong&gt; -- how ensembles report their current load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these has a natural in-process implementation (maps, queues, lists) and at least one distributed implementation (Kafka topics, Redis streams, service registries). If these are hardcoded to a specific backing store, every deployment environment change requires code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SPI design
&lt;/h2&gt;

&lt;p&gt;AgentEnsemble defines transport as a set of Java interfaces -- a Service Provider Interface -- with pluggable implementations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The transport factory -- entry point for all transport primitives&lt;/span&gt;
&lt;span class="nc"&gt;Transport&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Transport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;websocket&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Alternative: simple transport with a delivery registry for response tracking&lt;/span&gt;
&lt;span class="nc"&gt;Transport&lt;/span&gt; &lt;span class="n"&gt;trackedTransport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Transport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;simple&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deliveryRegistry&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Transport&lt;/code&gt; interface provides access to the individual primitives:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Primitive&lt;/th&gt;
&lt;th&gt;Interface&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Request queue&lt;/td&gt;
&lt;td&gt;&lt;code&gt;RequestQueue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Inbound work request buffering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery registry&lt;/td&gt;
&lt;td&gt;&lt;code&gt;DeliveryRegistry&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Response routing back to callers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capability registry&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CapabilityRegistry&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shared task/tool advertisement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each interface has a simple contract. &lt;code&gt;RequestQueue&lt;/code&gt;, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;RequestQueue&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WorkRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;WorkRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The in-process implementation uses a &lt;code&gt;LinkedBlockingQueue&lt;/code&gt;. The Kafka implementation produces to a topic and consumes with manual offset commits. Same interface, different backing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple mode for development
&lt;/h2&gt;

&lt;p&gt;The default transport uses in-process data structures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Transport&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Transport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;websocket&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a working ensemble network with WebSocket connections between ensembles, backed by in-process queues and maps. It is fast, requires no external infrastructure, and is appropriate for development and testing.&lt;/p&gt;

&lt;p&gt;For local development that needs delivery tracking (ensuring responses reach their intended recipients), use the simple transport with a delivery registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;DeliveryRegistry&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;InMemoryDeliveryRegistry&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="nc"&gt;Transport&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Transport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;simple&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why this matters for agent systems
&lt;/h2&gt;

&lt;p&gt;The transport SPI is not unusual as an architectural pattern -- it is a standard dependency inversion. What makes it interesting in the agent context is what it enables.&lt;/p&gt;

&lt;p&gt;Agent networks are inherently non-deterministic. Agents take variable time, produce variable output, and may fail in unpredictable ways. Adding infrastructure variability on top of that makes the system harder to reason about.&lt;/p&gt;

&lt;p&gt;By isolating transport from application logic, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Test with in-process transport&lt;/strong&gt; -- no containers, no network, deterministic ordering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Develop locally with WebSocket transport&lt;/strong&gt; -- real network behavior, zero infrastructure setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy to production with Kafka&lt;/strong&gt; -- durability, horizontal scaling, replay capability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch between environments&lt;/strong&gt; -- without touching agent code, task definitions, or workflow configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent code does not know whether its work requests arrive from an in-process queue or a Kafka topic. It processes them the same way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The capability registry
&lt;/h2&gt;

&lt;p&gt;One of the more interesting transport primitives is the capability registry. When an ensemble shares a task or tool on the network, that capability needs to be discoverable by other ensembles.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Ensemble advertises capabilities&lt;/span&gt;
&lt;span class="nc"&gt;CapabilityRegistry&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;capabilityRegistry&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;register&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"prepare-meal"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;CapabilityType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TASK&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;register&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"check-inventory"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;CapabilityType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TOOL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Another ensemble discovers capabilities&lt;/span&gt;
&lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findProvider&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"prepare-meal"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In simple mode, this is an in-memory map. In production, it could be backed by a service registry, a shared database, or Kafka's consumer group protocol. The application code that registers and discovers capabilities does not change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Abstraction leaks.&lt;/strong&gt; In-process queues have different ordering and delivery guarantees than Kafka topics. The SPI abstracts the interface but cannot fully abstract the semantics. Code that depends on strict FIFO ordering will behave differently with Kafka partitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration complexity.&lt;/strong&gt; Each transport implementation has its own configuration (bootstrap servers, consumer groups, topic prefixes). The SPI does not unify configuration -- you still need environment-specific setup for each backing store.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance characteristics vary.&lt;/strong&gt; In-process queues are nanosecond-scale. Kafka adds millisecond-scale latency. If your agent workflow is latency-sensitive, the transport choice matters and the abstraction cannot hide that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not all primitives are equally portable.&lt;/strong&gt; Request queues map cleanly to most message systems. Delivery registries (correlating responses to specific requests) are harder to implement efficiently in some message brokers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The design principle
&lt;/h2&gt;

&lt;p&gt;The useful insight is that agent network communication has a small number of well-defined primitives (request queuing, response routing, capability registration), and these primitives have natural implementations at every scale (in-process, single-node, distributed).&lt;/p&gt;

&lt;p&gt;Rather than building the network layer directly on top of a specific infrastructure choice, defining the primitives as interfaces lets the infrastructure decision be made at deployment time rather than at development time.&lt;/p&gt;

&lt;p&gt;This is standard dependency inversion. It is not novel. But it is the foundation that makes everything else in the ensemble network possible -- durable transport, discovery, federation, and capacity management all build on these same interfaces.&lt;/p&gt;




&lt;p&gt;The transport SPI is part of &lt;a href="https://github.com/AgentEnsemble/agentensemble" rel="noopener noreferrer"&gt;AgentEnsemble&lt;/a&gt;. The &lt;a href="https://agentensemble.net/guides/durable-transport/" rel="noopener noreferrer"&gt;durable transport guide&lt;/a&gt; covers the Kafka implementation in detail.&lt;/p&gt;

&lt;p&gt;I'd be interested in what transport backends others are using for agent-to-agent communication, and whether the primitive set (request queue, delivery registry, capability registry) feels complete or whether there are missing pieces.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Bridging MCP into Java Agent Systems: Reusing the Tool Ecosystem Without Leaving the JVM</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Tue, 05 May 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/bridging-mcp-into-java-agent-systems-reusing-the-tool-ecosystem-without-leaving-the-jvm-211</link>
      <guid>https://dev.to/agentensemble/bridging-mcp-into-java-agent-systems-reusing-the-tool-ecosystem-without-leaving-the-jvm-211</guid>
      <description>&lt;p&gt;The Model Context Protocol has created a growing ecosystem of tool servers -- filesystem operations, git integration, database access, API connectors. Most of these servers are written in TypeScript and communicate over stdio or SSE.&lt;/p&gt;

&lt;p&gt;If you are building agent systems on the JVM, you face a choice: rewrite every tool in Java, or find a way to use what already exists. The useful answer is usually both -- and the bridge between them needs to be clean enough that the rest of your system does not care which approach a particular tool uses.&lt;/p&gt;

&lt;h2&gt;
  
  
  The integration problem
&lt;/h2&gt;

&lt;p&gt;MCP servers expose tools through a well-defined protocol. LangChain4j (which AgentEnsemble builds on) already has MCP client support via &lt;code&gt;McpClient&lt;/code&gt; and &lt;code&gt;McpToolProvider&lt;/code&gt;. But there is a gap: LangChain4j's MCP integration produces tools for its &lt;code&gt;AiServices&lt;/code&gt; abstraction, not for AgentEnsemble's &lt;code&gt;AgentTool&lt;/code&gt; interface.&lt;/p&gt;

&lt;p&gt;The bridge needs to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connect to any MCP server (stdio or SSE transport)&lt;/li&gt;
&lt;li&gt;Discover available tools from the server&lt;/li&gt;
&lt;li&gt;Adapt each MCP tool to the &lt;code&gt;AgentTool&lt;/code&gt; interface&lt;/li&gt;
&lt;li&gt;Manage the server subprocess lifecycle&lt;/li&gt;
&lt;li&gt;Allow MCP tools and Java-native tools to coexist in the same agent's tool list&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  McpToolFactory
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;agentensemble-mcp&lt;/code&gt; module provides &lt;code&gt;McpToolFactory&lt;/code&gt; as the primary entry point. Connect to any MCP-compatible server and get back standard &lt;code&gt;AgentTool&lt;/code&gt; instances:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StdioMcpTransport&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioMcpTransport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"npx"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"--yes"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"@modelcontextprotocol/server-filesystem"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/workspace"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromServer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;Agent&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"File analyst"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Analyze project structure"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The factory connects to the server, enumerates its tools, and wraps each one as an &lt;code&gt;McpAgentTool&lt;/code&gt;. Because MCP tools already have typed parameter schemas, the wrapper passes those schemas through to LangChain4j's &lt;code&gt;ToolSpecification&lt;/code&gt; directly -- no intermediate Java record needed.&lt;/p&gt;

&lt;p&gt;You can also filter to specific tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromServer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"read_file"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"search_files"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"directory_tree"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful when a server exposes tools you do not want the agent to have access to -- write operations, for instance, when the agent should only read.&lt;/p&gt;
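&lt;p&gt;The filtering idea is an allow-list by tool name. The sketch below shows only that idea on plain strings. &lt;code&gt;allowedOnly&lt;/code&gt; is a hypothetical helper, not part of &lt;code&gt;McpToolFactory&lt;/code&gt;, and the tool names are the filesystem tool names used elsewhere in this article.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical sketch of allow-list filtering by tool name. */
public class ToolAllowListSketch {

    /** Keeps only the advertised names that appear in the allow-list, in advertised order. */
    static String[] allowedOnly(String[] advertised, String... allowedNames) {
        var allowed = Set.of(allowedNames);
        return Arrays.stream(advertised)
                .filter(allowed::contains)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Names as a server might advertise them, including write operations.
        String[] advertised = {"read_file", "write_file", "edit_file", "search_files", "directory_tree"};

        // Expose only the read-only subset to the agent.
        String[] exposed = allowedOnly(advertised, "read_file", "search_files", "directory_tree");

        System.out.println(Arrays.toString(exposed)); // prints [read_file, search_files, directory_tree]
    }
}
```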

&lt;h2&gt;
  
  
  Convenience factories for common servers
&lt;/h2&gt;

&lt;p&gt;The two most common MCP servers for coding workflows are the filesystem and git reference servers. &lt;code&gt;McpToolFactory&lt;/code&gt; provides convenience methods that handle the subprocess setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;McpServerLifecycle&lt;/span&gt; &lt;span class="n"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filesystem&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;projectDir&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;McpServerLifecycle&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;git&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;projectDir&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;allTools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
    &lt;span class="n"&gt;allTools&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;allTools&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="c1"&gt;// Use allTools in any agent&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The filesystem server provides: &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;, &lt;code&gt;edit_file&lt;/code&gt;, &lt;code&gt;search_files&lt;/code&gt;, &lt;code&gt;list_directory&lt;/code&gt;, &lt;code&gt;directory_tree&lt;/code&gt;, &lt;code&gt;get_file_info&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The git server provides: &lt;code&gt;git_status&lt;/code&gt;, &lt;code&gt;git_diff_unstaged&lt;/code&gt;, &lt;code&gt;git_diff_staged&lt;/code&gt;, &lt;code&gt;git_diff&lt;/code&gt;, &lt;code&gt;git_commit&lt;/code&gt;, &lt;code&gt;git_add&lt;/code&gt;, &lt;code&gt;git_log&lt;/code&gt;, &lt;code&gt;git_branch&lt;/code&gt;, &lt;code&gt;git_create_branch&lt;/code&gt;, &lt;code&gt;git_checkout&lt;/code&gt;, &lt;code&gt;git_show&lt;/code&gt;, &lt;code&gt;git_reset&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lifecycle management
&lt;/h2&gt;

&lt;p&gt;MCP servers run as subprocesses. If you do not shut them down, you leak processes. &lt;code&gt;McpServerLifecycle&lt;/code&gt; implements &lt;code&gt;AutoCloseable&lt;/code&gt; so try-with-resources handles cleanup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;McpServerLifecycle&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filesystem&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="c1"&gt;// Use server.tools() ...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;// server is shut down here, subprocess is killed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For long-running ensembles, &lt;code&gt;McpServerLifecycle&lt;/code&gt; also integrates with the ensemble's lifecycle listener. When the ensemble stops, any attached MCP servers are shut down automatically.&lt;/p&gt;

&lt;p&gt;The lifecycle object exposes health checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isAlive&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Mixing MCP and Java-native tools
&lt;/h2&gt;

&lt;p&gt;The most practical pattern is combining MCP tools with Java-native tools in the same agent. MCP provides the filesystem and git operations; Java-native tools handle domain-specific logic, calculations, or API calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;McpServerLifecycle&lt;/span&gt; &lt;span class="n"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filesystem&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;projectDir&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="nc"&gt;Agent&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Code reviewer"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Review code changes and check style compliance"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;                    &lt;span class="c1"&gt;// MCP filesystem tools&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;                       &lt;span class="c1"&gt;// Java-native tools&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StyleCheckerTool&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MetricsCalculatorTool&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both tool types implement the same &lt;code&gt;AgentTool&lt;/code&gt; interface. The agent sees a flat list of tools with names and descriptions. It does not know or care which ones are backed by an MCP subprocess and which are pure Java.&lt;/p&gt;

&lt;p&gt;This composability is the point. You can start with MCP servers for rapid capability acquisition, then replace individual tools with Java implementations when you need more control, better performance, or fewer runtime dependencies -- without changing the agent configuration.&lt;/p&gt;
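&lt;p&gt;To make the "same interface, different backing" idea concrete, here is a minimal sketch of a Java-native tool and of the flat listing an agent might see. &lt;code&gt;SketchTool&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;AgentTool&lt;/code&gt; (it returns a plain &lt;code&gt;String&lt;/code&gt; instead of a &lt;code&gt;ToolResult&lt;/code&gt;), and &lt;code&gt;WordCountTool&lt;/code&gt; and &lt;code&gt;ToolListing&lt;/code&gt; are illustrative names, not part of the library:&lt;/p&gt;

```java
// Hypothetical stand-in for the framework's AgentTool interface. The real
// interface returns a ToolResult; this sketch simplifies that to a String.
interface SketchTool {
    String name();
    String description();
    String execute(String input);
}

// A pure-Java tool: counts whitespace-separated words in its input.
final class WordCountTool implements SketchTool {
    @Override public String name() { return "word_count"; }

    @Override public String description() { return "Counts words in the given text"; }

    @Override public String execute(String input) {
        if (input == null || input.isBlank()) {
            return "0";
        }
        return String.valueOf(input.trim().split("\\s+").length);
    }
}

// Renders the flat "name: description" view of a tool set. Nothing in the
// output reveals whether a tool is subprocess-backed or pure Java.
final class ToolListing {
    private ToolListing() {}

    static String describe(SketchTool... tools) {
        StringBuilder out = new StringBuilder();
        for (SketchTool tool : tools) {
            out.append(tool.name()).append(": ").append(tool.description()).append('\n');
        }
        return out.toString();
    }
}
```

&lt;p&gt;Under these assumptions, replacing one implementation with another that keeps the same &lt;code&gt;name()&lt;/code&gt; and &lt;code&gt;description()&lt;/code&gt; leaves the listing unchanged, which is why the agent configuration does not need to change.&lt;/p&gt;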

&lt;h2&gt;
  
  
  The adapter pattern
&lt;/h2&gt;

&lt;p&gt;Under the hood, each MCP tool is wrapped in an &lt;code&gt;McpAgentTool&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;McpAgentTool&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;AgentTool&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;McpClient&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;toolDescription&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;JsonObjectSchema&lt;/span&gt; &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;description&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;toolDescription&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Parse input JSON, call client.executeTool(), wrap result&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The adapter preserves the MCP tool's name, description, and parameter schema. The parameter schema flows through to the LLM's function-calling interface, so the model sees the same tool signature regardless of whether the tool is MCP-backed or Java-native.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting to custom MCP servers
&lt;/h2&gt;

&lt;p&gt;Any MCP-compatible server works -- not just the reference implementations. If you have a custom server that exposes domain-specific tools (database queries, API operations, internal services), connect it the same way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Custom MCP server over stdio&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StdioMcpTransport&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioMcpTransport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"python"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-m"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my_custom_mcp_server"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromServer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Use tools...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SSE transport works similarly for remote servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SseMcpTransport&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SseMcpTransport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sseUrl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://mcp-server:8080/sse"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromServer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Subprocess overhead.&lt;/strong&gt; Each MCP server is a separate process. For the reference servers, this means Node.js must be installed. The startup cost is measurable (typically 1-2 seconds for &lt;code&gt;npx&lt;/code&gt; to resolve and start the server). For long-running agents, this is negligible; for one-shot scripts, it adds latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging across process boundaries.&lt;/strong&gt; When an MCP tool fails, the error comes back as a string from the subprocess. You lose Java stack traces and structured exception types. The bridge logs tool inputs and outputs at DEBUG level, but cross-process debugging is inherently harder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema fidelity.&lt;/strong&gt; MCP tool schemas are JSON Schema. The bridge passes these through as-is, which works well for LangChain4j's function-calling support. But if you need to validate inputs in Java before sending them to the server, you would need to add that validation layer yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No automatic restart.&lt;/strong&gt; If the MCP server crashes, its tools become unavailable, and the bridge does not restart the server for you. For production deployments, you would want health-check and restart logic around the lifecycle objects.&lt;/p&gt;
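&lt;p&gt;A rough sketch of what such health-check and restart logic could look like. &lt;code&gt;RestartableServer&lt;/code&gt;, &lt;code&gt;ServerSupervisor&lt;/code&gt;, and &lt;code&gt;FlakyServer&lt;/code&gt; are hypothetical names. The sketch also assumes that calling &lt;code&gt;start()&lt;/code&gt; again on a dead server is allowed; if the real lifecycle object does not support that, the restart step would have to create a fresh one instead:&lt;/p&gt;

```java
// Hypothetical minimal view of a server lifecycle: just the two calls the
// supervisor needs. The article's lifecycle object exposes isAlive() and start().
interface RestartableServer {
    boolean isAlive();
    void start();
}

final class ServerSupervisor {
    private ServerSupervisor() {}

    // Returns true once the server reports alive, calling start() at most
    // maxAttempts times. Returns false if it is still down after that.
    static boolean ensureAlive(RestartableServer server, int maxAttempts) {
        int remaining = Math.max(0, maxAttempts);
        while (!server.isAlive()) {
            if (remaining == 0) {
                return false;
            }
            remaining--;
            try {
                server.start();
            } catch (RuntimeException e) {
                // A failed start still uses up one attempt; loop and retry.
            }
        }
        return true;
    }
}

// Fake server used only to exercise the sketch: it becomes alive after a
// fixed number of start() calls.
final class FlakyServer implements RestartableServer {
    private int startsNeeded;
    int startCalls = 0;

    FlakyServer(int startsNeeded) { this.startsNeeded = startsNeeded; }

    @Override public boolean isAlive() { return startsNeeded == 0; }

    @Override public void start() {
        startCalls++;
        if (startsNeeded != 0) {
            startsNeeded--;
        }
    }
}
```

&lt;p&gt;A caller would run &lt;code&gt;ensureAlive&lt;/code&gt; before handing tools to an agent, or on a timer, and treat a &lt;code&gt;false&lt;/code&gt; result as a signal to fall back to Java-native tools or fail the task. Real deployments would likely add a delay between attempts.&lt;/p&gt;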

&lt;h2&gt;
  
  
  When to use MCP vs. Java-native tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Consideration&lt;/th&gt;
&lt;th&gt;MCP&lt;/th&gt;
&lt;th&gt;Java-native&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ecosystem breadth&lt;/td&gt;
&lt;td&gt;Large and growing&lt;/td&gt;
&lt;td&gt;You build what you need&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime dependency&lt;/td&gt;
&lt;td&gt;Node.js (for reference servers)&lt;/td&gt;
&lt;td&gt;Pure JVM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup latency&lt;/td&gt;
&lt;td&gt;1-2s per server&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;Cross-process&lt;/td&gt;
&lt;td&gt;Same-process stack traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customization&lt;/td&gt;
&lt;td&gt;Limited to server's API&lt;/td&gt;
&lt;td&gt;Full control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integration with Java types&lt;/td&gt;
&lt;td&gt;String-based&lt;/td&gt;
&lt;td&gt;Native records, type safety&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The practical pattern: start with MCP for rapid capability bootstrapping, move to Java-native tools for anything performance-sensitive or deeply integrated with your domain model.&lt;/p&gt;




&lt;p&gt;The MCP bridge is part of &lt;a href="https://github.com/AgentEnsemble/agentensemble" rel="noopener noreferrer"&gt;AgentEnsemble&lt;/a&gt;. The &lt;a href="https://agentensemble.net/guides/mcp/" rel="noopener noreferrer"&gt;MCP bridge guide&lt;/a&gt; covers the full API and transport options.&lt;/p&gt;

&lt;p&gt;Curious whether others are mixing MCP and native tools in their agent systems, and where the boundary between the two tends to settle in practice.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Coding Agents on the JVM: Project Detection, Workspace Isolation, and Tool Composition</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Sun, 03 May 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/coding-agents-on-the-jvm-project-detection-workspace-isolation-and-tool-composition-25li</link>
      <guid>https://dev.to/agentensemble/coding-agents-on-the-jvm-project-detection-workspace-isolation-and-tool-composition-25li</guid>
      <description>&lt;p&gt;Most agent frameworks treat coding tasks the same as any other task: give the agent a file-read tool and a file-write tool and hope for the best.&lt;/p&gt;

&lt;p&gt;In practice, an agent that can read and write files is not the same as an agent that can reliably work on a codebase. The gap between "can modify files" and "can fix a bug in a Gradle project" is significant, and it is mostly about context that the agent needs but does not have.&lt;/p&gt;

&lt;h2&gt;
  
  
  The missing context
&lt;/h2&gt;

&lt;p&gt;A coding agent needs to know things that a general-purpose agent does not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What kind of project is this?&lt;/strong&gt; Is it Java with Gradle, Python with pip, TypeScript with npm? The build command, test command, and source layout all follow from this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where is the code?&lt;/strong&gt; Source roots like &lt;code&gt;src/main/java&lt;/code&gt; are conventions, not universal truths. The agent needs to know where to look.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How do I verify my changes?&lt;/strong&gt; Running &lt;code&gt;./gradlew test&lt;/code&gt; is fundamentally different from running &lt;code&gt;npm test&lt;/code&gt;. The agent needs the right command for the project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How do I avoid breaking things?&lt;/strong&gt; If the agent edits files directly in the user's working tree, a failed experiment leaves half-finished code behind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this context, agents make predictable mistakes: they guess at build commands, search in wrong directories, and leave the codebase in a worse state than they found it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project detection as a first-class concern
&lt;/h2&gt;

&lt;p&gt;The approach I've been working on in &lt;a href="https://github.com/AgentEnsemble/agentensemble" rel="noopener noreferrer"&gt;AgentEnsemble&lt;/a&gt; treats project detection as an explicit step before tool assembly.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ProjectDetector.analyze(Path)&lt;/code&gt; scans the project root for build-file markers and returns a &lt;code&gt;ProjectContext&lt;/code&gt; that captures language, build system, source roots, and the commands needed to build and test:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Marker file&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Build command&lt;/th&gt;
&lt;th&gt;Test command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;build.gradle.kts&lt;/code&gt; / &lt;code&gt;build.gradle&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Java&lt;/td&gt;
&lt;td&gt;&lt;code&gt;./gradlew build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;./gradlew test&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pom.xml&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Java&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mvn compile&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mvn test&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;package.json&lt;/code&gt; + &lt;code&gt;tsconfig.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm run build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm test&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;pyproject.toml&lt;/code&gt; / &lt;code&gt;requirements.txt&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;&lt;code&gt;python -m build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;python -m pytest&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;go.mod&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&lt;code&gt;go build ./...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;go test ./...&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cargo.toml&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cargo build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cargo test&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is not magic -- it is a lookup table backed by file-existence checks. But it means the agent's system prompt includes the correct build and test commands for the specific project, rather than generic instructions that may or may not apply.&lt;/p&gt;
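&lt;p&gt;The lookup-table idea can be sketched in a few lines. This is an illustration under stated assumptions, not the actual &lt;code&gt;ProjectDetector&lt;/code&gt;: &lt;code&gt;DetectedProject&lt;/code&gt;, &lt;code&gt;MarkerRule&lt;/code&gt;, and &lt;code&gt;ProjectDetectionSketch&lt;/code&gt; are hypothetical names, source roots are left out, and the rule that earlier rows win when several markers are present is an assumption:&lt;/p&gt;

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical result type; the article's ProjectContext also carries
// source roots, which this sketch omits.
record DetectedProject(String language, String buildCommand, String testCommand) {}

// One row of the lookup table: marker files plus the result they imply.
record MarkerRule(String[] markers, boolean requireAll, DetectedProject project) {}

final class ProjectDetectionSketch {
    private ProjectDetectionSketch() {}

    static final DetectedProject UNKNOWN = new DetectedProject("unknown", "", "");

    // Rows mirror the table above. Assumption: the first matching row wins.
    static final MarkerRule[] RULES = {
        rule(false, "Java", "./gradlew build", "./gradlew test", "build.gradle.kts", "build.gradle"),
        rule(false, "Java", "mvn compile", "mvn test", "pom.xml"),
        rule(true, "TypeScript", "npm run build", "npm test", "package.json", "tsconfig.json"),
        rule(false, "Python", "python -m build", "python -m pytest", "pyproject.toml", "requirements.txt"),
        rule(false, "Go", "go build ./...", "go test ./...", "go.mod"),
        rule(false, "Rust", "cargo build", "cargo test", "Cargo.toml"),
    };

    private static MarkerRule rule(boolean requireAll, String language, String build,
                                   String test, String... markers) {
        return new MarkerRule(markers, requireAll, new DetectedProject(language, build, test));
    }

    // Returns the first rule whose marker files exist under root, else UNKNOWN.
    static DetectedProject analyze(Path root) {
        for (MarkerRule rule : RULES) {
            if (matches(root, rule)) {
                return rule.project();
            }
        }
        return UNKNOWN;
    }

    private static boolean matches(Path root, MarkerRule rule) {
        int found = 0;
        for (String marker : rule.markers()) {
            if (Files.exists(root.resolve(marker))) {
                found++;
            }
        }
        // requireAll models the "+" rows; otherwise any one marker is enough.
        return rule.requireAll() ? found == rule.markers().length : found != 0;
    }
}
```

&lt;p&gt;Calling &lt;code&gt;ProjectDetectionSketch.analyze(Path.of("/my/project"))&lt;/code&gt; on a directory containing only &lt;code&gt;pom.xml&lt;/code&gt; would yield the Maven row, and a directory with no known marker yields &lt;code&gt;UNKNOWN&lt;/code&gt; so the caller can fall back to generic instructions.&lt;/p&gt;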

&lt;p&gt;The detected context is injected into the agent's instructions automatically. The agent knows it is working on a Java/Gradle project with source at &lt;code&gt;src/main/java&lt;/code&gt; and tests at &lt;code&gt;src/test/java&lt;/code&gt;, and it knows that &lt;code&gt;./gradlew test&lt;/code&gt; is the verification command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workspace isolation via git worktrees
&lt;/h2&gt;

&lt;p&gt;The harder problem is safety. A coding agent that writes directly to the user's working tree is an agent that can break your build, conflict with your uncommitted work, or leave half-finished refactoring behind if it fails partway through.&lt;/p&gt;

&lt;p&gt;Git worktrees solve this cleanly. A worktree is an additional working directory attached to the same repository, checked out on its own branch and sharing the original's object store. Creation is fast and disk-efficient because the git history is not duplicated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodingEnsemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;runIsolated&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;repoRoot&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;CodingTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;implement&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Add user profile endpoint"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;runIsolated&lt;/code&gt; call:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates a git worktree from the current branch&lt;/li&gt;
&lt;li&gt;Runs the coding agent inside the worktree&lt;/li&gt;
&lt;li&gt;On success, preserves the worktree for review (you can inspect the changes, run tests again, then merge)&lt;/li&gt;
&lt;li&gt;On failure, cleans up the worktree automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key interface is &lt;code&gt;Workspace&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;Workspace&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AutoCloseable&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="nf"&gt;path&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;          &lt;span class="c1"&gt;// Absolute path to the isolated directory&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;         &lt;span class="c1"&gt;// Clean up (remove worktree)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For non-git projects, a &lt;code&gt;DirectoryWorkspace&lt;/code&gt; creates a temporary directory and optionally copies source files. But for the common case -- a git repository -- worktrees provide isolation without the cost of a full clone.&lt;/p&gt;

&lt;p&gt;The tradeoff is that worktrees require a git repository. If you are working on a non-git project or a freshly initialized directory, the fallback to temporary directories is less elegant. But for the vast majority of real codebases, worktrees are the right abstraction.&lt;/p&gt;
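&lt;p&gt;The keep-on-success, clean-up-on-failure flow can be sketched against the &lt;code&gt;Workspace&lt;/code&gt; shape shown above. Everything else here is hypothetical: &lt;code&gt;AgentRun&lt;/code&gt; stands in for "run the coding agent in this directory", &lt;code&gt;TempDirWorkspace&lt;/code&gt; is a simplified temporary-directory workspace rather than the real &lt;code&gt;DirectoryWorkspace&lt;/code&gt;, and &lt;code&gt;IsolationSketch.runIsolated&lt;/code&gt; only mirrors the control flow, not the library's implementation:&lt;/p&gt;

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

// Same shape as the Workspace interface shown in the article.
interface Workspace extends AutoCloseable {
    Path path();
    @Override void close();
}

// Hypothetical callback: the work the agent performs inside the workspace.
@FunctionalInterface
interface AgentRun {
    void runIn(Path directory) throws Exception;
}

// Simplified temp-directory workspace used to exercise the flow.
final class TempDirWorkspace implements Workspace {
    private final Path dir;

    private TempDirWorkspace(Path dir) { this.dir = dir; }

    static TempDirWorkspace create() {
        try {
            return new TempDirWorkspace(Files.createTempDirectory("agent-workspace"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public Path path() { return dir; }

    @Override public void close() {
        if (!Files.exists(dir)) {
            return; // already removed
        }
        // Reverse path order deletes children before their parent directories.
        try (var paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class IsolationSketch {
    private IsolationSketch() {}

    // Returns true if the run succeeded and the workspace was kept for review,
    // false if the run failed and the workspace was removed.
    static boolean runIsolated(Workspace workspace, AgentRun run) {
        try {
            run.runIn(workspace.path());
            return true;
        } catch (Exception e) {
            workspace.close();
            return false;
        }
    }
}
```

&lt;p&gt;A worktree-backed &lt;code&gt;Workspace&lt;/code&gt; would follow the same contract, creating its directory with &lt;code&gt;git worktree add&lt;/code&gt; and removing it with &lt;code&gt;git worktree remove&lt;/code&gt; in &lt;code&gt;close()&lt;/code&gt;. Note that the caller stays responsible for closing a workspace that was kept after a successful run.&lt;/p&gt;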

&lt;h2&gt;
  
  
  Composable tool backends
&lt;/h2&gt;

&lt;p&gt;Different environments have different constraints. Some teams run pure-JVM deployments where Node.js is not available. Others already use MCP servers and want to reuse them. A coding agent framework should not force one approach.&lt;/p&gt;

&lt;p&gt;AgentEnsemble provides three tool backends, plus an &lt;code&gt;AUTO&lt;/code&gt; mode that picks among them. All four options are selected via &lt;code&gt;ToolBackend&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Requires&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AUTO&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Detect best available backend&lt;/td&gt;
&lt;td&gt;Nothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;JAVA&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Java-native coding tools (glob, search, edit, shell, git, build, test)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;agentensemble-tools-coding&lt;/code&gt; on classpath&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MCP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;MCP reference servers for filesystem + git&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;agentensemble-mcp&lt;/code&gt; + Node.js&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MINIMAL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;FileReadTool&lt;/code&gt; only&lt;/td&gt;
&lt;td&gt;Always available&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;AUTO&lt;/code&gt; resolves in order: MCP &amp;gt; JAVA &amp;gt; MINIMAL. If neither optional module is on the classpath, the agent works with file-read only -- limited, but functional for read-only analysis tasks.&lt;/p&gt;
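&lt;p&gt;A minimal sketch of that resolution order, assuming availability is reduced to two booleans. &lt;code&gt;SketchBackend&lt;/code&gt; and &lt;code&gt;BackendResolver&lt;/code&gt; are hypothetical names, and how the framework actually decides that a backend is available (including the Node.js check for MCP) is not described here, so the classpath probe is only one plausible ingredient:&lt;/p&gt;

```java
// Hypothetical mirror of the ToolBackend options listed in the table.
enum SketchBackend { AUTO, JAVA, MCP, MINIMAL }

final class BackendResolver {
    private BackendResolver() {}

    // True if the named class can be loaded, without running its static
    // initializers. Useful as a "is this optional module present" probe.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, BackendResolver.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // An explicit choice is returned as-is. AUTO applies the stated
    // preference order: MCP first, then JAVA, then MINIMAL.
    static SketchBackend resolve(SketchBackend requested,
                                 boolean mcpAvailable,
                                 boolean javaAvailable) {
        if (requested != SketchBackend.AUTO) {
            return requested;
        }
        if (mcpAvailable) {
            return SketchBackend.MCP;
        }
        if (javaAvailable) {
            return SketchBackend.JAVA;
        }
        return SketchBackend.MINIMAL;
    }
}
```

&lt;p&gt;In use, each boolean would come from a probe such as &lt;code&gt;isOnClasspath&lt;/code&gt; with a marker class from the corresponding module; the marker class names are left out because they are not given in this article.&lt;/p&gt;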

&lt;p&gt;The Java backend provides purpose-built coding tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GlobTool&lt;/strong&gt; -- find files by pattern across the project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GrepTool&lt;/strong&gt; -- search file contents with regex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CodeEditTool&lt;/strong&gt; -- surgical line-range replacement (not full-file overwrite)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ShellTool&lt;/strong&gt; -- execute build/test commands with output capture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitTool&lt;/strong&gt; -- status, diff, stage, commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The MCP backend starts the official MCP filesystem and git reference servers as subprocesses and adapts their tools to the &lt;code&gt;AgentTool&lt;/code&gt; interface. Both backends produce the same tool interface, so the rest of the framework does not care which one is active.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one-liner and the builder
&lt;/h2&gt;

&lt;p&gt;For the common case, a single call handles everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodingEnsemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/my/project"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;CodingTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fix&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"NullPointerException in UserService.getById()"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That call detects the project, assembles tools, generates a coding-specific system prompt, and runs the agent with a higher iteration limit (75 vs the default 25 -- coding tasks typically need more rounds).&lt;/p&gt;

&lt;p&gt;For more control, the builder exposes every knob:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Agent&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodingAgent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;workingDirectory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/my/project"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolBackend&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ToolBackend&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;JAVA&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requireApproval&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxIterations&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;additionalTools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myCustomTool&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The builder returns a standard &lt;code&gt;Agent&lt;/code&gt; -- no subclassing, no special execution path. You can use it with &lt;code&gt;Task&lt;/code&gt;, &lt;code&gt;Ensemble&lt;/code&gt;, phases, or any other framework feature. The coding agent is composed from the same primitives as every other agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pre-configured task types
&lt;/h2&gt;

&lt;p&gt;Common coding workflows have predictable shapes. A bug-fix task needs different instructions than a feature implementation or a refactoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;fix&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodingTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fix&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"NullPointerException in handler"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;implement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodingTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;implement&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Add pagination to /api/users"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;refactor&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodingTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;refactor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Extract UserRepository interface"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each returns a standard &lt;code&gt;Task&lt;/code&gt; with appropriate description and expected-output templates. They can be further customized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodingTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fix&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Some bug"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;expectedOutput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Custom expected output"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a convenience, not a requirement. You can always construct a &lt;code&gt;Task&lt;/code&gt; manually and pass it to a coding agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tradeoffs and limitations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project detection is heuristic.&lt;/strong&gt; It works for standard project layouts but will not detect custom build systems or unconventional directory structures. The fallback is explicit configuration via the builder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iteration limits are a blunt instrument.&lt;/strong&gt; A higher limit gives the agent more chances to iterate, but it also means higher token costs if the agent goes in circles. There is no substitute for good prompting and appropriate task scoping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workspace isolation adds a step.&lt;/strong&gt; The agent works in a worktree, but the user still needs to review and merge the changes. This is deliberate -- automated merge would undermine the safety guarantee -- but it does add friction to the workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool backend selection is build-time.&lt;/strong&gt; The backends available to an agent are determined by the dependencies you include. &lt;code&gt;AUTO&lt;/code&gt; can choose between the Java and MCP backends at runtime, but the choice is fixed once an execution starts: you cannot hot-swap mid-execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  The design principle
&lt;/h2&gt;

&lt;p&gt;The useful abstraction is not "an agent that can code" but "a standard agent with the right tools and context for coding tasks." The coding agent is not a special type -- it is a regular agent, assembled with project-aware tools, operating in an isolated workspace, and configured with appropriate iteration limits.&lt;/p&gt;

&lt;p&gt;This matters because it means coding agents compose with everything else in the framework: phases, delegation, human review, metrics, traces. There is no separate execution path to maintain.&lt;/p&gt;




&lt;p&gt;The coding agent modules are part of &lt;a href="https://github.com/AgentEnsemble/agentensemble" rel="noopener noreferrer"&gt;AgentEnsemble&lt;/a&gt;. The &lt;a href="https://agentensemble.net/guides/coding-agents/" rel="noopener noreferrer"&gt;coding agents guide&lt;/a&gt; and &lt;a href="https://agentensemble.net/guides/workspace-isolation/" rel="noopener noreferrer"&gt;workspace isolation guide&lt;/a&gt; cover the full API.&lt;/p&gt;

&lt;p&gt;I'd be interested in feedback on the tool backend abstraction -- whether three levels (MCP, Java, minimal) feel like the right granularity, or whether intermediate points would be useful.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Humans as Participants, Not Controllers: Designing Agent Systems That Run Without You</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Thu, 30 Apr 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/humans-as-participants-not-controllers-designing-agent-systems-that-run-without-you-1c8m</link>
      <guid>https://dev.to/agentensemble/humans-as-participants-not-controllers-designing-agent-systems-that-run-without-you-1c8m</guid>
      <description>&lt;p&gt;Most human-in-the-loop designs treat humans as gatekeepers. The agent pipeline pauses, a notification fires, a human reviews and approves, the pipeline continues. If the human is not there, the system waits. If the human takes too long, the system times out.&lt;/p&gt;

&lt;p&gt;This works for simple approval workflows. It does not work for systems that need to run autonomously for hours or days while humans come and go.&lt;/p&gt;

&lt;p&gt;The harder design problem is: how do you build agent systems where humans are participants in the system rather than controllers of it? Where the system runs without them, benefits from their presence, and does not break when they leave?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Controller Model vs the Participant Model
&lt;/h2&gt;

&lt;p&gt;In the controller model, the human is a required step in the pipeline. The system cannot proceed without them. If the human is unavailable, the system blocks. Every approval gate is a potential bottleneck.&lt;/p&gt;

&lt;p&gt;In the participant model, the human connects to a running system, observes its current state, provides input where useful, makes decisions that require their authority, and disconnects. The system keeps running.&lt;/p&gt;

&lt;p&gt;The distinction is not about removing humans from the loop. It is about changing the default from "blocked, waiting for human" to "running autonomously, human welcome."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Interaction Spectrum
&lt;/h2&gt;

&lt;p&gt;Not all human interactions have the same urgency or the same blocking requirement. The design uses a five-level spectrum:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Autonomous&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Housekeeping cleans rooms after checkout&lt;/td&gt;
&lt;td&gt;No human needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advisory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manager says "prioritize VIP guest"&lt;/td&gt;
&lt;td&gt;Human input welcomed but not required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Notifiable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Water leak detected in room 305"&lt;/td&gt;
&lt;td&gt;Alert a human, proceed with best-effort response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Approvable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Guest requests late checkout&lt;/td&gt;
&lt;td&gt;Ask human if available, auto-approve on timeout&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opening the hotel safe&lt;/td&gt;
&lt;td&gt;Cannot proceed without human authorization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most interactions in a well-designed system should fall in the first three levels. The system handles them autonomously. Humans are notified of important events but do not need to take action for the system to continue.&lt;/p&gt;

&lt;p&gt;The gated level is reserved for decisions that genuinely require human authority -- security decisions, compliance gates, large financial commitments. These are intentionally rare and intentionally blocking.&lt;/p&gt;
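
&lt;p&gt;As a rough illustration, the spectrum in the table could be modeled as an enum. This is a minimal sketch, not the framework's API: the type name &lt;code&gt;InteractionLevel&lt;/code&gt; and both boolean flags are assumptions made for the example.&lt;/p&gt;

```java
// Sketch only: level names mirror the table; the two flags are illustrative assumptions.
enum InteractionLevel {
    AUTONOMOUS(false, false),  // no human needed
    ADVISORY(false, false),    // human input welcome, never awaited
    NOTIFIABLE(true, false),   // alert a human, proceed with a best-effort response
    APPROVABLE(true, false),   // ask a human, auto-approve on timeout
    GATED(true, true);         // cannot proceed without human authorization

    private final boolean contactsHuman;
    private final boolean blocksIndefinitely;

    InteractionLevel(boolean contactsHuman, boolean blocksIndefinitely) {
        this.contactsHuman = contactsHuman;
        this.blocksIndefinitely = blocksIndefinitely;
    }

    /** True when the system actively reaches out to a human at this level. */
    boolean contactsHuman() { return contactsHuman; }

    /** True when work cannot continue until a human has decided. */
    boolean blocksIndefinitely() { return blocksIndefinitely; }
}

final class InteractionLevelDemo {
    public static void main(String[] args) {
        for (InteractionLevel level : InteractionLevel.values()) {
            System.out.println(level + ": contactsHuman=" + level.contactsHuman()
                    + ", blocksIndefinitely=" + level.blocksIndefinitely());
        }
    }
}
```

&lt;p&gt;Only &lt;code&gt;GATED&lt;/code&gt; blocks without bound in this sketch, which matches the intent that blocking interactions stay rare.&lt;/p&gt;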




&lt;h2&gt;
  
  
  Gated Reviews with Role Requirements
&lt;/h2&gt;

&lt;p&gt;When a task requires human authorization, the review specifies who can approve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;openSafe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Open the hotel safe for cash reconciliation"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Review&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Manager authorization required to open the safe"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requiredRole&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"manager"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ZERO&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// no timeout -- wait until a human decides&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this review fires and no qualified human is connected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The review is queued&lt;/li&gt;
&lt;li&gt;An out-of-band notification is sent (Slack, email, webhook)&lt;/li&gt;
&lt;li&gt;The task waits&lt;/li&gt;
&lt;li&gt;When a qualified human connects to the dashboard, they see the pending review immediately&lt;/li&gt;
&lt;li&gt;They approve or reject, and the task resumes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key design choice: &lt;code&gt;timeout(Duration.ZERO)&lt;/code&gt; means the system waits indefinitely. This is appropriate for decisions that genuinely cannot be made without human authority. For less critical approvals, a timeout with auto-approve provides the fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Review&lt;/span&gt; &lt;span class="n"&gt;lateCheckout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Review&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Guest requests late checkout -- approve?"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requiredRole&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"front-desk"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMinutes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeoutDecision&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ReviewDecision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;APPROVE&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If no human responds within 10 minutes, the system auto-approves. The human can still intervene within the window, but the system does not block indefinitely for a non-critical decision.&lt;/p&gt;
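
&lt;p&gt;The two timeout behaviors described here can be sketched as a single decision function. This is a hedged illustration that uses only &lt;code&gt;java.time.Duration&lt;/code&gt;; the class &lt;code&gt;ReviewTimeoutSketch&lt;/code&gt;, its &lt;code&gt;Outcome&lt;/code&gt; enum, and the &lt;code&gt;resolve&lt;/code&gt; method are hypothetical names, not the framework's implementation.&lt;/p&gt;

```java
import java.time.Duration;

final class ReviewTimeoutSketch {
    enum Outcome { APPROVE, REJECT, STILL_WAITING }

    /**
     * Decides what a review resolves to at a given moment.
     * humanDecision is null while no qualified human has answered.
     */
    static Outcome resolve(Duration timeout, Duration waited,
                           Outcome humanDecision, Outcome timeoutDecision) {
        if (humanDecision != null) {
            return humanDecision; // a human answer always wins
        }
        if (timeout.isZero()) {
            return Outcome.STILL_WAITING; // Duration.ZERO: wait indefinitely
        }
        // The window is open while (waited - timeout) is still negative.
        boolean windowStillOpen = waited.minus(timeout).isNegative();
        return windowStillOpen ? Outcome.STILL_WAITING : timeoutDecision;
    }

    public static void main(String[] args) {
        // Gated review: six hours later it is still waiting.
        System.out.println(resolve(Duration.ZERO, Duration.ofHours(6), null, Outcome.APPROVE));
        // Late checkout: eleven minutes without an answer falls back to the timeout decision.
        System.out.println(resolve(Duration.ofMinutes(10), Duration.ofMinutes(11), null, Outcome.APPROVE));
    }
}
```

&lt;p&gt;Checking the human decision first means an answer that arrives inside the window is never overridden by the fallback.&lt;/p&gt;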




&lt;h2&gt;
  
  
  Human Directives
&lt;/h2&gt;

&lt;p&gt;Humans can inject guidance into any ensemble they have access to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"directive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"room-service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"manager:human"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Guest in 801 is VIP, prioritize all their requests"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Directives are non-blocking. They do not pause the system or wait for acknowledgment. They are injected as additional context for future task executions. The next time room service processes a request related to room 801, the directive is included in the prompt context.&lt;/p&gt;

&lt;p&gt;This models how human managers actually work. A hotel manager does not approve every room service order. They walk through the hotel, observe what is happening, and give occasional direction: "That table needs attention." "The VIP in the penthouse gets priority." Then they move on.&lt;/p&gt;
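
&lt;p&gt;One way to picture the non-blocking injection is a small holder that records directives and prepends them to later prompts. The sketch below is an assumption-laden illustration: &lt;code&gt;DirectiveContextSketch&lt;/code&gt;, &lt;code&gt;inject&lt;/code&gt;, and &lt;code&gt;promptFor&lt;/code&gt; are hypothetical names, and it includes every directive in every prompt rather than selecting only the relevant ones.&lt;/p&gt;

```java
final class DirectiveContextSketch {
    private final StringBuilder directives = new StringBuilder();

    /** Non-blocking: records the guidance and returns immediately. */
    void inject(String from, String content) {
        directives.append("- [").append(from).append("] ").append(content).append('\n');
    }

    /** Builds the prompt for the next task, prepending any recorded directives. */
    String promptFor(String taskDescription) {
        if (directives.length() == 0) {
            return taskDescription; // nothing injected yet: prompt is unchanged
        }
        return "Standing directives:\n" + directives + "Task: " + taskDescription;
    }

    public static void main(String[] args) {
        DirectiveContextSketch roomService = new DirectiveContextSketch();
        roomService.inject("manager:human", "Guest in 801 is VIP, prioritize all their requests");
        System.out.println(roomService.promptFor("Deliver breakfast to room 801"));
    }
}
```

&lt;p&gt;Nothing in &lt;code&gt;inject&lt;/code&gt; waits for a task or an acknowledgment; the directive only takes effect the next time a prompt is built.&lt;/p&gt;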




&lt;h2&gt;
  
  
  Control Plane Directives
&lt;/h2&gt;

&lt;p&gt;Beyond natural language guidance, humans (or automated policies) can send structured control plane directives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"directive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kitchen"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cost-policy:automated"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SET_MODEL_TIER"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FALLBACK"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This switches the kitchen ensemble to a cheaper LLM without restarting it. The ensemble has configurable model tiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpt4&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;// primary&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fallbackModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpt4Mini&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;// cheaper fallback&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other control plane actions include pausing an ensemble, adjusting priority weights, enabling or disabling specific shared tasks, and changing queue depth limits. These are operational controls that affect ensemble behavior at runtime without redeployment.&lt;/p&gt;
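
&lt;p&gt;To make the &lt;code&gt;SET_MODEL_TIER&lt;/code&gt; directive concrete, here is a minimal sketch of how a running ensemble might apply it. The class &lt;code&gt;ModelTierSketch&lt;/code&gt;, its &lt;code&gt;Tier&lt;/code&gt; enum, and the placeholder model names are assumptions for illustration; only the action name and the &lt;code&gt;FALLBACK&lt;/code&gt; value come from the directive shown earlier in this section.&lt;/p&gt;

```java
final class ModelTierSketch {
    enum Tier { PRIMARY, FALLBACK }

    private final String primaryModel;
    private final String fallbackModel;
    // volatile: a directive may arrive on a different thread than the one running tasks
    private volatile Tier tier = Tier.PRIMARY;

    ModelTierSketch(String primaryModel, String fallbackModel) {
        this.primaryModel = primaryModel;
        this.fallbackModel = fallbackModel;
    }

    /** Applies a structured directive; unknown actions or values are rejected. */
    void apply(String action, String value) {
        if (!"SET_MODEL_TIER".equals(action)) {
            throw new IllegalArgumentException("Unsupported action: " + action);
        }
        tier = Tier.valueOf(value); // throws IllegalArgumentException for an unknown tier
    }

    /** The model that the next task execution would use. */
    String activeModel() {
        return tier == Tier.FALLBACK ? fallbackModel : primaryModel;
    }

    public static void main(String[] args) {
        ModelTierSketch kitchen = new ModelTierSketch("primary-model", "cheaper-model");
        System.out.println(kitchen.activeModel());
        kitchen.apply("SET_MODEL_TIER", "FALLBACK");
        System.out.println(kitchen.activeModel());
    }
}
```

&lt;p&gt;Because the tier is read each time a model is requested, the switch takes effect on the next task without a restart.&lt;/p&gt;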




&lt;h2&gt;
  
  
  Late-Join State Synchronization
&lt;/h2&gt;

&lt;p&gt;When a human connects to the dashboard -- whether it is their first time today or they are reconnecting after a network interruption -- they need to see the current state of the system immediately. They should not have to wait for events to stream in before understanding what is happening.&lt;/p&gt;

&lt;p&gt;The existing late-join mechanism (from v2.1.0's &lt;code&gt;agentensemble-web&lt;/code&gt; module) extends to the network level. When a human connects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The dashboard sends a &lt;code&gt;hello&lt;/code&gt; message with the human's identity and roles&lt;/li&gt;
&lt;li&gt;Each ensemble the human has access to sends a &lt;code&gt;snapshotTrace&lt;/code&gt; -- the current state of all active tasks, pending reviews, queue depths, and recent events&lt;/li&gt;
&lt;li&gt;Live events start streaming immediately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The human is caught up within seconds of connecting. Pending reviews that match their role are highlighted. They can start making decisions immediately without waiting for context to accumulate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Operational Resilience
&lt;/h2&gt;

&lt;p&gt;The participant model enables several operational patterns that the controller model cannot support:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elastic scaling with human oversight.&lt;/strong&gt; A conference weekend means higher load. The system scales automatically (K8s HPA watching queue depth). The human manager connects, observes the scaled-up state, adjusts priorities if needed, and disconnects. The system handles the load autonomously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational profiles.&lt;/strong&gt; Predefined configurations for known scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;NetworkProfile&lt;/span&gt; &lt;span class="n"&gt;sportingEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NetworkProfile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sporting-event-weekend"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensemble&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"front-desk"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Capacity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;maxConcurrent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensemble&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Capacity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;maxConcurrent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensemble&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"room-service"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Capacity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;maxConcurrent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;preload&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"inventory"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Extra beer and ice stocked"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;applyProfile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sportingEvent&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A human can apply a profile, or profiles can activate on a schedule or via rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simulation and chaos engineering.&lt;/strong&gt; Before the conference, simulate the expected load: "What happens if kitchen goes down during peak dinner service?" Run a simulation with mock LLMs, time-compressed. Get a capacity report. Then inject a kitchen failure as a chaos test. Assert that room service's circuit breaker opens within 30 seconds and the fallback activates within 1 minute. These are built into the framework, not bolted on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Federation.&lt;/strong&gt; Hotel A is at capacity. Hotel B across town has idle kitchen capacity. Overflow requests route to Hotel B automatically. The human manager sees both hotels on the same dashboard. This is the network-of-networks level -- multiple independent agent systems sharing capacity when needed.&lt;/p&gt;
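
&lt;p&gt;The overflow rule in the federation example can be sketched as a small routing function. This is a simplified assumption about how such a decision might look, not the planned implementation: &lt;code&gt;FederationRoutingSketch&lt;/code&gt;, &lt;code&gt;spare&lt;/code&gt;, and &lt;code&gt;route&lt;/code&gt; are hypothetical names, and real routing would likely weigh latency and cost as well.&lt;/p&gt;

```java
final class FederationRoutingSketch {
    /** Spare slots left on a site; never negative. */
    static int spare(int maxConcurrent, int active) {
        return Math.max(0, maxConcurrent - active);
    }

    /** Prefer the local site; overflow to the peer only when local is full and the peer has room. */
    static String route(String local, int localSpare, String peer, int peerSpare) {
        if (localSpare != 0) {
            return local;
        }
        if (peerSpare != 0) {
            return peer;
        }
        return local; // both full: queue locally rather than bounce between sites
    }

    public static void main(String[] args) {
        int hotelASpare = spare(100, 100); // Hotel A kitchen is at capacity
        int hotelBSpare = spare(100, 20);  // Hotel B kitchen is mostly idle
        System.out.println(route("hotel-a/kitchen", hotelASpare, "hotel-b/kitchen", hotelBSpare));
    }
}
```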




&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Autonomy vs oversight.&lt;/strong&gt; The more autonomous the system, the less opportunity for human correction before a mistake propagates. The mitigation is observability: the system runs autonomously but every decision is traced, logged, and visible. Humans review after the fact and inject directives to adjust future behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gating cost.&lt;/strong&gt; Every gated review is a potential bottleneck and a source of latency. The design pressure is to minimize gated interactions -- reserve them for decisions that genuinely require human authority. If you find yourself gating routine operations, the system design needs revision, not more human approvals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notification fatigue.&lt;/strong&gt; A system that notifies humans about everything trains them to ignore notifications. The interaction levels (autonomous, advisory, notifiable, approvable, gated) exist to keep the signal-to-noise ratio high. Most things should be autonomous. Notifications should be reserved for things that actually need attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simulation fidelity.&lt;/strong&gt; Simulations use mock LLMs and time compression. The behavior will not perfectly match production. The value is in finding structural problems -- capacity bottlenecks, missing fallbacks, broken circuit breakers -- not in predicting exact outcomes.&lt;/p&gt;




&lt;p&gt;This is the third and final post in the Ensemble Network architecture arc. The architecture is planned for AgentEnsemble v3.0.0. The previous posts cover &lt;a href="https://agentensemble.net/blog/ensembles-as-services/" rel="noopener noreferrer"&gt;ensembles as services&lt;/a&gt; and &lt;a href="https://agentensemble.net/blog/cross-ensemble-delegation/" rel="noopener noreferrer"&gt;cross-ensemble delegation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The design document at &lt;a href="https://agentensemble.net/design/ensemble-network/" rel="noopener noreferrer"&gt;agentensemble.net/design/ensemble-network/&lt;/a&gt; covers the full architecture including discovery, error handling, versioning, security, testing, and the phased delivery plan.&lt;/p&gt;

&lt;p&gt;AgentEnsemble is open-source under the MIT license.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Task Sharing vs Tool Sharing: Cross-Ensemble Delegation in Distributed Agent Systems</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Mon, 27 Apr 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/task-sharing-vs-tool-sharing-cross-ensemble-delegation-in-distributed-agent-systems-30kc</link>
      <guid>https://dev.to/agentensemble/task-sharing-vs-tool-sharing-cross-ensemble-delegation-in-distributed-agent-systems-30kc</guid>
      <description>&lt;p&gt;MCP (Model Context Protocol) gives agents the ability to call tools hosted by other services. This is useful -- it is function-level interoperability. An agent calls a function, gets a result, continues.&lt;/p&gt;

&lt;p&gt;But there is a level above function calls that most frameworks have not addressed: what happens when one autonomous agent system needs to delegate a complex, multi-step process to another autonomous agent system?&lt;/p&gt;

&lt;p&gt;The distinction matters. Calling a tool is like borrowing a calculator. Delegating a task is like hiring a department.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Kinds of Sharing
&lt;/h2&gt;

&lt;p&gt;When agent ensembles run as long-lived services on a network (as described in the &lt;a href="https://agentensemble.net/blog/ensembles-as-services/" rel="noopener noreferrer"&gt;previous post&lt;/a&gt;), they need to share capabilities with each other. There are two fundamentally different kinds of sharing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool sharing&lt;/strong&gt; exposes a single function. The calling agent invokes it in its ReAct loop, gets a result, and continues reasoning. The tool executes atomically -- there is no multi-step process, no internal agents, no review gates. This is what MCP provides.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task sharing&lt;/strong&gt; exposes a complete process. The calling ensemble delegates work to another ensemble, which runs its own agents, tools, memory, and review gates to produce the result. The caller does not know or control the internal process. It hands off work and gets back a result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Room service uses both kinds of sharing from kitchen&lt;/span&gt;
&lt;span class="nc"&gt;Ensemble&lt;/span&gt; &lt;span class="n"&gt;roomService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"room-service"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Handle guest room service request"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="c1"&gt;// Task sharing: delegates the full meal preparation process&lt;/span&gt;
            &lt;span class="nc"&gt;NetworkTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"prepare-meal"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;

            &lt;span class="c1"&gt;// Tool sharing: calls a single function for inventory check&lt;/span&gt;
            &lt;span class="nc"&gt;NetworkTool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"check-inventory"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;NetworkTool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"dietary-check"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;

            &lt;span class="c1"&gt;// Task sharing: delegates repair work to maintenance&lt;/span&gt;
            &lt;span class="nc"&gt;NetworkTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"maintenance"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"repair-request"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both &lt;code&gt;NetworkTask&lt;/code&gt; and &lt;code&gt;NetworkTool&lt;/code&gt; implement the same &lt;code&gt;AgentTool&lt;/code&gt; interface. The agent calling them does not know whether a tool is local or remote, or whether it triggers a single function or an entire pipeline. The existing ReAct loop, tool executor, metrics, and tracing all work unchanged.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Delegation Works
&lt;/h2&gt;

&lt;p&gt;When an agent calls a shared tool, the flow is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent calls &lt;code&gt;check-inventory("wagyu beef")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NetworkTool&lt;/code&gt; serializes the call into a WorkRequest&lt;/li&gt;
&lt;li&gt;Request is sent to the kitchen ensemble (WebSocket or queue)&lt;/li&gt;
&lt;li&gt;Kitchen executes &lt;code&gt;inventoryTool.execute("wagyu beef")&lt;/code&gt; locally&lt;/li&gt;
&lt;li&gt;Result flows back: &lt;code&gt;"Yes, 3 portions available"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Agent continues its ReAct loop&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When an agent calls a shared task, the flow involves a full pipeline on the other side:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent calls &lt;code&gt;prepare-meal("Wagyu steak, medium-rare, room 403")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NetworkTask&lt;/code&gt; serializes a WorkRequest with the full task context&lt;/li&gt;
&lt;li&gt;Request is sent to kitchen&lt;/li&gt;
&lt;li&gt;Kitchen runs its complete task pipeline -- agent synthesis, tool calls, execution, review gates&lt;/li&gt;
&lt;li&gt;Result flows back: &lt;code&gt;"Preparing now, estimated 25 minutes, ticket #4071"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Agent continues&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The critical difference: in step 4 of the task delegation, the kitchen ensemble is running its own agents with its own tools and its own review gates. The room service agent is not involved in any of that. It delegated the work and is waiting for a result -- or continuing with other work if the request was async.&lt;/p&gt;
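&lt;p&gt;The two flows above share one shape from the caller's side: a tool call goes in, a string comes back. Here is a minimal sketch of that idea. &lt;code&gt;SketchTool&lt;/code&gt;, &lt;code&gt;SketchTransport&lt;/code&gt;, and &lt;code&gt;RemoteTool&lt;/code&gt; are hypothetical stand-ins I made up for illustration; the real &lt;code&gt;AgentTool&lt;/code&gt; interface and transport layer may look different.&lt;/p&gt;

```java
// Hypothetical stand-in for the AgentTool interface; the real signature may differ.
interface SketchTool {
    String execute(String input);
}

// Hypothetical transport: delivers a request to a named ensemble and returns its reply.
interface SketchTransport {
    String send(String targetEnsemble, String capability, String context);
}

// To the calling agent, a remote tool is just another tool.
final class RemoteTool implements SketchTool {
    private final SketchTransport transport;
    private final String targetEnsemble;
    private final String capability;

    RemoteTool(SketchTransport transport, String targetEnsemble, String capability) {
        this.transport = transport;
        this.targetEnsemble = targetEnsemble;
        this.capability = capability;
    }

    @Override
    public String execute(String input) {
        // Wrap the call, send it to the provider, and hand the reply back to the agent.
        return transport.send(targetEnsemble, capability, input);
    }
}

final class DelegationDemo {
    // Simulates the kitchen side: runs the capability locally and returns a result.
    static String kitchen(String target, String capability, String context) {
        if (capability.equals("check-inventory")) {
            return context + ": 3 portions available";
        }
        return "Unknown capability: " + capability;
    }

    public static void main(String[] args) {
        SketchTool tool = new RemoteTool(DelegationDemo::kitchen, "kitchen", "check-inventory");
        System.out.println(tool.execute("wagyu beef"));
    }
}
```

&lt;p&gt;Whether &lt;code&gt;kitchen&lt;/code&gt; runs a single function or a whole pipeline is invisible to the caller, which is the point of putting both behind the same interface.&lt;/p&gt;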




&lt;h2&gt;
  
  
  The WorkRequest Envelope
&lt;/h2&gt;

&lt;p&gt;Every cross-ensemble message uses a standardized envelope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;WorkRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;requestId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// Correlation + idempotency key&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;                &lt;span class="c1"&gt;// Requesting ensemble name&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;                &lt;span class="c1"&gt;// Shared task or tool name to execute&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;// Natural language input/context&lt;/span&gt;
    &lt;span class="nc"&gt;Priority&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;// CRITICAL / HIGH / NORMAL / LOW&lt;/span&gt;
    &lt;span class="nc"&gt;Duration&lt;/span&gt; &lt;span class="n"&gt;deadline&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;// Caller's SLA ("I need this within...")&lt;/span&gt;
    &lt;span class="nc"&gt;DeliverySpec&lt;/span&gt; &lt;span class="n"&gt;delivery&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// How and where to return the result&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;traceContext&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// W3C traceparent for distributed tracing&lt;/span&gt;
    &lt;span class="nc"&gt;CachePolicy&lt;/span&gt; &lt;span class="n"&gt;cachePolicy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// USE_CACHED / FORCE_FRESH&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;cacheKey&lt;/span&gt;             &lt;span class="c1"&gt;// Optional, for result caching&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few design choices in this envelope are worth noting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The context field is natural language.&lt;/strong&gt; When maintenance asks procurement to order parts, the context is: "Order replacement valve for building 2 boiler." Not a typed JSON schema. Not a protobuf message. Natural language that the receiving ensemble's LLM interprets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The deadline belongs to the caller, not the provider.&lt;/strong&gt; The requester sets the SLA: "I need this within 30 minutes." The provider responds with an estimated completion time. If the estimate exceeds the deadline, the caller decides: accept the longer wait, try another provider (federation), or continue without.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delivery is caller-specified.&lt;/strong&gt; The requester tells the provider how to return the result -- WebSocket for real-time, a durable queue for reliability, a webhook for external integration, or a shared store for polling.&lt;/p&gt;
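&lt;p&gt;The caller-owned deadline can be sketched as a small decision function. This is an assumption about how a caller might weigh the provider's estimate, not the framework's API: the &lt;code&gt;Outcome&lt;/code&gt; enum, the &lt;code&gt;decide&lt;/code&gt; method, and its two boolean inputs are all hypothetical.&lt;/p&gt;

```java
import java.time.Duration;

final class DeadlineDecision {
    // Hypothetical outcomes mirroring the options described for the caller.
    enum Outcome { PROCEED, TRY_ANOTHER_PROVIDER, CONTINUE_WITHOUT, ACCEPT_LONGER_WAIT }

    // Hypothetical caller-side policy; the order of the checks is an illustrative choice.
    static Outcome decide(Duration deadline, Duration estimate,
                          boolean anotherProviderKnown, boolean resultIsOptional) {
        // The estimate fits inside the caller's deadline: nothing to decide.
        if (deadline.compareTo(estimate) >= 0) {
            return Outcome.PROCEED;
        }
        // The estimate exceeds the deadline: prefer a different provider if one exists.
        if (anotherProviderKnown) {
            return Outcome.TRY_ANOTHER_PROVIDER;
        }
        // No alternative: skip the result if the caller can live without it.
        if (resultIsOptional) {
            return Outcome.CONTINUE_WITHOUT;
        }
        // Otherwise the caller accepts the longer wait.
        return Outcome.ACCEPT_LONGER_WAIT;
    }
}
```

&lt;p&gt;With a 30-minute deadline and a 45-minute estimate, a caller with no alternative provider and a mandatory result ends up accepting the longer wait. The provider never makes that call; it only reports the estimate.&lt;/p&gt;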




&lt;h2&gt;
  
  
  Natural Language as Contract
&lt;/h2&gt;

&lt;p&gt;This is the design choice I find most interesting and most debatable.&lt;/p&gt;

&lt;p&gt;In traditional microservice architectures, services communicate via typed schemas -- protobuf, OpenAPI, GraphQL. Schema versioning is a constant source of friction. A field name change breaks callers. A new required field breaks backwards compatibility. Teams spend significant effort on schema evolution, versioning policies, and migration tooling.&lt;/p&gt;

&lt;p&gt;In the Ensemble Network, the contract between services is natural language. When maintenance tells procurement "order replacement parts for the boiler valve," it does not matter whether procurement's internal schema changed. The LLM on the receiving side interprets the request. Minor changes in wording do not break callers.&lt;/p&gt;

&lt;p&gt;This works because the participants are LLMs, not deterministic parsers. An LLM that receives "order parts for the boiler" and an LLM that receives "purchase replacement components for the heating system" will usually produce equivalent behavior. The semantic intent is preserved even when the exact phrasing varies.&lt;/p&gt;

&lt;p&gt;The tradeoff is real: you lose type safety. A typed schema guarantees that the data conforms to a specific shape. Natural language does not. If the receiving ensemble misinterprets the request, you get a wrong result, not a compile error. The mitigation is the same as elsewhere in agent systems: review gates, guardrails, and observability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Request Modes
&lt;/h2&gt;

&lt;p&gt;The caller decides how to wait for the result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Await&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Block until result&lt;/td&gt;
&lt;td&gt;Critical path: "Can't continue without this"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Async&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Submit and continue; result delivered later&lt;/td&gt;
&lt;td&gt;Non-critical: "Order towels when you get to it"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Await with deadline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wait up to N; then continue with partial/no result&lt;/td&gt;
&lt;td&gt;Balanced: "Wait 30 min, then proceed with what I know"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The await-with-deadline mode is the most operationally useful. It lets the caller set a budget for how long to wait before continuing. If the provider delivers within the deadline, the caller uses the result. If not, it makes a decision: retry, use a fallback, or proceed without.&lt;/p&gt;
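&lt;p&gt;Await-with-deadline maps onto standard JDK concurrency primitives. The sketch below uses &lt;code&gt;CompletableFuture&lt;/code&gt; with a timed &lt;code&gt;get&lt;/code&gt;; the &lt;code&gt;Provider&lt;/code&gt; interface and the &lt;code&gt;request&lt;/code&gt; method are hypothetical names standing in for a remote ensemble call, and the fallback string stands in for whatever "proceed without" means in your application.&lt;/p&gt;

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AwaitWithDeadline {
    // Hypothetical provider abstraction standing in for a remote ensemble.
    interface Provider {
        String handle(String context);
    }

    static String request(Provider provider, String context, Duration deadline, String fallback) {
        // Submit the work on a background thread so the wait can be bounded.
        var pending = CompletableFuture.supplyAsync(() -> provider.handle(context));
        try {
            // Wait up to the deadline for the result.
            return pending.get(deadline.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Deadline passed: continue with the fallback; the provider may still finish later.
            return fallback;
        } catch (ExecutionException e) {
            // The provider failed outright: same fallback path in this sketch.
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    // A deliberately slow provider for demonstrating the timeout path.
    static String slowProvider(String context) {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "late result for " + context;
    }
}
```

&lt;p&gt;Plain Await is the same call with no timeout, and Async is the same call without waiting on &lt;code&gt;pending&lt;/code&gt; at all. Note that timing out here does not cancel the work on the provider side.&lt;/p&gt;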




&lt;h2&gt;
  
  
  Capacity Management
&lt;/h2&gt;

&lt;p&gt;The provider's default response to load is accept and queue, not reject. LLM tasks are not real-time request/response -- they take seconds to hours. Everyone expects latency. The provider accepts the work into a priority queue and returns an estimated completion time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task_accepted"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"requestId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"maint-7721"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"queuePosition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"estimatedCompletion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PT45M"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rejection only happens at hard limits -- the queue itself is full. This "bend, don't break" approach matches the reality of LLM workloads: capacity is elastic, latency is expected, and it is almost always better to queue work than to reject it.&lt;/p&gt;

&lt;p&gt;Priority queuing ensures critical requests are processed first (CRITICAL &amp;gt; HIGH &amp;gt; NORMAL &amp;gt; LOW). Within the same priority, ordering is FIFO. Low-priority items age over time to prevent starvation.&lt;/p&gt;
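&lt;p&gt;One way to combine priority, FIFO, and aging is to compute an effective rank at dequeue time. The sketch below is my own illustration, not the framework's scheduler: the &lt;code&gt;agingStep&lt;/code&gt; parameter (how long an item waits before being promoted one level) and the &lt;code&gt;Queued&lt;/code&gt; record are assumptions.&lt;/p&gt;

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.Comparator;

final class AgingQueueSketch {
    // Declared most-urgent first, so a lower ordinal means higher priority.
    enum Priority { CRITICAL, HIGH, NORMAL, LOW }

    record Queued(String id, Priority priority, long seq, Instant enqueuedAt) {
        // Promote an item by one level for every full agingStep it has waited.
        int effectiveRank(Instant now, Duration agingStep) {
            long promotions = Duration.between(enqueuedAt, now).dividedBy(agingStep);
            return (int) Math.max(0L, priority.ordinal() - promotions);
        }
    }

    // Picks the next item: best effective rank first, then FIFO by sequence number.
    static Queued next(Queued[] pending, Instant now, Duration agingStep) {
        var order = Comparator
                .comparingInt((Queued q) -> q.effectiveRank(now, agingStep))
                .thenComparingLong(Queued::seq);
        return Arrays.stream(pending).min(order).orElseThrow();
    }
}
```

&lt;p&gt;With a one-hour aging step, a LOW request that has waited three hours outranks a NORMAL request that just arrived. A real scheduler would probably cap promotion so that aged items cannot tie with CRITICAL work; this sketch does not.&lt;/p&gt;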




&lt;h2&gt;
  
  
  Distributed Tracing
&lt;/h2&gt;

&lt;p&gt;Every WorkRequest carries a W3C &lt;code&gt;traceparent&lt;/code&gt; value in its &lt;code&gt;traceContext&lt;/code&gt; field. When maintenance delegates to procurement, which delegates to logistics, the trace context propagates across all three. Open Jaeger (or any W3C-compatible tracing backend) and you see the full chain: which ensemble originated the request, how long each step took, where the bottleneck was.&lt;/p&gt;

&lt;p&gt;This is standard distributed tracing, not a custom solution. The same infrastructure teams use for HTTP microservices works here. The difference is that each span may represent an LLM call that takes 30 seconds instead of a database query that takes 3 milliseconds.&lt;/p&gt;
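&lt;p&gt;For readers who have not looked inside a &lt;code&gt;traceparent&lt;/code&gt;, it is four dash-separated hex fields: version, trace-id, parent-id, and flags. Propagating across a hop means keeping the trace-id and swapping in the current span's id as the new parent-id. In practice a tracing library such as OpenTelemetry does this for you; the hand-rolled &lt;code&gt;TraceParentSketch&lt;/code&gt; class below is a hypothetical illustration of the mechanics only.&lt;/p&gt;

```java
import java.util.concurrent.ThreadLocalRandom;

final class TraceParentSketch {
    // Generates a 16-hex-character span id; an all-zero id is invalid, so retry on zero.
    static String newSpanId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0L);
        return String.format("%016x", id);
    }

    // Keeps version, trace-id, and flags; swaps in the given span id as parent-id.
    static String childOf(String traceparent, String spanId) {
        String[] parts = traceparent.split("-");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "Expected version-traceid-parentid-flags but got: " + traceparent);
        }
        return String.join("-", parts[0], parts[1], spanId, parts[3]);
    }
}
```

&lt;p&gt;Each ensemble in the maintenance, procurement, logistics chain would call something like &lt;code&gt;childOf&lt;/code&gt; before forwarding its own WorkRequest, which is why all three hops show up under one trace-id.&lt;/p&gt;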




&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Loose coupling vs type safety.&lt;/strong&gt; Natural language contracts are resilient to change but do not guarantee correctness. Typed schemas guarantee the shape of the data but are brittle to change. The right choice depends on how stable the interface is. For evolving, exploratory agent systems, natural language is pragmatic. For stable, high-volume interfaces, a typed schema wrapper may be worth the friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency tolerance.&lt;/strong&gt; Cross-ensemble delegation adds network hops and queuing delays. A task that takes 10 seconds locally may take 2 minutes when delegated across a network. The architecture assumes latency tolerance -- if your use case requires sub-second responses, delegation is the wrong pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure modes.&lt;/strong&gt; When the kitchen ensemble is down, room service's &lt;code&gt;prepare-meal&lt;/code&gt; call fails. The circuit breaker opens. The agent needs a fallback -- suggest alternatives, queue the request for later, or inform the guest. Distributed systems fail in distributed ways. The framework provides the circuit breaker and fallback mechanisms, but the failure strategy is application-specific.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability cost.&lt;/strong&gt; Every cross-ensemble request generates trace data, metrics, and log entries. In a busy network with many delegations, the observability overhead is non-trivial. The tracing infrastructure needs to handle the volume, and teams need dashboards that make sense of the flow.&lt;/p&gt;
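&lt;p&gt;To make the failure-mode tradeoff concrete, here is a minimal circuit breaker sketch. It is not the framework's implementation: &lt;code&gt;CircuitBreakerSketch&lt;/code&gt;, &lt;code&gt;RemoteCall&lt;/code&gt;, and the consecutive-failure threshold are assumptions, and it leaves out the cooldown and half-open probing that a production breaker needs in order to recover.&lt;/p&gt;

```java
final class CircuitBreakerSketch {
    // Hypothetical remote call; throws when the provider is unreachable.
    interface RemoteCall {
        String invoke() throws Exception;
    }

    private final int failureThreshold;
    private int consecutiveFailures;

    CircuitBreakerSketch(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    // The breaker is open once enough calls in a row have failed.
    synchronized boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    // Runs the call unless the breaker is open; on failure or open breaker, returns the fallback.
    synchronized String call(RemoteCall remote, String fallback) {
        if (isOpen()) {
            // Short-circuit: do not touch the failing provider at all.
            return fallback;
        }
        try {
            String result = remote.invoke();
            consecutiveFailures = 0;
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            return fallback;
        }
    }
}
```

&lt;p&gt;The breaker only decides whether to attempt the call. What the fallback string says to the guest is still the application's problem.&lt;/p&gt;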




&lt;p&gt;This is the second post in a three-part arc on the Ensemble Network architecture. The next post covers human participation -- how humans connect to and interact with a network of autonomous ensembles without becoming bottlenecks.&lt;/p&gt;

&lt;p&gt;The design document at &lt;a href="https://agentensemble.net/design/ensemble-network/" rel="noopener noreferrer"&gt;agentensemble.net/design/ensemble-network/&lt;/a&gt; covers the full architecture.&lt;/p&gt;

&lt;p&gt;AgentEnsemble is open-source under the MIT license.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>From Run-and-Exit to Always-On: When Agent Ensembles Become Services</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Fri, 24 Apr 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/from-run-and-exit-to-always-on-when-agent-ensembles-become-services-28po</link>
      <guid>https://dev.to/agentensemble/from-run-and-exit-to-always-on-when-agent-ensembles-become-services-28po</guid>
      <description>&lt;p&gt;Every multi-agent framework works the same way at its core. You define some agents, give them tasks, press go, get output. The agents exist for the duration of the run and then disappear.&lt;/p&gt;

&lt;p&gt;This is fine for bounded problems: "research this topic and write a report." But it does not model how real work gets done in production systems that need to be always-on, multi-domain, and human-augmented.&lt;/p&gt;

&lt;p&gt;The question I kept coming back to was: what changes when an ensemble stops being a script and starts being a service?&lt;/p&gt;




&lt;h2&gt;
  
  
  Scripts vs Services
&lt;/h2&gt;

&lt;p&gt;A script runs and exits. You invoke it, it does work, it returns a result, the process terminates. Every multi-agent framework today -- CrewAI, AutoGen, LangGraph, AgentEnsemble v2.x -- operates in this mode.&lt;/p&gt;

&lt;p&gt;A service runs continuously. It handles work as it arrives, communicates with peers, maintains state between requests, and survives restarts. The difference is not just about uptime -- it changes the entire interaction model.&lt;/p&gt;

&lt;p&gt;When an ensemble is a script, it is invoked by something external. When an ensemble is a service, it participates in a network of other services. It can accept work from multiple sources, share capabilities with peers, and run proactive tasks on a schedule -- all without an external orchestrator telling it what to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hotel Model
&lt;/h2&gt;

&lt;p&gt;Consider a hotel. It is composed of departments: front desk, housekeeping, kitchen, room service, maintenance, procurement. Each department is autonomous -- it has its own staff, processes, and expertise. These departments communicate with each other directly. Room service calls the kitchen to prepare a meal. Maintenance calls procurement to order spare parts.&lt;/p&gt;

&lt;p&gt;The hotel runs continuously. The manager comes in at 8am, walks around, checks on things, gives some direction, handles decisions that require authority, and goes home at 6pm. The hotel does not stop when the manager leaves.&lt;/p&gt;

&lt;p&gt;This maps directly to a distributed agent architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hotel concept&lt;/th&gt;
&lt;th&gt;Agent system equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A department&lt;/td&gt;
&lt;td&gt;An ensemble -- long-running, autonomous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Staff within a department&lt;/td&gt;
&lt;td&gt;Agents and tasks within the ensemble&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The intercom / phone system&lt;/td&gt;
&lt;td&gt;WebSocket mesh -- the message transport&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A work order&lt;/td&gt;
&lt;td&gt;A WorkRequest -- the standard message envelope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The hotel directory&lt;/td&gt;
&lt;td&gt;Service registry -- ensembles discover each other&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The duty manager&lt;/td&gt;
&lt;td&gt;A human who connects via the dashboard to observe and intervene&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key observation: the hotel is not centrally orchestrated. There is no "manager agent" that routes every message. Departments handle their domain and communicate laterally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Execution Modes
&lt;/h2&gt;

&lt;p&gt;The existing one-shot mode remains unchanged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Research AI trends"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Write a report"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tasks execute, output is returned, the ensemble is done. This is a "gig" -- a bounded unit of work.&lt;/p&gt;

&lt;p&gt;The new long-running mode turns the ensemble into a service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Ensemble&lt;/span&gt; &lt;span class="n"&gt;kitchen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Manage kitchen operations"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;// Share capabilities to the network&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;shareTask&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"prepare-meal"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Prepare a meal as specified"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;expectedOutput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Confirmation with preparation details and timing"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;shareTool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"check-inventory"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inventoryTool&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Scheduled proactive task&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;scheduledTask&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ScheduledTask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"inventory-report"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Check current inventory levels and report shortages"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Schedule&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;every&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofHours&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;broadcastTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hotel.inventory"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;

    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;kitchen&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7329&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// WebSocket server, K8s Service fronts this&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In long-running mode, the ensemble:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Registers shared tasks and tools on the network&lt;/li&gt;
&lt;li&gt;Accepts incoming work requests via WebSocket, queue, HTTP, or topic subscription&lt;/li&gt;
&lt;li&gt;Processes work through a priority queue&lt;/li&gt;
&lt;li&gt;Delivers results via the caller-specified delivery method&lt;/li&gt;
&lt;li&gt;Runs scheduled proactive tasks on configured intervals&lt;/li&gt;
&lt;li&gt;Continues until explicitly stopped or drained&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;start(port)&lt;/code&gt; call is the boundary between script and service. Before it, the ensemble is a configuration. After it, the ensemble is an active participant in a network.&lt;/p&gt;




&lt;h2&gt;
  
  
  Work Ingress
&lt;/h2&gt;

&lt;p&gt;When an ensemble becomes a service, work can arrive from multiple sources simultaneously:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WebSocket&lt;/td&gt;
&lt;td&gt;Direct from another ensemble (real-time)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue&lt;/td&gt;
&lt;td&gt;Pull from durable queue (Kafka, SQS, Redis Streams)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP API&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;POST /api/work&lt;/code&gt; (external systems, scripts, CI pipelines)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Topic subscription&lt;/td&gt;
&lt;td&gt;React to events from other ensembles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schedule&lt;/td&gt;
&lt;td&gt;Internal cron/interval (proactive tasks)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All sources normalize into the same internal format before entering the ensemble's priority queue. The ensemble processes work by priority (CRITICAL &amp;gt; HIGH &amp;gt; NORMAL &amp;gt; LOW), with FIFO ordering within the same priority level.&lt;/p&gt;

&lt;p&gt;This means an ensemble can simultaneously handle direct requests from peer ensembles, pull batch work from a queue, respond to events, and run scheduled health checks -- without any of these mechanisms knowing about each other.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Model
&lt;/h2&gt;

&lt;p&gt;Each ensemble deploys as a Kubernetes service -- one or more pods behind a K8s Service resource. Ensembles discover each other via DNS name. This is standard infrastructure that operations teams already know how to manage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Namespace: hotel-downtown
  +-- Service: kitchen
  +-- Service: room-service
  +-- Service: maintenance
  +-- Service: front-desk
  +-- Service: dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scaling is handled by Kubernetes HPA watching queue depth or request latency. Conference weekend with heavy kitchen load? Scale kitchen to 3 replicas. Off-peak Tuesday? Scale back to 1. The ensemble handles replica coordination through broadcast-claim delivery: a work request is offered to all replicas, and the first to claim it processes it.&lt;/p&gt;
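&lt;p&gt;The heart of broadcast-claim is an atomic "first one wins" check. The sketch below shows it in-process with an &lt;code&gt;AtomicBoolean&lt;/code&gt;. Across real replicas the claim would need to live in something shared (a queue or store with atomic operations), which I am not modelling here; &lt;code&gt;ClaimableRequest&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ClaimableRequest {
    private final String requestId;
    private final AtomicBoolean claimed = new AtomicBoolean(false);
    private volatile String owner;

    ClaimableRequest(String requestId) {
        this.requestId = requestId;
    }

    // Returns true only for the first replica to claim; every later caller backs off.
    boolean tryClaim(String replicaId) {
        if (claimed.compareAndSet(false, true)) {
            owner = replicaId;
            return true;
        }
        return false;
    }

    String requestId() {
        return requestId;
    }

    // The winning replica, or null if nobody has claimed the request yet.
    String owner() {
        return owner;
    }
}
```

&lt;p&gt;Every replica sees the offer, but &lt;code&gt;compareAndSet&lt;/code&gt; guarantees exactly one of them gets &lt;code&gt;true&lt;/code&gt; back, so the work is processed once.&lt;/p&gt;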




&lt;h2&gt;
  
  
  What Changes
&lt;/h2&gt;

&lt;p&gt;The shift from script to service changes several things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lifecycle management matters.&lt;/strong&gt; A script that crashes restarts from scratch. A service needs more: graceful shutdown, drain logic, and state recovery after a crash. The ensemble supports a drain mode where it stops accepting new work, finishes in-flight tasks, and shuts down cleanly. On restart, it picks up queued work from durable sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proactive work becomes possible.&lt;/strong&gt; A script only does what you tell it to do. A service can schedule its own work -- periodic inventory checks, health assessments, report generation. These scheduled tasks run on internal timers and broadcast results to interested subscribers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability changes.&lt;/strong&gt; A script that runs for 30 seconds needs a log. A service that runs for months needs a dashboard. The existing web module (WebSocket server, live trace streaming, late-join snapshot) extends naturally to the long-running model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The human relationship changes.&lt;/strong&gt; A script blocks on human input and times out. A service has humans who connect and disconnect. They observe the current state, give direction, handle decisions that need authority, and leave. The system keeps running. This is a deep enough topic that the next post in this series will cover it in detail.&lt;/p&gt;
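&lt;p&gt;The drain mode mentioned under lifecycle management follows a familiar JDK pattern: stop intake, let accepted work finish, then wait for termination. Here is a minimal sketch using &lt;code&gt;ExecutorService&lt;/code&gt;. The &lt;code&gt;DrainableWorker&lt;/code&gt; class and its method names are hypothetical and say nothing about how the ensemble implements draining internally.&lt;/p&gt;

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainableWorker {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final AtomicInteger completed = new AtomicInteger();
    private volatile boolean accepting = true;

    // Accepts work only while not draining; returns false once drain has begun.
    boolean submit(Runnable work) {
        if (!accepting) {
            return false;
        }
        try {
            workers.execute(() -> {
                work.run();
                completed.incrementAndGet();
            });
            return true;
        } catch (RejectedExecutionException e) {
            // Drain started between the check and the hand-off.
            return false;
        }
    }

    // Stops intake, lets already-accepted work finish, and reports whether it finished in time.
    boolean drain(Duration timeout) {
        accepting = false;
        workers.shutdown();
        try {
            return workers.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int completedCount() {
        return completed.get();
    }
}
```

&lt;p&gt;Work rejected during drain is not lost as long as it came from a durable source, which is what makes the "picks up queued work on restart" half of the story possible.&lt;/p&gt;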




&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity vs capability.&lt;/strong&gt; A script is simple: invoke it, get a result. A service requires infrastructure -- Kubernetes, queues, monitoring, lifecycle management. If your workload is "run this pipeline once and give me the output," the service model is unnecessary overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always-on cost.&lt;/strong&gt; A script uses resources only while it runs. A service uses resources continuously, even when idle. For intermittent workloads, the cost calculus favors one-shot execution with on-demand scaling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State management.&lt;/strong&gt; Scripts are stateless by nature -- they start fresh every time. Services accumulate state: queued work, scheduled tasks, shared memory, connection state. This state needs to be durable, recoverable, and observable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use which.&lt;/strong&gt; The one-shot mode is right for discrete, bounded problems. The long-running mode is right when the workload is continuous, when multiple domains need to communicate, when humans need to observe and participate without blocking, and when the system needs to be always-on.&lt;/p&gt;

&lt;p&gt;Both modes coexist. An ensemble that runs as a long-running service can still execute individual tasks in one-shot mode internally. The architecture does not force a choice -- it extends the existing model.&lt;/p&gt;




&lt;p&gt;This is the first post in a three-part arc on the Ensemble Network architecture planned for v3.0.0. The next post covers cross-ensemble delegation -- how ensembles share tasks and tools across service boundaries, and why the contract between them is natural language, not typed schemas.&lt;/p&gt;

&lt;p&gt;The design document at &lt;a href="https://agentensemble.net/design/ensemble-network/" rel="noopener noreferrer"&gt;agentensemble.net/design/ensemble-network/&lt;/a&gt; covers the full architecture.&lt;/p&gt;

&lt;p&gt;AgentEnsemble is open-source under the MIT license.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Token-Efficient Context Passing: Pluggable Serialization for Multi-Agent Pipelines</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/token-efficient-context-passing-pluggable-serialization-for-multi-agent-pipelines-3afg</link>
      <guid>https://dev.to/agentensemble/token-efficient-context-passing-pluggable-serialization-for-multi-agent-pipelines-3afg</guid>
      <description>&lt;p&gt;In a multi-agent pipeline, structured data flows between tasks at every step. Task outputs become context for downstream tasks. Tool results are appended to the conversation. Memory entries are injected into prompts. All of this data is serialized as text and counted against the model's context window.&lt;/p&gt;

&lt;p&gt;The default serialization format is JSON. JSON is familiar, well-supported, and universally understood by LLMs. It is also verbose. Curly braces, quoted keys, commas, colons, and brackets consume tokens that carry no semantic value for the model. In a short pipeline with small payloads, this overhead is negligible. In a long pipeline with rich context -- multiple tool calls per task, structured outputs flowing forward, memory entries accumulating -- it compounds quickly.&lt;/p&gt;

&lt;p&gt;The question I kept coming back to was: where does the serialization format actually matter in this pipeline, and can the framework make it pluggable without leaking complexity?&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Tokens Go
&lt;/h2&gt;

&lt;p&gt;In a typical multi-agent workflow, structured data appears in four places:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;What gets serialized&lt;/th&gt;
&lt;th&gt;Token impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Outputs from prior tasks injected into downstream prompts&lt;/td&gt;
&lt;td&gt;Medium -- depends on output size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool results&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSON payloads returned by tool executions during ReAct loops&lt;/td&gt;
&lt;td&gt;High -- tool results are often large and accumulate across ReAct iterations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory entries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured content from memory scopes&lt;/td&gt;
&lt;td&gt;Medium -- grows with pipeline length&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trace export&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Execution traces serialized for analysis&lt;/td&gt;
&lt;td&gt;None -- not sent to the LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tool results tend to dominate. A tool that queries a database or calls an API returns a JSON payload. In a ReAct loop with multiple iterations, these payloads accumulate in the conversation history. Each iteration adds more serialized data to the context window.&lt;/p&gt;

&lt;p&gt;The framework already controls every one of these serialization points. It builds prompts, formats tool results, injects memory, and exports traces. That means a single configuration point can control the format everywhere, without requiring changes to task definitions, tool implementations, or agent logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Design
&lt;/h2&gt;

&lt;p&gt;The goal was a pluggable serialization layer with three properties:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Opt-in&lt;/strong&gt;: JSON remains the default. No existing code changes behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail-fast&lt;/strong&gt;: If a format is selected but its runtime dependency is missing, the ensemble fails when it is built, with a clear error, rather than partway through a pipeline run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single configuration point&lt;/strong&gt;: One builder call controls all serialization points. No per-task or per-tool format settings.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The API surface is small:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;enum&lt;/span&gt; &lt;span class="nc"&gt;ContextFormat&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="no"&gt;JSON&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="no"&gt;TOON&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;ContextFormatter&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Object&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;formatJson&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ContextFormat&lt;/code&gt; is an enum selecting the serialization strategy. &lt;code&gt;ContextFormatter&lt;/code&gt; is an interface with two methods: &lt;code&gt;format(Object)&lt;/code&gt; for Java objects and &lt;code&gt;formatJson(String)&lt;/code&gt; for re-encoding existing JSON strings. The distinction matters because tool results arrive as JSON strings that need to be converted, while task outputs and memory entries are Java objects that need to be serialized from scratch.&lt;/p&gt;

&lt;p&gt;A factory class resolves the correct implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ContextFormatter&lt;/span&gt; &lt;span class="n"&gt;formatter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ContextFormatters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forFormat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ContextFormat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TOON&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;formatter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myObject&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  TOON as a Concrete Alternative
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/toon-format/spec" rel="noopener noreferrer"&gt;TOON&lt;/a&gt; (Token-Oriented Object Notation) is a compact, human-readable serialization format designed specifically for LLM contexts. It combines YAML-like indentation with CSV-like tabular arrays and can cut token counts by roughly 30-60% versus JSON, with the largest gains on uniform arrays of objects.&lt;/p&gt;

&lt;p&gt;JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="nl"&gt;"sku"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"A1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;9.99&lt;/span&gt;&lt;span class="p"&gt;},{&lt;/span&gt;&lt;span class="nl"&gt;"sku"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"B2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;14.5&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TOON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;items[2]{sku,qty,price}&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;A1,2,9.99&lt;/span&gt;
  &lt;span class="s"&gt;B2,1,14.5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The token savings come from eliminating repeated key names in arrays (declared once in the header), removing quotes around keys, and using indentation instead of braces. For tabular data -- which tool results and structured outputs frequently contain -- the reduction is substantial.&lt;/p&gt;
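
&lt;p&gt;To make the tabular encoding concrete, here is a minimal sketch that produces the TOON output shown above from a set of uniform rows. &lt;code&gt;ToonTableSketch&lt;/code&gt; and &lt;code&gt;encodeTable&lt;/code&gt; are hypothetical names for illustration, not JToon's API, and the sketch assumes values that need no quoting or escaping (no commas, colons, or newlines).&lt;/p&gt;

```java
// Minimal sketch: encodes a uniform array of rows in TOON's tabular form.
// ToonTableSketch and encodeTable are hypothetical names, not JToon's API.
final class ToonTableSketch {

    // name:   the array's key, e.g. "items"
    // fields: column names, declared once in the header
    // rows:   one String[] per element, values in the same order as fields
    static String encodeTable(String name, String[] fields, String[][] rows) {
        StringBuilder out = new StringBuilder();
        // Header: key, element count, and field list, e.g. items[2]{sku,qty,price}:
        out.append(name)
           .append('[').append(rows.length).append(']')
           .append('{').append(String.join(",", fields)).append('}')
           .append(':');
        for (String[] row : rows) {
            // Each element becomes one indented, comma-separated line.
            out.append('\n').append("  ").append(String.join(",", row));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"sku", "qty", "price"};
        String[][] rows = {
            {"A1", "2", "9.99"},
            {"B2", "1", "14.5"}
        };
        System.out.println(encodeTable("items", fields, rows));
    }
}
```

&lt;p&gt;The field names appear once in the header regardless of row count, which is where the savings on large arrays come from.&lt;/p&gt;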

&lt;p&gt;&lt;a href="https://github.com/toon-format/toon-java" rel="noopener noreferrer"&gt;JToon&lt;/a&gt; is the Java implementation. It is MIT-licensed, available on Maven Central, requires Java 17+, and supports Jackson annotations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wiring It In
&lt;/h2&gt;

&lt;p&gt;Enabling TOON is one builder call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contextFormat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ContextFormat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TOON&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;researchTask&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysisTask&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reportTask&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At build time, if &lt;code&gt;TOON&lt;/code&gt; is selected, the framework verifies that the JToon class is loadable. If not, it throws an &lt;code&gt;IllegalStateException&lt;/code&gt; with Maven and Gradle coordinates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gradle"&gt;&lt;code&gt;&lt;span class="n"&gt;TOON&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt; &lt;span class="n"&gt;requires&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;JToon&lt;/span&gt; &lt;span class="n"&gt;library&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;classpath&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Add&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="nl"&gt;build:&lt;/span&gt;
  &lt;span class="nl"&gt;Gradle:&lt;/span&gt; &lt;span class="n"&gt;implementation&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"dev.toonformat:jtoon"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="nl"&gt;Maven:&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;dependency&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;groupId&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toonformat&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="s"&gt;/groupId&amp;gt;&amp;lt;artifactId&amp;gt;jtoon&amp;lt;/&lt;/span&gt;&lt;span class="n"&gt;artifactId&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="n"&gt;dependency&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a deliberate design choice. JToon is declared as &lt;code&gt;compileOnly&lt;/code&gt; in the framework, so applications that never use TOON pay no dependency cost. Applications that do use it add one dependency and one builder call.&lt;/p&gt;
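
&lt;p&gt;A fail-fast check of this kind can be approximated with a plain classpath probe. The sketch below is an assumption about how such a check might look, not the framework's actual code: &lt;code&gt;FormatAvailabilitySketch&lt;/code&gt;, &lt;code&gt;requireOnClasspath&lt;/code&gt;, and the class names passed to it are all hypothetical.&lt;/p&gt;

```java
// Sketch of a fail-fast classpath probe for an optional dependency.
// All names here are illustrative; this is not the framework's implementation.
final class FormatAvailabilitySketch {

    // Throws IllegalStateException if the class backing a format cannot be found.
    // formatName:  label used in the error message, e.g. "TOON"
    // className:   fully qualified class to probe for (hypothetical in this sketch)
    // coordinates: dependency coordinates to suggest in the error message
    static void requireOnClasspath(String formatName, String className, String coordinates) {
        try {
            // initialize=false: locate and load the class without running its static initializers.
            Class.forName(className, false, FormatAvailabilitySketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                formatName + " context format requires " + coordinates + " on the classpath.", e);
        }
    }

    public static void main(String[] args) {
        // A JDK class is always loadable, so this probe passes silently.
        requireOnClasspath("DEMO", "java.lang.String", "the JDK");

        // A class that does not exist triggers the fail-fast error.
        try {
            requireOnClasspath("TOON", "example.missing.Formatter", "dev.toonformat:jtoon");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

&lt;p&gt;Running a probe like this when the ensemble is built means a missing optional dependency surfaces before any task has run, rather than as a class-loading error partway through a pipeline.&lt;/p&gt;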

&lt;p&gt;The resolved &lt;code&gt;ContextFormatter&lt;/code&gt; is stored in &lt;code&gt;ExecutionContext&lt;/code&gt; and passed to every component that serializes data for the LLM: the prompt builder, the tool result formatter, and the memory injector. No component needs to know which format is active -- it calls &lt;code&gt;formatter.format(value)&lt;/code&gt; for Java objects or &lt;code&gt;formatter.formatJson(json)&lt;/code&gt; for existing JSON strings.&lt;/p&gt;




&lt;h2&gt;
  
  
  Integration Points
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prompt Building
&lt;/h3&gt;

&lt;p&gt;When the prompt builder constructs the user message, context from prior tasks and memory entries flows through the configured formatter. If the format is TOON, the data arrives in the prompt as TOON. The LLM reads it as context -- it does not need to produce TOON output.&lt;/p&gt;

&lt;p&gt;One important boundary: structured output schemas remain in JSON regardless of the context format. If a task has an &lt;code&gt;outputType&lt;/code&gt;, the JSON schema in the prompt stays in JSON because the LLM needs to produce parseable JSON that the framework can deserialize. The context around the schema uses whatever format is configured.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Results
&lt;/h3&gt;

&lt;p&gt;Tool execution results are the highest-impact integration point. When a tool returns a JSON string, the framework can re-encode it via &lt;code&gt;contextFormatter.formatJson(toolResultText)&lt;/code&gt; before appending it to the conversation. In a ReAct loop with multiple tool calls, each iteration's results are formatted, and the savings compound across the conversation.&lt;/p&gt;
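
&lt;p&gt;The sketch below illustrates the idea of a component that re-encodes tool results without knowing which format is active. The &lt;code&gt;ContextFormatter&lt;/code&gt; interface mirrors the one shown earlier; &lt;code&gt;JsonPassThroughSketch&lt;/code&gt; and &lt;code&gt;ToolResultAppenderSketch&lt;/code&gt; are hypothetical stand-ins, not framework classes.&lt;/p&gt;

```java
// Same shape as the ContextFormatter interface shown earlier.
interface ContextFormatter {
    String format(Object value);
    String formatJson(String json);
}

// Stand-in for the default: JSON in, JSON out. A TOON-backed formatter
// would implement the same interface and re-encode the string instead.
final class JsonPassThroughSketch implements ContextFormatter {
    @Override
    public String format(Object value) {
        return String.valueOf(value); // placeholder; real code would serialize to JSON
    }

    @Override
    public String formatJson(String json) {
        return json;
    }
}

// Hypothetical component that appends tool results to a conversation.
// It depends only on the interface and never inspects the active format.
final class ToolResultAppenderSketch {
    private final ContextFormatter formatter;
    private final StringBuilder conversation = new StringBuilder();

    ToolResultAppenderSketch(ContextFormatter formatter) {
        this.formatter = formatter;
    }

    // Re-encodes the tool's JSON result and appends it as one line.
    void appendToolResult(String toolName, String jsonResult) {
        conversation.append(toolName)
                    .append(": ")
                    .append(formatter.formatJson(jsonResult))
                    .append('\n');
    }

    String conversation() {
        return conversation.toString();
    }

    public static void main(String[] args) {
        ToolResultAppenderSketch appender =
            new ToolResultAppenderSketch(new JsonPassThroughSketch());
        appender.appendToolResult("inventory_lookup", "{\"sku\":\"A1\",\"qty\":2}");
        System.out.print(appender.conversation());
    }
}
```

&lt;p&gt;Swapping in a different formatter changes what lands in the conversation history without touching the appender, which is the property the single configuration point relies on.&lt;/p&gt;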

&lt;h3&gt;
  
  
  Trace Export
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ExecutionTrace&lt;/code&gt; gains TOON export methods alongside the existing JSON ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// JSON (always available)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTrace&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toJson&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"trace.json"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;// TOON (requires JToon on classpath)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTrace&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toToon&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"trace.toon"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exported traces are not sent to the LLM, so the token savings do not apply here. The benefit is smaller trace files for storage and analysis.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Compatibility vs savings&lt;/strong&gt;: JSON is understood by effectively every LLM. TOON is newer and less widely tested across models. For models that handle structured text well (GPT-4o, Claude, Gemini), TOON works reliably as context. For smaller or less capable models, JSON may be safer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debuggability vs compactness&lt;/strong&gt;: When you are inspecting prompts during development, JSON is more familiar. TOON is human-readable but less immediately obvious. During development, you might use JSON and switch to TOON for production workloads where cost matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost vs complexity&lt;/strong&gt;: TOON reduces token usage, which directly reduces API cost. In a production pipeline processing thousands of runs, 30-60% fewer tokens in context passing translates to measurable cost savings. The complexity cost is one dependency and one builder call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema boundary&lt;/strong&gt;: The structured output schema stays in JSON. This means a prompt with TOON context and a JSON schema contains mixed formats. In practice, this works because the LLM treats the context section and the schema section as separate concerns. But it is worth being aware of if you are debugging prompt construction.&lt;/p&gt;

&lt;p&gt;The format is opt-in and the default is unchanged. If you never set &lt;code&gt;contextFormat&lt;/code&gt;, nothing changes. The pluggable design means future formats can be added -- a custom &lt;code&gt;ContextFormatter&lt;/code&gt; implementation for a domain-specific format, or a future format optimized for a specific model family -- without changing the public API.&lt;/p&gt;




&lt;p&gt;The guide at &lt;a href="https://agentensemble.net/guides/toon-format/" rel="noopener noreferrer"&gt;agentensemble.net/guides/toon-format/&lt;/a&gt; covers setup, configuration, and usage patterns. The design document at &lt;a href="https://agentensemble.net/design/toon-context-format/" rel="noopener noreferrer"&gt;agentensemble.net/design/toon-context-format/&lt;/a&gt; covers the architectural decisions.&lt;/p&gt;

&lt;p&gt;AgentEnsemble is open-source under the MIT license.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Typed Tool Inputs in Java Agent Systems: Records as Contracts</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Sat, 18 Apr 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/typed-tool-inputs-in-java-agent-systems-records-as-contracts-4la8</link>
      <guid>https://dev.to/agentensemble/typed-tool-inputs-in-java-agent-systems-records-as-contracts-4la8</guid>
      <description>&lt;p&gt;There is a design tension at the boundary between an LLM and a tool. The LLM generates a structured JSON call. Your code receives it. Between those two points, something has to parse the JSON, validate the fields, and turn it into something the tool can actually work with.&lt;/p&gt;

&lt;p&gt;Most agent frameworks handle this with a single opaque string. The LLM gets a parameter named &lt;code&gt;input&lt;/code&gt;. The tool receives a raw string. The tool author writes their own JSON parsing. Required-field validation is up to them. If the LLM passes malformed JSON or omits a required field, the tool finds out at runtime.&lt;/p&gt;

&lt;p&gt;The question I kept coming back to was: why doesn't the framework own this? Java has records. Records have components with names and types. Those components can carry annotations. Everything needed to generate a typed JSON Schema, deserialize the call, and validate required fields is already there, waiting to be read.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With Single-String Tool Inputs
&lt;/h2&gt;

&lt;p&gt;The original model is simple and universal. A tool implements one method: &lt;code&gt;execute(String input)&lt;/code&gt;. The LLM is told to format the input as JSON if the tool needs structured data. The tool parses it.&lt;/p&gt;

&lt;p&gt;This works, but it puts a burden in the wrong place. Consider a web search tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebSearchTool&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractAgentTool&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"web_search"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;description&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"Performs a web search. Input: JSON with 'query' field."&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;protected&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt; &lt;span class="nf"&gt;doExecute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;JsonNode&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readTree&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"query"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;asText&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// NPE if missing&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are a few problems here. The description must instruct the LLM about the expected JSON format, mixing tool contract with tool documentation. The tool code handles parsing. There is no schema: the LLM has no structured signal about what fields are expected, what their types are, or which are required. A missing &lt;code&gt;query&lt;/code&gt; field produces a &lt;code&gt;NullPointerException&lt;/code&gt; rather than a clean validation failure.&lt;/p&gt;

&lt;p&gt;The tool input boundary is where type safety matters most. It is the handoff between model output and application code. A malformed call here does not just fail; it can silently corrupt state or produce errors that are hard to trace back to the model call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Records as Input Contracts
&lt;/h2&gt;

&lt;p&gt;The alternative is to declare the input contract as a Java record. The framework can derive everything it needs from the record's components and their annotations: the JSON Schema for the LLM, the deserialization logic, the required-field validation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@ToolInput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Parameters for a web search"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;WebSearchInput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nd"&gt;@ToolParam&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Search query string, e.g. 'Java 21 virtual threads'"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool then extends &lt;code&gt;AbstractTypedAgentTool&amp;lt;WebSearchInput&amp;gt;&lt;/code&gt; instead of &lt;code&gt;AbstractAgentTool&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebSearchTool&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractTypedAgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;WebSearchInput&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Class&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;WebSearchInput&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;inputType&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;WebSearchInput&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WebSearchInput&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;trim&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isBlank&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;failure&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Search query must not be blank"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="c1"&gt;// ... perform search&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@ToolParam&lt;/code&gt; annotation marks fields as required by default. Optional fields use &lt;code&gt;@ToolParam(required = false)&lt;/code&gt;. There is no JSON parsing in the tool body. The framework handles it.&lt;/p&gt;
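
&lt;p&gt;As an illustration of how required and optional parameters can be read from a record, the sketch below declares its own stand-in &lt;code&gt;@ToolParam&lt;/code&gt; annotation so that it compiles in isolation; the framework's real annotation may carry more attributes. &lt;code&gt;SearchInputSketch&lt;/code&gt; and &lt;code&gt;RequiredParamsSketch&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

// Stand-in for the framework's annotation so the sketch compiles on its own.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.RECORD_COMPONENT)
@interface ToolParam {
    String description() default "";
    boolean required() default true;
}

// Hypothetical input record: "query" is required, "maxResults" is optional.
record SearchInputSketch(
    @ToolParam(description = "Search query string") String query,
    @ToolParam(description = "Maximum number of results", required = false) Integer maxResults
) {}

final class RequiredParamsSketch {

    // Returns the names of the components marked required, comma-separated.
    static String requiredNames(RecordComponent[] components) {
        StringJoiner names = new StringJoiner(",");
        for (RecordComponent component : components) {
            ToolParam param = component.getAnnotation(ToolParam.class);
            // Skip components without the annotation and those marked optional.
            if (param == null || !param.required()) {
                continue;
            }
            names.add(component.getName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(requiredNames(SearchInputSketch.class.getRecordComponents()));
    }
}
```

&lt;p&gt;A list like this is the kind of information a schema generator needs to populate the &lt;code&gt;required&lt;/code&gt; array of a JSON Schema and to reject calls that omit a required field.&lt;/p&gt;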




&lt;h2&gt;
  
  
  What the Framework Does With the Record
&lt;/h2&gt;

&lt;p&gt;At startup, &lt;code&gt;ToolSchemaGenerator&lt;/code&gt; introspects the record class and produces a &lt;code&gt;JsonObjectSchema&lt;/code&gt; for LangChain4j. The mapping covers the types you would expect:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Java type&lt;/th&gt;
&lt;th&gt;JSON Schema type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;String&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;int&lt;/code&gt;, &lt;code&gt;long&lt;/code&gt;, &lt;code&gt;Integer&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;integer&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;double&lt;/code&gt;, &lt;code&gt;float&lt;/code&gt;, &lt;code&gt;BigDecimal&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;number&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;boolean&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;boolean&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;enum&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;string&lt;/code&gt; with &lt;code&gt;enum&lt;/code&gt; constraint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;List&amp;lt;T&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;array&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Map&amp;lt;String, V&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;object&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
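
&lt;p&gt;The mapping in the table can be sketched as a small lookup over record components. This is an illustration under assumptions, not the implementation of &lt;code&gt;ToolSchemaGenerator&lt;/code&gt;: &lt;code&gt;SchemaTypeSketch&lt;/code&gt;, &lt;code&gt;MappingDemoInput&lt;/code&gt;, and &lt;code&gt;Freshness&lt;/code&gt; are hypothetical, and types outside the table are simply rejected.&lt;/p&gt;

```java
import java.lang.reflect.RecordComponent;
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

// Hypothetical enum and record used only to exercise the mapping.
enum Freshness { DAY, WEEK, ANY }

record MappingDemoInput(String query, int limit, double minScore, boolean safe, Freshness freshness) {}

final class SchemaTypeSketch {

    // Maps one record component to a JSON Schema type name, following the table above.
    static String jsonSchemaType(RecordComponent component) {
        var type = component.getType();
        if (type == String.class) return "string";
        if (type == int.class || type == long.class || type == Integer.class) return "integer";
        if (type == double.class || type == float.class || type == BigDecimal.class) return "number";
        if (type == boolean.class) return "boolean";
        if (type.isEnum()) return "string"; // a full schema would also list the enum constants
        if (List.class.isAssignableFrom(type)) return "array";
        if (Map.class.isAssignableFrom(type)) return "object";
        throw new IllegalArgumentException("Unsupported component type: " + type.getName());
    }

    public static void main(String[] args) {
        // Print each component of the demo record with its mapped schema type.
        for (RecordComponent component : MappingDemoInput.class.getRecordComponents()) {
            System.out.println(component.getName() + ": " + jsonSchemaType(component));
        }
    }
}
```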

&lt;p&gt;When the LLM calls the tool, &lt;code&gt;LangChain4jToolAdapter&lt;/code&gt; passes the full JSON arguments to &lt;code&gt;AbstractTypedAgentTool&lt;/code&gt;, which uses &lt;code&gt;ToolInputDeserializer&lt;/code&gt; to deserialize them into a record instance. Required fields that are missing or null produce a clean &lt;code&gt;ToolResult.failure&lt;/code&gt; before the tool body executes.&lt;/p&gt;

&lt;p&gt;The LLM receives a proper multi-parameter schema instead of a single &lt;code&gt;input&lt;/code&gt; string. It can see the field names, their types, and their descriptions directly in the tool specification.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Adapter Path
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;LangChain4jToolAdapter&lt;/code&gt; dispatches based on whether a tool implements &lt;code&gt;TypedAgentTool&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nc"&gt;TypedAgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;?&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;typed&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ToolSchemaGenerator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generateSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;typed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;inputType&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// original single-parameter schema, fully backward compatible&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JsonObjectSchema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addStringProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"input"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"The input string for the tool"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"input"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same &lt;code&gt;instanceof&lt;/code&gt; check applies at execution time: typed tools receive the full JSON arguments object, while legacy tools receive only the value of the &lt;code&gt;"input"&lt;/code&gt; field. Existing tool implementations need no changes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before and After: The Schema Difference
&lt;/h2&gt;

&lt;p&gt;For the legacy &lt;code&gt;WebSearchTool&lt;/code&gt;, the LLM received this schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The input string for the tool"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;TypedAgentTool&lt;/code&gt;, it receives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search query string, e.g. 'Java 21 virtual threads'"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The field is named correctly. The description is the one written for that field. For tools with multiple inputs, each parameter is a separate, named, typed entry in the schema rather than buried inside an opaque string that the tool must document and parse.&lt;/p&gt;




&lt;h2&gt;
  
  
  Migrating a Multi-Field Tool
&lt;/h2&gt;

&lt;p&gt;The pattern is the same regardless of how many fields the record carries. A hypothetical HTTP request tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@ToolInput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Parameters for an HTTP request"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;HttpRequestInput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nd"&gt;@ToolParam&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"The URL to request"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;@ToolParam&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"HTTP method: GET, POST, PUT, DELETE"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;@ToolParam&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Request body (optional)"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HttpRequestTool&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractTypedAgentTool&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;HttpRequestInput&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Class&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;HttpRequestInput&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;inputType&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;HttpRequestInput&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HttpRequestInput&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// url and method are guaranteed non-null; body may be null&lt;/span&gt;
        &lt;span class="c1"&gt;// no parsing, no null checks on deserialization&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;body&lt;/code&gt; field is optional — it will be absent from the &lt;code&gt;required&lt;/code&gt; array in the schema, and will be null if the LLM omits it. &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;method&lt;/code&gt; are required — a call missing either produces a failure before &lt;code&gt;execute&lt;/code&gt; is reached.&lt;/p&gt;
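
&lt;p&gt;As a rough illustration of how required and optional fields could be derived from a record, here is a minimal sketch. It is not the framework's implementation: &lt;code&gt;SketchParam&lt;/code&gt;, &lt;code&gt;SketchHttpInput&lt;/code&gt;, and &lt;code&gt;RequiredFieldSketch&lt;/code&gt; are hypothetical stand-ins, and treating unannotated components as required is an assumption.&lt;/p&gt;

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;

// Hypothetical stand-in for the framework's parameter annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.RECORD_COMPONENT)
@interface SketchParam {
    String description();
    boolean required() default true;
}

// Same shape as the HttpRequestInput record, using the stand-in annotation.
record SketchHttpInput(
    @SketchParam(description = "The URL to request") String url,
    @SketchParam(description = "HTTP method: GET, POST, PUT, DELETE") String method,
    @SketchParam(description = "Request body (optional)", required = false) String body
) {}

final class RequiredFieldSketch {

    // Comma-separated names of the components that would land in the schema's "required" array.
    // Assumption: a component without the annotation counts as required.
    static String requiredNames(RecordComponent[] components) {
        StringBuilder names = new StringBuilder();
        for (RecordComponent component : components) {
            SketchParam param = component.getAnnotation(SketchParam.class);
            if (param == null || param.required()) {
                if (names.length() != 0) names.append(",");
                names.append(component.getName());
            }
        }
        return names.toString();
    }

    // Name of the first required component whose value is null, or null when all are present.
    static String firstMissing(Record input) {
        for (RecordComponent component : input.getClass().getRecordComponents()) {
            SketchParam param = component.getAnnotation(SketchParam.class);
            boolean required = param == null || param.required();
            if (!required) continue;
            try {
                if (component.getAccessor().invoke(input) == null) return component.getName();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read component " + component.getName(), e);
            }
        }
        return null;
    }
}
```

&lt;p&gt;A check like &lt;code&gt;firstMissing&lt;/code&gt; would run after deserialization and before &lt;code&gt;execute&lt;/code&gt;, which is one way to keep the failure deterministic and outside the tool's own code.&lt;/p&gt;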




&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;The typed system is opt-in for a reason. There are cases where it does not fit well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Records only.&lt;/strong&gt; The framework introspects Java records specifically. If an existing tool has complex input logic tied to its own parsing, migrating to a record may not be a clean fit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reflection at startup.&lt;/strong&gt; Schema generation uses reflection on the record's components. For a large number of tools, this adds initialization cost. In practice it is small, but it is not zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enum constraints.&lt;/strong&gt; Enum fields generate a &lt;code&gt;string&lt;/code&gt; schema with an &lt;code&gt;enum&lt;/code&gt; list derived from &lt;code&gt;Enum.name()&lt;/code&gt;. This aligns with Jackson's default deserialization behavior. If your enums use custom serialization names, the generated list advertises the constant names rather than the custom ones, so values the LLM picks from the schema may not deserialize without additional configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not a substitute for description quality.&lt;/strong&gt; The schema tells the LLM the structure. The field descriptions still determine whether the LLM understands how to fill that structure correctly. A typed schema with poor descriptions is only marginally better than a well-documented string input.&lt;/p&gt;

&lt;p&gt;Five of the built-in tools were migrated to the typed system (&lt;code&gt;FileReadTool&lt;/code&gt;, &lt;code&gt;FileWriteTool&lt;/code&gt;, &lt;code&gt;JsonParserTool&lt;/code&gt;, &lt;code&gt;WebSearchTool&lt;/code&gt;, &lt;code&gt;WebScraperTool&lt;/code&gt;). Four were left as legacy tools (&lt;code&gt;CalculatorTool&lt;/code&gt;, &lt;code&gt;DateTimeTool&lt;/code&gt;, &lt;code&gt;HttpAgentTool&lt;/code&gt;, &lt;code&gt;ProcessAgentTool&lt;/code&gt;) — either because their inputs are naturally single-valued, or because the migration cost was not justified by the gain.&lt;/p&gt;
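
&lt;p&gt;The enum tradeoff above is easier to see with a small example. The sketch below is illustrative only: &lt;code&gt;SketchHttpMethod&lt;/code&gt; and its &lt;code&gt;wireName&lt;/code&gt; are hypothetical, and the "accepted" list stands in for whatever a deserializer configured with custom names would expect.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical enum whose serialized form differs from its constant names.
enum SketchHttpMethod {
    GET("get"),
    POST("post");

    private final String wireName;

    SketchHttpMethod(String wireName) {
        this.wireName = wireName;
    }

    String wireName() {
        return wireName;
    }
}

final class EnumSchemaSketch {

    // Values a name()-based schema generator would list under "enum".
    static String[] advertised() {
        return Arrays.stream(SketchHttpMethod.values())
            .map(SketchHttpMethod::name)
            .toArray(String[]::new);
    }

    // Values a deserializer keyed on the custom wire names would accept.
    static String[] accepted() {
        return Arrays.stream(SketchHttpMethod.values())
            .map(SketchHttpMethod::wireName)
            .toArray(String[]::new);
    }
}
```

&lt;p&gt;When the two lists differ, the LLM is told to send one spelling while the deserializer expects another, which is the mismatch the tradeoff describes.&lt;/p&gt;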




&lt;h2&gt;
  
  
  Practical Takeaway
&lt;/h2&gt;

&lt;p&gt;The tool input boundary is a contract. Java records are a natural way to express that contract, and the language gives you enough reflective machinery to derive everything you need from it without additional code generation or build-time processing.&lt;/p&gt;

&lt;p&gt;The main gain is not ergonomics, though that improves. The main gain is that the LLM receives an accurate schema for the tool it is calling, which reduces hallucinated input formats and makes validation failures deterministic rather than buried in manual parsing code.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;AgentEnsemble&lt;/strong&gt; is a JVM-native framework for building multi-agent systems in Java. The typed tool input system is part of the core library.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guide: &lt;a href="https://agentensemble.net/guides/tools/" rel="noopener noreferrer"&gt;https://agentensemble.net/guides/tools/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://github.com/AgentEnsemble/agentensemble" rel="noopener noreferrer"&gt;https://github.com/AgentEnsemble/agentensemble&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MIT licensed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have been working with agent tool systems and have found a different approach to the input boundary problem — typed DSLs, code generation, something else — I would be interested in the tradeoffs you have seen.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Self-Optimizing Agent Tasks: Persistent Reflection Loops in Java</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Wed, 15 Apr 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/self-optimizing-agent-tasks-persistent-reflection-loops-in-java-2m1d</link>
      <guid>https://dev.to/agentensemble/self-optimizing-agent-tasks-persistent-reflection-loops-in-java-2m1d</guid>
      <description>&lt;p&gt;Task definitions are written at compile time. You describe what the task should do, wire up a model, and run. The prompt stays fixed unless you go back and edit it.&lt;/p&gt;

&lt;p&gt;In practice, you often discover after a few runs that the instructions could be more precise. The LLM misses an edge case you didn't anticipate. The output format drifts in ways you didn't specify. You revise the description, redeploy, and try again.&lt;/p&gt;

&lt;p&gt;The harder question is: what if the instructions could improve themselves?&lt;/p&gt;

&lt;p&gt;Task reflection is a persistent, automated feedback loop built into the task execution lifecycle. After a task completes successfully -- output accepted, guardrails passed, reviews approved -- an LLM-backed analysis step reviews whether the task's instructions could be improved. Improvements are stored in a &lt;code&gt;ReflectionStore&lt;/code&gt; and injected into the task's prompt on subsequent runs. The original task definition is never modified.&lt;/p&gt;

&lt;p&gt;This post covers how reflection works, what the API looks like, and where the tradeoffs sit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reflection vs Phase Review
&lt;/h2&gt;

&lt;p&gt;These two mechanisms are often confused because both involve quality analysis. The distinction is in scope and timing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Phase Review&lt;/th&gt;
&lt;th&gt;Task Reflection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trigger&lt;/td&gt;
&lt;td&gt;After phase completes&lt;/td&gt;
&lt;td&gt;After task output accepted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scope&lt;/td&gt;
&lt;td&gt;Within a single &lt;code&gt;Ensemble.run()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Across multiple &lt;code&gt;Ensemble.run()&lt;/code&gt; calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Purpose&lt;/td&gt;
&lt;td&gt;Fix inadequate output this run&lt;/td&gt;
&lt;td&gt;Improve instructions for future runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;Transient -- lost after run&lt;/td&gt;
&lt;td&gt;Persistent -- stored between runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Initiated by&lt;/td&gt;
&lt;td&gt;External reviewer&lt;/td&gt;
&lt;td&gt;Automated LLM analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Phase review fixes output within a run. Task reflection improves instructions across runs. They compose: a task can have both.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;p&gt;Enable reflection with &lt;code&gt;.reflect(true)&lt;/code&gt; on a task, and configure a &lt;code&gt;ReflectionStore&lt;/code&gt; on the ensemble:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ReflectionStore&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;InMemoryReflectionStore&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"research the top 5 trends in cloud-native Java for 2026"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Run 1: no prior reflections; task executes normally&lt;/span&gt;
&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;run1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflectionStore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Reflection fires after run 1 completes; improvements stored in `store`&lt;/span&gt;

&lt;span class="c1"&gt;// Run 2: prior reflections injected into the prompt automatically&lt;/span&gt;
&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;run2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflectionStore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The store is the key. Pass the same store instance across &lt;code&gt;run()&lt;/code&gt; calls and the accumulated reflections persist. The task definition -- the &lt;code&gt;Task&lt;/code&gt; object -- is the same every run. The difference is what the reflection store contributes to the prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Execution Lifecycle
&lt;/h2&gt;

&lt;p&gt;Reflection fires at the end of the task lifecycle, after all other post-processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Task executes (LLM call)
2. Guardrails evaluate output
3. Review gate runs (if configured)
4. Memory scopes write
5. [Reflection] LLM analyzes output; generates improvement; stores in ReflectionStore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the task fails, guardrails reject the output, or the review gate retries, reflection does not fire. Reflection only fires on a fully accepted output.&lt;/p&gt;

&lt;p&gt;On the next run of the same task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. ReflectionStore loads prior reflections for this task identity
2. Reflections injected into prompt
3. Task executes with improved instructions
4. [Reflection] New analysis fires; improvement stored
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The task is identified by a &lt;code&gt;TaskIdentity&lt;/code&gt; derived from its description. Two tasks with the same description share the same reflection history.&lt;/p&gt;
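
&lt;p&gt;To make the keying concrete, here is a minimal sketch of an identity derived from a description. The name &lt;code&gt;SketchTaskIdentity&lt;/code&gt; and the choice of SHA-256 are assumptions for illustration; the framework's actual &lt;code&gt;TaskIdentity&lt;/code&gt; may hash or normalize differently.&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical identity record: equal descriptions produce equal identities.
record SketchTaskIdentity(String descriptionHash) {

    // Hashes the description text; SHA-256 is an assumption made for this sketch.
    static SketchTaskIdentity fromDescription(String description) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(description.getBytes(StandardCharsets.UTF_8));
            return new SketchTaskIdentity(HexFormat.of().formatHex(hash));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

&lt;p&gt;Under this kind of keying, any edit to the description, even a small one, yields a different identity and therefore starts a fresh reflection history.&lt;/p&gt;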




&lt;h2&gt;
  
  
  What Gets Injected
&lt;/h2&gt;

&lt;p&gt;The reflection store contributes an additional section to the task's prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;[original task description]

&lt;span class="gu"&gt;## Instruction Refinements&lt;/span&gt;

Based on previous runs, the following refinements have been found to improve output quality:
&lt;span class="p"&gt;-&lt;/span&gt; Be specific about the time range: results should cover events within the last 12 months only.
&lt;span class="p"&gt;-&lt;/span&gt; Structure the output as a numbered list with a one-sentence summary per trend.
&lt;span class="p"&gt;-&lt;/span&gt; For each trend, cite at least one concrete project or company as evidence.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The injection is additive. Original instructions are preserved. Reflections narrow, clarify, or extend them based on what previous outputs revealed.&lt;/p&gt;
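
&lt;p&gt;The additive shape of the injection can be sketched as plain string assembly. This is an illustration rather than the framework's prompt builder: &lt;code&gt;RefinementInjectionSketch&lt;/code&gt; and its &lt;code&gt;inject&lt;/code&gt; method are hypothetical, and the exact whitespace is an assumption.&lt;/p&gt;

```java
final class RefinementInjectionSketch {

    static final String HEADER = "## Instruction Refinements";
    static final String INTRO =
        "Based on previous runs, the following refinements have been found to improve output quality:";

    // Appends stored refinements after the original description; never rewrites the description.
    static String inject(String description, String[] refinements) {
        if (refinements.length == 0) {
            return description; // nothing stored yet, so the prompt is unchanged
        }
        StringBuilder prompt = new StringBuilder(description);
        prompt.append("\n\n").append(HEADER).append("\n\n").append(INTRO);
        for (String refinement : refinements) {
            prompt.append("\n- ").append(refinement);
        }
        return prompt.toString();
    }
}
```

&lt;p&gt;Because the description is only ever a prefix of the result, the first run (with no stored refinements) sees exactly the original instructions.&lt;/p&gt;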




&lt;h2&gt;
  
  
  Reflection Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bounding the History
&lt;/h3&gt;

&lt;p&gt;By default, the framework uses all stored reflections for a task. You can bound the number injected via &lt;code&gt;ReflectionConfig&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"analyze customer sentiment from support tickets"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflectionConfig&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ReflectionConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxReflections&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;maxReflections(5)&lt;/code&gt;, only the 5 most recent reflections are injected. Older reflections remain in the store but are not included in the prompt. This prevents prompt bloat as the number of runs grows.&lt;/p&gt;
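
&lt;p&gt;The bounding step amounts to taking the tail of the stored history. A minimal sketch, assuming reflections are held oldest first and using the hypothetical helper &lt;code&gt;RecentReflectionsSketch.mostRecent&lt;/code&gt;:&lt;/p&gt;

```java
import java.util.Arrays;

final class RecentReflectionsSketch {

    // Returns at most maxReflections entries from the end of the array.
    // Assumption: `stored` is ordered oldest first, so the tail holds the newest entries.
    static String[] mostRecent(String[] stored, int maxReflections) {
        int keep = Math.min(stored.length, Math.max(0, maxReflections));
        return Arrays.copyOfRange(stored, stored.length - keep, stored.length);
    }
}
```

&lt;p&gt;The input array is left untouched, which mirrors the behavior described above: older reflections stay in the store and are simply not injected.&lt;/p&gt;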

&lt;h3&gt;
  
  
  Custom Reflection Strategy
&lt;/h3&gt;

&lt;p&gt;The default strategy uses an LLM call to analyze the task output and generate an improvement. You can substitute a custom &lt;code&gt;ReflectionStrategy&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DomainReflectionStrategy&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ReflectionStrategy&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;reflect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ReflectionInput&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;taskOutput&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="c1"&gt;// custom analysis: check for required sections, format, length&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contains&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"## Summary"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Always include a ## Summary section as the first heading"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="c1"&gt;// no improvement identified this time&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;empty&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"write a technical design document"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflectionConfig&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ReflectionConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DomainReflectionStrategy&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A custom strategy can use deterministic rules, call a different model, or apply domain-specific analysis. Returning &lt;code&gt;Optional.empty()&lt;/code&gt; skips storage for that run.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Reflection Store
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;ReflectionStore&lt;/code&gt; is an interface with two methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;ReflectionStore&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;TaskReflection&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TaskIdentity&lt;/span&gt; &lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TaskIdentity&lt;/span&gt; &lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TaskReflection&lt;/span&gt; &lt;span class="n"&gt;reflection&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;InMemoryReflectionStore&lt;/code&gt; is included for development and testing. It holds reflections in a &lt;code&gt;ConcurrentHashMap&lt;/code&gt; and loses state when the process stops.&lt;/p&gt;

&lt;p&gt;For production, implement &lt;code&gt;ReflectionStore&lt;/code&gt; against whatever persistence layer makes sense for your system -- a relational database, a document store, or a key-value store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JdbcReflectionStore&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ReflectionStore&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;DataSource&lt;/span&gt; &lt;span class="n"&gt;dataSource&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;TaskReflection&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TaskIdentity&lt;/span&gt; &lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// SELECT content FROM task_reflections WHERE task_id = ?&lt;/span&gt;
        &lt;span class="c1"&gt;// ORDER BY created_at DESC LIMIT maxReflections&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TaskIdentity&lt;/span&gt; &lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TaskReflection&lt;/span&gt; &lt;span class="n"&gt;reflection&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// INSERT INTO task_reflections (task_id, content, created_at) VALUES (?, ?, ?)&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;TaskReflection&lt;/code&gt; record holds the improvement text and a timestamp. &lt;code&gt;TaskIdentity&lt;/code&gt; includes the task description hash used for keying.&lt;/p&gt;




&lt;h2&gt;
  
  
  Disabling Reflection Selectively
&lt;/h2&gt;

&lt;p&gt;Reflection is opt-in per task. Tasks without &lt;code&gt;.reflect(true)&lt;/code&gt; are unaffected even if a &lt;code&gt;ReflectionStore&lt;/code&gt; is configured on the ensemble. You can enable reflection for high-value tasks and leave it off for tasks where the instructions are stable or where the cost of an extra LLM call isn't justified.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stableDataFetchTask&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// no reflection&lt;/span&gt;
        &lt;span class="n"&gt;evolving&lt;/span&gt; &lt;span class="nc"&gt;AnalysisTask&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;// .reflect(true)&lt;/span&gt;
        &lt;span class="n"&gt;stableFormattingTask&lt;/span&gt;       &lt;span class="c1"&gt;// no reflection&lt;/span&gt;
    &lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reflectionStore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The store is queried and written only for tasks with reflection enabled.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;Reflection adds an LLM call per reflective task per run. For tasks that run thousands of times, this adds up. The cost is bounded if reflections converge -- once the task's instructions become stable after a few runs, the reflection step may find no new improvements and return &lt;code&gt;Optional.empty()&lt;/code&gt; more often.&lt;/p&gt;

&lt;p&gt;Reflections can drift. If a task's purpose changes -- the description is updated, the downstream context changes, the data it processes shifts -- earlier reflections may no longer apply. &lt;code&gt;maxReflections&lt;/code&gt; helps here by aging out old improvements. For significant task changes, clearing the stored reflections for that task is reasonable.&lt;/p&gt;

&lt;p&gt;Reflection is not a substitute for good initial instructions. A task with fundamentally unclear instructions will accumulate reflections that patch around the ambiguity. The better use is to start with reasonable instructions and use reflection to sharpen them in response to real outputs over time.&lt;/p&gt;

&lt;p&gt;The original task definition is never modified. All improvements live in the store. This is a deliberate choice: the source of truth for what a task does remains in code, not in a mutable prompt that silently drifts over time.&lt;/p&gt;




&lt;p&gt;The guide at &lt;a href="https://agentensemble.net/guides/task-reflection/" rel="noopener noreferrer"&gt;agentensemble.net/guides/task-reflection/&lt;/a&gt; covers the full lifecycle, store implementation patterns, and configuration options. The design document at &lt;a href="https://agentensemble.net/design/task-reflection/" rel="noopener noreferrer"&gt;agentensemble.net/design/task-reflection/&lt;/a&gt; covers the architectural decisions behind the feature.&lt;/p&gt;

&lt;p&gt;AgentEnsemble is open-source under the MIT license.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Quality Gates on Agent Pipelines: Phase Review and Feedback Injection in Java</title>
      <dc:creator>mgd43b</dc:creator>
      <pubDate>Sun, 12 Apr 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/agentensemble/quality-gates-on-agent-pipelines-phase-review-and-feedback-injection-in-java-g0k</link>
      <guid>https://dev.to/agentensemble/quality-gates-on-agent-pipelines-phase-review-and-feedback-injection-in-java-g0k</guid>
      <description>&lt;p&gt;Most agent pipelines treat quality as a post-run concern. The pipeline runs, you look at the output, you decide if it's acceptable. If not, you re-run the whole thing or manually patch the result. That approach gets harder to sustain as pipelines grow in complexity and the cost of a bad output increases.&lt;/p&gt;

&lt;p&gt;The question worth asking is: where in the pipeline should quality enforcement sit? And when a phase produces inadequate output, how should the feedback get back to the tasks responsible?&lt;/p&gt;

&lt;p&gt;&lt;code&gt;PhaseReview&lt;/code&gt; answers both questions. It attaches a quality gate to any phase, fires after the phase completes, and based on the reviewer's decision either approves the output, triggers a retry with injected feedback, pushes the work back to a predecessor phase, or rejects the pipeline entirely. The review task is itself a first-class AgentEnsemble task -- AI-backed, deterministic handler, or human-in-the-loop -- and the framework handles feedback injection and retry orchestration automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Attaching a Review to a Phase
&lt;/h2&gt;

&lt;p&gt;The minimal setup attaches a reviewer task to a phase via &lt;code&gt;PhaseReview.of(reviewTask)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;draftReport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"draft a summary report of the quarterly results"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;reviewReport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
        Review the report for completeness and accuracy.
        Output exactly one of:
        APPROVE
        RETRY:&amp;lt;specific feedback for the author&amp;gt;
        REJECT:&amp;lt;reason&amp;gt;
        """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draftReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Phase&lt;/span&gt; &lt;span class="n"&gt;reporting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Phase&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"reporting"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draftReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PhaseReview&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reviewReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;phases&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reporting&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The review task reads the draft output via &lt;code&gt;.context(List.of(draftReport))&lt;/code&gt;. After the phase tasks complete, the framework runs the reviewer, parses its decision, and acts on it: if the decision is &lt;code&gt;RETRY&lt;/code&gt;, &lt;code&gt;draftReport&lt;/code&gt; is re-run with the reviewer's feedback. The reviewer never appears in the phase task list -- it runs as a gate after the phase completes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Decisions
&lt;/h2&gt;

&lt;p&gt;A review task produces one of four outcomes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;APPROVE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Phase output is accepted; pipeline continues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RETRY:&amp;lt;feedback&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Phase tasks are re-run; feedback injected into prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RETRY_PREDECESSOR:&amp;lt;feedback&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Predecessor phase re-runs first, then this phase re-runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;REJECT:&amp;lt;reason&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pipeline fails with a &lt;code&gt;PhaseReviewRejectionException&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The APPROVE/RETRY/REJECT parsing is handled by the framework. RETRY_PREDECESSOR is useful when the inadequate output in the current phase is caused by incomplete or incorrect work in an earlier phase.&lt;/p&gt;
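
&lt;p&gt;As a rough illustration of what parsing these decision strings involves, here is a minimal sketch. &lt;code&gt;DecisionKind&lt;/code&gt;, &lt;code&gt;ReviewDecision&lt;/code&gt;, and &lt;code&gt;ReviewDecisionParser&lt;/code&gt; are hypothetical names invented for this example, not AgentEnsemble classes, and the framework's actual rules for case, whitespace, and malformed output may differ.&lt;/p&gt;

```java
import java.util.Locale;

/** Hypothetical enum mirroring the four outcomes in the table above. */
enum DecisionKind { APPROVE, RETRY, RETRY_PREDECESSOR, REJECT }

/** Hypothetical value type: the outcome plus the feedback or reason text. */
record ReviewDecision(DecisionKind kind, String detail) {}

/** Illustrative parser only; not the framework's implementation. */
final class ReviewDecisionParser {

    private ReviewDecisionParser() {}

    static ReviewDecision parse(String raw) {
        String text = raw == null ? "" : raw.strip();
        if (text.equalsIgnoreCase("APPROVE")) {
            return new ReviewDecision(DecisionKind.APPROVE, "");
        }
        int colon = text.indexOf(':');
        if (colon == -1) {
            throw new IllegalArgumentException("Unrecognized review decision: " + text);
        }
        // Everything before the first colon is the keyword; the rest is feedback.
        String keyword = text.substring(0, colon).strip().toUpperCase(Locale.ROOT);
        String detail = text.substring(colon + 1).strip();
        return switch (keyword) {
            case "RETRY" -> new ReviewDecision(DecisionKind.RETRY, detail);
            case "RETRY_PREDECESSOR" -> new ReviewDecision(DecisionKind.RETRY_PREDECESSOR, detail);
            case "REJECT" -> new ReviewDecision(DecisionKind.REJECT, detail);
            default -> throw new IllegalArgumentException("Unrecognized review decision: " + text);
        };
    }
}
```

&lt;p&gt;Under these assumptions, &lt;code&gt;ReviewDecisionParser.parse("RETRY: add Q3 numbers")&lt;/code&gt; yields a &lt;code&gt;RETRY&lt;/code&gt; decision whose detail is &lt;code&gt;add Q3 numbers&lt;/code&gt;. Matching on the exact keyword before the first colon keeps &lt;code&gt;RETRY_PREDECESSOR&lt;/code&gt; from being read as a plain &lt;code&gt;RETRY&lt;/code&gt;.&lt;/p&gt;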




&lt;h2&gt;
  
  
  Feedback Injection
&lt;/h2&gt;

&lt;p&gt;When a retry is triggered, the framework injects reviewer feedback directly into the prompt of the tasks being retried. The injected section looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;[original task instructions here]

&lt;span class="gu"&gt;## Revision Instructions (Attempt 2)&lt;/span&gt;

The report is missing Q3 comparisons and does not address margin compression. Expand the analysis section with specific numbers.

Previous output:
[output from attempt 1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The task sees the original instructions, the feedback, and its prior output. No changes to the task definition are required. The injection happens entirely in the prompt construction layer.&lt;/p&gt;

&lt;p&gt;For attempt 3 and beyond, the attempt number in the section header increments (&lt;code&gt;Attempt 3&lt;/code&gt;, and so on), and the section includes both the latest feedback and the most recent prior output.&lt;/p&gt;
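
&lt;p&gt;The prompt layout described above can be approximated with plain string assembly. This is only a sketch: &lt;code&gt;RevisionPrompt&lt;/code&gt; and its &lt;code&gt;build&lt;/code&gt; method are hypothetical and exist purely to show the shape of the injected section; the framework's prompt construction layer is not shown in this article and may format things differently.&lt;/p&gt;

```java
/** Illustrative prompt builder; a hypothetical helper, not part of AgentEnsemble. */
final class RevisionPrompt {

    private RevisionPrompt() {}

    /**
     * @param originalInstructions the unchanged task description
     * @param feedback             the reviewer's latest feedback
     * @param previousOutput       the task's most recent output
     * @param attempt              1-based number of the attempt about to run
     */
    static String build(String originalInstructions, String feedback,
                        String previousOutput, int attempt) {
        if (attempt == 1) {
            // First attempt: there is no feedback to inject yet.
            return originalInstructions;
        }
        // The task definition itself is untouched; the section is appended.
        return originalInstructions + "\n\n"
                + "## Revision Instructions (Attempt " + attempt + ")\n\n"
                + feedback + "\n\n"
                + "Previous output:\n"
                + previousOutput + "\n";
    }
}
```

&lt;p&gt;Because the section is appended at prompt-build time, the original instructions stay byte-for-byte identical across attempts, which matches the point that no changes to the task definition are required.&lt;/p&gt;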




&lt;h2&gt;
  
  
  Reviewer Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AI Reviewer
&lt;/h3&gt;

&lt;p&gt;The reviewer task is an AI-backed agent that reads the phase output via &lt;code&gt;context()&lt;/code&gt; and generates a decision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;aiReviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
        You are a quality reviewer for financial reports.
        Read the draft report provided in context.
        Check for: completeness, accurate numbers, professional tone.
        Output exactly: APPROVE, RETRY:&amp;lt;specific instructions&amp;gt;, or REJECT:&amp;lt;reason&amp;gt;.
        """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draftReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reviewer's LLM call is separate from the main phase tasks. You can use a different model for the reviewer if appropriate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deterministic Reviewer
&lt;/h3&gt;

&lt;p&gt;For rule-based quality checks, a deterministic handler reads the phase output and returns a decision string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;qualityCheck&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"programmatic-quality-check"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draftReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contextOutputs&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getRaw&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"RETRY: output is too short -- expand each section to at least two paragraphs"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contains&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Q3"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contains&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Q4"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"RETRY: report must include both Q3 and Q4 data"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ToolResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"APPROVE"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deterministic reviewers are useful when the quality criteria are precise and don't require LLM judgment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Human Reviewer
&lt;/h3&gt;

&lt;p&gt;For steps that need a human sign-off, the human review API blocks until the reviewer responds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;humanGate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"human-review-gate"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draftReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Review&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The human reviewer sees the task output and enters APPROVE, RETRY with feedback, or REJECT in the console (or a custom reviewer UI). This integrates with the same retry loop.&lt;/p&gt;




&lt;h2&gt;
  
  
  Controlling Retry Limits
&lt;/h2&gt;

&lt;p&gt;By default, &lt;code&gt;PhaseReview.of(reviewTask)&lt;/code&gt; allows up to 2 self-retries and 2 predecessor retries. Both are configurable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Phase&lt;/span&gt; &lt;span class="n"&gt;reporting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Phase&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"reporting"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draftReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PhaseReview&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reviewReport&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with the builder for full control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Phase&lt;/span&gt; &lt;span class="n"&gt;reporting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Phase&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"reporting"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draftReport&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PhaseReview&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reviewReport&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxRetries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxPredecessorRetries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the retry limit is exhausted, the framework treats the phase as failed and throws. The pipeline never silently accepts a low-quality output.&lt;/p&gt;
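
&lt;p&gt;To make the retry accounting concrete, here is a small sketch of a bounded review loop. &lt;code&gt;PhaseRunner&lt;/code&gt;, &lt;code&gt;Reviewer&lt;/code&gt;, and &lt;code&gt;ReviewLoop&lt;/code&gt; are hypothetical stand-ins written for this example, &lt;code&gt;IllegalStateException&lt;/code&gt; stands in for whatever the framework actually throws, and only &lt;code&gt;APPROVE&lt;/code&gt;, &lt;code&gt;RETRY&lt;/code&gt;, and &lt;code&gt;REJECT&lt;/code&gt; are handled. It is not the framework's orchestration code.&lt;/p&gt;

```java
/** Hypothetical: runs the phase tasks; feedback is null on the first attempt. */
interface PhaseRunner {
    String run(String feedback);
}

/** Hypothetical: returns "APPROVE", "RETRY:..." or "REJECT:..." for a phase output. */
interface Reviewer {
    String review(String phaseOutput);
}

final class ReviewLoop {

    private ReviewLoop() {}

    /** Runs the phase, then re-runs it on RETRY until approved or maxRetries is used up. */
    static String runWithReview(PhaseRunner phase, Reviewer reviewer, int maxRetries) {
        String feedback = null;
        int retriesUsed = 0;
        while (true) {
            String output = phase.run(feedback);
            String decision = reviewer.review(output).strip();
            if (decision.equals("APPROVE")) {
                return output;
            }
            if (decision.startsWith("REJECT:")) {
                throw new IllegalStateException("Rejected: " + decision.substring(7).strip());
            }
            if (!decision.startsWith("RETRY:")) {
                throw new IllegalStateException("Unrecognized decision: " + decision);
            }
            if (retriesUsed == maxRetries) {
                // Fail loudly rather than accept the last output.
                throw new IllegalStateException(
                        "Review retries exhausted after " + (retriesUsed + 1) + " attempts");
            }
            retriesUsed++;
            feedback = decision.substring(6).strip();
        }
    }
}
```

&lt;p&gt;In this sketch a limit of 3 retries means at most 4 runs of the phase: the initial attempt plus three retries. Whether the framework counts attempts the same way is an assumption worth checking against the guide.&lt;/p&gt;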




&lt;h2&gt;
  
  
  Predecessor Retry
&lt;/h2&gt;

&lt;p&gt;When the reviewer determines that the current phase's inadequate output is caused by problems in an earlier phase, it can request a predecessor retry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;analysisReviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
        Review the analysis. If the data from the ingestion phase appears incomplete or incorrect,
        output: RETRY_PREDECESSOR:&amp;lt;what needs to be fixed in data ingestion&amp;gt;
        If the analysis itself is the problem, output: RETRY:&amp;lt;what needs to be revised&amp;gt;
        If acceptable, output: APPROVE
        """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysisTask&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Phase&lt;/span&gt; &lt;span class="n"&gt;ingestion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Phase&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ingestion"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingestTask&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Phase&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Phase&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"analysis"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysisTask&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;after&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingestion&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PhaseReview&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysisReviewer&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the reviewer outputs &lt;code&gt;RETRY_PREDECESSOR:&lt;/code&gt;, the framework re-runs the ingestion phase with the feedback injected into ingestion tasks, then re-runs the analysis phase. The review fires again after the second analysis completes.&lt;/p&gt;

&lt;p&gt;The predecessor is the phase declared in &lt;code&gt;.after()&lt;/code&gt;. Predecessor retry does not cascade further back automatically.&lt;/p&gt;
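
&lt;p&gt;The ordering described above (predecessor first, then the current phase, then the review again) can be sketched as follows. &lt;code&gt;PhaseStep&lt;/code&gt;, &lt;code&gt;DecisionSource&lt;/code&gt;, and &lt;code&gt;PredecessorRetry&lt;/code&gt; are hypothetical names for this illustration only. The sketch handles just &lt;code&gt;APPROVE&lt;/code&gt; and &lt;code&gt;RETRY_PREDECESSOR&lt;/code&gt;, and it assumes a single predecessor with no further cascade.&lt;/p&gt;

```java
/** Hypothetical: runs one phase given upstream output (null if none) and optional feedback. */
interface PhaseStep {
    String run(String upstream, String feedback);
}

/** Hypothetical: returns the reviewer's decision string for the current phase output. */
interface DecisionSource {
    String review(String phaseOutput);
}

final class PredecessorRetry {

    private static final String PREFIX = "RETRY_PREDECESSOR:";

    private PredecessorRetry() {}

    /**
     * Runs predecessor then current. On RETRY_PREDECESSOR, re-runs both in
     * order, injecting the reviewer's feedback into the predecessor only.
     */
    static String run(PhaseStep predecessor, PhaseStep current,
                      DecisionSource reviewer, int maxPredecessorRetries) {
        String predecessorFeedback = null;
        int retriesUsed = 0;
        while (true) {
            String upstream = predecessor.run(null, predecessorFeedback);
            String output = current.run(upstream, null);
            String decision = reviewer.review(output).strip();
            if (decision.equals("APPROVE")) {
                return output;
            }
            if (!decision.startsWith(PREFIX)) {
                throw new IllegalStateException("Not handled in this sketch: " + decision);
            }
            if (retriesUsed == maxPredecessorRetries) {
                throw new IllegalStateException("Predecessor retries exhausted");
            }
            retriesUsed++;
            predecessorFeedback = decision.substring(PREFIX.length()).strip();
        }
    }
}
```

&lt;p&gt;The review runs only after the current phase has been re-run on fresh predecessor output, so the reviewer always judges the downstream result rather than the predecessor's output directly.&lt;/p&gt;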




&lt;h2&gt;
  
  
  Accessing Review Results
&lt;/h2&gt;

&lt;p&gt;After the ensemble completes, review results are available in the phase output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EnsembleOutput&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ensemble&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;PhaseOutput&lt;/span&gt; &lt;span class="n"&gt;reportingOut&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getPhaseOutputs&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"reporting"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;ReviewRecord&lt;/span&gt; &lt;span class="n"&gt;reviewRecord&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reportingOut&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reviewRecord&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Attempts: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;reviewRecord&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;attemptCount&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Decision: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;reviewRecord&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;finalDecision&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful for logging, auditing, or downstream branching based on whether the output was approved on the first attempt or needed one or more retries.&lt;/p&gt;
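
&lt;p&gt;As a rough sketch of that kind of downstream branching, the snippet below routes on the attempt count alone. &lt;code&gt;ReviewSummary&lt;/code&gt; is a hypothetical stand-in for the framework's &lt;code&gt;ReviewRecord&lt;/code&gt; that mirrors only the two accessors used above. It assumes the attempt count starts at 1 and that the decision can be treated as a string; neither is confirmed here, and the &lt;code&gt;"APPROVE"&lt;/code&gt; value is only a placeholder.&lt;/p&gt;

```java
public class ReviewBranching {

    // Hypothetical stand-in for the framework's ReviewRecord; only the two
    // accessors shown in the article are mirrored here.
    record ReviewSummary(int attemptCount, String finalDecision) {}

    // Assumption: attemptCount starts at 1, so any value above 1 means
    // at least one retry happened before the final decision.
    static boolean neededRetries(ReviewSummary review) {
        return review.attemptCount() > 1;
    }

    // Builds a one-line audit entry describing how the review went.
    static String auditLine(String phaseName, ReviewSummary review) {
        String path = neededRetries(review) ? "retried" : "first-pass";
        return phaseName + ": " + path
                + " (attempts=" + review.attemptCount()
                + ", decision=" + review.finalDecision() + ")";
    }

    public static void main(String[] args) {
        // "APPROVE" is a placeholder decision value, not a documented constant.
        ReviewSummary review = new ReviewSummary(3, "APPROVE");
        System.out.println(auditLine("reporting", review));
    }
}
```

&lt;p&gt;In a real pipeline, the values would come from the phase output's review record rather than being constructed by hand.&lt;/p&gt;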




&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;AI review tasks add LLM calls to the pipeline: each retry cycle adds a reviewer call plus the calls made by the retried tasks. In cost-sensitive pipelines, deterministic reviewers can enforce the most common criteria without adding model calls.&lt;/p&gt;
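
&lt;p&gt;To illustrate what a deterministic check might look like, here is a minimal sketch that tests two example criteria, a minimum length and a required section name, without any model call. The &lt;code&gt;Verdict&lt;/code&gt; type and the &lt;code&gt;review&lt;/code&gt; method are hypothetical and are not the framework's reviewer contract; wiring such a check into a phase would go through whatever reviewer interface the library actually exposes.&lt;/p&gt;

```java
public class DeterministicReviewSketch {

    // Hypothetical result type; the framework's own reviewer contract may differ.
    record Verdict(boolean approved, String feedback) {}

    // Checks two illustrative criteria with plain string operations:
    // a minimum length and the presence of a required section name.
    static Verdict review(String output, int minChars, String requiredSection) {
        if (minChars > output.length()) {
            return new Verdict(false, "Output has " + output.length()
                    + " characters; at least " + minChars + " are required.");
        }
        if (!output.contains(requiredSection)) {
            return new Verdict(false, "Missing required section: " + requiredSection);
        }
        // Both criteria passed, so there is no feedback to inject.
        return new Verdict(true, "");
    }

    public static void main(String[] args) {
        Verdict verdict = review("Summary: revenue grew.", 10, "Sources");
        System.out.println(verdict.approved() + " " + verdict.feedback());
    }
}
```

&lt;p&gt;Because the feedback is specific about what failed, it also gives a retried task something concrete to act on.&lt;/p&gt;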

&lt;p&gt;Feedback injection is prompt-based. Tasks see the feedback as text in their prompt, not as a structured signal. The quality of the retry depends on how well the reviewer communicates what needs to change.&lt;/p&gt;

&lt;p&gt;Predecessor retry re-runs the entire predecessor phase, not individual tasks, so an expensive predecessor makes every such retry expensive. Keep this in mind when designing any phase that a later review may send back for a retry.&lt;/p&gt;

&lt;p&gt;The review task reads phase outputs via &lt;code&gt;context()&lt;/code&gt; declarations -- the same mechanism any task uses. Context resolution works across retries; the framework rebuilds task identity consistently so the reviewer always reads the most recent attempt's output.&lt;/p&gt;




&lt;p&gt;The guide at &lt;a href="https://agentensemble.net/guides/phase-review/" rel="noopener noreferrer"&gt;agentensemble.net/guides/phase-review/&lt;/a&gt; covers the full API including custom reviewer implementations and review event callbacks. The &lt;a href="https://github.com/AgentEnsemble/agentensemble/blob/main/agentensemble-examples/src/main/java/net/agentensemble/examples/PhaseReviewExample.java" rel="noopener noreferrer"&gt;example source&lt;/a&gt; is runnable from the repository.&lt;/p&gt;

&lt;p&gt;AgentEnsemble is open source under the MIT license.&lt;/p&gt;

</description>
      <category>java</category>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
