<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maisie Ouyang</title>
    <description>The latest articles on DEV Community by Maisie Ouyang (@maisie_oy).</description>
    <link>https://dev.to/maisie_oy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3977145%2F1f5b09ce-e5e4-416a-9690-e60a760cc215.jpeg</url>
      <title>DEV Community: Maisie Ouyang</title>
      <link>https://dev.to/maisie_oy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maisie_oy"/>
    <language>en</language>
    <item>
      <title>From Agent Loop to Durable Execution: An Architecture Guide for Production Agents</title>
      <dc:creator>Maisie Ouyang</dc:creator>
      <pubDate>Wed, 10 Jun 2026 07:35:19 +0000</pubDate>
      <link>https://dev.to/maisie_oy/from-agent-loop-to-durable-execution-an-architecture-guide-for-production-agents-3fpp</link>
      <guid>https://dev.to/maisie_oy/from-agent-loop-to-durable-execution-an-architecture-guide-for-production-agents-3fpp</guid>
      <description>&lt;p&gt;Agent frameworks like Claude SDK, LangGraph, and Strands already give us a lot: the Agent Loop, error handling, task dispatch, memory management, and engineering scaffolding for getting agents to production quickly.&lt;/p&gt;

&lt;p&gt;So when I first encountered orchestration engines like Temporal and Lambda Durable Functions in production architectures, my reaction was: &lt;strong&gt;why do we need yet another layer?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After researching multiple implementations and reading through source code, I realized the confusion stems from three questions that rarely get addressed together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Frameworks already handle the Agent Loop and basic production needs. Why add orchestration on top?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;With orchestration managing execution, is the result still an "Agent" or just a "Workflow"?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What does the orchestration engine actually do — what problems does it solve that frameworks don't?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This post is my attempt to untangle these. Chapter 1 addresses questions 1 and 2; Chapter 2 addresses question 3 with concrete scenarios.&lt;/p&gt;




&lt;h2&gt;
  
  
  Chapter 1: Agent Loop and Agent Orchestration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 Three Concepts: Agent Loop, Agent Framework, and Orchestration Engine
&lt;/h3&gt;

&lt;p&gt;To answer question 1, we first need to distinguish three concepts that often get conflated:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Loop&lt;/strong&gt; — application-level reasoning logic. How the agent selects tools, manages context, decides whether to plan or act, and when to stop. This is what makes an agent an agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Framework&lt;/strong&gt; (Claude SDK, LangGraph, Strands) — application-level engineering layer. Wraps the Agent Loop with production-ready scaffolding: error handling, memory ingestion, tool registration, logging, tracing, multi-agent task dispatch. Its value is acceleration — turning an Agent Loop into something deployable fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestration Engine&lt;/strong&gt; (Temporal, Lambda Durable Functions) — infrastructure layer. Externalizes execution state to solve problems that emerge at production scale: long waits (days for human approval), crash recovery across process boundaries, distributed coordination between agents, and fine-grained retry policies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│  Application Logic: Agent Loop                           │
│  "How the agent thinks and acts"                         │
│  LLM reasoning → select tool → execute → observe → loop │
│  Implemented by: developer-defined logic                 │
├─────────────────────────────────────────────────────────┤
│  Application Engineering: Agent Framework                │
│  "Get this agent to production fast"                     │
│  Error handling / memory / tracing / multi-agent dispatch│
│  Implemented by: Claude SDK / LangGraph / Strands        │
└─────────────────────────────────────────────────────────┘
                        ↕ complementary layers
┌─────────────────────────────────────────────────────────┐
│  Infrastructure: Orchestration Engine                    │
│  "Reliably run this agent at production scale"           │
│  State persistence / crash recovery / long waits (HITL) /│
│  distributed coordination / fine-grained retry           │
│  Implemented by: Temporal / Lambda Durable Functions     │
└─────────────────────────────────────────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Frameworks simplify development and accelerate time-to-production; orchestration engines ensure stability and reliability at scale. They are complementary — not competing.&lt;/p&gt;

&lt;p&gt;Why do agents specifically need this infrastructure layer? As Temporal's blog puts it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"AI applications and agents are distributed systems. I even suggest they are &lt;strong&gt;distributed systems on steroids&lt;/strong&gt; because your app may end up making an order of magnitude more remote requests to fulfill a user experience." — &lt;a href="https://temporal.io/blog/durable-execution-meets-ai-why-temporal-is-the-perfect-foundation-for-ai" rel="noopener noreferrer"&gt;Temporal Blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A single agent task might involve 10+ LLM calls, external APIs, tool executions, and human approval waits spanning days. Any of these can fail, time out, or be interrupted. Framework-level error handling covers basic retries, but it can't persist state across process boundaries, survive pod evictions, or coordinate multi-agent workflows over days. This is why we need orchestration engines: they fill the reliability gap that frameworks alone cannot close.&lt;/p&gt;




&lt;h3&gt;
  
  
  1.2 Does Orchestration Kill Agent Autonomy?
&lt;/h3&gt;

&lt;p&gt;When I first encountered orchestration engines, before fully understanding how they work, my immediate worry was: &lt;strong&gt;does orchestration mean the agent loses its autonomy?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think about what makes an agent an agent. Consider a PPT-generation Agent: you give it tools (create slides, search images, write text) and a goal ("make a presentation about Q2 results"). The agent autonomously decides its own plan — maybe it drafts an outline first, then generates content slide by slide, iterating and revising as it goes. The tool selection, execution order, and loop count are all determined by the LLM in real-time. This autonomous decision-making is the essence of "agent."&lt;/p&gt;

&lt;p&gt;Now introduce an orchestration engine. Does that mean the agent's decision-making has been pre-orchestrated — each step defined in advance, like "Step 1: generate outline → Step 2: write content → Step 3: produce file"? If so, the agent has lost its autonomy — what we have is no longer an "Agent" but a pre-defined "Workflow." &lt;strong&gt;Is the orchestrated result still an Agent, or has it become a Workflow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After studying multiple production implementations and reading through their source code, I found three patterns. My conclusion: &lt;strong&gt;orchestration does NOT have to kill agent autonomy. In all three patterns, the agent can retain full decision-making capability.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq2b4una1s2rzvdxc235.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq2b4una1s2rzvdxc235.png" alt="Three orchestration patterns" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Pattern A — External Orchestration (Agent Loop as Black Box)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most intuitive pattern. The orchestration layer treats each Agent invocation as a single, opaque step. It manages the &lt;em&gt;flow between&lt;/em&gt; agents but never looks inside.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Orchestration Layer (Temporal / Lambda DF)
  Step 1: Invoke Budget Agent  
    [Agent Loop (black box): LLM→Tool→LLM→Done]
  Step 2: Wait for user approval (Signal/waitForCallback)
  Step 3: Invoke Analysis Agent  
    [Agent Loop (black box)]
  Step 4: Merge outputs

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step internally runs a complete Agent Loop — the agent decides what tools to call, how many iterations to run, when to stop. The orchestration layer only sees "Budget Agent started → Budget Agent finished." It manages sequencing, retry on failure, and the human-in-the-loop wait between steps.&lt;/p&gt;

&lt;p&gt;Agent autonomy is fully preserved because orchestration operates entirely outside the reasoning loop. The trade-off is coarse-grained observability: you can see "Agent-1 started → 5.1s → success," but not what happened inside those 5.1 seconds.&lt;/p&gt;
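
&lt;p&gt;The black-box relationship can be sketched in a few lines. This is a toy model, not Temporal or Lambda Durable Functions code: &lt;code&gt;runStep&lt;/code&gt;, &lt;code&gt;budgetAgent&lt;/code&gt;, &lt;code&gt;analysisAgent&lt;/code&gt;, and the tool names are hypothetical stand-ins, and the "agent loop" is simulated with a fixed list of tool calls.&lt;/p&gt;

```javascript
// Toy model of Pattern A: the orchestrator records one opaque step per agent.
// All names here are hypothetical stand-ins, not real engine or SDK APIs.

const stepLog = []; // everything the orchestration layer can observe

async function runStep(name, fn) {
  stepLog.push(`${name} started`);
  const result = await fn();
  stepLog.push(`${name} finished`);
  return result;
}

// The agent decides internally which tools to call and how many times.
// A real agent would loop until the LLM stops requesting tools.
async function budgetAgent(prompt) {
  const toolCalls = [];
  for (const tool of ["fetch_income", "fetch_expenses", "summarize"]) {
    toolCalls.push(tool); // invisible to the orchestrator
  }
  return { agent: "budget", prompt, toolCalls: toolCalls.length };
}

async function analysisAgent(budget) {
  return { agent: "analysis", basedOn: budget.agent, toolCalls: 1 };
}

// The orchestration layer only sequences agent invocations.
async function advisorWorkflow(prompt) {
  const budget = await runStep("budget-agent", () => budgetAgent(prompt));
  const analysis = await runStep("analysis-agent", () => analysisAgent(budget));
  return { budget, analysis };
}
```

&lt;p&gt;After &lt;code&gt;advisorWorkflow&lt;/code&gt; runs, &lt;code&gt;stepLog&lt;/code&gt; holds four entries (a start and a finish per agent), while the three tool calls inside the budget agent never appear in it. That gap is the observability trade-off of Pattern A.&lt;/p&gt;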

&lt;p&gt;This pattern has a concrete public example. The &lt;a href="https://aws.amazon.com/blogs/apn/how-temporal-uses-amazon-bedrock-agentcore-to-create-robust-ai-systems/" rel="noopener noreferrer"&gt;AWS APN Blog&lt;/a&gt; published a multi-agent financial advisor built with AgentCore + Temporal. In the original source code, the architecture looks like this — an Orchestrator Agent dispatching tasks to specialist agents (Budget Agent, Financial Analysis Agent), each running its own complete Agent Loop:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2ca6teadqwolt8ekzaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2ca6teadqwolt8ekzaf.png" alt="Original multi-agent financial advisory system architecture" width="799" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The blog author then refactored this into a Temporal-orchestrated version, where each agent invocation becomes a Temporal Activity — the entire Agent Loop wrapped as a single durable step:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpexodkb1ixacuakkhxxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpexodkb1ixacuakkhxxc.png" alt="Multi-agent financial advisory system architecture with Temporal" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is exactly Pattern A: orchestration manages the flow between agents (sequencing, retry, HITL waits), while each agent internally retains full autonomous decision-making.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Pattern B — Internal Orchestration (Every Step Visible)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pattern B fuses the Agent Loop with the orchestration engine — every LLM inference and every tool call is wrapped as an independent orchestration step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ Orchestration Layer ──────────────────────────────────────────┐
│  Step 1: LLM inference → returns "call calculate_budget"       │
│  Step 2: Execute calculate_budget → {income: 6000}             │
│  Step 3: LLM inference → returns "call create_chart"           │
│  Step 4: Execute create_chart → chart_url                      │
│  Step 5: LLM inference → "Done"                                │
└────────────────────────────────────────────────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking at this, it immediately feels like a pre-defined workflow — step 1, step 2, step 3... in fixed sequence. &lt;strong&gt;Hasn't the agent been reduced to a sequential pipeline? Is this just a Workflow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer is no — because Pattern B can be fully dynamic. Here's what it actually looks like in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Dynamic Pattern B (Lambda Durable Functions)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;withDurableExecution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}];&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;stepCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;// ← loop count decided by LLM, not pre-defined&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llmResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`llm-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;stepCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;invokeBedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hasToolCall&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="s2"&gt;`tool-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;stepCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// ← step name generated dynamically&lt;/span&gt;
          &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;executeTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="cm"&gt;/* tool result */&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="nx"&gt;stepCount&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// LLM decides to stop&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the loop is an open-ended &lt;code&gt;while (true)&lt;/code&gt; and the step names are generated at runtime, the LLM retains complete autonomy. It decides which tool to call next, how many iterations to run, and when to stop. Nothing is pre-defined.&lt;/p&gt;

&lt;p&gt;What Pattern B actually does is move the Agent Loop from a black box inside the framework to an explicit loop written with orchestration APIs. The &lt;em&gt;behavior&lt;/em&gt; is identical to a framework-managed Agent Loop. The &lt;em&gt;infrastructure benefit&lt;/em&gt; is that every step gets checkpointed: if the process crashes after step 3, it resumes from step 4 instead of starting over.&lt;/p&gt;

&lt;p&gt;It's still an Agent. Just a durable one.&lt;/p&gt;

&lt;p&gt;Official examples: &lt;br&gt;
&lt;a href="https://github.com/temporalio/ai-cookbook" rel="noopener noreferrer"&gt;Temporal AI Cookbook&lt;/a&gt; (Agentic Loop with Claude); &lt;br&gt;
community project &lt;a href="https://github.com/temporal-community/temporal-ai-agent" rel="noopener noreferrer"&gt;temporal-ai-agent&lt;/a&gt; (545 Stars).&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Pattern C — Hybrid (Composite)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Production systems often mix both patterns in the same workflow: black-box agent calls (Pattern A) for specialist sub-agents, fine-grained steps (Pattern B) for critical operations, and HITL waits. This is the "Workflow-orchestrated Agent" pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Temporal Workflow (Composite Pattern)

┌─ Step 1: call_llm() — Initial analysis ─────────────────┐
│  Fine-grained (Pattern B)                                │
└──────────────────────────────────────────────────────────┘
                            ↓
┌─ Step 2: invoke_agent() — Call specialist Agent ─────────┐
│  Black-box (Pattern A) — Agent runs full Loop internally │
└──────────────────────────────────────────────────────────┘
                            ↓
┌─ Step 3: wait_signal("approval") ────────────────────────┐
│  HITL pause, zero resources, can wait days               │
└──────────────────────────────────────────────────────────┘
                            ↓
┌─ Step 4: while loop — Fine-grained execution ────────────┐
│  Every LLM/Tool Call = independent Activity              │
│  Pattern B, individually retryable and auditable         │
└──────────────────────────────────────────────────────────┘
                            ↓
┌─ Step 5: send_notification() — Deterministic step ───────┐
└──────────────────────────────────────────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Comparison and Decision Guide
&lt;/h3&gt;

&lt;p&gt;Having walked through all three patterns, here's how they compare across key dimensions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Pattern A (External)&lt;/th&gt;
&lt;th&gt;Pattern B (Internal)&lt;/th&gt;
&lt;th&gt;Hybrid&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent autonomy&lt;/td&gt;
&lt;td&gt;✔ Fully preserved&lt;/td&gt;
&lt;td&gt;✔ Preserved (dynamic)&lt;/td&gt;
&lt;td&gt;✔ Autonomy + controlled checkpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Coarse (agent-level)&lt;/td&gt;
&lt;td&gt;Fine (tool-call-level)&lt;/td&gt;
&lt;td&gt;Adjustable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fault tolerance&lt;/td&gt;
&lt;td&gt;Agent-level retry&lt;/td&gt;
&lt;td&gt;Tool-call-level retry&lt;/td&gt;
&lt;td&gt;Activity-level retry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Low (add decorator)&lt;/td&gt;
&lt;td&gt;High (decompose Agent Loop)&lt;/td&gt;
&lt;td&gt;Medium-High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flexibility&lt;/td&gt;
&lt;td&gt;High (agent self-adapts)&lt;/td&gt;
&lt;td&gt;Medium (depends on code dynamism)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 1: Pattern comparison across key dimensions&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In practice, the choice often comes down to your specific scenario. The following table summarizes which pattern fits which situation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent distributed collaboration&lt;/td&gt;
&lt;td&gt;Pattern A&lt;/td&gt;
&lt;td&gt;Agents deploy independently; orchestration handles coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance audit of every tool call&lt;/td&gt;
&lt;td&gt;Pattern B&lt;/td&gt;
&lt;td&gt;Tool-level visibility; every operation recorded and queryable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-risk tools (payment/deletion)&lt;/td&gt;
&lt;td&gt;Pattern B (partial)&lt;/td&gt;
&lt;td&gt;Only wrap critical steps to reduce overall complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quick prototype / PoC&lt;/td&gt;
&lt;td&gt;No orchestration&lt;/td&gt;
&lt;td&gt;Premature introduction adds complexity before value is validated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;General-purpose production use&lt;/td&gt;
&lt;td&gt;Hybrid&lt;/td&gt;
&lt;td&gt;Balance reliability + flexibility; agents retain autonomy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 2: Pattern selection guide by scenario&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Of course, real-world systems rarely fit neatly into one box. Many production deployments evolve — starting with Pattern A for simplicity, then gradually wrapping critical steps with Pattern B as compliance or reliability requirements grow. The right pattern depends on your team's maturity, regulatory context, and where failures actually hurt.&lt;/p&gt;




&lt;h2&gt;
  
  
  Chapter 2: Core Mechanism of the Orchestration Engine
&lt;/h2&gt;

&lt;p&gt;Chapter 1 established &lt;em&gt;why&lt;/em&gt; we need orchestration and &lt;em&gt;how&lt;/em&gt; it integrates with agents without killing autonomy. But what does the orchestration engine actually &lt;em&gt;do&lt;/em&gt; at the infrastructure level? This chapter answers question 3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step → Checkpoint → Replay
&lt;/h3&gt;

&lt;p&gt;Regardless of which pattern you choose, all orchestration engines solve problems through the same fundamental mechanism: &lt;strong&gt;record each step's result in storage external to the process; on interruption, resume from records rather than re-executing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;① Step (mark key operations)&lt;/strong&gt; — Developers use &lt;code&gt;context.step()&lt;/code&gt; or Activity to mark "meaningful operations": each LLM call, each tool execution, each external API request. This tells the engine: "this step's result is worth remembering."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;② Checkpoint (automatic persistence)&lt;/strong&gt; — After each step completes, the engine automatically writes the result to persistent storage external to the process (Temporal: Event History; Lambda DF: Checkpoint Store). Developers write zero checkpoint code — the engine handles it at the infrastructure level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;③ Replay (deterministic recovery)&lt;/strong&gt; — After an interruption (crash, pause, or timeout), the engine re-runs the code from the top. Completed steps return their recorded results without executing their bodies again, and execution continues from the first incomplete step. The side effects of completed steps don't repeat, and no extra LLM tokens are spent on them.&lt;/p&gt;
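
&lt;p&gt;The three stages fit in a small in-memory model. This is a minimal sketch under stated assumptions: &lt;code&gt;createContext&lt;/code&gt;, the &lt;code&gt;Map&lt;/code&gt;-based store, and the three step names are hypothetical, and a real engine persists to durable storage outside the process rather than to a variable.&lt;/p&gt;

```javascript
// Toy in-memory model of Step -> Checkpoint -> Replay.
// `createContext` and the Map-based store are hypothetical; a real engine
// writes checkpoints to durable storage outside the process.

function createContext(store) {
  return {
    async step(name, fn) {
      if (store.has(name)) {
        return store.get(name); // replay: recorded result, body is skipped
      }
      const result = await fn(); // first execution of this step
      store.set(name, result);   // checkpoint after the step completes
      return result;
    },
  };
}

const bodiesRun = []; // records which step bodies actually executed

async function workflow(context, crashIn) {
  const a = await context.step("fetch", async () => {
    bodiesRun.push("fetch");
    return 2;
  });
  const b = await context.step("double", async () => {
    bodiesRun.push("double");
    return a * 2;
  });
  return context.step("report", async () => {
    bodiesRun.push("report");
    if (crashIn === "report") throw new Error("simulated crash");
    return `total=${b}`;
  });
}

async function demo() {
  const store = new Map(); // outlives the crashed run
  try {
    await workflow(createContext(store), "report"); // run 1: crashes in step 3
  } catch (err) {
    // two checkpoints exist at this point: "fetch" and "double"
  }
  const result = await workflow(createContext(store), null); // run 2: replay
  return { result, bodiesRun };
}
```

&lt;p&gt;In this sketch the bodies of &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;double&lt;/code&gt; run once across both runs, while &lt;code&gt;report&lt;/code&gt; runs twice because it failed before it could be checkpointed. A step that was interrupted midway may execute again, so side effects inside a single step still benefit from being idempotent.&lt;/p&gt;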

&lt;p&gt;Now that we understand the core mechanism, let's look at five production scenarios to see how orchestration engines deliver reliability guarantees in practice.&lt;/p&gt;




&lt;h3&gt;
  
  
  Scenario 1: Crash Recovery — Don't Start Over
&lt;/h3&gt;

&lt;p&gt;Imagine that an agent ran for 20 minutes, completed 8 steps, and crashed on step 9.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without orchestration:&lt;/strong&gt; You have three options: (1) start over from zero (re-call every LLM, re-execute every tool, re-pay all tokens), (2) reload trace data, reconstruct the context, and hope the LLM picks up correctly, or (3) build your own checkpoint logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With orchestration:&lt;/strong&gt; Every &lt;code&gt;context.step()&lt;/code&gt; is a checkpoint. When step 9 crashes, the results of steps 1-8 are read directly from the engine's recorded history, with no LLM calls and no API calls, and execution continues from step 9. Already-spent tokens are not re-spent, and the side effects of completed steps don't repeat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How this differs from "save trace and reload context":&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some teams attempt crash recovery by saving the agent's conversation history and feeding it back as context on restart. The fundamental difference: Checkpoint + Replay is an infrastructure-layer mechanism — like a database's Write-Ahead Log or a game's save point, process-independent and business-logic-independent. Reloading context is an application-layer mechanism — like human "recall," dependent on trace completeness and LLM comprehension. This manifests in two concrete ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Zero extra LLM tokens&lt;/strong&gt; — Checkpoint + Replay returns cached results directly for completed steps without re-feeding history to the LLM. Reloading context requires sending the entire conversation back as input, consuming tokens proportional to history length.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code-level guarantee against re-execution&lt;/strong&gt; — The orchestration engine guarantees at the infrastructure layer: completed steps return cached values, function bodies never re-execute. This is deterministic and independent of LLM judgment. Reloading context depends on the LLM correctly understanding history and "choosing not to repeat" — works most of the time, but isn't a guarantee.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Scenario 2: Long-Running Tasks — Surviving Beyond Process Lifecycle
&lt;/h3&gt;

&lt;p&gt;Agents can run in various runtime environments — Lambda, EKS Pods, AgentCore, etc. But regardless of which runtime you choose, they all have lifecycle constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda: max 15 minutes, forced termination on timeout&lt;/li&gt;
&lt;li&gt;EKS Pod: can be evicted anytime (Spot instance reclaim, scaling, OOM)&lt;/li&gt;
&lt;li&gt;AgentCore: max 8 hours, auto-terminates after 15 minutes idle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yet agent tasks can span days or weeks — continuous monitoring, multi-round approvals, complex research. If the process dies, the agent task dies with it. The solution is state externalization: the process is just a temporary carrier, while execution state and checkpoints persist at the infrastructure layer via the orchestration engine — independent of any single process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Temporal Server / Lambda DF Backend (permanent state storage)

Worker A (Pod 1)              Worker B (Pod 2)              Worker C (Pod 3)
Execute Steps 1-3     →      Replay Steps 1-3 (cache)  →   Replay (cache)
Results report to Server      Continue Steps 4-6            Continue Step 7 → Done ✔
⚡ Pod evicted                ⏸ HITL pause (wait 3 days)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution state lives in the server, not in any process. Processes are fungible workers — any worker can pick up any workflow by replaying from its checkpoints. A workflow can survive unlimited pod restarts, Lambda timeouts, and infrastructure disruptions.&lt;/p&gt;
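
&lt;p&gt;The worker handoff described above can be modeled with a JSON string standing in for the server. This is a hedged sketch: &lt;code&gt;serverState&lt;/code&gt;, &lt;code&gt;runOnWorker&lt;/code&gt;, and the "step budget" used to simulate an eviction are all hypothetical, and the five-step workflow is invented for illustration.&lt;/p&gt;

```javascript
// Toy model: state lives in an external store (a JSON string here), so any
// worker can continue a workflow by replaying. All names are hypothetical.

let serverState = "{}"; // stands in for the engine's durable backend

function runOnWorker(workerName, maxNewSteps) {
  const history = JSON.parse(serverState); // load recorded step results
  const executed = [];
  let budget = maxNewSteps; // new steps this worker may run before "eviction"

  function step(name, fn) {
    if (name in history) return history[name]; // replayed from the record
    if (budget === 0) throw new Error("worker evicted");
    budget -= 1;
    const result = fn();
    history[name] = result;
    serverState = JSON.stringify(history); // report the result to the server
    executed.push(`${workerName}:${name}`);
    return result;
  }

  try {
    let total = 0;
    for (const n of [1, 2, 3, 4, 5]) {
      total += step(`step-${n}`, () => n * 10);
    }
    return { done: true, total, executed };
  } catch (err) {
    // In this toy, any error is treated as the worker disappearing.
    return { done: false, executed };
  }
}

const first = runOnWorker("A", 3);
// first.done === false, first.executed is ["A:step-1", "A:step-2", "A:step-3"]
const second = runOnWorker("B", 3);
// second.done === true, second.total === 150,
// second.executed is ["B:step-4", "B:step-5"]
```

&lt;p&gt;Worker B never shares memory with worker A. It only reads what A reported to the store, which is why the two workers are interchangeable.&lt;/p&gt;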




&lt;h3&gt;
  
  
  Scenario 3: Human-in-the-Loop — Wait Without Burning Resources
&lt;/h3&gt;

&lt;p&gt;Many agent workflows require human approval at critical points — a manager sign-off before executing a payment, a compliance review before publishing content, or a multi-round approval chain that takes days. The challenge: how does the agent "pause" and wait without wasting resources or risking process death?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without orchestration:&lt;/strong&gt; Either keep the process alive waiting (wasting compute, risking being killed by timeout), or build your own "pause → serialize state → persist → wait for callback → deserialize → restore" logic from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With orchestration:&lt;/strong&gt; The engine provides a wait primitive (&lt;code&gt;waitForCallback&lt;/code&gt; in Lambda Durable Functions, Signals in Temporal). When the agent reaches a point requiring approval, it calls this primitive: the current process ends immediately (zero compute cost), the manager is notified, and a configurable timeout is set (e.g., 7 days). The execution state persists externally. When the manager clicks "approve," the engine replays the code from the beginning, returns recorded results for completed steps, and resumes precisely at the approval point.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent executing Steps 1-3...
        │
        ▼
Step 4: waitForCallback("manager-approval")
        │
        ├──→ Process ends (zero cost)
        │    Manager notified, timeout: 7 days
        │
        │    ... 3 days pass ...
        │
        ├──← Manager clicks "Approve"
        │
        ▼
Engine replays → Steps 1-4 from cache → Continue Step 5

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wait can last days or months, up to the configured timeout, without consuming compute: no thread is held and no compute is billed. Only the persisted execution state remains on the engine.&lt;/p&gt;
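&lt;p&gt;The pause-and-resume flow above can be sketched in a few lines of plain Python. This is a toy model, not any engine's real API: &lt;code&gt;Engine&lt;/code&gt;, &lt;code&gt;Suspended&lt;/code&gt;, &lt;code&gt;wait_for_callback&lt;/code&gt;, and &lt;code&gt;complete_callback&lt;/code&gt; are hypothetical names, the step bodies are placeholders, and the timeout is left out for brevity.&lt;/p&gt;

```python
from typing import Any, Callable, Dict, List


class Suspended(Exception):
    """Raised to end the current process while a callback is still outstanding."""

    def __init__(self, callback_id: str) -> None:
        super().__init__(callback_id)
        self.callback_id = callback_id


class Engine:
    """Hypothetical stand-in for the engine's durable state."""

    def __init__(self) -> None:
        self.checkpoints: Dict[str, Any] = {}
        self.callbacks: Dict[str, Any] = {}
        self.side_effects: List[str] = []  # steps that really executed, in order

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name not in self.checkpoints:
            self.checkpoints[name] = fn()
            self.side_effects.append(name)
        return self.checkpoints[name]

    def wait_for_callback(self, callback_id: str) -> Any:
        if callback_id not in self.callbacks:
            # Nothing to return yet: stop running instead of blocking a thread.
            raise Suspended(callback_id)
        return self.callbacks[callback_id]

    def run(self, workflow: Callable[["Engine"], Any]) -> Any:
        try:
            return workflow(self)
        except Suspended as pause:
            return f"suspended on {pause.callback_id}"

    def complete_callback(self, callback_id: str, payload: Any,
                          workflow: Callable[["Engine"], Any]) -> Any:
        # Record the human decision, then replay the workflow from the top.
        self.callbacks[callback_id] = payload
        return self.run(workflow)


def payment_workflow(engine: Engine) -> str:
    engine.step("step-1-load-invoice", lambda: "invoice-42")
    engine.step("step-2-validate", lambda: True)
    amount = engine.step("step-3-compute-amount", lambda: 1200)
    decision = engine.wait_for_callback("manager-approval")  # Step 4
    if decision != "approve":
        return "rejected"
    return engine.step("step-5-pay", lambda: f"paid {amount}")


engine = Engine()
first = engine.run(payment_workflow)  # process ends at Step 4
second = engine.complete_callback("manager-approval", "approve", payment_workflow)
print(first, "|", second)
```

&lt;p&gt;Between the two calls nothing is running. The only thing that exists is the recorded state, which is what makes a multi-day wait cheap.&lt;/p&gt;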




&lt;h3&gt;
  
  
  Scenario 4: Stability — Retry + Error Isolation
&lt;/h3&gt;

&lt;p&gt;Agent tasks involve many external calls — LLM APIs, third-party services, databases. Transient failures (API timeouts, rate limits, network jitter) are inevitable at scale. The question is: when one step fails, does it take down the entire agent run?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without orchestration:&lt;/strong&gt; You need to manually implement retry logic for each call: is-it-retryable checks, backoff strategies, max attempt counts, and whether context needs reloading after failure. This means scattering error-handling code across every tool function, every API wrapper, and every agent step — mixing infrastructure concerns into business logic. Without this tedious work, a single transient error kills the entire agent run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With orchestration:&lt;/strong&gt; Each step can be configured with a declarative retry policy (e.g., at most 3 attempts with exponential backoff). If a tool call fails transiently, the engine automatically retries that specific step, without re-executing any previously completed steps and without spending additional LLM tokens on the unaffected portions. More importantly, error handling is unified at the infrastructure layer: you define retry policies once, declaratively, rather than writing try/catch logic in every function. Your business code stays focused on what the agent actually does, not on how to survive failures.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1 ✔  →  Step 2 ✔  →  Step 3 ✘↻↻✔  →  Step 4 ✔  →  Done

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 3 fails on its first attempt, then succeeds after 2 retries. Steps 1-2 are completely unaffected and are not re-executed. Each step's retry policy is independent, so one failure doesn't cascade. The policy is declarative configuration, not code mixed into business logic.&lt;/p&gt;
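&lt;p&gt;As a rough illustration of what "declarative retry policy" means, the sketch below keeps the policy as data and applies it in one generic runner. The names &lt;code&gt;RetryPolicy&lt;/code&gt;, &lt;code&gt;run_step&lt;/code&gt;, and &lt;code&gt;TransientError&lt;/code&gt; are assumptions made for this example and do not mirror any specific engine's API. A real engine would schedule the retry on the server rather than sleep inside the worker.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple, Type


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3
    initial_interval: float = 1.0      # seconds before the first retry
    backoff_coefficient: float = 2.0   # multiplier applied per retry
    retryable: Tuple[Type[BaseException], ...] = (TransientError,)


def run_step(fn: Callable[[], Any], policy: RetryPolicy,
             sleep: Callable[[float], Any]) -> Any:
    """Run one step under a policy; only this step is repeated on failure."""
    attempt = 1
    while True:
        try:
            return fn()
        except policy.retryable:
            if attempt == policy.max_attempts:
                raise  # attempts exhausted: surface the failure
            sleep(policy.initial_interval * policy.backoff_coefficient ** (attempt - 1))
            attempt += 1


calls: Dict[str, int] = {"step-1": 0, "step-2": 0, "step-3": 0, "step-4": 0}
delays: List[float] = []  # recorded instead of really sleeping


def make_step(name: str, failures_before_success: int = 0) -> Callable[[], str]:
    def step() -> str:
        calls[name] += 1
        if calls[name] > failures_before_success:
            return f"{name} ok"
        raise TransientError(f"{name} timed out")
    return step


policy = RetryPolicy(max_attempts=3, initial_interval=1.0, backoff_coefficient=2.0)
steps = [make_step("step-1"), make_step("step-2"),
         make_step("step-3", failures_before_success=2), make_step("step-4")]

# Pass time.sleep instead of delays.append to wait for real.
results = [run_step(s, policy, sleep=delays.append) for s in steps]
print(calls, delays)
```

&lt;p&gt;The step functions contain no retry logic at all. Changing the backoff or attempt count means editing one &lt;code&gt;RetryPolicy&lt;/code&gt; value, not every tool wrapper.&lt;/p&gt;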




&lt;h3&gt;
  
  
  Scenario 5: Observability — Event History + Visibility + LLM Tracing
&lt;/h3&gt;

&lt;p&gt;Most teams today rely on tracing tools (Langfuse, Braintrust) for agent observability — tracking prompts, completions, token usage, and latency. This is valuable, but in my view it's only half the picture. Production agent observability should have two complementary layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Execution-level observability — from the orchestration engine (Event History + Visibility)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Event History records every step's start/complete/fail/retry, Signal send/receive, timing, and return values — a complete audit trail of execution flow. On top of this, a Visibility layer provides SQL-like queries across all workflows (e.g., "find all Failed workflows for customer C123"). This tells you &lt;em&gt;what happened&lt;/em&gt; during execution: which steps ran, which failed, how long the wait was, how many retries occurred. (&lt;a href="https://docs.temporal.io/workflow-execution/event" rel="noopener noreferrer"&gt;Temporal Events Docs&lt;/a&gt;)&lt;/p&gt;
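&lt;p&gt;To give a feel for the kind of query a Visibility layer answers, here is a small sketch that uses an in-memory SQLite table as a stand-in. The table name, columns, and rows are invented for illustration. Real engines expose their own filter syntax over indexed workflow attributes rather than raw SQL access.&lt;/p&gt;

```python
import sqlite3

# In-memory table standing in for the engine's visibility index (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflows ("
    "workflow_id TEXT PRIMARY KEY, customer_id TEXT, status TEXT, retries INTEGER)"
)
conn.executemany(
    "INSERT INTO workflows VALUES (?, ?, ?, ?)",
    [
        ("wf-001", "C123", "Completed", 0),
        ("wf-002", "C123", "Failed", 3),
        ("wf-003", "C999", "Failed", 1),
        ("wf-004", "C123", "Failed", 2),
    ],
)

# "Find all Failed workflows for customer C123"
rows = conn.execute(
    "SELECT workflow_id FROM workflows "
    "WHERE status = ? AND customer_id = ? ORDER BY workflow_id",
    ("Failed", "C123"),
)
failed_for_c123 = [row[0] for row in rows]
print(failed_for_c123)
```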

&lt;p&gt;&lt;strong&gt;Layer 2: Inference-level observability — from LLM tracing tools (Langfuse/Braintrust)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This records prompt/completion content, token usage + cost, latency, and evaluation scores. This tells you &lt;em&gt;how the agent reasoned&lt;/em&gt;: what it was asked, what it answered, how much it cost, and whether the output quality was good. It does NOT record workflow execution state, retry logic, or Signal handling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connecting the two layers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljzvracalav2ajr1to5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljzvracalav2ajr1to5j.png" alt="(Source: Langfuse Temporal Integration)" width="799" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The two layers can be connected via OpenTelemetry and a shared &lt;code&gt;trace_id&lt;/code&gt;, enabling full-stack agent observability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent output quality is poor → check Langfuse for prompt/completion to debug reasoning&lt;/li&gt;
&lt;li&gt;Agent execution stuck/failed → check Temporal UI for execution state and retry logs&lt;/li&gt;
&lt;li&gt;Jump between them via trace correlation — from Temporal's Activity directly into Langfuse to see the specific LLM call content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(&lt;a href="https://langfuse.com/integrations/frameworks/temporal" rel="noopener noreferrer"&gt;Source: Langfuse Temporal Integration&lt;/a&gt;)&lt;/p&gt;
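&lt;p&gt;The correlation idea can be shown without any tracing library. The sketch below is a toy model under stated assumptions: two plain lists stand in for the engine's Event History and the LLM tracing backend, and &lt;code&gt;run_activity&lt;/code&gt; and &lt;code&gt;llm_calls_for&lt;/code&gt; are hypothetical helpers. It does not use the OpenTelemetry or Langfuse APIs. It only shows why writing the same identifiers to both layers makes the jump between them possible.&lt;/p&gt;

```python
import secrets
from typing import Dict, List

execution_events: List[Dict[str, str]] = []  # layer 1: what happened
llm_traces: List[Dict[str, str]] = []        # layer 2: how the agent reasoned


def new_id(num_bytes: int) -> str:
    return secrets.token_hex(num_bytes)


def run_activity(workflow_id: str, activity: str, prompt: str, trace_id: str) -> str:
    span_id = new_id(8)  # identifies this one activity inside the trace
    base = {"workflow_id": workflow_id, "activity": activity,
            "trace_id": trace_id, "span_id": span_id}
    execution_events.append({**base, "event": "started"})

    completion = f"echo: {prompt}"  # placeholder for a real model call
    # The tracing record carries the same trace_id and span_id as the execution record.
    llm_traces.append({"trace_id": trace_id, "span_id": span_id,
                       "prompt": prompt, "completion": completion})

    execution_events.append({**base, "event": "completed"})
    return completion


def llm_calls_for(workflow_id: str, activity: str) -> List[Dict[str, str]]:
    """Jump from an execution-level activity to its inference-level records."""
    keys = {(e["trace_id"], e["span_id"]) for e in execution_events
            if e["workflow_id"] == workflow_id and e["activity"] == activity}
    return [t for t in llm_traces if (t["trace_id"], t["span_id"]) in keys]


trace_id = new_id(16)  # one trace per workflow run
run_activity("wf-001", "plan", "Plan the trip", trace_id)
run_activity("wf-001", "book", "Book the flight", trace_id)

book_calls = llm_calls_for("wf-001", "book")
print(book_calls[0]["prompt"])
```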




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Let me circle back to the three questions that started this research:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Why orchestration on top of frameworks?&lt;/strong&gt; Because they operate at different layers. The Agent Loop defines reasoning logic; frameworks simplify development; orchestration ensures reliability at scale. They complement each other rather than compete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent or Workflow?&lt;/strong&gt; Orchestration does not kill agent autonomy. Whether you choose Pattern A, B, or Hybrid, the LLM retains full decision-making authority. Orchestration provides durability, not rigidity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What does orchestration actually do?&lt;/strong&gt; Through Step → Checkpoint → Replay, it delivers crash recovery, long-running execution, human waits that consume no compute, failure isolation, and multi-layer observability.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Not every agent needs orchestration. But if yours coordinates multiple specialists, waits for human decisions, or simply needs to survive beyond a single process — it's worth understanding deeply.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Temporal Blog — &lt;em&gt;Durable Execution Meets AI: Why Temporal Is the Perfect Foundation for AI&lt;/em&gt;. &lt;a href="https://temporal.io/blog/durable-execution-meets-ai-why-temporal-is-the-perfect-foundation-for-ai" rel="noopener noreferrer"&gt;temporal.io/blog&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS APN Blog — &lt;em&gt;How Temporal Uses Amazon Bedrock AgentCore to Create Robust AI Systems&lt;/em&gt;. &lt;a href="https://aws.amazon.com/blogs/apn/how-temporal-uses-amazon-bedrock-agentcore-to-create-robust-ai-systems/" rel="noopener noreferrer"&gt;aws.amazon.com/blogs/apn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Temporal Documentation — &lt;em&gt;Workflow Execution Events&lt;/em&gt;. &lt;a href="https://docs.temporal.io/workflow-execution/event" rel="noopener noreferrer"&gt;docs.temporal.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Langfuse — &lt;em&gt;Temporal Integration&lt;/em&gt;. &lt;a href="https://langfuse.com/integrations/frameworks/temporal" rel="noopener noreferrer"&gt;langfuse.com/integrations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Temporal — &lt;em&gt;AI Cookbook (Agentic Loop with Claude)&lt;/em&gt;. &lt;a href="https://github.com/temporalio/ai-cookbook" rel="noopener noreferrer"&gt;github.com/temporalio/ai-cookbook&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;temporal-community — &lt;em&gt;temporal-ai-agent&lt;/em&gt;. &lt;a href="https://github.com/temporal-community/temporal-ai-agent" rel="noopener noreferrer"&gt;github.com/temporal-community/temporal-ai-agent&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>aws</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
