<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gunnar Grosch</title>
    <description>The latest articles on DEV Community by Gunnar Grosch (@gunnargrosch).</description>
    <link>https://dev.to/gunnargrosch</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F348349%2F213d7254-998a-413f-b7af-c96c087508b3.png</url>
      <title>DEV Community: Gunnar Grosch</title>
      <link>https://dev.to/gunnargrosch</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gunnargrosch"/>
    <language>en</language>
    <item>
      <title>Visualizing AWS Lambda Durable Function Workflows with durable-viz</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Fri, 27 Mar 2026 15:20:58 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/visualizing-aws-lambda-durable-function-workflows-with-durable-viz-1838</link>
      <guid>https://dev.to/gunnargrosch/visualizing-aws-lambda-durable-function-workflows-with-durable-viz-1838</guid>
      <description>&lt;p&gt;If you're new to durable functions, start with the &lt;a href="https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3"&gt;durable functions post&lt;/a&gt; for the core concepts. The short version: your handler re-runs from the beginning on every resume, but completed steps return cached results instantly instead of re-executing. The SDK handles this checkpointing and replay transparently.&lt;/p&gt;

&lt;p&gt;Durable functions encourage you to write sequential code. But the execution flow isn't always sequential. You have parallel branches that fan out and converge. Conditionals that route to callbacks or skip to the end. Invocations that call other Lambda functions. The more primitives you use, the harder it gets to see the full picture just by reading the handler.&lt;/p&gt;

&lt;p&gt;I hit this when building the &lt;a href="https://dev.to/gunnargrosch/multi-agent-systems-on-aws-lambda-with-durable-functions-2gg3"&gt;purchasing coordinator&lt;/a&gt;. Five specialist agents dispatched in parallel, a conditional approval callback, plan and synthesis steps on either side. The code reads top to bottom, but the workflow branches and converges in ways that aren't obvious from the source. Two primitives in particular drive this complexity: &lt;code&gt;context.invoke()&lt;/code&gt; calls another Lambda function with automatic checkpointing (unlike the AWS SDK's &lt;code&gt;lambda.invoke()&lt;/code&gt;, the result is cached so the target function isn't called again on replay), and &lt;code&gt;waitForCallback&lt;/code&gt; suspends the workflow until an external signal arrives, which is how the purchasing coordinator pauses for human approval.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/gunnargrosch/durable-viz" rel="noopener noreferrer"&gt;durable-viz&lt;/a&gt;: a static analysis tool that turns durable function handlers into flowcharts. No deployment, no execution, no AWS credentials. Point it at a source file and it extracts the workflow structure from the code.&lt;/p&gt;

&lt;p&gt;It supports TypeScript, Python, and Java. You can run it as a CLI (Mermaid output, browser, or JSON), or as a VS Code extension with a live diagram panel next to your code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;Run &lt;code&gt;durable-viz&lt;/code&gt; against any file containing a durable function handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz handler.ts &lt;span class="nt"&gt;--open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It parses the file, extracts the durable primitives (&lt;code&gt;step&lt;/code&gt;, &lt;code&gt;parallel&lt;/code&gt;, &lt;code&gt;invoke&lt;/code&gt;, &lt;code&gt;waitForCallback&lt;/code&gt;, conditionals), builds a directed graph, and renders it as a Mermaid flowchart. The &lt;code&gt;--open&lt;/code&gt; flag generates an interactive HTML page with zoom, pan, and PNG export.&lt;/p&gt;

&lt;p&gt;Here's what the purchasing coordinator from the &lt;a href="https://dev.to/gunnargrosch/multi-agent-systems-on-aws-lambda-with-durable-functions-2gg3"&gt;multi-agent post&lt;/a&gt; looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz src/handlers/coordinator.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vwn5ix6wy0tlbtb5lde.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vwn5ix6wy0tlbtb5lde.png" alt="Purchasing coordinator workflow diagram" width="800" height="766"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The four-phase flow is immediately visible: plan, five specialists fanning out from a parallel node, synthesize, and the conditional approval callback gated by a diamond. The "no" branch skips straight to End. Each node shape encodes the primitive type (the &lt;code&gt;--open&lt;/code&gt; browser view and VS Code extension add color coding). These primitives (&lt;code&gt;step&lt;/code&gt;, &lt;code&gt;parallel&lt;/code&gt;, &lt;code&gt;invoke&lt;/code&gt;, &lt;code&gt;waitForCallback&lt;/code&gt;) are the SDK methods that automatically checkpoint their results. On replay, completed primitives return cached results without re-executing. That's what makes them "durable."&lt;/p&gt;

&lt;p&gt;The tool didn't execute the code or read a deployment. It parsed the TypeScript AST, found &lt;code&gt;withDurableExecution()&lt;/code&gt;, walked the handler body, and extracted every durable primitive with its name and structure. The five specialist branches came from resolving the &lt;code&gt;SPECIALISTS&lt;/code&gt; registry object at module scope.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLI
&lt;/h2&gt;

&lt;p&gt;The default output is Mermaid flowchart syntax printed to stdout. Paste it into GitHub Markdown, Notion, Confluence, or any Mermaid-compatible renderer.&lt;br&gt;
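&lt;p&gt;For a feel of the output shape, here is illustrative Mermaid for a hypothetical small handler with two steps and a conditional. The node IDs and labels are invented for this example; they are not durable-viz's exact output:&lt;/p&gt;

```
flowchart TD
  Start([Start]) --> s1[validate-order]
  s1 --> s2[charge-payment]
  s2 --> c1{refund needed?}
  c1 -->|yes| s3[issue-refund]
  s3 --> End([End])
  c1 -->|no| End
```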
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz handler.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can change the graph direction from top-down to left-right:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz handler.ts &lt;span class="nt"&gt;--direction&lt;/span&gt; LR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--open&lt;/code&gt; flag generates a self-contained HTML page and opens it in your browser. Dark theme, scroll-to-zoom, click-drag panning, and fit-to-view. You can save the diagram as a high-resolution PNG for documentation, pull requests, or presentations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz handler.ts &lt;span class="nt"&gt;--open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For custom tooling, &lt;code&gt;--json&lt;/code&gt; outputs the raw workflow graph (nodes, edges, source line numbers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz handler.ts &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  VS Code Extension
&lt;/h2&gt;

&lt;p&gt;The extension renders the diagram in a side panel next to your code. Install from the &lt;a href="https://marketplace.visualstudio.com/items?itemName=gunnargrosch.durable-viz" rel="noopener noreferrer"&gt;VS Code Marketplace&lt;/a&gt;, then open a durable function handler and run &lt;strong&gt;Durable Viz: Open Lambda Durable Function Workflow&lt;/strong&gt; from the command palette.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Click-to-navigate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Click any node to jump to that line in the source file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-refresh&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Diagram updates on file save&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Save PNG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Export the diagram as a high-resolution transparent PNG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source view&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;View the raw Mermaid syntax or JSON graph&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The extension supports zoom, pan, direction toggle, and fit-to-view. See the &lt;a href="https://marketplace.visualstudio.com/items?itemName=gunnargrosch.durable-viz" rel="noopener noreferrer"&gt;Marketplace listing&lt;/a&gt; for the full feature list.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Language Support
&lt;/h2&gt;

&lt;p&gt;The tool supports TypeScript/JavaScript, Python, and Java. Each language has its own parser, but the graph model, edge builder, and renderers are shared.&lt;/p&gt;

&lt;h3&gt;
  
  
  TypeScript / JavaScript
&lt;/h3&gt;

&lt;p&gt;Uses &lt;a href="https://github.com/dsherret/ts-morph" rel="noopener noreferrer"&gt;ts-morph&lt;/a&gt; for full AST parsing. This is the most capable parser with two features the others don't have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Function-reference following.&lt;/strong&gt; If your handler calls a helper function that accepts &lt;code&gt;DurableContext&lt;/code&gt;, the parser resolves the call and inlines the helper's durable primitives at the call site. This only works for helpers defined in the same file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Registry key resolution.&lt;/strong&gt; For &lt;code&gt;context.parallel()&lt;/code&gt; calls that use &lt;code&gt;.map()&lt;/code&gt; over a module-scope registry object, the parser enumerates the registry keys to show all possible parallel branches. This is how the purchasing coordinator's five specialists appear in the diagram even though the code dispatches them dynamically.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz examples/order_processor.py &lt;span class="nt"&gt;--direction&lt;/span&gt; LR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn527zxi3b39y26ozn6tq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn527zxi3b39y26ozn6tq.png" alt="Python order processor workflow diagram" width="800" height="115"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finds &lt;code&gt;@durable_execution&lt;/code&gt; decorated handlers and extracts &lt;code&gt;context.&amp;lt;method&amp;gt;()&lt;/code&gt; calls. Uses indentation to determine block boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Java (preview)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz Handler.java &lt;span class="nt"&gt;--open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finds classes extending &lt;code&gt;DurableHandler&lt;/code&gt; and extracts &lt;code&gt;ctx.&amp;lt;method&amp;gt;()&lt;/code&gt; calls from the &lt;code&gt;handleRequest&lt;/code&gt; method. Some primitives (&lt;code&gt;parallel&lt;/code&gt;, &lt;code&gt;waitForCallback&lt;/code&gt;, &lt;code&gt;waitForCondition&lt;/code&gt;) are still in development in the Java durable execution SDK.&lt;/p&gt;

&lt;p&gt;Both the Python and Java parsers use regex rather than full AST parsing. This keeps the tool as a single Node.js package without requiring Python or Java parser dependencies. The trade-off: standard single-line call patterns work well, but method calls split across many lines or unusual argument formatting may not be detected. For most idiomatic durable function code, it works without issues.&lt;/p&gt;
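&lt;p&gt;To make that trade-off concrete, here is a simplified single-line matcher in the spirit of those parsers. The pattern and function name are illustrative only, not the tool's actual regexes:&lt;/p&gt;

```typescript
// Illustrative only: a simplified matcher in the spirit of the regex-based
// parsers (not durable-viz's actual pattern). Detects single-line calls like
//   result = context.step("validate", validate_order)
// capturing the method name and the quoted step name.
const CONTEXT_CALL = /\b(?:context|ctx)\.(\w+)\(\s*["']([^"']+)["']/

function detectPrimitive(line: string): { method: string; name: string } | null {
  const m = CONTEXT_CALL.exec(line)
  return m ? { method: m[1], name: m[2] } : null
}
```

&lt;p&gt;A call whose arguments continue on the next line never puts the opening quote on the same line as the receiver, so a line-oriented pattern like this misses it. That is exactly the limitation described above.&lt;/p&gt;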

&lt;h2&gt;
  
  
  Supported Primitives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Primitive&lt;/th&gt;
&lt;th&gt;TypeScript&lt;/th&gt;
&lt;th&gt;Python&lt;/th&gt;
&lt;th&gt;Java (preview)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Step&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.step()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.step()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ctx.step()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Invoke&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.invoke()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.invoke()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ctx.invoke()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.parallel()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.parallel()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;in development&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Map&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.map()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.map()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;in development&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wait&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.wait()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.wait()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ctx.wait()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wait for Callback&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.waitForCallback()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.wait_for_callback()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;in development&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create Callback&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.createCallback()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.create_callback()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ctx.createCallback()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wait for Condition&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.waitForCondition()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.wait_for_condition()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;in development&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Child Context&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.runInChildContext()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;context.run_in_child_context()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ctx.runInChildContext()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;TypeScript also detects &lt;code&gt;context.promise.all()&lt;/code&gt;, &lt;code&gt;context.promise.any()&lt;/code&gt;, &lt;code&gt;context.promise.race()&lt;/code&gt;, and &lt;code&gt;context.promise.allSettled()&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual encoding
&lt;/h3&gt;

&lt;p&gt;Each primitive type has a distinct shape and color:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node&lt;/th&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;Color&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Start / End&lt;/td&gt;
&lt;td&gt;Stadium&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Step&lt;/td&gt;
&lt;td&gt;Rectangle&lt;/td&gt;
&lt;td&gt;Green&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Invoke&lt;/td&gt;
&lt;td&gt;Trapezoid&lt;/td&gt;
&lt;td&gt;Amber&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel / Map&lt;/td&gt;
&lt;td&gt;Hexagon&lt;/td&gt;
&lt;td&gt;Purple&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wait / Callback&lt;/td&gt;
&lt;td&gt;Circle&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Condition&lt;/td&gt;
&lt;td&gt;Diamond&lt;/td&gt;
&lt;td&gt;Indigo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Child Context&lt;/td&gt;
&lt;td&gt;Subroutine&lt;/td&gt;
&lt;td&gt;Teal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The tool performs static analysis on your source file. It never imports, executes, or deploys your code.&lt;/p&gt;

&lt;p&gt;The architecture is a three-stage pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[source file] → Parser → WorkflowGraph → Renderer → [output]
                  │                          │
          TypeScript / Python / Java    Mermaid / JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The parser interface
&lt;/h3&gt;

&lt;p&gt;Adding a new language means implementing two methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Parser&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;extensions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nf"&gt;parseFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;ParseOptions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;WorkflowGraph&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;extensions&lt;/code&gt; declares which file types the parser handles. &lt;code&gt;parseFile&lt;/code&gt; takes a file path and returns a &lt;code&gt;WorkflowGraph&lt;/code&gt; with nodes, edges, and source line numbers. The dispatcher selects the right parser by file extension.&lt;/p&gt;
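&lt;p&gt;A minimal sketch of that dispatch, assuming nothing beyond the interface above (the function name &lt;code&gt;selectParser&lt;/code&gt; and the simplified types are this sketch's inventions, not durable-viz internals):&lt;/p&gt;

```typescript
import * as path from 'path'

// Simplified restatement of the interface from the post.
interface WorkflowGraph { nodes: unknown[] }
interface Parser {
  extensions: string[]
  parseFile(filePath: string, options?: unknown): WorkflowGraph
}

// Pick the first parser whose declared extensions include the file's.
function selectParser(parsers: Parser[], filePath: string): Parser {
  const ext = path.extname(filePath) // '.ts', '.py', '.java', ...
  const parser = parsers.find((p) => p.extensions.includes(ext))
  if (!parser) throw new Error(`No parser registered for ${ext} files`)
  return parser
}
```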

&lt;h3&gt;
  
  
  The graph model
&lt;/h3&gt;

&lt;p&gt;The parser produces a &lt;code&gt;WorkflowGraph&lt;/code&gt;: an ordered list of nodes with branches (for parallel blocks) and metadata (for conditionals). Here's a simplified view of what each node looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;WorkflowNode&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;start&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;end&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;step&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;invoke&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;parallel&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;map&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wait&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;waitForCallback&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;condition&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt;  &lt;span class="c1"&gt;// maps to the primitives table above&lt;/span&gt;
  &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;branches&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;WorkflowBranch&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;   &lt;span class="c1"&gt;// for parallel/map nodes&lt;/span&gt;
  &lt;span class="nx"&gt;thenCount&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;            &lt;span class="c1"&gt;// for conditions: nodes in the then-branch&lt;/span&gt;
  &lt;span class="nx"&gt;thenReturns&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;         &lt;span class="c1"&gt;// for conditions: does then-branch return?&lt;/span&gt;
  &lt;span class="nx"&gt;sourceLine&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;           &lt;span class="c1"&gt;// 1-based line number for click-to-navigate&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The edge builder constructs edges from the node list, handling sequential flow, parallel fan-out/fan-in, and conditional routing.&lt;/p&gt;

&lt;p&gt;For conditionals, the parser tracks whether the &lt;code&gt;if&lt;/code&gt; block ends with a &lt;code&gt;return&lt;/code&gt;. If it does, the "yes" branch connects to End instead of falling through. The "no" branch skips the conditional block and continues to the next node.&lt;/p&gt;
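&lt;p&gt;That routing rule can be sketched against the &lt;code&gt;WorkflowNode&lt;/code&gt; fields. This is a hedged sketch of the idea, not the edge builder's real code (&lt;code&gt;conditionEdges&lt;/code&gt; and the &lt;code&gt;Edge&lt;/code&gt; shape are made up here):&lt;/p&gt;

```typescript
interface Edge { from: string; to: string; label?: string }

// Build edges for one conditional: "yes" enters the then-branch, "no"
// skips past it; the then-branch exits to End when it returns, otherwise
// it falls through to the node after the conditional.
function conditionEdges(
  condId: string,
  thenEntryId: string, // first node inside the if-block
  thenExitId: string,  // last node inside the if-block
  nextId: string,      // first node after the if-block
  thenReturns: boolean
): Edge[] {
  return [
    { from: condId, to: thenEntryId, label: 'yes' },
    { from: condId, to: nextId, label: 'no' },
    { from: thenExitId, to: thenReturns ? 'End' : nextId },
  ]
}
```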

&lt;h3&gt;
  
  
  Registry key resolution in practice
&lt;/h3&gt;

&lt;p&gt;Here's a pattern from the purchasing coordinator. Don't worry about the specifics of the durable function code. The key thing is that the parallel branches are built dynamically at runtime from a registry object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SPECIALISTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;price-research&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PRICE_RESEARCH_FUNCTION&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Price Research&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;financing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FINANCING_FUNCTION&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Financing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// ... 3 more&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;specialists&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;specialists&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.map()&lt;/code&gt; call means the parallel branches are determined at runtime. The parser can't execute the code, but it can look at the &lt;code&gt;SPECIALISTS&lt;/code&gt; object and enumerate its keys. It finds five keys, creates five invoke branches, and labels them with the key names. This is how the diagram shows all five specialists even though the code builds the branch list dynamically.&lt;/p&gt;

&lt;p&gt;If the parser can't resolve the registry (the object is imported from another file, or the pattern doesn't match), it falls back to showing a single representative branch.&lt;/p&gt;

&lt;p&gt;If you point the tool at a file that isn't a durable function handler, or a file with syntax errors, it exits with a clear error message. The VS Code extension shows the error in the webview panel instead of a diagram.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Static analysis over runtime tracing
&lt;/h3&gt;

&lt;p&gt;The main design choice: parse the code, don't execute it. Runtime tracing would give you the actual execution path for a specific input, but it requires deployment, credentials, and a real invocation. Static analysis gives you all possible paths from the source alone. You see every parallel branch, every conditional route, every callback. The trade-off is that dynamic branches (like the specialist &lt;code&gt;.map()&lt;/code&gt;) require heuristics to resolve.&lt;/p&gt;

&lt;p&gt;For documentation and code review, seeing all possible paths is usually more useful than seeing one specific execution. For debugging a specific run, the durable execution history API (&lt;code&gt;get-durable-execution&lt;/code&gt;) is the right tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Language-agnostic graph model
&lt;/h3&gt;

&lt;p&gt;The parsers are language-specific. The graph model, edge builder, and renderers are not. Adding a new language means writing a parser that produces &lt;code&gt;WorkflowGraph&lt;/code&gt; nodes. Everything downstream is shared. The TypeScript parser uses ts-morph for full AST analysis. The Python and Java parsers use regex, which handles standard patterns well but can miss unusual formatting. The regex approach was a deliberate trade-off: full AST parsing for Python would require a Python parser dependency, and Java would need a Java parser. Regex keeps the tool as a single Node.js package.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual encoding for primitive types
&lt;/h3&gt;

&lt;p&gt;Each primitive type gets a unique shape and color combination so you can identify the primitive at a glance without reading labels. Steps are green rectangles (the most common node). Invocations are amber trapezoids (they call out to external functions). Parallel blocks are purple hexagons (they branch). Callbacks are red circles (they suspend execution). Conditionals are indigo diamonds (standard flowchart convention). The color palette is optimized for dark backgrounds since most developer tools use dark themes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;h3&gt;
  
  
  You'll need:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Run against the examples
&lt;/h3&gt;

&lt;p&gt;Clone the repo and run against the included examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gunnargrosch/durable-viz.git
&lt;span class="nb"&gt;cd &lt;/span&gt;durable-viz

&lt;span class="c"&gt;# TypeScript order workflow&lt;/span&gt;
npx durable-viz examples/order-workflow.ts &lt;span class="nt"&gt;--open&lt;/span&gt;

&lt;span class="c"&gt;# Python order processor&lt;/span&gt;
npx durable-viz examples/order_processor.py &lt;span class="nt"&gt;--open&lt;/span&gt;

&lt;span class="c"&gt;# Java order processor&lt;/span&gt;
npx durable-viz examples/OrderProcessor.java &lt;span class="nt"&gt;--open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run against the purchasing coordinator
&lt;/h3&gt;

&lt;p&gt;If you have the &lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing" rel="noopener noreferrer"&gt;multi-agent purchasing demo&lt;/a&gt; cloned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;durable-multi-agent-purchasing
npx durable-viz src/handlers/coordinator.ts &lt;span class="nt"&gt;--open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run against your own handler
&lt;/h3&gt;

&lt;p&gt;For your own durable function handlers, &lt;code&gt;npx&lt;/code&gt; downloads and runs the tool directly. No cloning needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx durable-viz path/to/your-handler.ts &lt;span class="nt"&gt;--open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install the VS Code extension
&lt;/h3&gt;

&lt;p&gt;Search &lt;strong&gt;"Durable Viz"&lt;/strong&gt; in the Extensions panel, or run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ext &lt;span class="nb"&gt;install &lt;/span&gt;gunnargrosch.durable-viz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open a durable function handler, open the command palette, and run &lt;strong&gt;Durable Viz: Open Lambda Durable Function Workflow&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/durable-viz" rel="noopener noreferrer"&gt;durable-viz on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/durable-viz" rel="noopener noreferrer"&gt;durable-viz on npm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://marketplace.visualstudio.com/items?itemName=gunnargrosch.durable-viz" rel="noopener noreferrer"&gt;VS Code Marketplace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/gunnargrosch/multi-agent-systems-on-aws-lambda-with-durable-functions-2gg3"&gt;Multi-Agent Systems on AWS Lambda with Durable Functions&lt;/a&gt;: The purchasing coordinator used as the primary example&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3"&gt;AWS Lambda Durable Functions: Building Long-Running Workflows in Code&lt;/a&gt;: Durable execution primitives and the support triage demo&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" rel="noopener noreferrer"&gt;AWS Lambda Durable Functions documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run &lt;code&gt;npx durable-viz&lt;/code&gt; against your handler and share the diagram. I'd love to see what your workflows look like!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Multi-Agent Systems on AWS Lambda with Durable Functions</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Wed, 25 Mar 2026 13:27:42 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/multi-agent-systems-on-aws-lambda-with-durable-functions-2gg3</link>
      <guid>https://dev.to/gunnargrosch/multi-agent-systems-on-aws-lambda-with-durable-functions-2gg3</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/gunnargrosch/building-multi-agent-systems-with-risen-prompts-and-strands-agents-52bd"&gt;previous post on multi-agent systems&lt;/a&gt;, I built a purchasing coordinator where a coordinator agent routes requests to specialist agents based on &lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;RISEN&lt;/a&gt; prompt contracts. RISEN structures system prompts into five components: Role, Instructions, Steps, Expectation, and Narrowing. In a multi-agent system, the Steps section encodes the routing logic and the Narrowing section prevents agents from doing each other's work. A laptop triggers Price Research and Delivery. A used car triggers all five specialists. The routing logic lives in the prompts, not in code. It works, but it runs in a single process. No fault isolation, no independent scaling, no durability. If the process crashes halfway through specialist consultations, you start over.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3"&gt;durable functions post&lt;/a&gt; solved the durability problem for a support ticket triage workflow: checkpoint each step, suspend for human review, resume where you left off. But that was a single-agent workflow.&lt;/p&gt;

&lt;p&gt;This post combines the two. The same purchasing coordinator, deployed to AWS Lambda, with each specialist as its own Lambda function. The coordinator is a durable function that checkpoints every specialist call. If it's interrupted after consulting three of five specialists, it resumes from the fourth. When a high-value purchase needs human approval, the function suspends, compute charges stop, and it picks up exactly where it left off when the approver responds.&lt;/p&gt;

&lt;p&gt;The two SDKs have distinct roles. The &lt;a href="https://github.com/strands-agents/sdk-typescript" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt; handles the AI reasoning: the planning agent decides which specialists to call, and the synthesis agent produces the recommendation. The &lt;a href="https://github.com/aws/aws-durable-execution-sdk-js" rel="noopener noreferrer"&gt;durable execution SDK&lt;/a&gt; handles the infrastructure: checkpointing, parallel dispatch, suspension, and replay. The coordinator uses both. The specialists only use Strands.&lt;/p&gt;

&lt;p&gt;The complete source code is on GitHub: &lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing" rel="noopener noreferrer"&gt;github.com/gunnargrosch/durable-multi-agent-purchasing&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes from the In-Process Demo
&lt;/h2&gt;

&lt;p&gt;The multi-agent post used &lt;code&gt;tool()&lt;/code&gt; callbacks for specialist dispatch: each specialist was defined as a Strands tool, and the coordinator agent called them as functions within the same process. That's the simplest possible architecture, and it's fine for development. Here's what changes when you deploy:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;In-Process (previous post)&lt;/th&gt;
&lt;th&gt;Lambda + Durable (this post)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Specialist invocation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;tool()&lt;/code&gt; callback, in-process&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;context.invoke()&lt;/code&gt;, separate Lambda function&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sequential (one specialist at a time)&lt;/td&gt;
&lt;td&gt;Parallel via &lt;code&gt;context.parallel()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fault tolerance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None. Process crash = restart&lt;/td&gt;
&lt;td&gt;Checkpointed. Resume from last completed step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single process&lt;/td&gt;
&lt;td&gt;Each specialist scales independently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Human-in-the-loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;waitForCallback()&lt;/code&gt; with zero-cost suspension&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shared process permissions&lt;/td&gt;
&lt;td&gt;Per-specialist least-privilege policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Routing visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time hook as each tool is called&lt;/td&gt;
&lt;td&gt;Checkpointed plan step with routing summary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm start&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SAM template, 6 Lambda functions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RISEN prompts carry over with only minor adjustments. The coordinator's prompt splits into two phases (plan and synthesis) instead of a single invocation, because the durable function needs to checkpoint the plan before dispatching specialists. The specialist prompts are unchanged.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;If you haven't read the &lt;a href="https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3"&gt;durable functions post&lt;/a&gt;, here's the key mental model: durable functions use checkpoint and replay. Your handler re-executes from the top on every resume, but completed steps return their cached results instantly without re-executing. New work picks up from where it left off. The SDK manages this transparently. You write sequential code and the infrastructure handles the rest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[CoordinatorFunction] — Lambda Durable Function (Sonnet, 1536MB)
  ├→ context.step('plan')            → Planning agent selects specialists
  ├→ context.parallel('specialists') → Runs selected specialists concurrently:
  │     ├─ context.invoke(PriceResearchFunction)  (Haiku)   ← always
  │     ├─ context.invoke(FinancingFunction)       (Haiku)   ← if value &amp;gt; $5K
  │     ├─ context.invoke(DeliveryFunction)        (Haiku)   ← if physical product
  │     ├─ context.invoke(RiskAssessmentFunction)  (Sonnet)  ← if value &amp;gt; $10K / used
  │     └─ context.invoke(ContractReviewFunction)  (Haiku)   ← if subscription/lease
  ├→ context.step('synthesize')      → Synthesis agent combines findings
  └→ context.waitForCallback()       → Human approval (when requireApproval=true)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;context.invoke()&lt;/code&gt;&lt;/strong&gt; is a new primitive not covered in the durable functions post. It's the SDK's built-in method for calling other Lambda functions. It checkpoints the result automatically and suspends the coordinator while waiting, so you don't pay for compute during specialist execution. Despite the SDK's API reference describing it as invoking "another durable function," &lt;code&gt;context.invoke()&lt;/code&gt; works with any Lambda function. The specialists here are standard functions without &lt;code&gt;DurableConfig&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;context.step()&lt;/code&gt;&lt;/strong&gt; wraps the planning and synthesis phases with retry strategies, just like the Bedrock calls in the support triage demo. Each step checkpoints its result. On replay, it returns the cached result without re-executing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;context.parallel()&lt;/code&gt;&lt;/strong&gt; wraps the specialist invocations so they run concurrently, each independently checkpointed. If specialist 3 of 5 fails, the other four results are preserved.&lt;/li&gt;
&lt;/ol&gt;
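&lt;p&gt;The checkpoint-and-replay model is easy to sketch in miniature. This is not the SDK's implementation, just an illustration of why re-running the handler is cheap: completed steps return cached results instead of executing again.&lt;/p&gt;

```typescript
// Minimal replay sketch: a checkpoint store plus a step() wrapper.
const checkpoints = new Map()
let executed = 0  // counts real executions, not cached replays

function step(name: string, fn: () => string): string {
  if (checkpoints.has(name)) return checkpoints.get(name)  // replay: cached result
  const result = fn()                                      // first run: execute
  checkpoints.set(name, result)                            // checkpoint the result
  executed = executed + 1
  return result
}

// The handler is written as plain sequential code.
function handler() {
  const plan = step('plan', () => 'selected specialists')
  const summary = step('synthesize', () => 'combined findings')
  return { plan, summary }
}

handler()  // first run: both steps execute and checkpoint (executed is 2)
handler()  // "resume": the handler re-runs, but both steps replay from cache
```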

&lt;h2&gt;
  
  
  The SAM Template
&lt;/h2&gt;

&lt;p&gt;Here are the key parts of the coordinator's definition in the &lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing/blob/main/template.yaml" rel="noopener noreferrer"&gt;SAM template&lt;/a&gt;. The template defines 6 functions total: the coordinator plus 5 specialists that follow the same pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;CoordinatorFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
  &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coordinator.handler&lt;/span&gt;
    &lt;span class="na"&gt;MemorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1536&lt;/span&gt;
    &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
    &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;COORDINATOR_MODEL_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s"&gt;global.${CoordinatorModelId}&lt;/span&gt;
        &lt;span class="na"&gt;PRICE_RESEARCH_FUNCTION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;PriceResearchFunction&lt;/span&gt;
        &lt;span class="c1"&gt;# ... remaining specialist function references&lt;/span&gt;
    &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::aws:policy/service-role/AWSLambdaBasicDurableExecutionRolePolicy&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;bedrock:InvokeModel&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;bedrock:InvokeModelWithResponseStream&lt;/span&gt;
            &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Bedrock model + inference profile ARNs&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambda:InvokeFunction&lt;/span&gt;
            &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;PriceResearchFunction.Arn&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;FinancingFunction.Arn&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;DeliveryFunction.Arn&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;RiskAssessmentFunction.Arn&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;ContractReviewFunction.Arn&lt;/span&gt;
    &lt;span class="na"&gt;DurableConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;ExecutionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;86400&lt;/span&gt;
      &lt;span class="na"&gt;RetentionPeriodInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;7&lt;/span&gt;
    &lt;span class="na"&gt;AutoPublishAlias&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;live&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No API Gateway, no function URLs, no public endpoints.&lt;/strong&gt; The multi-agent post previewed two deployment options: Lambda with HTTP endpoints, or AgentCore containers. This implementation takes a simpler route. &lt;code&gt;context.invoke()&lt;/code&gt; calls specialists directly via the Lambda API. The coordinator's IAM policy grants &lt;code&gt;lambda:InvokeFunction&lt;/code&gt; on each specialist ARN. Specialists are unreachable from outside the coordinator's execution role.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared handler, different prompts.&lt;/strong&gt; All five specialists use the same &lt;code&gt;specialist.ts&lt;/code&gt; handler. The &lt;code&gt;PROMPT_NAME&lt;/code&gt; environment variable selects which RISEN prompt to load. The template ends up repetitive but predictable: each specialist block differs only in its name, &lt;code&gt;PROMPT_NAME&lt;/code&gt;, and model ID.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-specialist model selection.&lt;/strong&gt; Risk Assessment gets &lt;code&gt;AdvancedSpecialistModelId&lt;/code&gt; (Sonnet) for stronger reasoning. The other four get &lt;code&gt;SpecialistModelId&lt;/code&gt; (Haiku). Same pattern as the in-process demo, now enforced at the infrastructure level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;DurableConfig&lt;/code&gt; with 24-hour execution timeout.&lt;/strong&gt; The used car scenario with human approval can sit overnight. Each individual invocation is bounded by &lt;code&gt;Timeout: 300&lt;/code&gt; (the coordinator's per-replay limit). &lt;code&gt;ExecutionTimeout: 86400&lt;/code&gt; is the outer wall-clock limit across all replays and suspensions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;InvokeModelWithResponseStream&lt;/code&gt;&lt;/strong&gt; is included because the Strands SDK's &lt;code&gt;BedrockModel&lt;/code&gt; may use streaming internally for token generation. Without it, the coordinator would get &lt;code&gt;AccessDeniedException&lt;/code&gt; on agent invocations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ESM build.&lt;/strong&gt; The coordinator uses ESM (&lt;code&gt;Format: esm&lt;/code&gt;) with esbuild. The full template includes a &lt;code&gt;Banner&lt;/code&gt; that injects a &lt;code&gt;createRequire&lt;/code&gt; shim because some dependencies expect CommonJS &lt;code&gt;require&lt;/code&gt;. This is a known pattern for ESM Lambda functions with mixed dependencies. See the &lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing/blob/main/template.yaml" rel="noopener noreferrer"&gt;repo's template.yaml&lt;/a&gt; for the complete Metadata block.&lt;/li&gt;
&lt;/ul&gt;
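&lt;p&gt;The shared-handler pattern can be sketched like this (a simplified illustration, not the repo's actual &lt;code&gt;specialist.ts&lt;/code&gt;; the prompt strings and lookup are placeholders, and the real handler would construct a Strands agent from the selected prompt):&lt;/p&gt;

```typescript
// Hypothetical sketch: one handler file, with the PROMPT_NAME environment
// variable selecting which RISEN prompt this specialist loads.
const PROMPTS: { [name: string]: string } = {
  'price-research': 'Role: price research specialist. Narrowing: pricing only.',
  'financing': 'Role: financing specialist. Narrowing: financing only.',
}

export const handler = async (event: { prompt: string }) => {
  const promptName = process.env.PROMPT_NAME ?? 'price-research'
  const systemPrompt = PROMPTS[promptName]
  if (systemPrompt === undefined) {
    throw new Error('Unknown PROMPT_NAME: ' + promptName)
  }
  // Placeholder for the real agent invocation: build an agent with
  // systemPrompt and pass it event.prompt from the coordinator.
  return { response: systemPrompt + ' Received: ' + event.prompt }
}
```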

&lt;h2&gt;
  
  
  The Coordinator Handler
&lt;/h2&gt;

&lt;p&gt;The coordinator is the only durable function. Here's the handler skeleton showing the four durable phases. The &lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing/blob/main/src/handlers/coordinator.ts" rel="noopener noreferrer"&gt;full source&lt;/a&gt; includes types, the specialist registry, retry strategies, and routing summary logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;withDurableExecution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CoordinatorEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DurableContext&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="c1"&gt;// Phase 1: Plan — agent decides which specialists to consult&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;AnalysisPlan&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;plan&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;capturedPlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AnalysisPlan&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;planTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;create_analysis_plan&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;specialists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;price-research&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;financing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;delivery&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;risk-assessment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;contract-review&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
          &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;})),&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="na"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;capturedPlan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Plan created.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;planPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;planTool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="na"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;capturedPlan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Planning agent did not call create_analysis_plan&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;capturedPlan&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;retryStrategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bedrockRetry&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// Phase 2: Consult specialists in parallel via context.invoke()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parallel&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;specialists&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;specialists&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;specialist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;SPECIALISTS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
          &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Unknown specialist: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoke&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;display&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;SPECIALISTS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;display&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;display&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; unavailable: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;// Phase 3: Synthesize specialist findings into a recommendation&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;synthesize&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Build findings from checkpointed parallel results, invoke synthesis agent&lt;/span&gt;
    &lt;span class="c1"&gt;// (see full source for details)&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;retryStrategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bedrockRetry&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// Phase 4: Human approval (optional)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requireApproval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;approval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;waitForCallback&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ApprovalPayload&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;approval&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;approval_callback_created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;callbackId&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="na"&gt;serdes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;defaultSerdes&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;approval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rejected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requireApproval&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;approved&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's walk through what's happening in each phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Plan
&lt;/h3&gt;

&lt;p&gt;The planning phase is the biggest change from the in-process demo. In the previous post, the coordinator was a single agent that both decided which specialists to call and synthesized their findings. Here, planning and synthesis are separate agents wrapped in separate &lt;code&gt;context.step()&lt;/code&gt; calls.&lt;/p&gt;

&lt;p&gt;Why split them? Checkpointing. In the in-process demo, if the coordinator crashes after calling three specialists, you lose the routing decision and all three results. With durable functions, the plan is checkpointed as a single unit. If the function replays after the plan step, it returns the cached plan instantly without calling Bedrock again.&lt;/p&gt;

&lt;p&gt;The planning agent uses &lt;code&gt;tool()&lt;/code&gt; with a Zod schema to produce structured output. The &lt;code&gt;create_analysis_plan&lt;/code&gt; tool captures the plan into a closure variable, and the step returns it as its checkpointed result. If the agent doesn't call the tool, the step throws an error. The retry strategy will attempt it three times, but this is a non-transient failure: if the model didn't call the tool on the first attempt, retries won't help. After retries are exhausted, the execution fails.&lt;/p&gt;
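&lt;p&gt;The capture-and-throw pattern is small enough to sketch without the SDK. This is an illustrative stand-in, not the demo's code: &lt;code&gt;runPlanAgent&lt;/code&gt; and &lt;code&gt;planStep&lt;/code&gt; are hypothetical names, with the callback playing the role of the &lt;code&gt;create_analysis_plan&lt;/code&gt; tool:&lt;/p&gt;

```typescript
// Illustrative sketch of the capture-via-closure pattern (not the demo's code).
// runPlanAgent stands in for the Strands agent; the callback plays the role of
// the create_analysis_plan tool.
type Plan = { specialists: { name: string; prompt: string }[] }

// Simulates an agent run that may or may not call the plan tool
async function runPlanAgent(callTool: (plan: Plan) => void, modelCallsTool: boolean) {
  if (modelCallsTool) {
    callTool({ specialists: [{ name: 'price-research', prompt: 'Compare laptop prices' }] })
  }
}

async function planStep(modelCallsTool: boolean): Promise<Plan> {
  let captured: Plan | null = null // closure variable the tool writes into
  await runPlanAgent((plan) => { captured = plan }, modelCallsTool)
  if (!captured) {
    // Non-transient: if the model skipped the tool once, retries rarely help,
    // so after the retry budget is spent the execution fails
    throw new Error('planning agent did not call create_analysis_plan')
  }
  return captured // becomes the step's checkpointed result
}
```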

&lt;h3&gt;
  
  
  Phase 2: Parallel specialists
&lt;/h3&gt;

&lt;p&gt;This is where &lt;code&gt;context.invoke()&lt;/code&gt; replaces &lt;code&gt;tool()&lt;/code&gt; callbacks. Each specialist invocation is a branch inside &lt;code&gt;context.parallel()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoke&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;// step name for checkpoint history&lt;/span&gt;
  &lt;span class="nx"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Lambda function name from environment&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// payload&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;context.invoke()&lt;/code&gt; calls the specialist Lambda function directly, checkpoints the result, and suspends the coordinator while waiting. You don't pay for coordinator compute while specialists are executing. On replay, completed invocations return their cached results without re-invoking the specialist.&lt;/p&gt;

&lt;p&gt;Each branch has a try/catch for graceful degradation. If the Delivery specialist times out, the coordinator still gets results from Price Research, Financing, Risk Assessment, and Contract Review. The synthesis agent notes the gap and advises the buyer to investigate delivery independently. This is a meaningful improvement over the in-process demo, where a failed specialist tool call could derail the entire coordinator.&lt;/p&gt;
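&lt;p&gt;Stripped of the SDK, the branch pattern looks like this. &lt;code&gt;fetchSpecialist&lt;/code&gt; is a mock standing in for &lt;code&gt;ctx.invoke()&lt;/code&gt;, with one branch failing on purpose:&lt;/p&gt;

```typescript
// Standalone sketch of per-branch graceful degradation; fetchSpecialist is a
// mock standing in for ctx.invoke(), and the names here are illustrative.
type Finding = { name: string; response: string }

// Simulated specialist call: one branch fails, the rest succeed
async function fetchSpecialist(name: string): Promise<string> {
  if (name === 'delivery') throw new Error('timed out')
  return `${name} analysis complete`
}

async function runBranch(name: string): Promise<Finding> {
  try {
    return { name, response: await fetchSpecialist(name) }
  } catch (err) {
    // Graceful degradation: record the gap instead of failing the workflow
    const msg = err instanceof Error ? err.message : 'unknown error'
    return { name, response: `[${name} unavailable: ${msg}]` }
  }
}

// Every branch settles, so synthesis sees four findings plus one gap marker
async function gatherFindings(names: string[]): Promise<Finding[]> {
  return Promise.all(names.map(runBranch))
}
```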

&lt;h3&gt;
  
  
  Phase 3: Synthesize
&lt;/h3&gt;

&lt;p&gt;The synthesis agent receives all specialist findings and produces a structured recommendation. Like the plan step, the entire synthesis is one checkpointed unit. The synthesis prompt's Narrowing section prevents it from overriding specialist findings or filling in gaps from failed specialists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4: Human approval
&lt;/h3&gt;

&lt;p&gt;When &lt;code&gt;requireApproval&lt;/code&gt; is true (the used car scenario sets this), the function suspends at &lt;code&gt;waitForCallback&lt;/code&gt;. Compute charges stop. The approver reviews the recommendation and sends a callback via the Lambda API or the demo's interactive prompt. The function resumes and returns the final status.&lt;/p&gt;

&lt;p&gt;Note the &lt;code&gt;serdes: defaultSerdes&lt;/code&gt; option on the callback. As covered in the &lt;a href="https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3"&gt;durable functions post's gotchas&lt;/a&gt;, &lt;code&gt;waitForCallback&lt;/code&gt; defaults to passthrough serialization (not &lt;code&gt;JSON.parse&lt;/code&gt;). Without &lt;code&gt;defaultSerdes&lt;/code&gt;, &lt;code&gt;approval.approved&lt;/code&gt; would be &lt;code&gt;undefined&lt;/code&gt; at runtime even though TypeScript thinks it's a boolean.&lt;/p&gt;
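&lt;p&gt;A plain-values sketch shows why this bites at runtime. No SDK involved; the raw string simulates the payload the callback delivers:&lt;/p&gt;

```typescript
// Plain-values sketch of the serdes gotcha; no SDK involved. The raw string
// simulates what the callback delivers.
const rawPayload = '{"approved":true,"approver":"alice"}'

// Passthrough (the default): the value handed to your code is still a string,
// even though TypeScript may believe it is an ApprovalPayload
const passthrough = rawPayload as any
const approvedWrong = passthrough.approved // undefined: strings have no .approved

// defaultSerdes (JSON round-trip): the payload is parsed into the object
const parsed = JSON.parse(rawPayload)
const approvedRight = parsed.approved // true
```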

&lt;h2&gt;
  
  
  The Specialist Handler
&lt;/h2&gt;

&lt;p&gt;All five specialists share one handler. The &lt;code&gt;PROMPT_NAME&lt;/code&gt; environment variable selects the behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promptName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PROMPT_NAME&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promptName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;SPECIALIST_MODEL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SpecialistEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Context&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;makeLogger&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;promptName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;awsRequestId&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadSpecialistTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promptName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Specialists are standard Lambda functions, not durable. They don't need checkpointing because each one completes in a single invocation (a Bedrock call and response processing). The coordinator's &lt;code&gt;context.invoke()&lt;/code&gt; handles the durability: if a specialist invocation times out, the coordinator can retry from the checkpoint without re-running specialists that already succeeded.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;loadSpecialistTools&lt;/code&gt; returns specialist-specific tools (see &lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing/blob/main/src/lib/specialist-tools.ts" rel="noopener noreferrer"&gt;&lt;code&gt;src/lib/specialist-tools.ts&lt;/code&gt;&lt;/a&gt;). Most specialists are pure reasoning (no tools). The Price Research specialist has a &lt;code&gt;save_price_snapshot&lt;/code&gt; tool that logs structured price data. In production, that tool could write to DynamoDB or call a pricing API. The coordinator never sees these tools. They're scoped to each specialist's domain.&lt;/p&gt;
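&lt;p&gt;A hypothetical sketch of that scoping. The real implementation builds Strands &lt;code&gt;tool()&lt;/code&gt; instances (and takes a logger); the simplified &lt;code&gt;Tool&lt;/code&gt; shape below only illustrates the lookup-by-prompt-name idea:&lt;/p&gt;

```typescript
// Hypothetical sketch of per-specialist tool scoping. The real implementation
// (src/lib/specialist-tools.ts) builds Strands tool() instances; the Tool
// shape here is simplified for illustration.
type Tool = { name: string; invoke: (input: unknown) => string }

const savePriceSnapshot: Tool = {
  name: 'save_price_snapshot',
  // Logs structured price data; in production this could write to DynamoDB
  invoke: (input) => `snapshot recorded: ${JSON.stringify(input)}`,
}

// Tools are keyed by prompt name, so each specialist sees only its own domain
const TOOLS_BY_SPECIALIST: Record<string, Tool[]> = {
  'price-research': [savePriceSnapshot],
  // financing, delivery, risk-assessment, contract-review: pure reasoning
}

function loadSpecialistTools(promptName: string): Tool[] {
  return TOOLS_BY_SPECIALIST[promptName] ?? []
}
```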

&lt;h2&gt;
  
  
  The Prompt Split
&lt;/h2&gt;

&lt;p&gt;The in-process demo had one coordinator prompt with routing in the Steps section. The durable version splits this into two prompts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;coordinator-plan&lt;/code&gt;&lt;/strong&gt; handles routing. Here are the Steps and Expectation sections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Steps
1. Read the purchase request and identify what is being purchased, its likely category
   (vehicle, electronics, real estate, software, etc.), the approximate value range,
   and any special circumstances mentioned or implied (used/secondhand, financing needed,
   physical delivery, contract or subscription, high value).
2. Select specialists using these routing rules:
   - Always include price-research to compare options and assess market value.
   - Include financing if the estimated value exceeds $5,000, or if financing or a loan
     is mentioned or implied.
   - Include delivery if the item is a tangible physical product that must be physically
     received — electronics, appliances, vehicles, furniture, machinery. Vehicles always
     need delivery planning (transport, pickup, or test drive logistics). Do not include
     for purely digital purchases (software, SaaS subscriptions, downloadable content).
   - Include contract-review if the purchase involves a subscription, lease, warranty
     agreement, service contract, purchase agreement, or any multi-year financial
     commitment. Vehicle and real estate purchases always involve contracts.
   - Include risk-assessment if the estimated value exceeds $10,000, the item is used or
     secondhand, or the category carries known risk (vehicles, real estate, machinery).
     Do not include for new electronics, appliances, or standard retail under $10,000.
3. For each selected specialist, write a focused prompt describing what to analyze about
   this specific purchase. Include relevant details from the request (item, price,
   condition, location, urgency). The specialist's own system prompt defines its role —
   do not repeat the role description in your prompt.
4. Call create_analysis_plan exactly once with the selected specialists and their prompts.

# Expectation
A single call to create_analysis_plan containing:
- An array of specialists, each with a name (from the allowed set) and a prompt string.
- Only specialists whose routing criteria are met.
- Prompts that are specific to this purchase, not generic templates.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the same routing rules from the original coordinator prompt in the multi-agent post. The difference: instead of calling specialist tools directly (Steps 2-6 in the original said "invoke the research_prices tool," "invoke the evaluate_financing tool"), the plan agent calls &lt;code&gt;create_analysis_plan&lt;/code&gt; once with all selected specialists. The actual dispatch happens via &lt;code&gt;context.invoke()&lt;/code&gt; in Phase 2.&lt;/p&gt;
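&lt;p&gt;For a sanity check, the routing criteria can be restated as deterministic code. The demo delegates this judgment to the planning agent; the &lt;code&gt;Request&lt;/code&gt; shape and &lt;code&gt;selectSpecialists&lt;/code&gt; below are hypothetical, mirroring the prompt's rules:&lt;/p&gt;

```typescript
// The routing rules restated as deterministic code, for illustration only.
// In the demo this decision is made by the planning agent; the Request shape
// and selectSpecialists are hypothetical.
type Request = {
  value: number // estimated purchase value in USD
  category: 'vehicle' | 'electronics' | 'real-estate' | 'software' | 'other'
  used?: boolean // secondhand item
  physical?: boolean // tangible item that must be received (vehicles always are)
  hasContract?: boolean // subscription, lease, warranty, or purchase agreement
  financingMentioned?: boolean
}

function selectSpecialists(req: Request): string[] {
  const picks = ['price-research'] // always included
  if (req.value > 5000 || req.financingMentioned) picks.push('financing')
  if (req.physical) picks.push('delivery') // skip purely digital purchases
  if (req.hasContract || req.category === 'vehicle' || req.category === 'real-estate') {
    picks.push('contract-review')
  }
  const riskyCategory = req.category === 'vehicle' || req.category === 'real-estate'
  if (req.value > 10000 || req.used || riskyCategory) picks.push('risk-assessment')
  return picks
}
```

&lt;p&gt;Applied to the three demo scenarios, this selects Price Research plus Delivery for the laptop, all five specialists for the used car, and Price Research plus Contract Review for the SaaS subscription.&lt;/p&gt;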

&lt;p&gt;&lt;strong&gt;&lt;code&gt;coordinator-synthesis&lt;/code&gt;&lt;/strong&gt; handles the final recommendation. It receives specialist findings and produces the buyer-facing output. Its Narrowing section prevents it from contradicting specialists or inventing analysis to cover for a specialist that failed.&lt;/p&gt;

&lt;p&gt;This split means the coordinator makes two agent invocations (plan + synthesize) instead of one. Each agent invocation may involve multiple Bedrock round-trips internally as the Strands SDK handles reasoning. The trade-off is worth it: the plan is checkpointed before any specialist is called, and the synthesis is checkpointed after all specialists complete. On replay, both return cached results instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing with the Local Runner
&lt;/h2&gt;

&lt;p&gt;The durable functions post showed &lt;code&gt;LocalDurableTestRunner&lt;/code&gt; for the support triage workflow. The multi-agent demo adds a new pattern: &lt;code&gt;registerFunction&lt;/code&gt; for mocking &lt;code&gt;context.invoke()&lt;/code&gt; targets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Register mock handlers for each specialist Lambda function&lt;/span&gt;
&lt;span class="nx"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LocalDurableTestRunner&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;handlerFunction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;runner&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PriceResearchFunction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;specialistHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;FinancingFunction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;specialistHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DeliveryFunction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;specialistHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;RiskAssessmentFunction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;specialistHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ContractReviewFunction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;specialistHandler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Run the full workflow and verify the result&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;laptopPayload&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getStatus&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SUCCEEDED&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getResult&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;called&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toContain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Price Research&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;registerFunction&lt;/code&gt; maps function names to local handlers. When the coordinator calls &lt;code&gt;context.invoke(stepName, "PriceResearchFunction", payload)&lt;/code&gt;, the test runner routes the invocation to the registered mock instead of calling Lambda. This lets you test the full checkpoint/replay lifecycle without deploying or calling Bedrock. The &lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing/blob/main/src/handlers/coordinator.test.ts" rel="noopener noreferrer"&gt;full test suite&lt;/a&gt; also tests callback suspension and resumption using &lt;code&gt;runner.getOperation()&lt;/code&gt; and &lt;code&gt;sendCallbackSuccess()&lt;/code&gt;, the same pattern from the durable functions post.&lt;/p&gt;

&lt;p&gt;The test suite covers seven scenarios: standard flow, all-5-specialists flow, approval, rejection, specialist failure with graceful degradation, callback failure, and planning agent failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try the Demo
&lt;/h2&gt;

&lt;p&gt;The repo includes an interactive demo with three purchase scenarios:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Specialists&lt;/th&gt;
&lt;th&gt;Approval&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;td&gt;Price Research + Delivery&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Used car&lt;/td&gt;
&lt;td&gt;$18,000&lt;/td&gt;
&lt;td&gt;All 5 specialists&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;waitForCallback&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SaaS subscription&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;Price Research + Contract Review&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;waitForCallback&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  You'll need:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 24+ and npm&lt;/li&gt;
&lt;li&gt;For cloud mode: &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html" rel="noopener noreferrer"&gt;AWS SAM CLI&lt;/a&gt; 1.153.1+ and Bedrock access to Claude Sonnet 4.6 and Claude Haiku 4.5&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Local mode (no AWS credentials needed)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gunnargrosch/durable-multi-agent-purchasing.git
&lt;span class="nb"&gt;cd &lt;/span&gt;durable-multi-agent-purchasing
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run demo:local &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;used-car
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Local mode uses mocked Bedrock responses. The used car scenario exercises all four durable primitives: &lt;code&gt;step&lt;/code&gt; (plan and synthesis), &lt;code&gt;invoke&lt;/code&gt; (specialist calls), &lt;code&gt;parallel&lt;/code&gt; (concurrent dispatch), and &lt;code&gt;waitForCallback&lt;/code&gt; (human approval where you play the purchase approver).&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud mode (real Bedrock responses)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam build
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;
npm run demo:cloud &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;used-car &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud mode invokes the deployed coordinator with real Bedrock calls. You'll see actual AI-generated specialist analyses and a synthesized recommendation. The demo polls execution history and prompts you when the approval callback is created.&lt;/p&gt;

&lt;p&gt;The demo uses direct &lt;code&gt;aws lambda invoke&lt;/code&gt; with &lt;code&gt;--invocation-type Event&lt;/code&gt; for simplicity. In production, the coordinator would typically sit behind an upstream service: an API Gateway endpoint receiving purchase requests, an EventBridge rule triggered by order events, or an SQS queue processing a backlog. The coordinator itself doesn't care how it's invoked. It receives the event payload and the durable SDK handles the rest.&lt;/p&gt;
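&lt;p&gt;As a concrete sketch, that direct invocation looks something like this (the payload shape here is illustrative, not the demo's exact event contract):&lt;br&gt;
&lt;/p&gt;

```shell
# Fire-and-forget invocation of the coordinator; the payload is illustrative.
aws lambda invoke \
  --function-name durable-multi-agent-purchasing-CoordinatorFunction \
  --invocation-type Event \
  --cli-binary-format raw-in-base64-out \
  --payload '{"ticket": "used-car"}' \
  response.json
```

&lt;p&gt;With &lt;code&gt;--invocation-type Event&lt;/code&gt;, the CLI returns immediately with a 202; the durable execution carries on in the background and you follow it through the execution history.&lt;/p&gt;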

&lt;h3&gt;
  
  
  Inspecting execution history
&lt;/h3&gt;

&lt;p&gt;After a cloud run, you can inspect the execution in the Lambda console under the &lt;strong&gt;Durable executions&lt;/strong&gt; tab, or via the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws lambda list-durable-executions-by-function &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--function-name&lt;/span&gt; durable-multi-agent-purchasing-CoordinatorFunction

aws lambda get-durable-execution &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--durable-execution-arn&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;arn-from-list&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution history shows each step's status and timing: when the plan completed, how long each specialist took, and whether the callback is pending or resolved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  context.invoke() over HTTP
&lt;/h3&gt;

&lt;p&gt;The multi-agent post previewed two deployment approaches: Lambda with API Gateway/Function URLs (HTTP), or AgentCore containers. This demo takes a third path: &lt;code&gt;context.invoke()&lt;/code&gt; for direct Lambda-to-Lambda invocation.&lt;/p&gt;

&lt;p&gt;The result is simpler than either preview option. No API Gateway resources, no function URLs, no SigV4 signing, no HTTP client configuration. The coordinator calls specialists via the Lambda API, and the durable SDK handles checkpointing and retry. Specialists accept invocations only from the coordinator's execution role, which is tighter isolation than an HTTP endpoint protected by IAM auth.&lt;/p&gt;

&lt;p&gt;The trade-off: specialists are only callable from within a durable function. If you later need specialists accessible from other services (a REST API, a Step Functions state machine, another team's coordinator), you'd need to add API Gateway or function URLs at that point. For this use case, where one coordinator owns all specialist dispatch, direct invocation is the right call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Splitting plan from synthesis
&lt;/h3&gt;

&lt;p&gt;The in-process coordinator was a single agent invocation: read the request, call specialist tools, synthesize findings. With durable functions, that becomes two separate agents wrapped in separate &lt;code&gt;context.step()&lt;/code&gt; calls.&lt;/p&gt;

&lt;p&gt;This introduces an extra Bedrock call, but the checkpointing benefits are significant. The plan is preserved before any specialist runs. If the function replays after three of five specialists complete, the plan doesn't need to be regenerated. The synthesis is preserved after all specialists complete. If the function replays during the approval callback, the recommendation doesn't need to be regenerated.&lt;/p&gt;

&lt;p&gt;The alternative would be wrapping the entire coordinator in a single step. That would mean one Bedrock conversation (cheaper) but no intermediate checkpoints. A failure during synthesis would replay from the beginning, including all specialist invocations. With the split, a synthesis failure only retries the synthesis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Planning agent with tool capture
&lt;/h3&gt;

&lt;p&gt;The planning agent uses &lt;code&gt;tool()&lt;/code&gt; with a Zod schema to produce structured output. This is the same pattern from the in-process demo, but used differently. In the previous post, tools were the dispatch mechanism (calling tools = calling specialists). Here, the tool is purely for structured output capture. The plan step returns the captured plan as its checkpointed result, and &lt;code&gt;context.invoke()&lt;/code&gt; handles the actual specialist dispatch.&lt;/p&gt;

&lt;p&gt;Why not just have the planning agent return JSON directly? Tool calling with a schema gives you validation at the SDK level. If the agent returns a plan with an invalid specialist name, Zod catches it before the plan is checkpointed. Without the tool, you'd parse and validate the JSON yourself, and an invalid plan could be checkpointed and break on every subsequent replay.&lt;/p&gt;
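&lt;p&gt;The guard is easy to picture without the SDK. Here's a hand-rolled stand-in for that schema check, with an invented specialist registry (the real demo uses Zod and its own specialist names):&lt;br&gt;
&lt;/p&gt;

```typescript
// Hand-rolled stand-in for the Zod schema check, runnable without the SDK.
// The registry names are illustrative, not the repo's actual specialists.
const KNOWN_SPECIALISTS = ["price-research", "vehicle-history", "inspection"];

interface Plan {
  specialists: string[];
  rationale: string;
}

function validatePlan(plan: Plan): Plan {
  for (const name of plan.specialists) {
    if (!KNOWN_SPECIALISTS.includes(name)) {
      // Fail before checkpointing, so a bad plan never becomes a cached result.
      throw new Error(`unknown specialist: ${name}`);
    }
  }
  return plan;
}

console.log(validatePlan({ specialists: ["inspection"], rationale: "low mileage" }));
```

&lt;p&gt;A plan naming an unknown specialist throws before anything is checkpointed, which is the failure mode you want: retryable, never replayed.&lt;/p&gt;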

&lt;h3&gt;
  
  
  Graceful degradation in parallel
&lt;/h3&gt;

&lt;p&gt;Each specialist branch in &lt;code&gt;context.parallel()&lt;/code&gt; has a try/catch that returns an error message instead of throwing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`[unavailable: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means a failed specialist doesn't fail the entire parallel block. Without the catch, a failed specialist would fail its branch; whether that fails the entire parallel block depends on the completion config, but either way, synthesis would never run with partial results. With the catch, every branch succeeds (some returning error messages), the parallel block always completes, and the synthesis agent works with whatever it has and notes the gaps.&lt;/p&gt;
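&lt;p&gt;The pattern is easy to demonstrate with plain promises, outside the SDK. In this standalone sketch (specialist names and the simulated failure are made up), every branch resolves, so the aggregate never rejects:&lt;br&gt;
&lt;/p&gt;

```typescript
// Degrade-to-marker pattern with plain promises (not the durable SDK).
// The specialist names and the simulated timeout are invented for the sketch.
const specialists = ["price-research", "vehicle-history", "inspection"];

async function runBranch(name: string) {
  try {
    if (name === "vehicle-history") {
      throw new Error("timeout"); // simulate one failing specialist
    }
    return { name, response: `analysis from ${name}` };
  } catch (err) {
    const msg = err instanceof Error ? err.message : "unknown error";
    return { name, response: `[unavailable: ${msg}]` };
  }
}

// Every branch resolves, so Promise.all never rejects; the failed branch
// carries a marker the synthesis prompt can acknowledge.
Promise.all(specialists.map(runBranch)).then((results) => {
  console.log(JSON.stringify(results, null, 2));
});
```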

&lt;h2&gt;
  
  
  Cost
&lt;/h2&gt;

&lt;p&gt;The main cost is Bedrock token usage, not Lambda compute.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Specialists&lt;/th&gt;
&lt;th&gt;Agent invocations&lt;/th&gt;
&lt;th&gt;Estimated cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Laptop (2 specialists)&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;~4 (plan + 2 specialists + synthesize)&lt;/td&gt;
&lt;td&gt;~$0.02-0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Used car (5 specialists)&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;~7 (plan + 5 specialists + synthesize)&lt;/td&gt;
&lt;td&gt;~$0.05-0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The coordinator uses Sonnet (~$3/$15 per million input/output tokens). Most specialists use Haiku (~$1/$5 per million input/output tokens). Risk Assessment uses Sonnet. Lambda compute for the active execution periods (plan, specialist dispatch, synthesis, replay) totals ~$0.0001. During the &lt;code&gt;waitForCallback&lt;/code&gt; suspension, compute charges are zero.&lt;/p&gt;
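&lt;p&gt;A back-of-the-envelope check against those prices (the token counts here are invented round numbers, not measurements):&lt;br&gt;
&lt;/p&gt;

```typescript
// Back-of-the-envelope Bedrock cost from the per-million-token prices quoted
// above; the token counts are illustrative round numbers, not measured values.
function bedrockCost(inputTokens: number, outputTokens: number, inPerM: number, outPerM: number) {
  return (inputTokens / 1_000_000) * inPerM + (outputTokens / 1_000_000) * outPerM;
}

// One Sonnet call (plan): roughly 2k input / 500 output tokens at $3 / $15.
const planCost = bedrockCost(2000, 500, 3, 15);
// Five specialist calls at Haiku pricing: roughly 1.5k / 800 tokens each at $1 / $5.
const specialistCost = 5 * bedrockCost(1500, 800, 1, 5);
// One Sonnet call (synthesis): larger input, since it reads all specialist output.
const synthesisCost = bedrockCost(6000, 1000, 3, 15);
console.log((planCost + specialistCost + synthesisCost).toFixed(4));
```

&lt;p&gt;That lands around $0.07 for a five-specialist run, consistent with the table's estimate; real costs move with prompt size and model verbosity.&lt;/p&gt;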

&lt;h2&gt;
  
  
  Things to Watch For
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Checkpoint size.&lt;/strong&gt; Each step result is serialized and stored as a checkpoint. The &lt;a href="https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3"&gt;durable functions post&lt;/a&gt; covered the 256KB limit per checkpoint. With 5 specialists returning Bedrock responses, the parallel result could get large if the model is verbose. Monitor response sizes. If you hit the limit, truncate or summarize specialist responses before returning them from the branch, or store full responses in S3 and return a reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging failures.&lt;/strong&gt; If the plan step fails after exhausting retries, or a specialist consistently times out, the execution moves to &lt;code&gt;FAILED&lt;/code&gt; status. Use &lt;code&gt;get-durable-execution&lt;/code&gt; to see which step failed and the error message. The coordinator uses &lt;code&gt;context.logger&lt;/code&gt; (replay-aware), so CloudWatch Logs show each phase's progress without duplicate lines from replays. Specialist failures are easier to debug since each specialist has its own log group (&lt;code&gt;sam logs --name PriceResearchFunction --tail&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replay safety.&lt;/strong&gt; Durable functions re-execute your handler from the top on every resume. Completed &lt;code&gt;context.step()&lt;/code&gt; and &lt;code&gt;context.invoke()&lt;/code&gt; calls return cached results, but any code outside those primitives runs again on every replay. If you add a side effect (writing to DynamoDB, sending a notification, calling an external API), wrap it in a &lt;code&gt;context.step()&lt;/code&gt; so it executes exactly once. The coordinator's comment on the save-recommendation step shows this pattern. Without the step wrapper, you'd send duplicate notifications every time the function replays.&lt;/p&gt;
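&lt;p&gt;A toy harness makes the rule concrete. Here a &lt;code&gt;Map&lt;/code&gt; stands in for the checkpoint store (this is not the real SDK, just the replay behavior in miniature):&lt;br&gt;
&lt;/p&gt;

```typescript
// Toy replay harness: a Map stands in for the checkpoint store (not the real
// SDK). Re-running the handler replays everything, but a checkpointed step
// runs its function only once; later runs return the cached result.
const checkpoints = new Map();
let notificationsSent = 0;

async function step(name: string, fn: Function) {
  if (checkpoints.has(name)) {
    return checkpoints.get(name); // replay: return the cached result
  }
  const result = await fn();
  checkpoints.set(name, result);
  return result;
}

async function handler() {
  return step("notify", async () => {
    notificationsSent += 1; // the side effect we must not repeat
    return "sent";
  });
}

async function main() {
  await handler(); // first invocation: the step executes
  await handler(); // replay: cached result, no second notification
  console.log(notificationsSent); // 1
}
main();
```

&lt;p&gt;Move the side effect outside the step and the counter climbs on every replay, which is exactly the duplicate-notification bug described above.&lt;/p&gt;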

&lt;p&gt;&lt;strong&gt;Cold starts.&lt;/strong&gt; The used car scenario invokes 5 specialist Lambda functions in parallel. If all 5 are cold, that's 5 concurrent cold starts: arm64 Node.js 24, the Strands SDK, and the Bedrock client. This can add several seconds to the first specialist round. The in-process demo had no cold start penalty for specialists since everything ran in one process. In practice, the cold start overhead is small relative to the Bedrock inference time that follows. For workflows that already include multi-second model calls and human approval, a few extra seconds on the first invocation is rarely the bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;This post showed how to take a multi-agent system from a single-process demo to a deployed, fault-tolerant system on Lambda with durable functions. The RISEN prompts carry over with minimal changes. The architectural shift is from in-process tool calls to checkpointed Lambda-to-Lambda invocations with independent scaling, failure isolation, and human-in-the-loop approval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/durable-multi-agent-purchasing" rel="noopener noreferrer"&gt;Demo repository for this post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/gunnargrosch/building-multi-agent-systems-with-risen-prompts-and-strands-agents-52bd"&gt;Building Multi-Agent Systems with RISEN Prompts and Strands Agents&lt;/a&gt;: The in-process multi-agent demo this builds on&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3"&gt;AWS Lambda Durable Functions: Building Long-Running Workflows in Code&lt;/a&gt;: Durable execution primitives and the support triage demo&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" rel="noopener noreferrer"&gt;AWS Lambda Durable Functions documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws/aws-durable-execution-sdk-js" rel="noopener noreferrer"&gt;Durable Execution SDK for JavaScript (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/strands-agents/sdk-typescript" rel="noopener noreferrer"&gt;Strands Agents SDK (TypeScript)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What multi-agent workflow would you deploy with durable functions? Let me know in the comments!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>typescript</category>
    </item>
    <item>
      <title>AWS Lambda Durable Functions: Building Long-Running Workflows in Code</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Tue, 17 Mar 2026 22:06:16 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3</link>
      <guid>https://dev.to/gunnargrosch/aws-lambda-durable-functions-building-long-running-workflows-in-code-1ad3</guid>
      <description>&lt;p&gt;If you've built anything non-trivial on AWS Lambda, you've hit the wall. The function runs for 15 minutes and it's stateless. Any multi-step workflow requires stitching together Step Functions, SQS queues, DynamoDB tables for state, and a whole lot of glue. It works, but it's a lot of infrastructure for what should be straightforward sequential logic.&lt;/p&gt;

&lt;p&gt;AWS Lambda Durable Functions, &lt;a href="https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions/" rel="noopener noreferrer"&gt;launched at re:Invent 2025&lt;/a&gt;, change that. You write sequential code in a single Lambda function. The SDK handles checkpointing, failure recovery, and suspension. Your function can run for up to a year, and you only pay for active compute time. During waits (human approvals, timers, external callbacks), the function suspends and compute charges stop.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through what problem durable functions solve, how the checkpoint/replay model works, and then dig into a complete AI-powered support ticket workflow in TypeScript that demonstrates every primitive in action.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Problem This Solves
&lt;/h2&gt;

&lt;p&gt;Here's a scenario most teams deal with: a support ticket arrives, someone needs to triage it, figure out who should handle it, wait for them to respond, and then close the loop with the customer. Before durable functions, you had a few options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step Functions:&lt;/strong&gt; Define an ASL state machine with states for each step, configure IAM for each integration, manage the state machine as a separate resource. Great for cross-service orchestration, but heavyweight for application logic that naturally reads as sequential code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQS + multiple Lambda functions:&lt;/strong&gt; Break the workflow into separate functions connected by queues. Now you're managing message formats, dead-letter queues, idempotency, and correlating state across function boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Polling loop with DynamoDB:&lt;/strong&gt; One function writes state to DynamoDB, another polls for changes. Works, but you're paying for polling compute and managing your own state machine.&lt;/p&gt;

&lt;p&gt;All three approaches take what should be straightforward sequential logic and spread it across multiple services, IAM policies, and configuration files.&lt;/p&gt;

&lt;p&gt;With durable functions, that same workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TicketEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DurableContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;analyze&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;analyzeTicket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agent-review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;notifyAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;needsEscalation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;specialist-review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;notifySpecialist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;close-ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;reply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;send-reply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;sendReply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survey&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;send-survey&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;sendSurvey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;resolved&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One function. Sequential code. The SDK handles checkpointing each step, suspending during the human review waits, and resuming when the callbacks arrive.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Checkpoint/Replay Works
&lt;/h2&gt;

&lt;p&gt;This is the part that makes everything else make sense. Durable functions use a checkpoint and replay model. Here's how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;First invocation:&lt;/strong&gt; Your handler runs from the beginning. Each &lt;code&gt;context.step()&lt;/code&gt; executes your code and checkpoints the result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suspension:&lt;/strong&gt; When &lt;code&gt;context.wait()&lt;/code&gt; (fixed-duration pause) or &lt;code&gt;context.waitForCallback()&lt;/code&gt; (external signal) is called, the function terminates. Compute charges stop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resumption:&lt;/strong&gt; When the wait completes or a callback arrives, Lambda invokes your handler again from the beginning. But this time, completed steps return their cached results instantly without re-executing. Execution picks up from the first non-checkpointed operation.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First invocation:
  analyze      -&amp;gt;  [executes Bedrock call, checkpoints result]
  agent-review -&amp;gt;  [creates callback, function suspends]

Second invocation (agent responds):
  analyze      -&amp;gt;  [returns cached result, skips Bedrock call]
  agent-review -&amp;gt;  [returns callback result]
  close-ticket -&amp;gt;  [sends reply + survey in parallel]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's one critical rule that falls out of this: &lt;strong&gt;code outside steps re-executes on every replay and must be deterministic.&lt;/strong&gt; If you use &lt;code&gt;Date.now()&lt;/code&gt;, &lt;code&gt;Math.random()&lt;/code&gt;, or &lt;code&gt;crypto.randomUUID()&lt;/code&gt; outside a step, you'll get different values on each replay. Wrap non-deterministic operations in steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Wrong: different value on each replay&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Right: checkpointed, same value on every replay&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gen-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You'll Build
&lt;/h2&gt;

&lt;p&gt;A support ticket triage workflow where AI handles the first pass and humans make the final call. This is the pattern that makes durable functions click: the AI analysis takes seconds, but the human reviews take hours or days. Without durable functions, you'd need to persist state somewhere and wire up resumption logic. With them, you just write &lt;code&gt;await context.waitForCallback()&lt;/code&gt; and the function suspends until the human responds.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Primitive&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Where You'll See It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;step()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute and checkpoint an atomic operation&lt;/td&gt;
&lt;td&gt;AI ticket analysis with Bedrock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;waitForCallback()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Suspend until an external system responds&lt;/td&gt;
&lt;td&gt;Agent review, specialist escalation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;parallel()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run multiple branches concurrently&lt;/td&gt;
&lt;td&gt;Customer reply + satisfaction survey&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry strategies&lt;/td&gt;
&lt;td&gt;Automatic retry with exponential backoff&lt;/td&gt;
&lt;td&gt;Bedrock API calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context.logger&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Replay-aware structured logging (suppresses duplicate output during replay)&lt;/td&gt;
&lt;td&gt;Throughout&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
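&lt;p&gt;For a feel of what the retry strategy does per step, here's the policy's shape in plain TypeScript; this is the concept, not the SDK's actual API:&lt;br&gt;
&lt;/p&gt;

```typescript
// Exponential-backoff retry sketched in plain TypeScript. The durable SDK
// applies this automatically per step; this shows the policy, not its API.
async function withRetry(fn: Function, maxAttempts: number, baseDelayMs: number) {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      attempt += 1;
      if (attempt >= maxAttempts) {
        throw err; // retries exhausted: surface the last error
      }
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // 100, 200, 400, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// A flaky call that fails twice, then succeeds on the third attempt.
let calls = 0;
withRetry(async () => {
  calls += 1;
  if (calls === 3) return "ok";
  throw new Error("throttled");
}, 5, 100).then((result) => console.log(result, "after", calls, "attempts"));
```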

&lt;p&gt;The complete source code is on GitHub: &lt;a href="https://github.com/gunnargrosch/durable-support-triage" rel="noopener noreferrer"&gt;github.com/gunnargrosch/durable-support-triage&lt;/a&gt;. Clone the repo and run &lt;code&gt;npm run demo&lt;/code&gt; to try the full workflow locally with mocked Bedrock responses, or deploy to AWS and run it with real Bedrock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  You'll need:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account with credentials configured&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html" rel="noopener noreferrer"&gt;AWS SAM CLI&lt;/a&gt; 1.153.1 or later (minimum version with &lt;code&gt;DurableConfig&lt;/code&gt; support)&lt;/li&gt;
&lt;li&gt;Node.js 24 or later&lt;/li&gt;
&lt;li&gt;Access to Amazon Bedrock with Claude Haiku 4.5 enabled in your region&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Clone and install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gunnargrosch/durable-support-triage.git
&lt;span class="nb"&gt;cd &lt;/span&gt;durable-support-triage
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The SAM Template
&lt;/h2&gt;

&lt;p&gt;Here's the &lt;code&gt;template.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AWSTemplateFormatVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2010-09-09"&lt;/span&gt;
&lt;span class="na"&gt;Transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless-2016-10-31&lt;/span&gt;
&lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AI-powered support ticket triage with durable functions&lt;/span&gt;

&lt;span class="na"&gt;Parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;BedrockModelId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
    &lt;span class="na"&gt;Default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001-v1:0&lt;/span&gt;
    &lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bedrock foundation model ID (uses global inference profile prefix automatically)&lt;/span&gt;

&lt;span class="na"&gt;Globals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Function&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
    &lt;span class="na"&gt;MemorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;
    &lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nodejs24.x&lt;/span&gt;

&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;SupportTriageFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;FunctionName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;durable-support-triage&lt;/span&gt;
      &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;index.handler&lt;/span&gt;
      &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
      &lt;span class="na"&gt;DurableConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ExecutionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;604800&lt;/span&gt;
        &lt;span class="na"&gt;RetentionPeriodInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;14&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::aws:policy/service-role/AWSLambdaBasicDurableExecutionRolePolicy&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2012-10-17"&lt;/span&gt;
          &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;bedrock:InvokeModel&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:bedrock:*::foundation-model/${BedrockModelId}"&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:bedrock:${AWS::Region}:${AWS::AccountId}:inference-profile/global.${BedrockModelId}"&lt;/span&gt;
      &lt;span class="na"&gt;AutoPublishAlias&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;live&lt;/span&gt;
      &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;BEDROCK_MODEL_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global.${BedrockModelId}"&lt;/span&gt;
    &lt;span class="na"&gt;Metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;BuildMethod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;esbuild&lt;/span&gt;
      &lt;span class="na"&gt;BuildProperties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Minify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;Target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;es2022&lt;/span&gt;
        &lt;span class="na"&gt;EntryPoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;index.ts&lt;/span&gt;

&lt;span class="na"&gt;Outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;FunctionArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;SupportTriageFunction.Arn&lt;/span&gt;
  &lt;span class="na"&gt;AliasArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;SupportTriageFunctionAliaslive&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note about this template:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Globals.Timeout: 120&lt;/code&gt;&lt;/strong&gt; is the standard Lambda invocation timeout. It applies to each individual invocation (each replay round), not the overall workflow, and two minutes is plenty for a single round. &lt;code&gt;ExecutionTimeout&lt;/code&gt; in &lt;code&gt;DurableConfig&lt;/code&gt; is the total wall-clock time for the entire durable execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;DurableConfig&lt;/code&gt;&lt;/strong&gt; is the only new property compared to a standard Lambda function. &lt;code&gt;ExecutionTimeout&lt;/code&gt; is in seconds (604,800 = 7 days). The individual callback timeouts handle the per-step boundaries, but the execution timeout is your outer safety net. A ticket that needs specialist review might sit over a weekend, so 7 days gives headroom. &lt;code&gt;RetentionPeriodInDays&lt;/code&gt; controls how long execution history is kept (1 to 90 days).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AutoPublishAlias: live&lt;/code&gt;&lt;/strong&gt; automatically creates a Lambda version and alias on each deploy. This is important for two reasons: durable functions require a qualified ARN (with version or alias) for invocation, and Lambda pins each execution to the version that started it. If you deploy new code while an execution is suspended, replay still uses the original version. This prevents inconsistencies from code changes mid-workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AWSLambdaBasicDurableExecutionRolePolicy&lt;/code&gt;&lt;/strong&gt; is an AWS managed policy that grants the checkpoint and state permissions your function needs (&lt;code&gt;lambda:CheckpointDurableExecutions&lt;/code&gt;, &lt;code&gt;lambda:GetDurableExecutionState&lt;/code&gt;) plus the standard CloudWatch Logs permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock IAM&lt;/strong&gt; uses a cross-region inference profile (&lt;code&gt;global.&lt;/code&gt; prefix on the model ID) so that Bedrock routes requests to whichever region has capacity. The policy needs two resource ARNs: the foundation model (wildcard region, no account ID) and the inference profile (your region and account). The &lt;code&gt;BedrockModelId&lt;/code&gt; parameter defaults to Claude Haiku 4.5 but you can override it at deploy time with &lt;code&gt;--parameter-overrides BedrockModelId=&amp;lt;model-id&amp;gt;&lt;/code&gt;. Check &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html" rel="noopener noreferrer"&gt;Bedrock model availability&lt;/a&gt; for what's enabled in your region.&lt;/li&gt;
&lt;/ul&gt;
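&lt;p&gt;Because invocation requires a qualified ARN, kicking off an execution is just an async invoke against the &lt;code&gt;live&lt;/code&gt; alias. A minimal sketch of the invoke parameters, as a plain object matching the AWS SDK v3 &lt;code&gt;InvokeCommand&lt;/code&gt; input (the payload fields are illustrative, mirroring what the handler reads from the event):&lt;/p&gt;

```typescript
// Start a durable execution: async-invoke the "live" alias (qualified ARN).
// Shape matches @aws-sdk/client-lambda InvokeCommand input; payload fields
// are illustrative, based on what the triage handler reads from the event.
const invokeParams = {
  FunctionName: "durable-support-triage",
  Qualifier: "live",        // the alias created by AutoPublishAlias
  InvocationType: "Event",  // fire-and-forget; the workflow may run for days
  Payload: JSON.stringify({
    ticketId: "T-1001",
    customerTier: "pro",
    subject: "CSV export fails over 10k rows",
    body: "Exports larger than 10k rows return a 500 error.",
  }),
};
// await new LambdaClient({}).send(new InvokeCommand(invokeParams));
```

&lt;p&gt;Invoking the unqualified function name would fail for a durable function, which is why the alias (or a version number) goes in &lt;code&gt;Qualifier&lt;/code&gt;.&lt;/p&gt;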

&lt;h2&gt;The RISEN Prompt&lt;/h2&gt;

&lt;p&gt;The AI triage uses Amazon Bedrock with Claude Haiku 4.5 to analyze incoming tickets. I'm using the &lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;RISEN framework&lt;/a&gt; for the system prompt. RISEN structures prompts into five components: Role, Instructions, Steps, Expectation, and Narrowing. Each component serves a specific purpose, and together they produce consistent, structured output that your code can reliably parse.&lt;/p&gt;

&lt;p&gt;Here's the system prompt for the triage agent:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TRIAGE_SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
# Role
You are a senior technical support analyst with 10 years of experience
triaging customer support tickets for a SaaS platform. You specialize in
categorizing issues by severity, identifying root causes, and drafting
professional responses.

# Instructions
Analyze the incoming support ticket and produce a structured triage
assessment with category, priority, sentiment, a suggested response,
and an escalation recommendation.

# Steps
1. Read the ticket subject and body to identify the core issue.
2. Categorize the issue (billing, technical, account, feature-request, other).
3. Assess priority based on business impact and urgency (critical, high, medium, low).
4. Evaluate customer sentiment (frustrated, neutral, positive).
5. Draft a suggested response that acknowledges the issue and outlines next steps.
6. Determine whether the ticket needs specialist escalation.

# Expectation
Return a JSON object with this exact structure:
{
  "category": "billing" | "technical" | "account" | "feature-request" | "other",
  "priority": "critical" | "high" | "medium" | "low",
  "sentiment": "frustrated" | "neutral" | "positive",
  "suggestedResponse": "string",
  "needsEscalation": boolean,
  "escalationReason": "string or null",
  "summary": "One-sentence summary of the issue"
}

# Narrowing
- Return only raw JSON. Do not wrap it in markdown code fences, backticks,
  or any other formatting. No explanation, no preamble, no commentary.
- Do not fabricate account details or order numbers not present in the ticket.
- Do not promise refunds, credits, or policy exceptions in the suggested response.
- needsEscalation MUST be false unless one of these exact conditions is met:
  1. The ticket describes confirmed or suspected data loss.
  2. The ticket describes a security breach, unauthorized access, or credential compromise.
  3. The ticket involves a legal or compliance issue.
  4. The customer tier is "enterprise".
  For all other tickets (billing issues, bugs, feature requests, general questions),
  needsEscalation MUST be false regardless of priority or sentiment.
- Keep the suggested response under 200 words.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;Narrowing&lt;/strong&gt; section does the heavy lifting for reliability. The explicit numbered escalation conditions prevent the model from over-escalating standard tickets (without these, Haiku flagged a routine CSV bug for specialist review). The constraint against promising refunds keeps the AI from making commitments that only a human should make. The pipe syntax in the Expectation section (&lt;code&gt;"billing" | "technical" | ...&lt;/code&gt;) is instructional for the model, not literal JSON. It tells the model which values are valid without requiring a separate schema document. Note that "return only raw JSON" doesn't guarantee it: some models still wrap output in markdown code fences despite the instruction. The handler strips them defensively before parsing.&lt;/p&gt;
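&lt;p&gt;That defensive strip is only a few lines. A minimal sketch (hypothetical helper name; the real parsing lives in &lt;code&gt;parseBedrockResponse&lt;/code&gt; in the repo):&lt;/p&gt;

```typescript
// Strip optional markdown code fences before JSON.parse. Some models wrap
// output in ```json ... ``` fences despite being told to return raw JSON.
function parseModelJson(text: string): unknown {
  const trimmed = text.trim();
  const unfenced = trimmed
    .replace(/^```[a-zA-Z]*\s*/, "") // leading ``` or ```json
    .replace(/\s*```$/, "");         // trailing ```
  return JSON.parse(unfenced);
}
```

&lt;p&gt;The helper accepts both fenced and raw output, so a model that follows the Narrowing instruction and one that ignores it parse the same way.&lt;/p&gt;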

&lt;h2&gt;The Handler&lt;/h2&gt;

&lt;p&gt;Here's the handler from &lt;code&gt;src/index.ts&lt;/code&gt;, trimmed to show the durable execution primitives. The full source (input validation, response parsing, integration stubs) is in the &lt;a href="https://github.com/gunnargrosch/durable-support-triage" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. Types are defined in &lt;code&gt;src/types.ts&lt;/code&gt;. The &lt;code&gt;TRIAGE_SYSTEM_PROMPT&lt;/code&gt; shown in the RISEN section above is defined at module scope.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;withDurableExecution&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;DurableContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;createRetryStrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JitterStrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;defaultSerdes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws/durable-execution-sdk-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BedrockRuntimeClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;InvokeModelCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-bedrock-runtime&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;TicketEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TriageResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;AgentReview&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;SpecialistReview&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TicketResolution&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./types&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BedrockRuntimeClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;closeTicket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DurableContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;send-reply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;reply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendCustomerReply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;send-survey&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;func&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survey&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendSatisfactionSurvey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;withDurableExecution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TicketEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DurableContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TicketResolution&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;validateEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Ticket received&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;customerTier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerTier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 1: AI analyzes the ticket using Bedrock&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;analyze-ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;InvokeModelCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BEDROCK_MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;anthropic_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TRIAGE_SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
              &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Ticket ID: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\nCustomer Tier: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerTier&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\nSubject: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;}));&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;parseBedrockResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;retryStrategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createRetryStrategy&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;initialDelay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;maxDelay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;backoffRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JitterStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2: Support agent reviews AI suggestion&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentReview&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;waitForCallback&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;AgentReview&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agent-review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;notifyAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;customerTier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerTier&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="na"&gt;serdes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;defaultSerdes&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;finalResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;agentReview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;editedResponse&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;suggestedResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;agentReview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rejected&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;finalResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 3: If escalation needed, wait for specialist&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;needsEscalation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;specialistResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;waitForCallback&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;SpecialistReview&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;specialist-review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;notifySpecialist&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;callbackId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;agentNotes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;agentReview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agentNotes&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;days&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="na"&gt;serdes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;defaultSerdes&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resolvedResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;specialistResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;finalResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;closeTicket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;close-escalated-ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contactEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resolvedResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;escalated&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;finalResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;resolvedResponse&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 4: Send reply and survey in parallel&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;closeTicket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;close-ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contactEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;finalResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;resolved&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;finalResponse&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's walk through what's happening:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;withDurableExecution&lt;/code&gt;&lt;/strong&gt; wraps your async function and returns a standard Lambda handler. The runtime calls it like any other handler; the SDK intercepts the execution to manage checkpoints. The &lt;code&gt;BedrockRuntimeClient&lt;/code&gt; is instantiated at module scope, which is standard Lambda practice for connection reuse across warm-start invocations. Each replay is a new Lambda invocation, but it may reuse a warm container just like any regular invocation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;validateEvent&lt;/code&gt;&lt;/strong&gt; runs before the first &lt;code&gt;context.step()&lt;/code&gt;. This is intentional: if the payload is malformed, the execution fails immediately instead of after the Bedrock step has already been checkpointed. Durable executions that fail after partial checkpointing are harder to reason about than ones that fail fast. Validation is deterministic and cheap, so re-running it on every replay is harmless.&lt;/p&gt;
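
&lt;p&gt;As a minimal sketch of that fail-fast idea (the field list and error wording here are illustrative; the repo's &lt;code&gt;validateEvent&lt;/code&gt; is more thorough):&lt;/p&gt;

```typescript
// Minimal sketch of fail-fast input validation. The field list and
// error wording are illustrative, not the repo's exact implementation.
// Because this runs before the first context.step(), a malformed
// payload fails before anything has been checkpointed.
interface TicketEvent {
  ticketId: string;
  customerId: string;
  customerTier: string;
  subject: string;
  body: string;
  contactEmail: string;
}

function validateEvent(event: Partial<TicketEvent>): TicketEvent {
  const required: Array<keyof TicketEvent> = [
    "ticketId",
    "customerId",
    "customerTier",
    "subject",
    "body",
    "contactEmail",
  ];
  for (const field of required) {
    const value = event[field];
    if (typeof value !== "string" || value.trim() === "") {
      // Deterministic and cheap, so re-running this on every replay is harmless
      throw new Error(`Invalid ticket event: missing or empty "${field}"`);
    }
  }
  return event as TicketEvent;
}
```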

&lt;p&gt;&lt;strong&gt;&lt;code&gt;context.step("analyze-ticket", ...)&lt;/code&gt;&lt;/strong&gt; calls Amazon Bedrock with the RISEN prompt and checkpoints the result. The retry strategy handles transient Bedrock API errors (throttling, temporary unavailability) with exponential backoff. &lt;code&gt;parseBedrockResponse&lt;/code&gt; handles the response parsing separately: it strips markdown code fences (some models wrap JSON output despite the prompt instruction), validates the response structure, and gives clear error messages on parse failures. If the function replays later, this step returns the cached analysis without calling Bedrock again. That matters for both cost and consistency: you don't want the AI to produce a different triage on replay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;context.waitForCallback("agent-review", ...)&lt;/code&gt;&lt;/strong&gt; is where the function suspends. The SDK creates a callback ID and passes it to your submitter function (which sends it to Slack, email, or your ticketing UI). The submitter runs exactly once, on the invocation that creates the callback. On replay, the SDK skips the submitter entirely and returns the callback result directly. This is important: even though the submitter isn't wrapped in a &lt;code&gt;context.step()&lt;/code&gt;, it won't re-execute on replay. The SDK then terminates the Lambda function. Compute charges stop. The agent might respond in minutes or hours. When they do, an external system calls &lt;code&gt;SendDurableExecutionCallbackSuccess&lt;/code&gt; with their review, and Lambda resumes the function from where it left off.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;serdes: defaultSerdes&lt;/code&gt; option is required for typed callbacks. Without it, the SDK uses passthrough serialization (not &lt;code&gt;JSON.parse&lt;/code&gt;), so &lt;code&gt;agentReview.approved&lt;/code&gt; would be &lt;code&gt;undefined&lt;/code&gt; at runtime even though TypeScript thinks it's a boolean. This isn't documented yet: &lt;code&gt;step()&lt;/code&gt; defaults to JSON serdes, but &lt;code&gt;waitForCallback&lt;/code&gt; defaults to passthrough. The SDK exports &lt;code&gt;defaultSerdes&lt;/code&gt; for exactly this purpose.&lt;/p&gt;

&lt;p&gt;If the callback times out (8 hours for the agent, 3 days for the specialist), the SDK throws a &lt;code&gt;CallbackTimeoutError&lt;/code&gt;. In production, wrap the callback in a try/catch to handle the timeout (re-queue the ticket, notify a manager, or auto-escalate).&lt;/p&gt;
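
&lt;p&gt;The timeout-handling pattern can be sketched like this. Note the stand-ins: &lt;code&gt;CallbackTimeoutError&lt;/code&gt; here is a local class mimicking the SDK's error, the context shape is a minimal mock rather than the real SDK type, and the re-queue behavior is illustrative:&lt;/p&gt;

```typescript
// Sketch of wrapping waitForCallback in try/catch to handle timeouts.
// CallbackTimeoutError stands in for the SDK's error class, and the
// context interface is a minimal mock; the requeue fallback is illustrative.
class CallbackTimeoutError extends Error {
  constructor(message?: string) {
    super(message);
    // keep instanceof working even when compiled to older JS targets
    Object.setPrototypeOf(this, CallbackTimeoutError.prototype);
  }
}

interface ReviewContext {
  waitForCallback(name: string): Promise<string>;
}

async function reviewOrRequeue(context: ReviewContext): Promise<string> {
  try {
    // In the real handler this would be context.waitForCallback("agent-review", ...)
    return await context.waitForCallback("agent-review");
  } catch (err) {
    if (err instanceof CallbackTimeoutError) {
      // Timeout path: re-queue the ticket (or notify a manager)
      // instead of letting the whole execution fail.
      return "requeued";
    }
    throw err; // anything else is a genuine failure
  }
}
```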

&lt;p&gt;&lt;strong&gt;If the agent rejects&lt;/strong&gt; the AI suggestion (&lt;code&gt;approved: false&lt;/code&gt;), the workflow returns early with a &lt;code&gt;rejected&lt;/code&gt; status without sending a customer reply. Your ticketing system handles the next step (re-queue, reassign, or manual response).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The escalation path&lt;/strong&gt; adds a second &lt;code&gt;waitForCallback&lt;/code&gt;. If the AI flagged the ticket for escalation (security concern, data loss, enterprise customer), the function suspends again waiting for a specialist. The specialist sends back a &lt;code&gt;SpecialistReview&lt;/code&gt; with a response and notes. This callback has a 3-day timeout because specialist reviews can take time. Without durable functions, you'd need a separate state machine or database to track which tickets are waiting for specialists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;context.logger&lt;/code&gt;&lt;/strong&gt; replaces &lt;code&gt;console.log&lt;/code&gt;. During replay, completed steps don't re-execute, but code outside steps does. &lt;code&gt;context.logger&lt;/code&gt; suppresses duplicate log output during replay so your CloudWatch Logs stay clean. With &lt;code&gt;console.log&lt;/code&gt;, you'd see the same log lines repeated on every replay invocation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;closeTicket&lt;/code&gt;&lt;/strong&gt; extracts the parallel close-out into a helper. Both the escalation and standard paths send a customer reply and satisfaction survey concurrently using &lt;code&gt;context.parallel&lt;/code&gt;. Each branch gets its own child context with isolated state tracking. The helper takes a dynamic context name so the escalated and standard close-out steps are distinguishable in the execution history.&lt;/p&gt;
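
&lt;p&gt;A minimal sketch of what such a helper might look like. The &lt;code&gt;DurableContext&lt;/code&gt; interface and &lt;code&gt;context.parallel&lt;/code&gt; signature here are stand-ins (the real SDK's API may differ), and &lt;code&gt;sendReply&lt;/code&gt;/&lt;code&gt;sendSurvey&lt;/code&gt; are hypothetical stubs:&lt;/p&gt;

```typescript
// Sketch of a close-out helper like the one described above. The
// DurableContext interface, the context.parallel signature, and the
// sendReply/sendSurvey stubs are all stand-ins, not the real SDK API.
interface DurableContext {
  parallel(name: string, tasks: Array<() => Promise<void>>): Promise<void[]>;
}

async function sendReply(email: string, response: string): Promise<void> {
  // stub: in the real workflow this sends the customer reply
}

async function sendSurvey(email: string, ticketId: string): Promise<void> {
  // stub: in the real workflow this sends the satisfaction survey
}

async function closeTicket(
  context: DurableContext,
  name: string, // dynamic name: e.g. "close-ticket" vs "close-escalated-ticket"
  email: string,
  ticketId: string,
  finalResponse: string,
): Promise<void> {
  // Fan out reply and survey concurrently; the dynamic name keeps the
  // escalated and standard close-outs distinguishable in the history.
  await context.parallel(name, [
    () => sendReply(email, finalResponse),
    () => sendSurvey(email, ticketId),
  ]);
}
```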

&lt;h2&gt;
  
  
  Testing Locally
&lt;/h2&gt;

&lt;p&gt;The testing SDK (&lt;code&gt;@aws/durable-execution-sdk-js-testing&lt;/code&gt;) lets you run durable functions locally without deploying. Here's the key pattern from &lt;code&gt;src/index.test.ts&lt;/code&gt;, showing how to drive a callback-based workflow in a test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LocalDurableTestRunner&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws/durable-execution-sdk-js-testing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// jest.mock replaces BedrockRuntimeClient so analyze-ticket returns controlled results&lt;/span&gt;
&lt;span class="c1"&gt;// (see full mock setup in the repo)&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./index&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Support Triage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LocalDurableTestRunner&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;LocalDurableTestRunner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setupTestEnvironment&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;skipTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;LocalDurableTestRunner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;teardownTestEnvironment&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LocalDurableTestRunner&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;handlerFunction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;afterEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should resolve a standard ticket after agent review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TKT-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CUST-123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;customerTier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pro&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cannot export CSV reports&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;When I click the export button, nothing happens.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;contactEmail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// The handler suspends at waitForCallback("agent-review").&lt;/span&gt;
    &lt;span class="c1"&gt;// getOperation blocks until the callback is created.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentCallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOperation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agent-review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentDetails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agentCallback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForData&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agentDetails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendCallbackSuccess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;editedResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;We have identified the CSV export issue and a fix is rolling out today.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;agentNotes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Known bug, fix in deploy pipeline&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}));&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getStatus&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SUCCEEDED&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getResult&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toMatchObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;resolved&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TKT-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Additional tests in the repo: escalation flow, agent rejection, callback failure&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;setupTestEnvironment&lt;/code&gt; and &lt;code&gt;teardownTestEnvironment&lt;/code&gt; are static methods that start and stop the local checkpoint server. They run once per test file in &lt;code&gt;beforeAll&lt;/code&gt;/&lt;code&gt;afterAll&lt;/code&gt;. The runner instance is created per test in &lt;code&gt;beforeEach&lt;/code&gt; and reset in &lt;code&gt;afterEach&lt;/code&gt;. The &lt;code&gt;skipTime: true&lt;/code&gt; option fast-forwards any &lt;code&gt;context.wait()&lt;/code&gt; calls so tests run instantly.&lt;/p&gt;

&lt;p&gt;The interesting part is the callback interaction: &lt;code&gt;runner.getOperation("agent-review")&lt;/code&gt; blocks until the handler reaches &lt;code&gt;waitForCallback("agent-review")&lt;/code&gt; and creates the callback. Then &lt;code&gt;sendCallbackSuccess&lt;/code&gt; simulates the external system responding. This lets you test the full suspend/resume lifecycle without deploying. The repo includes tests for all three paths: standard resolution, escalation with specialist review, and agent rejection. There's also a test that uses &lt;code&gt;sendCallbackFailure&lt;/code&gt; to verify error handling when an external system reports a failure.&lt;/p&gt;

&lt;p&gt;Run the tests with &lt;code&gt;npm test&lt;/code&gt;, or use the &lt;code&gt;run-durable&lt;/code&gt; CLI to run the handler directly with a payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx run-durable &lt;span class="nt"&gt;--skip-time&lt;/span&gt; &lt;span class="nt"&gt;--verbose&lt;/span&gt; &lt;span class="nt"&gt;--event&lt;/span&gt; &lt;span class="s1"&gt;'{"ticketId":"TKT-001","subject":"Test ticket"}'&lt;/span&gt; src/index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Try the Demo
&lt;/h2&gt;

&lt;p&gt;The repo includes an interactive demo that runs the entire workflow in your terminal, showing each stage: AI analysis, human review prompts, checkpoint history, and final resolution.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;npm run demo&lt;/code&gt; to open the interactive menu, or skip it with a direct command:&lt;/p&gt;

&lt;h3&gt;
  
  
  Local mode (no AWS credentials needed)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run demo:local &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;standard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Local mode uses mocked Bedrock responses so you can see the full workflow without calling AWS. Here's what the first few steps look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ──────────────────────────────────────────────
  Ticket TKT-001
  Customer:  CUST-123 (pro)
  Subject:   Cannot export CSV reports
  ──────────────────────────────────────────────

  ▶ step: analyze-ticket
    ✓ Checkpointed analyze-ticket (1204ms)

  ──────────────────────────────────────────────
  AI Triage Result
  Category:    technical
  Priority:    high
  Sentiment:   frustrated
  Escalation:  No
  ──────────────────────────────────────────────

  ⏸ waitForCallback: agent-review
    Function suspended. Compute charges stopped.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The demo pauses at each callback and prompts you to respond as the agent or specialist. It walks through three ticket scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Standard ticket&lt;/strong&gt; (pro tier, CSV export bug): AI analyzes, agent approves with edits, reply sent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise escalation&lt;/strong&gt; (security concern): AI flags for escalation, agent approves, specialist reviews, reply sent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent rejection&lt;/strong&gt; (feature request): AI suggests a response, agent rejects, ticket returned for manual handling.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At each human review step you approve, reject, or edit the AI's suggestion, and the demo shows the execution history afterward so you can see the checkpoint/replay model in action.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud mode (real Bedrock responses)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run demo:cloud &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;standard &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud mode invokes the deployed Lambda function with real Bedrock calls. You'll see actual AI-generated triage analysis and can watch the durable execution checkpoints in the Lambda console. Add &lt;code&gt;--profile=&amp;lt;name&amp;gt;&lt;/code&gt; if you're not using the default AWS profile.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying and Invoking
&lt;/h2&gt;

&lt;p&gt;Build and deploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam build
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once deployed, invoke the function asynchronously with a qualified ARN. The &lt;code&gt;AutoPublishAlias&lt;/code&gt; in the template created a &lt;code&gt;live&lt;/code&gt; alias:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws lambda invoke &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--function-name&lt;/span&gt; durable-support-triage:live &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--invocation-type&lt;/span&gt; Event &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--durable-execution-name&lt;/span&gt; &lt;span class="s2"&gt;"ticket-TKT-001"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cli-binary-format&lt;/span&gt; raw-in-base64-out &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--payload&lt;/span&gt; &lt;span class="s1"&gt;'{
    "ticketId": "TKT-001",
    "customerId": "CUST-123",
    "customerTier": "pro",
    "subject": "Cannot export CSV reports",
    "body": "When I click the export button, nothing happens.",
    "contactEmail": "customer@example.com"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  response.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--invocation-type Event&lt;/code&gt;&lt;/strong&gt; is required for long-running workflows. Synchronous invocation (&lt;code&gt;RequestResponse&lt;/code&gt;) times out after 15 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--durable-execution-name&lt;/code&gt;&lt;/strong&gt; provides built-in idempotency. If you invoke the function twice with the same execution name, the second invocation returns the existing execution instead of creating a duplicate. Using the ticket ID as the execution name is a natural fit. Note: execution names must be alphanumeric, hyphens, or underscores. If your ticket IDs contain dots, slashes, or other special characters, sanitize them first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:live&lt;/code&gt;&lt;/strong&gt; on the function name is the alias qualifier. Without it, you'll get &lt;code&gt;InvalidParameterValueException: Durable execution requires qualified function identifier&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
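
&lt;p&gt;For that last point, a one-line sanitizer (illustrative, not from the repo) is enough, mapping anything outside the allowed character set to a hyphen:&lt;/p&gt;

```typescript
// Illustrative helper (not from the repo): map anything outside
// [A-Za-z0-9_-] to a hyphen so a ticket ID is safe to use as a
// durable execution name.
function toExecutionName(ticketId: string): string {
  return ticketId.replace(/[^A-Za-z0-9_-]/g, "-");
}
```

&lt;p&gt;For example, &lt;code&gt;toExecutionName("TKT.001/retry")&lt;/code&gt; yields &lt;code&gt;"TKT-001-retry"&lt;/code&gt;.&lt;/p&gt;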

&lt;h3&gt;
  
  
  Monitoring execution progress
&lt;/h3&gt;

&lt;p&gt;Check execution status in the Lambda console under the &lt;strong&gt;Durable executions&lt;/strong&gt; tab. You'll see each step's status and timing, including when the function suspended waiting for the agent callback.&lt;/p&gt;

&lt;p&gt;You can also check programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws lambda get-durable-execution &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--durable-execution-arn&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:lambda:us-east-2:123456789012:function:durable-support-triage:live/durable-execution/ticket-TKT-001/&amp;lt;run-id&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;&amp;lt;run-id&amp;gt;&lt;/code&gt; with the run ID returned in the initial &lt;code&gt;invoke&lt;/code&gt; response. You can also find it in the Lambda console under the &lt;strong&gt;Durable executions&lt;/strong&gt; tab.&lt;/p&gt;

&lt;h3&gt;
  
  
  Completing the callbacks
&lt;/h3&gt;

&lt;p&gt;When the support agent finishes their review, send the callback from your ticketing system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws lambda send-durable-execution-callback-success &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--callback-id&lt;/span&gt; &lt;span class="s2"&gt;"your-callback-id-here"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cli-binary-format&lt;/span&gt; raw-in-base64-out &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--result&lt;/span&gt; &lt;span class="s1"&gt;'{
    "approved": true,
    "editedResponse": "Hi, thanks for reporting this. We have identified the issue and a fix is rolling out today.",
    "agentNotes": "Known bug in CSV export module"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--result&lt;/code&gt; value is a JSON string. The CLI handles serialization, so you pass the JSON object directly (unlike the test SDK, where you explicitly call &lt;code&gt;JSON.stringify()&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The function resumes, checks for escalation, and continues. If a specialist callback is pending, the same pattern applies: the specialist's system calls &lt;code&gt;send-durable-execution-callback-success&lt;/code&gt; when their review is complete. There's also &lt;code&gt;send-durable-execution-callback-failure&lt;/code&gt; for when an external system needs to report an error (e.g., agent rejects the ticket, or an integration fails).&lt;/p&gt;
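&lt;p&gt;On resume, the handler gets the callback result back from &lt;code&gt;waitForCallback()&lt;/code&gt; and branches on it. A sketch of that routing (the result shape follows the payload above; the routing rule itself is an assumption, not the demo's actual logic):&lt;/p&gt;

```typescript
// Illustrative routing for the resumed handler. The result shape matches
// the callback payload above; the escalation rule is an assumption.
interface ReviewResult {
  approved: boolean;
  editedResponse: string;
  agentNotes?: string;
}

function routeAfterReview(raw: string): string {
  const result = JSON.parse(raw) as ReviewResult;
  if (!result.approved) {
    return "escalate-to-specialist"; // would pend a second waitForCallback()
  }
  return "send-response";
}

routeAfterReview('{"approved": true, "editedResponse": "Hi, thanks!"}');
```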

&lt;p&gt;You can also react to execution state changes via EventBridge. Lambda emits events to the default event bus when executions start, succeed, fail, or time out. Create an EventBridge rule with this event pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"aws.lambda"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detail-type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Durable Execution Status Change"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Gotchas and Hard-Won Lessons
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Determinism is not optional
&lt;/h3&gt;

&lt;p&gt;This is the rule that trips people up. Code outside steps re-executes on every replay. If it produces different results each time, your workflow breaks in subtle ways.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This generates a different ID on each replay&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requestId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;saveToDb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;// This generates one ID, checkpoints it, and returns the same value on replay&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requestId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gen-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;saveToDb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same applies to &lt;code&gt;Date.now()&lt;/code&gt;, &lt;code&gt;Math.random()&lt;/code&gt;, API calls, and database queries. If it can return different values, wrap it in a step.&lt;/p&gt;

&lt;p&gt;The ESLint plugin (&lt;code&gt;@aws/durable-execution-sdk-js-eslint-plugin&lt;/code&gt;) catches common violations. Set it up early.&lt;/p&gt;

&lt;h3&gt;
  
  
  Closure mutations are lost on replay
&lt;/h3&gt;

&lt;p&gt;Variables you modify inside a step are not preserved across replays:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;calculate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// This mutation is lost on replay&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Always 0 on replay&lt;/span&gt;

&lt;span class="c1"&gt;// Instead, return values from steps&lt;/span&gt;
&lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;calculate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Always 42&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This happens because the step body doesn't re-execute on replay; the SDK returns the cached result and skips the body entirely. The mutation to the closure variable only ever happened during the original run, so on replay it never happens at all. Return values from steps instead of mutating outer variables.&lt;/p&gt;
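&lt;p&gt;A toy model makes this concrete. If you simulate the SDK with nothing more than a result cache keyed by step name (a deliberate simplification, not the real implementation), the lost mutation falls out naturally:&lt;/p&gt;

```typescript
// Toy checkpoint/replay model: step results are cached by name, so on
// replay the step body is skipped. A simplification, not the actual SDK.
function makeContext(cache: any) {
  return {
    async step(name: string, fn: any) {
      if (cache.has(name)) return cache.get(name); // replay: body never runs
      const result = await fn();
      cache.set(name, result);
      return result;
    },
  };
}

async function demo() {
  const cache = new Map();
  const run = async () => {
    let total = 0;
    await makeContext(cache).step("calculate", async () => {
      total = 42; // only happens when the body actually runs
    });
    return total;
  };
  const first = await run();  // body runs, total is 42
  const replay = await run(); // cached, body skipped, total stays 0
  return [first, replay];
}
```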

&lt;h3&gt;
  
  
  The qualified ARN requirement is easy to miss
&lt;/h3&gt;

&lt;p&gt;Durable functions require a qualified function identifier: a version number, an alias, or &lt;code&gt;$LATEST&lt;/code&gt;. Invoking with an unqualified ARN fails with &lt;code&gt;InvalidParameterValueException&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;AutoPublishAlias&lt;/code&gt; SAM property solves this. It creates a new version and updates the alias on every deploy. If you're using EventBridge Scheduler or other services to invoke your function, make sure they target the alias ARN, not the unqualified ARN.&lt;/p&gt;
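&lt;p&gt;In SAM, that wiring is a single property. A minimal sketch (handler, runtime, and the function's &lt;code&gt;DurableConfig&lt;/code&gt; block are placeholders; only &lt;code&gt;AutoPublishAlias&lt;/code&gt; is the point here):&lt;/p&gt;

```yaml
# Minimal sketch: AutoPublishAlias publishes a new version and repoints
# the "live" alias on every deploy, giving you a qualified ARN to invoke.
TriageFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: index.handler       # placeholder
    Runtime: nodejs22.x          # placeholder
    AutoPublishAlias: live
    # DurableConfig: ...         # set at creation time, per the Lambda docs
```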

&lt;h3&gt;
  
  
  More steps means slower replay
&lt;/h3&gt;

&lt;p&gt;Every time your function resumes, the SDK replays from the beginning, returning cached results for completed steps. The more steps you have, the more replay overhead per resumption.&lt;/p&gt;

&lt;p&gt;This is a trade-off. More granular steps give you better debuggability and more precise retry boundaries. Fewer steps give you faster replay. In practice, group related operations into a single step unless you need separate retry behavior or checkpoint boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  context.wait() is the simplest superpower
&lt;/h3&gt;

&lt;p&gt;The "How Checkpoint/Replay Works" section mentions &lt;code&gt;context.wait()&lt;/code&gt; for fixed-duration pauses, but the handler only uses &lt;code&gt;waitForCallback()&lt;/code&gt;. In practice, &lt;code&gt;context.wait()&lt;/code&gt; is one of the most useful primitives. Need a 24-hour cooling-off period before sending a follow-up? One line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cooling-off&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function suspends, compute charges stop, and Lambda resumes it 24 hours later. No cron jobs, no EventBridge Scheduler, no polling.&lt;/p&gt;

&lt;h3&gt;
  
  
  You can't enable durable execution on existing functions
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;DurableConfig&lt;/code&gt; can only be set when creating a function; you can't toggle it on an existing one. Migrating an existing function means creating a new one, so plan for this.&lt;/p&gt;

&lt;p&gt;Changing &lt;code&gt;DurableConfig&lt;/code&gt; in CloudFormation also requires resource replacement, not an in-place update.&lt;/p&gt;

&lt;h3&gt;
  
  
  Checkpoint payloads have a 256KB limit
&lt;/h3&gt;

&lt;p&gt;Each step result is serialized and stored as a checkpoint. If a step returns an object larger than 256KB, you'll get a &lt;code&gt;CheckpointUnrecoverableExecutionError&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The workaround: store large data in S3 or DynamoDB and return a reference from the step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dataRef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;store-large-data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`tickets/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ticketId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/attachments.json`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;largeData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
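&lt;p&gt;A cheap guard before returning from a step catches oversized results early. The 256KB figure is the checkpoint cap described above; the helper name and exact size accounting are illustrative:&lt;/p&gt;

```typescript
// Illustrative guard: estimate a step result's serialized size and decide
// whether to offload it. The service's exact accounting may differ.
const CHECKPOINT_LIMIT_BYTES = 256 * 1024;

function needsOffload(result: unknown): boolean {
  const bytes = Buffer.byteLength(JSON.stringify(result), "utf8");
  return bytes > CHECKPOINT_LIMIT_BYTES;
}

needsOffload({ ok: true });                     // small result: fits
needsOffload({ blob: "x".repeat(300 * 1024) }); // too big: offload to S3
```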



&lt;h3&gt;
  
  
  Know your failure modes
&lt;/h3&gt;

&lt;p&gt;If a step throws an unrecoverable error (after all retries are exhausted), the execution moves to a &lt;code&gt;FAILED&lt;/code&gt; state. You can inspect the error in the Lambda console or via the &lt;code&gt;get-durable-execution&lt;/code&gt; API. For cases where you want to fail immediately without retries, throw an &lt;code&gt;UnrecoverableInvocationError&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;UnrecoverableInvocationError&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws/durable-execution-sdk-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UnrecoverableInvocationError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Customer account not found&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Failed executions can't be resumed. You'd need to start a new execution. Design your workflows so that steps are idempotent in case you need to re-run from scratch.&lt;/p&gt;
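&lt;p&gt;A common way to get that idempotency is to derive a stable key from the workflow input and make the write a no-op when the key already exists. Sketched here with an in-memory store (a real version would use a conditional write in DynamoDB or similar):&lt;/p&gt;

```typescript
// Illustrative idempotent write, keyed by ticket ID so a re-run from
// scratch doesn't duplicate work. In-memory store for demonstration only.
const store = new Map();

function recordTriageResult(ticketId: string, result: string): boolean {
  const key = `triage#${ticketId}`;
  if (store.has(key)) return false; // already recorded: no-op
  store.set(key, result);
  return true;
}

recordTriageResult("TKT-001", "resolved"); // first execution writes
recordTriageResult("TKT-001", "resolved"); // re-run is a no-op
```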

&lt;h3&gt;
  
  
  Use Lambda versions for deploy safety
&lt;/h3&gt;

&lt;p&gt;If you update your function code while an execution is suspended, replay uses the version that started the execution. This prevents inconsistencies from code changes mid-workflow. &lt;code&gt;AutoPublishAlias&lt;/code&gt; handles this, but it's worth understanding why: if your new code changes a step's return shape or removes a step, replay on the old version still works because Lambda pins executions to their starting version.&lt;/p&gt;

&lt;p&gt;Version pinning protects your function code, but it doesn't protect external schemas. If an execution suspends on Monday waiting for a callback, and on Wednesday your ticketing system starts sending a different JSON structure in the callback payload, the Monday execution will fail when it resumes. Keep callback payloads backwards compatible for as long as executions can be in flight.&lt;/p&gt;
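&lt;p&gt;One defensive pattern is a parser that accepts both the old and new payload shapes during the overlap window. The field names here are hypothetical, not the demo's actual schema:&lt;/p&gt;

```typescript
// Hypothetical tolerant parser: accepts an older payload that used
// "response" and a newer one that uses "editedResponse".
function parseCallback(raw: string) {
  const data = JSON.parse(raw);
  return {
    approved: Boolean(data.approved),
    editedResponse: data.editedResponse ?? data.response ?? "",
  };
}

parseCallback('{"approved": true, "response": "old shape"}');
parseCallback('{"approved": true, "editedResponse": "new shape"}');
```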

&lt;h3&gt;
  
  
  Plan your observability early
&lt;/h3&gt;

&lt;p&gt;For production workflows that can run for days, go beyond basic CloudWatch Logs. Set up CloudWatch alarms on stuck executions (no state change within expected timeframes), use the EventBridge integration to track execution lifecycle events, and consider CloudWatch Logs Insights queries for filtering by execution name across replays.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use What
&lt;/h2&gt;

&lt;p&gt;Durable functions and Step Functions are not competing. They solve different problems.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Durable Functions&lt;/th&gt;
&lt;th&gt;Step Functions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workflow definition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sequential code in your language&lt;/td&gt;
&lt;td&gt;Amazon States Language (JSON/YAML)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application logic, tightly coupled workflows&lt;/td&gt;
&lt;td&gt;Cross-service orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Service integrations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Via SDK in your code&lt;/td&gt;
&lt;td&gt;220+ native integrations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CloudWatch Logs, execution history&lt;/td&gt;
&lt;td&gt;Visual console, step-by-step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single Lambda function&lt;/td&gt;
&lt;td&gt;State machine + Lambda functions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lambda concurrency limits&lt;/td&gt;
&lt;td&gt;Distributed Map for large-scale parallel processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mental model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Write code&lt;/td&gt;
&lt;td&gt;Design state machines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On cost: durable functions use standard Lambda pricing for active compute time. During waits, compute charges stop. Step Functions charges per state transition, which adds up for high-volume workflows. See &lt;a href="https://aws.amazon.com/lambda/pricing/" rel="noopener noreferrer"&gt;Lambda pricing&lt;/a&gt; and &lt;a href="https://aws.amazon.com/step-functions/pricing/" rel="noopener noreferrer"&gt;Step Functions pricing&lt;/a&gt; for current details.&lt;/p&gt;
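&lt;p&gt;Because Step Functions cost scales with transitions, a back-of-envelope helper makes the comparison concrete. The price is a parameter you fill in from the pricing page, not a quoted rate:&lt;/p&gt;

```typescript
// Back-of-envelope: monthly Step Functions cost driven by state
// transitions. pricePerTransition is an input from the pricing page.
function monthlyTransitionCost(
  executionsPerMonth: number,
  transitionsPerExecution: number,
  pricePerTransition: number
): number {
  return executionsPerMonth * transitionsPerExecution * pricePerTransition;
}

// e.g. 1M executions with 10 transitions each at a hypothetical price:
monthlyTransitionCost(1_000_000, 10, 0.000025); // about $250 with these inputs
```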

&lt;p&gt;&lt;strong&gt;Use durable functions when&lt;/strong&gt; your workflow is application logic that reads naturally as sequential code. Support ticket triage, approval workflows, AI agent loops, saga patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Step Functions when&lt;/strong&gt; you're orchestrating across multiple AWS services, need visual debugging, or need the 220+ native integrations. ETL pipelines, media processing, infrastructure provisioning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use both together&lt;/strong&gt; when you have a high-level orchestration (Step Functions) that delegates to individual workflows (durable functions) for complex application logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This post covered the fundamentals: what durable functions are, how checkpoint/replay works, and how to build and test a complete AI-powered workflow with human-in-the-loop callbacks. In the next post, I'll use durable functions to build a multi-agent orchestration workflow where multiple AI agents collaborate on complex tasks with checkpointed reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" rel="noopener noreferrer"&gt;AWS Lambda Durable Functions documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions/" rel="noopener noreferrer"&gt;AWS Blog: Build multi-step applications and AI workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws/aws-durable-execution-sdk-js" rel="noopener noreferrer"&gt;Durable Execution SDK for JavaScript (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/sample-lambda-durable-functions" rel="noopener noreferrer"&gt;Sample Lambda Durable Functions (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;The RISEN Framework for AI Agent System Prompts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/durable-support-triage" rel="noopener noreferrer"&gt;Demo repository for this post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What workflows are you thinking about building with durable functions? Let me know in the comments!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>typescript</category>
      <category>ai</category>
    </item>
    <item>
      <title>Chaos Engineering for AWS Lambda: failure-lambda 1.0</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Thu, 12 Mar 2026 16:44:15 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/chaos-engineering-for-aws-lambda-failure-lambda-10-fpm</link>
      <guid>https://dev.to/gunnargrosch/chaos-engineering-for-aws-lambda-failure-lambda-10-fpm</guid>
      <description>&lt;p&gt;I wrote the first version of &lt;a href="https://github.com/gunnargrosch/failure-lambda" rel="noopener noreferrer"&gt;failure-lambda&lt;/a&gt; back in 2019. The idea was simple: inject faults into AWS Lambda functions so you can test how your system behaves when things go wrong. Latency spikes, exceptions, blocked network calls. The kind of failures that happen in production whether you're ready for them or not.&lt;/p&gt;

&lt;p&gt;That version worked. People used it. But the codebase was showing its age. JavaScript with no types. AWS SDK v2. A flat configuration format that only allowed one failure mode at a time. And it only worked with Node.js.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/gunnargrosch/failure-lambda" rel="noopener noreferrer"&gt;failure-lambda 1.0&lt;/a&gt; is a ground-up rewrite. TypeScript, AWS SDK v3, a feature flag configuration model, two new failure modes (&lt;code&gt;timeout&lt;/code&gt; and &lt;code&gt;corruption&lt;/code&gt;), and a Lambda Layer that brings fault injection to any managed runtime with zero code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Chaos Engineering for Lambda?
&lt;/h2&gt;

&lt;p&gt;If you're building on Lambda, you're building on a managed service. You don't manage servers, but you still manage dependencies. Your function calls DynamoDB, S3, third-party APIs, other microservices. Any of those can be slow, unreliable, or unavailable.&lt;/p&gt;

&lt;p&gt;The question isn't whether failures will happen. It's whether your system handles them gracefully when they do. Does your function retry correctly? Does your circuit breaker trip? Does your API return a useful error message instead of a 500?&lt;/p&gt;

&lt;p&gt;Failure injection lets you answer those questions before your users do. Enable latency injection and watch your downstream timeouts. Block a dependency with the denylist and see if your fallback logic works. Return a 503 and check that your retry policy backs off properly.&lt;/p&gt;

&lt;p&gt;The important part: you control when and how these failures happen. Start with one mode at a low percentage in a test environment. Increase gradually. Build confidence that your system does what you think it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New
&lt;/h2&gt;

&lt;p&gt;The short version: everything. TypeScript with full type definitions. AWS SDK v3. Two new failure modes.&lt;/p&gt;

&lt;p&gt;The old format was flat: one failure mode, one rate, one toggle. The new format treats each mode as an independent feature flag. You can enable latency injection at 50% and DNS denylist at 100% simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"percentage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"min_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"max_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"denylist"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"deny_list"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3.*.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seven failure modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;latency&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Adds random delay between configured bounds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;timeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sleeps until the function is about to time out&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;exception&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Throws an exception with a configurable message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;statuscode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Returns a specific HTTP status code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;diskspace&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fills &lt;code&gt;/tmp&lt;/code&gt; to consume available disk space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;denylist&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Blocks network calls to matching hostnames&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;corruption&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mangles the response body after the handler returns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Beyond the modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Event-based targeting.&lt;/strong&gt; Match conditions restrict injection to specific requests. Only corrupt GET requests to the prod stage. Only add latency to requests hitting &lt;code&gt;/api&lt;/code&gt;. Conditions support exact match, exists, startsWith, and regex operators.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"min_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"match"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"requestContext.http.path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"operator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"startsWith"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AppConfig Feature Flags.&lt;/strong&gt; Native support for the &lt;code&gt;AWS.AppConfig.FeatureFlags&lt;/code&gt; profile type. AppConfig gives you deployment strategies and automatic rollback, useful when you don't want an accidental "enable all failures at 100%" to take down your environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Middy middleware.&lt;/strong&gt; If you use &lt;a href="https://middy.js.org/" rel="noopener noreferrer"&gt;Middy&lt;/a&gt;, import &lt;code&gt;failure-lambda/middy&lt;/code&gt; instead of wrapping your handler.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CLI.&lt;/strong&gt; &lt;code&gt;npx failure-lambda&lt;/code&gt; gives you an interactive CLI for managing failure configuration. Check status, enable modes, disable everything. Supports both SSM and AppConfig backends and saves connection profiles so you don't retype region and parameter names every time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
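&lt;p&gt;To make the operators concrete, here's a simplified evaluator in the spirit of those match conditions. This is a sketch of the semantics, not the library's actual code:&lt;/p&gt;

```typescript
// Simplified match-condition evaluator illustrating the four operators.
// Semantics are assumptions for illustration, not failure-lambda's code.
interface MatchCondition {
  path: string;      // dot path into the event, e.g. "requestContext.http.path"
  operator: string;  // "exact", "exists", "startsWith", or "regex"
  value?: string;
}

function getPath(event: any, path: string): any {
  return path.split(".").reduce((obj, key) => (obj == null ? undefined : obj[key]), event);
}

function matches(event: any, cond: MatchCondition): boolean {
  const actual = getPath(event, cond.path);
  switch (cond.operator) {
    case "exists":
      return actual !== undefined;
    case "exact":
      return actual === cond.value;
    case "startsWith":
      return typeof actual === "string" ? actual.startsWith(cond.value ?? "") : false;
    case "regex":
      return typeof actual === "string" ? new RegExp(cond.value ?? "").test(actual) : false;
    default:
      return false;
  }
}

const sampleEvent = { requestContext: { http: { path: "/api/orders" } } };
matches(sampleEvent, { path: "requestContext.http.path", operator: "startsWith", value: "/api" }); // true
```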

&lt;h2&gt;
  
  
  The Lambda Layer
&lt;/h2&gt;

&lt;p&gt;This is the biggest addition. The npm package requires you to import &lt;code&gt;failure-lambda&lt;/code&gt; and wrap your handler. That's fine for Node.js, but it doesn't help if your Lambda functions are written in Python, Java, .NET, or Ruby.&lt;/p&gt;

&lt;p&gt;The Lambda Layer solves this. Add the layer to your function, set two environment variables, and fault injection works without touching your code. No imports, no wrapper, no middleware.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;AWS_LAMBDA_EXEC_WRAPPER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/opt/failure-lambda-wrapper
&lt;span class="nv"&gt;FAILURE_INJECTION_PARAM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;failureLambdaConfig
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, it's a lightweight Rust proxy that sits between your handler and the Lambda Runtime API. Single static binary, no runtime dependencies, negligible cold start impact. On each invocation, the proxy reads your failure configuration from SSM or AppConfig and decides whether to inject faults before or after forwarding the request to your handler. Your code never knows it's there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────┐
│ Lambda Execution Environment                    │
│                                                 │
│  Lambda Runtime API                             │
│       │                                         │
│       ▼                                         │
│  failure-lambda proxy (Rust)                    │
│       │                                         │
│       ├── Read config from SSM / AppConfig      │
│       ├── Inject fault? ──yes──▶ Return early   │
│       │                         (statuscode,    │
│       │                          exception,     │
│       │                          latency)       │
│       no                                        │
│       │                                         │
│       ▼                                         │
│  Your handler (any runtime)                     │
│       │                                         │
│       ├── Inject fault? ──yes──▶ Modify response│
│       │                         (corruption)    │
│       no                                        │
│       │                                         │
│       ▼                                         │
│  Response returned                              │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The layer works with all managed Lambda runtimes that support &lt;code&gt;AWS_LAMBDA_EXEC_WRAPPER&lt;/code&gt;: Node.js, Python, Java, .NET, and Ruby. Both x86_64 and arm64 architectures. Custom runtimes can use the layer if they support &lt;code&gt;AWS_LAMBDA_EXEC_WRAPPER&lt;/code&gt;: the runtime bootstrap must check for the variable and invoke the specified executable before starting its own runtime loop. Download the zip from the &lt;a href="https://github.com/gunnargrosch/failure-lambda/releases/" rel="noopener noreferrer"&gt;GitHub release&lt;/a&gt;, publish it to your account, and you're ready to go.&lt;/p&gt;
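&lt;p&gt;If you maintain a custom runtime, the bootstrap check is only a few lines. A minimal sketch (the runtime path is a placeholder; the essential part is deferring to the wrapper executable when the variable is set):&lt;/p&gt;

```shell
#!/bin/sh
# Minimal custom-runtime bootstrap that honors AWS_LAMBDA_EXEC_WRAPPER.
start_runtime() {
  runtime="$1"
  shift
  if [ -n "${AWS_LAMBDA_EXEC_WRAPPER:-}" ]; then
    # Hand control to the wrapper, passing the real runtime as its argument.
    exec "$AWS_LAMBDA_EXEC_WRAPPER" "$runtime" "$@"
  fi
  exec "$runtime" "$@"
}

# start_runtime /var/runtime/my-runtime "$@"   # placeholder runtime binary
```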

&lt;h2&gt;
  
  
  Getting Started: npm Package
&lt;/h2&gt;

&lt;p&gt;For Node.js, the npm package gives you the most control.&lt;/p&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20 or later&lt;/li&gt;
&lt;li&gt;An AWS account with permissions to create SSM parameters and Lambda functions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ssm:GetParameter&lt;/code&gt; granted to the function's execution role
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;failure-lambda
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wrap your handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;failureLambda&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failure-lambda&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;failureLambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an SSM parameter with your failure configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm put-parameter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; failureLambdaConfig &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt; String &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--value&lt;/span&gt; &lt;span class="s1"&gt;'{"latency": {"enabled": false, "min_latency": 100, "max_latency": 400}}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set the &lt;code&gt;FAILURE_INJECTION_PARAM&lt;/code&gt; environment variable on your Lambda function to the parameter name, grant &lt;code&gt;ssm:GetParameter&lt;/code&gt;, and deploy.&lt;/p&gt;
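&lt;p&gt;If you manage the function outside an IaC tool, the same can be done with the AWS CLI (&lt;code&gt;my-function&lt;/code&gt; is a placeholder name):&lt;/p&gt;

```shell
aws lambda update-function-configuration \
  --function-name my-function \
  --environment "Variables={FAILURE_INJECTION_PARAM=failureLambdaConfig}" \
  --region eu-west-1
```

&lt;p&gt;Note that &lt;code&gt;--environment&lt;/code&gt; replaces the function's entire set of environment variables, so include any existing ones in the same call.&lt;/p&gt;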

&lt;p&gt;When you're ready to inject a fault:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx failure-lambda &lt;span class="nb"&gt;enable &lt;/span&gt;latency &lt;span class="nt"&gt;--param&lt;/span&gt; failureLambdaConfig &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Getting Started: Lambda Layer
&lt;/h2&gt;

&lt;p&gt;For any runtime, the layer path requires no code changes.&lt;/p&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account with permissions to publish Lambda layers and create Lambda functions&lt;/li&gt;
&lt;li&gt;AWS CLI configured&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ssm:GetParameter&lt;/code&gt; granted to the function's execution role&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Download &lt;code&gt;failure-lambda-layer-x86_64.zip&lt;/code&gt; or &lt;code&gt;failure-lambda-layer-aarch64.zip&lt;/code&gt; from the &lt;a href="https://github.com/gunnargrosch/failure-lambda/releases/" rel="noopener noreferrer"&gt;latest release&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Publish the layer:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws lambda publish-layer-version &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--layer-name&lt;/span&gt; failure-lambda &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zip-file&lt;/span&gt; fileb://failure-lambda-layer-x86_64.zip &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compatible-architectures&lt;/span&gt; x86_64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the layer ARN to your function, set &lt;code&gt;AWS_LAMBDA_EXEC_WRAPPER=/opt/failure-lambda-wrapper&lt;/code&gt; and &lt;code&gt;FAILURE_INJECTION_PARAM=failureLambdaConfig&lt;/code&gt;, create the SSM parameter, and grant &lt;code&gt;ssm:GetParameter&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. Your Python handler, your Java handler, your .NET handler: they all get the same fault injection capabilities without a single line of code changed.&lt;/p&gt;
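&lt;p&gt;For instance, a Python handler behind the layer stays completely ordinary. This is a sketch rather than code from the project, and that's the point: there is nothing failure-lambda-specific in it.&lt;/p&gt;

```python
import json

def handler(event, context):
    # No failure-lambda import, no wrapper, no middleware.
    # The layer's proxy injects faults before or after this code runs.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Order processed"}),
    }
```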

&lt;h2&gt;
  
  
  Try It Out: Injecting Faults Step by Step
&lt;/h2&gt;

&lt;p&gt;Let's walk through a concrete example. We'll deploy a simple function with the layer, verify it works normally, inject latency, inject a status code error, and then turn everything off. The whole thing uses a standard Node.js handler with zero failure-lambda code in it.&lt;/p&gt;

&lt;p&gt;Here's the handler. It simulates a quick database lookup and returns the response time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Order processed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SAM template adds the layer and sets the two required environment variables. Nothing else:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;DemoFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
  &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;index.handler&lt;/span&gt;
    &lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nodejs22.x&lt;/span&gt;
    &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
    &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="na"&gt;Layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;FailureLambdaLayerArn&lt;/span&gt;
    &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;AWS_LAMBDA_EXEC_WRAPPER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/opt/failure-lambda-wrapper&lt;/span&gt;
        &lt;span class="na"&gt;FAILURE_INJECTION_PARAM&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;FailureConfig&lt;/span&gt;
    &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;SSMParameterReadPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;ParameterName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;FailureConfig&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full template including &lt;code&gt;Parameters&lt;/code&gt; definitions is in the &lt;a href="https://github.com/gunnargrosch/failure-lambda/tree/main/examples" rel="noopener noreferrer"&gt;examples directory&lt;/a&gt;. The snippet above shows only the function resource for clarity.&lt;/p&gt;

&lt;p&gt;After deploying with &lt;code&gt;sam build &amp;amp;&amp;amp; sam deploy --guided&lt;/code&gt;, we hit the endpoint a few times to see steady state:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The examples below use &lt;code&gt;curl -s -o - -w '   HTTP %{http_code} | %{time_total}s\n'&lt;/code&gt; to append status code and total request time to each response. Standard curl won't show this without the &lt;code&gt;-w&lt;/code&gt; flag.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":5}   HTTP 200 | 0.21s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":5}   HTTP 200 | 0.19s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":6}   HTTP 200 | 0.19s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5ms handler duration, ~190ms end-to-end on warm invocations. That's our baseline. Now let's see what happens when conditions aren't ideal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Injecting latency
&lt;/h3&gt;

&lt;p&gt;A third-party API starts responding slowly. A DynamoDB table is throttling. A downstream microservice is under load. These are real scenarios, and you want to know how your system behaves before they happen in production.&lt;/p&gt;

&lt;p&gt;Update the SSM parameter to add 500-1000ms of random latency on every invocation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm put-parameter &lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;your-param-name&amp;gt; &lt;span class="nt"&gt;--type&lt;/span&gt; String &lt;span class="nt"&gt;--overwrite&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--value&lt;/span&gt; &lt;span class="s1"&gt;'{"latency": {"enabled": true, "percentage": 100, "min_latency": 500, "max_latency": 1000}}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Configuration is cached for 60 seconds. If you update the SSM parameter and immediately hit the endpoint, you'll see the old behavior. Wait for the cache to refresh before testing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once the cache has refreshed, hit the endpoint again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":11}   HTTP 200 | 0.99s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":5}    HTTP 200 | 1.08s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":5}    HTTP 200 | 1.14s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Responses jumped from ~190ms to over a second. The handler itself still runs in 5ms: the latency is injected by the proxy before the handler executes. This simulates what happens when a dependency responds slowly. Does your API Gateway timeout kick in at the right threshold? Do callers retry or give up? Does a slow function cause a queue to back up?&lt;/p&gt;

&lt;h3&gt;
  
  
  Injecting a status code error
&lt;/h3&gt;

&lt;p&gt;Latency is one thing. Complete failure is another. A downstream service returning 5xx errors, an expired API key, a misconfigured endpoint: all of these surface as error responses. Replace the config with a 503 Service Unavailable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm put-parameter &lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;your-param-name&amp;gt; &lt;span class="nt"&gt;--type&lt;/span&gt; String &lt;span class="nt"&gt;--overwrite&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--value&lt;/span&gt; &lt;span class="s1"&gt;'{"statuscode": {"enabled": true, "percentage": 100, "status_code": 503}}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the cache refreshes (up to 60 seconds):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Injected status code 503"}   HTTP 503 | 0.31s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Injected status code 503"}   HTTP 503 | 0.21s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Injected status code 503"}   HTTP 503 | 0.18s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The handler never runs. The proxy short-circuits the invocation and returns a 503 directly. This represents a function that's failing for any reason: a permissions change, a missing environment variable, an unhandled exception. Does your frontend show a useful error message or a blank page? Does your Step Functions workflow retry or fail the entire execution? Does your monitoring alert you?&lt;/p&gt;

&lt;h3&gt;
  
  
  What the logs show
&lt;/h3&gt;

&lt;p&gt;The proxy writes structured JSON logs to CloudWatch. You can see exactly what it's doing on each invocation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"config_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ssm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"enabled_flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"[]"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"config_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ssm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"enabled_flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;latency&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"inject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;670&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"min_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;500.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"max_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1000.0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"inject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;859&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"min_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;500.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"max_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1000.0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"latency"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"inject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;933&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"min_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;500.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"max_latency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1000.0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"config_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ssm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"enabled_flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;statuscode&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"statuscode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"inject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"status_code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"statuscode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"inject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"status_code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"failure-lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"statuscode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"inject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"status_code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every config fetch and every injection is logged with the mode, action, and parameters. You can query these in CloudWatch Logs Insights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fields @timestamp, mode, action
| filter source = "failure-lambda"
| sort @timestamp desc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
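&lt;p&gt;Aggregations work too. For example, to count injections per fault mode during an experiment:&lt;/p&gt;

```
fields @timestamp
| filter source = "failure-lambda" and action = "inject"
| stats count(*) by mode
```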



&lt;h3&gt;
  
  
  Turning it off
&lt;/h3&gt;

&lt;p&gt;The CLI can disable everything in one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx failure-lambda disable &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--param&lt;/span&gt; &amp;lt;your-param-name&amp;gt; &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or set the parameter back to an empty config manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm put-parameter &lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;your-param-name&amp;gt; &lt;span class="nt"&gt;--type&lt;/span&gt; String &lt;span class="nt"&gt;--overwrite&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--value&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the cache refreshes, everything is back to normal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":5}   HTTP 200 | 0.33s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":5}   HTTP 200 | 0.23s

$ curl https://ij58ovr5s5.execute-api.eu-west-1.amazonaws.com/
{"message":"Order processed","duration_ms":5}   HTTP 200 | 0.19s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No redeployment. No code changes. The proxy saw the empty config, disabled injection, and passed everything through.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cleaning up
&lt;/h3&gt;

&lt;p&gt;To remove everything deployed in this walkthrough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam delete
aws ssm delete-parameter &lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;your-param-name&amp;gt; &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
aws lambda delete-layer-version &lt;span class="nt"&gt;--layer-name&lt;/span&gt; failure-lambda &lt;span class="nt"&gt;--version-number&lt;/span&gt; &amp;lt;version&amp;gt; &lt;span class="nt"&gt;--region&lt;/span&gt; eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Learn
&lt;/h2&gt;

&lt;p&gt;The walkthrough above uses a single function. A real application with multiple functions, queues, and dependencies will surface more. But even one function reveals things about your system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timeout behavior&lt;/strong&gt; (&lt;code&gt;latency&lt;/code&gt;, &lt;code&gt;timeout&lt;/code&gt;). Are your Lambda timeouts, API Gateway timeouts, and client timeouts configured consistently? Latency injection exposes mismatches fast. A function with a 10-second timeout behind an API Gateway with a 3-second timeout will fail in ways that look intermittent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry and backoff&lt;/strong&gt; (&lt;code&gt;statuscode&lt;/code&gt;, &lt;code&gt;exception&lt;/code&gt;). When a function returns a 503, do callers retry with exponential backoff or hammer the endpoint? Do SQS redrive policies work as configured? Injecting errors at a percentage less than 100% lets you see if partial failures are handled differently than complete outages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error propagation&lt;/strong&gt; (&lt;code&gt;statuscode&lt;/code&gt;, &lt;code&gt;exception&lt;/code&gt;). Does a failure in one function produce a clear error message at the API boundary, or does it cascade into a generic 500? Injecting status codes and exceptions at different points in a call chain shows you exactly where error context gets lost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alerting and observability&lt;/strong&gt; (any mode). Do your CloudWatch alarms fire? Do they fire quickly enough? Injecting faults and watching your dashboards is the most direct way to validate your monitoring. If you don't get paged during a controlled experiment, you won't get paged during an incident.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback behavior&lt;/strong&gt; (&lt;code&gt;denylist&lt;/code&gt;). If you've built fallback logic for when a dependency is unavailable, does it actually work? The denylist mode blocks specific hostnames, so you can test what happens when S3 or DynamoDB is unreachable without affecting other dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capacity and scaling&lt;/strong&gt; (&lt;code&gt;latency&lt;/code&gt;). What happens when latency increases and concurrent executions climb? Do you hit reserved concurrency limits? Does a slow function cause upstream queues to grow? These are the kinds of cascading effects that are hard to predict and easy to test.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point of chaos engineering isn't to cause outages. It's to discover how your system responds to conditions that will eventually occur, in a controlled way, before your users encounter them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Injection isn't happening
&lt;/h3&gt;

&lt;p&gt;The most common cause is a missing &lt;code&gt;ssm:GetParameter&lt;/code&gt; permission on the function's execution role. Check CloudWatch Logs for a permission denied error from the proxy. The second most common cause is the configuration cache: changes take up to 60 seconds to take effect. If you've just updated the SSM parameter, wait for the next cache refresh before concluding injection isn't working.&lt;/p&gt;
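
&lt;p&gt;If the permission is what's missing, the grant is small. In a SAM template it might look like the snippet below; the parameter name &lt;code&gt;failureLambdaConfig&lt;/code&gt; is an assumption here, so adjust it to whatever your configuration parameter is actually called:&lt;/p&gt;

```yaml
# Sketch: grant the execution role read access to the config parameter.
# "failureLambdaConfig" is a placeholder name, not a required value.
Policies:
  - Statement:
      - Effect: Allow
        Action:
          - ssm:GetParameter
        Resource: !Sub arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/failureLambdaConfig
```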

&lt;h3&gt;
  
  
  Architecture mismatch
&lt;/h3&gt;

&lt;p&gt;If you publish the x86_64 layer and attach it to an arm64 function (or vice versa), the proxy binary won't execute. Download the correct zip for your function's architecture: &lt;code&gt;failure-lambda-layer-x86_64.zip&lt;/code&gt; or &lt;code&gt;failure-lambda-layer-aarch64.zip&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS_LAMBDA_EXEC_WRAPPER has no effect
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;AWS_LAMBDA_EXEC_WRAPPER&lt;/code&gt; mechanism is built into managed Lambda runtimes. Custom runtimes need to explicitly support it by checking for the variable and invoking the wrapper before starting their own runtime loop. If you're using a custom runtime that doesn't implement this, the layer won't intercept anything.&lt;/p&gt;
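
&lt;p&gt;For custom runtime authors, the contract is small: if the variable is set, exec the wrapper with the original runtime command as its arguments. Here's a rough shell sketch of that idea; the paths are placeholders, and &lt;code&gt;start_runtime&lt;/code&gt; prints the command where a real bootstrap would exec it:&lt;/p&gt;

```shell
# Sketch of a custom-runtime bootstrap honoring AWS_LAMBDA_EXEC_WRAPPER.
# A real bootstrap would `exec` the printed command instead of echoing it.
start_runtime() {
  runtime_cmd="$1"
  if [ -n "${AWS_LAMBDA_EXEC_WRAPPER:-}" ]; then
    # The wrapper receives the original runtime command as its arguments.
    echo "$AWS_LAMBDA_EXEC_WRAPPER $runtime_cmd"
  else
    echo "$runtime_cmd"
  fi
}

# Placeholder paths, for illustration only.
AWS_LAMBDA_EXEC_WRAPPER=/opt/failure-lambda-wrapper \
  start_runtime /var/runtime/bootstrap-impl
```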

&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;

&lt;p&gt;A few things I learned building this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature flags over a single toggle
&lt;/h3&gt;

&lt;p&gt;The original version had one failure mode active at a time. That's not how production fails. In the real world, you might have a slow dependency and flaky DNS resolution at the same time. The feature flag model lets you compose failures. Each mode is independent with its own percentage, so you can build realistic failure scenarios.&lt;/p&gt;
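
&lt;p&gt;As an illustration of what composition buys you, a configuration enabling two independent modes might be shaped roughly like the JSON below. To be clear: the field names here are my shorthand to show the structure, not the documented schema; the README has the exact format.&lt;/p&gt;

```json
{
  "latency": { "enabled": true, "percentage": 50, "minLatency": 300, "maxLatency": 1000 },
  "denylist": { "enabled": true, "percentage": 100, "patterns": ["dynamodb.*.amazonaws.com"] }
}
```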

&lt;h3&gt;
  
  
  Why Rust for the layer
&lt;/h3&gt;

&lt;p&gt;The proxy sits in the critical path of every Lambda invocation. It needs to be fast with minimal memory overhead. Rust was the natural choice: predictable performance, no garbage collector pauses, and the single binary keeps the layer small.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caching with a purpose
&lt;/h3&gt;

&lt;p&gt;Every invocation used to call SSM to get the configuration. That's unnecessary latency and API costs. The library now caches SSM responses for 60 seconds by default. For AppConfig, the Lambda extension already handles caching, so the library disables its own cache entirely to avoid staleness.&lt;/p&gt;
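
&lt;p&gt;The pattern itself fits in a few lines. This is a TypeScript sketch of the idea, not the library's internals; &lt;code&gt;fetchConfig&lt;/code&gt; stands in for the SSM call:&lt;/p&gt;

```typescript
// Illustrative TTL cache: reuse a fetched config until it expires.
let calls = 0;

async function fetchConfig() {
  calls += 1;                          // stands in for an SSM GetParameter call
  return { isEnabled: true };
}

function makeCachedFetch(fetch: typeof fetchConfig, ttlMs: number) {
  let value: { isEnabled: boolean } | undefined;
  let expiresAt = 0;
  return async function () {
    const now = Date.now();
    if (value === undefined || now >= expiresAt) {
      value = await fetch();           // cache miss or expired: refetch
      expiresAt = now + ttlMs;
    }
    return value;
  };
}

// At most one underlying fetch per 60-second window.
const getConfig = makeCachedFetch(fetchConfig, 60_000);
```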

&lt;h3&gt;
  
  
  Validation that fails closed
&lt;/h3&gt;

&lt;p&gt;If your configuration JSON is malformed or has invalid values, the library logs a clear error and disables injection. It doesn't crash your function and it doesn't silently inject with bad parameters. Regex patterns in denylist rules are checked for nested quantifiers to prevent ReDoS.&lt;/p&gt;
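
&lt;p&gt;A sketch of what failing closed means in practice; the config shape is illustrative, not the library's actual validation code:&lt;/p&gt;

```typescript
// Fail-closed parsing sketch: malformed input disables injection
// instead of crashing the function or injecting with bad values.
const DISABLED = { isEnabled: false };

function parseConfig(raw: string) {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.isEnabled !== "boolean") {
      throw new Error("isEnabled must be a boolean");
    }
    return parsed;
  } catch (err) {
    // Log clearly, then fail closed.
    console.error("invalid failure config, injection disabled:", err);
    return DISABLED;
  }
}
```

The important property is the catch branch: a clear log line, then a config that injects nothing.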

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;failure-lambda 1.0 brings TypeScript, a feature flag configuration model, seven failure modes, and a Lambda Layer that works across all managed runtimes without touching your code. This release covers the core use cases I've seen in practice. There are things I'd like to explore next: more granular targeting with Lambda function aliases, integration with AWS Fault Injection Service, and better observability into what's being injected across a fleet of functions. If you have ideas, &lt;a href="https://github.com/gunnargrosch/failure-lambda/issues" rel="noopener noreferrer"&gt;open an issue&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/failure-lambda" rel="noopener noreferrer"&gt;failure-lambda on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/failure-lambda" rel="noopener noreferrer"&gt;failure-lambda on npm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/failure-lambda/releases/" rel="noopener noreferrer"&gt;Lambda Layer download (GitHub Release)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/failure-lambda/tree/main/examples" rel="noopener noreferrer"&gt;Example applications (SAM, CDK, Serverless Framework, Layer)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hackernoon.com/chaos-engineering-and-aws-lambda-latency-injection-ddeb4ff8d983" rel="noopener noreferrer"&gt;Yan Cui on latency injection for Lambda&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/adhorn/aws-lambda-chaos-injection/" rel="noopener noreferrer"&gt;Adrian Hornsby's chaos injection library for Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What failure scenarios have you tested in your serverless applications? I'm curious what surprises people find with timeout mismatches and retry behavior. Let me know in the comments.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>testing</category>
      <category>devops</category>
    </item>
    <item>
      <title>Can It Run DOOM? Playing DOOM in Claude Code with doom-mcp</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Wed, 11 Mar 2026 02:04:03 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/can-it-run-doom-playing-doom-in-claude-code-with-doom-mcp-1ck1</link>
      <guid>https://dev.to/gunnargrosch/can-it-run-doom-playing-doom-in-claude-code-with-doom-mcp-1ck1</guid>
      <description>&lt;p&gt;DOOM runs on everything. Calculators, ATMs, pregnancy tests, an oscilloscope you bought at a garage sale. It was only a matter of time before AI coding assistants joined the list. Now they have.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;doom-mcp&lt;/code&gt; is an MCP server that embeds the real DOOM engine and exposes it to Claude Code, Kiro, Cursor, or any MCP client through three tools. The AI sees the game, fights demons, navigates levels, and reports what's happening. You can direct it or let it play on its own.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Let's play DOOM!

  Starting E1M1 on medium difficulty...

  HP:100 Armor:0 | Pistol (key:2) | Ammo: 50b 0s 0r 0c | Kills:0 | (-416,256) 0deg | E1M1
  No enemies in sight. Exploring the corridor ahead.

&amp;gt; Go forward and find some enemies

  Sprinting down the hallway... I can see an opening ahead.

  HP:100 Armor:0 | Pistol (key:2) | Ammo: 50b 0s 0r 0c | Kills:0 | (0,256) 0deg | E1M1
  ENEMIES IN SIGHT (aim for angle~0 then fire): Imp (HP:60) to your left (turn_left ~9) nearby

  There's an Imp to the left! Turning to face it...

&amp;gt; Shoot it!

  Aimed and firing!

  HP:100 Armor:0 | Pistol (key:2) | Ammo: 47b 0s 0r 0c | Kills:0 | (0,256) 30deg | E1M1
  ENEMIES IN SIGHT (aim for angle~0 then fire): Imp (HP:15) AHEAD close

  Hit! The Imp is wounded (HP:15). One more burst should finish it.

  HP:95 Armor:0 | Pistol (key:2) | Ammo: 44b 0s 0r 0c | Kills:1 | (0,256) 30deg | E1M1
  ITEMS: Stimpack (+10 HP) slightly to your right (turn_right ~4) point-blank (~3 ticks fwd+run to reach)
  1 kill! Want to see a screenshot?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Register with Claude Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add doom &lt;span class="nt"&gt;--scope&lt;/span&gt; user &lt;span class="nt"&gt;--&lt;/span&gt; npx &lt;span class="nt"&gt;-y&lt;/span&gt; doom-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Kiro, Cursor, Windsurf, or any other MCP client, add to &lt;code&gt;.mcp.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"doom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"doom-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Play
&lt;/h3&gt;

&lt;p&gt;Open a new session and say: "Let's play DOOM"&lt;/p&gt;

&lt;p&gt;The AI will ask which mode you want, start on E1M1, and begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Play Modes
&lt;/h2&gt;

&lt;p&gt;Two ways to play:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User-directed&lt;/strong&gt;: You give commands ("go forward", "open that door", "shoot the imp"). The AI executes one action at a time and describes what happens. Good for a text-adventure feel where you call the shots and the AI handles the execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autonomous&lt;/strong&gt;: The AI makes all decisions: movement, combat, exploration. You watch and intervene if you want. It's genuinely entertaining to watch it work through a level, spot an Imp, and decide whether to charge or take cover.&lt;/p&gt;

&lt;h2&gt;
  
  
  WAD Files
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;doom-mcp&lt;/code&gt; ships with Freedoom out of the box. Freedoom is a free and open-source replacement IWAD (DOOM's game data format) with its own levels and enemy designs. If you want the original id Software levels, enemies, and atmosphere, the shareware &lt;code&gt;DOOM1.WAD&lt;/code&gt; is free to download legally. Set the path in your MCP config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"doom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"doom-mcp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"DOOM_WAD_PATH"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/DOOM1.WAD"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you own DOOM or DOOM 2, those WADs work the same way.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Few Things I Learned Building This
&lt;/h2&gt;

&lt;h3&gt;
  
  
  FFI over subprocess
&lt;/h3&gt;

&lt;p&gt;The obvious approach is to run DOOM as a child process and communicate via pipes. The problem is timing and synchronization: you're fighting the engine's internal clock, process startup overhead, and serialization on every frame.&lt;/p&gt;

&lt;p&gt;Instead, &lt;code&gt;doom-mcp&lt;/code&gt; embeds doomgeneric (a portable C implementation of the DOOM engine) directly via Rust FFI (Foreign Function Interface). Rust was the right choice here: it has excellent FFI support for C code, compiles to a single native binary, and gives memory safety without a garbage collector that could interrupt the game loop. No subprocess spawning, no pipes. Each tool call advances the engine by calling &lt;code&gt;doomgeneric_Tick()&lt;/code&gt; directly and reading the frame buffer in-memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Virtual time
&lt;/h3&gt;

&lt;p&gt;DOOM normally ties its game clock to wall time. That's fine for a real-time player, but it's wrong for an AI that might take 500ms to decide its next move. Without intervention, the engine would skip ticks during the AI's thinking time and produce non-deterministic behavior.&lt;/p&gt;

&lt;p&gt;The solution is to decouple the engine's clock from wall time entirely. Each &lt;code&gt;doomgeneric_Tick()&lt;/code&gt; call advances exactly one game tic (1/35th of a second) regardless of how much real time has passed. Gameplay is fully deterministic: the same sequence of actions always produces the same result.&lt;/p&gt;
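
&lt;p&gt;The idea is straightforward to model. A hypothetical TypeScript sketch of the same clock (doom-mcp implements this in Rust against the engine itself):&lt;/p&gt;

```typescript
// Virtual time sketch: game time advances only when tick() is called,
// one fixed tic at a time, independent of wall-clock time.
const TICS_PER_SECOND = 35;

class VirtualClock {
  private tics = 0;

  tick(n = 1) {
    this.tics += n;                 // one doomgeneric_Tick() call per tic
  }

  gameSeconds() {
    return this.tics / TICS_PER_SECOND;
  }
}

const clock = new VirtualClock();
clock.tick(7);                      // doom_action default: 7 tics = 0.2 s of game time
```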

&lt;h3&gt;
  
  
  Line-of-sight, not wallhack
&lt;/h3&gt;

&lt;p&gt;Enemy detection could just iterate the object list and report everything on the map. That would be cheating in a way that makes the game too easy and less interesting.&lt;/p&gt;

&lt;p&gt;Instead, the server performs a proper line-of-sight check for each enemy, the same check the DOOM engine uses internally (&lt;code&gt;P_CheckSight()&lt;/code&gt;). The AI only sees enemies it could see if it were a human looking at the screen. When an enemy moves behind a wall or around a corner, it drops from the AI's view immediately. It still needs to explore to find things.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the AI gets per action
&lt;/h3&gt;

&lt;p&gt;Each &lt;code&gt;doom_action&lt;/code&gt; call returns structured game state alongside a small PNG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HP, armor, ammo by type, current weapon, kill count&lt;/li&gt;
&lt;li&gt;Visible enemies with human-readable direction and distance (&lt;code&gt;Imp (HP:60) to your left (turn_left ~9) nearby&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Nearby items within pickup range, with CLOSING/RECEDING indicators&lt;/li&gt;
&lt;li&gt;Nearby doors and switches (&lt;code&gt;NEARBY: Door AHEAD (use to activate)&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;A 200x125 thumbnail PNG using a 216-color palette for inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The structured data gives the AI something it can reason about without having to interpret pixel-level vision. The image fills in the spatial context. Together they let the AI make reasonable decisions: "Imp to my left, nearby, HP 60 — turn left and fire."&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools Reference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  doom_start
&lt;/h3&gt;

&lt;p&gt;Starts or restarts a game. Safe to call at any time.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;skill&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int 1-5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1=baby, 2=easy, 3=medium, 4=hard, 5=nightmare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;episode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int 1-4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Episode number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;map&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int 1-9&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Map number&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  doom_action
&lt;/h3&gt;

&lt;p&gt;Advances the game by executing actions for a number of ticks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Required&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;actions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;Comma-separated: &lt;code&gt;forward&lt;/code&gt;, &lt;code&gt;backward&lt;/code&gt;, &lt;code&gt;turn_left&lt;/code&gt;, &lt;code&gt;turn_right&lt;/code&gt;, &lt;code&gt;strafe_left&lt;/code&gt;, &lt;code&gt;strafe_right&lt;/code&gt;, &lt;code&gt;fire&lt;/code&gt;, &lt;code&gt;use&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt;, &lt;code&gt;1&lt;/code&gt;-&lt;code&gt;7&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ticks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int 1-105&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;Ticks to advance. Default 7. 7 ticks ≈ 0.2s, 35 ticks ≈ 1s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Weapon keys: 1=fists, 2=pistol, 3=shotgun, 4=chaingun, 5=rocket launcher, 6=plasma, 7=BFG.&lt;/p&gt;

&lt;h3&gt;
  
  
  doom_screenshot
&lt;/h3&gt;

&lt;p&gt;Saves a full-resolution 320x200 screenshot to the system temp directory and opens it in the default image viewer. Does not advance the game. Note: the viewer launch will fail silently on headless systems or SSH sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Well Does It Actually Play?
&lt;/h2&gt;

&lt;p&gt;Realistically: well enough to be fun. On E1M1 at medium difficulty, it gets 5-10 kills in a typical 50-action session. It can navigate corridors, spot enemies, aim, and fire. It struggles with enemies behind partial cover and complex door sequences.&lt;/p&gt;

&lt;p&gt;It improves significantly when you direct it. "There's an Imp to your left" turns a wandering AI into a focused combatant. The user-directed mode is where most of the entertainment is. Two AI agents in deathmatch is the obvious next experiment, and the architecture could extend to other doomgeneric-compatible titles: Heretic, Hexen, DOOM II.&lt;/p&gt;

&lt;p&gt;The token cost is real: each action call is roughly 1,500-2,500 total tokens (input and output combined: game state text plus the PNG). A 50-action session is 75,000-125,000 tokens, which works out to roughly $0.50-2.00 depending on your model. Worth it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/gunnargrosch/doom-mcp" rel="noopener noreferrer"&gt;doom-mcp on GitHub&lt;/a&gt;: Source, docs, and examples&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/doom-mcp" rel="noopener noreferrer"&gt;doom-mcp on npm&lt;/a&gt;: Package page&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ozkl/doomgeneric" rel="noopener noreferrer"&gt;doomgeneric&lt;/a&gt; by ozkl: The portable DOOM engine this is built on&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://freedoom.github.io/" rel="noopener noreferrer"&gt;Freedoom&lt;/a&gt;: The open-source IWAD that ships with the package&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.doomworld.com/classicdoom/info/shareware.php" rel="noopener noreferrer"&gt;DOOM1.WAD shareware download&lt;/a&gt;: The original shareware episode, free and legal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real question was never whether it could run DOOM. It's what you do with it now that it can. Let me know in the comments how far you get.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>ai</category>
      <category>gaming</category>
      <category>rust</category>
    </item>
    <item>
      <title>Circuit Breakers on AWS Lambda: Why In-Memory State Silently Fails</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Mon, 09 Mar 2026 21:53:44 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/circuit-breakers-on-aws-lambda-why-in-memory-state-silently-fails-edh</link>
      <guid>https://dev.to/gunnargrosch/circuit-breakers-on-aws-lambda-why-in-memory-state-silently-fails-edh</guid>
      <description>&lt;p&gt;You added a circuit breaker to your Lambda function. It compiles, your tests pass, and it works correctly in local testing. But it's silently useless. The problem isn't the implementation. It's an assumption every in-memory circuit breaker makes that doesn't hold on Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Circuit Breakers Do
&lt;/h2&gt;

&lt;p&gt;The circuit breaker pattern comes from Michael Nygard's &lt;em&gt;Release It!&lt;/em&gt; and is named after the electrical component. Think about the services your Lambda functions actually call: a payment processor, a third-party enrichment API, a database under load, another service in your own fleet. Anything external your function depends on is a downstream service, and any of them can start responding slowly or fail outright. Slow is often worse than down. A dependency that takes 10 seconds to time out costs you 10 seconds of held concurrency per call, not a fast failure you can handle gracefully.&lt;/p&gt;

&lt;p&gt;That concurrency cost is the Lambda-specific reason to care. When a downstream call hangs, your function holds a concurrency unit. At 100 concurrent executions and a 10-second timeout, one flaky dependency can saturate your function in seconds, throttling every other request, including requests that have nothing to do with the sick service. The cascade happens fast: payment API slows down → order function saturates concurrency → order requests fail → the service calling orders backs up → users see errors across your entire checkout flow.&lt;/p&gt;

&lt;p&gt;When a downstream service starts failing, a circuit breaker stops calling it entirely, returns a fallback response immediately, and probes for recovery. It also gives the downstream service breathing room: instead of a flood of timeouts hammering something that's already struggling, it gets near-silence while the circuit is open. The naming follows the electrical analogy: a closed circuit is complete and current flows; an open circuit is broken and nothing gets through.&lt;/p&gt;

&lt;p&gt;Three states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLOSED&lt;/strong&gt;: &lt;strong&gt;Normal operation.&lt;/strong&gt; Calls go through. Failures are counted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OPEN&lt;/strong&gt;: &lt;strong&gt;Circuit tripped.&lt;/strong&gt; Calls fail fast without reaching the downstream service. A reset timeout runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HALF-OPEN&lt;/strong&gt;: &lt;strong&gt;One trial call allowed.&lt;/strong&gt; If it succeeds, the circuit closes. If it fails, it reopens with a longer timeout.&lt;/li&gt;
&lt;/ul&gt;
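
&lt;p&gt;The three states translate into a small state machine. Here's a minimal in-memory TypeScript sketch; the names and the fixed reset timeout are illustrative (production breakers usually grow the timeout on each reopen), and it's exactly the kind of per-process breaker whose limits on Lambda this article is about:&lt;/p&gt;

```typescript
// Minimal in-memory circuit breaker: works in one long-lived process.
class InMemoryBreaker {
  private state: "CLOSED" | "OPEN" | "HALF_OPEN" = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private resetMs = 30_000) {}

  async call(fn: Function, fallback: Function) {
    if (this.state === "OPEN") {
      if (Date.now() - this.openedAt >= this.resetMs) {
        this.state = "HALF_OPEN";          // allow one trial call
      } else {
        return fallback();                 // fail fast, skip the downstream call
      }
    }
    try {
      const result = await fn();
      this.state = "CLOSED";               // success closes the circuit
      this.failures = 0;
      return result;
    } catch {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.threshold) {
        this.state = "OPEN";               // trip: fail fast until the reset timeout
        this.openedAt = Date.now();
      }
      return fallback();
    }
  }
}
```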

&lt;p&gt;The problem is how Lambda runs code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why In-Memory State Fails on Lambda
&lt;/h2&gt;

&lt;p&gt;Lambda's concurrency model is built around isolated execution environments. From the AWS documentation: "For each concurrent request, Lambda provisions a separate instance of your execution environment." Two simultaneous invocations of the same function run in two separate environments with completely independent memory spaces. There is no shared memory between them.&lt;/p&gt;

&lt;p&gt;Consider what this means for a circuit breaker with a failure threshold of 5. Your function is receiving 50 concurrent requests. A downstream service starts failing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execution environment 1 takes a request. The call fails. Its local failure count: 1/5.&lt;/li&gt;
&lt;li&gt;Execution environment 2 takes a request. The call fails. Its local failure count: 1/5.&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;li&gt;Execution environment 50 takes a request. The call fails. Its local failure count: 1/5.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;50 failures have hit the downstream service. No circuit has opened. Each environment has counted 1 failure and needs 4 more before it does anything. Meanwhile, all 50 environments continue sending requests to a service that is already failing. In the worst case, where traffic distributes evenly across environments, you need up to 250 total failures before any single execution environment opens its circuit.&lt;/p&gt;

&lt;p&gt;And that's assuming the same 50 execution environments handle all the traffic. Lambda scales by adding new execution environments as load increases. Each new environment starts with a failure count of zero. As long as traffic grows and new environments spin up, the fleet will always have environments that haven't seen enough failures to open. The circuit can never effectively protect you across the fleet.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical. The most widely used Node.js circuit breaker libraries (opossum, cockatiel) store state in process memory. They work correctly in a single-process server where all traffic goes through one circuit. They don't work for Lambda's distributed execution model. opossum does provide state export and import hooks (&lt;code&gt;toJSON()&lt;/code&gt;) specifically documented for serverless environments, but these don't solve the cross-environment isolation problem: each environment still starts from whatever state you restore, not a live shared view of current circuit state.&lt;/p&gt;

&lt;p&gt;Provisioned Concurrency reduces but doesn't eliminate this problem. PC keeps a fixed number of execution environments initialized and warm, so they accumulate local failure counts across more requests than standard on-demand environments. But they're still isolated from each other, and scaling events still add fresh environments that start at zero. In-memory state is less useless with PC, but it's still wrong at any meaningful concurrency level.&lt;/p&gt;

&lt;p&gt;Lambda also periodically terminates execution environments for runtime maintenance and updates, even for continuously invoked functions. An environment accumulating failure counts can be replaced with a fresh one starting at zero at any time, adding another layer of unreliability to in-memory state.&lt;/p&gt;

&lt;p&gt;Lambda Managed Instances (launched at re:Invent 2025) are an exception: they support multiple concurrent invocations per environment, so in-memory state accumulates across requests within the same environment. The argument above applies to standard Lambda functions, which remain the default.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shared State Across Execution Environments
&lt;/h2&gt;

&lt;p&gt;The fix is to store circuit state in a shared external store. When execution environment 1 records a failure, execution environment 2 sees it. When any environment opens the circuit, every environment stops calling the downstream service. Yes, this adds a network call to every invocation. The Performance and Cost sections have the numbers. For most workloads the overhead is small, and &lt;code&gt;CachedProvider&lt;/code&gt; can reduce it further.&lt;/p&gt;

&lt;p&gt;ElastiCache (Valkey) is the fastest option (sub-millisecond reads) and is the right choice if your functions are already in a VPC. DynamoDB is the right default for most Lambda workloads: no VPC required, single-digit millisecond latency, and it supports atomic operations and conditional writes for concurrent safety. Adding a VPC solely for circuit breaker state adds deployment complexity and a modest cold start overhead, which isn't worth it unless you're already VPC-attached.&lt;/p&gt;
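
&lt;p&gt;To make "atomic operations" concrete: a DynamoDB-backed breaker can record each failure as an atomic counter increment. The parameters below are a sketch of that write, not &lt;code&gt;circuitbreaker-lambda&lt;/code&gt;'s actual schema:&lt;/p&gt;

```typescript
// Sketch of the shared-state write: UpdateItem parameters for an atomic
// failure-count increment. Table and attribute names are illustrative.
function recordFailureParams(service: string) {
  return {
    TableName: "circuitbreaker-table",
    Key: { id: { S: service } },
    UpdateExpression: "ADD failures :one",
    ExpressionAttributeValues: { ":one": { N: "1" } },
    ReturnValues: "UPDATED_NEW",   // caller compares the new count to its threshold
  };
}
```

Because the &lt;code&gt;ADD&lt;/code&gt; action is atomic across concurrent writers, 50 execution environments incrementing at once still produce an accurate shared count, which is what makes a fleet-wide threshold meaningful.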

&lt;p&gt;&lt;code&gt;circuitbreaker-lambda&lt;/code&gt; is an open-source library I built that takes the DynamoDB path. It stores circuit state in DynamoDB and shares it across all execution environments running the same function.&lt;/p&gt;

&lt;p&gt;Two paths to choose from:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;npm package&lt;/th&gt;
&lt;th&gt;Lambda Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtimes&lt;/td&gt;
&lt;td&gt;Node.js 20+&lt;/td&gt;
&lt;td&gt;Any managed runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integration&lt;/td&gt;
&lt;td&gt;Import library&lt;/td&gt;
&lt;td&gt;HTTP calls to local sidecar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold start overhead&lt;/td&gt;
&lt;td&gt;~50ms&lt;/td&gt;
&lt;td&gt;~350ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both paths share the same DynamoDB state schema, so a Node.js function using the npm package and a Python function using the Layer can share circuit state for the same downstream service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: npm Package
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;circuitbreaker-lambda
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requires Node.js 20+.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a DynamoDB table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws dynamodb create-table &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--table-name&lt;/span&gt; circuitbreaker-table &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attribute-definitions&lt;/span&gt; &lt;span class="nv"&gt;AttributeName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;,AttributeType&lt;span class="o"&gt;=&lt;/span&gt;S &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--key-schema&lt;/span&gt; &lt;span class="nv"&gt;AttributeName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;,KeyType&lt;span class="o"&gt;=&lt;/span&gt;HASH &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--billing-mode&lt;/span&gt; PAY_PER_REQUEST
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set the environment variable
&lt;/h3&gt;

&lt;p&gt;Add &lt;code&gt;CIRCUITBREAKER_TABLE&lt;/code&gt; as a Lambda environment variable. In a SAM template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;CIRCUITBREAKER_TABLE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;circuitbreaker-table&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Grant the function access to DynamoDB
&lt;/h3&gt;

&lt;p&gt;The function needs &lt;code&gt;GetItem&lt;/code&gt; and &lt;code&gt;UpdateItem&lt;/code&gt; on the table. In a SAM template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
        &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;dynamodb:GetItem&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;dynamodb:UpdateItem&lt;/span&gt;
        &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;CircuitBreakerTable.Arn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use it in your handler
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CircuitBreaker&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;circuitbreaker-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// callDownstreamService is any async function that calls your downstream service.&lt;/span&gt;
&lt;span class="c1"&gt;// It should throw on failure; the circuit breaker catches the throw and counts it.&lt;/span&gt;
&lt;span class="c1"&gt;// Initialized outside the handler so the same instance&lt;/span&gt;
&lt;span class="c1"&gt;// is reused across warm invocations of this execution environment&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callDownstreamService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;successThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// successes (across any environment) required to close from HALF-OPEN&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// ms to wait in OPEN state before allowing a trial call (HALF-OPEN)&lt;/span&gt;
  &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cached response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// fire() throws if the circuit is OPEN and no fallback is configured,&lt;/span&gt;
    &lt;span class="c1"&gt;// or if the downstream call fails and propagates the error.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Service unavailable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;fire()&lt;/code&gt; calls &lt;code&gt;callDownstreamService&lt;/code&gt; and records the outcome in DynamoDB: a success if the call succeeds, a failure if it throws. When the failure count hits &lt;code&gt;failureThreshold&lt;/code&gt;, the circuit opens and subsequent calls return the fallback immediately (or throw if no fallback is configured). Every execution environment handling that function reads the same DynamoDB item, so &lt;code&gt;successThreshold&lt;/code&gt; counts successes across environments with the same last-writer-wins behavior as failure counts. Real fallbacks return something useful under degradation: cached data, a default empty state, or a simplified response. The &lt;code&gt;{ data: 'cached response' }&lt;/code&gt; placeholder in the example is where that goes.&lt;/p&gt;
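&lt;p&gt;As a sketch of what a real fallback can look like (the cache and helper names here are illustrative, not part of the library), serving the last known-good response beats a static placeholder:&lt;/p&gt;

```typescript
// Illustrative fallback: serve the last known-good response instead of a
// static placeholder. The in-memory Map stands in for whatever cache you
// already have (ElastiCache, a DynamoDB table, etc.).
const lastGood = new Map<string, unknown>()

// Call this after each successful downstream call to remember the payload.
function rememberSuccess(circuitId: string, data: unknown): void {
  lastGood.set(circuitId, data)
}

// Pass this as the `fallback` option: stale data if we have it,
// otherwise a default empty state the caller can render.
async function staleDataFallback(circuitId: string): Promise<unknown> {
  return lastGood.get(circuitId) ?? { items: [], degraded: true }
}
```

&lt;p&gt;Wiring &lt;code&gt;rememberSuccess&lt;/code&gt; into the downstream call and &lt;code&gt;staleDataFallback&lt;/code&gt; into the breaker options keeps degradation graceful without inventing data.&lt;/p&gt;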

&lt;p&gt;If you're using Middy middleware, there's an integration that wraps your handler directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@middy/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;circuitBreakerMiddleware&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;circuitbreaker-lambda/middy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;myHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;circuitBreakerMiddleware&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Service unavailable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The middleware wraps the entire handler rather than a specific downstream function. When the circuit is OPEN, the middleware short-circuits the handler before it runs and returns the fallback response. Without a fallback configured, it throws so your error handler can respond. If your handler calls multiple downstream services, use the npm package directly with a distinct circuit ID for each service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Circuit IDs and Shared State
&lt;/h2&gt;

&lt;p&gt;The circuit ID is what links circuit state to a specific downstream service. By default it uses &lt;code&gt;AWS_LAMBDA_FUNCTION_NAME&lt;/code&gt;. Two execution environments running the same function share one circuit because they have the same function name and read from the same DynamoDB item.&lt;/p&gt;

&lt;p&gt;If one function calls multiple downstream services, give each a distinct circuit ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;paymentBreaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callPaymentService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;circuitId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payment-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inventoryBreaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callInventoryService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;circuitId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;inventory-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If multiple functions protect the same downstream service and you want them to share a circuit, give them the same ID. A circuit opened by one function is then seen by every function using that ID. For the Lambda Layer, use the same circuit ID string in the HTTP path across all functions.&lt;/p&gt;
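&lt;p&gt;Because the Layer addresses circuits by URL path, it's worth keeping IDs URL-safe. A small helper (ours, not part of the Layer) makes the shared path explicit and guards against odd characters:&lt;/p&gt;

```typescript
// Illustrative helper: one place to build the sidecar path for a shared
// circuit ID. Encoding is our own precaution so an ID with unexpected
// characters can't produce a malformed path.
const SIDECAR_BASE = 'http://127.0.0.1:4243'

function circuitPath(circuitId: string): string {
  return `${SIDECAR_BASE}/circuit/${encodeURIComponent(circuitId)}`
}
```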

&lt;h2&gt;
  
  
  Getting Started: Lambda Layer
&lt;/h2&gt;

&lt;p&gt;If your functions use a runtime other than Node.js, or if you want a single circuit breaker deployment that works across runtimes, the Lambda Layer is the other path. It ships a Rust extension that runs as a local sidecar on port 4243. Your handler makes HTTP calls to it instead of importing a library.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add the layer to your SAM template
&lt;/h3&gt;

&lt;p&gt;Download the layer zip from the &lt;a href="https://github.com/gunnargrosch/circuitbreaker-lambda/releases" rel="noopener noreferrer"&gt;GitHub releases page&lt;/a&gt;. The Rust extension is architecture-specific: download the x86_64 build for standard Lambda functions or the arm64 build for Graviton. Reference it as a &lt;code&gt;AWS::Serverless::LayerVersion&lt;/code&gt; resource and attach it to your function. The &lt;code&gt;examples/layer/template.yaml&lt;/code&gt; in the repo shows the full setup with both architectures. The key function configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;LayerNodeFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
  &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="nv"&gt;CircuitBreakerLayer&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;CIRCUITBREAKER_TABLE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;CircuitBreakerTable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Node.js handler
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CIRCUIT_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_LAMBDA_FUNCTION_NAME&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CB_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://127.0.0.1:4243&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// Check circuit state before calling downstream&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CB_URL&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/circuit/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CIRCUIT_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;check&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Circuit OPEN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Call downstream and report result&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callDownstream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CB_URL&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/circuit/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CIRCUIT_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/success`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CB_URL&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/circuit/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CIRCUIT_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/failure`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;  &lt;span class="c1"&gt;// Lambda returns a non-200; event sources like SQS will retry&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python handler
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;

&lt;span class="n"&gt;circuit_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS_LAMBDA_FUNCTION_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cb_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:4243&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cb_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/circuit/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;circuit_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Sidecar unavailable — fail open and allow the downstream call
&lt;/span&gt;        &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;allowed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;allowed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Circuit OPEN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_downstream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cb_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/circuit/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;circuit_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/success&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cb_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/circuit/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;circuit_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/failure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# For event-driven triggers like SQS, raise here instead so Lambda retries.
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)})}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both examples make two HTTP calls to the sidecar per invocation: one to check state before the downstream call, one to report the result. These are loopback calls to &lt;code&gt;127.0.0.1&lt;/code&gt;, not network calls, so the round-trip is sub-millisecond. The Rust sidecar also runs the &lt;code&gt;CachedProvider&lt;/code&gt; logic internally, so it rarely reaches DynamoDB on warm invocations. The warm latency numbers in the Performance section reflect this.&lt;/p&gt;
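&lt;p&gt;The check-then-report pattern repeats in every handler, so it's natural to wrap it once. A sketch of such a wrapper (the helper name and shape are ours, not part of the Layer; it assumes the endpoints shown above):&lt;/p&gt;

```typescript
// Sketch of a reusable wrapper around the sidecar's check/report endpoints.
// `fetchFn` is injectable so the wrapper is testable without a sidecar.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ json(): Promise<any> }>

async function withCircuit<T>(
  circuitId: string,
  call: () => Promise<T>,
  fetchFn: FetchLike,
  baseUrl = 'http://127.0.0.1:4243',
): Promise<T> {
  // 1. Check circuit state before calling downstream
  const check = await (await fetchFn(`${baseUrl}/circuit/${circuitId}`)).json()
  if (!check.allowed) throw new Error(`Circuit ${check.state ?? 'OPEN'}`)
  try {
    // 2. Call downstream and report the outcome
    const result = await call()
    await fetchFn(`${baseUrl}/circuit/${circuitId}/success`, { method: 'POST' })
    return result
  } catch (err) {
    await fetchFn(`${baseUrl}/circuit/${circuitId}/failure`, { method: 'POST' })
    throw err // let the caller or event source decide how to handle it
  }
}
```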

&lt;p&gt;The Layer approach requires more boilerplate per handler, but it works in any managed runtime and keeps state management out of your application code. The local HTTP calls to the sidecar do live in your handler, but the actual circuit state tracking, DynamoDB reads and writes, failure counting, and backoff logic all live inside the Rust extension.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fail-open
&lt;/h3&gt;

&lt;p&gt;The term comes from physical security, not circuit states: a fail-open lock releases when power fails, defaulting to permissive. Here it means the same thing. If the DynamoDB state provider is unavailable, requests pass through rather than failing. This is a deliberate trade-off. The alternative is failing closed: a transient DynamoDB error takes down your service even if the downstream service it's protecting is completely healthy. Your circuit breaker becomes a single point of failure.&lt;/p&gt;

&lt;p&gt;Fail-open accepts that brief periods of unprotected calls are better than self-inflicted downtime. State provider errors are logged as structured JSON so you can monitor and alert on them, but they don't block requests. The counter-argument: if DynamoDB is unavailable during an active downstream incident, fail-open leaves traffic unprotected. For most workloads this is the right call: a simultaneous DynamoDB outage and downstream failure is an unlikely combination, and failing closed (blocking all traffic because the circuit breaker can't read state) makes things worse. If your downstream is fragile enough that this scenario is a real concern, a lower-level fallback or degraded mode is a better answer than fail-closed.&lt;/p&gt;

&lt;p&gt;For the Lambda Layer path, there are two sidecar failure modes to distinguish. If the extension fails during the INIT phase, Lambda restarts the execution environment entirely. The handler never runs, and Lambda retries automatically. If the extension crashes after initialization during an invocation, &lt;code&gt;fetch&lt;/code&gt; calls to &lt;code&gt;http://127.0.0.1:4243&lt;/code&gt; throw connection refused errors. For this second case, wrap the sidecar calls in a try/catch and fail open: allow the downstream call to proceed. The same principle applies as with DynamoDB unavailability.&lt;/p&gt;
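&lt;p&gt;A minimal sketch of that fail-open wrapper. The &lt;code&gt;/state&lt;/code&gt; path and the &lt;code&gt;{ state }&lt;/code&gt; response shape here are assumptions for illustration, not the extension's documented local API:&lt;/p&gt;

```typescript
// Fail-open wrapper around the sidecar call. If the extension has crashed
// after INIT, the loopback fetch throws and we allow the downstream call.
// NOTE: the /state path and { state } response shape are illustrative
// assumptions; check the extension's actual local API.
export async function circuitAllows(baseUrl = 'http://127.0.0.1:4243') {
  try {
    const res = await fetch(baseUrl + '/state')
    const body = await res.json()
    return body.state !== 'OPEN'
  } catch {
    // Sidecar unreachable: fail open, same principle as DynamoDB unavailability.
    return true
  }
}
```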

&lt;h3&gt;
  
  
  Warm invocation caching
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;fire()&lt;/code&gt; call reads circuit state from DynamoDB. For a function handling high throughput, that's a DynamoDB read on every invocation. You can reduce this with the &lt;code&gt;CachedProvider&lt;/code&gt;, which caches state in memory for warm execution environments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callDownstream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;cacheTtlMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// use cached state for 200ms on warm invocations&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a warm invocation, the cache is checked first. If the state is fresh, no DynamoDB call is made. The cache is write-through: when state is saved to DynamoDB, the cache is also updated. Keep the TTL short. A long cache window can delay the CLOSED to OPEN transition: an execution environment that cached a CLOSED state won't see a newly-opened circuit until the cache expires. 200ms is a reasonable starting point: it caps the detection lag while cutting DynamoDB reads significantly for high-throughput functions. Increase the TTL to reduce costs further at the cost of slower circuit detection. Decrease it for faster propagation at higher DynamoDB cost.&lt;/p&gt;
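&lt;p&gt;The caching behavior described above reduces to a small write-through TTL cache. This sketch shows the idea only; it is not &lt;code&gt;CachedProvider&lt;/code&gt;'s actual implementation:&lt;/p&gt;

```typescript
// Minimal write-through TTL cache: reads hit only while the entry is fresh;
// every save refreshes the cache. Illustrative, not the library's internals.
class TtlCache {
  private value: unknown
  private storedAt = -Infinity
  constructor(private ttlMs: number) {}

  get() {
    const age = Date.now() - this.storedAt
    if (age >= this.ttlMs) return undefined // stale: caller falls back to DynamoDB
    return this.value
  }

  set(next: unknown) {
    // Write-through: called whenever state is saved to DynamoDB.
    this.value = next
    this.storedAt = Date.now()
  }
}
```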

&lt;h3&gt;
  
  
  What the DynamoDB item looks like
&lt;/h3&gt;

&lt;p&gt;When debugging a stuck circuit, this is what you're looking for in the table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-function-name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"circuitState"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OPEN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failureCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"successCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nextAttempt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1741234567890&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lastFailureTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1741234557890&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"consecutiveOpens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;circuitState&lt;/code&gt; is &lt;code&gt;CLOSED&lt;/code&gt;, &lt;code&gt;OPEN&lt;/code&gt;, or &lt;code&gt;HALF-OPEN&lt;/code&gt;. &lt;code&gt;nextAttempt&lt;/code&gt; is a Unix timestamp in milliseconds. The circuit won't probe until after that time. &lt;code&gt;consecutiveOpens&lt;/code&gt; tracks how many consecutive HALF-OPEN→OPEN transitions have occurred, which drives the exponential backoff on the timeout.&lt;/p&gt;
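&lt;p&gt;When reading an item like the one above by hand, the check you're doing mentally looks like this. This is a hypothetical helper for debugging, not part of the library:&lt;/p&gt;

```typescript
// Given an item from the table, decide whether a recovery probe is due yet.
// Hypothetical helper for reading the table by hand; not part of the library.
function probeAllowed(item: { circuitState: string; nextAttempt: number }, nowMs: number) {
  if (item.circuitState !== 'OPEN') return true // only OPEN blocks on the timestamp
  return nowMs >= item.nextAttempt // OPEN: wait until the backoff window passes
}
```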

&lt;p&gt;The library uses last-writer-wins writes rather than atomic increments. Under extreme concurrent failures (many execution environments failing at the exact same millisecond) some failure counts can be lost: if 10 environments all read &lt;code&gt;failureCount: 4&lt;/code&gt; and each write &lt;code&gt;5&lt;/code&gt;, the count advances by 1 instead of 10. In practice this means the circuit may take slightly longer to open than the threshold suggests under burst concurrency. It will still open. For the CLOSED→OPEN transition itself, multiple environments writing &lt;code&gt;OPEN&lt;/code&gt; simultaneously all succeed, which is fine: you want the circuit open. Atomic counter increments via DynamoDB's &lt;code&gt;ADD&lt;/code&gt; operation could prevent lost failure counts, but a state transition updates multiple fields simultaneously: state, failure count, and timestamp. Last-writer-wins on the full item keeps the write logic simple at the cost of occasional lost counts under extreme concurrency. If your function handles high burst concurrency, set &lt;code&gt;failureThreshold&lt;/code&gt; lower than you would in a single-process application. Lost counts mean the effective threshold is higher than the configured value, so a lower setting brings the actual behavior closer to the intended one.&lt;/p&gt;
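&lt;p&gt;The lost-count arithmetic is easy to demonstrate: if every environment reads before any of them writes, ten increments collapse into one.&lt;/p&gt;

```typescript
// Last-writer-wins lost updates, as described above: 10 environments all
// read failureCount 4, each writes back read-value + 1, and the stored
// count advances by 1 instead of 10.
let stored = 4
const snapshots = new Array(10).fill(stored) // every environment reads first
for (const seen of snapshots) {
  stored = seen + 1 // each environment writes its own read-value + 1
}
// stored is now 5, not 14
```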

&lt;h3&gt;
  
  
  Exponential backoff on repeated failures
&lt;/h3&gt;

&lt;p&gt;When a circuit transitions from HALF-OPEN back to OPEN (a recovery probe failed), the timeout before the next probe doubles. This prevents a repeatedly-failing service from being probed too aggressively. The backoff resets when the circuit closes successfully. The &lt;code&gt;maxTimeout&lt;/code&gt; option caps how long the backoff can grow.&lt;/p&gt;
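&lt;p&gt;As a formula, the behavior described above looks roughly like this. The doubling-per-reopen shape matches the description; the exact scaling and parameter names in the library may differ:&lt;/p&gt;

```typescript
// Exponential backoff on repeated HALF-OPEN to OPEN transitions, capped by
// maxTimeout. Illustrative formula for the behavior described above, not
// the library's exact code.
function openTimeoutMs(baseTimeoutMs: number, consecutiveOpens: number, maxTimeoutMs: number) {
  const scaled = baseTimeoutMs * Math.pow(2, consecutiveOpens - 1)
  return Math.min(scaled, maxTimeoutMs)
}
```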

&lt;h3&gt;
  
  
  HALF-OPEN probe behavior
&lt;/h3&gt;

&lt;p&gt;With shared DynamoDB state, when the circuit transitions to HALF-OPEN, every warm execution environment that reads the updated state may attempt a trial call. Unlike a single-process circuit breaker where exactly one probe goes out, a fleet of 50 environments can send up to 50 simultaneous probes to a recovering downstream service. &lt;code&gt;CachedProvider&lt;/code&gt; staggers probes across the TTL window as environments pick up the state change at different times, but doesn't eliminate the burst. A single-leader approach (using a DynamoDB conditional write to claim the probe slot) would be more precise, and it's tracked as a future improvement in the repo. The current behavior favors simplicity: the probe burst is proportional to the number of warm environments, which is typically small for functions with reasonable traffic patterns, and distributed leader election adds significant complexity for a probe that's designed to be retried on failure anyway.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom state backends
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;StateProvider&lt;/code&gt; interface is pluggable. If you need Redis, a relational database, or anything else, implement two methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisProvider&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;StateProvider&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;circuitId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;CircuitBreakerState&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;saveState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;circuitId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CircuitBreakerState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;stateProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RedisProvider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DynamoDB is the right default for most Lambda workloads. Valkey or Redis makes sense if you're already VPC-attached and running ElastiCache for caching: reusing existing infrastructure avoids the extra DynamoDB dependency. For most teams, running a cache cluster solely for circuit state isn't worth the VPC overhead and operational cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Here are measured results for the npm package and the Lambda Layer, from a test run with 50 warm invocations per configuration and a shared DynamoDB table in the same region. All functions were configured at 512MB memory. The "downstream" in all cases was an HTTP call through an API Gateway endpoint backed by DynamoDB, which could be toggled healthy or unhealthy. The HTTP round-trip through API Gateway accounts for the ~590ms baseline. Raw DynamoDB read latency is single-digit milliseconds. Cold start times scale inversely with memory allocation: Lambda allocates CPU proportionally to memory, so at 128MB (where CPU is highly constrained) you would expect larger overhead, particularly for the Layer, which initializes a Rust extension sidecar alongside the function runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold start&lt;/strong&gt; (forced by updating a function environment variable):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Cold start&lt;/th&gt;
&lt;th&gt;vs. baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline (no circuit breaker)&lt;/td&gt;
&lt;td&gt;1300ms&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;npm package (Node.js)&lt;/td&gt;
&lt;td&gt;1353ms&lt;/td&gt;
&lt;td&gt;+4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Layer (Node.js)&lt;/td&gt;
&lt;td&gt;1679ms&lt;/td&gt;
&lt;td&gt;+29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Layer (Python)&lt;/td&gt;
&lt;td&gt;1541ms&lt;/td&gt;
&lt;td&gt;+18%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Layer cold start penalty comes from initializing the Rust extension sidecar alongside the function runtime. It's a one-time cost per execution environment. Since August 2025, AWS bills for the Lambda INIT phase on managed runtimes with ZIP deployment packages, so the Layer's +29% cold start overhead (379ms) is now both a latency and a cost consideration for functions with frequent cold starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warm invocations&lt;/strong&gt; (50 calls each):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Median&lt;/th&gt;
&lt;th&gt;p99&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline (no circuit breaker)&lt;/td&gt;
&lt;td&gt;590ms&lt;/td&gt;
&lt;td&gt;620ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;npm package (Node.js)&lt;/td&gt;
&lt;td&gt;592ms&lt;/td&gt;
&lt;td&gt;621ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Layer (Node.js)&lt;/td&gt;
&lt;td&gt;589ms&lt;/td&gt;
&lt;td&gt;797ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Layer (Python)&lt;/td&gt;
&lt;td&gt;585ms&lt;/td&gt;
&lt;td&gt;639ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The npm package p99 (621ms) is essentially identical to baseline (620ms). The Lambda Layer Node.js p99 (797ms) is higher because the Rust extension sidecar occasionally adds latency on the first few invocations after a warm start. The median is fine but the tail is longer. The Layer configurations showing slightly below baseline median are within measurement noise, not a genuine speedup. With &lt;code&gt;CachedProvider&lt;/code&gt;, DynamoDB reads are eliminated for subsequent invocations within the TTL window, which brings tail latency down for high-throughput functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost
&lt;/h2&gt;

&lt;p&gt;Each &lt;code&gt;fire()&lt;/code&gt; call reads circuit state from DynamoDB (one &lt;code&gt;GetItem&lt;/code&gt;) and writes on state changes. At on-demand pricing, a &lt;code&gt;GetItem&lt;/code&gt; on a small item costs $0.125 per million read request units. At one million Lambda invocations per day (around 11 RPS), that's roughly $0.125/day for the reads. State writes only happen on failures and state transitions, so they're a rounding error for a healthy function. During an active failure scenario writes increase. At $1.25 per million write request units, a function failing on every invocation could see $1.25/day in write costs before the circuit opens and stops the calls. In practice the circuit opens quickly (after the first threshold of failures per environment), so write volume drops sharply once OPEN. With &lt;code&gt;CachedProvider&lt;/code&gt; at 200ms TTL and warm execution environments, reads drop by an order of magnitude on high-throughput functions.&lt;/p&gt;
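&lt;p&gt;The arithmetic above, worked through:&lt;/p&gt;

```typescript
// Back-of-the-envelope read cost from the numbers in this section.
const invocationsPerDay = 1_000_000 // about 11 RPS sustained
const readPricePerMillion = 0.125 // USD per million read request units (on-demand, small item)
const readCostPerDay = (invocationsPerDay / 1_000_000) * readPricePerMillion
// readCostPerDay is 0.125 USD

// With CachedProvider at a 200ms TTL, a warm execution environment reads
// DynamoDB at most 5 times per second regardless of its invocation rate.
const maxReadsPerSecondPerEnv = 1000 / 200
```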

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;The package includes a &lt;code&gt;MemoryProvider&lt;/code&gt; for unit testing. Pass it as the &lt;code&gt;stateProvider&lt;/code&gt; option to skip DynamoDB entirely in tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;MemoryProvider&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;circuitbreaker-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;breaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callDownstream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;stateProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MemoryProvider&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;MemoryProvider&lt;/code&gt; uses an in-memory Map and is not safe for production. It's for tests and local development only.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to Use a Circuit Breaker
&lt;/h2&gt;

&lt;p&gt;Not every Lambda function needs one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Very low concurrency.&lt;/strong&gt; If your function runs in a single execution environment (low traffic, no bursting), in-memory circuit breakers work: there's only one environment, so state is effectively shared. The overhead of distributed state isn't worth it for something handling a few requests per minute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calls to AWS services.&lt;/strong&gt; The AWS SDK handles retries, timeouts, and transient failures with exponential backoff. Wrapping a DynamoDB &lt;code&gt;GetItem&lt;/code&gt; or an S3 &lt;code&gt;PutObject&lt;/code&gt; in a circuit breaker adds complexity without much benefit. AWS manages the resilience layer for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When fail-fast isn't better than retry.&lt;/strong&gt; Circuit breakers are for cascading failure protection. If your function's caller expects a synchronous result and there's no meaningful fallback response, letting the error propagate and retry may be simpler than managing circuit state.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The repo includes two deployable SAM examples with a toggleable downstream service so you can watch the full circuit lifecycle (healthy calls, failures accumulating, circuit opening, recovery) against real AWS infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;examples/sam/&lt;/code&gt;: npm package example. Single Node.js function at &lt;code&gt;/&lt;/code&gt;. Toggle the downstream at &lt;code&gt;/toggle&lt;/code&gt;, check circuit state at &lt;code&gt;/status&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;examples/layer/&lt;/code&gt;: Layer example. Node.js (&lt;code&gt;/node&lt;/code&gt;) and Python (&lt;code&gt;/python&lt;/code&gt;) functions side by side, sharing the same Layer and DynamoDB table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;examples/minimal-npm/&lt;/code&gt; and &lt;code&gt;examples/minimal-layer/&lt;/code&gt;: Stripped-down versions if you just want the bare minimum code without the toggle/status test infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/gunnargrosch/circuitbreaker-lambda" rel="noopener noreferrer"&gt;circuitbreaker-lambda on GitHub&lt;/a&gt;: Source, docs, and examples&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/circuitbreaker-lambda" rel="noopener noreferrer"&gt;circuitbreaker-lambda on npm&lt;/a&gt;: Package page&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/using-the-circuit-breaker-pattern-with-aws-lambda-extensions-and-amazon-dynamodb/" rel="noopener noreferrer"&gt;Using the circuit-breaker pattern with AWS Lambda extensions and Amazon DynamoDB&lt;/a&gt;: AWS Compute Blog post covering the same Lambda extension + DynamoDB architecture&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/circuit-breaker.html" rel="noopener noreferrer"&gt;AWS Prescriptive Guidance: Circuit Breaker&lt;/a&gt;: AWS' recommended approach using DynamoDB&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html" rel="noopener noreferrer"&gt;AWS Lambda execution environment documentation&lt;/a&gt;: The concurrency and isolation model this post is based on&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://martinfowler.com/bliki/CircuitBreaker.html" rel="noopener noreferrer"&gt;Circuit Breaker pattern (Martin Fowler)&lt;/a&gt;: The canonical pattern reference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The silent failure mode of in-memory circuit breakers on Lambda isn't obvious until you're debugging a production incident. If you're running a circuit breaker today, check whether it's sharing state across execution environments. If it's not, it's not protecting you. The fix is a DynamoDB table and three lines of configuration. The alternative is finding out during the next downstream outage. Let me know in the comments how you're handling downstream resilience on Lambda.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>typescript</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building Multi-Agent Systems with RISEN Prompts and Strands Agents</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Thu, 05 Mar 2026 20:40:04 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/building-multi-agent-systems-with-risen-prompts-and-strands-agents-52bd</link>
      <guid>https://dev.to/gunnargrosch/building-multi-agent-systems-with-risen-prompts-and-strands-agents-52bd</guid>
      <description>&lt;p&gt;The &lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;RISEN post&lt;/a&gt; introduced system prompts as behavioral contracts. One reader comment cut to the core of what comes next: "What happens when you have multiple agents that each need their own contract?"&lt;/p&gt;

&lt;p&gt;The answer isn't complicated, but it's specific. The Expectation section of one agent defines the input format for the next. Narrowing prevents agents from doing each other's work. Steps encode the routing logic: which specialists to call, when, and why. The contract between agents lives in the prompts, not in orchestration code.&lt;/p&gt;

&lt;p&gt;This post builds a working multi-agent system that demonstrates these contracts in practice. Here's what it looks like. Three different purchase requests, three completely different agent journeys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"I need a 15 inch laptop for work"
  → Calling Price Research Agent
  → Calling Delivery &amp;amp; Logistics Agent
Agents called:  Price Research, Delivery &amp;amp; Logistics
Agents skipped: Financing, Risk Assessment, Contract Review

"I want to buy a VW Golf, probably a used one"
  → Calling Price Research Agent
  → Calling Financing Agent
  → Calling Risk Assessment Agent
  → Calling Delivery &amp;amp; Logistics Agent
Agents called:  Price Research, Financing, Risk Assessment, Delivery &amp;amp; Logistics
Agents skipped: Contract Review

"Looking for office space to rent, two-year lease, around 20 people"
  → Calling Price Research Agent
  → Calling Financing Agent
  → Calling Contract Review Agent
  → Calling Risk Assessment Agent
Agents called:  Price Research, Financing, Contract Review, Risk Assessment
Agents skipped: Delivery &amp;amp; Logistics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same coordinator, same five specialists available. The coordinator reads the request, decides which ones are needed, and only calls those. The rest of this post explains how.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents, Not Workflows
&lt;/h2&gt;

&lt;p&gt;The instinct when building a multi-agent system is to reach for a workflow engine. Define the steps, wire the handoffs, control the flow. This is the opposite of what makes agents useful. With a workflow engine, you decide what happens next. With an agent, the model decides. When you hardcode the routing, you lose the ability for the agent to reason about what's actually needed. A laptop doesn't need risk assessment. A used car does. That's a judgment call, not a branching condition. With hardcoded routing, every new category of purchase means a code change. With the routing in the prompt, the coordinator handles it on its own.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.youtube.com/watch?v=9O9zZ1lQWiI" rel="noopener noreferrer"&gt;DEV415 session on A2A and MCP&lt;/a&gt; at re:Invent 2025 makes this point in a production context: you design agent behaviors, not control flow. The &lt;a href="https://github.com/nullchecktv/swiftship-demo/" rel="noopener noreferrer"&gt;SwiftShip demo&lt;/a&gt; shows it running on AWS with Lambda functions and Agent-to-Agent communication.&lt;/p&gt;

&lt;p&gt;The approach in this post is simpler: agents as tools. Each specialist is wrapped as a &lt;code&gt;tool()&lt;/code&gt; function that the coordinator can invoke. No HTTP endpoints, no message queues, no infrastructure beyond a single TypeScript process. The pattern is the same as the production architecture. The implementation is small enough to clone and run in two minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Split Into Multiple Agents
&lt;/h2&gt;

&lt;p&gt;Not every task needs multiple agents. Here are signals that splitting makes sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Different expertise domains.&lt;/strong&gt; If two sections of your prompt have completely different Narrowing constraints ("only assess security, never comment on performance"), those are different agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different model requirements.&lt;/strong&gt; If one part of the task needs strong reasoning and another just needs fast summarization, using the same model for both is wasteful.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conditional execution.&lt;/strong&gt; If some parts of the work only apply to some inputs, a single agent either does unnecessary work or has complex conditional logic in its Steps. Multiple specialists with a routing coordinator handle this cleanly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent failure isolation.&lt;/strong&gt; If one specialist fails or times out, the others can still complete. A single agent either succeeds or fails entirely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your task fits in one prompt with consistent Narrowing and no conditional branches, keep it as one agent. Adding coordination overhead for its own sake makes the system slower and harder to debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  RISEN as Coordination Contracts
&lt;/h2&gt;

&lt;p&gt;In a single-agent system, RISEN structures the output. In a multi-agent system, RISEN structures the coordination. Each component does double duty:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expectation defines the handoff format.&lt;/strong&gt; What one agent returns is what the next agent reads. The Price Research Agent's Expectation section says "return the typical price range with budget, mid-range, and premium tiers." That's what the coordinator gets back and synthesizes with findings from other specialists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Narrowing defines ownership boundaries.&lt;/strong&gt; The Financing Agent's Narrowing says "do not assess market pricing, delivery logistics, contract terms, or product risk." The Risk Assessment Agent's Narrowing says the same about financing, delivery, and contracts. No agent steps on another agent's job. Without this, agents drift into each other's domains and produce redundant or contradictory advice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps encode routing logic.&lt;/strong&gt; The coordinator's Steps section IS the routing decision. It's not code. It's plain English in the system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Steps
1. Read the purchase request and identify: what is being purchased, the
   likely category, the approximate value range, and any special
   circumstances.
2. Always invoke the research_prices tool.
3. If the estimated value exceeds $5,000, or if financing is mentioned,
   invoke the evaluate_financing tool.
4. If the item is a tangible physical product, invoke the plan_delivery
   tool. Physical products always require delivery or collection planning.
5. If the purchase involves a subscription, lease, or multi-year
   commitment, invoke the review_contract tool.
6. If the estimated value exceeds $10,000, the item is used, or the
   category carries known risk, invoke the assess_risk tool.
7. Synthesize all specialist reports into a structured recommendation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model reads these instructions, assesses the purchase request against them, and calls the appropriate tools. A $1,000 laptop triggers Steps 2 and 4 (price research and delivery). A used car triggers Steps 2, 3, 4, and 6 (price research, financing, delivery, and risk). An office lease triggers Steps 2, 3, 5, and 6 (price research, financing, contract review, and risk). Delivery is skipped because office space is not a physical product. The routing is conditional and emerges from the prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Demo: Smart Purchasing Coordinator
&lt;/h2&gt;

&lt;p&gt;The demo has one coordinator agent and five specialist agents. Each specialist is wrapped as a &lt;code&gt;tool()&lt;/code&gt; function and passed to the coordinator:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Specialist&lt;/th&gt;
&lt;th&gt;When called&lt;/th&gt;
&lt;th&gt;Narrowing constraint&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price Research&lt;/td&gt;
&lt;td&gt;Always&lt;/td&gt;
&lt;td&gt;Only pricing. No risk, financing, or delivery.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Financing&lt;/td&gt;
&lt;td&gt;Value &amp;gt; $5K or financing mentioned&lt;/td&gt;
&lt;td&gt;Only financing. No pricing or contracts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery &amp;amp; Logistics&lt;/td&gt;
&lt;td&gt;Physical product&lt;/td&gt;
&lt;td&gt;Only logistics. No pricing or risk.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk Assessment&lt;/td&gt;
&lt;td&gt;High value, used goods, or risky category&lt;/td&gt;
&lt;td&gt;Only risk. No pricing or delivery.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contract Review&lt;/td&gt;
&lt;td&gt;Subscription, lease, or multi-year commitment&lt;/td&gt;
&lt;td&gt;Only contract terms. No pricing or risk.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Agents as tools
&lt;/h3&gt;

&lt;p&gt;The Strands Agents TypeScript SDK doesn't have a built-in &lt;code&gt;agent.asTool()&lt;/code&gt; method. Instead, you wrap each specialist using the &lt;code&gt;tool()&lt;/code&gt; function. The callback creates a fresh agent, invokes it, and returns its output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@strands-agents/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;invokeSpecialist&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./create-specialist-agent.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;riskAssessmentPrompt&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../prompts/risk-assessment.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ADVANCED_MODEL&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../models.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;assessRisk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;assess_risk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Identifies purchase risks, recommends due diligence steps, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;and estimates realistic total cost of ownership.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;What is being purchased&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;new&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;used&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;refurbished&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="na"&gt;estimatedValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;riskContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nf"&gt;invokeSpecialist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;riskAssessmentPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ADVANCED_MODEL&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;invokeSpecialist&lt;/code&gt; helper creates the agent, invokes it with a 60-second timeout, and returns the string output, which keeps each tool wrapper to a few lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;invokeSpecialist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;SpecialistOptions&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;SPECIALIST_MODEL&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;timer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;ReturnType&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;setTimeout&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

  &lt;span class="c1"&gt;// Timeout produces a rejection, not partial output.&lt;/span&gt;
  &lt;span class="c1"&gt;// If a specialist times out, the coordinator gets an error, not a half-answer.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;race&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Purchase request details:\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;never&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;timer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Specialist timed out&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timer&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice two things: the Risk Assessment Agent uses a different model (&lt;code&gt;ADVANCED_MODEL&lt;/code&gt;, which defaults to Sonnet 4.6) because risk analysis requires stronger reasoning than standard price research (which runs on Haiku 4.5). And &lt;code&gt;options.tools&lt;/code&gt; lets specialists have their own sub-tools. The Price Research Agent has a &lt;code&gt;save_price_snapshot&lt;/code&gt; tool that writes structured price data to a local JSON file. The coordinator never sees this tool. It's scoped to the specialist.&lt;/p&gt;

&lt;h3&gt;
  
  
  The coordinator wiring
&lt;/h3&gt;

&lt;p&gt;The coordinator itself is straightforward. It gets the RISEN prompt, all five specialist tools, and a hook for routing visibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RoutingHook&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;COORDINATOR_MODEL&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;coordinatorPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;allTools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;printRecommendation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;printSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getCalledTools&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hooks for routing visibility
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;RoutingHook&lt;/code&gt; uses the SDK's &lt;code&gt;BeforeToolCallEvent&lt;/code&gt; to print each specialist as the coordinator decides to call it. Readers see the routing decisions happen in real time before the specialist output appears:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RoutingHook&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;HookProvider&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;calledTools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

  &lt;span class="nf"&gt;getCalledTools&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;calledTools&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;registerCallbacks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;HookRegistry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolUse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;displayName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allToolNames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;displayName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;calledTools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`  → Calling &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;displayName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; Agent`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hook also tracks which tools were called, which powers the summary at the end showing called vs. skipped agents. This is observability for multi-agent systems without any infrastructure: one hook provider, attached to the coordinator, watching the decisions flow.&lt;/p&gt;
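<p>The post doesn't include <code>printSummary</code>, but the called-vs-skipped split needs nothing more than the hook's record plus the <code>allToolNames</code> map. A minimal sketch, with an illustrative two-entry map rather than the repo's actual tool ids:<br>
</p>

```typescript
// Hypothetical subset of the tool-id to display-name map; the real ids live in the repo.
const allToolNames: Record<string, string> = {
  research_prices: 'Price Research',
  assess_risk: 'Risk Assessment',
}

// Split the full tool surface into called vs. skipped agents.
function summarize(calledTools: string[]): { called: string[]; skipped: string[] } {
  const calledSet = new Set(calledTools)
  const entries = Object.entries(allToolNames)
  return {
    called: entries.filter(([id]) => calledSet.has(id)).map(([, name]) => name),
    skipped: entries.filter(([id]) => !calledSet.has(id)).map(([, name]) => name),
  }
}
```

<p>Joining each array with commas reproduces the two footer lines in the run output.</p>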

&lt;h2&gt;
  
  
  Running It
&lt;/h2&gt;

&lt;p&gt;You saw the routing output at the top: laptop triggers two specialists, used car triggers four, office lease triggers a different four (contract review instead of delivery). Here's what the full output looks like for the used car:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;════════════════════════════════════════════════════════════
PURCHASE REQUEST
════════════════════════════════════════════════════════════
"I want to buy a VW Golf, probably a used one"

Coordinator is analyzing your request...

  → Calling Price Research Agent
  → Calling Financing Agent
  → Calling Risk Assessment Agent
  → Calling Delivery &amp;amp; Logistics Agent

════════════════════════════════════════════════════════════
PURCHASING RECOMMENDATION
════════════════════════════════════════════════════════════
[Coordinator synthesis: pricing tiers for used Golfs, financing options
 with monthly payment estimates, risk assessment covering DSG transmission
 and hidden maintenance costs, delivery logistics for vehicle collection]

────────────────────────────────────────────────────────────
Agents called:  Price Research, Financing, Risk Assessment, Delivery &amp;amp; Logistics
Agents skipped: Contract Review
────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The coordinator calls four specialists, waits for all of them, then synthesizes their findings into a single recommendation. Each specialist stays in its lane: the Risk Assessment Agent talks about DSG transmission issues and hidden maintenance costs, the Financing Agent talks about loan terms and monthly payments, and neither comments on the other's domain. That separation comes from the Narrowing section of each specialist's RISEN prompt.&lt;/p&gt;

&lt;p&gt;Those separate lanes can also produce tension. Risk might flag a $3,000 first-year repair budget while Financing offers attractive loan terms. The coordinator doesn't resolve that tension: its Expectation section says to surface findings from each specialist, clearly attributed. Presenting both sides is the right call. Choosing one would mean overriding a specialist's domain, which is exactly what Narrowing is supposed to prevent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluating the Routing
&lt;/h2&gt;

&lt;p&gt;How do you know the coordinator is making the right calls? The same pattern from the &lt;a href="https://dev.to/gunnargrosch/evaluating-agent-output-quality-lightweight-evals-without-a-framework-38gk"&gt;eval post&lt;/a&gt; applies here: define expected behavior, run it, check the results.&lt;/p&gt;

&lt;p&gt;The routing eval replaces the real specialist tools with stubs that return immediately. The coordinator still runs against the real LLM, so this tests the RISEN Steps routing logic without paying for five specialist invocations per case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stubTools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;allToolNames&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Stub for &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;passthrough&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`[stub response from &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four test cases with expected routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ROUTING EVAL
Coordinator: global.anthropic.claude-sonnet-4-6
Test cases: 4 (specialist tools stubbed)

Budget laptop
  PASS called Price Research
  PASS called Delivery &amp;amp; Logistics
  PASS skipped Financing
  PASS skipped Risk Assessment
  PASS skipped Contract Review

Used car with financing
  PASS called Price Research
  PASS called Financing
  PASS called Risk Assessment
  PASS called Delivery &amp;amp; Logistics
  PASS skipped Contract Review

Office lease
  PASS called Price Research
  PASS called Financing
  PASS called Contract Review
  PASS called Risk Assessment
  PASS skipped Delivery &amp;amp; Logistics

SaaS subscription
  PASS called Price Research
  PASS called Contract Review
  PASS skipped Delivery &amp;amp; Logistics

RESULT: 4/4 routing cases passed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SaaS case checks fewer assertions than the others. Financing and Risk Assessment are omitted from that test because the coordinator's decision on them is borderline for a 50-seat enterprise tool: the annual cost might exceed the financing and risk thresholds depending on how the coordinator estimates the value. The eval only asserts on routing decisions that are unambiguously right or wrong.&lt;/p&gt;
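<p>Each PASS/FAIL line reduces to set membership on the tools the stubbed coordinator actually called. A sketch of the check, with the case shape and function name assumed rather than taken from the repo:<br>
</p>

```typescript
interface RoutingCase {
  name: string
  called: string[]  // tools the coordinator must invoke
  skipped: string[] // tools it must not invoke
}

// One PASS/FAIL line per assertion, given the tools the hook recorded.
function checkRouting(expected: RoutingCase, actual: string[]): string[] {
  const lines: string[] = []
  for (const tool of expected.called) {
    lines.push(`${actual.includes(tool) ? 'PASS' : 'FAIL'} called ${tool}`)
  }
  for (const tool of expected.skipped) {
    lines.push(`${actual.includes(tool) ? 'FAIL' : 'PASS'} skipped ${tool}`)
  }
  return lines
}
```

<p>Leaving a tool out of both arrays, as the SaaS case does for Financing and Risk Assessment, simply means no assertion is made about it.</p>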

&lt;p&gt;The calibration loop caught two prompt issues. The coordinator was calling the Delivery Agent for SaaS subscriptions (a purely digital product). Adding a Narrowing constraint ("Do not invoke DeliveryAgent for purely digital purchases") fixed that. It was also inconsistently calling Delivery for laptops because Step 4 said "requires shipping" rather than asserting that physical products always need delivery planning. Making the Step explicit ("Physical products always require delivery or collection planning, even if the buyer has not mentioned it") stabilized it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes on AWS (Preview of the Next Post)
&lt;/h2&gt;

&lt;p&gt;This demo runs in a single process. Everything happens in-memory: the coordinator calls tool functions, those functions create specialist agents, the specialists return strings. That's fine for development and for understanding the pattern. But it doesn't scale, it has no fault isolation, and there's no way to monitor or manage the agents independently.&lt;/p&gt;

&lt;p&gt;The core pattern stays identical: each specialist is a callable endpoint, the coordinator's tools make HTTP calls instead of function calls, and the routing logic in the RISEN Steps doesn't change at all. Two paths to get there:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Lambda with Function URLs or API Gateway.&lt;/strong&gt; Each specialist becomes a Lambda function exposed over HTTP. You can use Lambda Function URLs (simpler, direct IAM auth) or API Gateway (more control, useful if corporate policy restricts Function URLs). Either way, the coordinator's tool callbacks switch from local &lt;code&gt;invokeSpecialist&lt;/code&gt; calls to HTTP requests, and IAM auth restricts invocation to the coordinator's execution role only. The &lt;a href="https://github.com/nullchecktv/swiftship-demo/" rel="noopener noreferrer"&gt;SwiftShip demo&lt;/a&gt; uses this pattern with Function URLs: a triage agent calling payment, warehouse, and order agents over HTTP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Amazon Bedrock AgentCore Runtime.&lt;/strong&gt; AgentCore is a serverless runtime purpose-built for AI agents. Each agent deploys as a containerized Express service inside AgentCore, which handles session isolation per user, automatic scaling, and built-in observability. It supports the Strands TypeScript SDK and the A2A protocol. The deployment model is more involved than Lambda (Docker and ECR are required). Choose it when you need per-user session state, want A2A protocol support without building your own routing layer, or need a runtime that scales agent sessions independently rather than per-request.&lt;/p&gt;

&lt;p&gt;In both cases, the RISEN prompts carry over unchanged. The next post walks through a full deployment of this purchasing coordinator demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Per-agent model selection
&lt;/h3&gt;

&lt;p&gt;Not all agents need the same model. The coordinator and Risk Assessment Agent use Sonnet 4.6 for stronger reasoning. Standard specialists (Price Research, Financing, Delivery, Contract Review) use Haiku 4.5, which is faster and cheaper. The model IDs are centralized in &lt;code&gt;models.ts&lt;/code&gt; with environment variable overrides:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;COORDINATOR_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;COORDINATOR_MODEL_ID&lt;/span&gt;
  &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;global.anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SPECIALIST_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SPECIALIST_MODEL_ID&lt;/span&gt;
  &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;global.anthropic.claude-haiku-4-5-20251001-v1:0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ADVANCED_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ADVANCED_MODEL_ID&lt;/span&gt;
  &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;global.anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sub-agent tools
&lt;/h3&gt;

&lt;p&gt;The Price Research Agent has its own tool (&lt;code&gt;save_price_snapshot&lt;/code&gt;) that writes structured price data to a local JSON file. The coordinator never sees this tool. It's scoped to the specialist.&lt;/p&gt;

&lt;p&gt;In a production system, that tool could be anything: a call to a live pricing API, a search against a product catalog, a query to a DynamoDB table, a vector search against a knowledge base. The point is that specialist agents aren't just prompt wrappers. Each one can have its own tool surface, scoped to its domain, invisible to the coordinator. The coordinator stays focused on routing. The specialist handles whatever retrieval or action its domain requires.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stub-based routing eval
&lt;/h3&gt;

&lt;p&gt;Testing routing decisions is cheaper than testing specialist output quality. By stubbing the specialist callbacks, the routing eval runs four coordinator LLM calls instead of twenty. The stubs return immediately, so the eval completes in under a minute. This is practical for the iterative calibration loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's missing: conversation
&lt;/h3&gt;

&lt;p&gt;The coordinator is one-shot. It reads the request, calls specialists, synthesizes, done. A real purchasing advisor would ask follow-up questions: "What year range are you considering?" or "Do you have a trade-in?" After delivering a recommendation, a conversational coordinator could handle "Tell me more about the financing options" by calling just the Financing Agent again with the new context.&lt;/p&gt;

&lt;p&gt;The Strands SDK supports multi-turn conversation through the agent's message history. Making this demo conversational would mean wrapping the coordinator invocation in a loop that reads user input and feeds it back to the same agent instance. The RISEN prompts wouldn't change. The coordinator's Steps already describe when to call each specialist, and those decisions would apply on follow-up turns too. The main addition would be a new Step telling the coordinator to ask clarifying questions when the request is ambiguous before routing to specialists.&lt;/p&gt;

&lt;p&gt;This is a natural extension but adds enough complexity (input loop, conversation state, deciding when to re-route vs. answer directly) that it's better as a separate iteration than a first demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/gunnargrosch/multi-agent-risen-demo" rel="noopener noreferrer"&gt;multi-agent-risen-demo&lt;/a&gt;: Demo repo with all code from this post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;Writing System Prompts That Actually Work: The RISEN Framework for AI Agents&lt;/a&gt;: The RISEN framework post this builds on&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/gunnargrosch/evaluating-agent-output-quality-lightweight-evals-without-a-framework-38gk"&gt;Evaluating Agent Output Quality: Lightweight Evals Without a Framework&lt;/a&gt;: The eval patterns used for routing evaluation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=9O9zZ1lQWiI" rel="noopener noreferrer"&gt;DEV415: Building Scalable Self-Orchestrating AI Workflows with A2A and MCP&lt;/a&gt;: re:Invent 2025 session on production multi-agent architecture&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/nullchecktv/swiftship-demo/" rel="noopener noreferrer"&gt;SwiftShip demo&lt;/a&gt;: Production A2A demo from the DEV415 session&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/strands-agents/sdk-typescript" rel="noopener noreferrer"&gt;Strands Agents SDK (TypeScript)&lt;/a&gt;: Agent framework used in the demo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The contracts between your agents are just prompts. Change the Steps, change the routing. Add a Narrowing constraint, prevent an overlap. No code changes required. That's the payoff of RISEN in a multi-agent context: the coordination logic is readable, editable, and testable without touching the orchestration code.&lt;/p&gt;

&lt;p&gt;What purchase would you try first? Run &lt;code&gt;npm start&lt;/code&gt; with something unexpected and see which specialists the coordinator calls. Let me know in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>typescript</category>
      <category>programming</category>
    </item>
    <item>
      <title>Evaluating Agent Output Quality: Lightweight Evals Without a Framework</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Tue, 03 Mar 2026 16:20:38 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/evaluating-agent-output-quality-lightweight-evals-without-a-framework-38gk</link>
      <guid>https://dev.to/gunnargrosch/evaluating-agent-output-quality-lightweight-evals-without-a-framework-38gk</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;Writing System Prompts That Actually Work&lt;/a&gt;, I ended with this advice: "run it against a few representative inputs and check the output against your Expectation section." That's a good starting point. But if you've iterated on a few prompts, you've probably noticed the problem: eyeballing doesn't scale. You change the Steps section, re-run your test input, skim the output, and think "yeah, that looks better." Then two iterations later you realize you broke something that was working before. You're doing regression testing by memory.&lt;/p&gt;

&lt;p&gt;If you haven't read the &lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;RISEN post&lt;/a&gt;: RISEN is a framework for writing agent system prompts with five components: Role, Instructions, Steps, Expectation, and Narrowing. The key idea is that each component doubles as an eval lever. Expectation defines what the output should look like, so it tells you what structural checks to write. Narrowing defines what the agent should avoid, so it tells you what scope violations to flag.&lt;/p&gt;

&lt;p&gt;This post covers practical evaluation patterns for agent output. No heavyweight eval framework required. Three tiers: structural checks you can run in pure code, an LLM-as-judge pattern for content quality, and a calibration loop for tuning both. I'll walk through each tier with a working demo that evaluates a RISEN-structured code review agent that reviews Lambda functions for security vulnerabilities, performance issues, and AWS best practice violations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Eval Before Picking a Framework
&lt;/h2&gt;

&lt;p&gt;Evaluation frameworks exist for a reason. The &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; Python SDK includes an &lt;a href="https://github.com/strands-agents/sdk-python/tree/main/src/strands/evals" rel="noopener noreferrer"&gt;evals package&lt;/a&gt; with output evaluators, trajectory evaluators, and benchmark runners. If you're building a production agent in Python with dozens of test cases and CI integration, use it.&lt;/p&gt;

&lt;p&gt;But most of the time you're not there yet. You're still iterating on your system prompt, changing a sentence in Narrowing to see if it stops the agent from going off-scope. For that stage, you need something lighter.&lt;/p&gt;

&lt;p&gt;Here's the model I use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;What it checks&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Structural&lt;/td&gt;
&lt;td&gt;Format compliance, section presence, vocabulary&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Judge&lt;/td&gt;
&lt;td&gt;Content quality, finding detection, reasoning&lt;/td&gt;
&lt;td&gt;~$0.01/check&lt;/td&gt;
&lt;td&gt;Seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human&lt;/td&gt;
&lt;td&gt;Calibration, edge cases, subjective quality&lt;/td&gt;
&lt;td&gt;Your time&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you want to run it first and read later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gunnargrosch/agent-evals-demo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;agent-evals-demo
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run &lt;span class="nb"&gt;eval&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two principles guide the approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Binary pass/fail over scales.&lt;/strong&gt; A finding is caught or it isn't. A section is present or it isn't. Likert scales (1-5 ratings) sound more nuanced, but they're harder to calibrate and harder to act on. If you need to score "how good" a finding is, you don't yet know what "good" means for your use case. Define it first, then check for it.&lt;/p&gt;
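&lt;p&gt;Binary checks also compose without any calibration debate. A sketch of the summary arithmetic (my own helper, not from the demo):&lt;/p&gt;

```typescript
// Binary checks aggregate into an unambiguous pass count; there is no
// question of how to average a 3 against a 4 on a Likert scale.
type BinaryCheck = { name: string; passed: boolean }

function summarize(checks: BinaryCheck[]): string {
  const passed = checks.filter((c) => c.passed).length
  return `${passed}/${checks.length} passed`
}
```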

&lt;p&gt;&lt;strong&gt;RISEN components are eval levers.&lt;/strong&gt; Each component of your system prompt maps directly to something you can check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expectation&lt;/strong&gt; defines structural checks: are the required sections present? Is the summary table formatted correctly?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Narrowing&lt;/strong&gt; defines scope checks: did the agent stay in bounds? Did it avoid things you told it to avoid?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steps&lt;/strong&gt; define content checks: did the agent follow the workflow? Did it catch the issues each step should surface?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Test Cases
&lt;/h2&gt;

&lt;p&gt;To evaluate a code review agent, you need code to review. Not random code, but code with known issues so you can check whether the agent found them.&lt;/p&gt;

&lt;p&gt;I built three test cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test case&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Expected findings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Vulnerabilities&lt;/td&gt;
&lt;td&gt;Well-known issues. Calibration baseline.&lt;/td&gt;
&lt;td&gt;5 (SSN exposure, no validation, client in handler, wildcard CORS, no error handling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subtle Issues&lt;/td&gt;
&lt;td&gt;Harder problems. Tests depth.&lt;/td&gt;
&lt;td&gt;4 (NoSQL injection, scan vs query, missing idempotency, no batch write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;False Positive Bait&lt;/td&gt;
&lt;td&gt;Correct code that looks suspicious. Tests precision.&lt;/td&gt;
&lt;td&gt;0 (should find nothing wrong)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first is the same Lambda function from the RISEN demo: a user lookup function with SSN exposure, missing input validation, a DynamoDB client instantiated inside the handler, wildcard CORS, and no error handling. It has obvious problems that any decent review should catch. If your agent misses SSN exposure on this function, something is fundamentally broken.&lt;/p&gt;

&lt;p&gt;The second is harder. It processes SQS messages, writes to DynamoDB, and filters its reads with a FilterExpression built by string concatenation. That's NoSQL injection, but it's subtler than SQL injection and many reviewers miss it. The code also uses Scan instead of Query and doesn't handle SQS redelivery (missing idempotency).&lt;/p&gt;

&lt;p&gt;The third is the interesting one. It has test constants that look like hardcoded secrets (&lt;code&gt;TEST_API_KEY = 'test-ak-00000...'&lt;/code&gt;), a structured error handler that could look like error swallowing, and environment variable configuration that could be mistaken for hardcoded values. A good reviewer should find nothing wrong here. A trigger-happy one will flag false positives.&lt;/p&gt;

&lt;p&gt;Each test case is a TypeScript object with the code, expected findings, and things that should not be flagged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;TestCase&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;expectedFindings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Finding&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;expectedAbsent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Finding&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="nx"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;keywords&lt;/code&gt; array gives the LLM judge contextual hints about what vocabulary to look for. Keywords are passed directly to the judge as part of the user message. The judge reads them and uses them to decide whether the review demonstrated real understanding of the problem. For the NoSQL injection finding: &lt;code&gt;['injection', 'FilterExpression', 'concatenat', 'ExpressionAttributeValues', 'parameteriz']&lt;/code&gt;. The review doesn't need all of them, but it needs to demonstrate understanding of the actual problem, not just mention a related keyword in passing.&lt;/p&gt;

&lt;p&gt;The stems (&lt;code&gt;'concatenat'&lt;/code&gt;, &lt;code&gt;'parameteriz'&lt;/code&gt;) are intentional. The judge reads them semantically, so &lt;code&gt;'concatenat'&lt;/code&gt; cues the judge to look for "concatenation", "concatenated", and similar. Calibrate keyword specificity carefully: too generic and the judge gives credit for tangentially related mentions; too specific and you miss a review that describes the problem correctly but uses different phrasing.&lt;/p&gt;
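&lt;p&gt;Put together, the NoSQL injection finding from the test case above looks like this (the keywords are the ones just discussed; the severity value is my assumption):&lt;/p&gt;

```typescript
// The NoSQL injection finding written out as a Finding object.
const nosqlInjection = {
  id: 'nosql-injection',
  description: 'FilterExpression built by string concatenation allows NoSQL injection',
  severity: 'critical' as const, // assumed; pick what fits your rubric
  // Stems like 'concatenat' and 'parameteriz' cue the judge to match any inflection.
  keywords: ['injection', 'FilterExpression', 'concatenat', 'ExpressionAttributeValues', 'parameteriz'],
}

// A quick sanity check for your own stems: does a stem actually appear in
// the phrasings a good review would plausibly use?
function stemMatches(stem: string, reviewText: string): boolean {
  return reviewText.toLowerCase().includes(stem.toLowerCase())
}
```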

&lt;h2&gt;
  
  
  Tier 1: Structural Checks
&lt;/h2&gt;

&lt;p&gt;Structural checks are pure code. No LLM calls, no cost, instant results. They verify that the agent's output matches the format defined in your Expectation and respects the boundaries in your Narrowing.&lt;/p&gt;

&lt;p&gt;For the code review prompt, the Expectation section says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Return a structured review with a summary table listing each finding with its severity, a detailed section for each finding with severity level, description, problematic code, and corrected code, and a summary count of findings by severity at the end.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That translates directly to checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runStructuralChecks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;StructuralCheckResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;checks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StructuralCheckResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

  &lt;span class="c1"&gt;// Derived from Expectation: "summary table listing each finding"&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hasTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="sr"&gt;.*severity.*&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="sr"&gt;.*finding.*&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;checks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summary table present&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hasTable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;hasTable&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Found summary table with findings&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No summary table found.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// Derived from Expectation: "corrected code"&lt;/span&gt;
  &lt;span class="c1"&gt;// Assumes paired backtick fences — malformed output with odd backtick count&lt;/span&gt;
  &lt;span class="c1"&gt;// produces a non-integer, which Math.floor rounds down silently.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;codeBlockCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/``&lt;/span&gt;&lt;span class="err"&gt;`
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt; || &lt;/span&gt;&lt;span class="se"&gt;[])&lt;/span&gt;&lt;span class="sr"&gt;.length /&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="nx"&gt;checks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Code blocks with fixes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;codeBlockCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`Found &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;codeBlockCount&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; code block(s)`&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// Derived from Narrowing: "Do not suggest rewriting the entire function"&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputLines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;longestCodeBlock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getLongestCodeBlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isFullRewrite&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;longestCodeBlock&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;longestCodeBlock&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;outputLines&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;
  &lt;span class="nx"&gt;checks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No full rewrite&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isFullRewrite&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;isFullRewrite&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`Longest code block is &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;longestCodeBlock&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; lines.`&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Code blocks are targeted fixes.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;checks&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The full demo includes six checks: summary table, severity vocabulary (at least two severity levels used), code blocks, no full rewrite, summary count at end, and scope compliance (no style/readability suggestions). All derived from two RISEN components.&lt;/p&gt;

&lt;p&gt;Here's what the output looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Structural Checks (6/6 passed)

  PASS Summary table present
  PASS Severity vocabulary
  PASS Code blocks with fixes
  PASS No full rewrite
  PASS Summary count at end
  PASS Stays within scope


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These are pattern-matching checks, not semantic ones. An agent that rephrases a style suggestion to dodge the regex will pass. That's fine: structural checks catch gross format violations cheaply. The judge handles subtlety.&lt;/p&gt;

&lt;p&gt;Structural checks catch the most common prompt problems: the agent ignored your format instructions, or it drifted out of scope. They're the first thing to run because they're free and fast. If structural checks fail, there's no point running the judge.&lt;/p&gt;
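&lt;p&gt;That ordering can be a one-line gate in the eval runner. A sketch (an assumed helper, not the demo's exact code):&lt;/p&gt;

```typescript
// Tiered gate: run the free structural tier first, and only pay for
// judge calls when the output is structurally sound.
type CheckResult = { name: string; passed: boolean }

function shouldRunJudge(structural: CheckResult[]): boolean {
  // A structurally broken output means the Expectation section is being
  // ignored; fix that before spending money judging the content.
  return structural.every((c) => c.passed)
}
```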

&lt;h2&gt;
  
  
  Tier 2: LLM-as-Judge
&lt;/h2&gt;

&lt;p&gt;Structural checks tell you the output looks right. They can't tell you the content is right. For that, you need a judge: a second agent that reads the review output and assesses whether specific findings were caught.&lt;/p&gt;

&lt;h3&gt;
  
  
  Different model, mandatory reasoning
&lt;/h3&gt;

&lt;p&gt;Two design decisions matter here. First, the judge uses a different model than the review agent. The review agent runs on Claude Sonnet 4.5. The judge runs on Claude Haiku 4.5. Using the same model to judge its own output creates self-enhancement bias: the model that produced a vague or incomplete finding will tend to accept that same vague finding as "caught." A different model gives you a more honest assessment, and Haiku is fast and cheap enough to run per-finding.&lt;/p&gt;

&lt;p&gt;Second, the judge must write its reasoning before its verdict. This is the same principle behind chain-of-thought prompting: forcing the model to explain its logic before committing to an answer produces better answers. The judge prompt's Expectation section enforces this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Expectation
Return exactly one JSON object with this structure:
{
  "findingId": "&amp;lt;the finding ID provided&amp;gt;",
  "reasoning": "&amp;lt;2-3 sentences explaining why you believe the finding was
                 or was not caught&amp;gt;",
  "caught": &amp;lt;true or false&amp;gt;
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reasoning comes before the verdict in the JSON. The model writes the reasoning first, then decides.&lt;/p&gt;

&lt;h3&gt;
  
  
  The judge prompt
&lt;/h3&gt;

&lt;p&gt;The judge gets a RISEN-structured prompt. It's worth showing in full. This is a RISEN prompt evaluating another RISEN prompt's output, and the structure makes that relationship explicit:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Role
You are a code review evaluator. You assess whether a code review
correctly identified a specific security or performance issue. You are
precise and literal: a finding is caught only if the review clearly
describes the problem and its impact.

# Instructions
You will receive a code review output and a specific finding to check
for. Determine whether the review caught the finding. Return your
assessment as JSON.

# Steps
1. Read the finding description and keywords carefully.
2. Search the review output for mentions of the issue.
3. Assess whether the review identified the core problem (not just
   mentioned a related keyword in passing).
4. Write your reasoning first, then your verdict.

# Expectation
Return exactly one JSON object with this structure:
{
  "findingId": "&amp;lt;the finding ID provided&amp;gt;",
  "reasoning": "&amp;lt;2-3 sentences explaining why you believe the finding
                 was or was not caught&amp;gt;",
  "caught": &amp;lt;true or false&amp;gt;
}

Return ONLY the JSON object. No markdown fences, no extra text.

# Narrowing
- A finding is "caught" only if the review identifies the specific
  problem described, not just a vaguely related concern.
- If the review mentions the general area but misses the specific
  vulnerability (e.g., mentions DynamoDB but not the injection vector),
  that is NOT caught.
- Do not give credit for partial matches. The review must demonstrate
  understanding of the actual issue.


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Narrowing is where the precision comes from. Without it, the judge tends to give credit for proximity. The review mentions DynamoDB? Close enough to "NoSQL injection"? No. The review has to demonstrate understanding of the actual vulnerability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the judge
&lt;/h3&gt;

&lt;p&gt;The judge evaluates each expected finding independently:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
async function judgeFindings(
  reviewOutput: string,
  findings: Finding[]
): Promise&amp;lt;JudgeVerdict[]&amp;gt; {
  // Each finding gets its own agent instance and runs in parallel.
  // Promise.allSettled means one timeout or error doesn't block the rest.
  const results = await Promise.allSettled(
    findings.map(async (finding) =&amp;gt; {
      const agent = createJudgeAgent(judgePrompt)
      const prompt = `
Review output to evaluate:
---
${reviewOutput}
---

Finding to check:
- ID: ${finding.id}
- Description: ${finding.description}
- Severity: ${finding.severity}
- Keywords that indicate detection: ${finding.keywords.join(', ')}

Did the review catch this finding?`

      const result = await agent.invoke(prompt)
      return parseJudgeResponse(result.toString(), finding.id)
    })
  )

  return results.map((result, i) =&amp;gt;
    result.status === 'fulfilled'
      ? result.value
      : { findingId: findings[i].id, reasoning: 'Judge failed', caught: false }
  )
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the false positive test case, a separate judge prompt checks whether the review incorrectly flagged correct code as problematic.&lt;/p&gt;

&lt;p&gt;The judge returns a JSON object with &lt;code&gt;reasoning&lt;/code&gt; first and &lt;code&gt;caught&lt;/code&gt; second. The &lt;code&gt;caught&lt;/code&gt; boolean is what drives the PASS/FAIL in the terminal output; the &lt;code&gt;reasoning&lt;/code&gt; string is what gets printed below it. You defined that schema in the Expectation section of the judge prompt.&lt;/p&gt;
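&lt;p&gt;The post doesn't show &lt;code&gt;parseJudgeResponse&lt;/code&gt;. A minimal sketch of what it might look like, assuming the judge mostly honors the JSON-only Expectation but occasionally wraps the object in prose or fences:&lt;/p&gt;

```typescript
type JudgeVerdict = { findingId: string; reasoning: string; caught: boolean }

// Pull out the outermost {...} so stray prose around the JSON doesn't break parsing.
function extractJson(raw: string): string {
  const start = raw.indexOf('{')
  const end = raw.lastIndexOf('}')
  if (start === -1) return raw
  if (end > start) return raw.slice(start, end + 1)
  return raw
}

function parseJudgeResponse(raw: string, findingId: string): JudgeVerdict {
  try {
    const parsed = JSON.parse(extractJson(raw))
    return {
      findingId: typeof parsed.findingId === 'string' ? parsed.findingId : findingId,
      reasoning: String(parsed.reasoning ?? ''),
      caught: parsed.caught === true, // anything but literal true counts as not caught
    }
  } catch {
    // Unparseable judge output is treated as a miss, mirroring the allSettled fallback.
    return { findingId, reasoning: 'Unparseable judge response', caught: false }
  }
}
```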

&lt;p&gt;Here's what the judge output looks like on the Subtle Issues test case:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Judge Verdicts (2/4 caught)

  PASS nosql-injection
    The review explicitly identifies the NoSQL injection vulnerability,
    clearly describing the core problem: user input is concatenated
    directly into the FilterExpression without parameterization.

  PASS scan-instead-of-query
    The review explicitly identifies this issue, clearly describing the
    core problem: the code uses ScanCommand which reads the entire table
    before filtering.

  FAIL missing-idempotency
    The review does not identify the missing idempotency issue. While
    the review addresses error handling, it does not discuss the specific
    problem of duplicate orders when SQS messages are reprocessed.

  FAIL no-batch-write
    The review does not identify the inefficiency of sequential
    PutCommand operations in a loop instead of BatchWriteCommand.


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The baseline prompt catches 2 out of 4 findings on the harder test case. It finds the NoSQL injection and the scan problem, but misses idempotency and batch writes. 50% on the subtle test case isn't a failure: it's the starting point. That's the data you need to improve the prompt. If your results look different across runs, that's expected. The calibration section covers non-determinism and what to do about it.&lt;/p&gt;
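&lt;p&gt;One cheap way to handle that non-determinism is to repeat the eval and report a catch rate per finding instead of a single verdict. A sketch (my own helper, not part of the demo):&lt;/p&gt;

```typescript
// verdictsPerRun[r][f] is whether finding f was caught on run r.
// A finding that passes 1 of 5 runs is a flake, not a capability.
function catchRate(verdictsPerRun: boolean[][], findingIndex: number): number {
  const caught = verdictsPerRun.filter((run) => run[findingIndex]).length
  return caught / verdictsPerRun.length
}
```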

&lt;h2&gt;
  
  
  Tier 3: The Calibration Loop
&lt;/h2&gt;

&lt;p&gt;The first time you run evals, you'll disagree with some of the judge's verdicts. That's expected and useful. The calibration loop is how you turn those disagreements into a better rubric.&lt;/p&gt;

&lt;p&gt;The process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the eval.&lt;/li&gt;
&lt;li&gt;Read every judge verdict, especially the reasoning.&lt;/li&gt;
&lt;li&gt;For each disagreement, decide which of three things happened:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The judge is wrong.&lt;/strong&gt; Tighten the judge prompt's Narrowing or add a keyword to the finding definition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The agent is wrong.&lt;/strong&gt; The agent should have caught this. Tighten the review prompt's Steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The test case is wrong.&lt;/strong&gt; The expected finding is unreasonable, or the code doesn't actually have the issue you thought it did. Fix the test case.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After my first run, I found two calibration issues. The "summary count at end" structural check was too strict: it only looked in the last 800 characters and required specific phrasing. I widened the search window and added more patterns. The false positive check for "hardcoded secret" was catching cases where the review mentioned test constants neutrally rather than flagging them as issues. I tightened the false positive judge prompt to distinguish between "flagged as a finding" and "mentioned in passing."&lt;/p&gt;

&lt;p&gt;Two iterations were enough to get a stable rubric for three test cases. If you have more test cases or a more complex agent, you might need three or four rounds.&lt;/p&gt;

&lt;p&gt;After the summary, the eval also prints a &lt;strong&gt;Suggestions&lt;/strong&gt; section that maps each failure back to the RISEN component to edit: missed findings point to Steps, structural failures point to Expectation, false positives point to Narrowing. It doesn't tell you what to change, but it tells you where to look.&lt;/p&gt;
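&lt;p&gt;The mapping behind that Suggestions section fits in a few lines. A sketch (hypothetical names; the demo's report formatter may differ):&lt;/p&gt;

```typescript
// Maps each failure type to the RISEN component worth editing first.
// Names are illustrative; the demo's implementation may differ.
type FailureKind = "missed-finding" | "structural" | "false-positive";

function suggestedComponent(kind: FailureKind): string {
  switch (kind) {
    case "missed-finding":
      return "Steps";       // the agent was never told to look for it
    case "structural":
      return "Expectation"; // the output contract needs tightening
    case "false-positive":
      return "Narrowing";   // the scope needs firmer boundaries
  }
}
```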

&lt;h3&gt;
  
  
  A note on non-determinism
&lt;/h3&gt;

&lt;p&gt;LLMs don't produce the same output twice. Your eval might pass on one run and fail on the next for the same prompt and same input. This is normal, and it matters for how you interpret results.&lt;/p&gt;

&lt;p&gt;For structural checks, non-determinism is rarely an issue: the agent either returns a table or it doesn't. For the judge, a borderline verdict (the review hinted at a finding but didn't nail it) may flip between runs. If a finding fails consistently, it's a real signal. If it flips, that's a signal too: the finding is on the boundary of what the current prompt reliably catches, and the judge rubric may need tightening.&lt;/p&gt;

&lt;p&gt;For CI, this means not treating every eval failure as a blocker. Run structural checks in CI: they're stable. Use judge results as monitoring: track pass rates over multiple runs and alert on sustained regressions rather than single failures. The &lt;code&gt;--ci&lt;/code&gt; flag exits non-zero on any failure; use it in CI only once your rubric is stable enough that flakiness is rare.&lt;/p&gt;

&lt;p&gt;A simple strategy for borderline findings: run the eval three times and consider a finding caught if it passes at least two of the three runs. The demo doesn't do this automatically; you'd wire it up in your CI script. It filters out most random variance without being too permissive. The README notes which findings are intermittent with the baseline prompt; those are good candidates for this approach before you tighten the prompt further.&lt;/p&gt;
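&lt;p&gt;The majority-vote logic itself is tiny. A sketch of the CI-side wiring, assuming you already have a way to run the eval and record whether a finding was caught in each run:&lt;/p&gt;

```typescript
// Treat a borderline finding as caught if it passes a majority of runs.
// With three runs, that means at least two passes.
function caughtByMajority(runs: boolean[]): boolean {
  const passes = runs.filter(Boolean).length;
  return passes * 2 > runs.length;
}
```

&lt;p&gt;&lt;code&gt;caughtByMajority([true, false, true])&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;: one flaky miss doesn't fail the finding, but two misses do.&lt;/p&gt;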

&lt;h2&gt;
  
  
  Comparing Across Prompt Iterations
&lt;/h2&gt;

&lt;p&gt;The real payoff comes when you change your prompt and want to know if the change helped. The compare script runs both prompts against all test cases and shows what changed.&lt;/p&gt;

&lt;p&gt;The v2 prompt adds explicit steps for NoSQL injection detection and idempotency checking:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
text
# Steps
...
2. Check for security issues: injection (including NoSQL injection via
   string concatenation in DynamoDB expressions), overly permissive IAM
   assumptions, hardcoded secrets, missing input validation.
3. Check for data handling: look for string concatenation in
   FilterExpression, KeyConditionExpression, or ProjectionExpression.
   These must use ExpressionAttributeValues with placeholders.
4. Check for idempotency: if processing messages from SQS, SNS, or
   EventBridge, verify the handler is idempotent.
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here's the comparison output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
text
COMPARISON: BASELINE vs VARIANT

Basic Vulnerabilities
  Structural: 6/6 -&amp;gt; 6/6
  Findings:   5/5 -&amp;gt; 5/5

Subtle Issues
  Structural: 6/6 -&amp;gt; 6/6
  Findings:   2/4 -&amp;gt; 3/4 (+1)
    + now catches: missing-idempotency

False Positive Bait
  Structural: 5/6 -&amp;gt; 6/6 (+1)
  False pos:  0 -&amp;gt; 1 (-1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The v2 prompt catches the missing idempotency issue that the baseline missed. That's the targeted improvement. But it also introduced a false positive on the clean code: it incorrectly flagged "wildcard CORS" on a function that uses environment-based CORS configuration.&lt;/p&gt;

&lt;p&gt;This is the trade-off you're always navigating with prompt changes. Adding specificity to Steps improves recall (catches more real issues) but can hurt precision (flags more non-issues). The eval gives you data to make that trade-off deliberately instead of guessing.&lt;/p&gt;
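&lt;p&gt;Put numbers on it and the trade-off is easy to track across iterations. Using the comparison above (baseline: 7 of 9 real findings, 0 false positives; v2: 8 of 9, 1 false positive), a couple of illustrative helpers:&lt;/p&gt;

```typescript
// Recall: share of real findings the review caught.
// Precision: share of flagged findings that were real.
// Illustrative helpers, not part of the demo.
function recall(caught: number, totalReal: number): number {
  return caught / totalReal;
}

function precision(truePositives: number, falsePositives: number): number {
  return truePositives / (truePositives + falsePositives);
}

// Baseline: recall(7, 9) ~0.78 with precision(7, 0) = 1.0.
// v2:       recall(8, 9) ~0.89 with precision(8, 1) ~0.89.
```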

&lt;h2&gt;
  
  
  The Same Pattern, Different Domain
&lt;/h2&gt;

&lt;p&gt;Code review is a useful test bed, but the interesting question is whether the pattern holds for something more subjective. The demo includes a second domain, content review: an agent that reviews blog post drafts for completeness, structure, and technical accuracy.&lt;/p&gt;

&lt;p&gt;The test cases are blog post drafts instead of Lambda functions. The expected findings are things like "missing prerequisites section" and "unexplained command flags" instead of "SSN exposure" and "NoSQL injection." The structural checks are completely different: section checklist present, finding categories used, readiness assessment at end. No severity tables, no code blocks with fixes.&lt;/p&gt;

&lt;p&gt;But the pieces that change are exactly three files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Test cases&lt;/strong&gt;: blog post drafts with known issues, using the same &lt;code&gt;TestCase&lt;/code&gt; interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structural checks&lt;/strong&gt;: regex/string checks derived from the content review prompt's Expectation section.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent prompt&lt;/strong&gt;: a RISEN-structured prompt for technical editing instead of security review.&lt;/li&gt;
&lt;/ol&gt;
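&lt;p&gt;Concretely, a content-domain test case might look like this. The field names are assumptions based on the post's description, not necessarily the demo's actual &lt;code&gt;TestCase&lt;/code&gt; interface:&lt;/p&gt;

```typescript
// Hypothetical shapes: the demo's real TestCase/Finding types may name
// these fields differently.
interface Finding {
  id: string;          // e.g. "missing-prerequisites"
  description: string; // what the judge should look for in the review
}

interface TestCase {
  name: string;
  input: string; // the blog post draft under review
  expectedFindings: Finding[];
}

const incompleteTutorial: TestCase = {
  name: "Incomplete Tutorial",
  input: "# Deploying the App\n\nRun the deploy script with -f -q ...",
  expectedFindings: [
    { id: "missing-prerequisites", description: "No prerequisites section" },
    { id: "unexplained-command-flags", description: "Flags used but never explained" },
  ],
};
```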

&lt;p&gt;The judge, report formatter, and &lt;code&gt;TestCase&lt;/code&gt;/&lt;code&gt;Finding&lt;/code&gt; types stay the same. Run it with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
npm run eval:content


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
text
TEST CASE: Incomplete Tutorial

Structural Checks (5/5 passed)

  PASS Section checklist present
  PASS Finding categories
  PASS Actionable suggestions
  PASS Readiness assessment
  PASS Stays within scope

Judge Verdicts (4/5 caught)

  PASS missing-prerequisites
  PASS unexplained-command-flags
  PASS missing-import
  PASS no-troubleshooting
  FAIL missing-conclusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Different domain, same eval pattern. If you're building a summarization agent, a customer support agent, or anything else, the approach is the same: define test cases with known expected findings, write structural checks from your Expectation section, and let the judge handle content quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running the Demo
&lt;/h2&gt;

&lt;p&gt;The demo repo has everything you need to run this yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You'll need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account with Amazon Bedrock access to Claude Sonnet 4.5 and Claude Haiku 4.5&lt;/li&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;li&gt;AWS credentials configured (&lt;code&gt;AWS_PROFILE&lt;/code&gt; or default)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Getting started:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
git clone https://github.com/gunnargrosch/agent-evals-demo.git
cd agent-evals-demo
npm install
npm test


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;npm test&lt;/code&gt; runs the unit tests for the structural checks. No LLM calls, no AWS credentials needed. A good first step to verify the setup.&lt;/p&gt;
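&lt;p&gt;That separation works because a structural check is just a string or regex predicate over the review text, so no model is needed in the loop. A minimal sketch of one such check (hypothetical implementation; the demo's checks are more thorough):&lt;/p&gt;

```typescript
// One structural check from the content review domain: the review must
// include a readiness assessment. Hypothetical implementation; the
// demo derives its real checks from the prompt's Expectation section.
function hasReadinessAssessment(review: string): boolean {
  return /readiness|ready to publish|not ready/i.test(review);
}
```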

&lt;p&gt;&lt;strong&gt;Commands:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run eval:structural&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code review structural checks only. No judge, no cost.&lt;/td&gt;
&lt;td&gt;~30s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run eval&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code review full eval: structural + LLM judge.&lt;/td&gt;
&lt;td&gt;~2min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run eval:content&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Content review eval: structural + LLM judge.&lt;/td&gt;
&lt;td&gt;~1min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm run compare&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Baseline vs v2 code review prompt, side by side.&lt;/td&gt;
&lt;td&gt;~4min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can also run the full eval with the v2 prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
npm run eval -- --v2


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To test your own code review prompt, write it in a text file and pass it with &lt;code&gt;--prompt&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
npm run eval -- --prompt ./my-code-review-prompt.txt


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you want to eval a completely different kind of agent without writing any TypeScript, &lt;code&gt;--test-case&lt;/code&gt; accepts a JSON file of test cases and &lt;code&gt;--skip-structural&lt;/code&gt; skips the built-in code review checks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
npm run eval -- --prompt ./my-prompt.txt --test-case ./my-cases.json --skip-structural


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The compare script accepts &lt;code&gt;--baseline&lt;/code&gt; and &lt;code&gt;--variant&lt;/code&gt; for comparing any two prompt files:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
npm run compare -- --baseline ./my-prompt-v1.txt --variant ./my-prompt-v2.txt


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pass &lt;code&gt;--ci&lt;/code&gt; to any eval script to exit non-zero on failures, useful for pipeline integration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
npm run eval -- --ci


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The eval uses Sonnet 4.5 for the review agent and Haiku 4.5 for the judge. A full &lt;code&gt;npm run eval&lt;/code&gt; across all three test cases costs roughly $0.10-0.15 at current Bedrock pricing. &lt;code&gt;npm run compare&lt;/code&gt; runs six agent calls plus the judge, so roughly double.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Graduate to a Framework
&lt;/h2&gt;

&lt;p&gt;This approach works well for iterating on a single agent's system prompt with a handful of test cases. When you hit these signals, it's time to look at a framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More than 10 test cases.&lt;/strong&gt; You'll want parallel execution, caching, and proper test runners.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI integration.&lt;/strong&gt; The demo has a &lt;code&gt;--ci&lt;/code&gt; flag that exits non-zero on failures, so you can hook it into a pipeline. But once you need test result history, trend tracking, or gating deploys across multiple agents, a framework handles that better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple agents coordinating.&lt;/strong&gt; Trajectory evaluation (did the agents take the right steps?) matters as much as output evaluation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team collaboration.&lt;/strong&gt; Others need to run and extend evals without understanding your bespoke scripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Strands Agents SDK includes an &lt;a href="https://github.com/strands-agents/sdk-python/tree/main/src/strands/evals" rel="noopener noreferrer"&gt;evals package&lt;/a&gt; (Python) with &lt;code&gt;OutputEvaluator&lt;/code&gt; and &lt;code&gt;TrajectoryEvaluator&lt;/code&gt; classes that handle these scenarios. Note that this is the Python SDK. The TypeScript SDK doesn't include an evals package yet, so graduating means either switching to Python or building on top of what this demo started. The lightweight approach in this post is for the earlier stage: when you're still figuring out what "good" looks like for your agent's output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/gunnargrosch/agent-evals-demo" rel="noopener noreferrer"&gt;agent-evals-demo&lt;/a&gt;: Demo repo with all code from this post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94"&gt;Writing System Prompts That Actually Work: The RISEN Framework for AI Agents&lt;/a&gt;: The RISEN framework post this builds on&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/gunnargrosch/risen-prompt-demo" rel="noopener noreferrer"&gt;risen-prompt-demo&lt;/a&gt;: Demo repo for the RISEN post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/strands-agents/sdk-typescript" rel="noopener noreferrer"&gt;Strands Agents SDK (TypeScript)&lt;/a&gt;: Agent framework used in the demo&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/strands-agents/sdk-python/tree/main/src/strands/evals" rel="noopener noreferrer"&gt;Strands Agents evals (Python)&lt;/a&gt;: Full eval framework for when you graduate&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hamel.dev/blog/posts/evals/" rel="noopener noreferrer"&gt;Your AI Product Needs Evals&lt;/a&gt;: Hamel Husain's guide to building evals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you try it, the calibration loop is where the interesting disagreements show up. What does your current approach to agent eval look like? Let me know in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>aws</category>
      <category>testing</category>
    </item>
    <item>
      <title>Writing System Prompts That Actually Work: The RISEN Framework for AI Agents</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Sun, 01 Mar 2026 17:32:04 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94</link>
      <guid>https://dev.to/gunnargrosch/writing-system-prompts-that-actually-work-the-risen-framework-for-ai-agents-4p94</guid>
      <description>&lt;p&gt;You've probably written a system prompt that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a helpful assistant. Help the user with their request.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works. The model responds. But the output is unpredictable. Ask it to review code and you get a mix of style comments and security findings with no consistent structure. Ask it to diagnose an incident and it gives you a wall of text that buries the actionable steps. Ask it to design an architecture and it picks services without explaining trade-offs.&lt;/p&gt;

&lt;p&gt;If you're building agents that need to produce consistent, structured output, whether that's a single-agent workflow or a multi-agent system, the problem isn't the model. It's the prompt. A vague system prompt gives the model no framework for structuring its reasoning, so it improvises every time. Sometimes the improvisation is great. Sometimes it misses the point entirely. You can't build reliable agents on "sometimes."&lt;/p&gt;

&lt;h2&gt;
  
  
  The RISEN Framework
&lt;/h2&gt;

&lt;p&gt;RISEN is a structured approach to writing system prompts. Each letter represents a component:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;R&lt;/strong&gt;ole&lt;/td&gt;
&lt;td&gt;Who the agent is. Expertise, experience, specialization.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;I&lt;/strong&gt;nstructions&lt;/td&gt;
&lt;td&gt;What you want it to do. The core task.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;S&lt;/strong&gt;teps&lt;/td&gt;
&lt;td&gt;How to get there. The ordered workflow.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;E&lt;/strong&gt;xpectation&lt;/td&gt;
&lt;td&gt;What the output should look like. Format, structure, sections.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;N&lt;/strong&gt;arrowing&lt;/td&gt;
&lt;td&gt;What to exclude. Constraints, boundaries, scope limits.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You'll see the E defined as "End Goal" in some formulations. Expectation is a deliberate choice here: for agents, what matters is the structural contract for the output, not a vague goal statement. "Produce a useful architecture" is an end goal. "Return sections for Requirements Summary, Service Selection with trade-off tables, SAM Template, and Cost Estimate" is an expectation.&lt;/p&gt;

&lt;p&gt;Most people only write the &lt;strong&gt;I&lt;/strong&gt; part. "Review the code." "Diagnose the issue." "Design an architecture." That's an instruction with no context about who's doing the work, what process to follow, what format to use, or what to leave out.&lt;/p&gt;

&lt;p&gt;RISEN fills in the rest. The result isn't just a prompt. It's a behavioral contract. The agent knows what role it's playing, what steps to follow, what structure to produce, and what boundaries to respect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Agents
&lt;/h2&gt;

&lt;p&gt;System prompts matter more for agents than for simple chat. In a chat application, a vague system prompt means the user gets a mediocre answer and can follow up. In an agentic workflow, a vague system prompt means the agent takes actions based on an ambiguous understanding of its role. It might use the wrong tools, skip steps, or produce output that downstream agents can't parse.&lt;/p&gt;

&lt;p&gt;In multi-agent systems (whether you're using protocols like &lt;a href="https://github.com/google/A2A" rel="noopener noreferrer"&gt;A2A&lt;/a&gt; and &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP&lt;/a&gt;, or frameworks like &lt;a href="https://github.com/strands-agents" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;), each agent's system prompt is its behavioral contract with the rest of the system. A warehouse management agent in a logistics pipeline needs to know exactly what decisions it owns, what format to return, and what to escalate. "You are a warehouse assistant" doesn't cut it.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/nullchecktv/swiftship-demo/" rel="noopener noreferrer"&gt;SwiftShip demo&lt;/a&gt; from re:Invent session DEV415 is a good example. It's a logistics platform with four agents (Triage, Order, Payment, Warehouse) that coordinate to resolve delivery exceptions. Every agent has a RISEN-structured system prompt. The Triage Agent's Steps section is a full decision tree: classify the exception, determine the resolution strategy, invoke the right specialist agents in the right order (Payment before Warehouse before Order for replacements), and produce a resolution summary. The Narrowing section prevents it from handling general customer inquiries and enforces that it never processes refunds without confirming the exception type. That's not a prompt. That's an orchestration contract.&lt;/p&gt;

&lt;p&gt;This is also where &lt;strong&gt;Narrowing&lt;/strong&gt; earns its place. Without explicit constraints, agents over-deliver. An incident response agent might suggest rewriting the application code when all you need right now is "switch DynamoDB to on-demand capacity." Narrowing keeps the agent focused on what's useful for the current context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Difference in Practice
&lt;/h2&gt;

&lt;p&gt;I put together a &lt;a href="https://github.com/gunnargrosch/risen-prompt-demo" rel="noopener noreferrer"&gt;demo repo&lt;/a&gt; with three scenarios that show the difference between basic and RISEN system prompts. Each scenario sends the same user prompt to the same model twice: once with a one-sentence system prompt, once with a RISEN-structured prompt. Same model, same input, different guidance. The demo uses &lt;a href="https://github.com/strands-agents" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; with Amazon Bedrock.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: Incident response
&lt;/h3&gt;

&lt;p&gt;The user prompt is a DynamoDB throttling alert: 4,850 WCU consumed against 1,000 provisioned, 2,347 throttled requests, deployed a new Lambda version 45 minutes ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an incident response assistant. Help diagnose and resolve
AWS issues.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RISEN prompt (abbreviated, &lt;a href="https://github.com/gunnargrosch/risen-prompt-demo" rel="noopener noreferrer"&gt;full version in the demo repo&lt;/a&gt;):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Role
You are an AWS site reliability engineer on an on-call rotation
with 10 years of experience operating production serverless workloads.

# Instructions
Perform a structured diagnosis. Identify the most likely root cause,
provide immediate mitigation steps, and recommend longer-term fixes.

# Steps
1. Parse the alert details: service, metric, threshold, duration.
2. List the top 3 most likely root causes in order of probability.
3. For each, describe evidence that would confirm or rule it out.
4. Provide immediate mitigation steps executable in under 5 minutes.
5. Recommend longer-term fixes with estimated effort.

# Expectation
Sections: Alert Summary, Probable Root Causes (ranked), Diagnostic
Steps, Immediate Mitigation, Long-Term Fixes. Include specific
metric names, CLI commands, and thresholds.

# Narrowing
- Operator has CLI access but cannot deploy code changes during
  the incident.
- Focus on mitigation first. Restoring service is the priority.
- Do not suggest "contact AWS Support" as a first step.
- All commands should use AWS CLI v2 syntax.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The basic prompt gives a solid response. It correctly identifies the new Lambda deployment as the likely cause, provides useful CLI commands, and suggests scaling up DynamoDB. But it's organized as a narrative with emoji headers and ends by asking the operator what to do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## 🚨 Immediate Issue
Your write capacity is being consumed at **485% of provisioned capacity**...

## 🔍 Root Cause Hypothesis
Given the timeline, the new Lambda deployment is the likely culprit.

...

**What would you like to do first? Scale the table or rollback the Lambda?**
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That question is the wrong instinct for an incident response agent. At 2 AM, you don't want a conversation. You want a ranked action plan.&lt;/p&gt;

&lt;p&gt;The RISEN prompt produces exactly that. Root causes are ranked with confidence percentages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## 1. **Lambda Write Amplification (90% confidence)**
## 2. **Hot Partition Key Issue (70% confidence)**
## 3. **SQS Message Backlog Processing (60% confidence)**
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each mitigation option includes cost and impact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Option A: Increase DynamoDB Write Capacity (60 seconds)
aws dynamodb update-table \
  --table-name order-events-prod \
  --provisioned-throughput ReadCapacityUnits=1000,WriteCapacityUnits=5000
Impact: Eliminates throttling immediately. Table update takes 30-60 seconds.
Cost: ~$0.35/hour additional ($2,336/month vs $467/month baseline)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And long-term fixes come with effort estimates ("Effort: 15 minutes", "Effort: 4 hours") so you can prioritize. The Narrowing constraint about not deploying code during an incident kept the response focused on what an on-call engineer can actually do without waking up the development team.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 2: Architecture decision
&lt;/h3&gt;

&lt;p&gt;This scenario adds a twist: both agents get the same &lt;a href="https://docs.aws.amazon.com/aws-mcp/latest/userguide/what-is-mcp-server.html" rel="noopener noreferrer"&gt;AWS MCP server&lt;/a&gt; tools for searching AWS documentation, checking service limits, and validating recommendations. Same tools, same model, same user prompt. The only difference is the system prompt.&lt;/p&gt;

&lt;p&gt;The user prompt describes requirements for a real-time order notification system: 50,000 orders per day, multiple notification channels, customer preferences, 30-second delivery SLA, under $500/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an AWS solutions architect. Help design cloud architectures.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RISEN prompt (abbreviated):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Role
You are a principal AWS solutions architect specializing in
event-driven serverless architectures.

# Instructions
Design an AWS architecture. Evaluate service options, justify
choices with trade-offs, and provide a SAM template snippet.
Use the AWS documentation tools to validate your recommendations.

# Steps
1. Restate requirements as functional and non-functional.
2. Identify the core architectural pattern.
3. For each component, list 2-3 service options with trade-offs.
   Use the documentation tools to verify current service limits
   and pricing.
4. Select and justify the recommended option.
5. Describe the data flow end to end.
6. Provide a SAM template snippet.
7. Call out operational considerations.

# Expectation
Sections: Requirements Summary, Architecture Pattern, Service
Selection (with trade-off tables), Data Flow, SAM Template,
Operational Considerations. Include a monthly cost estimate.

# Narrowing
- Prefer serverless over instance-based.
- Use managed services only.
- SAM templates should be valid YAML, not pseudocode.
- Cost estimates using current us-east-1 pricing.
- Use the documentation tools only to verify specific facts
  (pricing, limits, quotas). Do not use them to generate
  the architecture itself.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both agents used the MCP tools. But look at what they did with them.&lt;/p&gt;

&lt;p&gt;The basic prompt queried the documentation and jumped straight to a recommendation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Recommended Architecture: Event-Driven Real-Time Order Notification System
...
### Core Components
#### 1. Event Ingestion Layer
- **Amazon EventBridge**: Central event bus for order events
...
Would you like me to:
1. Generate CDK/CloudFormation templates for this architecture?
2. Create the Lambda function code with full error handling?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No alternatives evaluated. No trade-offs. And it ends by asking what to do next.&lt;/p&gt;

&lt;p&gt;The RISEN prompt used the same tools to verify facts, then produced trade-off tables for every component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;### 3.1 Event Ingestion Layer

| Service           | Pros                         | Cons                       | Verdict      |
|-------------------|------------------------------|----------------------------|--------------|
| EventBridge       | Native filtering, $1/M events| Limited transformation     | SELECTED     |
| Kinesis Streams   | Replay, high throughput      | $11/month min, overkill    |              |
| SQS               | Simple, cheap                | No native fanout           |              |

Decision: EventBridge - 8.33 events/sec peak &amp;lt;&amp;lt; 10,000/sec limit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Steps guided the agent to evaluate before deciding. The Narrowing constraint "Use the documentation tools only to verify specific facts" kept the tool usage focused: the agent looked up pricing and limits, not architectures. The result was a full architecture document with a SAM template, a cost breakdown ($253.51/month against the $500 budget), and operational considerations including scaling limits and monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 3: Code review
&lt;/h3&gt;

&lt;p&gt;The user prompt is a Lambda function with several issues: SDK client instantiated inside the handler, no input validation, sensitive data (SSN) returned in the API response, wildcard CORS headers, and no error handling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a code review assistant. Review code for issues and suggest improvements.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RISEN prompt (abbreviated):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Role
You are a senior AWS security engineer specializing in serverless
application security.

# Instructions
Review the provided code for security vulnerabilities, performance
issues, and AWS best practice violations. Prioritize findings by
severity and provide fix recommendations with corrected code.

# Steps
1. Identify the AWS services and patterns in use.
2. Check for security issues: injection, overly permissive IAM,
   hardcoded secrets, missing input validation.
3. Check for performance issues: cold start impact, unnecessary
   SDK client instantiation.
4. Check for reliability issues: missing error handling, no retries.
5. For each finding, provide severity, the problematic code,
   and a corrected snippet.

# Expectation
Structured review organized by severity. Each finding includes:
severity level, description, problematic code, corrected code.
End with a summary count.

# Narrowing
- Focus on production impact. Ignore style preferences.
- Do not suggest rewriting the entire function or switching runtimes.
- Limit the review to security, performance, and reliability.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both prompts catch the SSN exposure. But look at how the output differs.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;basic prompt&lt;/strong&gt; opens with emoji-coded sections and mixes severity with style:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Critical Issues 🔴
### 1. **Security Vulnerability - Sensitive Data Exposure**
...
## Medium Priority Issues 🟠
### 7. **Type Safety**
- `event: any` loses type safety
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also generates a full rewrite of the function (which the reviewer didn't ask for) and ends with a brief summary.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;RISEN prompt&lt;/strong&gt; produces a consistent structure: every finding follows the same format (Severity, Description, Problematic Code, Corrected Code) and ends with a summary table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Severity     | Count | Issues                                     |
|--------------|-------|--------------------------------------------|
| **Critical** | 2     | NoSQL injection, PII exposure (SSN)        |
| **High**     | 3     | Missing auth, permissive CORS, no errors   |
| **Medium**   | 3     | Cold start, input validation, null checks  |
| **Low**      | 2     | TypeScript any type, missing headers       |
| **Total**    | **10**|                                            |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Narrowing constraint "do not suggest rewriting the entire function" kept the RISEN response focused on targeted fixes. The basic prompt had no such guardrail and generated a complete replacement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;li&gt;AWS credentials configured for Amazon Bedrock access&lt;/li&gt;
&lt;li&gt;Python 3.10+ and &lt;code&gt;uvx&lt;/code&gt; (for the architecture scenario's &lt;a href="https://docs.aws.amazon.com/aws-mcp/latest/userguide/what-is-mcp-server.html" rel="noopener noreferrer"&gt;AWS MCP server&lt;/a&gt; integration)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gunnargrosch/risen-prompt-demo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;risen-prompt-demo
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run a scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run code-review
npm run incident
npm run architecture
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each scenario runs the basic prompt first, then the RISEN prompt, so you can see the difference in your terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your Own RISEN Prompts
&lt;/h2&gt;

&lt;p&gt;A few things I've noticed that make RISEN prompts more effective:&lt;/p&gt;

&lt;h3&gt;
  
  
  Role is more than a job title
&lt;/h3&gt;

&lt;p&gt;"You are a code reviewer" gives the model a vague persona. "You are a senior AWS security engineer specializing in serverless application security" tells it what lens to apply. The more specific the role, the more the model draws on relevant knowledge. Include years of experience, domain expertise, and the specific technology stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get the step granularity right
&lt;/h3&gt;

&lt;p&gt;Too few steps and the model skips reasoning. Too many and it gets rigid. Three to seven steps tends to work. Each step should represent a distinct phase, not a sub-task. If you find yourself writing "2a, 2b, 2c," that's one step with internal detail, not three steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Make Narrowing specific
&lt;/h3&gt;

&lt;p&gt;The most common mistake is forgetting Narrowing entirely. The second most common is making it too vague. "Keep it focused" isn't a constraint. "Do not suggest services in preview or limited availability" is. Write constraints that you could objectively check against the output.&lt;/p&gt;
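&lt;p&gt;"Objectively checkable" also means you can lint the output for violations after the fact. Here's a sketch of that idea; the &lt;code&gt;checkNarrowing&lt;/code&gt; helper and the single rule are made-up examples, not a complete policy:&lt;br&gt;&lt;/p&gt;

```typescript
// Sketch: encode each objectively checkable Narrowing constraint as a
// predicate over the model's output, then report which ones it violated.
interface NarrowingRule {
  description: string;
  violatedBy: (output: string) => boolean;
}

// Example rule set; a real one would mirror your prompt's Narrowing section.
const rules: NarrowingRule[] = [
  {
    description: 'Do not suggest services in preview or limited availability',
    violatedBy: (o) => /\b(preview|limited availability)\b/i.test(o),
  },
];

// Returns the descriptions of every rule the output violates.
function checkNarrowing(output: string, ruleSet: NarrowingRule[]): string[] {
  return ruleSet.filter((r) => r.violatedBy(output)).map((r) => r.description);
}
```

&lt;p&gt;If a constraint can't be expressed as a check like this, that's usually a sign it's too vague to steer the model either.&lt;/p&gt;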

&lt;h3&gt;
  
  
  Don't skip Expectation for agents
&lt;/h3&gt;

&lt;p&gt;For single-use prompts, Expectation is nice to have. For agents whose output feeds into other agents or structured workflows, it's required. Specify sections, ordering, format (bullet points, tables, code blocks). If you skip one section, don't skip this one.&lt;/p&gt;

&lt;p&gt;Your first RISEN prompt won't be your last. Run it against a few representative inputs and check the output against your Expectation section. If the structure is right but the content is off, adjust the Role or Steps. If the agent keeps going out of scope, tighten the Narrowing. If the output format is inconsistent, make the Expectation more specific. The framework gives you five independent levers to tune.&lt;/p&gt;
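&lt;p&gt;If you manage prompts in code, one way to keep those levers independent is to store the five sections as separate fields and assemble the prompt at run time. This is a sketch; the &lt;code&gt;buildRisenPrompt&lt;/code&gt; helper and its field names are illustrative, not part of the demo repository:&lt;br&gt;&lt;/p&gt;

```typescript
// Keep the five RISEN sections as separate fields so each lever can be
// tuned on its own, then assemble the system prompt at run time.
interface RisenPrompt {
  role: string;
  instructions: string;
  steps: string[];
  expectation: string;
  narrowing: string[];
}

// Renders the sections in RISEN order, numbering Steps and bulleting Narrowing.
function buildRisenPrompt(p: RisenPrompt): string {
  return [
    `# Role\n${p.role}`,
    `# Instructions\n${p.instructions}`,
    `# Steps\n${p.steps.map((s, i) => `${i + 1}. ${s}`).join('\n')}`,
    `# Expectation\n${p.expectation}`,
    `# Narrowing\n${p.narrowing.map((n) => `- ${n}`).join('\n')}`,
  ].join('\n\n');
}
```

&lt;p&gt;Because each section is its own field, tightening Narrowing or rewriting Steps becomes a one-field change instead of string surgery on a monolithic prompt.&lt;/p&gt;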

&lt;h2&gt;
  
  
  Other Frameworks Worth Knowing
&lt;/h2&gt;

&lt;p&gt;RISEN isn't the only structured approach. Anthropic and OpenAI both publish recommended prompt structures for their models that cover similar ground: role, instructions, output format, examples, and constraints. If your agent uses tools extensively, RISE-M extends RISEN with a sixth component, Methods, which covers when and how to use each tool. The architecture scenario above is a lightweight version of this: the Steps and Narrowing sections include tool usage guidance ("verify current service limits" in Steps, "only to verify specific facts" in Narrowing). If your tool-specific constraints keep growing, a dedicated Methods section may be cleaner.&lt;/p&gt;

&lt;p&gt;The frameworks overlap. The value isn't in picking the "right" one. It's in moving from an unstructured one-liner to any systematic approach that covers role, task, process, format, and constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Here
&lt;/h2&gt;

&lt;p&gt;If you want to try RISEN on your next agent, here's a blank template you can copy and fill in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Role
You are a [job title/expertise] specializing in [domain]. You have
[years of experience] with [specific technologies/tools].

# Instructions
[Core task in 1-2 sentences. What should the agent accomplish?]

# Steps
1. [First thing the agent should do]
2. [Second thing]
3. [Continue until the workflow is complete]

# Expectation
[Output format: sections, tables, code blocks, bullet points.
Specify the structure the response should follow.]

# Narrowing
- [What to exclude or ignore]
- [Scope boundaries]
- [Constraints on format, length, or approach]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fill in Role first (it shapes everything else), then Instructions, then Steps. Expectation and Narrowing come last because they depend on knowing what the agent is doing and how.&lt;/p&gt;

&lt;h3&gt;
  
  
  Additional Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/risen-prompt-demo" rel="noopener noreferrer"&gt;RISEN Prompt Demo Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/nullchecktv/swiftship-demo/" rel="noopener noreferrer"&gt;SwiftShip Multi-Agent Demo&lt;/a&gt; (RISEN in a production-style multi-agent system)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/strands-agents" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts" rel="noopener noreferrer"&gt;Anthropic System Prompt Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/cookbook/examples/gpt4-1_prompting_guide" rel="noopener noreferrer"&gt;OpenAI GPT-4.1 Prompting Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2406.06608" rel="noopener noreferrer"&gt;The Prompt Report (Zhou et al.)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic: Building Effective Agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have you applied a structured framework to your agent system prompts? What changed in the output when you did? I'd like to hear about it in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>aws</category>
      <category>llm</category>
    </item>
    <item>
      <title>Streaming Bedrock Responses Through API Gateway and Lambda</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Wed, 25 Feb 2026 09:05:13 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/streaming-bedrock-responses-through-api-gateway-and-lambda-2lj9</link>
      <guid>https://dev.to/gunnargrosch/streaming-bedrock-responses-through-api-gateway-and-lambda-2lj9</guid>
      <description>&lt;p&gt;If you're building applications that call Amazon Bedrock through API Gateway and Lambda, your users are probably staring at a spinner. The model generates tokens progressively, but the standard Lambda integration buffers the entire response before sending anything back. For a typical LLM response, that's 8-10 seconds of nothing, then everything at once.&lt;/p&gt;

&lt;p&gt;API Gateway &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/rest-api-streaming.html" rel="noopener noreferrer"&gt;response streaming&lt;/a&gt; fixes this. Tokens flow from Bedrock through Lambda and API Gateway to the client as they're generated. The first token arrives in ~500ms. The total generation time stays the same. The difference is entirely in when the user starts seeing output. Beyond latency, streaming also lifts two constraints that matter for larger workloads: the 10 MB response payload limit and the 29-second default integration timeout. Streaming responses can run for up to 15 minutes and exceed 10 MB.&lt;/p&gt;

&lt;p&gt;I put together a &lt;a href="https://github.com/gunnargrosch/apigw-lambda-streaming" rel="noopener noreferrer"&gt;demo&lt;/a&gt; that runs both approaches side by side so you can see the difference for yourself. Two Lambda functions, same model, same prompt, same API Gateway. One streams, one buffers. The streaming panel fills up token by token while the buffered panel sits there waiting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Streaming&lt;/th&gt;
&lt;th&gt;Buffered&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to first byte&lt;/td&gt;
&lt;td&gt;~500ms&lt;/td&gt;
&lt;td&gt;~8-10s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total time&lt;/td&gt;
&lt;td&gt;~8-10s&lt;/td&gt;
&lt;td&gt;~8-10s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User experience&lt;/td&gt;
&lt;td&gt;Progressive, real-time&lt;/td&gt;
&lt;td&gt;Waiting, then all at once&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Streaming requires changes in two places: the API Gateway configuration and the Lambda handler. Neither is complicated on its own. The part that trips people up is getting them to work together.&lt;/p&gt;

&lt;h3&gt;
  
  
  The API Gateway side
&lt;/h3&gt;

&lt;p&gt;In the OpenAPI spec, the streaming endpoint needs two things that the standard endpoint doesn't:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A different Lambda invocation URI path: &lt;code&gt;/response-streaming-invocations&lt;/code&gt; instead of &lt;code&gt;/invocations&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;responseTransferMode: STREAM&lt;/code&gt; property on the integration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's what that looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;/streaming&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;x-amazon-apigateway-integration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_PROXY&lt;/span&gt;
      &lt;span class="na"&gt;httpMethod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;
      &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Fn::Sub&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${StreamingFunction.Arn}/response-streaming-invocations"&lt;/span&gt;
      &lt;span class="na"&gt;responseTransferMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STREAM&lt;/span&gt;
      &lt;span class="na"&gt;passthroughBehavior&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;when_no_match&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare that to the standard endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;/non-streaming&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;x-amazon-apigateway-integration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_PROXY&lt;/span&gt;
      &lt;span class="na"&gt;httpMethod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;
      &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Fn::Sub&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${NonStreamingFunction.Arn}/invocations"&lt;/span&gt;
      &lt;span class="na"&gt;passthroughBehavior&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;when_no_match&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API version in the URI path changes from &lt;code&gt;2015-03-31&lt;/code&gt; to &lt;code&gt;2021-11-15&lt;/code&gt;, and &lt;code&gt;response-streaming-invocations&lt;/code&gt; replaces &lt;code&gt;invocations&lt;/code&gt;. Under the hood, this tells API Gateway to use Lambda's &lt;code&gt;InvokeWithResponseStream&lt;/code&gt; API instead of the standard &lt;code&gt;Invoke&lt;/code&gt;. Miss either of those details and your streaming endpoint silently falls back to buffered behavior. No error, just a longer wait.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Lambda side
&lt;/h3&gt;

&lt;p&gt;The streaming handler uses &lt;code&gt;awslambda.streamifyResponse()&lt;/code&gt; to wrap the handler function. This gives you a writable &lt;code&gt;HttpResponseStream&lt;/code&gt; instead of requiring you to return a response object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BedrockRuntimeClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ConverseStreamCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-sdk/client-bedrock-runtime&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BedrockRuntimeClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamingHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NodeJS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WritableStream&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;httpResponseStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;awslambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HttpResponseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text/event-stream&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;no-cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Connection&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;keep-alive&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Access-Control-Allow-Origin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ConverseStreamCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BEDROCK_MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;inferenceConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentBlockDelta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentBlockDelta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;httpResponseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;\n\n`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;httpResponseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;data: [DONE]&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;httpResponseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;awslambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamingHandler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each token gets written as a Server-Sent Event the moment Bedrock generates it. The &lt;code&gt;data: [DONE]&lt;/code&gt; sentinel tells the client the stream is complete.&lt;/p&gt;
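&lt;p&gt;On the client side, you parse those events back out as they arrive. Here's a framework-agnostic sketch that assumes the exact format above (&lt;code&gt;data:&lt;/code&gt; lines carrying a JSON &lt;code&gt;token&lt;/code&gt; field, terminated by &lt;code&gt;data: [DONE]&lt;/code&gt;). Feed it decoded text chunks from a &lt;code&gt;fetch&lt;/code&gt; response body and it handles events split across chunk boundaries:&lt;br&gt;&lt;/p&gt;

```typescript
// Parses one decoded chunk of the SSE stream produced by the handler above.
// Invokes onToken for every complete `data: {"token": ...}` event, returns
// '' once `data: [DONE]` is seen, and otherwise returns any trailing partial
// event so the caller can prepend it to the next chunk.
function parseSseChunk(buffer: string, onToken: (token: string) => void): string {
  // SSE events are separated by a blank line; the last split element may be
  // an incomplete event still in flight.
  const events = buffer.split('\n\n');
  const remainder = events.pop() ?? '';
  for (const event of events) {
    for (const line of event.split('\n')) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice('data: '.length);
      if (payload === '[DONE]') return '';
      onToken(JSON.parse(payload).token);
    }
  }
  return remainder;
}
```

&lt;p&gt;Buffering the remainder matters: network reads don't align with event boundaries, so a single event can arrive split across two chunks.&lt;/p&gt;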

&lt;p&gt;The &lt;code&gt;HttpResponseStream.from()&lt;/code&gt; call is doing something important behind the scenes: it writes the response metadata (status code, headers) as a JSON object followed by an 8-null-byte delimiter that API Gateway uses to separate metadata from the response body. If you're not using &lt;code&gt;HttpResponseStream.from()&lt;/code&gt;, you're responsible for writing that delimiter yourself.&lt;/p&gt;
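&lt;p&gt;For illustration only, the prelude has roughly this shape: the metadata as JSON, then eight &lt;code&gt;0x00&lt;/code&gt; bytes, then the body. In a real handler you should let &lt;code&gt;HttpResponseStream.from()&lt;/code&gt; produce it, but spelling it out makes clear what the delimiter separates:&lt;br&gt;&lt;/p&gt;

```typescript
// Illustrative sketch of the prelude that HttpResponseStream.from() writes
// before the streamed body: metadata JSON followed by an 8-null-byte
// delimiter. Don't hand-roll this in production code.
function buildStreamPrelude(metadata: {
  statusCode: number;
  headers: { [name: string]: string };
}): Buffer {
  const delimiter = Buffer.alloc(8); // eight bytes, all zeroed
  return Buffer.concat([Buffer.from(JSON.stringify(metadata)), delimiter]);
}
```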

&lt;p&gt;The buffered handler does the same Bedrock call but accumulates everything in memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;fullResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentBlockDelta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;fullResponse&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentBlockDelta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;corsHeaders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fullResponse&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same model, same prompt, same token generation speed. The only difference is when the client sees the output.&lt;/p&gt;

&lt;h3&gt;
  
  
  The &lt;code&gt;awslambda&lt;/code&gt; global
&lt;/h3&gt;

&lt;p&gt;One thing worth noting: &lt;code&gt;awslambda&lt;/code&gt; is a global object injected by the Lambda runtime. It's not in any npm package, and &lt;code&gt;@types/aws-lambda&lt;/code&gt; doesn't include it either. In TypeScript, you need a type declaration for it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;declare&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;awslambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;HttpResponseStream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="na"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NodeJS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WritableStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;NodeJS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WritableStream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nl"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NodeJS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WritableStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the kind of detail that's easy to miss. Without the declaration, TypeScript fails the build with &lt;code&gt;Cannot find name 'awslambda'&lt;/code&gt; as soon as your handler references the global.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy and Try It
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account with &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html" rel="noopener noreferrer"&gt;Bedrock model access&lt;/a&gt; enabled&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" rel="noopener noreferrer"&gt;AWS CLI&lt;/a&gt; configured with credentials&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html" rel="noopener noreferrer"&gt;AWS SAM CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clone and deploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gunnargrosch/apigw-lambda-streaming.git
&lt;span class="nb"&gt;cd &lt;/span&gt;apigw-lambda-streaming
&lt;span class="nb"&gt;cd &lt;/span&gt;functions &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ..
sam build
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SAM outputs the API Gateway base URL when deployment completes. Copy it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The interactive demo
&lt;/h3&gt;

&lt;p&gt;Open &lt;code&gt;demo.html&lt;/code&gt; in a browser and paste your API Gateway URL. Click &lt;strong&gt;Run Comparison&lt;/strong&gt; (or press &lt;strong&gt;Cmd+Enter&lt;/strong&gt; on macOS, &lt;strong&gt;Ctrl+Enter&lt;/strong&gt; on Windows/Linux). Both endpoints fire simultaneously. The streaming panel fills up token by token while the buffered panel shows a spinner.&lt;/p&gt;

&lt;p&gt;The results panel at the bottom shows Time to First Byte for both approaches and the speedup factor. For longer responses (a few paragraphs or more), streaming TTFB is typically 10-20x faster than buffered. Shorter responses show a smaller gap since the buffered endpoint finishes sooner.&lt;/p&gt;
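&lt;p&gt;The measurement itself is straightforward to reproduce. Here's a minimal sketch of how a client can consume the SSE stream and time the first byte; the &lt;code&gt;data:&lt;/code&gt; line format and the &lt;code&gt;/streaming&lt;/code&gt; path follow the demo, but treat the details as assumptions rather than the demo's exact source:&lt;/p&gt;

```typescript
// Pull the payloads out of Server-Sent Events text: each event is a
// "data: ..." line, and events are separated by blank lines.
function extractSseData(chunk: string): string[] {
  return chunk
    .split('\n')
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice('data: '.length))
}

// Hypothetical client loop: TTFB is the gap between sending the request
// and decoding the first chunk. apiUrl is a placeholder for your API URL.
async function streamTokens(
  apiUrl: string,
  prompt: string,
  onToken: (token: string) => void,
): Promise<number> {
  const start = performance.now()
  const res = await fetch(`${apiUrl}/streaming`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  })
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  let ttfb = -1
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    if (ttfb < 0) ttfb = performance.now() - start
    for (const token of extractSseData(decoder.decode(value, { stream: true }))) {
      onToken(token)
    }
  }
  return ttfb
}
```

&lt;p&gt;The same code runs in the browser and in Node.js 18+, since both provide &lt;code&gt;fetch&lt;/code&gt; and readable-stream readers. Note the parser assumes each chunk contains whole events; a production client would buffer across chunk boundaries.&lt;/p&gt;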

&lt;h3&gt;
  
  
  Testing with curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Streaming: tokens appear progressively as SSE events&lt;/span&gt;
curl &lt;span class="nt"&gt;-N&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://&amp;lt;api-id&amp;gt;.execute-api.&amp;lt;region&amp;gt;.amazonaws.com/demo/streaming &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Write a short story about serverless computing"}'&lt;/span&gt;

&lt;span class="c"&gt;# Buffered: waits for the complete response&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://&amp;lt;api-id&amp;gt;.execute-api.&amp;lt;region&amp;gt;.amazonaws.com/demo/non-streaming &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Write a short story about serverless computing"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-N&lt;/code&gt; flag on the streaming curl disables output buffering so you see tokens as they arrive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure
&lt;/h2&gt;

&lt;p&gt;The SAM template defines two Lambda functions sharing an execution role with &lt;code&gt;bedrock:InvokeModelWithResponseStream&lt;/code&gt; and &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; permissions. Both functions use Node.js 20.x, 256 MB memory, and a 120-second timeout. The Bedrock model ID is configurable via a SAM parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;BedrockModelId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
    &lt;span class="na"&gt;Default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us.anthropic.claude-sonnet-4-5-20250929-v1:0&lt;/span&gt;
    &lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bedrock model ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Override it during deployment to use a different model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam deploy &lt;span class="nt"&gt;--parameter-overrides&lt;/span&gt; &lt;span class="nv"&gt;BedrockModelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us.anthropic.claude-haiku-4-5-20251001-v1:0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API Gateway is configured with an inline OpenAPI spec via &lt;code&gt;AWS::Include&lt;/code&gt;, which keeps the streaming-specific integration properties in a separate &lt;code&gt;openapi.yaml&lt;/code&gt; file rather than buried in the SAM template.&lt;/p&gt;
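&lt;p&gt;The wiring looks roughly like this (resource names here are placeholders; the repo's template is the authoritative version). &lt;code&gt;sam build&lt;/code&gt; resolves the local &lt;code&gt;Location&lt;/code&gt; and uploads the spec for you:&lt;/p&gt;

```yaml
StreamingApi:
  Type: AWS::Serverless::Api
  Properties:
    StageName: demo
    DefinitionBody:
      Fn::Transform:
        Name: AWS::Include
        Parameters:
          Location: openapi.yaml
```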

&lt;h2&gt;
  
  
  Things to Know
&lt;/h2&gt;

&lt;p&gt;A few operational details worth being aware of before you ship this to production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idle timeouts&lt;/strong&gt;: Regional and Private API endpoints have a 5-minute idle timeout on streaming responses. Edge-optimized endpoints have a 30-second idle timeout. For LLM token streams this is rarely an issue since tokens arrive continuously, but if you're calling a slower model or one that pauses during longer reasoning chains, keep this in mind.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth throttling&lt;/strong&gt;: The first 10 MB of a streaming response has no bandwidth restrictions. After that, data is throttled to 2 MB/s. Not an issue for LLM token streams, but worth knowing if you're streaming larger payloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Each 10 MB of streamed response data (rounded up to the nearest 10 MB) is billed as a single API request. For typical LLM responses, this means one request per call, same as buffered.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not supported with streaming&lt;/strong&gt;: VTL response transformation, integration response caching, and content encoding. If you rely on any of these, you'll need to handle them differently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: API Gateway adds three new access log variables for streaming: &lt;code&gt;$context.integration.responseTransferMode&lt;/code&gt; (BUFFERED or STREAMED), &lt;code&gt;$context.integration.timeToAllHeaders&lt;/code&gt;, and &lt;code&gt;$context.integration.timeToFirstContent&lt;/code&gt;. Useful for monitoring TTFB at the API Gateway level.&lt;/li&gt;
&lt;/ul&gt;
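&lt;p&gt;To put those variables to use, an access log format along these lines works on the SAM API resource (the log group resource name is a placeholder, and the exact set of fields is up to you):&lt;/p&gt;

```yaml
AccessLogSetting:
  DestinationArn: !GetAtt ApiAccessLogs.Arn
  Format: >-
    {"requestId":"$context.requestId",
    "transferMode":"$context.integration.responseTransferMode",
    "timeToAllHeaders":"$context.integration.timeToAllHeaders",
    "timeToFirstContent":"$context.integration.timeToFirstContent"}
```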

&lt;h2&gt;
  
  
  When to Use This
&lt;/h2&gt;

&lt;p&gt;Response streaming makes the biggest difference for LLM applications where users are waiting for generated text: chatbots, content generation, code assistants, summarization tools. The total time doesn't change, but the perceived latency drops significantly.&lt;/p&gt;

&lt;p&gt;This demo focuses on Bedrock, but response streaming works with any Lambda or HTTP proxy integration. A few other use cases where it helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Large file downloads&lt;/strong&gt;: Streaming lets responses exceed the standard 10 MB payload limit, so you can serve large datasets, reports, or media files directly through API Gateway without routing through S3 pre-signed URLs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-running operations with progress updates&lt;/strong&gt;: An endpoint that runs a multi-step workflow can stream progress events back to the client as each step completes, instead of forcing the client to poll a separate status endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web and mobile TTFB optimization&lt;/strong&gt;: Any API response that takes more than a second or two to fully compute benefits from streaming partial results early. Server-side rendering, search results, or aggregation queries can send the first chunk while the backend continues processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A few situations where streaming matters less:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch processing&lt;/strong&gt;: No user is watching. Buffer the response and process it when it's complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short responses&lt;/strong&gt;: If the backend returns in under a second, streaming adds complexity without a noticeable UX improvement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured output you need to parse as a whole&lt;/strong&gt;: If your application needs the complete JSON response before it can do anything useful, streaming partial data doesn't help.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Clean Up
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Additional Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/apigw-lambda-streaming" rel="noopener noreferrer"&gt;Demo Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" rel="noopener noreferrer"&gt;Building Responsive APIs with Amazon API Gateway Response Streaming&lt;/a&gt; (AWS Compute Blog)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/rest-api-streaming.html" rel="noopener noreferrer"&gt;API Gateway Response Streaming Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;Lambda Response Streaming Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html" rel="noopener noreferrer"&gt;Amazon Bedrock Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html" rel="noopener noreferrer"&gt;Bedrock Converse Stream API Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have you added response streaming to your Bedrock applications? I'd like to hear about your experience in the comments.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>serverless</category>
      <category>api</category>
    </item>
    <item>
      <title>Building AI Agents in TypeScript with the Strands Agents SDK</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Mon, 23 Feb 2026 17:59:40 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/building-ai-agents-in-typescript-with-the-strands-agents-sdk-1kom</link>
      <guid>https://dev.to/gunnargrosch/building-ai-agents-in-typescript-with-the-strands-agents-sdk-1kom</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/strands-agents" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; has been Python-only since launch. If your application code is TypeScript, that meant either switching languages for the agent layer or building your own tool-use loop from scratch. Neither is great.&lt;/p&gt;

&lt;p&gt;The SDK now has &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/typescript-strands-agents-preview/" rel="noopener noreferrer"&gt;TypeScript support&lt;/a&gt;, currently in preview. Breaking changes are possible and not all Python SDK features are available yet. But the core agent loop, tool use, and streaming all work, and the patterns feel natural if you're already writing TypeScript. I put together a &lt;a href="https://github.com/gunnargrosch/strands-agents-ts-demo" rel="noopener noreferrer"&gt;demo repo&lt;/a&gt; with five standalone examples that progressively build from a minimal agent to multi-turn conversation. Each one is a single file you can run directly. No build step, no boilerplate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;You'll need Node.js 20+ and AWS credentials configured for Amazon Bedrock access. The TypeScript SDK supports &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/" rel="noopener noreferrer"&gt;Bedrock, OpenAI, Gemini, and custom providers&lt;/a&gt;. These examples use Bedrock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gunnargrosch/strands-agents-ts-demo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;strands-agents-ts-demo
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All examples use &lt;code&gt;tsx&lt;/code&gt; for direct TypeScript execution. No compilation needed.&lt;/p&gt;

&lt;p&gt;If you'd rather follow along in your own project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm init &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm pkg &lt;span class="nb"&gt;set type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;module
npm &lt;span class="nb"&gt;install&lt;/span&gt; @strands-agents/sdk zod
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--save-dev&lt;/span&gt; @types/node typescript tsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example 1: The Simplest Agent
&lt;/h2&gt;

&lt;p&gt;How little code does it take to get an agent running?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;BedrockModel&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@strands-agents/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful assistant that likes to be edgy in a funny way.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Which city is best, Gothenburg or Stockholm?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Prompt: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. &lt;code&gt;BedrockModel()&lt;/code&gt; with no arguments defaults to Claude Sonnet 4.5. &lt;code&gt;printer: false&lt;/code&gt; disables the SDK's built-in console output, which clutters your terminal when you want to control the formatting yourself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run simple
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or pass your own prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run simple &lt;span class="s2"&gt;"What's the meaning of life?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example 2: Adding a Tool
&lt;/h2&gt;

&lt;p&gt;A bare agent is just a chat wrapper. Tools are where it gets useful:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@strands-agents/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;calculator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;calculate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Perform basic math operations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;add&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;subtract&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;multiply&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;divide&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="na"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;add&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;
      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;subtract&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;
      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;multiply&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;
      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;divide&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Division by zero&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful math assistant. Always explain how you calculated the result.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent decides when to call the tool based on the prompt. Ask it "What is 42 times 17?" and it invokes the calculator, then explains the result.&lt;/p&gt;

&lt;p&gt;The Zod part matters more than it looks. If the model sends parameters that fail validation, the SDK catches the error and sends it back to the model as a tool result, giving it a chance to correct the input on the next loop iteration. If you've built custom agent loops before, you know how much code goes into handling bad tool inputs and retrying. Here that's just the agent loop doing its job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run calculator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example 3: Streaming and Model Configuration
&lt;/h2&gt;

&lt;p&gt;The first two examples wait for the full response before printing anything. That's fine for short answers, but for anything longer you want tokens streaming as they arrive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;BedrockModel&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@strands-agents/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="c1"&gt;// modelId: 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',&lt;/span&gt;
  &lt;span class="c1"&gt;// region: 'us-east-1',&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful assistant. Keep responses concise.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Explain how a CPU works in five sentences.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Prompt: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;modelContentBlockDeltaEvent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;textDelta&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;agent.stream()&lt;/code&gt; returns an async iterable of every event in the agent loop: text deltas, tool call events, tool results, metadata. We filter for &lt;code&gt;textDelta&lt;/code&gt; because we only want the model's text on stdout, not the tool orchestration noise.&lt;/p&gt;
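&lt;p&gt;The filtering logic is easy to test in isolation. Here's a sketch with simplified stand-in event shapes (the real SDK events carry more fields than this):&lt;/p&gt;

```typescript
// Simplified stand-ins for the SDK's stream events; the real events carry more fields.
type StreamEvent =
  | { type: 'modelContentBlockDeltaEvent'; delta: { type: 'textDelta'; text: string } | { type: 'toolUseDelta' } }
  | { type: 'toolResultEvent' }

// Same filter as the loop above: keep only the model's text, drop everything else.
function collectText(events: StreamEvent[]): string {
  let out = ''
  for (const event of events) {
    if (event.type === 'modelContentBlockDeltaEvent') {
      if (event.delta.type === 'textDelta') {
        out += event.delta.text
      }
    }
  }
  return out
}
```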

&lt;p&gt;The commented-out lines show how to pin a specific model and region. The &lt;code&gt;us.&lt;/code&gt; prefix means cross-region inference, routing requests across US regions. Worth considering for agent workloads: multi-step loops that make several model calls in sequence benefit from the distributed capacity.&lt;br&gt;
&lt;/p&gt;
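&lt;p&gt;The naming convention itself is mechanical: a cross-region inference profile ID is the base model ID with a region-group prefix. A hypothetical helper, purely to make that explicit (the model ID in the comment is an example, not a recommendation):&lt;/p&gt;

```typescript
// Hypothetical helper: build a cross-region inference profile ID from a base model ID.
// Bedrock's convention prefixes a region group ('us.', 'eu.', 'apac.') to the model ID.
function crossRegionModelId(baseModelId: string, regionGroup: 'us' | 'eu' | 'apac'): string {
  return `${regionGroup}.${baseModelId}`
}

// crossRegionModelId('anthropic.claude-sonnet-4-20250514-v1:0', 'us')
// → 'us.anthropic.claude-sonnet-4-20250514-v1:0'
```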

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run streaming
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example 4: Multi-Tool Orchestration
&lt;/h2&gt;

&lt;p&gt;This is where agents start to feel like agents rather than fancy API wrappers. The researcher example has three tools: &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;compare&lt;/code&gt;, and &lt;code&gt;summarize&lt;/code&gt;. The agent decides which to call and in what order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Search the knowledge base for information about a topic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The search query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;knowledgeBase&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;found&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;info&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;found&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No results found.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;compare&lt;/code&gt; tool structures side-by-side comparisons, and &lt;code&gt;summarize&lt;/code&gt; creates formatted output. Ask the agent to compare TypeScript and Rust, and here's what actually happens:&lt;br&gt;
&lt;/p&gt;
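&lt;p&gt;Both callbacks follow the same shape as &lt;code&gt;search&lt;/code&gt;: plain functions behind a schema. A hypothetical &lt;code&gt;summarize&lt;/code&gt;-style callback body, trimmed of the &lt;code&gt;tool()&lt;/code&gt; wrapper (the input shape here mirrors what &lt;code&gt;search&lt;/code&gt; returns; the real example may differ):&lt;/p&gt;

```typescript
// Hypothetical summarize-style callback: pure formatting, no SDK required.
// The input shape mirrors the search tool's results above.
function summarizeResults(results: { topic: string; info: string }[]): string {
  const lines = results.map((r) => `- ${r.topic}: ${r.info}`)
  return `Summary (${results.length} topics):\n${lines.join('\n')}`
}
```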

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt → Agent
  ├─ Agent calls search("TypeScript") → gets results
  ├─ Agent calls search("Rust") → gets results
  ├─ Agent calls compare(typescript_data, rust_data) → structured comparison
  ├─ Agent calls summarize(comparison) → formatted output
  └─ Agent returns final response with the summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the part worth paying attention to. The agent isn't following a hardcoded sequence. It's making decisions at each step based on what the previous tool returned. It could search both topics first, then compare. Or it could search one, realize it needs more context, and search again before moving on. The model drives the orchestration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run researcher
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The knowledge base here is simulated, but the pattern is real. Replace it with API calls, database queries, or file operations and you have a working research agent.&lt;/p&gt;
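&lt;p&gt;One clean way to make that swap is to inject the data source, so the callback shape stays identical whether it reads an in-memory map, an API, or a database. A sketch (the &lt;code&gt;Lookup&lt;/code&gt; type is my own, not part of the SDK, and a real lookup would likely be async):&lt;/p&gt;

```typescript
// Illustrative refactor: inject the data source behind a small interface so the
// tool callback doesn't change when the backing store does. Lookup is hypothetical.
type Lookup = (key: string) => { found: boolean; info: string }

function makeSearchCallback(lookup: Lookup) {
  return ({ query }: { query: string }) => {
    const key = query.toLowerCase()
    // Same return shape as the in-memory version: { found, topic, info }.
    return { ...lookup(key), topic: key }
  }
}
```

&lt;p&gt;The &lt;code&gt;tool()&lt;/code&gt; definition doesn't change; only the injected &lt;code&gt;lookup&lt;/code&gt; does.&lt;/p&gt;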

&lt;h2&gt;
  
  
  Example 5: Multi-Turn Conversation
&lt;/h2&gt;

&lt;p&gt;All the previous examples are single-shot: one prompt, one response. Real applications usually need back-and-forth. Call &lt;code&gt;agent.invoke()&lt;/code&gt; multiple times on the same instance and it maintains the conversation history using a sliding window manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createInterface&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Multi-turn chat. Conversation context is preserved across messages.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Type "exit" to quit.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`\nAgent: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;\n`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each call adds to the message history. The agent remembers what you discussed and can reference earlier context. This is the foundation for chatbots, interactive assistants, or any agent that needs ongoing interaction.&lt;br&gt;
&lt;/p&gt;
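&lt;p&gt;To make the sliding-window idea concrete, here's a minimal stand-alone version of the concept. This is not the SDK's implementation, just the core mechanism: keep only the most recent N messages and drop the oldest once the window is exceeded.&lt;/p&gt;

```typescript
// Conceptual sketch only, not the SDK's manager: a sliding window over message history.
type Message = { role: 'user' | 'assistant'; content: string }

class SlidingWindowHistory {
  private messages: Message[] = []

  constructor(private readonly windowSize: number) {}

  add(message: Message) {
    this.messages.push(message)
    if (this.messages.length > this.windowSize) {
      // Drop the oldest messages beyond the window.
      this.messages = this.messages.slice(-this.windowSize)
    }
  }

  get history(): Message[] {
    return [...this.messages]
  }
}
```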

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Not Covered
&lt;/h2&gt;

&lt;p&gt;The TypeScript SDK is still catching up to the Python SDK. A few things that aren't available yet: structured output, multi-agent patterns (graph, swarm, workflow orchestration), callback handlers for streaming (TypeScript uses async iterators exclusively), module-based tool loading, and the observability/telemetry stack. The &lt;a href="https://strandsagents.com" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; tracks what's available in each language.&lt;/p&gt;

&lt;p&gt;What is here covers the core well: model invocation, tool use with type safety, streaming, and conversation management. For most agent use cases, that's the foundation you build on.&lt;/p&gt;

&lt;p&gt;These five examples all run locally against Bedrock. The next interesting question is what happens when you deploy an agent as a service: behind an API, on Lambda, handling concurrent requests with proper error handling and observability. That's where the patterns get more nuanced, and it's what I plan to cover next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/strands-agents-ts-demo" rel="noopener noreferrer"&gt;Strands Agents TypeScript Demo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://strandsagents.com" rel="noopener noreferrer"&gt;Strands Agents Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/strands-agents" rel="noopener noreferrer"&gt;Strands Agents GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/quickstart/typescript/" rel="noopener noreferrer"&gt;TypeScript SDK Quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/" rel="noopener noreferrer"&gt;Model Providers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you try any of these examples or build something with the SDK, I'd like to hear about it in the comments.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building the AWS Serverless Power for Kiro</title>
      <dc:creator>Gunnar Grosch</dc:creator>
      <pubDate>Sun, 22 Feb 2026 14:46:04 +0000</pubDate>
      <link>https://dev.to/gunnargrosch/building-the-aws-serverless-power-for-kiro-25f2</link>
      <guid>https://dev.to/gunnargrosch/building-the-aws-serverless-power-for-kiro-25f2</guid>
      <description>&lt;p&gt;Kiro can scaffold a Lambda function and wire up an API Gateway. But ask it to choose between Step Functions and Durable Functions for your workflow, or to configure a Kafka event source mapping with the right VPC setup and authentication, and you'll hit a wall. General knowledge gets you a working template. Practical experience gets you one that holds up in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kiro.dev/docs/powers/" rel="noopener noreferrer"&gt;Powers&lt;/a&gt;, introduced at re:Invent 2025, are Kiro's answer to this. A Power bundles an MCP server connection, best practices, and workflow guidance into a package that loads dynamically when relevant. Without framework context, agents guess at patterns and configurations. Powers give the agent instant access to specialized knowledge, but only when it's actually needed, avoiding the context bloat you get from loading everything upfront.&lt;/p&gt;

&lt;p&gt;I'd already &lt;a href="https://dev.to/gunnargrosch/turning-aws-serverless-experience-into-a-claude-code-plugin-2nha"&gt;encoded this serverless expertise as a Claude Code plugin&lt;/a&gt;. The &lt;a href="https://github.com/gunnargrosch/aws-serverless-kiro-power" rel="noopener noreferrer"&gt;AWS Serverless Kiro Power&lt;/a&gt; brings that same knowledge to Kiro. This post walks through what it covers, how it's built, and what I learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Power Covers
&lt;/h2&gt;

&lt;p&gt;The scope is the full serverless development lifecycle on AWS, backed by 25 tools from the &lt;a href="https://awslabs.github.io/mcp/servers/aws-serverless-mcp-server" rel="noopener noreferrer"&gt;AWS Serverless MCP Server&lt;/a&gt; and ten steering guides that provide decision-making context. Project initialization, building, local testing, deployment, web application hosting, event source mappings, security, and observability. Rather than listing every capability, here's a concrete example of the difference the Power makes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kafka ESM setup: without the Power vs. with it
&lt;/h3&gt;

&lt;p&gt;Tell Kiro "set up a Lambda function to process messages from my MSK cluster." Without the Power, you get a Lambda function with an MSK event source mapping. The basics are there, but the batch size is the default, there's no error handling for partial batch failures, the IAM policy is too broad, and the VPC configuration doesn't account for your cluster being in private subnets. It works until it doesn't.&lt;/p&gt;

&lt;p&gt;With the Power, the &lt;code&gt;esm_guidance&lt;/code&gt; tool asks about your cluster configuration before generating anything. The steering guides tell the agent to configure &lt;code&gt;BisectBatchOnFunctionError&lt;/code&gt;, set up a DLQ for failed messages, and choose a batch size based on your message throughput. &lt;code&gt;secure_esm_msk_policy&lt;/code&gt; generates a least-privilege IAM policy scoped to your specific cluster ARN instead of using wildcards. The VPC configuration uses private subnets with security group rules for your broker ports. &lt;code&gt;esm_optimize&lt;/code&gt; tunes the parallelization factor if you need higher throughput. The result handles the edge cases that only show up after you've run the setup in production for a while.&lt;/p&gt;

&lt;p&gt;That pattern repeats across the Power. Project setup through &lt;code&gt;sam_init&lt;/code&gt; uses the right template for your use case instead of the default. &lt;code&gt;deploy_webapp&lt;/code&gt; handles the full CloudFront and S3 setup for Lambda Web Adapter deployments. The observability guide knows which CloudWatch metrics to alarm on and what thresholds make sense. The troubleshooting guide maps symptoms to root causes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It's Built
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws-serverless-kiro-power/
├── POWER.md                           # Tool docs, best practices, troubleshooting
├── mcp.json                           # MCP server connection config
└── steering/                          # On-demand workflow guidance
    ├── getting-started.md             # Prerequisites and first-use walkthrough
    ├── sam-project-setup.md           # SAM initialization and workflow
    ├── cdk-project-setup.md           # CDK constructs, testing, pipelines
    ├── web-app-deployment.md          # Full-stack deployment patterns
    ├── event-sources.md               # Event source mapping configuration
    ├── event-driven-architecture.md   # EventBridge, Pipes, schema registry
    ├── orchestration-and-workflows.md # Step Functions, Durable Functions
    ├── observability.md               # Logging, tracing, metrics, dashboards
    ├── optimization.md                # Performance and cost tuning
    └── troubleshooting.md             # Symptom-based diagnosis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The split between &lt;code&gt;POWER.md&lt;/code&gt; and the steering files is deliberate. &lt;code&gt;POWER.md&lt;/code&gt; is always loaded. Its frontmatter defines the activation keywords that tell Kiro when to load the Power:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-serverless"&lt;/span&gt;
&lt;span class="na"&gt;displayName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Serverless"&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;deploy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;serverless&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;applications&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Lambda,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SAM,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;API&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Gateway,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;EventBridge,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Step&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Functions,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;event-driven&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;architectures"&lt;/span&gt;
&lt;span class="na"&gt;keywords&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serverless"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sam"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cdk"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;gateway"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployment"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloudformation"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event-driven"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;microservices"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;backend"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;app"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span 
class="s"&gt;sqs"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kafka"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deploy"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloudwatch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cold&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;start"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rest&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;api"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eventbridge"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;url"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;functions"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;durable&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;functions"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;machine"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gunnar&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Grosch"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below the frontmatter, &lt;code&gt;POWER.md&lt;/code&gt; contains tool documentation with parameter tables and quick-reference best practices. The steering files load on demand when the agent hits a specific workflow. This matters because agent context windows have limits. Loading all ten guides upfront would eat context budget on knowledge the agent might not need for the current task. On-demand loading means the Kafka ESM guide loads when you ask about Kafka, not when you're deploying a web app.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mcp.json&lt;/code&gt; connects to the AWS Serverless MCP Server with &lt;code&gt;--allow-write&lt;/code&gt; and &lt;code&gt;--allow-sensitive-data-access&lt;/code&gt; flags enabled. You can remove either flag if you want the agent to advise without modifying your AWS account.&lt;/p&gt;
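&lt;p&gt;For reference, that file is a standard MCP server entry. The sketch below shows the general shape; the server key and exact invocation are assumptions here, so verify them against the repo's actual &lt;code&gt;mcp.json&lt;/code&gt;:&lt;/p&gt;

```json
{
  "mcpServers": {
    "aws-serverless": {
      "command": "uvx",
      "args": [
        "awslabs.aws-serverless-mcp-server@latest",
        "--allow-write",
        "--allow-sensitive-data-access"
      ]
    }
  }
}
```

&lt;p&gt;Removing either flag from &lt;code&gt;args&lt;/code&gt; is how you put the server into advisory mode.&lt;/p&gt;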

&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;

&lt;p&gt;A few things I learned building this:&lt;/p&gt;

&lt;h3&gt;
  
  
  Keep steering files focused on decisions, not templates
&lt;/h3&gt;

&lt;p&gt;Early versions had full YAML templates, CI/CD pipelines, and code examples in the steering files. The problem is that the MCP tools already generate these. Having them in the steering files just duplicates content and wastes context. The final versions focus on decision-making guidance: when to use which deployment type, how to choose batch sizes, what metrics to alarm on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Document every tool with actual parameters
&lt;/h3&gt;

&lt;p&gt;I validated every tool's parameters against the actual MCP server schemas. This matters because if &lt;code&gt;POWER.md&lt;/code&gt; says a parameter is called &lt;code&gt;function_identifier&lt;/code&gt; but the tool actually expects &lt;code&gt;resource_name&lt;/code&gt;, the agent will fail silently or hallucinate parameters. All 25 tools have correct required/optional parameter listings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Follow the official Power structure
&lt;/h3&gt;

&lt;p&gt;The official Kiro Powers (Stripe, Neon, Datadog, and others) follow a consistent structure in &lt;code&gt;POWER.md&lt;/code&gt;: frontmatter with keywords, overview, available steering files, full MCP tool documentation, usage examples, Do/Don't best practices, Error/Cause/Solution troubleshooting, and configuration. Following this pattern means Kiro knows exactly where to find what it needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  The expertise is portable, the packaging isn't
&lt;/h3&gt;

&lt;p&gt;I built the &lt;a href="https://dev.to/gunnargrosch/turning-aws-serverless-experience-into-a-claude-code-plugin-2nha"&gt;Claude Code plugin&lt;/a&gt; first. The ten steering guides transferred to the Kiro Power almost entirely. The packaging didn't: Claude Code uses &lt;code&gt;SKILL.md&lt;/code&gt; as the entry point, Kiro uses &lt;code&gt;POWER.md&lt;/code&gt; with different frontmatter conventions. Claude Code uses &lt;code&gt;.mcp.json&lt;/code&gt;, Kiro uses &lt;code&gt;mcp.json&lt;/code&gt;. Claude Code's plugin includes a &lt;code&gt;PostToolUse&lt;/code&gt; hook that auto-validates SAM templates after edits, which doesn't package into a Power. You could set up that hook in Kiro separately, but it's not something the Power installs for you. Distribution is different too: marketplace installation versus direct GitHub import.&lt;/p&gt;

&lt;p&gt;But the real investment is the expertise layer: the decision trees, the operational knowledge, the judgment calls that come from experience. Writing that once and adapting the packaging was significantly less work than building from scratch. This is the same idea behind the &lt;a href="https://agentskills.io/" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; standard: encode expertise in a format that multiple tools can consume. If you structure knowledge as clear decision guidance in markdown files, the per-tool adaptation is mostly mechanical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI configured with credentials&lt;/li&gt;
&lt;li&gt;AWS SAM CLI installed&lt;/li&gt;
&lt;li&gt;Docker Desktop (for local testing)&lt;/li&gt;
&lt;li&gt;Python 3.10+ with uv package manager (this runs the MCP server locally, regardless of what language your application uses)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open the Powers panel in Kiro&lt;/li&gt;
&lt;li&gt;Click "Add Custom Power" and select "Import power from GitHub"&lt;/li&gt;
&lt;li&gt;Enter: &lt;code&gt;https://github.com/gunnargrosch/aws-serverless-kiro-power&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Press "Enter" to confirm&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Try It Out
&lt;/h3&gt;

&lt;p&gt;The Power activates on keywords like &lt;code&gt;serverless&lt;/code&gt;, &lt;code&gt;lambda&lt;/code&gt;, &lt;code&gt;sam&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;, &lt;code&gt;dynamodb&lt;/code&gt;, &lt;code&gt;kinesis&lt;/code&gt;, &lt;code&gt;sqs&lt;/code&gt;, and others. Just describe what you want to build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; I need a Python API on Lambda with DynamoDB behind it
&amp;gt; the iterator age on my kinesis stream keeps climbing, help me figure out why
&amp;gt; set up an EventBridge bus for order events with routing to three downstream services
&amp;gt; my cold starts are killing me, what can I do
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kiro uses the MCP tools to scaffold, build, test, and deploy, following the patterns in the steering files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making It Better
&lt;/h2&gt;

&lt;p&gt;Both the &lt;a href="https://github.com/gunnargrosch/aws-serverless-plugin" rel="noopener noreferrer"&gt;Claude Code plugin&lt;/a&gt; and the &lt;a href="https://github.com/gunnargrosch/aws-serverless-kiro-power" rel="noopener noreferrer"&gt;Kiro Power&lt;/a&gt; are open source. The underlying expertise is shared, so a fix in one improves both. Some areas where contributions would be most useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Steering guides for additional event source patterns (DocumentDB, MQ, S3 event notifications)&lt;/li&gt;
&lt;li&gt;Real-world troubleshooting scenarios that the current guides don't cover&lt;/li&gt;
&lt;li&gt;Corrections where the guidance doesn't match your production experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open an issue or a PR on whichever repo you're using.&lt;/p&gt;

&lt;h3&gt;Additional Resources&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/aws-serverless-kiro-power" rel="noopener noreferrer"&gt;AWS Serverless Kiro Power Source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gunnargrosch/aws-serverless-plugin" rel="noopener noreferrer"&gt;AWS Serverless Claude Code Plugin Source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://awslabs.github.io/mcp/servers/aws-serverless-mcp-server" rel="noopener noreferrer"&gt;AWS Serverless MCP Server Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kiro.dev/docs/powers/" rel="noopener noreferrer"&gt;Kiro Powers Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://agentskills.io/" rel="noopener noreferrer"&gt;Agent Skills Standard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/serverless-application-model/" rel="noopener noreferrer"&gt;AWS SAM Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://serverlessland.com/" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/gunnargrosch/turning-aws-serverless-experience-into-a-claude-code-plugin-2nha"&gt;Turning AWS Serverless Experience into a Claude Code Plugin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What Powers have you built or are you thinking about building? Let me know in the comments.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kiro</category>
      <category>productivity</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
