Long-running workflows without managing infrastructure
Your Lambda function needs to wait for a human approval. Or retry a failed API call with exponential backoff. Or orchestrate multiple steps that span hours. How do you build that without managing servers or databases?
AWS Lambda Durable Functions solve this. Write your workflow in your preferred programming language (JavaScript or TypeScript on Node.js, or Python, with more languages coming) using straightforward async code. Lambda handles the rest: checkpointing state, resuming after waits, retrying failures, and scaling automatically. Workflows can run for up to a year, and you pay only for actual execution time, not for time spent waiting.
What Are Durable Functions?
Durable functions are Lambda functions that can pause and resume. When your function waits for a callback or sleeps for an hour, Lambda checkpoints its state and stops execution. When it's time to continue, Lambda resumes exactly where it left off—with all variables and context intact.
This isn't a new compute model. It's regular Lambda with automatic state management. You write normal async/await code. Lambda makes it durable.
A Simple Example
Here's a workflow that creates an order, waits 5 minutes, then sends a notification:
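```typescript
import { DurableContext, withDurableExecution } from '@aws/durable-execution-sdk-js';

// createOrder and sendEmail stand in for your own application code.
export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    // Step 1: create the order; the result is checkpointed.
    const order = await context.step('create-order', async () => {
      return createOrder(event.items);
    });

    // Pause for 5 minutes; the function is idle (and not billed) meanwhile.
    await context.wait({ seconds: 300 });

    // Step 2: notify the customer once the wait has elapsed.
    await context.step('send-notification', async () => {
      return sendEmail(order.customerId, order.id);
    });

    return { orderId: order.id, status: 'completed' };
  }
);
```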
That's it. No state machines to configure, no databases to manage, no polling loops. The function pauses during the wait, costs nothing while idle, and resumes automatically after 5 minutes.
Key Capabilities
Long execution times - Functions can run for up to 1 year. Individual invocations are still limited to 15 minutes, but the workflow continues across multiple invocations.
Automatic checkpointing - Lambda saves your function's state at each step. If something fails, the function resumes from the last checkpoint—not from the beginning.
Built-in retries - Configure retry strategies with exponential backoff. Lambda handles the retry logic and timing automatically.
Wait for callbacks - Pause execution until an external event arrives. Perfect for human approvals, webhook responses, or async API results.
Parallel execution - Run multiple operations concurrently and wait for all to complete. Lambda manages the coordination.
Nested workflows - Invoke other durable functions and compose complex workflows from simple building blocks.
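Lambda computes retry timing for you, but it helps to see what an exponential-backoff schedule actually looks like. The helper below is a standalone illustration, not the SDK's retry configuration: `backoffDelayMs`, `jitteredDelayMs`, the option names, and the "full jitter" variant are assumptions chosen for this sketch.

```typescript
// Hypothetical helpers illustrating exponential backoff; not part of the
// Durable Execution SDK. `attempt` is 1-based (1 = first retry).
interface BackoffOptions {
  initialDelayMs: number; // delay before the first retry
  maxDelayMs: number;     // upper bound on any single delay
  factor: number;         // growth multiplier per attempt (2 = doubling)
}

// Deterministic backoff: initialDelayMs * factor^(attempt - 1), capped at maxDelayMs.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  const raw = opts.initialDelayMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(raw, opts.maxDelayMs);
}

// "Full jitter" variant: a random delay in [0, capped delay), so many
// executions retrying at the same time do not all fire at once.
function jitteredDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffDelayMs(attempt, opts));
}

const policy: BackoffOptions = { initialDelayMs: 1000, maxDelayMs: 30000, factor: 2 };
const schedule = [1, 2, 3, 4, 5, 6].map((a) => backoffDelayMs(a, policy));
console.log(schedule.join(", ")); // 1000, 2000, 4000, 8000, 16000, 30000
```

The cap keeps late retries from growing without bound (the sixth attempt would otherwise wait 32 seconds), and jitter spreads out retries from many concurrent workflows hitting the same failing dependency.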
How It Works: The Replay Model
Durable functions use a replay-based execution model. When your function resumes, Lambda replays it from the start—but instead of re-executing operations, it uses checkpointed results.
Here's what happens:
- First invocation - Your function runs, executing each step and checkpointing results
- Wait or callback - Function pauses, Lambda saves state and stops execution
- Resume - Lambda invokes your function again, replaying from the start
- Replay - Operations return checkpointed results instantly instead of re-executing
- Continue - Function continues past the wait with all context intact
This model ensures your function always sees consistent state, even across failures and restarts. Each operation executes once; on every replay it returns the same checkpointed result, which is why the code between operations must be deterministic.
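The toy model below makes the replay steps concrete under simplifying assumptions: it is synchronous, keeps the checkpoint log in an in-memory `Map`, and stops an invocation at a wait by throwing a signal. None of these names (`ReplayContext`, `SuspendSignal`, `runInvocation`) come from the SDK; the real service persists checkpoints and schedules the resume for you.

```typescript
// Simplified, synchronous model of replay-based execution. NOT the Durable
// Execution SDK: every name here is hypothetical and exists only to illustrate.
type CheckpointLog = Map<string, { done: boolean; value?: unknown }>;

// Thrown to end the current invocation at a pending wait.
class SuspendSignal {
  readonly waitName: string;
  constructor(waitName: string) {
    this.waitName = waitName;
  }
}

class ReplayContext {
  private readonly log: CheckpointLog;
  constructor(log: CheckpointLog) {
    this.log = log;
  }

  // Execute fn once; on later replays return the checkpointed result instead.
  step<T>(name: string, fn: () => T): T {
    const entry = this.log.get(name);
    if (entry?.done) return entry.value as T;
    const value = fn();
    this.log.set(name, { done: true, value });
    return value;
  }

  // First time: record a pending wait and stop. After the wait is marked
  // complete, a replay passes straight through.
  wait(name: string): void {
    const entry = this.log.get(name);
    if (entry?.done) return;
    if (!entry) this.log.set(name, { done: false });
    throw new SuspendSignal(name);
  }
}

let createOrderCalls = 0; // counts real (non-replayed) executions

function workflow(ctx: ReplayContext): string {
  const orderId = ctx.step("create-order", () => {
    createOrderCalls++;
    return "order-1";
  });
  ctx.wait("wait-5-min");
  return ctx.step("send-notification", () => `notified ${orderId}`);
}

// One "invocation": run the workflow from the top against the checkpoint log.
function runInvocation(log: CheckpointLog): { status: string; result?: string } {
  try {
    return { status: "SUCCEEDED", result: workflow(new ReplayContext(log)) };
  } catch (err) {
    if (err instanceof SuspendSignal) return { status: "PENDING" };
    throw err;
  }
}

const log: CheckpointLog = new Map();
const first = runInvocation(log);      // executes create-order, then suspends
log.set("wait-5-min", { done: true }); // stands in for the service ending the wait
const second = runInvocation(log);     // replays create-order from the log
console.log(first.status, second.status, second.result, createOrderCalls);
// PENDING SUCCEEDED notified order-1 1
```

The counter is the point: the workflow body runs twice, but `create-order` executes only once. If `workflow` computed something outside a step from `Date.now()` or `Math.random()`, the second run would see a different value than the first, which is exactly the non-determinism the replay model warns about.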
Learn more: Understanding the Replay Model explains how replay works, why operations must be deterministic, and how to handle non-deterministic code safely.
Common Use Cases
Approval workflows - Wait for human approval before proceeding. The function pauses until someone clicks approve or reject.
Saga patterns - Coordinate distributed transactions with compensating actions. If a step fails, automatically roll back previous steps.
Scheduled tasks - Wait for specific times or intervals. Process data at midnight, send reminders after 24 hours, or retry every 5 minutes.
API orchestration - Call multiple APIs with retries and error handling. Coordinate responses and handle partial failures gracefully.
Data processing pipelines - Process large datasets in stages with checkpoints. Resume from the last successful stage if something fails.
Event-driven workflows - React to external events like webhooks, IoT signals, or user actions. Wait for events and continue processing when they arrive.
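The saga pattern listed above comes down to one rule: run steps in order, and if one fails, run the compensating actions of the steps that already succeeded, in reverse order. The sketch shows only that control flow. `runSaga`, `SagaStep`, and the step names are hypothetical, not SDK APIs; in a durable function each action and compensation would presumably be its own checkpointed step.

```typescript
// Minimal saga control flow (hypothetical helper, not an SDK API).
interface SagaStep {
  name: string;
  action: () => void;     // forward operation, e.g. charge a card
  compensate: () => void; // undo operation, e.g. refund the charge
}

// Returns true if every step succeeded; records what happened in `trace`.
function runSaga(steps: SagaStep[], trace: string[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      trace.push(`done:${step.name}`);
      completed.push(step);
    } catch {
      trace.push(`failed:${step.name}`);
      // Roll back already-completed steps, most recent first.
      for (const done of [...completed].reverse()) {
        done.compensate();
        trace.push(`compensated:${done.name}`);
      }
      return false;
    }
  }
  return true;
}

const trace: string[] = [];
const ok = runSaga(
  [
    { name: "reserve-inventory", action: () => {}, compensate: () => {} },
    { name: "charge-payment", action: () => {}, compensate: () => {} },
    {
      name: "book-shipping",
      action: () => {
        throw new Error("carrier unavailable");
      },
      compensate: () => {},
    },
  ],
  trace,
);
console.log(ok, trace.join(" > "));
```

Compensating in reverse order matters: the payment is refunded before the inventory reservation is released, mirroring the order in which the forward steps were applied.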
Testing Your Workflows
Testing long-running workflows doesn't mean waiting hours. The Durable Execution SDK includes a testing library that runs your functions locally in milliseconds:
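```typescript
import { LocalDurableTestRunner } from '@aws/durable-execution-sdk-js-testing';
// The handler from the order example; adjust the path to your project layout.
import { handler } from './index';

test('order workflow completes', async () => {
  const runner = new LocalDurableTestRunner({
    handlerFunction: handler,
  });

  const execution = await runner.run();

  expect(execution.getStatus()).toBe('SUCCEEDED');
  expect(execution.getResult()).toEqual({ orderId: '123', status: 'completed' });
});
```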
The test runner simulates checkpoints, skips time-based waits, and lets you inspect every operation. You can test callbacks, retries, and failure scenarios without deploying to AWS.
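`LocalDurableTestRunner` skips waits for you. For helper code that lives outside the SDK, one common way to get the same effect (not necessarily how the runner works internally) is to route every wait through an injectable clock and substitute a fake one in tests. `Clock`, `realClock`, `FakeClock`, and `remindLater` are hypothetical names for this sketch.

```typescript
// Hypothetical clock abstraction so tests can skip real waiting; not the SDK's API.
interface Clock {
  now(): number;                   // current time in milliseconds
  sleep(ms: number): Promise<void>;
}

// Production clock: really waits.
const realClock: Clock = {
  now: () => Date.now(),
  sleep: (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
};

// Test clock: advances virtual time instantly instead of waiting.
class FakeClock implements Clock {
  private current = 0;
  now(): number {
    return this.current;
  }
  async sleep(ms: number): Promise<void> {
    this.current += ms;
  }
}

// Code under test: waits 5 minutes, then sends a reminder stamped with the time.
async function remindLater(clock: Clock, send: (at: number) => void): Promise<void> {
  await clock.sleep(5 * 60 * 1000);
  send(clock.now());
}

const fake = new FakeClock();
const sent: number[] = [];
void remindLater(fake, (at) => sent.push(at)).then(() => {
  // Resolves on the next microtask, not after five real minutes.
  console.log(`reminder sent at virtual t=${sent[0]} ms`); // reminder sent at virtual t=300000 ms
});
```

Production code would pass `realClock`; tests pass a `FakeClock` and can still assert on timing because virtual time advances exactly as a real wait would.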
Learn more: Testing Durable Functions covers local testing, cloud integration tests, debugging techniques, and best practices.
Deploying with AWS SAM
Deploy durable functions using AWS SAM with a few key configurations:
```yaml
Resources:
  OrderProcessorFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/order-processor
      Handler: index.handler
      Runtime: nodejs22.x
      DurableConfig:
        ExecutionTimeout: 900
        RetentionPeriodInDays: 7
    Metadata:
      BuildMethod: esbuild
      BuildProperties:
        EntryPoints:
          - index.ts
```
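The DurableConfig property enables durable execution. ExecutionTimeout sets the maximum duration of the whole workflow in seconds (900 here, or 15 minutes), and RetentionPeriodInDays sets how long Lambda keeps the execution's history after it finishes. SAM automatically handles IAM permissions for checkpointing and state management.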
Learn more: Deploying Durable Functions with SAM covers template configuration, permissions, build settings, and deployment best practices.
When to Use Durable Functions
- Your workflow spans multiple steps with waits or callbacks
- You need automatic retries with exponential backoff
- You want to coordinate multiple async operations
- Your process requires human approval or external events
- You need to handle long-running tasks without managing state
- You prefer writing workflows as code rather than configuration
Getting Started
- Install the SDK: `npm install @aws/durable-execution-sdk-js`
- Write your function: Wrap your handler with `withDurableExecution()`
- Use durable operations: `context.step()`, `context.wait()`, `context.waitForCallback()`
- Test locally: Use `LocalDurableTestRunner` for fast iteration
- Deploy with SAM: Add `DurableConfig` to your template
- Monitor execution: Use Amazon CloudWatch and AWS X-Ray for observability
Learn More
- Understanding the Replay Model - Deep dive into how durable functions work under the hood
- Testing Durable Functions - Comprehensive guide to testing workflows locally and in the cloud
- Deploying with AWS SAM - Complete deployment guide with templates and best practices
Summary
AWS Lambda Durable Functions let you build long-running workflows without managing infrastructure. Write straightforward async code, and Lambda handles state management, retries, and resumption. Your functions can wait for callbacks, retry failed operations, and run for up to a year, all while you pay only for execution time.
Start with simple workflows, test locally for fast iteration, and deploy with confidence knowing Lambda manages the complexity of distributed state.