TL;DR: In December 2025, AWS released Lambda Durable Functions. Your Lambda can now run for up to 366 days (not 15 minutes), automatically checkpoint progress, suspend during waits without compute charges, handle retries with built-in strategies, wait for external callbacks like human approvals, and process batches with concurrency control. All in a single Lambda function.
What Are Lambda Durable Functions?
Lambda Durable Functions extends Lambda to support long-running, stateful workflows that can pause, wait, and resume. Unlike standard Lambda functions (max 15 minutes), durable functions can run for up to 366 days through checkpoint-and-replay.
Key capabilities:
- Automatic checkpointing after each operation
- No compute charges while the function is suspended during waits
- Built-in retry with configurable strategies
- External callback support (human approvals, webhooks)
- Batch processing with per-item checkpoints
Getting Started
This tutorial uses TypeScript and Serverless Framework for infrastructure-as-code. You can also use another supported language, the AWS console, or another infrastructure-as-code tool such as CDK, SAM, or Terraform.
Enable Durable Execution
Configure your Lambda function to support durable execution in serverless.yml:
functions:
  myFunction:
    handler: handler.main
    timeout: 900 # Lambda execution timeout (seconds, max 900 = 15 min)
    durableConfig:
      executionTimeout: 86400 # Workflow timeout (seconds, max 31,622,400 = 366 days)
      retentionPeriodInDays: 7 # Keep execution history for 7 days
Parameter explanations:
- timeout: Maximum time for a single Lambda invocation (max 15 minutes). Your function is replayed multiple times, and each replay must complete within this limit.
- executionTimeout: Maximum time for the entire workflow across all replays (max 366 days). This is how long your durable function can run from start to finish, including all waits.
- retentionPeriodInDays: How long AWS keeps your execution history and checkpoint logs after completion (1-90 days). Used for debugging and observability.
Example scenario: You have a workflow that processes a payment, waits 2 hours, then ships an order.
- Set timeout: 60 because each individual Lambda invocation (processing the payment, then later shipping the order) completes in under 60 seconds
- Set executionTimeout to at least 7200 (2 hours), plus a little headroom for the two steps themselves, because the entire workflow from start to finish takes just over 2 hours (including the wait)
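The limits described above can be checked before deploying. The following is a minimal sketch, not part of the Serverless Framework or the SDK: `DurableTimeouts` and `validateTimeouts` are hypothetical names invented for this illustration, and the numeric limits are simply the ones quoted in this article.

```typescript
// Hypothetical shape mirroring the three settings discussed in this article.
interface DurableTimeouts {
  timeout: number;               // seconds, per single Lambda invocation
  executionTimeout: number;      // seconds, whole workflow including waits
  retentionPeriodInDays: number; // days the execution history is kept
}

// Limits as quoted in this article.
const MAX_INVOCATION_SECONDS = 900;               // 15 minutes
const MAX_EXECUTION_SECONDS = 366 * 24 * 60 * 60; // 366 days in seconds
const MIN_RETENTION_DAYS = 1;
const MAX_RETENTION_DAYS = 90;

// Returns a list of human-readable problems; an empty list means the values fit the limits.
function validateTimeouts(cfg: DurableTimeouts): string[] {
  const problems: string[] = [];
  if (cfg.timeout < 1 || cfg.timeout > MAX_INVOCATION_SECONDS) {
    problems.push(`timeout must be between 1 and ${MAX_INVOCATION_SECONDS} seconds`);
  }
  if (cfg.executionTimeout < 1 || cfg.executionTimeout > MAX_EXECUTION_SECONDS) {
    problems.push(`executionTimeout must be between 1 and ${MAX_EXECUTION_SECONDS} seconds`);
  }
  if (
    cfg.retentionPeriodInDays < MIN_RETENTION_DAYS ||
    cfg.retentionPeriodInDays > MAX_RETENTION_DAYS
  ) {
    problems.push(`retentionPeriodInDays must be between ${MIN_RETENTION_DAYS} and ${MAX_RETENTION_DAYS}`);
  }
  return problems;
}

// The payment-then-ship scenario: short invocations, a 2-hour wait,
// and five minutes of headroom (the headroom is an assumption of this sketch).
const problems = validateTimeouts({
  timeout: 60,
  executionTimeout: 7200 + 300,
  retentionPeriodInDays: 7,
});
```

Here `problems` is empty for the scenario values, while a `timeout` of 1200 would be reported because it exceeds the 15-minute invocation limit.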
Set Up the Durable Execution SDK
Durable functions require the SDK - it's not optional. The SDK handles checkpoint-and-replay, manages execution state, and provides the durable operations you'll use in your code. Without it, you'd need to manually implement all state management, checkpoint tracking, and recovery logic yourself.
Available languages:
- JavaScript
- TypeScript
- Python
AWS will add support for more languages over time.
Install:
npm install @aws/durable-execution-sdk-js
Wrap Your Handler
Wrap your Lambda handler with withDurableExecution to enable durable execution:
import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';

export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    // Your durable workflow code here
    return { statusCode: 200 };
  }
);
The DurableContext gives you access to durable operations including step(), wait(), parallel(), map(), waitForCallback(), and several others for building long-running workflows.
How Durable Execution Works
Checkpoint and Replay
Durable functions run multiple times during their lifecycle. Each time Lambda invokes your function, it replays your code from the beginning - but skips completed operations by reading from the checkpoint log.
Example:
export const handler = withDurableExecution(async (event, context) => {
  // Step 1: Charge payment
  const charge = await context.step('charge', async () => {
    return processPayment(event.amount);
  });

  // Step 2: Wait 2 hours
  await context.wait({ seconds: 7200 });

  // Step 3: Ship order
  const shipment = await context.step('ship', async () => {
    return createShipment(charge.orderId);
  });

  return { shipment };
});
Execution timeline:
Invocation 1 (T+0):
┌─────────────────────────────────────────────────┐
│ [charge ✓]       → checkpoint saved             │
│ [wait 2h...]     → Lambda suspends (no charges) │
└─────────────────────────────────────────────────┘
⏳ 2 hours pass...
Invocation 2 (T+2h):
┌─────────────────────────────────────────────────┐
│ [charge ⚡cached] → reads from checkpoint        │
│ [wait ⚡skipped]  → already completed            │
│ [ship ✓]         → checkpoint saved             │
│ Return result ✓                                 │
└─────────────────────────────────────────────────┘
During the 2-hour wait, no Lambda invocation is running, so no compute time is billed.
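To see why replay does not repeat finished work, here is a toy, in-memory model of checkpoint-and-replay. It is a sketch under loud assumptions: `ToyContext`, `SUSPEND`, and `runToCompletion` are invented for this illustration, the toy is synchronous while the real SDK is promise-based, and the wait is marked as elapsed immediately instead of being scheduled by the service.

```typescript
// Sentinel thrown to end an invocation early, standing in for "Lambda suspends".
const SUSPEND = { reason: "suspended" };

class ToyContext {
  private counter = 0; // position of the next operation in the workflow
  private log: Record<string, unknown>;

  constructor(log: Record<string, unknown>) {
    this.log = log;
  }

  // Runs fn once; on later replays returns the checkpointed result instead.
  step<T>(name: string, fn: () => T): T {
    const key = `${this.counter++}:${name}`;
    if (key in this.log) return this.log[key] as T;
    const result = fn();
    this.log[key] = result;
    return result;
  }

  // First encounter: record the wait and suspend. On replay: skip it.
  wait(name: string): void {
    const key = `${this.counter++}:${name}`;
    if (key in this.log) return;
    this.log[key] = "elapsed"; // toy shortcut: treat the wait as done by the next invocation
    throw SUSPEND;
  }
}

const calls: string[] = []; // records which step bodies really executed

function workflow(ctx: ToyContext): string {
  const orderId = ctx.step("charge", () => {
    calls.push("charge");
    return "order-1";
  });
  ctx.wait("two-hours");
  return ctx.step("ship", () => {
    calls.push("ship");
    return `shipped ${orderId}`;
  });
}

// Re-invokes the workflow from the top until it finishes, sharing one checkpoint log.
function runToCompletion<T>(wf: (ctx: ToyContext) => T): { result: T; invocations: number } {
  const log: Record<string, unknown> = {};
  let invocations = 0;
  for (;;) {
    invocations++;
    try {
      return { result: wf(new ToyContext(log)), invocations };
    } catch (e) {
      if (e !== SUSPEND) throw e; // real errors still propagate
    }
  }
}

const outcome = runToCompletion(workflow);
```

In this toy run the workflow function is entered twice, yet the body of the `charge` step executes only once, which mirrors the two-invocation timeline above.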
Determinism Requirements
Replay depends on your code producing the same results every time it runs. Any code outside durable operations must be deterministic - meaning it returns the same output for the same input.
Non-deterministic operations must be wrapped:
// ❌ Wrong: Random value changes on each replay
const id = uuid();
await context.step('save', async () => saveWithId(id));

// ✅ Correct: Random value generated once, checkpointed
const id = await context.step('generate-id', async () => {
  return uuid();
});
await context.step('save', async () => saveWithId(id));
Wrap these in steps:
- Random values (Math.random(), uuid(), uuidv4())
- Timestamps (Date.now(), new Date())
- External API calls
- Database queries
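The effect of wrapping a non-deterministic value can be reproduced without the SDK. The sketch below is an assumption-laden stand-in: `step` here is a hypothetical in-memory helper, not the SDK's `context.step()`, and `fakeUuid` replaces a real UUID generator with a counter so that the behavior is easy to inspect.

```typescript
type CheckpointLog = Record<string, unknown>;

// Hypothetical stand-in for a durable step: run once, then reuse the stored value.
function step<T>(log: CheckpointLog, name: string, fn: () => T): T {
  if (name in log) return log[name] as T;
  const value = fn();
  log[name] = value;
  return value;
}

// Stand-in for uuid(): returns a different value on every call.
let counter = 0;
const fakeUuid = (): string => `id-${++counter}`;

// Unwrapped: the value is regenerated every time the code is replayed.
function unwrappedWorkflow(): string {
  return fakeUuid();
}

// Wrapped: the value is generated on the first run and read back on replay.
function wrappedWorkflow(log: CheckpointLog): string {
  return step(log, "generate-id", fakeUuid);
}

const sharedLog: CheckpointLog = {};
const unwrappedRuns = [unwrappedWorkflow(), unwrappedWorkflow()]; // two "replays"
const wrappedRuns = [wrappedWorkflow(sharedLog), wrappedWorkflow(sharedLog)];
```

The two unwrapped runs disagree with each other, while the two wrapped runs return the same ID, which is the property replay relies on.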
Core Operations
The SDK provides several operations for building durable workflows. Each operation creates checkpoints automatically, ensuring your function can resume from any point.
context.step()
Executes business logic with automatic checkpointing and retry. Once a step succeeds, it never re-executes - the checkpointed result is used on replay.
const result = await context.step('process-payment', async () => {
  return await paymentService.charge(amount);
});
Use for: Database calls, API requests, any side-effecting operation.
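This article does not show how a retry strategy is configured, so the sketch below deliberately avoids guessing at the SDK's options. It only illustrates the general idea behind a common strategy, exponential backoff with a cap. `BackoffOptions` and `backoffDelaySeconds` are hypothetical names, not SDK APIs.

```typescript
// Hypothetical options for a capped exponential backoff.
interface BackoffOptions {
  initialDelaySeconds: number; // delay before the first retry
  backoffRate: number;         // multiplier applied for each further retry
  maxDelaySeconds: number;     // upper bound on any single delay
}

// Delay to wait before retry number `attempt` (1-based).
function backoffDelaySeconds(attempt: number, options: BackoffOptions): number {
  const uncapped = options.initialDelaySeconds * Math.pow(options.backoffRate, attempt - 1);
  return Math.min(uncapped, options.maxDelaySeconds);
}

// Delays for six consecutive retries: doubling from 2 seconds, capped at 30.
const delays = [1, 2, 3, 4, 5, 6].map((attempt) =>
  backoffDelaySeconds(attempt, { initialDelaySeconds: 2, backoffRate: 2, maxDelaySeconds: 30 })
);
```

With these values the delays grow as 2, 4, 8, and 16 seconds and then stay at the 30-second cap.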
context.wait()
Pauses execution for a specified duration. The SDK creates a checkpoint, terminates the function invocation, and schedules resumption. When the wait completes, Lambda invokes your function again.
await context.wait({ seconds: 3600 }); // Wait 1 hour
Use for: Delays between operations, rate limiting, scheduled actions.
context.parallel()
Executes multiple operations concurrently with optional concurrency control.
const results = await context.parallel([
  async (ctx) => ctx.step('task1', async () => processTask1()),
  async (ctx) => ctx.step('task2', async () => processTask2()),
  async (ctx) => ctx.step('task3', async () => processTask3())
]);
Use for: Independent operations that can run simultaneously.
context.map()
Concurrently executes an operation on each item in an array with optional concurrency control.
const results = await context.map(itemArray, async (ctx, item, index) =>
  ctx.step('task', async () => processItem(item, index))
);
Use for: Batch processing, parallel data transformation.
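The concurrency control mentioned for `parallel()` and `map()` is not shown with its option name in this article, so here is a plain TypeScript sketch of the underlying idea instead: a fixed number of workers pull items from a shared queue. `mapWithConcurrency` is a hypothetical helper written for this illustration and does no checkpointing.

```typescript
// Applies `worker` to every item, running at most `maxConcurrency` calls at once.
// Results come back in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  maxConcurrency: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (maxConcurrency < 1) throw new Error("maxConcurrency must be at least 1");
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims the next index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(maxConcurrency, items.length);
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) runners.push(runner());
  await Promise.all(runners);
  return results;
}

// Demo: double five numbers with at most two in flight, tracking the peak.
let inFlight = 0;
let peak = 0;
const demo = mapWithConcurrency([1, 2, 3, 4, 5], 2, async (n) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await Promise.resolve(); // yield once so other runners can interleave
  inFlight--;
  return n * 2;
});
```

The demo never has more than two calls in flight even though there are five items, which is the behavior a concurrency limit is meant to give you.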
context.waitForCallback()
Suspends execution until an external system submits a callback. The SDK creates a callback, executes your submitter function with the callback ID, and waits for the result.
const result = await context.waitForCallback(
  'external-api',
  async (callbackId, ctx) => {
    await submitToExternalAPI(callbackId, requestData);
  },
  { timeout: { minutes: 30 } }
);
The external system receives the callbackId and sends the result back using the Lambda API (SendDurableExecutionCallbackSuccess or SendDurableExecutionCallbackFailure).
Use for: Human approvals, webhook integrations, external system coordination.
context.createCallback()
Creates a callback and returns both a promise and callback ID. You send the callback ID to an external system, which submits the result using the Lambda API (SendDurableExecutionCallbackSuccess or SendDurableExecutionCallbackFailure).
const [promise, callbackId] = await context.createCallback('approval', {
  timeout: { hours: 24 }
});
await sendApprovalRequest(callbackId, requestData);
const approval = await promise;
Use for: Advanced scenarios where you need the callback ID before suspending.
context.invoke()
Invokes another Lambda function and waits for its result.
const result = await context.invoke(
  'invoke-processor',
  'arn:aws:lambda:us-east-1:123456789012:function:processor',
  { data: inputData }
);
Use for: Function composition, workflow decomposition, calling other Lambda functions.
context.waitForCondition()
Polls for a condition with automatic checkpointing between attempts. The SDK executes your check function, creates a checkpoint with the result, waits according to your strategy, and repeats until the condition is met.
const result = await context.waitForCondition(
  async (state, ctx) => {
    const status = await checkJobStatus(state.jobId);
    return { ...state, status };
  },
  {
    initialState: { jobId: 'job-123', status: 'pending' },
    waitStrategy: (state) =>
      state.status === 'completed'
        ? { shouldContinue: false }
        : { shouldContinue: true, delay: { seconds: 30 } }
  }
);
Use for: Polling external systems, waiting for resources to be ready, implementing retry with backoff.
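The check, checkpoint, wait, repeat loop described above can be traced with a small simulation. This is a sketch, not the SDK: `simulatePolling` and `Decision` are hypothetical, the loop is synchronous, and delays are only added up on a simulated clock instead of actually suspending anything.

```typescript
// Hypothetical decision shape, loosely modeled on the waitStrategy return value above.
type Decision =
  | { shouldContinue: false }
  | { shouldContinue: true; delaySeconds: number };

// Runs check -> strategy -> (simulated) wait until the strategy says stop.
function simulatePolling<S>(
  initialState: S,
  check: (state: S) => S,
  waitStrategy: (state: S, attempt: number) => Decision,
  maxAttempts: number = 100
): { state: S; attempts: number; waitedSeconds: number } {
  let state = initialState;
  let waitedSeconds = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    state = check(state); // in a durable function, this result would be checkpointed
    const decision = waitStrategy(state, attempt);
    if (decision.shouldContinue) {
      waitedSeconds += decision.delaySeconds; // simulated wait, nothing really sleeps
      continue;
    }
    return { state, attempts: attempt, waitedSeconds };
  }
  throw new Error(`condition not met after ${maxAttempts} attempts`);
}

// Demo: a pretend job that reports "completed" on the third status check.
let checks = 0;
const poll = simulatePolling(
  { jobId: "job-123", status: "pending" },
  (s) => ({ ...s, status: ++checks >= 3 ? "completed" : "pending" }),
  (s): Decision =>
    s.status === "completed"
      ? { shouldContinue: false }
      : { shouldContinue: true, delaySeconds: 30 }
);
```

In the demo the condition is met on the third check, after two simulated 30-second waits. The `maxAttempts` guard is an addition of this sketch so that a condition that never becomes true cannot loop forever.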
context.runInChildContext()
Creates an isolated execution context for grouping operations. Child contexts have their own checkpoint log and can contain multiple steps, waits, and other operations. The SDK treats the entire child context as a single unit for retry and recovery.
const result = await context.runInChildContext(
  'batch-processing',
  async (childCtx) => {
    return await processBatch(childCtx, items);
  }
);
Use for: Organizing complex workflows, implementing sub-workflows, isolating operations that should retry together.
Complete Example: Order Fulfillment Workflow
Here's a real-world example combining multiple operations:
import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';

export const handler = withDurableExecution(
  async (event: { orderId: string; items: string[] }, context: DurableContext) => {
    // Step 1: Process payment
    const payment = await context.step('process-payment', async () => {
      return await paymentService.charge(event.orderId);
    });

    // Step 2: Wait 1 hour for fraud check window
    await context.wait({ seconds: 3600 });

    // Step 3: Parallel operations - reserve inventory and calculate shipping
    const [inventory, shipping] = await context.parallel([
      async (ctx) => ctx.step('reserve-inventory', async () =>
        inventoryService.reserve(event.items)
      ),
      async (ctx) => ctx.step('calculate-shipping', async () =>
        shippingService.calculate(event.orderId)
      )
    ]);

    // Step 4: Wait for external approval
    const approval = await context.waitForCallback(
      'order-approval',
      async (callbackId, ctx) => {
        await notificationService.sendApprovalRequest(callbackId, event.orderId);
      },
      { timeout: { hours: 24 } }
    );
    if (!approval.approved) {
      return { status: 'rejected', orderId: event.orderId };
    }

    // Step 5: Process each item
    const shipments = await context.map(
      event.items,
      async (ctx, item, index) =>
        ctx.step('ship-item', async () =>
          shippingService.shipItem(item, shipping.address)
        )
    );

    // Step 6: Send confirmation
    await context.step('send-confirmation', async () => {
      return notificationService.sendConfirmation(event.orderId, shipments);
    });

    return { status: 'completed', orderId: event.orderId, shipments };
  }
);
This workflow demonstrates:
- Sequential steps with automatic checkpointing (payment, confirmation)
- Time-based waits for fraud checks (no charges during wait)
- Parallel execution for independent operations (inventory + shipping)
- External callbacks for order approval with 24-hour timeout
- Batch processing with map() for shipping multiple items
The entire workflow can span up to roughly 25 hours (the 1-hour fraud check wait plus an approval window of up to 24 hours) while only consuming compute during active operations.
Have you tried Lambda Durable Functions yet? What workflows are you building with them? Share your experiences and questions in the comments below!