Eric D Johnson for AWS

AWS Lambda Durable Functions: Build Workflows That Last

Long-running workflows without managing infrastructure

Your Lambda function needs to wait for a human approval. Or retry a failed API call with exponential backoff. Or orchestrate multiple steps that span hours. How do you build that without managing servers or databases?

AWS Lambda Durable Functions solve this. Write your workflow in a language you already use (Node.js, TypeScript, or Python, with more coming) using straightforward async code. Lambda handles the rest: checkpointing state, resuming after waits, retrying failures, and scaling automatically. Workflows can run for up to a year, and you pay only for actual execution time, not for time spent waiting.

What Are Durable Functions?

Durable functions are Lambda functions that can pause and resume. When your function waits for a callback or sleeps for an hour, Lambda checkpoints its state and stops execution. When it's time to continue, Lambda resumes exactly where it left off—with all variables and context intact.

This isn't a new compute model. It's regular Lambda with automatic state management. You write normal async/await code. Lambda makes it durable.

A Simple Example

Here's a workflow that creates an order, waits 5 minutes, then sends a notification:

import { DurableContext, withDurableExecution } from '@aws/durable-execution-sdk-js';

// createOrder and sendEmail are application helpers defined elsewhere.
export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    // Checkpointed step: runs once, and its result is reused on replay.
    const order = await context.step('create-order', async () => {
      return createOrder(event.items);
    });

    // Pause for 5 minutes; the function is not running during the wait.
    await context.wait({ seconds: 300 });

    await context.step('send-notification', async () => {
      return sendEmail(order.customerId, order.id);
    });

    return { orderId: order.id, status: 'completed' };
  }
);

That's it. No state machines to configure, no databases to manage, no polling loops. The function pauses during the wait, costs nothing while idle, and resumes automatically after 5 minutes.

Key Capabilities

Long execution times - Functions can run for up to 1 year. Individual invocations are still limited to 15 minutes, but the workflow continues across multiple invocations.

Automatic checkpointing - Lambda saves your function's state at each step. If something fails, the function resumes from the last checkpoint—not from the beginning.

Built-in retries - Configure retry strategies with exponential backoff. Lambda handles the retry logic and timing automatically.

Wait for callbacks - Pause execution until an external event arrives. Perfect for human approvals, webhook responses, or async API results.

Parallel execution - Run multiple operations concurrently and wait for all to complete. Lambda manages the coordination.

Nested workflows - Invoke other durable functions and compose complex workflows from simple building blocks.
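To make the "exponential backoff" mentioned under built-in retries concrete, here is a minimal sketch of how such delays are commonly computed. It is not the SDK's retry configuration: the option names (initialDelaySeconds, backoffRate, maxDelaySeconds) and the function itself are hypothetical, chosen only for illustration.

```typescript
// Hypothetical option names; the real SDK's retry settings may differ.
interface BackoffOptions {
  initialDelaySeconds: number; // delay before the first retry
  backoffRate: number;         // multiplier applied after each attempt
  maxDelaySeconds: number;     // upper bound on any single delay
}

// Delay before retry number `attempt` (1-based), capped at maxDelaySeconds.
function backoffDelaySeconds(attempt: number, opts: BackoffOptions): number {
  const raw = opts.initialDelaySeconds * Math.pow(opts.backoffRate, attempt - 1);
  return Math.min(raw, opts.maxDelaySeconds);
}

const delays = [1, 2, 3, 4, 5, 6].map((attempt) =>
  backoffDelaySeconds(attempt, {
    initialDelaySeconds: 2,
    backoffRate: 2,
    maxDelaySeconds: 30,
  })
);
console.log(delays); // [2, 4, 8, 16, 30, 30]
```

The cap matters for long-running workflows: without it, a handful of retries would push the delay into hours.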

How It Works: The Replay Model

Durable functions use a replay-based execution model. When your function resumes, Lambda replays it from the start—but instead of re-executing operations, it uses checkpointed results.

Here's what happens:

  1. First invocation - Your function runs, executing each step and checkpointing results
  2. Wait or callback - Function pauses, Lambda saves state and stops execution
  3. Resume - Lambda invokes your function again, replaying from the start
  4. Replay - Operations return checkpointed results instantly instead of re-executing
  5. Continue - Function continues past the wait with all context intact

This model ensures your function always sees consistent state, even across failures and restarts. Checkpointed operations behave deterministically: each one executes once, and every replay returns the same recorded result.
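The five steps above can be sketched with a toy in-memory model. This is not how Lambda implements replay, and none of the names here (ToyContext, Suspend, runUntilDone) come from the SDK; they are invented to show the idea that a replayed step returns its stored result instead of running again.

```typescript
type Checkpoints = Map<string, unknown>;

// Thrown to unwind the handler when it reaches a wait for the first time.
class Suspend {
  waitName: string;
  constructor(waitName: string) {
    this.waitName = waitName;
  }
}

class ToyContext {
  private checkpoints: Checkpoints;
  private log: string[];

  constructor(checkpoints: Checkpoints, log: string[]) {
    this.checkpoints = checkpoints;
    this.log = log;
  }

  // Run fn once; on later replays return the stored result instead.
  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) {
      this.log.push(`replay ${name}`);
      return this.checkpoints.get(name) as T;
    }
    const result = fn();
    this.checkpoints.set(name, result);
    this.log.push(`execute ${name}`);
    return result;
  }

  // First encounter: record the wait and suspend. On replay: fall through.
  wait(name: string): void {
    if (this.checkpoints.has(name)) return;
    this.checkpoints.set(name, true);
    throw new Suspend(name);
  }
}

// Re-invoke the handler from the top until it finishes without suspending.
function runUntilDone<T>(
  handler: (ctx: ToyContext) => T,
  log: string[]
): { result: T; invocations: number } {
  const checkpoints: Checkpoints = new Map();
  let invocations = 0;
  for (;;) {
    invocations++;
    try {
      return { result: handler(new ToyContext(checkpoints, log)), invocations };
    } catch (err) {
      if (!(err instanceof Suspend)) throw err;
      // A real service would resume later; this toy resumes immediately.
    }
  }
}

const log: string[] = [];
let createCalls = 0;
const outcome = runUntilDone((ctx) => {
  const order = ctx.step("create-order", () => {
    createCalls++;
    return { id: "123" };
  });
  ctx.wait("cool-down");
  ctx.step("send-notification", () => `sent ${order.id}`);
  return { orderId: order.id, status: "completed" };
}, log);

console.log(outcome.invocations, createCalls, log);
```

In this run the handler is invoked twice, but the create-order callback runs only once: the second invocation finds its result in the checkpoint map and skips straight past the wait.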

Learn more: Understanding the Replay Model explains how replay works, why operations must be deterministic, and how to handle non-deterministic code safely.

Common Use Cases

Approval workflows - Wait for human approval before proceeding. The function pauses until someone clicks approve or reject.

Saga patterns - Coordinate distributed transactions with compensating actions. If a step fails, automatically roll back previous steps.

Scheduled tasks - Wait for specific times or intervals. Process data at midnight, send reminders after 24 hours, or retry every 5 minutes.

API orchestration - Call multiple APIs with retries and error handling. Coordinate responses and handle partial failures gracefully.

Data processing pipelines - Process large datasets in stages with checkpoints. Resume from the last successful stage if something fails.

Event-driven workflows - React to external events like webhooks, IoT signals, or user actions. Wait for events and continue processing when they arrive.
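As an illustration of the saga pattern from the list above, here is a minimal sketch of compensating actions in plain TypeScript. The SagaStep shape and runSaga helper are hypothetical and do not use the SDK; in a durable function you would presumably wrap each action and compensation in its own checkpointed step so a restart does not repeat them.

```typescript
// Hypothetical shape: each step pairs an action with the action that undoes it.
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Run steps in order; on failure, undo completed steps in reverse order.
function runSaga(steps: SagaStep[], log: string[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
      log.push(`done ${step.name}`);
    } catch {
      // Most recently completed step is compensated first.
      for (const done of completed.reverse()) {
        done.compensate();
        log.push(`undo ${done.name}`);
      }
      return false;
    }
  }
  return true;
}

const sagaLog: string[] = [];
const succeeded = runSaga(
  [
    { name: "reserve-inventory", action: () => {}, compensate: () => {} },
    { name: "charge-payment", action: () => {}, compensate: () => {} },
    {
      name: "book-shipping",
      action: () => {
        throw new Error("carrier unavailable");
      },
      compensate: () => {},
    },
  ],
  sagaLog
);
console.log(succeeded, sagaLog);
```

The failed step itself is not compensated, since it never completed; only the two steps before it are rolled back.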

Testing Your Workflows

Testing long-running workflows doesn't mean waiting hours. The Durable Execution SDK includes a testing library that runs your functions locally in milliseconds:

import { LocalDurableTestRunner } from '@aws/durable-execution-sdk-js-testing';

const runner = new LocalDurableTestRunner({
  handlerFunction: handler,
});

const execution = await runner.run();

expect(execution.getStatus()).toBe('SUCCEEDED');
expect(execution.getResult()).toEqual({ orderId: '123', status: 'completed' });

The test runner simulates checkpoints, skips time-based waits, and lets you inspect every operation. You can test callbacks, retries, and failures without deploying to AWS.
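Why can a test that "waits 24 hours" finish in milliseconds? One common way to get that behavior is to inject the wait so a test can record the duration instead of sleeping. The sketch below shows that general idea only; Waiter, SkippingWaiter, and reminderWorkflow are hypothetical names, and this is not a description of how LocalDurableTestRunner works internally.

```typescript
// Hypothetical abstraction over "pause for this long".
interface Waiter {
  wait(seconds: number): void;
}

// Test double: records the requested duration instead of sleeping.
class SkippingWaiter implements Waiter {
  skippedSeconds = 0;

  wait(seconds: number): void {
    this.skippedSeconds += seconds;
  }
}

// A workflow written against the abstraction, not a real timer.
function reminderWorkflow(waiter: Waiter): string[] {
  const events = ["order-created"];
  waiter.wait(24 * 60 * 60); // wait one day before the reminder
  events.push("reminder-sent");
  return events;
}

const waiter = new SkippingWaiter();
const started = Date.now();
const events = reminderWorkflow(waiter);
const elapsedMs = Date.now() - started;

console.log(events, waiter.skippedSeconds, elapsedMs);
```

The test can still assert on how long the workflow asked to wait, which catches a wrong duration without paying for it in wall-clock time.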

Learn more: Testing Durable Functions covers local testing, cloud integration tests, debugging techniques, and best practices.

Deploying with AWS SAM

Deploy durable functions using AWS SAM with a few key configurations:

Resources:
  OrderProcessorFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/order-processor
      Handler: index.handler
      Runtime: nodejs22.x
      DurableConfig:
        ExecutionTimeout: 900
        RetentionPeriodInDays: 7
    Metadata:
      BuildMethod: esbuild
      BuildProperties:
        EntryPoints:
          - index.ts

The DurableConfig property enables durable execution and sets the workflow timeout. SAM automatically handles IAM permissions for checkpointing and state management.

Learn more: Deploying Durable Functions with SAM covers template configuration, permissions, build settings, and deployment best practices.

When to Use Durable Functions

  • Your workflow spans multiple steps with waits or callbacks
  • You need automatic retries with exponential backoff
  • You want to coordinate multiple async operations
  • Your process requires human approval or external events
  • You need to handle long-running tasks without managing state
  • You prefer writing workflows as code rather than configuration

Getting Started

  1. Install the SDK: npm install @aws/durable-execution-sdk-js
  2. Write your function: Wrap your handler with withDurableExecution()
  3. Use durable operations: context.step(), context.wait(), context.waitForCallback()
  4. Test locally: Use LocalDurableTestRunner for fast iteration
  5. Deploy with SAM: Add DurableConfig to your template
  6. Monitor execution: Use Amazon CloudWatch and AWS X-Ray for observability
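Step 3 mentions context.waitForCallback(). The general pattern behind waiting for a callback is to hand an external system an ID and resume when that ID comes back. Here is a toy in-memory sketch of that pattern; CallbackRegistry and its methods are invented for illustration and are not the SDK's API or signature.

```typescript
type Decision = "approved" | "rejected";

// Hypothetical registry of workflows paused on an external decision.
class CallbackRegistry {
  private pending = new Map<string, (decision: Decision) => void>();
  private nextId = 1;

  // Register a continuation and return the ID the external system must echo.
  create(onResult: (decision: Decision) => void): string {
    const id = `cb-${this.nextId++}`;
    this.pending.set(id, onResult);
    return id;
  }

  // Called by the external system; false for unknown or already-used IDs.
  complete(id: string, decision: Decision): boolean {
    const resume = this.pending.get(id);
    if (!resume) return false;
    this.pending.delete(id);
    resume(decision);
    return true;
  }
}

const registry = new CallbackRegistry();
let orderStatus = "waiting-for-approval";

// The workflow pauses here; the ID would go out in an email or webhook.
const callbackId = registry.create((decision) => {
  orderStatus = decision === "approved" ? "shipped" : "cancelled";
});

const firstAccepted = registry.complete(callbackId, "approved");
const secondAccepted = registry.complete(callbackId, "rejected"); // ignored

console.log(orderStatus, firstAccepted, secondAccepted);
```

Rejecting the second completion is a deliberate choice in this sketch: an approval link clicked twice should not resume the workflow twice.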

Summary

AWS Lambda Durable Functions let you build long-running workflows without managing infrastructure. Write straightforward async code, and Lambda handles state management, retries, and resumption. Your functions can wait for callbacks, retry failures, and run for up to a year, all while you pay only for execution time.

Start with simple workflows, test locally for fast iteration, and deploy with confidence knowing Lambda manages the complexity of distributed state.
