JoLo for AWS Community Builders

Posted on • Edited on • Originally published at Medium

Do we need AWS Durable Functions when we have Step Functions?

At re:Invent 2025, AWS announced AWS Lambda Durable Functions, which enable multi-step applications and could replace Step Functions.
This got me interested because I don't like defining workflows in YAML or JSONata for AWS Step Functions. I do appreciate, though, the visual representation of each step and the ability to debug steps individually.

In this blog post, I want to compare Step Functions with AWS Lambda Durable Functions.

What are Durable Functions?

Durable Functions are Serverless building blocks that let ordinary code remember its place. They can sleep for hours, wait for external events, survive crashes and new deployments, then pick up exactly where they left off—without keeping a server warm or paying for idle time.

Cloud providers hide the complexity: each time the function yields, the runtime snapshots the call stack and state to disk (or a distributed log) and restores them on the next wake-up. You get the elasticity of functions with the reliability of long-running workflows.
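To make the "pick up where it left off" idea concrete, here is a minimal sketch of checkpoint-and-replay, one common way to get this behavior. It is a toy, not how any particular provider implements it: ReplayContext, the in-memory journal, and the step names are all made up for illustration, and a real runtime would persist the journal in durable storage.

```typescript
// Toy journal; a real runtime would keep this in durable storage, not in memory.
type Journal = Map<string, unknown>;

// Hypothetical context: runs each named step once and replays its result afterwards.
class ReplayContext {
  private journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  step<T>(name: string, fn: () => T): T {
    if (this.journal.has(name)) {
      // Replay: the step already ran, so return the recorded result.
      return this.journal.get(name) as T;
    }
    const result = fn();
    this.journal.set(name, result); // checkpoint
    return result;
  }
}

function workflow(ctx: ReplayContext, sendEmail: () => void, crashAfterEmail: boolean): string {
  const code = ctx.step('generate-code', () => '123456');
  ctx.step('send-email', () => {
    sendEmail();
    return true;
  });
  if (crashAfterEmail) {
    throw new Error('simulated crash');
  }
  return ctx.step('finish', () => `verified:${code}`);
}

function runDemo(): { result: string; emailsSent: number } {
  const journal: Journal = new Map();
  let emailsSent = 0;
  const sendEmail = () => {
    emailsSent += 1;
  };
  try {
    workflow(new ReplayContext(journal), sendEmail, true); // first attempt crashes
  } catch {
    // The journal survives the crash.
  }
  // Second attempt replays the journaled steps and only runs what is missing.
  const result = workflow(new ReplayContext(journal), sendEmail, false);
  return { result, emailsSent };
}
```

The workflow function runs twice from the top, but the email step only executes once, because the second run finds its result in the journal.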

Cloudflare, Azure, and Vercel all offer durable workflows, and companies like Temporal.io and Inngest have built their business around them. Trigger.dev is an open-source library for TypeScript apps (I am a fan 😇) that also offers a nice UI for them.

The Idea

Okay, back to the question: do we need Durable Functions when we have Step Functions?
To answer that, we will create a simple Magic Link confirmation.
The Idea - Magic Code Email.png
In an architectural view where I combine durable functions handler with Step Functions, it would look like this:
The Architecture - Combining Durable Functions and Step Functions.png
For this demo, I created an API Gateway that has two endpoints. One points to the durable function; the other points to Step Functions.
From the architecture, you can clearly see that the durable function's architecture is simpler. We will cover the part with the Send Task Token later.
You can find the repository here.

The following sections describe the implementation of each.
To do this, we will use AWS CDK in TypeScript.

Durable Function

The NodejsFunction construct has a new property, durableConfig.

new NodejsFunction(this, 'DurableHandler', {
  functionName: 'durable-handler',
  runtime: Runtime.NODEJS_24_X,
  entry: 'lambdas/durable/handler.ts',
  // 👇 this marks the handler to become durable
  durableConfig: {
    executionTimeout: Duration.minutes(15)
  }
})

In the AWS Console, it looks as follows:
AWS Durable Functions.png
For the Lambda handler, I used the npm package aws-durable-execution-js, which needs to be installed first.
Here is what it looks like while a function is running:
Durable Functions - AWS Console.gif
Here is the code for the durable function that waits for a callback:

import { checkValue } from '../shared/check-value';
// Imports for withDurableExecution, DurableLambdaHandler, EmailRequest, and sendEmail are omitted for brevity

export const handler: DurableLambdaHandler = withDurableExecution<EmailRequest>(
  async (event, context) => {
    console.log('durable-handler event', event)
    const { email } = event
    // Suspends the execution until the callback is completed or the timeout expires
    const result = await context.waitForCallback(
      'wait-for-users-input',
      // Submitter: receives the generated callbackId and hands it to the outside world
      async (callbackId, ctx) => {
        ctx.logger?.info(`Submitting callback ID to external service: ${callbackId}`);

        const emailResult = await sendEmail({
          email,
          taskToken: callbackId // <-- the callback ID has been generated at this point
        });
        ctx.logger?.info('DurableLambdaHandler result', emailResult)

        // <-- This is actually not the right place for the check, but we keep it for demo purposes
        const foundValue = await checkValue(email, emailResult.code)
        ctx.logger?.info('DurableLambdaHandler foundValue', foundValue)
      },
      {
        timeout: { minutes: 15 },
      }
    );
    console.log('durable-handler result', result)
    return {
      success: true,
      externalResult: result,
    };
  }
);


The function withDurableExecution makes the handler durable. context.waitForCallback then makes the function wait and passes a generated callbackId to the submitter function (the second argument). That ID then needs to reach the user, here via the email.
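The contract behind waitForCallback can be sketched without any AWS dependency. The CallbackRegistry class below is hypothetical and purely in-memory; the real runtime persists the pending callback and suspends the Lambda instead of holding a promise, so treat this only as an illustration of who gets the ID and who completes it.

```typescript
// Hypothetical in-memory stand-in for the runtime's callback bookkeeping.
type Pending = {
  resolve: (value: string) => void;
  reject: (reason: Error) => void;
};

class CallbackRegistry {
  private pending = new Map<string, Pending>();
  private nextId = 0;

  // Creates a callback ID, hands it to `submitter`, then waits until it is completed.
  async waitForCallback(submitter: (callbackId: string) => Promise<void>): Promise<string> {
    this.nextId += 1;
    const callbackId = `cb-${this.nextId}`;
    const completed = new Promise<string>((resolve, reject) => {
      this.pending.set(callbackId, { resolve, reject });
    });
    await submitter(callbackId);
    return completed;
  }

  // Called from "outside" (for example another handler) to resume the waiter.
  sendSuccess(callbackId: string, result: string): void {
    this.take(callbackId).resolve(result);
  }

  sendFailure(callbackId: string, message: string): void {
    this.take(callbackId).reject(new Error(message));
  }

  private take(callbackId: string): Pending {
    const entry = this.pending.get(callbackId);
    if (!entry) {
      throw new Error(`Unknown callback ID: ${callbackId}`);
    }
    this.pending.delete(callbackId);
    return entry;
  }
}

async function demoCallback(): Promise<string> {
  const registry = new CallbackRegistry();
  return registry.waitForCallback(async (callbackId) => {
    // Stands in for emailing the ID; the "user" answers a moment later.
    setTimeout(() => registry.sendSuccess(callbackId, 'code-accepted'), 10);
  });
}
```

The waiting side never sees the user directly: it only resumes once someone who holds the ID reports a success or a failure.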

Cannot pass another value to the Callback

However, we need another endpoint that checks the code against DynamoDB, because the durable function's inputs are fixed once it has started, and it only waits for the callbackId to be completed. That means I cannot pass the user input to the callback function.

Cannot use API Gateway directly

In the section above, I wanted to invoke the durable function directly from API Gateway. However, this doesn't work because API Gateway is not supported as an event source for durable functions. Frustrating, so I had to add a proxy Lambda that invokes the durable Lambda.
The Architecture - Corrected because invoker.png

Qualified ARN requirement

This is one of the most confusing parts, because neither aws-durable-execution-js nor the AWS CDK mentions that you need a qualified ARN to invoke the durable function. Even GitHub Copilot and the Context7 MCP server didn't have that information, so they hallucinated their responses. I had to look it up in the official AWS documentation.
In the end, I appended $LATEST to the ARN.
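For reference, a qualified ARN is just the function ARN with a version or alias as an extra trailing segment. A small helper like the following could make sure the invoker never passes an unqualified one. The function name toQualifiedArn and the example account ID are made up for this sketch.

```typescript
// Returns a qualified Lambda function ARN, appending `qualifier` when none is present.
// Expected shape: arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]
function toQualifiedArn(functionArn: string, qualifier: string = '$LATEST'): string {
  const parts = functionArn.split(':');
  const looksLikeFunctionArn =
    (parts.length === 7 || parts.length === 8) &&
    parts[0] === 'arn' &&
    parts[2] === 'lambda' &&
    parts[5] === 'function';
  if (!looksLikeFunctionArn) {
    throw new Error(`Not a Lambda function ARN: ${functionArn}`);
  }
  if (parts.length === 8) {
    return functionArn; // already has a version or alias
  }
  return `${functionArn}:${qualifier}`;
}

// Example with a made-up account ID:
const exampleArn = 'arn:aws:lambda:eu-central-1:123456789012:function:durable-handler';
const qualifiedExampleArn = toQualifiedArn(exampleArn);
```

The invoker can then pass the qualified value as the function name when it calls the durable handler.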

Not well-structured Documentation

The part of the documentation that adds a policy to the durable function creates a circular dependency, and CDK fails.

const durableFunction = new lambda.Function(this, 'DurableFunction', {
  runtime: lambda.Runtime.NODEJS_22_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  functionName: 'myDurableFunction',
  durableConfig: {
    executionTimeout: Duration.hours(1),
    retentionPeriod: Duration.days(30)
  }
});

// Add checkpoint permissions
// 👇 This creates a circular dependency
durableFunction.addToRolePolicy(new iam.PolicyStatement({
  actions: [
    'lambda:CheckpointDurableExecutions',
    'lambda:GetDurableExecutionState'
  ],
  resources: [durableFunction.functionArn]
}));

However, that’s exactly what the documentation suggested. In the end, I left it out because I didn’t want to use resumability or their step-based logic anyway. I’ll revisit this once I encounter a use case that actually requires it.

Another thing: to complete a waitForCallback, we need to send a SendDurableExecutionCallbackSuccessCommand or SendDurableExecutionCallbackFailureCommand via the Lambda API. Unfortunately, this isn't documented, but GitHub Copilot found it for me.
Furthermore, you can only use ESM here (which is good), and you need to bundle the latest @aws-sdk/client-lambda into the token handler, because the client-lambda version that ships with the Lambda runtime is not up to date (as of 26/12/2025). It is missing the SendDurableExecutionCallbackSuccess and SendDurableExecutionCallbackFailure APIs. Having the Lambda function bundle grow larger because of that is not delightful.

const callbackHandler = new NodejsFunction(this, 'CallbackHandler', {
  functionName: 'callback-handler',
  runtime: Runtime.NODEJS_24_X,
  entry: 'lambdas/shared/callback-handler.ts',
  bundling: {
    bundleAwsSDK: true, // <-- This has to be set, otherwise you use an older SDK version
  },
});

callbackHandler.addToRolePolicy(new PolicyStatement({
  actions: [
    'lambda:SendDurableExecutionCallbackSuccess',
    'lambda:SendDurableExecutionCallbackFailure'
  ],
  resources: [`${durableHandler.functionArn}:*`, durableHandler.functionArn]
}))

Confusing AWS Console

I don't understand why the console shows the steps that come after the callback as already completed, even though the function is still running.
Confusing AWS Console Output for Durable Functions.gif

Step Functions

For AWS Step Functions, I implemented the workflow with a Lambda function. I wanted to use Step Functions without Lambda, but I couldn't validate the user's input while the workflow was waiting: the user sends a code, and I need to pass it to the state machine while it waits. That is why this side also ends up with a single Lambda.

import { SFNClient, SendTaskSuccessCommand } from '@aws-sdk/client-sfn';

const sfn = new SFNClient({});

// Assumed minimal event shape (the original Event type is not shown)
type Event = { body: string | Record<string, string> | null };

export const handler = async (event: Event) => {
  // -------------------------------------------------
  // 1️⃣ Parse incoming request
  // -------------------------------------------------
  console.log(event);
  const payload = typeof event.body === 'string' ? JSON.parse(event.body) : event.body;
  const { token, code } = payload ?? {};

  if (!token || !code) {
    return { statusCode: 400, body: JSON.stringify({ error: 'token and code required' }) };
  }

  // -------------------------------------------------
  // 2️⃣ Tell Step Functions the task succeeded
  // -------------------------------------------------
  const cmd = new SendTaskSuccessCommand({
    taskToken: token,
    output: JSON.stringify({ verification: 'ok' }), // any payload you want the state machine to receive
  });

  await sfn.send(cmd);
  return { statusCode: 200, body: JSON.stringify({ message: 'Verification accepted' }) };
};

Here is the implementation in AWS CDK:

    const choiceStep = new Choice(this, 'IsCodeCorrect?')
      .when(Condition.stringEquals(JsonPath.stringAt('$.verification'), 'ok'), new Succeed(this, 'Success'))
      .otherwise(new Fail(this, 'IncorrectCode', {
        cause: 'Verification code did not match',
        error: 'IncorrectCode'
      }));

    const stepSendEmail = new NodejsFunction(this, 'StepSendEmail', {
      functionName: 'step-send-email',
      runtime: Runtime.NODEJS_24_X,
      entry: 'lambdas/step-functions/send-email.ts',
      bundling: {
        format: OutputFormat.ESM,
        minify: true,
      },
      environment: {
        TABLE_NAME: stateTable.tableName,
        SENDER_EMAIL: senderEmail
      },
      timeout: Duration.seconds(30)
    });
    stateTable.grantWriteData(stepSendEmail);
    emailIdentity.grantSendEmail(stepSendEmail);

    const sendEmailStep = new LambdaInvoke(this, 'SendEmailStep', {
      lambdaFunction: stepSendEmail,
      integrationPattern: IntegrationPattern.WAIT_FOR_TASK_TOKEN,
      payload:
        TaskInput.fromObject({
          email: JsonPath.stringAt('$.email'),
          taskToken: JsonPath.taskToken,
        })
    });

    // Define the state machine with task token pattern
    const definition = sendEmailStep
      .next(choiceStep); // Check if the received code matches the original code

    const stateMachine = new StateMachine(this, 'EmailStateMachine', {
      stateMachineName: 'EmailProcessingWorkflow',
      definitionBody: DefinitionBody.fromChainable(definition),
      timeout: Duration.minutes(15) // Match task token timeout
    });



To send the task token back, we need a proxy Lambda, just as with the durable function.
The Architecture - Corrected for the Step Functions.png
To wait for human interaction, we need the callback pattern with a task token. Initially, I only sent a success message, but I also need to include the user's input here. That is not possible because the state machine is already running and its inputs were fixed when it started.

So I added a task-token-sender handler endpoint to the API Gateway that sends a success or a failure depending on whether the user input is correct.
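The decision that handler has to make can be kept separate from the AWS calls. Below is a minimal sketch, assuming the expected code has already been loaded (for example from the DynamoDB table); TaskOutcome and decideTaskOutcome are names invented for this illustration. The output and error values mirror the ones used in the state machine above.

```typescript
// Hypothetical result type: what the token-sender handler should report back.
type TaskOutcome =
  | { kind: 'success'; output: string }
  | { kind: 'failure'; error: string; cause: string };

// Compares the code the user submitted with the one stored when the email was sent.
function decideTaskOutcome(expectedCode: string, submittedCode: string): TaskOutcome {
  if (submittedCode.trim() === expectedCode) {
    // Matches the Choice state, which checks $.verification === 'ok'
    return { kind: 'success', output: JSON.stringify({ verification: 'ok' }) };
  }
  return {
    kind: 'failure',
    error: 'IncorrectCode',
    cause: 'Verification code did not match',
  };
}
```

The handler would then map a success to SendTaskSuccessCommand (taskToken and output) and a failure to SendTaskFailureCommand (taskToken, error, and cause), both from @aws-sdk/client-sfn.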

Conclusion

It makes sense that AWS introduced durable functions, given that Cloudflare, Azure, and Vercel had already done so. The timing also fits the rise of AI agents, which need functions that can wait for user input without losing context. Wes from Syntax.fm even predicted that 2026 will be the year of durable functions [1].
The durable function is definitely simple to implement infrastructure-wise, but it requires an additional package, aws-durable-execution-js. Luckily, the handler won't grow large because the dependencies are limited to the Lambda client.
It is also interesting that the Lambda paradigm, which used to cap a run at 15 minutes of compute, has shifted to executions that last days or longer.

Both durable functions and Step Functions require a proxy to send either a success or a failure, and neither can pass new input to the waiting function. Step Functions requires much more configuration (see the CDK code above), and waiting for the callback with the task token gave me a headache.

However, AWS Lambda Durable Functions cannot be invoked directly from API Gateway and require an invoker in between, which took me a while to figure out. I also need a qualified ARN (simply the ARN with the Lambda version or alias at the end) to invoke the durable handler, which neither the aws-durable-execution-js nor the CDK documentation mentions. The documentation for aws-durable-execution-js also does not explain how to complete a waitForCallback, nor that it requires the Lambda SDK to send either a success or a failure (AI and MCP weren't aware of it either).
The output in the AWS Console is also confusing: it shows steps as already completed even though the function is still running.

Cost-wise, you won't be charged for idle time while the function waits.

Even with some documentation issues and odd AWS Console output, I will use AWS Lambda Durable Functions more, as they require me to structure my code better. For my little idea, I won't use Lambda's waitForCallback yet; I'll use Step Functions, or even Trigger.dev or Temporal.io, as they are much more mature in that regard.

I guess Wes is right and 2026 may be the year of Durable Functions.


  1. https://syntax.fm/show/967/what-s-going-to-happen-in-web-dev-during-2026 
