

AWS Lambda Durable Functions vs Step Functions: a real-world comparison

Hey devs, I recently built the same order dispatch workflow twice, once with AWS Step Functions and once with AWS Lambda durable functions. The difference in developer experience was significant. Let me walk you through what I learned and why I decided to do this.

AWS Lambda durable functions are relatively new to the AWS ecosystem, so deciding whether to use them is not always straightforward.

🎯 The Problem: A Real-World Order Workflow

I needed to build a simple workflow for handling an order:

  1. Store the order in DynamoDB
  2. Check inventory
  3. Wait for human approval
  4. Automatically find alternatives if rejected or complementary items if confirmed
  5. Wait 2 days
  6. Generate an email with Bedrock (with alternatives or complementary items, based on confirmation or rejection)
  7. Send it via SES

This is a real-world scenario (and also something I needed in production): human-in-the-loop, approvals, timers, and external service integrations. In reality, it’s a bit more complex than that, but for the sake of discussion, we can focus on this key question: should I choose Step Functions or Lambda durable functions?

⚔ Which one should I choose?

The choice framework proposed by AWS is a good start:

Go with Lambda durable functions if:

  • You prefer using your familiar programming language
  • Local testing without cloud dependencies is important to you
  • Your compute service of choice is AWS Lambda and your business logic primarily lives in those functions

Stick with Step Functions if:

  • Visual workflows are important for your stakeholders
  • You're orchestrating many AWS services together
  • You want to reduce ops burden (patching, scaling, etc.)

I wasn’t sure what to choose, as my goal is always to maximize Developer Experience (DevEx) and maintainability. I don’t necessarily need a fully visual workflow, but one of the business requirements is ensuring seamless integration with other AWS services and upcoming workflows. I’m also a big fan of Step Functions when it comes to CDK-based projects. On the other hand, I’m really drawn to the simplicity of Lambda durable functions.

🆚 Code comparison

So, this was the perfect scenario to explore both solutions. The workflow is clear and simple enough that it won’t take much time to build them in parallel, allowing for a real-world comparison.

Let’s look at the actual code: this is where the differences become clear.

A TypeScript function using Lambda durable functions

Why does this perfectly suit my scenario?

It’s a single TypeScript function; there’s no CDK involved in implementing the workflow. CDK is used only to create the infrastructure (a single Lambda), cleanly separating my workflow logic from the infrastructure code.

Using async/await feels very natural for a developer working in this environment, and I can encapsulate the entire workflow within a single function, giving me one clear place to understand what’s going on. On top of that, I get full IDE support with autocomplete and type checking (and AI, have you tried Kiro yet?)

Let’s take a look at the code, starting with the imports.
I’ll remove all the pure business logic, as we should focus on the workflow itself (and I can’t disclose my client’s code!).

import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';

With the import in place, we wrap the handler in withDurableExecution.

export const handler = withDurableExecution(async (event: OrderEvent, context: DurableContext) => {

});

We can now move on to our workflow steps.
Let’s start by saving the order (in my real-world scenario I saved it to DynamoDB using the AWS SDK) with a first async step.

  // Step 1: Save order
  const order = await context.step('save-order', async (): Promise<OrderData> => {
    const orderId = `ORD-${Date.now()}`;
    return {
      orderId,
      buyerEmail: event.buyerEmail,
      items: event.items,
    };
  });

The second step waits for the first one to complete, then checks the inventory and provides the information needed to confirm or reject the order. An interesting aspect is that we can use a logger to record each step’s response, and the async/await pattern really helps us understand what happens in sequence.

  // Step 2: Check inventory
  const availability = await context.step('check-inventory', async () => {
    return event.items.map(item => ({
      itemId: item.itemId,
      available: 100,
      inStock: true,
    }));
  });

  context.logger.info('Inventory checked', { availability });

The third step handles human approval using the waitForCallback function. At this stage, in my real-world scenario, an email is sent to the approver (I used SES, but you could just as easily use an SNS topic or any other notification system). However, that’s part of the business logic, which I won’t go into here.

  // Step 3: Wait for human approval (up to 48h, no compute cost)
  const approval = await context.waitForCallback(
    'wait-for-approval',
    async (callbackId) => {
      context.logger.info('Waiting for approval', { callbackId, orderId: order.orderId });
    },
    { timeout: { hours: 48 } }
  );
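The submitter callback above is where the approver notification belongs. Here is a minimal, self-contained sketch of that idea (my own illustration, not the production code; `notifyApprover` is a hypothetical stand-in for the SES SendEmail call):

```typescript
// Hedged sketch: the submitter passed to waitForCallback is the natural
// place to hand the callbackId to the approver, e.g. via an approve/reject
// link in an email. SES is replaced by a hypothetical notifyApprover stub
// so this snippet stays runnable on its own.
type Notification = { to: string; callbackId: string };
const sent: Notification[] = [];

// Stand-in for an SES SendEmail call.
async function notifyApprover(to: string, callbackId: string): Promise<void> {
  sent.push({ to, callbackId });
}

// Shape of the submitter you would pass as waitForCallback's second argument.
async function submitApprovalRequest(callbackId: string): Promise<void> {
  await notifyApprover('approver@example.com', callbackId);
}

void submitApprovalRequest('cb-123');
```

Whatever channel you choose, the key point is that the callbackId must reach the human (or system) that will later complete the callback.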

Once the decision is received, we branch: if the order is rejected, we look for suitable alternatives (I retrieved them from DynamoDB, but you can use any database you prefer); if it is accepted, we instead search for complementary items to recommend to the user.
For simplicity, error handling is omitted here, but in a production scenario this should be wrapped in a try/catch block and handled properly.

  // Step 4: Handle approval or rejection
  const status = JSON.parse(approval).decision;
  let suggestedItems;
  if (status === 'discard') {
    suggestedItems = await context.step('find-similar', async () => {
      return [{ itemId: 'SIM-1', name: 'Similar Item' }];
    });
  } else {
    suggestedItems = await context.step('find-complementary', async () => {
      return [{ itemId: 'COM-1', name: 'Complementary Item' }];
    });
  }
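As a minimal sketch of the try/catch mentioned above, assuming the SDK surfaces an expired 48h callback as a thrown error (the wait is stubbed with a synchronous thunk here; in the real handler it would be the awaited waitForCallback call):

```typescript
// Hedged sketch: fall back to a safe default decision instead of failing
// the whole durable execution when the approval payload is missing or bad.
function getDecision(fetchApproval: () => string): string {
  try {
    return JSON.parse(fetchApproval()).decision;
  } catch {
    // Treat a timeout (or malformed payload) as a rejection.
    return 'discard';
  }
}

// Timeout path: the stub throws, so we get the fallback decision.
const onTimeout = getDecision(() => { throw new Error('callback timed out'); });
// Happy path: a well-formed payload parses normally.
const onConfirm = getDecision(() => JSON.stringify({ decision: 'confirm' }));
```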

The business then decided to pause the workflow for at least two days, as they don’t want to bother the user with marketing emails immediately after an order is accepted or rejected. This is a good opportunity to see a wait in action.

  // Step 5: Wait 2 days before marketing follow-up
  await context.wait('wait-two-days', { days: 2 });

Once the wait is over, I generate the email via Amazon Bedrock, using information from previous steps or the database. In practice, I personalize the message based on the approval decision, either suggesting similar products for rejected orders or recommending complementary items for confirmed ones.

  // Step 6: Generate marketing email via Bedrock
  const email = await context.step('generate-email', async () => {
    return 'This is where the email would be generated';
  });

Finally, we send the generated email and close the function, returning the order, status, and suggested items.

  // Step 7: Send email
  await context.step('send-email', async () => {
    context.logger.info('Sending email', { email, to: event.buyerEmail });
  });

  return { orderId: order.orderId, status, suggested: suggestedItems };

The approach with Step Functions

Here's the same workflow implemented using Step Functions and CDK.

First, we need to set up all the required Lambda functions. Doesn’t that feel a bit odd? We want to define the workflow, yet we’re forced to create every individual function before we’ve even written the workflow itself.

export class OrderDispatchStepFunctionStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // =========================
    // Setup Lambda Functions
    // =========================

    const saveOrderFunction = new lambda.Function(this, 'SaveOrderFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'save-order.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const checkInventoryFunction = new lambda.Function(this, 'CheckInventoryFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'check-inventory.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const findSimilarItemsFunction = new lambda.Function(this, 'FindSimilarItemsFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'find-similar-items.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const findComplementaryItemsFunction = new lambda.Function(this, 'FindComplementaryItemsFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'find-complementary-items.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const generateEmailFunction = new lambda.Function(this, 'GenerateEmailFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'generate-email.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const sendEmailFunction = new lambda.Function(this, 'SendEmailFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'send-email.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

I’ve omitted the permission setup here to keep things simple. Just remember that each Lambda function still needs the appropriate IAM permissions to access the required AWS services, which is yet more boilerplate.

Finally, we can define our tasks. Again, this feels mostly like boilerplate: just a way to wrap each individual Lambda function.


    // =========================
    // Step Function Tasks
    // =========================

    const saveOrderTask = new tasks.LambdaInvoke(this, 'SaveOrder', {
      lambdaFunction: saveOrderFunction,
      outputPath: '$.Payload',
    });

    const checkInventoryTask = new tasks.LambdaInvoke(this, 'CheckInventory', {
      lambdaFunction: checkInventoryFunction,
      outputPath: '$.Payload',
    });

    // Note: approvalTopic (an SNS topic) is assumed to be defined earlier in the stack
    const sendApprovalNotification = new tasks.SnsPublish(this, 'SendApprovalNotification', {
      topic: approvalTopic,
      message: sfn.TaskInput.fromJsonPathAt('$'),
    });

    // Simplified: a fixed wait stands in for the task-token callback pattern
    // you would normally use for a real human approval
    const waitForApproval = new sfn.Wait(this, 'WaitForHumanApproval', {
      time: sfn.WaitTime.duration(cdk.Duration.minutes(5)),
    });

    const findSimilarTask = new tasks.LambdaInvoke(this, 'FindSimilarItems', {
      lambdaFunction: findSimilarItemsFunction,
      outputPath: '$.Payload',
    });

    const findComplementaryTask = new tasks.LambdaInvoke(this, 'FindComplementaryItems', {
      lambdaFunction: findComplementaryItemsFunction,
      outputPath: '$.Payload',
    });

    const waitTwoDays = new sfn.Wait(this, 'WaitTwoDays', {
      time: sfn.WaitTime.duration(cdk.Duration.days(2)),
    });

    const generateEmailTask = new tasks.LambdaInvoke(this, 'GenerateEmail', {
      lambdaFunction: generateEmailFunction,
      outputPath: '$.Payload',
    });

    const sendEmailTask = new tasks.LambdaInvoke(this, 'SendEmail', {
      lambdaFunction: sendEmailFunction,
      outputPath: '$.Payload',
    });

Here we introduce a bit of workflow logic, mainly to define how the approval step should be handled.


    // =========================
    // Approval
    // =========================

    const approvalChoice = new sfn.Choice(this, 'ApprovalDecision');

    const rejectedFlow = findSimilarTask.next(
      new sfn.Pass(this, 'OrderRejected', {
        result: sfn.Result.fromObject({ status: 'rejected' }),
        resultPath: '$.orderStatus',
      })
    );

    const confirmedFlow = findComplementaryTask.next(
      new sfn.Pass(this, 'OrderAccepted', {
        result: sfn.Result.fromObject({ status: 'accepted' }),
        resultPath: '$.orderStatus',
      })
    );

    approvalChoice
      .when(sfn.Condition.stringEquals('$.decision', 'confirm'), confirmedFlow)
      .when(sfn.Condition.stringEquals('$.decision', 'discard'), rejectedFlow)
      .otherwise(rejectedFlow);

And now, we bring it all together into the final, straightforward workflow.

    // =========================
    // Final Workflow
    // =========================

    const definition = saveOrderTask
      .next(checkInventoryTask)
      .next(sendApprovalNotification)
      .next(waitForApproval)
      .next(approvalChoice)
      .next(waitTwoDays)
      .next(generateEmailTask)
      .next(sendEmailTask);

    const stateMachine = new sfn.StateMachine(this, 'OrderDispatchStateMachine', {
      definition,
      timeout: cdk.Duration.days(7),
    });
  }
}

The first thing I immediately notice is the amount of boilerplate required just to prepare each Lambda, before even starting to think about the workflow itself. And this is only the orchestration part: the business logic still has to be implemented inside the Lambda functions.

I now have a lot of separate Lambda functions to maintain, and this has always been my main concern with Step Functions: the workflow logic still ends up mixed with pure infrastructure code.

I have an almost Shakespearean dilemma that keeps me up at night: is the workflow a matter of architecture or business?

The business logic is spread across multiple files, and while that’s perfectly fine (separation of concerns is still a best practice, and you should apply it with Lambda durable functions too), it makes the workflow much harder to understand: you lose the ability to see the entire flow at a glance, and reading it properly often requires a certain level of expertise, at least with ASL.
You can see the flow in the CDK code, but there it’s mixed up with pure architecture code.

And do you know what happens when Step Functions doesn’t support something I need? I end up writing that logic directly inside the Lambdas.

This happens when integrating new services that aren’t supported by Step Functions, implementing complex data transformation logic, handling advanced catch/retry scenarios beyond what the service offers, or simply when something is difficult to express in ASL but straightforward to implement in code inside a Lambda function.
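As a toy illustration (my own example, not part of the original workflow), here is a transformation that is one chained expression in TypeScript but clumsy to express with ASL intrinsic functions:

```typescript
// Illustrative only: filter, reshape and join in one expression.
// Doing this in ASL means chaining intrinsic functions or, more
// realistically, adding another Lambda-backed step just for this.
const stock = [
  { itemId: 'ITM-1', qty: 2 },
  { itemId: 'ITM-2', qty: 0 },
  { itemId: 'ITM-3', qty: 5 },
];

// Comma-separated list of the item IDs that are actually in stock.
const inStockIds = stock
  .filter(i => i.qty > 0)
  .map(i => i.itemId)
  .join(',');
```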

Basically I create another step with a Lambda to do this work.

In doing so, I lose all the benefits I chose Step Functions for in the first place: separation of concerns, clear workflow visibility, and predictable orchestration. Basically, everything that made it the right choice to begin with.

Want to see the CDK needed for the durable function?

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class DurableFunctionStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create the durable function
    const durableFunction = new lambda.Function(this, 'DurableFunction', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda'),
      durableConfig: { executionTimeout: cdk.Duration.hours(1), retentionPeriod: cdk.Duration.days(30) },
    });

    // Create version and alias
    const version = durableFunction.currentVersion;
    const alias = new lambda.Alias(this, 'ProdAlias', {
      aliasName: 'prod',
      version: version,
    });

  }
}

And this is pure architecture: no workflow logic mixed in. It can live alongside other core architecture components, like DynamoDB, S3, IAM permissions, and so on. Here is a full working architecture sample.

import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';
import * as path from 'path';

export class OrderDispatchDurableStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // DynamoDB Tables
    const ordersTable = new dynamodb.Table(this, 'OrdersTable', {
      partitionKey: { name: 'orderId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    const inventoryTable = new dynamodb.Table(this, 'InventoryTable', {
      partitionKey: { name: 'itemId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    inventoryTable.addGlobalSecondaryIndex({
      indexName: 'CategoryIndex',
      partitionKey: { name: 'category', type: dynamodb.AttributeType.STRING },
    });

    // Log Group for Durable Function
    const orchestratorLogGroup = new logs.LogGroup(this, 'OrchestratorLogGroup', {
      logGroupName: '/aws/lambda/order-dispatch-orchestrator',
      retention: logs.RetentionDays.ONE_WEEK,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // Main Orchestrator Durable Function
    const orchestratorFunction = new lambda.Function(this, 'OrchestratorFunction', {
      runtime: lambda.Runtime.NODEJS_24_X,
      handler: 'orchestrator.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
      timeout: cdk.Duration.minutes(15),
      logGroup: orchestratorLogGroup,
      durableConfig: {
        executionTimeout: cdk.Duration.days(7),
        retentionPeriod: cdk.Duration.days(7),
      },
      environment: {
        ORDERS_TABLE: ordersTable.tableName,
        INVENTORY_TABLE: inventoryTable.tableName,
        SENDER_EMAIL: process.env.SENDER_EMAIL || 'noreply@example.com',
      },
    });

    ordersTable.grantReadWriteData(orchestratorFunction);
    inventoryTable.grantReadData(orchestratorFunction);

    orchestratorFunction.addToRolePolicy(new iam.PolicyStatement({
      actions: ['bedrock:InvokeModel'],
      resources: ['*'],
    }));

    orchestratorFunction.addToRolePolicy(new iam.PolicyStatement({
      actions: ['ses:SendEmail', 'ses:SendRawEmail'],
      resources: ['*'],
    }));

    // Create version and alias
    const version = orchestratorFunction.currentVersion;
    const alias = new lambda.Alias(this, 'ProdAlias', {
      aliasName: 'prod',
      version: version,
    });

    // Outputs
    new cdk.CfnOutput(this, 'OrchestratorFunctionArn', {
      value: alias.functionArn,
      description: 'Use this qualified ARN to invoke the durable function',
    });
    new cdk.CfnOutput(this, 'OrdersTableName', {
      value: ordersTable.tableName,
    });
    new cdk.CfnOutput(this, 'InventoryTableName', {
      value: inventoryTable.tableName,
    });
  }
}

I love it. This is just architectural code. The actual workflow logic isn’t included here.

🧪 A crucial point in DevEx: testing

Ok, let's try to go deeper. I've written the code for both solutions, and both come with trade-offs. Now I should test them before even thinking about deploying.

This is where Lambda durable functions really stand out for me: testing feels as straightforward as using Node's test runner, Jest, or any other framework we’re already familiar with.

Testing durable functions locally with Node

Having just a single function is a big advantage because there’s no AWS infrastructure involved, and no need to mock Step Functions. You simply write your tests and run npm test.

Let’s start by importing the necessary libraries.

import { LocalDurableTestRunner, WaitingOperationStatus } from '@aws/durable-execution-sdk-js-testing';
import { OperationType, OperationStatus } from '@aws-sdk/client-lambda';
import { handler } from '../orchestrator';

Then we can create the test suite by:

  • initializing the test environment using the setupTestEnvironment function and passing skipTime: true in Jest’s beforeAll hook.
  • tearing down the test environment using teardownTestEnvironment in Jest’s afterAll hook.

describe('Order Dispatch Durable Function', () => {
  beforeAll(async () => {
    await LocalDurableTestRunner.setupTestEnvironment({ skipTime: true });
  });

  afterAll(async () => {
    await LocalDurableTestRunner.teardownTestEnvironment();
  });

We are now ready to set up our test. In this case, the goal is to complete the workflow with a confirmation. Let’s define the runner and connect it to the imported handler.


  it('should execute complete workflow with approval', async () => {
    const runner = new LocalDurableTestRunner({
      handlerFunction: handler,
    });

Next, we define our orderEvent (i.e., the incoming order).

    const orderEvent = {
      buyerEmail: 'customer@example.com',
      items: [{ itemId: 'ITM-1', quantity: 2 }],
    };

We can now start the execution on the runner, passing our orderEvent. The returned promise will resolve once the workflow completes.

    // Start execution (will pause at callback)
    const executionPromise = runner.run({ payload: orderEvent });

Since we have a human in the loop to simulate, we can use runner.getOperation to get the callback operation and wait until it is STARTED. Then we submit our decision with sendCallbackSuccess and wait for it to be COMPLETED.

    // Get callback operation and wait for it to be ready
    const callbackOp = runner.getOperation('wait-for-approval');
    await callbackOp.waitForData(WaitingOperationStatus.STARTED);

    // Send approval callback
    await callbackOp.sendCallbackSuccess(JSON.stringify({ 'decision': 'confirm' }));

    await callbackOp.waitForData(WaitingOperationStatus.COMPLETED);

Finally, we wait for the execution to finish and verify the expected outcome (in this case, that the order is confirmed).

    // Wait for execution to complete
    const execution = await executionPromise;

    // Verify execution succeeded
    expect(execution.getStatus()).toBe('SUCCEEDED');

    const result = execution.getResult();
    expect(result.orderId).toMatch(/^ORD-/);
    expect(result.status).toBe('confirm');

To complete our test, we can also verify that the other steps executed successfully using runner.getOperation.

    // Verify operations executed
    const saveOrder = runner.getOperation('save-order');
    expect(saveOrder.getType()).toBe(OperationType.STEP);
    expect(saveOrder.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const checkInventory = runner.getOperation('check-inventory');
    expect(checkInventory.getType()).toBe(OperationType.STEP);
    expect(checkInventory.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const waitTwoDays = runner.getOperation('wait-two-days');
    expect(waitTwoDays.getType()).toBe(OperationType.WAIT);
    expect(waitTwoDays.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const generateEmail = runner.getOperation('generate-email');
    expect(generateEmail.getType()).toBe(OperationType.STEP);
    expect(generateEmail.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const sendEmail = runner.getOperation('send-email');
    expect(sendEmail.getType()).toBe(OperationType.STEP);
    expect(sendEmail.getStatus()).toBe(OperationStatus.SUCCEEDED);
  });
});


In the end, writing and testing a durable workflow like this is surprisingly simple and enjoyable. With just a single function and the local test runner, you don’t have to deal with complex AWS infrastructure or mocking, and the code remains clear and easy to follow. It’s genuinely satisfying to see the entire workflow execute and verify each step with minimal setup.

Local testing has always been challenging for Step Functions

With Step Functions, by contrast, we have a few options:

  • Remotely, by deploying to AWS and testing against real infrastructure.
  • Locally, using frameworks or tools that simulate Step Functions.
  • Unit testing, by testing each Lambda function individually.

However, even with these approaches, we’re still missing proper end-to-end testing of the entire workflow, which is often the most critical part to validate.

# Option 1: Deploy to AWS and test remotely (slow, costs money)
aws stepfunctions start-execution --state-machine-arn arn:aws:...

# Option 2: Use Step Functions Local (limited, requires Docker)
docker run -p 8083:8083 amazon/aws-stepfunctions-local
# Still need to mock all Lambda functions
# Still need to mock DynamoDB, SNS, SES...

# Option 3: Unit test each Lambda separately
# But you can't test the workflow orchestration!

All these options make rapid iteration much harder. Only the first approach truly gives me confidence that I’ve tested the workflow end-to-end. But this forces me to mentally switch between deployment and testing, which breaks flow and slows down development. For me, that kind of friction is a productivity killer.

There is a clear winner for me here, and it's not Step Functions, even though its testing story keeps improving thanks to the AWS folks.

💻 What bothers devs: after-deploy ops

Both the business and, sometimes, we developers tend to underestimate the importance of day-to-day operations after the first deployment. Production environments involve change requests, debugging, fixes, and monitoring.

Here’s what the daily development workflow looks like with each tool in this phase.

Durable Functions

What's going on if I receive a change request?

  1. Edit my function
  2. Write/update test
  3. Run npm test (1 second to iterate)
  4. Deploy with cdk deploy (fast, since the drift to release is a single function)
  5. Only then invoke the function to make sure everything is OK.

What will I do if I need to debug anything?

  1. Check CloudWatch logs in the console (or via the CLI): everything lives in a single log group, and anyone using CloudWatch knows the nightmare of navigating multiple log groups. The Lambda durable functions console also surfaces logs and execution details without jumping to CloudWatch, letting you focus on your single Lambda.
  2. See complete execution flow in one place
  3. Replay the execution locally with tests and see what's broken
  4. Fix the bug, run test, deploy.

What if I need to onboard another developer, regardless of seniority?

  1. Share the single function and walk the developer through the code
  2. They will probably understand it quickly, since it’s just a single function.
  3. Hopefully they can contribute within hours: test locally and open a PR containing the full micro-service workflow logic without changing a line of architecture code.

Moreover, now that we live in the era of AI coding assistants, AI powered IDEs, and autonomous agents for coding, it has never been easier to onboard new developers. Providing them with precise, focused context around a single Lambda function is undoubtedly one of the most effective ways to get them productive in a very short time.

I wouldn’t be surprised if we soon see dedicated Kiro Power-Ups and SOPs for Durable Functions in the AWS MCP ecosystem.

Step Functions

Let's look at the same dev scenarios with Step Functions.

What's going on if I need to implement a change request?

  1. Modify the state machine definition in CDK (which ultimately compiles down to ASL)
  2. Modify one or multiple Lambda functions
  3. No quick way to test locally (or you need a dedicated setup, and so do your teammates)
  4. Deploy with cdk deploy (2-3 minutes, as the drift is much more than a single Lambda)
  5. Test manually in the AWS Console to verify the implementation, then go back to the code if anything isn't right (oh my..)

And if I catch an error and need to debug it?

  1. Again open Step Functions execution in AWS Console
  2. Click through each state to see input/output
  3. Open CloudWatch logs for relevant Lambda
  4. Correlate timestamps across services
  5. Maybe use X-Ray for tracing
  6. Fix the bug, redeploy, restart again.

Don't get me wrong, that’s perfectly fine. I genuinely like Step Functions because, despite those "velocity" trade-offs in DevEx, they enforce a proper orchestration of distributed systems. They also provide a clear visualization of the workflow in the console, make it easy to catch errors at the failing state, and help you understand what’s happening inside complex orchestrations, ultimately simplifying what would otherwise be a very intricate system.

But what if I need to onboard someone who isn’t an AWS expert and isn’t very familiar with Step Functions or workflow architecture in general?

I’d have to:

  1. Introduce Amazon States Language
  2. Explain each Lambda function
  3. Walk through the CDK stack

This means that a developer would typically become productive only after a few days, and it really depends on their seniority and prior experience with AWS. Trust me, this can easily become a waste of time and a nightmare, both from the mentor’s perspective and the learner’s.

From a DevEx perspective, Lambda durable functions are a major step forward.

🤔 So when do Step Functions still make sense?

However, Lambda durable functions won’t always be the right answer.
Step Functions has genuine advantages in two main cases.

Visual workflows matter

One of the biggest advantages of Step Functions is that stakeholders can see the workflow visually. This visual representation is not just a "nice-to-have": it’s crucial for stakeholder demos, where non-technical team members can quickly understand the workflow and see how processes progress.

It also simplifies compliance reviews as auditors can trace exactly what happens at each step without digging through code.

What about operations monitoring? DevOps and support teams can spot failures or bottlenecks immediately, understand dependencies between steps, and react faster (sometimes without having access to the code itself).

In short, having a clear, visual workflow turns complex orchestration into something everyone can comprehend, communicate about, and trust.

Native AWS Service Integration

Step Functions has native integrations with 200+ AWS services:

// Directly invoke services without Lambda
new tasks.DynamoPutItem(this, 'SaveOrder', {
  table: ordersTable,
  item: { ... }
});

new tasks.SqsSendMessage(this, 'QueueOrder', {
  queue: orderQueue,
  messageBody: sfn.TaskInput.fromObject({ ... })
});

new tasks.EcsRunTask(this, 'ProcessOrder', {
  cluster: ecsCluster,
  taskDefinition: orderProcessor
});

Using CDK, you get ready-made task constructs for basically every AWS service.

For AWS service orchestrations, this is actually pretty clean.
If you want to implement the same thing in code with a Lambda durable function, you obviously need to go through the SDK, and that logic becomes part of your business layer.
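A minimal sketch of what that looks like (my own illustration; the in-memory `putItem` is a hypothetical stand-in for a DynamoDB DocumentClient call):

```typescript
// Hedged sketch: the "save order" call done in code, as it would look
// inside a context.step(...) body. The DynamoDB document client is
// replaced by a hypothetical in-memory table so the snippet is runnable.
type Item = Record<string, unknown>;
const ordersTable = new Map<string, Item>();

// Stand-in for DynamoDBDocumentClient.send(new PutCommand({ ... })).
async function putItem(key: string, item: Item): Promise<void> {
  ordersTable.set(key, item);
}

// This would be the body of context.step('save-order', ...): the SDK call
// lives next to the orchestration, in the same language and the same file.
async function saveOrder(orderId: string): Promise<Item> {
  const item = { orderId, status: 'created' };
  await putItem(orderId, item);
  return item;
}

void saveOrder('ORD-1');
```

The trade-off is explicit: you lose the zero-code native integration, but the call sits inside your business layer where it can be typed, tested, and reviewed like any other code.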

🚨 So when should we prefer Lambda durable functions?

First, when you want a code-first approach in a widely used language such as TypeScript or Python.
You write orchestration in the same language as your business logic, with no domain-specific language to learn.

Also, when you want simple local development and testing.
For the first time, you can test complex workflows locally without any AWS infrastructure.

There’s no need to learn Amazon States Language (ASL), and what used to feel awkward and complex is now straightforward: you define, modify, and debug workflows directly in code, without diving into verbose JSON or mastering intricate patterns.
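Conceptually, a durable function works by checkpointing each named step's result in a journal and replaying the handler after a pause or crash, skipping steps that already ran. Here's a toy mock of that idea (my own illustration, not the real SDK):

```typescript
// Toy checkpoint/replay mock (illustrative only, not the AWS SDK):
// each named step runs once; on replay its recorded result is
// returned instead of re-executing the side effect.
class MockDurableContext {
  constructor(private journal: Map<string, unknown>) {}

  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.journal.has(name)) {
      return this.journal.get(name) as T; // replay: skip re-execution
    }
    const result = await fn();
    this.journal.set(name, result); // checkpoint the result
    return result;
  }
}
```

If the handler crashes between two steps, re-running it with the same journal re-executes only the steps that never completed, which is what makes long waits and retries safe.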

Here's an example of complex nested workflows:

const results = await context.runInChildContext('parent', async (parent) => {
  const child1 = await parent.runInChildContext('child1', async (c1) => {
    const grandchild = await c1.runInChildContext('grandchild', async (gc) => {
      // Deeply nested orchestration - easy!
    });
    return grandchild;
  });
  return child1;
});

What about parallel executions?
It's as simple as using a map:

// Process N items in parallel (N determined at runtime)
const items = [1, 2, 3, 4, 5];

const results = await context.map(
  'process-items',
  items,
  async (ctx, item, index) => {
    return await ctx.step(`process-${index}`, async () => 
      processItem(item)
    );
  },
  {
    maxConcurrency: 3,
    completionConfig: {
      minSuccessful: 4,
      toleratedFailureCount: 1
    }
  }
);

results.throwIfError();
const allResults = results.getResults();
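To picture what `maxConcurrency` does, here's a plain Promise pool with the same semantics. This is just an illustration of the behavior, not how the SDK implements it:

```typescript
// Illustration of maxConcurrency semantics with a plain Promise pool
// (not the SDK's implementation).
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Start `limit` workers; each pulls the next unprocessed index.
  // The read-and-increment of `next` is safe because it happens
  // synchronously between awaits on a single-threaded event loop.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  });
  await Promise.all(workers);
  return results;
}
```

The durable version adds what this sketch lacks: each item's processing is checkpointed, so a crash mid-map doesn't redo completed items.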

Or saga patterns for distributed transactions:

const compensations: Array<() => Promise<unknown>> = [];

try {
  const payment = await context.step('charge-payment', chargeCustomer);
  compensations.push(() => refundCustomer(payment));

  const inventory = await context.step('reserve-inventory', reserveItems);
  compensations.push(() => releaseItems(inventory));

  const shipment = await context.step('create-shipment', shipOrder);
  // All succeeded!
} catch (error) {
  // Compensate in reverse order, giving each compensation step a
  // unique name so every checkpoint stays distinct
  const pending = compensations.reverse();
  for (let i = 0; i < pending.length; i++) {
    await context.step(`compensate-${i}`, pending[i]);
  }
  throw error;
}

Remember you will be an early adopter!

Being on the cutting edge means the community is smaller and there are fewer examples available, but the ecosystem is growing rapidly. Documentation is still maturing, and you may encounter some rough edges along the way.

On the other hand, AWS is actively improving it, you get to adopt modern patterns early, and you'll build skills that will only grow more valuable over time.

šŸŽÆ My recommendation and final thoughts

For new projects, seriously consider Lambda durable functions. This isn't just hype around a new pattern: the developer experience and local testing capabilities are significant advantages.

Existing Step Functions? No need to rush a migration. Try Lambda durable functions for your next new workflow and compare the experience.

Do you really have to choose between them?
The short answer is no, and you shouldn't.
You can use both, depending on the use case, and even embed a Lambda durable function inside a wider orchestration built with Step Functions. This is a great pattern for creating "leaf" workflows focused on a specific concern, decoupled from the others: you can enforce architectural decoupling where needed and benefit from a single Lambda's simplicity where convenient.

Let’s be clear: both tools work.
But they reflect different eras of serverless thinking:

  • Step Functions (2016): a safe, proven choice. Visual workflows, a mature ecosystem, smooth integration with other AWS services, battle-tested. Still a very good choice for mature teams and ops-focused organizations.

  • Durable Functions (2025): code-first, locally testable, modern patterns. A good fit for developers, new workflows, focused sub-workflows embedded in a wider orchestration, and fast time-to-market.

After spending weeks working with both, I can say that, to me, Durable Functions feels like where serverless orchestration should've been all along.

Resources

Also don't miss this awesome presentation video by Michael Gasch and Eric Johnson at the latest re:Invent in Dec 2025.

If you'd like a walkthrough of this excellent presentation, you can find one on re:Post or an autogenerated one here on dev.to.

šŸ™‹ Who am I

I'm D. De Sio and I work as Head of Software Engineering at Eleva.
As of Feb 2026, I’m an AWS Certified Solutions Architect Professional and AWS Certified DevOps Engineer Professional, as well as a User Group Leader (in Pavia), an AWS Community Builder and, last but not least, a #serverless enthusiast.

For those of you who’ve made it this far, I’m not exactly the person in the image (and for those who know me, just not in this photo!) but it’s always fun and interesting to see how GenAI imagines you.
