iAmSherif 💎

Migrating a Lambda to a Durable Lambda Function

I migrated an AWS Lambda function that simulates a patient health insurance verification workflow into a Durable Lambda execution. The function verifies a patient’s insurance and proceeds to schedule a health-check appointment only when the verification succeeds.

This article focuses specifically on what changes when you migrate a Lambda function to a Durable Lambda, why those changes are necessary, and the errors you are likely to encounter if anything is misconfigured.

Prerequisite:
You should already understand how AWS Lambda works.

Before discussing durability, it’s important to understand the workflow itself.

Workflow illustration

The function completes its work in five sequential steps:

  1. Validate Patient Information - Checks if patient ID and name are provided
  2. Check Insurance Database - Simulates lookup in insurance database
  3. Verify Insurance Coverage - Gets coverage details and limits
  4. Authorize Health Check Procedures - Determines which procedures are covered
  5. Schedule Health Checks - Creates an appointment schedule for authorized procedures

Each step depends on the successful completion of the previous one. This dependency chain is what makes durability valuable.

Before diving in, let's look briefly at what a Durable Lambda function is.

What is a Durable Lambda Function?

A Durable Lambda function is still a Lambda function running inside the Lambda runtime. What changes is how execution is managed.

A durable function can run for up to one year, maintaining progress across failures, restarts, and retries, without incurring wait-time charges. This is achieved through a mechanism called a durable execution.

A durable execution uses checkpoints to track progress. If a failure occurs, the function is replayed from the beginning, but completed steps are skipped and their results reused.
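To make the replay idea concrete, here is a toy model of checkpoint-and-replay in plain Node.js. It is not the Durable Lambda SDK: `createDurableContext`, the in-memory `Map`, and the step names are all hypothetical stand-ins, and the real service stores checkpoints outside the process.

```javascript
// Toy model of checkpoint-and-replay. NOT the Durable Lambda SDK:
// the in-memory Map stands in for the service's checkpoint storage.
function createDurableContext(checkpoints) {
    return {
        // Run `fn` once per step name; on replay, return the stored result instead.
        async step(name, fn) {
            if (checkpoints.has(name)) {
                return checkpoints.get(name); // completed earlier: skip and reuse
            }
            const result = await fn();
            checkpoints.set(name, result); // record progress before moving on
            return result;
        },
    };
}

async function demo() {
    const checkpoints = new Map(); // survives across attempts
    const calls = { lookup: 0, schedule: 0 };
    let failOnce = true;

    // The workflow always starts from the top on every attempt.
    async function workflow(context) {
        const insured = await context.step('lookup', async () => {
            calls.lookup += 1;
            return true;
        });
        return context.step('schedule', async () => {
            calls.schedule += 1;
            if (failOnce) {
                failOnce = false;
                throw new Error('simulated crash');
            }
            return { insured, appointment: 'booked' };
        });
    }

    let result;
    for (let attempt = 1; attempt <= 2; attempt += 1) {
        try {
            result = await workflow(createDurableContext(checkpoints));
            break;
        } catch (error) {
            // The first attempt fails inside 'schedule'; the next one replays from the top.
        }
    }
    return { result, calls };
}

demo().then(({ result, calls }) => console.log(result, calls));
```

In this sketch the workflow runs twice, but the `lookup` step body runs only once: the second attempt reads its result from the checkpoint and goes straight to the step that failed.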

Durable Lambda exposes special operations, most notably step and wait.

  • step executes business logic with built-in progress tracking and replay support.

  • wait pauses execution without consuming compute, which is useful for long-running or human-in-the-loop workflows.
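A wait can be pictured the same way. The sketch below is a toy model under my own assumptions about how suspension could work: `Suspend`, `createContext`, and the fake clock are hypothetical names, not SDK APIs. The point it illustrates is that nothing keeps running during the pause; the invocation ends and a later one picks up from the checkpoint.

```javascript
// Toy model of a durable wait. NOT the real SDK: `Suspend`, `createContext`,
// and the fake clock are hypothetical and exist only for illustration.
class Suspend extends Error {
    constructor(resumeAt) {
        super('suspended');
        this.resumeAt = resumeAt;
    }
}

function createContext(checkpoints, now) {
    return {
        async wait(name, ms) {
            if (!checkpoints.has(name)) {
                checkpoints.set(name, now + ms); // remember when to resume
            }
            const resumeAt = checkpoints.get(name);
            if (now < resumeAt) {
                throw new Suspend(resumeAt); // end this invocation; nothing keeps running
            }
        },
    };
}

async function demoWait() {
    const checkpoints = new Map();
    const invocations = []; // fake-clock time of each invocation
    let clock = 0; // fake clock in milliseconds

    async function workflow(context) {
        await context.wait('await-approval', 60000);
        return 'approved';
    }

    for (;;) {
        invocations.push(clock);
        try {
            const result = await workflow(createContext(checkpoints, clock));
            return { result, invocations };
        } catch (error) {
            if (!(error instanceof Suspend)) throw error;
            clock = error.resumeAt; // the host re-invokes the function at this time
        }
    }
}

demoWait().then((outcome) => console.log(outcome));
```

The workflow is invoked twice, once at time 0 and once at 60000 ms, and no code runs in between.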

Lambda vs Durable Lambda?

This is not a replacement. Durable Lambda still runs on Lambda.
What changes is the execution model. Durable Lambda improves:

  • Visibility into execution flow

  • Recovery from failures

  • Retry behavior

  • Developer experience when working with long or multi-step workflows

You are not switching platforms. You are adopting a stricter execution discipline.

Now that we have a basic understanding of Durable Lambda functions, let's migrate our Lambda function to a Durable Lambda.

What the Lambda Function Looked Like

Before durability was introduced, the function followed a familiar sequential pattern. Everything happened inside a single invocation, and the only “state” was whatever lived in memory during execution.

Conceptually, the Lambda looked like this:

exports.handler = async (event) => {
    const body = typeof event.body === 'string' ? JSON.parse(event.body) : event.body;
    const patientId = body.patientId;
    const patientName = body.patientName;

    try {
        // Validate patient information
        if (!patientId || !patientName) {
            throw new Error('Invalid patient information');
        }

        // Check insurance
        const hasInsurance = await checkInsuranceStatus(patientId);

        if (!hasInsurance) {
            return {
                statusCode: 200,
                body: JSON.stringify({
                    patientId,
                    patientName,
                    status: 'REJECTED',
                    message: 'Patient does not have valid health insurance.'
                })
            };
        }

        // Verify coverage
        const coverage = await getInsuranceCoverage(patientId);

        // Authorize procedures
        const authorizedProcedures = await authorizeHealthChecks(coverage);

        // Schedule health checks
        const scheduledAppointments = await scheduleHealthChecks(
            patientId,
            authorizedProcedures
        );

        return {
            statusCode: 200,
            body: JSON.stringify({
                patientId,
                patientName,
                status: 'APPROVED',
                insuranceCoverage: coverage,
                authorizedProcedures,
                scheduledAppointments
            })
        };

    } catch (error) {
        return {
            statusCode: 500,
            body: JSON.stringify({
                patientId,
                patientName,
                status: 'ERROR',
                message: error.message
            })
        };
    }
};

This code is readable and easy to reason about, but it hides a critical weakness.

Every step is tightly coupled to the same execution. If the function times out, crashes, or is retried, everything restarts from the beginning. External systems are called again, and partial progress is lost.
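The weakness is easy to reproduce with stubs. In this sketch, `createStubs`, `plainHandler`, and `retryPlainHandler` are hypothetical helpers that count how often each "external" call runs when a plain handler is retried after a late failure.

```javascript
// Stand-in stubs that count calls; the real functions would hit external systems.
function createStubs() {
    const calls = { check: 0, schedule: 0 };
    let failOnce = true;
    return {
        calls,
        async checkInsuranceStatus() {
            calls.check += 1;
            return true;
        },
        async scheduleHealthChecks() {
            calls.schedule += 1;
            if (failOnce) {
                failOnce = false;
                throw new Error('simulated timeout');
            }
            return ['appointment-1'];
        },
    };
}

// Same shape as the plain handler: nothing is remembered between attempts.
async function plainHandler(stubs) {
    await stubs.checkInsuranceStatus();
    return stubs.scheduleHealthChecks();
}

async function retryPlainHandler() {
    const stubs = createStubs();
    try {
        await plainHandler(stubs);
    } catch (error) {
        await plainHandler(stubs); // the retry starts over from the first call
    }
    return stubs.calls;
}

retryPlainHandler().then((calls) => console.log(calls));
```

The insurance check runs twice even though it succeeded the first time, because the retry has no record of it.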

This is the exact problem Durable Lambda solves.

Introducing Durable Execution

Below is the Durable Lambda version of the same function:

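// Assumed import: the Durable Execution SDK for JavaScript.
const { withDurableExecution } = require('@aws/durable-execution-sdk-js');

exports.handler = withDurableExecution(
    async (event, context) => {
        // customLogger is assumed to be defined elsewhere in this module.
        context.configureLogger({ customLogger, modeAware: false });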
        const body = typeof event.body === 'string' ? JSON.parse(event.body) : event.body;
        const patientId = body.patientId;
        const patientName = body.patientName;

        try {
            context.logger.info('Step 1: Validating patient information', { patientId, patientName });

            if (!patientId || !patientName) {
                throw new Error('Invalid patient information');
            }

            const hasInsurance = await context.step('validate-patientid', async (stepContext) => {
                stepContext.logger.info('Validating user claims')
                return await checkInsuranceStatus(patientId);
            })

            if (!hasInsurance) {
                return {
                    statusCode: 200,
                    body: JSON.stringify({
                        patientId,
                        patientName,
                        status: 'REJECTED',
                        message: 'Patient does not have valid health insurance.'
                    })
                };
            }

            const coverage = await context.step('verify-insurance', async (stepContext) => {
                stepContext.logger.info('Verify insurance coverage')
                return await getInsuranceCoverage(patientId);
            })

            const authorizedProcedures = await context.step(
                'authorize-health-checks',
                async (stepContext) => {
                    stepContext.logger.info('Authorize health check procedures')
                    return await authorizeHealthChecks(coverage);
                }
            )

            const scheduledAppointments = await context.step(
                'schedule-health-checks',
                async (stepContext) => {
                    stepContext.logger.info('Schedule health checks')
                    return await scheduleHealthChecks(patientId, authorizedProcedures);
                }
            )

            return {
                statusCode: 200,
                body: JSON.stringify({
                    patientId,
                    patientName,
                    status: 'APPROVED',
                    insuranceCoverage: coverage,
                    authorizedProcedures,
                    scheduledAppointments
                })
            };

        } catch (error) {
            return {
                statusCode: 500,
                body: JSON.stringify({
                    patientId,
                    patientName,
                    status: 'ERROR',
                    message: error.message
                })
            };
        }
    }
);

At first glance, the business logic looks almost identical. That similarity is intentional. Durable Lambda is designed so you do not have to rewrite your domain logic from scratch.

The real change is where durability is introduced.

What Actually Changed?

The most important difference is the withDurableExecution wrapper. This single addition changes the execution model of the function. Instead of treating the handler as a one-shot computation, it turns it into a resumable workflow. The function can now pause, fail, retry, and resume without losing progress.

The second major change is the introduction of context.step from DurableContext. In the original Lambda, each async call simply executes and returns. In the durable version, every critical operation is wrapped in a named step. That name is not cosmetic. It becomes the durable checkpoint that allows the runtime to remember, “this work has already been completed.”

For example, insurance validation used to be just this:

await checkInsuranceStatus(patientId);

In the durable version, it becomes:

await context.step('validate-patientid', async () => {
    return await checkInsuranceStatus(patientId);
});

This single change prevents duplicate calls when retries happen. If the function crashes after this step succeeds, Durable Lambda does not re-run it. It replays the stored result and moves forward.

Logging also changes subtly but importantly. Instead of relying on global logging assumptions, logging is now tied to execution context and step context. This is why logging behavior felt different after migration. The function is no longer guaranteed to run once from top to bottom; parts of it may be replayed.
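Here is a toy illustration of why replay affects logging. I am assuming that the `modeAware` option controls whether log lines from already-completed work are suppressed during replay; treat that as my reading rather than documented behavior. `createLogger`, `runAttempts`, and the position counters are hypothetical and not part of the SDK.

```javascript
// Toy illustration only: `createLogger` and `isReplaying` are hypothetical,
// and the meaning given to `modeAware` here is an assumption.
function createLogger(sink, { modeAware, isReplaying }) {
    return {
        info(message) {
            if (modeAware && isReplaying()) return; // drop lines from replayed work
            sink.push(message);
        },
    };
}

function runAttempts(modeAware) {
    const lines = [];
    let checkpointed = 0; // steps completed across all attempts so far
    for (let attempt = 1; attempt <= 2; attempt += 1) {
        let position = 0; // steps passed so far in this attempt
        const alreadyDone = checkpointed; // progress recorded by earlier attempts
        const logger = createLogger(lines, {
            modeAware,
            isReplaying: () => position < alreadyDone,
        });

        logger.info('validating patient');
        position += 1;
        checkpointed = Math.max(checkpointed, position);
        if (attempt === 1) continue; // simulated crash after the first step

        logger.info('verifying coverage');
        position += 1;
        checkpointed = Math.max(checkpointed, position);
    }
    return lines;
}

console.log(runAttempts(true));
console.log(runAttempts(false));
```

With replay-aware logging the first line appears once; without it, the same line appears once per attempt.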

That's all for the function code refactoring.

Infrastructure Configuration Matters

Durable Lambda is not only about code. It requires explicit infrastructure configuration.
Your existing Lambda function configuration might look like this:

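import { Duration } from 'aws-cdk-lib';
import { Code, Function, Runtime } from 'aws-cdk-lib/aws-lambda';

// Inside the stack's constructor
const patientHealthCheckFunction = new Function(this, 'PatientHealthCheckFunction', {
    runtime: Runtime.NODEJS_20_X,
    handler: 'index.handler',
    code: Code.fromAsset('lambda-package'),
    timeout: Duration.seconds(30),
    environment: {
        NODE_ENV: 'production',
        LOG_LEVEL: 'INFO',
    },
});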

In the above configuration, the function uses Lambda's Node.js 20 runtime, names the handler, points to the packaged code, sets a timeout, and defines environment variables that the function code reads.
When migrating to Durable Lambda, you must:

  • Use a supported runtime (Node.js 22 or 24)

  • Assign a role that allows checkpointing

  • Configure durableConfig

  • Use versions and aliases

Misconfigurations lead directly to runtime errors.
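Put together, the four requirements could look roughly like the CDK fragment below. It is a sketch, not a drop-in configuration: the alias name `live`, the construct ID `PatientHealthCheckAlias`, and the timeout values are placeholders I chose for illustration, and the fragment only makes sense inside a stack's constructor.

```javascript
// Fragment from inside a CDK Stack constructor; not runnable on its own.
import { Duration } from 'aws-cdk-lib';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { Alias, Code, Function, Runtime } from 'aws-cdk-lib/aws-lambda';

const patientHealthCheckFunction = new Function(this, 'PatientHealthCheckFunction', {
    runtime: Runtime.NODEJS_22_X, // a supported runtime
    handler: 'index.handler',
    code: Code.fromAsset('lambda-package'),
    timeout: Duration.seconds(30),
    durableConfig: {
        executionTimeout: Duration.minutes(10), // illustrative value
        retentionPeriod: Duration.days(30), // illustrative value
    },
});

// Let the function's role checkpoint and read durable execution state.
patientHealthCheckFunction.addToRolePolicy(
    new PolicyStatement({
        actions: [
            'lambda:CheckpointDurableExecution',
            'lambda:GetDurableExecutionState',
        ],
        resources: [
            `arn:aws:lambda:${this.region}:${this.account}:function:PatientHealthCheckStack*`,
        ],
    })
);

// Publish a version and point an alias at it; 'live' is a placeholder name.
new Alias(this, 'PatientHealthCheckAlias', {
    aliasName: 'live',
    version: patientHealthCheckFunction.currentVersion,
});
```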

I will walk you through the actual errors that occur when a durable function isn't configured properly, along with the fix for each one.

Error #1: Durable Execution Would Not Start

The first failure happened immediately after deployment:

Unexpected payload provided to start the durable execution. Check your resource configurations to confirm the durability is set.

This error had nothing to do with business logic. Durable Lambda requires its configuration to be explicitly enabled in infrastructure. Without that configuration, the runtime treats the function like a normal Lambda and rejects the durable execution payload.

durableConfig: { executionTimeout: Duration.hours(1), retentionPeriod: Duration.days(30) }

Once configured with an execution timeout and retention period, the function was able to start durable execution correctly.

This was the first signal that Durable Lambda is not just a code change; it is an execution contract enforced at the infrastructure level.

Error #2: Synchronous Invocation Still Has Limits

After enabling durability, the next error appeared:

You cannot synchronously invoke a durable function with an executionTimeout greater than 15 minutes.

When invoked synchronously, a durable function must still complete within 15 minutes. Reducing the execution timeout resolved the issue.

durableConfig: { executionTimeout: Duration.minutes(10), retentionPeriod: Duration.days(30) }

This reinforced an important point: Durable Lambda adds persistence, not infinite runtime.
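A small guard can catch this before deployment. The helper below is hypothetical (it is not part of any AWS library); the 15-minute figure comes straight from the error message, which rejects timeouts greater than 15 minutes.

```javascript
// Hypothetical pre-deploy check; the limit is taken from the error message.
const MAX_SYNC_EXECUTION_SECONDS = 15 * 60;

// Returns true when a durable function with this executionTimeout
// can still be invoked synchronously.
function canInvokeSynchronously(executionTimeoutSeconds) {
    return executionTimeoutSeconds <= MAX_SYNC_EXECUTION_SECONDS;
}

console.log(canInvokeSynchronously(10 * 60)); // 10 minutes
console.log(canInvokeSynchronously(60 * 60)); // 1 hour
```

The 10-minute timeout passes the check, while the original 1-hour timeout does not.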

Error #3: Permission Error

The next failure was caused by a permission error:

Failed to checkpoint durable execution. User is not authorized to perform lambda:CheckpointDurableExecution on resource arn:aws:… because no identity-based policy allows the lambda:CheckpointDurableExecution action

Durable Lambda requires explicit IAM permissions to store and retrieve execution state.

import * as iam from 'aws-cdk-lib/aws-iam';

// lambdaRole is the execution role attached to the durable function.
lambdaRole.addToPolicy(
    new iam.PolicyStatement({
        actions: [
            'lambda:CheckpointDurableExecution',
            'lambda:GetDurableExecutionState',
            'lambda:InvokeFunction',
        ],
        resources: [
            `arn:aws:lambda:${this.region}:${this.account}:function:PatientHealthCheckStack*`,
        ],
    })
);

Without these permissions, checkpointing fails with the authorization error above, and the execution cannot record its progress.

What changed after migration

| Area | Before | After Durable Lambda |
|---|---|---|
| State handling | Stateless | Persistent |
| Retries | Restart entire flow | Resume from failure |
| Invocation rules | Flexible | Strict but safer |
| Versioning | Optional | Required |
| Observability | Simple | Needs discipline |

Conclusion

Migrating to Durable Lambda did not make the system simpler; it made it more explicit.

Failures stopped being destructive. Retries stopped being accidental. Progress became something the system could reason about instead of guess. The workflow shifted from “hope this finishes” to “this will eventually complete correctly.”

Durable Lambda is not something you reach for by default. But when your Lambda function starts behaving like a workflow (multi-step, failure-prone, and externally dependent), durability stops being optional.

At that point, it becomes a design decision about correctness.

And in systems like healthcare workflows, correctness is the only acceptable outcome.
