DEV Community

awedis for AWS Heroes


Hands-On with AWS Lambda Durable Functions & Callback ⚡⏳🚀 - (Let's Build 🏗️ Series)

AWS made many new announcements at re:Invent 2025, and one of the highlights is AWS Lambda Durable Functions: a new Lambda capability that lets you build applications that coordinate multiple steps reliably over extended periods, from seconds up to one year, without paying for idle compute time while waiting for external events or human decisions.

People who know me are aware that I’m a strong advocate of Step Functions. I love decoupling and building event-driven architectures, so when AWS announced Lambda Durable Functions, I got excited and wanted to try it out 😊

In this article, we’ll create a system that showcases how Durable Functions handle callbacks in real-world workflows.

The main parts of this article:
1- 🧩 Basic concepts
2- 🏗️ Understanding Durable Function Code
3- 🧪 Human-in-the-loop Approvals (Full Example)
4- 📊 Monitoring


🧩 Basic concepts

When creating a new AWS Lambda function, you’ll now see the option to enable Durable Execution.

Now, let’s select Python 3.14 and create our new function. Once it’s created, you’ll see the default code generated by AWS.

from aws_durable_execution_sdk_python.config import Duration
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution

@durable_step
def my_step(step_context: StepContext, my_arg: int) -> str:
    step_context.logger.info("Hello from my_step")
    return f"from my_step: {my_arg}"

@durable_execution
def lambda_handler(event, context) -> dict:
    msg: str = context.step(my_step(123))

    context.wait(Duration.from_seconds(10))

    context.logger.info("Waited for 10 seconds without consuming CPU.")

    return {
        "statusCode": 200,
        "body": msg,
    }

Once you’ve created the Durable Lambda function, you’ll notice a new tab labeled “Durable Executions.”


🏗️ Understanding Durable Function Code

The @durable_execution decorator
In Python, your handler is wrapped with the @durable_execution decorator (the JavaScript/TypeScript SDK provides the equivalent withDurableExecution wrapper). It enables durable execution by passing your handler a DurableContext object and managing checkpoint operations.

The DurableContext object
Instead of the standard Lambda context, your function receives a DurableContext. This object provides methods for durable operations like step() and wait() that create checkpoints.

Steps and checkpoints
Each context.step() call creates a checkpoint before and after execution. If your function is interrupted, it resumes from the last completed checkpoint. The function doesn't re-execute completed steps. It uses their stored results instead.
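To make the checkpoint behavior concrete, here is a toy model in plain Python. It is not the AWS SDK: ToyDurableContext, Interrupted, and handler are hypothetical names, and the toy keys checkpoints by step name purely for readability (the real SDK tracks operations and persists the log in its own way). The first invocation completes one step and is then interrupted; the second invocation reuses that step's stored result and only executes the remaining step.

```python
from typing import Any, Callable


class Interrupted(Exception):
    """Simulates the function being interrupted mid-workflow."""


class ToyDurableContext:
    """Toy stand-in for DurableContext.step() (NOT the AWS SDK)."""

    def __init__(self, checkpoint_log: dict[str, Any]) -> None:
        self.log = checkpoint_log          # survives across invocations
        self.executed: list[str] = []      # steps that actually ran in this invocation

    def step(self, fn: Callable[[], Any], name: str) -> Any:
        if name in self.log:
            return self.log[name]          # completed earlier: reuse the stored result
        result = fn()
        self.log[name] = result            # checkpoint the result once the step succeeds
        self.executed.append(name)
        return result


def handler(ctx: ToyDurableContext, interrupt: bool) -> str:
    prepared = ctx.step(lambda: "prepared", name="prepare-document")
    if interrupt:
        raise Interrupted()
    return ctx.step(lambda: f"{prepared}+sent", name="send-document")


log: dict[str, Any] = {}
first = ToyDurableContext(log)
try:
    handler(first, interrupt=True)         # first invocation dies after one step
except Interrupted:
    pass

second = ToyDurableContext(log)
result = handler(second, interrupt=False)  # re-invocation resumes from the checkpoint
print(first.executed)   # ['prepare-document']
print(second.executed)  # ['send-document']
print(result)           # prepared+sent
```

The key point is that "prepare-document" runs exactly once across both invocations, which is what lets a durable function survive interruptions without redoing finished work.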

Wait operations
The context.wait() call pauses execution without consuming compute resources. When the wait completes, Lambda invokes your function again and replays the checkpoint log, substituting stored values for completed steps.

Replay mechanism
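When your function resumes after a wait or interruption, Lambda runs your code from the beginning. However, completed steps don't re-execute; Lambda replays their results from the checkpoint log. This is why your code must be deterministic: anything that can differ between runs (random IDs, timestamps, calls to external systems) should happen inside a step so its result is checkpointed and replayed.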
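The determinism rule is easiest to see with a value that changes on every run. The sketch below is again a toy replay log, not the SDK (ReplayLog and run_workflow are hypothetical names): running the same workflow twice against one log simulates Lambda replaying the handler from the top. A UUID generated outside a step differs between replays, while one generated inside a step is read back from the log.

```python
import uuid
from typing import Any, Callable


class ReplayLog:
    """Toy checkpoint log (NOT the AWS SDK): a step runs once, then replays its result."""

    def __init__(self) -> None:
        self.results: dict[str, Any] = {}

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name not in self.results:
            self.results[name] = fn()
        return self.results[name]


def run_workflow(log: ReplayLog) -> tuple[str, str]:
    unsafe_id = str(uuid.uuid4())                             # regenerated on every replay
    safe_id = log.step("make-id", lambda: str(uuid.uuid4()))  # checkpointed on the first run
    return unsafe_id, safe_id


log = ReplayLog()
first = run_workflow(log)
replay = run_workflow(log)    # simulates Lambda re-running the handler from the beginning
print(first[0] == replay[0])  # False: the value outside the step changed
print(first[1] == replay[1])  # True: the value inside the step was replayed
```

If later logic branched on unsafe_id, the replayed run could take a different path than the original one, which is exactly the kind of divergence the determinism rule prevents.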


🧪 Human-in-the-loop Approvals (Full Example)

How the flow works:

┌──────────────────────────┐
│  Event / API / Trigger   │
│  (API Gateway, S3, Cron) │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│  Durable Lambda Function │
│  (Orchestrator)          │
│                          │
│  1. Start execution      │
│  2. Do initial steps     │
│  3. Create callback ID   │
│  4. Send human request   │
│     (email, UI, system)  │
│  5. WAIT (paused)        │
└─────────────┬────────────┘
              │
              │  Callback ID
              │
              ▼
┌──────────────────────────┐
│  Human Interaction Layer │
│                          │
│  • Web UI                │
│  • Internal tool         │
│  • Approval system       │
│                          │
│  User approves / rejects │
└─────────────┬────────────┘
              │
              │ API call with:
              │ - callbackId
              │ - decision
              │
              ▼
┌──────────────────────────┐
│  API Gateway             │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│  Callback Lambda         │
│                          │
│  • Validates request     │
│  • Calls                 │
│    send_durable_execution│
│    _callback_success     │
│    or failure            │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│  Durable Lambda Function │
│  (Resumed)               │
│                          │
│  • Receives callback     │
│  • Continues workflow    │
│  • Approve / Reject path │
│  • Finish execution      │
└──────────────────────────┘

Now, let’s add the callback logic and the wait mechanism. The Durable Lambda pauses execution, emits a callback ID, and resumes only when an external system, or a human, explicitly signals completion. This pattern enables human approvals, long-running workflows, and event-driven orchestration without polling or managing state machines.

from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.config import CallbackConfig, Duration
from time import sleep

@durable_step
def _send_approval_request(
    step_context: StepContext,
    documentId: str,
    reviewers: str,
    callbackId: str
) -> str:
    # Call a third-party service here (e.g. email or an approval UI) and include
    # callbackId so the reviewer's decision can be sent back to this execution

    return f"_send_approval_request: {documentId} {reviewers} {callbackId}"

@durable_step
def _step(step_context: StepContext, my_arg: int) -> str:
    step_context.logger.info("Hey there from _step")
    sleep(5)  # simulate work; unlike context.wait(), sleep() keeps the function running (and billed)
    return f"_step: {my_arg}"

@durable_execution
def lambda_handler(event, context: DurableContext) -> dict:
    document_id = event['documentId']
    reviewers = event['reviewers']

    context.step(lambda _: _step(1)(_), name="prepare-document")

    callback = context.create_callback(
        name="approval_callback",
        config=CallbackConfig(
            timeout=Duration.from_hours(24),  # Maximum wait time
            heartbeat_timeout=Duration.from_hours(2),  # Fail if no heartbeat for 2 hours
        ),
    )

    callback_id = callback.callback_id
    context.step(lambda _: _send_approval_request(document_id, reviewers, callback_id)(_), name="send-document")
    print('callback_id -> ', callback_id)

    # Wait for result - execution suspends here
    result = callback.result()
    print('result -> ', result)

    if result == 'approve':
        return {
            'status': 'approve',
            'documentId': document_id
        }

    return {
        'status': 'rejected',
        'documentId': document_id,
    }

📋 Note: It’s important to publish a new version of your Durable Function to be able to test it.
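If you prefer scripting this over the console, the sketch below publishes a new version and starts an execution against it using boto3's publish_version and invoke calls. The function name in the usage comment is a placeholder, and choosing an asynchronous (InvocationType="Event") invocation is my assumption: the workflow may stay suspended for hours, far longer than a synchronous caller should be kept waiting. The client is passed in so the helper can be exercised with a stub.

```python
import json
from typing import Any


def publish_and_start(client: Any, function_name: str, payload: dict[str, Any]) -> str:
    """Publish a new version of the durable function and start an execution on it.

    `client` is a boto3 Lambda client (or a stub exposing the same two methods).
    """
    # publish_version snapshots the current code and configuration as a numbered version
    version = client.publish_version(FunctionName=function_name)["Version"]
    client.invoke(
        FunctionName=function_name,
        Qualifier=version,                           # target the published version
        InvocationType="Event",                      # async: return without waiting for the workflow
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return version


# Usage (hypothetical function name):
# import boto3
# publish_and_start(boto3.client("lambda"), "my-durable-approval-fn",
#                   {"documentId": "a1234", "reviewers": "awedis"})
```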


Now, let’s create the Lambda function that will call send_durable_execution_callback_success to resume the Durable Lambda execution.

# Message processor side (separate Lambda or service)
import boto3
import json
from botocore.exceptions import ClientError
print(boto3.__version__)  # confirm the loaded boto3 includes the durable execution APIs

client = boto3.client('lambda')

def lambda_handler(event, context):
    callback_id = event["callback_id"]
    status = event["status"]  # e.g. "approve" or "reject"

    try:
        # Resume the waiting durable execution; `status` becomes the callback result
        client.send_durable_execution_callback_success(
            CallbackId=callback_id,
            Result=status
        )
    except ClientError as e:
        # Notify failure
        return {
            "statusCode": 500,
            "body": json.dumps({
                "error": str(e)
            })
        }
    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": "Callback Sent"
        })
    }
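The handler above reads callback_id and status straight from the event, which works for console test invocations. Behind an API Gateway proxy integration, as in the flow diagram, the decision would instead arrive as a JSON string in event["body"]. Here is a sketch of that variant with basic validation; ALLOWED_DECISIONS and forward_decision are hypothetical names, and the decision string is passed through unchanged as Result, mirroring the handler above (the durable function only checks for 'approve').

```python
import json
from typing import Any

ALLOWED_DECISIONS = {"approve", "reject"}  # hypothetical decision vocabulary


def _response(status_code: int, payload: dict[str, Any]) -> dict[str, Any]:
    # API Gateway proxy integrations expect "body" to be a string
    return {"statusCode": status_code, "body": json.dumps(payload)}


def forward_decision(event: dict[str, Any], client: Any) -> dict[str, Any]:
    """Validate an API Gateway proxy event and resume the waiting durable execution."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})
    if not isinstance(body, dict):
        return _response(400, {"error": "body must be a JSON object"})

    callback_id = body.get("callback_id")
    decision = body.get("status")
    if not callback_id or not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        return _response(400, {"error": "callback_id and a valid status are required"})

    # Both decisions resume the workflow; the durable function branches on the result
    client.send_durable_execution_callback_success(CallbackId=callback_id, Result=decision)
    return _response(200, {"message": "Callback Sent"})


def lambda_handler(event, context):
    # Imported here so forward_decision can be unit-tested without boto3 installed
    import boto3
    from botocore.exceptions import ClientError

    try:
        return forward_decision(event, boto3.client("lambda"))
    except ClientError as e:
        return _response(500, {"error": str(e)})
```

Rejecting bad input with a 400 before calling Lambda keeps malformed requests from consuming the callback. In production you would typically create the boto3 client once at module level, as the original handler does; the lazy import here only serves testability.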

📋 Note: In my case, I needed to install the latest version of boto3 because the version bundled with the Lambda runtime did not include the new Lambda Durable Functions APIs. Here is my requirements.txt file:

boto3==1.42.13
botocore==1.42.13

Once installed, package these dependencies as a Lambda layer: upload the artifact to an S3 bucket, create a layer from it, and attach the layer to this function.
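For Python, Lambda expects a layer's packages under a python/ directory at the root of the zip. A minimal sketch of packaging the installed dependencies that way, assuming you first ran pip install -r requirements.txt -t build/ (build_layer_zip, build/, and layer.zip are placeholder names):

```python
import pathlib
import zipfile


def build_layer_zip(deps_dir: str, zip_path: str) -> list[str]:
    """Zip everything under deps_dir beneath a top-level python/ folder.

    Lambda adds a Python layer's python/ directory to the import path, so
    packages must live under that prefix inside the archive.
    """
    src = pathlib.Path(deps_dir)
    names: list[str] = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                arcname = f"python/{path.relative_to(src).as_posix()}"
                zf.write(path, arcname)
                names.append(arcname)
    return names


# Usage, after: pip install -r requirements.txt -t build/
# build_layer_zip("build", "layer.zip")
```

You can then upload layer.zip to S3 and create the layer from it (for example with the Lambda PublishLayerVersion API) before attaching it to the function.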

I also attached the following permissions to the callback function’s execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:SendDurableExecutionCallbackSuccess",
                "lambda:SendDurableExecutionCallbackFailure"
            ],
            "Resource": "*"
        }
    ]
}

📋 Note: The callback sender can be any AWS service (EC2, ECS, etc.) or application that has these permissions. In my case, I’m using a Lambda function, since it can easily be triggered from API Gateway or tested directly from the AWS console with a test invocation.

Result:

To test my Durable Function, I need to provide the following input:

{
  "documentId": "a1234",
  "reviewers": "awedis"
}

Perfect! You can see that our Durable Function has started. By clicking on “Durable Execution Details,” you can even view the individual steps that were run.

Now, let’s copy the Callback ID generated by our Durable Function and pass it to the callback Lambda function we created above.

{
  "callback_id": "<YOUR_CALLBACK_ID>",
  "status": "approve"
}

Once it runs, return to the Durable Execution page, where you should see that the Durable Lambda function has resumed and successfully completed the approval_callback.

Lambda Callback Responder

Main Durable Function


Why is this example powerful?

  • Human-in-the-loop workflows
  • Long-running processes (hours/days)
  • No Step Functions needed
  • No polling
  • No state machines
  • Fully serverless

📊 Monitoring

CloudWatch Metrics

  • ApproximateRunningDurableExecutions: number of durable executions in the RUNNING state
  • ApproximateRunningDurableExecutionsUtilization: percentage of your account's maximum running durable executions quota currently in use
  • DurableExecutionDuration: elapsed wall-clock time in milliseconds that a durable execution remained in the RUNNING state
  • DurableExecutionStarted: number of durable executions that started
  • DurableExecutionStopped: number of durable executions stopped using the StopDurableExecution API
  • DurableExecutionSucceeded: number of durable executions that completed successfully
  • DurableExecutionFailed: number of durable executions that completed with a failure
  • DurableExecutionTimedOut: number of durable executions that exceeded their configured execution timeout
  • DurableExecutionOperations: cumulative number of operations performed within a durable execution (max: 3,000)
  • DurableExecutionStorageWrittenBytes: cumulative amount of data in bytes persisted by a durable execution (max: 100 MB)
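To get notified when approvals start failing, you can put an alarm on DurableExecutionFailed. The sketch below uses boto3's CloudWatch put_metric_alarm; the AWS/Lambda namespace and the FunctionName dimension are my assumptions about where these metrics are published (check the metric in the CloudWatch console before relying on them), and create_failure_alarm, the alarm name, and the SNS topic are placeholders.

```python
from typing import Any, Optional


def create_failure_alarm(
    client: Any, function_name: str, sns_topic_arn: Optional[str] = None
) -> dict[str, Any]:
    """Create (or update) an alarm that fires when any durable execution fails.

    `client` is a boto3 CloudWatch client, e.g. boto3.client("cloudwatch").
    """
    params: dict[str, Any] = {
        "AlarmName": f"{function_name}-durable-execution-failed",
        "Namespace": "AWS/Lambda",  # assumption: namespace for the durable execution metrics
        "MetricName": "DurableExecutionFailed",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],  # assumption
        "Statistic": "Sum",
        "Period": 300,                       # evaluate 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 1.0,                    # alarm on the first failure in a window
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no datapoints are treated as no failures
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]  # e.g. an SNS topic that emails the team
    client.put_metric_alarm(**params)
    return params
```

The same pattern works for DurableExecutionTimedOut, which is worth watching in approval flows where reviewers may never respond.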

📝 Conclusion

In my opinion, AWS Lambda Durable Functions are extremely useful and can be applied to many complex serverless, event-driven architectures. When combined with services like SQS, SNS, or AWS Step Functions, they enable you to build powerful, scalable, and elegant workflows.

Happy coding 👨🏻‍💻

GitHub Source Code: https://github.com/awedis/aws-lambda-durable-functions-callback

💡 Enjoyed this? Let’s connect and geek out some more on LinkedIn.
