AWS made many announcements at re:Invent 2025, and one of the highlights is AWS Lambda Durable Functions, a new Lambda capability that lets you build applications that coordinate multiple steps reliably over extended periods (from seconds up to one year) without paying for idle compute time while waiting for external events or human decisions.
People who know me are aware that I'm a strong advocate of Step Functions. I love decoupling and building event-driven architectures, so when AWS announced Lambda Durable Functions, I got excited and wanted to try it out.
In this article, we'll create a system to showcase how Durable Functions handle callbacks in real-world workflows.
The main parts of this article:
1- Basic concepts
2- Understanding Durable Function Code
3- Human-in-the-loop Approvals (Full Example)
4- Monitoring
Basic concepts
When creating a new AWS Lambda function, you'll now see the option to enable Durable Execution.
Now, let's select Python 3.14 and create our new function. Once it's created, you'll see the default code generated by AWS.
from aws_durable_execution_sdk_python.config import Duration
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution


@durable_step
def my_step(step_context: StepContext, my_arg: int) -> str:
    step_context.logger.info("Hello from my_step")
    return f"from my_step: {my_arg}"


@durable_execution
def lambda_handler(event, context) -> dict:
    msg: str = context.step(my_step(123))
    context.wait(Duration.from_seconds(10))
    context.logger.info("Waited for 10 seconds without consuming CPU.")
    return {
        "statusCode": 200,
        "body": msg,
    }
Once you've created the Durable Lambda function, you'll notice a new tab labeled "Durable Executions".
Understanding Durable Function Code
The @durable_execution decorator
Your handler is decorated with @durable_execution (the JavaScript/TypeScript SDK exposes the same idea as a withDurableExecution wrapper). The decorator enables durable execution by providing the DurableContext object and managing checkpoint operations.
The DurableContext object
Instead of the standard Lambda context, your function receives a DurableContext. This object provides methods for durable operations like step() and wait() that create checkpoints.
Steps and checkpoints
Each context.step() call creates a checkpoint before and after execution. If your function is interrupted, it resumes from the last completed checkpoint. The function doesn't re-execute completed steps. It uses their stored results instead.
Wait operations
The context.wait() call pauses execution without consuming compute resources. When the wait completes, Lambda invokes your function again and replays the checkpoint log, substituting stored values for completed steps.
Replay mechanism
When your function resumes after a wait or interruption, Lambda runs your code from the beginning. However, completed steps don't re-execute. Lambda replays their results from the checkpoint log. This is why your code must be deterministic.
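To make the replay idea more concrete, here is a tiny toy model in plain Python. It is not the AWS SDK and not how Lambda implements checkpoints internally; ToyContext, Suspend, and run_to_completion are hypothetical names I made up for illustration. It only shows why a random value produced inside a step stays stable across replays: the second invocation reads it from the log instead of generating it again.

```python
import random


class Suspend(Exception):
    """Raised by the toy runtime to pause the handler at a wait."""


class ToyContext:
    """Toy stand-in for a durable context: a checkpoint log plus replay."""

    def __init__(self, log):
        self.log = log       # results of completed operations, in call order
        self.position = 0    # index of the next operation in this invocation
        self.executed = []   # names of steps that really ran in this invocation

    def step(self, name, func):
        index = self.position
        self.position += 1
        if index < len(self.log):
            # Completed in an earlier invocation: reuse the stored result.
            return self.log[index]
        result = func()
        self.executed.append(name)
        self.log.append(result)  # checkpoint the result
        return result

    def wait(self):
        index = self.position
        self.position += 1
        if index < len(self.log):
            return  # this wait already completed in an earlier invocation
        # Toy shortcut: record the wait as completed, then pause the handler.
        self.log.append("wait-done")
        raise Suspend()


def handler(ctx):
    # Non-deterministic work lives inside a step, so replay sees the same value.
    token = ctx.step("make-token", lambda: random.randint(0, 10**9))
    ctx.wait()
    return ctx.step("use-token", lambda: f"processed {token}")


def run_to_completion(handler_func):
    """Re-invoke the handler from the top until it finishes, like a replay loop."""
    log, invocations = [], []
    while True:
        ctx = ToyContext(log)
        try:
            result = handler_func(ctx)
        except Suspend:
            invocations.append(ctx.executed)
            continue
        invocations.append(ctx.executed)
        return result, log, invocations


result, log, invocations = run_to_completion(handler)
print(invocations)                      # [['make-token'], ['use-token']]
print(result == f"processed {log[0]}")  # True
```

Notice that the handler runs twice from the top, but "make-token" only executes in the first invocation. If the random number were generated outside a step, the second invocation would see a different value, which is exactly the kind of non-determinism to avoid.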
Human-in-the-loop Approvals (Full Example)
How the flow works:
┌──────────────────────────┐
│ Event / API / Trigger    │
│ (API Gateway, S3, Cron)  │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│ Durable Lambda Function  │
│ (Orchestrator)           │
│                          │
│ 1. Start execution       │
│ 2. Do initial steps      │
│ 3. Create callback ID    │
│ 4. Send human request    │
│    (email, UI, system)   │
│ 5. WAIT (paused)         │
└─────────────┬────────────┘
              │
              │  Callback ID
              │
              ▼
┌──────────────────────────┐
│ Human Interaction Layer  │
│                          │
│ • Web UI                 │
│ • Internal tool          │
│ • Approval system        │
│                          │
│ User approves / rejects  │
└─────────────┬────────────┘
              │
              │  API call with:
              │  - callbackId
              │  - decision
              │
              ▼
┌──────────────────────────┐
│ API Gateway              │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│ Callback Lambda          │
│                          │
│ • Validates request      │
│ • Calls                  │
│   send_durable_execution │
│   _callback_success      │
│   or failure             │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│ Durable Lambda Function  │
│ (Resumed)                │
│                          │
│ • Receives callback      │
│ • Continues workflow     │
│ • Approve / Reject path  │
│ • Finish execution       │
└──────────────────────────┘
Now, let's start adding our callback logic and wait mechanism. The Durable Lambda pauses execution, emits a callback ID, and resumes only when an external system, or a human, explicitly signals completion. This pattern enables human approvals, long-running workflows, and event-driven orchestration without polling or managing state machines.
from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step
from aws_durable_execution_sdk_python.execution import durable_execution
from aws_durable_execution_sdk_python.config import CallbackConfig, Duration
from time import sleep


@durable_step
def _send_approval_request(
    step_context: StepContext,
    documentId: str,
    reviewers: str,
    callbackId: str
) -> str:
    # send API call to third party service
    # notify/validate human interaction here
    return f"_send_approval_request: {documentId} {reviewers} {callbackId}"


@durable_step
def _step(step_context: StepContext, my_arg: int) -> str:
    step_context.logger.info("Hey there from _step")
    sleep(5)
    return f"_step: {my_arg}"


@durable_execution
def lambda_handler(event, context: DurableContext) -> dict:
    document_id = event['documentId']
    reviewers = event['reviewers']

    context.step(lambda _: _step(1)(_), name="prepare-document")

    callback = context.create_callback(
        name="approval_callback",
        config=CallbackConfig(
            timeout=Duration.from_hours(24),  # Maximum wait time
            heartbeat_timeout=Duration.from_hours(2),  # Fail if no heartbeat for 2 hours
        ),
    )
    callback_id = callback.callback_id

    context.step(lambda _: _send_approval_request(document_id, reviewers, callback_id)(_), name="send-document")
    print('callback_id -> ', callback_id)

    # Wait for result - execution suspends here
    result = callback.result()
    print('result -> ', result)

    if result == 'approve':
        return {
            'status': 'approve',
            'documentId': document_id
        }

    return {
        'status': 'rejected',
        'documentId': document_id,
    }
Note: It's important to publish new versions of your Durable Functions to be able to test them.
Now, let's create the Lambda function that will call send_durable_execution_callback_success to resume the Durable Lambda execution.
# Message processor side (separate Lambda or service)
import boto3
import json
from botocore.exceptions import ClientError

print(boto3.__version__)
client = boto3.client('lambda')


def lambda_handler(event, context):
    callback_id = event["callback_id"]
    status = event["status"]

    try:
        client.send_durable_execution_callback_success(
            CallbackId=callback_id,
            Result=status
        )
    except ClientError as e:
        # Notify failure
        return {
            "statusCode": 500,
            "body": json.dumps({
                "error": str(e)
            })
        }

    return {
        "statusCode": 200,
        "message": "Callback Sent"
    }
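The flow diagram says the callback Lambda validates the request before resuming the execution, which the handler above skips for brevity. Here is a minimal sketch of what that check could look like. The names validate_callback_event, handle, and RecordingClient are hypothetical, and the "reject" value is my assumption (the durable function only checks for "approve"). The only AWS call used is the same send_durable_execution_callback_success call shown above, and the client is passed in so the logic can be tried locally without AWS access.

```python
import json

# Assumed decision values; the durable function above only checks for "approve".
ALLOWED_STATUSES = {"approve", "reject"}


def validate_callback_event(event):
    """Return (callback_id, status) or raise ValueError for a malformed event."""
    callback_id = event.get("callback_id")
    status = event.get("status")
    if not isinstance(callback_id, str) or not callback_id.strip():
        raise ValueError("callback_id must be a non-empty string")
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    return callback_id, status


def handle(event, client):
    """Validate first, then forward the decision to the durable execution."""
    try:
        callback_id, status = validate_callback_event(event)
    except ValueError as exc:
        # Reject bad input before touching the Lambda API.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    client.send_durable_execution_callback_success(
        CallbackId=callback_id,
        Result=status,
    )
    return {"statusCode": 200, "body": json.dumps({"message": "Callback Sent"})}


class RecordingClient:
    """Local test double that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def send_durable_execution_callback_success(self, **kwargs):
        self.calls.append(kwargs)


fake = RecordingClient()
ok = handle({"callback_id": "abc", "status": "approve"}, fake)
bad = handle({"callback_id": "", "status": "approve"}, fake)
print(ok["statusCode"], bad["statusCode"], len(fake.calls))  # 200 400 1
```

In a real deployment you would pass boto3.client('lambda') instead of the recording client, and keep the ClientError handling from the handler above.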
Note: In my case, I needed to install the latest version of boto3 because the default version bundled with Lambda did not include the newer Lambda Durable Functions APIs. This is how my requirements.txt file looks.
boto3==1.42.13
botocore==1.42.13
Once installed, make sure to upload the artifacts to an S3 bucket and reference them from a Lambda layer attached to this function.
I also attached the following permissions to my Lambda function.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:SendDurableExecutionCallbackSuccess",
                "lambda:SendDurableExecutionCallbackFailure"
            ],
            "Resource": "*"
        }
    ]
}
Note: The service that sends the callback can be any AWS compute service (EC2, ECS, etc.). In my case, I'm using a Lambda function, since it can be easily triggered from API Gateway or tested directly from the AWS console by running a test invocation.
Result:
To test my Durable Function, I need to provide the following input:
{
    "documentId": "a1234",
    "reviewers": "awedis"
}
Perfect! You can see that our Lambda function has executed. By clicking on "Durable Execution Details", you can even view the individual steps that were run.
Now, let's copy the Callback ID generated by our Durable Function and pass it to the Notify Lambda function.
{
    "callback_id": "<YOUR_CALLBACK_ID>",
    "status": "approve"
}
Once executed, you can return to the Durable Execution page, where you should see that the Durable Lambda function has resumed and successfully completed the approval_callback.
Lambda Callback Responder
Main Durable Function
Why is this example powerful?
- Human-in-the-loop workflows
- Long-running processes (hours/days)
- No Step Functions needed
- No polling
- No state machines
- Fully serverless
Monitoring
CloudWatch Metrics
- ApproximateRunningDurableExecutions: Number of durable executions in the RUNNING state
- ApproximateRunningDurableExecutionsUtilization: Percentage of your account's maximum running durable executions quota currently in use
- DurableExecutionDuration: Elapsed wall-clock time in milliseconds that a durable execution remained in the RUNNING state
- DurableExecutionStarted: Number of durable executions that started
- DurableExecutionStopped: Number of durable executions stopped using the StopDurableExecution API
- DurableExecutionSucceeded: Number of durable executions that completed successfully
- DurableExecutionFailed: Number of durable executions that completed with a failure
- DurableExecutionTimedOut: Number of durable executions that exceeded their configured execution timeout
- DurableExecutionOperations: Cumulative number of operations performed within a durable execution (max: 3,000)
- DurableExecutionStorageWrittenBytes: Cumulative amount of data in bytes persisted by a durable execution (max: 100 MB)
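If you want to get alerted when an execution fails, one option is a CloudWatch alarm on DurableExecutionFailed. Below is a rough sketch that only builds the parameters for the CloudWatch put_metric_alarm call. Two things here are my assumptions and worth double-checking in your account: that these metrics are published under the AWS/Lambda namespace, and that they carry a FunctionName dimension. The helper name failed_executions_alarm and the function name in the example are hypothetical.

```python
def failed_executions_alarm(function_name, threshold=1, period_seconds=300):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{function_name}-durable-execution-failed",
        "Namespace": "AWS/Lambda",  # assumption: same namespace as other Lambda metrics
        "MetricName": "DurableExecutionFailed",
        # Assumption: the metric is dimensioned by function name.
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No data points simply means no failures in that period.
        "TreatMissingData": "notBreaching",
    }


params = failed_executions_alarm("my-durable-function")
print(params["MetricName"])  # DurableExecutionFailed

# With AWS credentials configured, the alarm could then be created with:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain function makes it easy to reuse the same alarm shape for DurableExecutionTimedOut or the other metrics listed above by swapping the metric name.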
Conclusion
In my opinion, AWS Lambda Durable Functions are extremely useful and can be applied to many complex serverless, event-driven architectures. When combined with services like SQS, SNS, or AWS Step Functions, they enable you to build powerful, scalable, and elegant workflows.
Happy coding 👨🏻‍💻
Github Source Code: https://github.com/awedis/aws-lambda-durable-functions-callback