AWS just changed the game for serverless workflows. Here's everything you need to know about Lambda Durable Functions—and why they might replace your Step Functions.
I'll be honest with you: when AWS announced Lambda Durable Functions at re:Invent, I was skeptical. Another workflow orchestration service? Really? We already have Step Functions, and they work just fine.
But after spending a few weeks migrating some of our long-running processes, I'm convinced this is a legitimate game-changer. Let me explain why.
The Problem We've All Been Ignoring
Think about the last time you built a multi-step workflow. Maybe it was an order processing system that waits for payment confirmation. Or a content moderation pipeline with human review steps. Or a data pipeline that processes files uploaded by users throughout the day.
You probably reached for Step Functions, right? I did too. And then I saw the bill.
Here's the thing: Step Functions charge you per state transition. That $25 per million transitions sounds cheap until you realize your approval workflow with six states costs you money every single time it runs—even if it's just sitting there waiting for someone to click "Approve" in an email.
💡 The Real Cost of Waiting
A typical approval workflow with 8 state transitions, running 10,000 times per month, makes 80,000 transitions and costs about $2.00 in Step Functions charges (before the free tier is applied). It doesn't sound like much, but you're paying for every transition into and out of states that do nothing except wait. Lambda Durable Functions? $0.00 for the waiting time.
What Are Lambda Durable Functions, Anyway?
Lambda Durable Functions let you write long-running workflows as regular code—no JSON state machines required. You write normal TypeScript or Python, and AWS handles the orchestration, state persistence, and resumption after pauses.
The magic is in the await statement. When your function awaits a durable task, AWS checkpoints your function's state, shuts it down, and brings it back to life when the task completes. Could be 5 seconds later. Could be 5 months later. You don't pay for the wait.
How Lambda Durable Functions Work
    Function Starts → Execute Code → Await Durable Task?

        No                      Yes
        ↓                       ↓
        Continue                Checkpoint State
        ↓                       ↓
        Complete/Next Step      Suspend Function
                                ↓
                                Wait for Event/Timer
                                ↓
                                Restore State
                                ↓
                                Resume Execution
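If the suspend-and-resume part feels like magic, it helped me to build a toy version. One common way to get this behavior is checkpoint and replay: record each completed step in a journal, and when the function is invoked again, hand back the recorded results instead of re-running the work. The sketch below is an in-memory model of that idea only. It is not AWS's implementation, and `ToyContext`, `Suspend`, `step`, and `invoke` are names I made up for illustration.

```typescript
// Toy model of checkpoint-and-replay. Every name here is hypothetical.

// Thrown to unwind the workflow when it has to wait for something external.
class Suspend {
  readonly eventName: string;
  constructor(eventName: string) {
    this.eventName = eventName;
  }
}

class ToyContext {
  private cursor = 0;
  // `journal` stands in for durable storage that survives between invocations.
  private readonly journal: unknown[];
  private readonly events: Map<string, unknown>;

  constructor(journal: unknown[], events: Map<string, unknown>) {
    this.journal = journal;
    this.events = events;
  }

  // Run `fn` the first time; on later replays return the recorded result.
  async step<T>(fn: () => Promise<T> | T): Promise<T> {
    const index = this.cursor++;
    if (index < this.journal.length) return this.journal[index] as T;
    const result = await fn();
    this.journal.push(result);
    return result;
  }

  // Return the event payload if it has arrived, otherwise suspend this run.
  async waitForEvent<T>(name: string): Promise<T> {
    const index = this.cursor++;
    if (index < this.journal.length) return this.journal[index] as T;
    if (!this.events.has(name)) throw new Suspend(name);
    const payload = this.events.get(name) as T;
    this.journal.push(payload);
    return payload;
  }
}

type Workflow<T> = (ctx: ToyContext) => Promise<T>;
type Outcome<T> = { done: true; value: T } | { done: false };

// One invocation: either the workflow finishes or it suspends.
async function invoke<T>(
  workflow: Workflow<T>,
  journal: unknown[],
  events: Map<string, unknown>,
): Promise<Outcome<T>> {
  try {
    return { done: true, value: await workflow(new ToyContext(journal, events)) };
  } catch (err) {
    if (err instanceof Suspend) return { done: false };
    throw err;
  }
}

let notificationsSent = 0;

const approval: Workflow<string> = async (ctx) => {
  await ctx.step(() => {
    notificationsSent++;
    return "sent";
  });
  const decision = await ctx.waitForEvent<string>("approval");
  return `document ${decision}`;
};

async function demo() {
  const journal: unknown[] = [];
  const events = new Map<string, unknown>();
  const first = await invoke(approval, journal, events); // suspends at waitForEvent
  events.set("approval", "approved"); // the external signal arrives later
  const second = await invoke(approval, journal, events); // replays step 1, then finishes
  return { first, second, notificationsSent };
}
```

The detail to notice is that the notification step runs once even though the workflow function is entered twice. That is also why code outside a recorded step should be deterministic in this model: it runs again on every resume.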
A Real Example: Document Approval Workflow
Let's build something practical. Here's a document approval system that waits for multiple reviewers, sends reminders, and escalates if nobody responds. In Step Functions, this would be 15+ states with complex choice logic. In Durable Functions? It's just code.
    import { DurableOrchestration } from '@aws-lambda/durable-functions';

    const DAY = 24 * 60 * 60; // one day, in seconds

    export const documentApprovalWorkflow = new DurableOrchestration(
      async (context) => {
        const { documentId, reviewers } = context.input;

        // Step 1: Send notification to all reviewers
        await context.callActivity('sendReviewNotifications', {
          documentId,
          reviewers
        });

        // Step 2: Wait for approval (7-day timeout), with a reminder after 3 days
        const approvalTask = context.waitForEvent('approval', 7 * DAY);
        const reminderTask = context.createTimer(3 * DAY);

        // Promise.race resolves with the winner's value, not its name,
        // so tag each task to tell which one finished first
        const winner = await Promise.race([
          approvalTask.then(() => 'approval'),
          reminderTask.then(() => 'reminder')
        ]);

        if (winner === 'reminder') {
          // Send reminder and wait again
          await context.callActivity('sendReminderEmails', { reviewers });
          // Assumes waitForEvent resolves to a falsy value when it times out
          const secondApproval = await context.waitForEvent('approval', 4 * DAY);

          if (!secondApproval) {
            // Escalate to manager
            await context.callActivity('escalateToManager', { documentId });
            await context.waitForEvent('managerApproval', 2 * DAY);
          }
        }

        // Step 3: Process approval
        // The activity records the approval time itself: if the runtime re-runs
        // orchestration code on resume, new Date() here could differ between runs
        const result = await context.callActivity('processApproval', { documentId });
        return result;
      }
    );

    // External system triggers approval
    // durableClient is assumed to be an already-configured client for the
    // durable runtime; its setup is not shown here
    export const submitApproval = async (workflowId: string, decision: string) => {
      await durableClient.raiseEvent(workflowId, 'approval', { decision });
    };
Look at that code. It reads like a script you'd write to describe the process to a colleague. "Send notifications, wait for approval, send reminders if nobody responds, escalate if we still don't hear back." That's it.
No state machine JSON. No $.decision == 'approved' choice conditions. Just regular programming logic.
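One thing to watch when racing an event against a timer: `Promise.race` resolves with the value of whichever promise settles first, not with a label telling you which one it was. Here's a minimal sketch of the tagging pattern in plain TypeScript, using `setTimeout` as a stand-in for a durable timer. `raceWithTimeout` and `RaceResult` are hypothetical names, not part of any AWS SDK.

```typescript
// Resolve after `ms` milliseconds; stands in for a durable timer.
const timer = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// A tagged result makes it explicit which contender won the race.
type RaceResult<T> = { kind: "event"; payload: T } | { kind: "timeout" };

// Race an event promise against a timer and report which one settled first.
function raceWithTimeout<T>(
  event: Promise<T>,
  timeoutMs: number,
): Promise<RaceResult<T>> {
  return Promise.race([
    event.then((payload): RaceResult<T> => ({ kind: "event", payload })),
    timer(timeoutMs).then((): RaceResult<T> => ({ kind: "timeout" })),
  ]);
}

async function demo() {
  // The "approval" arrives after 10 ms, well inside the 200 ms window.
  const fast = await raceWithTimeout(timer(10).then(() => "approved"), 200);
  // The "approval" would take 200 ms, but we only wait 10 ms.
  const slow = await raceWithTimeout(timer(200).then(() => "approved"), 10);
  return { fast, slow };
}
```

Checking `result.kind` keeps the branch logic honest, and the compiler only lets you touch `payload` once you've confirmed the event actually arrived.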
Multi-Step Applications: The Sweet Spot
Durable Functions really shine when you're building applications that have multiple discrete steps, each potentially taking different amounts of time. Here are patterns I've found work incredibly well:
1. The Data Pipeline Pattern
You receive a file upload, process it through multiple transformations, wait for quality checks, and then publish results. Each step might take seconds or hours depending on file size.
2. The Human-in-the-Loop Pattern
This is where Durable Functions absolutely crush Step Functions. Any time you need to wait for a human decision—approvals, content moderation, manual verification—you're waiting potentially hours or days. With Step Functions, you pay for every state transition. With Durable Functions, you pay nothing while waiting.
3. The Scheduled Batch Pattern
Process data in chunks throughout the day, aggregate the results, and generate reports. Traditional cron jobs don't maintain state between runs. Durable Functions do.
    export const dailyReportWorkflow = new DurableOrchestration(
      async (context) => {
        const results: unknown[] = [];

        // Process batches every 6 hours
        for (let i = 0; i < 4; i++) {
          // The activity stamps its own processing time: if the runtime re-runs
          // orchestration code on resume, new Date() here could differ between runs
          const batchResult = await context.callActivity('processBatch', {
            batchNumber: i
          });
          results.push(batchResult);

          // Wait 6 hours before next batch
          if (i < 3) {
            await context.createTimer(6 * 60 * 60);
          }
        }

        // Generate final report with all batches
        return await context.callActivity('generateReport', { results });
      }
    );
Lambda Durable Functions vs. Step Functions: The Honest Comparison
Okay, let's talk numbers. When should you use each service?
| Factor | Lambda Durable Functions | Step Functions (Standard) |
|---|---|---|
| Max Duration | 365 days | 365 days |
| Waiting Cost | $0 (state is persisted, function suspended) | No charge for the wait itself, but every state transition is billed |
| Execution Cost | Lambda pricing ($0.20 per 1M requests, plus compute duration) | $25 per 1M state transitions (first 4,000 per month free) |
| State Machine | Code-based (TypeScript/Python) | JSON ASL (Amazon States Language) |
| Versioning | Built into code deployment | Manual version management |
| Testing | Standard unit tests, local debugging | Requires Step Functions Local or AWS |
| Visual Editor | None (code only) | Workflow Studio (drag-and-drop) |
| Error Handling | Try-catch blocks | Retry policies in JSON |
Cost Breakdown Example
Scenario: Approval workflow with 8 steps, waiting an average of 48 hours for human response, processing 50,000 documents per month.
Step Functions Cost:
- 50,000 workflows × 8 state transitions = 400,000 transitions
- (400,000 - 4,000 free tier) × $0.000025 = $9.90/month
Durable Functions Cost:
- 50,000 workflows × 3 Lambda invocations (start, resume, complete) = 150,000 requests
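- 150,000 × $0.0000002 = $0.03/month in request charges (Lambda compute duration is billed on top of this)
Savings: about 99.7% on orchestration charges for workflows with long wait times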
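If you want to plug in your own numbers, the arithmetic above fits in a few lines. This is a back-of-the-envelope sketch using only the prices quoted in this post. The function names are mine, and it deliberately ignores Lambda compute duration, so treat the output as a rough comparison rather than a bill estimate.

```typescript
// Prices quoted in this post, in US dollars.
const STEP_FUNCTIONS_PRICE_PER_TRANSITION = 0.000025; // $25 per 1M transitions
const STEP_FUNCTIONS_FREE_TRANSITIONS = 4000; // monthly free tier
const LAMBDA_PRICE_PER_REQUEST = 0.0000002; // $0.20 per 1M requests

// Monthly Step Functions charge for state transitions, after the free tier.
function stepFunctionsCost(workflows: number, transitionsPerWorkflow: number): number {
  const transitions = workflows * transitionsPerWorkflow;
  const billable = Math.max(0, transitions - STEP_FUNCTIONS_FREE_TRANSITIONS);
  return billable * STEP_FUNCTIONS_PRICE_PER_TRANSITION;
}

// Monthly Lambda request charge only (compute duration is not included).
function lambdaRequestCost(workflows: number, invocationsPerWorkflow: number): number {
  return workflows * invocationsPerWorkflow * LAMBDA_PRICE_PER_REQUEST;
}

const stepFunctions = stepFunctionsCost(50000, 8);
const durable = lambdaRequestCost(50000, 3);
const savingsPercent = (1 - durable / stepFunctions) * 100;

console.log(
  `Step Functions: $${stepFunctions.toFixed(2)}, ` +
    `Durable: $${durable.toFixed(2)}, ` +
    `savings: ${savingsPercent.toFixed(1)}%`,
);
// prints "Step Functions: $9.90, Durable: $0.03, savings: 99.7%"
```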
When NOT to Use Durable Functions
I know I'm sounding like a fanboy, but Durable Functions aren't always the right choice. Here's when Step Functions still win:
You need a visual workflow editor: Non-technical stakeholders who need to understand or modify workflows will appreciate Step Functions' Workflow Studio.
Heavy parallel processing: Step Functions' Map state is optimized for fan-out/fan-in patterns at massive scale. Durable Functions can do parallel tasks, but Step Functions handles 10,000+ parallel branches more elegantly.
AWS service integrations: Step Functions has 220+ direct AWS service integrations. Durable Functions require you to write code for each integration.
Compliance requirements: Some industries require visual audit trails. Step Functions' execution history is more readable for auditors.
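On the parallel processing point: in workflow-as-code, fan-out/fan-in usually boils down to starting several tasks and awaiting them together, ideally with a cap on how many run at once. Here's a minimal sketch of that shape in plain TypeScript. `mapWithConcurrency` is a hypothetical helper of my own, there's no AWS SDK involved, and I'm not claiming this is how the durable runtime schedules parallel work.

```typescript
// Run `worker` over every item with at most `limit` calls in flight,
// returning results in input order (fan-out, then fan-in).
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");

  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane keeps pulling the next unclaimed index until none are left.
  const runLane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  };

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

async function demo() {
  let inFlight = 0;
  let peak = 0;

  const squares = await mapWithConcurrency([1, 2, 3, 4, 5], 2, async (n) => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated work
    inFlight--;
    return n * n;
  });

  return { squares, peak };
}
```

This is fine for tens or hundreds of branches. Once you're into the thousands, the bookkeeping (partial failures, retries per branch, result size) is exactly what Step Functions' Map state takes off your hands.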
Getting Started: Your First Durable Function
The fastest way to start is with the AWS SAM template:
    sam init --name my-durable-app --runtime nodejs20.x --app-template durable-function
    cd my-durable-app
    sam build && sam deploy --guided
Or deploy with CDK:
    import * as cdk from 'aws-cdk-lib';
    import * as lambda from 'aws-cdk-lib/aws-lambda';
    import { Construct } from 'constructs';
    import * as durable from '@aws-cdk/aws-lambda-durable-functions';

    export class DurableStack extends cdk.Stack {
      constructor(scope: Construct, id: string, props?: cdk.StackProps) {
        super(scope, id, props);

        // timeout applies to each invocation; maxDuration to the whole workflow
        new durable.DurableFunction(this, 'MyWorkflow', {
          runtime: lambda.Runtime.NODEJS_20_X,
          handler: 'index.handler',
          code: lambda.Code.fromAsset('functions/workflow'),
          timeout: cdk.Duration.minutes(15),
          maxDuration: cdk.Duration.days(365)
        });
      }
    }
Best Practices I've Learned the Hard Way
1. Make your activities idempotent. AWS might retry activities if there's a failure. Design them to handle duplicate calls gracefully.
2. Don't store large data in workflow state. The workflow state is limited to 256 KB. Store large payloads in S3 and pass references.
3. Use correlation IDs. When external systems need to signal your workflow, they'll need the workflow execution ID. Make it something meaningful like order-{orderId} instead of a random UUID.
4. Set realistic timeouts. Your workflow might run for a year, but individual activities should have much shorter timeouts (seconds to minutes).
5. Monitor with CloudWatch. Set up alarms for stuck workflows, failed activities, and unexpected wait times.
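Practices 1 and 3 work well together, so here's a small sketch that combines them: a meaningful, deterministic workflow ID, and an activity that uses it as an idempotency key. The `Map` is an in-memory stand-in for a real durable store, and `workflowIdFor` and `chargeCustomer` are hypothetical names for illustration.

```typescript
// In-memory stand-in for a durable store such as a database table.
const completedSteps = new Map<string, string>();
let chargesMade = 0;

// Build a meaningful, deterministic workflow ID from a business key.
const workflowIdFor = (orderId: string): string => `order-${orderId}`;

// Charge at most once per (workflow, step) pair, even if the call is retried.
async function chargeCustomer(workflowId: string, amountCents: number): Promise<string> {
  const key = `${workflowId}:chargeCustomer`;

  // Duplicate call: return the recorded result instead of charging again.
  const previous = completedSteps.get(key);
  if (previous !== undefined) return previous;

  chargesMade++; // the side effect we must not repeat
  const receipt = `receipt-${workflowId}-${amountCents}`;
  completedSteps.set(key, receipt);
  return receipt;
}

async function demo() {
  const workflowId = workflowIdFor("1042");
  const first = await chargeCustomer(workflowId, 2500);
  const retry = await chargeCustomer(workflowId, 2500); // simulated retry
  return { workflowId, first, retry, chargesMade };
}
```

In a real system the check-and-record step has to be atomic (a conditional write, for example), otherwise two concurrent retries can both slip past the lookup. The sketch skips that because a single-threaded in-memory map can't race with itself.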
The Bottom Line
Lambda Durable Functions are a significant evolution in serverless orchestration. They give you the simplicity of writing workflows as code, the cost savings of not paying for idle time, and the power of running workflows for up to a year.
If you're building new long-running workflows—especially those with human-in-the-loop steps or extended wait times—start with Durable Functions. You'll write less code, pay less money, and sleep better knowing your workflows are running on battle-tested AWS infrastructure.
For existing Step Functions workflows, migrate if your workflows spend most of their time waiting. For fast-moving workflows with lots of branching logic and AWS service integrations, Step Functions might still be your best bet.
The serverless world just got a lot more interesting. Time to build something that runs for a year. 🚀
What workflows are you running that could benefit from Durable Functions? Drop a comment below and let's discuss!

