Rishi for AWS Community Builders

Originally published at tricksumo.com

Implementing Saga Pattern With Lambda Durable Function

When you hit the “Place Order” button, that event triggers a series of steps, including inventory reservation, payment processing, and shipping initialization. Now, suppose the payment service (Stripe 🤔) charges your card, but the API call to the third-party shipping service fails.

Modern systems don’t live inside a single database anymore. You can’t just roll everything back like a normal database transaction. In this era of distributed services, the Saga pattern solves the problem of distributed rollback.

What is the Saga Pattern?

A saga is a sequence of steps carried out in a workflow. Each step that changes state has a matching compensating step that undoes it. As the saga progresses, the compensating steps for the completed steps are stored in a list. Down the line, if any step fails, the stored compensating steps are executed in reverse order.
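
The bookkeeping described above can be sketched without any AWS dependency. This is a minimal illustration only; `SagaStep`, `runSaga`, and `demoSteps` are hypothetical names made up for this example and are not part of any library.

```typescript
// A compensation undoes the effect of one completed forward step.
type Compensation = { name: string; undo: () => Promise<void> };

// A saga step pairs a forward action with the compensation that reverses it.
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  undo: () => Promise<void>;
}

// Runs steps in order. On the first failure, runs the compensations of the
// steps that already succeeded, newest first, and reports what happened.
async function runSaga(steps: SagaStep[]): Promise<{ ok: boolean; log: string[] }> {
  const log: string[] = [];
  const completed: Compensation[] = [];

  for (const step of steps) {
    try {
      await step.run();
      log.push(`done:${step.name}`);
      completed.push({ name: step.name, undo: step.undo });
    } catch {
      log.push(`failed:${step.name}`);
      // Iterate over a reversed copy so the original list is left untouched.
      for (const comp of [...completed].reverse()) {
        await comp.undo();
        log.push(`undone:${comp.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Hypothetical demo: the third step always fails, so the first two are undone.
const noop = async (): Promise<void> => {};
const demoSteps: SagaStep[] = [
  { name: 'reserve-inventory', run: noop, undo: noop },
  { name: 'charge-payment', run: noop, undo: noop },
  { name: 'create-shipment', run: async () => { throw new Error('shipping down'); }, undo: noop },
];
```

Calling `runSaga(demoSteps)` resolves with `ok: false`, and the log shows the payment being undone before the inventory reservation, which is the reverse of the order in which they completed.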

Saga Pattern with AWS Durable Lambda Functions

Because of their built-in checkpoint and replay mechanisms, AWS Lambda durable functions are a good fit for implementing the saga pattern.

Each step of the saga can be wrapped in a durable step, allowing independent retry strategies. After each successful step, the durable function checkpoints its state; hence, on retry or replay, already completed steps are not executed again.
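
To see why completed steps are skipped on replay, here is a toy model of checkpointing. It is an illustration under simplifying assumptions (an in-memory `Map` stands in for durable storage); `ToyDurableContext`, `toyWorkflow`, and `executions` are hypothetical names and say nothing about how the durable execution SDK works internally.

```typescript
// Toy stand-in for a durable context: results of completed steps are kept in
// a map that survives across "invocations" of the workflow function.
class ToyDurableContext {
  private readonly checkpoints: Map<string, unknown>;

  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  // Returns the stored result if the step already completed; otherwise runs
  // the step body and checkpoints its result before returning it.
  async step<T>(name: string, body: () => Promise<T>): Promise<T> {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T;
    }
    const result = await body();
    this.checkpoints.set(name, result);
    return result;
  }
}

// Counts how many times each step body really executes.
const executions = { reserve: 0, charge: 0 };

// Two-step workflow; failCharge simulates a crash in the second step.
async function toyWorkflow(ctx: ToyDurableContext, failCharge: boolean): Promise<string> {
  const reservationId = await ctx.step('reserve', async () => {
    executions.reserve += 1;
    return 'RES-1';
  });
  const paymentId = await ctx.step('charge', async () => {
    executions.charge += 1;
    if (failCharge) throw new Error('gateway timeout');
    return 'PAY-1';
  });
  return `${reservationId}/${paymentId}`;
}
```

If the workflow is run once with `failCharge = true` and then again with `failCharge = false` against the same checkpoint map, the `reserve` body executes only once, while `charge` executes twice because it never reached a checkpoint the first time.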

To implement forward steps and compensation, all that we need is a try-catch block. The idea is simple: keep moving forward in the try block. If any step fails, compensating steps run in the catch block.

import {
  withDurableExecution,
  DurableContext,
  createRetryStrategy,
  JitterStrategy,
} from '@aws/durable-execution-sdk-js';
import { randomUUID } from 'crypto';

//Types

interface OrderEvent {
  orderId: string;
  customerId: string;
  items: Array<{ sku: string; qty: number; price: number }>;
  paymentMethod: { type: string; token: string };
  shippingAddress: { street: string; city: string; zip: string };
}

// Custom Error Classes
// Used by retryableErrorTypes - SDK checks instanceof, so real classes are needed

class NetworkError extends Error {
  name = 'NetworkError';
}

class PaymentDeclinedError extends Error {
  name = 'PaymentDeclinedError';
}

// Retry Strategies

/**
 * Exponential backoff with jitter - for external API calls.
 * Attempts: 1s → 2s → 4s → 8s → 16s (random jitter added to avoid thundering herd)
 * Only retries NetworkError - other errors won't be retried.
 */
const apiRetryStrategy = createRetryStrategy({
  maxAttempts: 5,
  initialDelay: { seconds: 1 },
  maxDelay: { seconds: 30 },
  backoffRate: 2.0,
  jitter: JitterStrategy.FULL,
  retryableErrorTypes: [NetworkError], // ← class constructor, not string
});

/**
 * Custom retry for payment - retries network errors but NOT declined cards.
 * Uses a function instead of createRetryStrategy for fine-grained control.
 */
const paymentRetryStrategy = (error: Error, attemptCount: number) => {
  // retryableErrorTypes equivalent - never retry a declined card
  if (error instanceof PaymentDeclinedError) {
    return { shouldRetry: false };
  }

  // maxAttempts: 5 equivalent
  if (attemptCount >= 5) {
    return { shouldRetry: false };
  }

  // initialDelay(1s) + backoffRate(2.0) + maxDelay(30s) + jitter(FULL) equivalent
  const baseDelay = 1 * Math.pow(2.0, attemptCount - 1); // exponential: 1s → 2s → 4s → 8s (attempt 5 gives up above)
  const capped = Math.min(baseDelay, 30);                 // maxDelay: never exceed 30s
  const jittered = Math.random() * capped;                // FULL jitter: random between 0 and capped
  const seconds = Math.max(1, Math.round(jittered));      // minimum 1s, rounded to whole second

  return { shouldRetry: true, delay: { seconds } };
};

// Handler

export const handler = withDurableExecution(async (event: OrderEvent, context: DurableContext) => {
  context.logger.info('Order processing started', { orderId: event.orderId });

  // Saga compensations array tracks what to undo if something fails later
  const compensations: Array<{ name: string; fn: () => Promise<void> }> = [];

  try {

    // ── Step 1: Validate ────────────────────────────────────────────────────
    // Throws a plain Error for invalid input, "shouldRetry: false" strategy means it fails immediately without retry
    await context.step('validate-order', async () => {
      context.logger.info('Validating order');

      if (!event.orderId || !event.customerId) {
        throw new Error('Order missing required fields');
      }
      if (!event.items || event.items.length === 0) {
        throw new Error('Order has no items');
      }

      const total = event.items.reduce((sum, i) => sum + i.price * i.qty, 0);
      if (total <= 0) {
        throw new Error('Order total must be greater than zero');
      }

      return { valid: true, total };
    },
      { retryStrategy: () => ({ shouldRetry: false }) }
    );

    // ── Step 2: Reserve inventory ───────────────────────────────────────────
    // Retries with exponential backoff, inventory service may be temporarily busy
    const reservation = await context.step(
      'reserve-inventory',
      async () => {
        context.logger.info('Reserving inventory');
        return await callInventoryService(event.orderId, event.items);
      },
      { retryStrategy: apiRetryStrategy }
    );

    // Register compensation, if something fails later, cancel this reservation
    compensations.push({
      name: 'cancel-reservation',
      fn: async () => { await callInventoryService(event.orderId, event.items, 'cancel'); },
    });

    context.logger.info('Inventory reserved', { reservationId: reservation.id });

    // ── Step 3: Generate idempotency key ───────────────────────────────────
    // The key is generated ONCE inside a step and checkpointed, so the same value is returned on every replay.
    // This is the recommended pattern for payment APIs that support idempotency keys.
    // WARNING: Never generate it outside a step: the value would change on replay, defeating deduplication.
    const idempotencyKey = await context.step('payment-idempotency-key', async () =>
      randomUUID()
    );

    // ── Step 4: Charge payment ──────────────────────────────────────────────
    // AtLeastOnce (default) + idempotency key = safe to retry.
    // Even if Lambda crashes after charge but before checkpoint, the retry sends
    // the same idempotency key - payment provider deduplicates and returns original result.
    // No double charge risk.
    const payment = await context.step(
      'charge-payment',
      async () => {
        context.logger.info('Charging payment', { idempotencyKey });
        return await callPaymentService(event.paymentMethod, getTotalAmount(event), idempotencyKey);
      },
      { retryStrategy: paymentRetryStrategy } // retries NetworkError safely
    );

    // Register compensation - if shipping fails, refund the payment
    compensations.push({
      name: 'refund-payment',
      fn: async () => {
        await callPaymentService(event.paymentMethod, getTotalAmount(event), idempotencyKey, 'refund', payment.id);
      },
    });

    context.logger.info('Payment charged', { paymentId: payment.id });

    // ── Step 5: Create shipment ─────────────────────────────────────────────
    // Retries with exponential backoff, shipping service may be temporarily down
    const shipment = await context.step(
      'create-shipment',
      async () => {
        context.logger.info('Creating shipment');
        return await callShippingService(event.orderId, event.shippingAddress, event.items);
      },
      { retryStrategy: apiRetryStrategy }
    );

    context.logger.info('Shipment created', { trackingId: shipment.trackingId });

    // ── Step 6: Send confirmation ───────────────────────────────────────────
    // Simple fixed-delay retry: email service is reliable, 3 attempts is enough
    await context.step(
      'send-confirmation',
      async () => {
        context.logger.info('Sending confirmation email');
        await callNotificationService(event.customerId, event.orderId, shipment.trackingId);
      },
      {
        retryStrategy: createRetryStrategy({
          maxAttempts: 3,
          initialDelay: { seconds: 2 },
          backoffRate: 1, // fixed delay, no backoff
        }),
      }
    );

    context.logger.info('Order processing complete', { orderId: event.orderId });

    return {
      success: true,
      orderId: event.orderId,
      reservationId: reservation.id,
      paymentId: payment.id,
      trackingId: shipment.trackingId,
    };

  } catch (error) {
    // ── Saga: undo completed steps in reverse order ─────────────────────────
    // Example: payment charged but shipping failed → refund payment, cancel reservation
    context.logger.error('Order failed, running compensations', {
      orderId: event.orderId,
      error: (error as Error).message,
    });

    for (const comp of compensations.reverse()) {
      try {
        // Each compensation is its own durable step, checkpointed and retried
        await context.step(comp.name, async () => comp.fn());
        context.logger.info(`Compensation done: ${comp.name}`);
      } catch (compError) {
        // Log but continue, try all compensations even if one fails
        context.logger.error(`Compensation failed: ${comp.name}`, compError);
      }
    }

    throw error; // re-throw so execution is marked FAILED in console
  }
});

// ─── Simulated Service Calls ──────────────────────────────────────────────────
// In real code these would call actual APIs. Here we simulate occasional failures
// to demonstrate retry behavior.

async function callInventoryService(
  orderId: string,
  items: OrderEvent['items'],
  action = 'reserve'
): Promise<{ id: string }> {
  // 60% chance (3 in 5) of network failure - demonstrates retry kicking in
  if (Math.random() < (3 / 5)) throw new NetworkError('Inventory service timeout');
  return { id: `RES-${orderId}-${Date.now()}` };
}

async function callPaymentService(
  method: OrderEvent['paymentMethod'],
  amount: number,
  idempotencyKey: string,
  action = 'charge',
  paymentId?: string
): Promise<{ id: string }> {
  if (method.token === 'DECLINED') throw new PaymentDeclinedError('Card declined');
  if (Math.random() < (2 / 3)) throw new NetworkError('Payment gateway timeout');
  return { id: `PAY-${idempotencyKey}` };
}

async function callShippingService(
  orderId: string,
  address: OrderEvent['shippingAddress'],
  items: OrderEvent['items']
): Promise<{ trackingId: string }> {
  if (Math.random() < (2 / 3)) throw new NetworkError('Shipping service unavailable');
  return { trackingId: `TRACK-${orderId}-${Date.now()}` };
}

async function callNotificationService(
  customerId: string,
  orderId: string,
  trackingId: string
): Promise<void> {
  console.log(`Confirmation sent to ${customerId} for order ${orderId}, tracking: ${trackingId}`);
}

function getTotalAmount(event: OrderEvent): number {
  return event.items.reduce((sum, i) => sum + i.price * i.qty, 0);
}
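
The safety argument in the charge-payment step depends on the payment provider deduplicating requests that carry the same idempotency key. Here is a rough sketch of that behavior, using a hypothetical in-memory `ToyPaymentProvider`; it is not a real payment API, and real providers may differ in the details.

```typescript
interface ToyCharge {
  id: string;
  amount: number;
}

// Hypothetical provider that remembers the charge created for each key.
class ToyPaymentProvider {
  private readonly chargesByKey = new Map<string, ToyCharge>();
  private nextId = 1;

  // A repeated request with a known key returns the original charge
  // instead of creating a second one.
  charge(idempotencyKey: string, amount: number): ToyCharge {
    const existing = this.chargesByKey.get(idempotencyKey);
    if (existing) {
      return existing;
    }
    const created: ToyCharge = { id: `PAY-${this.nextId}`, amount };
    this.nextId += 1;
    this.chargesByKey.set(idempotencyKey, created);
    return created;
  }

  // Sum of all distinct charges actually created.
  totalCharged(): number {
    let total = 0;
    this.chargesByKey.forEach((c) => {
      total += c.amount;
    });
    return total;
  }
}
```

Because the key comes from a checkpointed step, a retried charge sends the same key and gets the original charge back. A key generated fresh on every attempt would look like a new request each time and could charge the customer twice.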

Let’s run the code!

Copy and paste the above code into the durable Lambda function's code editor. Then use the commands below to invoke the function with different payloads.

Success scenario:

aws lambda invoke \
  --function-name durable-parallel-test:prod \
  --payload '{"orderId":"ORD-001","customerId":"CUST-123","items":[{"sku":"ITEM-1","qty":2,"price":25}],"paymentMethod":{"type":"card","token":"tok_valid"},"shippingAddress":{"street":"123 Main St","city":"Seattle","zip":"98101"}}' \
  --cli-binary-format raw-in-base64-out \
  response.json && cat response.json

Failure scenario (declined card):

aws lambda invoke \
  --function-name durable-parallel-test:prod \
  --payload '{"orderId":"ORD-002","customerId":"CUST-123","items":[{"sku":"ITEM-1","qty":1,"price":50}],"paymentMethod":{"type":"card","token":"DECLINED"},"shippingAddress":{"street":"123 Main St","city":"Seattle","zip":"98101"}}' \
  --cli-binary-format raw-in-base64-out \
  response.json && cat response.json

Invalid input:

aws lambda invoke \
  --function-name durable-parallel-test:prod \
  --payload '{"orderId":"","customerId":"","items":[],"paymentMethod":{"type":"card","token":"tok"},"shippingAddress":{"street":"","city":"","zip":""}}' \
  --cli-binary-format raw-in-base64-out \
  response.json && cat response.json
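
For the invalid-input payload, it may help to see which check trips first. The sketch below copies the checks from the validate-order step into a hypothetical standalone function, `firstValidationError`, so they can be tried locally without deploying anything. `OrderInput` is a trimmed-down, assumed version of `OrderEvent`.

```typescript
// Only the fields that the validation checks actually read.
interface OrderInput {
  orderId: string;
  customerId: string;
  items: Array<{ sku: string; qty: number; price: number }>;
}

// Same checks, in the same order, as the validate-order step.
// Returns the message the step would throw, or null when the order is valid.
function firstValidationError(order: OrderInput): string | null {
  if (!order.orderId || !order.customerId) {
    return 'Order missing required fields';
  }
  if (!order.items || order.items.length === 0) {
    return 'Order has no items';
  }
  const total = order.items.reduce((sum, i) => sum + i.price * i.qty, 0);
  if (total <= 0) {
    return 'Order total must be greater than zero';
  }
  return null;
}

// The relevant parts of the invalid-input and success payloads used above.
const invalidOrder: OrderInput = { orderId: '', customerId: '', items: [] };
const validOrder: OrderInput = {
  orderId: 'ORD-001',
  customerId: 'CUST-123',
  items: [{ sku: 'ITEM-1', qty: 2, price: 25 }],
};
```

Here `firstValidationError(invalidOrder)` returns `'Order missing required fields'`, and `firstValidationError(validOrder)` returns `null`. In the handler, validation fails before any compensation is registered, so the catch block has nothing to undo and simply re-throws the error.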

Conclusion

The saga pattern makes rolling back to a consistent state simple in distributed transactions. And with Lambda durable functions’ checkpointing and retry capabilities, it is easy-peasy to implement. Thanks for reading 😊
