1xApi

Posted on Apr 4 • Originally published at 1xapi.com

How to Implement the Saga Pattern for Distributed Transactions in Node.js (2026 Guide)

#node #microservices #architecture #distributed

The hardest problem in distributed systems isn't handling failure — it's handling partial failure. When your order service successfully charges a card but the inventory service crashes before reserving stock, you have a real problem that a single database ROLLBACK can't solve.

The Saga pattern is the industry-standard solution for distributed transactions across microservices. In 2026, with most production Node.js systems running across multiple services, databases, and cloud functions, understanding Sagas is no longer optional — it's a core skill.

This guide shows you how to implement both flavors (choreography and orchestration) in Node.js with working code, Redis-backed state, and proper compensation logic.

What Problem Do Sagas Solve?

In a monolith, you write:

await db.transaction(async (trx) => {
  await chargeCard(userId, amount, trx);
  await reserveInventory(productId, qty, trx);
  await createOrder(userId, productId, trx);
});

If reserveInventory throws, the whole transaction rolls back. Clean.

In microservices, each service owns its own database. There is no shared transaction boundary. If your architecture looks like this:

Client → Order Service (PostgreSQL)
       → Payment Service (Stripe + MySQL)
       → Inventory Service (MongoDB)
       → Notification Service (Redis + Twilio)

You need a way to coordinate multi-step workflows and undo completed steps if a later step fails. That's exactly what the Saga pattern provides.

Saga vs. Two-Phase Commit (2PC)

Before diving into code, here's why Sagas beat 2PC for microservices:

Aspect	2PC	Saga
Blocking	Yes (coordinator waits)	No (async steps)
Availability	Low (lock-based)	High
Complexity	High	Medium
Failure recovery	Difficult	Explicit compensations
Node.js support	Poor	Excellent

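2PC requires every participant to hold its locks from the "prepare" phase until the coordinator's commit or abort decision arrives. In a distributed Node.js system with network latency, a slow or crashed coordinator leaves those locks held and causes cascading timeouts. Sagas avoid distributed locks by committing each step locally and pairing it with a compensating transaction: an explicit undo operation that runs if a later step fails.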
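To make "compensating transaction" concrete before we bring in Redis and HTTP, here is a minimal in-memory sketch. The `runSaga` function and the demo steps are illustrative names, not part of any library; the real orchestrator later in this guide adds persistence on top of the same idea.

```javascript
// Minimal in-memory saga runner (illustrative only: no persistence, no network).
async function runSaga(steps, context) {
  const done = []; // steps that completed and may need undoing
  for (const step of steps) {
    try {
      // Merge whatever the step returns into the shared context
      Object.assign(context, await step.execute(context));
      done.push(step);
    } catch (error) {
      // Undo completed steps, most recent first
      for (const completed of done.reverse()) {
        await completed.compensate(context);
      }
      return { success: false, failedAt: step.name, error: error.message };
    }
  }
  return { success: true, context };
}

// Demo: the second step fails, so only the first step's compensation runs.
const demoLog = [];
const demoSteps = [
  {
    name: 'charge',
    execute: async () => { demoLog.push('charge'); return { chargeId: 'ch_demo' }; },
    compensate: async (ctx) => { demoLog.push(`refund ${ctx.chargeId}`); },
  },
  {
    name: 'reserve',
    execute: async () => { throw new Error('out of stock'); },
    compensate: async () => { demoLog.push('release'); },
  },
];

runSaga(demoSteps, {}).then((result) => {
  console.log(result);  // { success: false, failedAt: 'reserve', error: 'out of stock' }
  console.log(demoLog); // [ 'charge', 'refund ch_demo' ]
});
```

Note that the failed step itself is not compensated, only the steps that finished before it.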

Two Saga Flavors

Choreography: Event-Driven Saga

Each service publishes events and listens for events from others. No central coordinator.

OrderService       PaymentService     InventoryService
     |                   |                   |
  ORDER_CREATED ──────►  |                   |
     |            PAYMENT_PROCESSED ────────►|
     |                   |            INVENTORY_RESERVED
     |◄──────────────────────────────────────|
  ORDER_CONFIRMED

Pros: Loose coupling, no single point of failure

Cons: Hard to trace the full workflow, risk of cyclic dependencies

Orchestration: Central Coordinator

A dedicated "saga orchestrator" knows every step and calls services directly.

[Saga Orchestrator]
     ├── Step 1: Call PaymentService.charge()
     ├── Step 2: Call InventoryService.reserve()
     ├── Step 3: Call NotificationService.send()
     └── On failure: Run compensations in reverse

Pros: Easy to trace, single source of truth for workflow state

Cons: Orchestrator becomes a new service to maintain

For most production systems in 2026, orchestration is preferred for complex workflows because observability and debugging are dramatically easier.

Building an Orchestration Saga in Node.js

We'll build an e-commerce order saga: charge payment → reserve inventory → send confirmation. If any step fails, we run compensating transactions in reverse order.

Project Setup

npm init -y
npm pkg set type=module   # the examples use ES module import/export syntax
npm install ioredis express

Step 1: Define the Saga Steps

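Each step has an execute function and a compensate function. Whatever execute returns is merged into the saga context, so later steps and compensations can read it: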

// saga/orderSagaSteps.js

export const orderSagaSteps = [
  {
    name: 'chargePayment',
    execute: async (context) => {
      const { userId, amount, orderId } = context;
      console.log(`[Saga] Charging $${amount} for order ${orderId}`);

      // Call the payment service (the hostname is a placeholder for your own service)
      const response = await fetch(`http://payment-service/charge`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ userId, amount, orderId }),
      });

      if (!response.ok) {
        throw new Error(`Payment failed: ${response.statusText}`);
      }

      // The payment service is assumed to return { chargeId } in its JSON body
      const { chargeId } = await response.json();
      return { chargeId }; // Merged into context so compensate() can refund it
    },
    compensate: async (context) => {
      const { chargeId } = context;
      if (!chargeId) return; // Never executed, nothing to undo
      console.log(`[Saga Compensation] Refunding charge ${chargeId}`);
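      const response = await fetch(`http://payment-service/refund/${chargeId}`, { method: 'POST' });
      if (!response.ok) {
        // fetch does not reject on HTTP errors, so throw to flag the failed refund
        throw new Error(`Refund failed: ${response.statusText}`);
      }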
    },
  },
  {
    name: 'reserveInventory',
    execute: async (context) => {
      const { productId, quantity, orderId } = context;
      console.log(`[Saga] Reserving ${quantity}x product ${productId}`);

      const response = await fetch(`http://inventory-service/reserve`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ productId, quantity, orderId }),
      });

      if (!response.ok) {
        throw new Error(`Inventory reservation failed: ${response.statusText}`);
      }

      const { reservationId } = await response.json();
      return { reservationId };
    },
    compensate: async (context) => {
      const { reservationId } = context;
      if (!reservationId) return;
      console.log(`[Saga Compensation] Releasing reservation ${reservationId}`);
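      const response = await fetch(`http://inventory-service/release/${reservationId}`, { method: 'POST' });
      if (!response.ok) {
        // fetch does not reject on HTTP errors, so throw to flag the failed release
        throw new Error(`Inventory release failed: ${response.statusText}`);
      }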
    },
  },
  {
    name: 'confirmOrder',
    execute: async (context) => {
      const { orderId, userId } = context;
      console.log(`[Saga] Confirming order ${orderId}`);

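      const confirmResponse = await fetch(`http://order-service/confirm/${orderId}`, { method: 'PATCH' });
      if (!confirmResponse.ok) {
        throw new Error(`Order confirmation failed: ${confirmResponse.statusText}`);
      }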
      await fetch(`http://notification-service/send`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ userId, event: 'ORDER_CONFIRMED', orderId }),
      });
      return { confirmed: true };
    },
    compensate: async (context) => {
      const { orderId } = context;
      console.log(`[Saga Compensation] Cancelling order ${orderId}`);
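      const response = await fetch(`http://order-service/cancel/${orderId}`, { method: 'PATCH' });
      if (!response.ok) {
        throw new Error(`Order cancellation failed: ${response.statusText}`);
      }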
    },
  },
];
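Every step above repeats the same fetch, check-status, parse-JSON sequence, and none of the calls has a timeout. One way to tighten that up is a small shared helper. This is a sketch: `callService` and its `fetchImpl` and `timeoutMs` options are hypothetical names introduced here, and it assumes Node.js 18+ (global `fetch` and `AbortSignal.timeout`) and services that always reply with a JSON body.

```javascript
// Hypothetical helper: send JSON to a service, fail on timeout or non-2xx status.
async function callService(url, body, options = {}) {
  const { fetchImpl = fetch, timeoutMs = 5000, method = 'POST' } = options;

  const response = await fetchImpl(url, {
    method,
    headers: { 'Content-Type': 'application/json' },
    body: body === undefined ? undefined : JSON.stringify(body),
    // Aborts the request (and rejects the promise) after timeoutMs
    signal: AbortSignal.timeout(timeoutMs),
  });

  if (!response.ok) {
    // fetch only rejects on network errors, so HTTP errors must be thrown manually
    throw new Error(`${method} ${url} failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With a helper like this, a step's execute could shrink to something like `return callService('http://payment-service/charge', { userId, amount, orderId })`, and a compensation that receives a 500 would throw instead of silently passing. Injecting `fetchImpl` also makes the steps easy to unit test without a network.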

Step 2: The Saga Orchestrator

This is the core engine. It executes steps sequentially, persists state to Redis, and runs compensations in reverse on failure:

// saga/SagaOrchestrator.js
import Redis from 'ioredis';
import { randomUUID } from 'crypto';

const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');
const SAGA_TTL = 60 * 60 * 24; // 24 hours

export class SagaOrchestrator {
  constructor(sagaId, steps) {
    this.sagaId = sagaId || randomUUID();
    this.steps = steps;
  }

  async loadState() {
    const raw = await redis.get(`saga:${this.sagaId}`);
    return raw ? JSON.parse(raw) : null;
  }

  async saveState(state) {
    await redis.setex(`saga:${this.sagaId}`, SAGA_TTL, JSON.stringify(state));
  }

  async execute(initialContext) {
    let state = await this.loadState();

    // Initialize or resume
    if (!state) {
      state = {
        sagaId: this.sagaId,
        status: 'running',
        currentStep: 0,
        context: { ...initialContext },
        completedSteps: [],
        startedAt: new Date().toISOString(),
      };
      await this.saveState(state);
    }

    console.log(`[Saga ${this.sagaId}] Starting from step ${state.currentStep}`);

    // Execute steps starting from where we left off (supports resume on crash)
    for (let i = state.currentStep; i < this.steps.length; i++) {
      const step = this.steps[i];
      console.log(`[Saga ${this.sagaId}] Executing step: ${step.name}`);

      try {
        const result = await step.execute(state.context);

        // Merge results into context for next steps and compensations
        state.context = { ...state.context, ...result };
        state.completedSteps.push(step.name);
        state.currentStep = i + 1;
        await this.saveState(state);

        console.log(`[Saga ${this.sagaId}] Step ${step.name} succeeded`);
      } catch (error) {
        console.error(`[Saga ${this.sagaId}] Step ${step.name} failed: ${error.message}`);
        state.status = 'compensating';
        state.failedStep = step.name;
        state.error = error.message;
        await this.saveState(state);

        // Run compensations in reverse
        await this.compensate(state);
        return { success: false, sagaId: this.sagaId, failedAt: step.name, error: error.message };
      }
    }

    state.status = 'completed';
    state.completedAt = new Date().toISOString();
    await this.saveState(state);

    console.log(`[Saga ${this.sagaId}] Completed successfully`);
    return { success: true, sagaId: this.sagaId, context: state.context };
  }

  async compensate(state) {
    // Run compensations in reverse order of completion
    const stepsToCompensate = [...state.completedSteps].reverse();
    let allCompensated = true;

    for (const stepName of stepsToCompensate) {
      const step = this.steps.find((s) => s.name === stepName);
      if (!step) continue;

      try {
        console.log(`[Saga ${this.sagaId}] Compensating: ${stepName}`);
        await step.compensate(state.context);
        console.log(`[Saga ${this.sagaId}] Compensated: ${stepName}`);
      } catch (compError) {
        // Compensation failure is critical: alert your on-call team
        allCompensated = false;
        console.error(`[Saga ${this.sagaId}] COMPENSATION FAILED for ${stepName}: ${compError.message}`);
        // Store for manual recovery
        await redis.lpush(
          'saga:stuck_compensations',
          JSON.stringify({ sagaId: this.sagaId, step: stepName, error: compError.message, context: state.context })
        );
      }
    }

    // Report 'compensated' only if every compensation actually succeeded.
    // Reuse the in-memory state: reloading could return null if the key expired.
    state.status = allCompensated ? 'compensated' : 'compensation_failed';
    state.compensatedAt = new Date().toISOString();
    await this.saveState(state);
  }
}
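One gap worth knowing about: execute() resumes from currentStep whenever saved state exists, regardless of the saved status. If the process crashed while compensating, or if someone calls execute() again for a finished saga, that is probably not what you want. A small guard like the following could be called right after loadState(). It is a sketch; `nextAction` and its return values are names made up for this example, while the status strings are the ones the orchestrator above writes.

```javascript
// Hypothetical guard: decide what to do with saga state loaded from Redis.
function nextAction(state, totalSteps) {
  if (!state) return 'start'; // no saved state: brand-new saga

  switch (state.status) {
    case 'completed':
    case 'compensated':
      return 'done'; // terminal: do not run anything again
    case 'compensating':
      return 'compensate'; // crashed mid-rollback: finish undoing, never re-execute
    case 'running':
      // Crashed mid-saga: continue from the first step that was not recorded
      return state.currentStep < totalSteps ? 'resume' : 'finalize';
    default:
      return 'manual-review'; // unknown status: do not guess
  }
}

console.log(nextAction(null, 3));                                  // start
console.log(nextAction({ status: 'running', currentStep: 1 }, 3)); // resume
console.log(nextAction({ status: 'compensating' }, 3));            // compensate
```

Keep in mind that 'resume' re-runs the step that was in flight when the process died, which is why idempotent steps matter so much.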

Step 3: Wire It Into Your Express API

// routes/orders.js
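import express from 'express';
import Redis from 'ioredis';
import { randomUUID } from 'crypto';
import { SagaOrchestrator } from '../saga/SagaOrchestrator.js';
import { orderSagaSteps } from '../saga/orderSagaSteps.js';

const router = express.Router();
// Client for the status endpoint below (the orchestrator keeps its own connection)
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');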

router.post('/orders', async (req, res) => {
  const { userId, productId, quantity, amount } = req.body;
  const orderId = randomUUID();
  const sagaId = randomUUID();

  const orchestrator = new SagaOrchestrator(sagaId, orderSagaSteps);

  const result = await orchestrator.execute({
    orderId,
    userId,
    productId,
    quantity,
    amount,
  });

  if (result.success) {
    return res.status(201).json({
      orderId,
      sagaId: result.sagaId,
      status: 'confirmed',
    });
  }

  return res.status(422).json({
    error: 'Order could not be completed',
    reason: result.error,
    sagaId: result.sagaId,
    compensated: true,
  });
});

// Check saga status (useful for debugging and client polling)
router.get('/orders/saga/:sagaId', async (req, res) => {
  const raw = await redis.get(`saga:${req.params.sagaId}`);
  if (!raw) return res.status(404).json({ error: 'Saga not found' });
  res.json(JSON.parse(raw));
});

export default router;
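The route above passes req.body straight into the saga. Because the first step charges a card, it is worth rejecting malformed input before the saga starts. Here is a minimal sketch of such a check; `validateOrderInput` is a hypothetical helper, and the rules (non-empty string IDs, positive integer quantity, positive amount) are assumptions you should adapt to your own schema.

```javascript
// Hypothetical pre-flight check: returns a list of problems (empty means valid).
function validateOrderInput(body) {
  const errors = [];
  const { userId, productId, quantity, amount } = body ?? {};

  if (typeof userId !== 'string' || userId.length === 0) {
    errors.push('userId must be a non-empty string');
  }
  if (typeof productId !== 'string' || productId.length === 0) {
    errors.push('productId must be a non-empty string');
  }
  if (!Number.isInteger(quantity) || quantity <= 0) {
    errors.push('quantity must be a positive integer');
  }
  if (typeof amount !== 'number' || !Number.isFinite(amount) || amount <= 0) {
    errors.push('amount must be a positive number');
  }
  return errors;
}
```

In the POST handler you could call it first and return a 400 with the error list when it is non-empty, so no saga (and no charge) is ever started for a bad request.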

Handling Stuck Compensations

One of the most dangerous scenarios in Saga patterns is a failed compensation — your payment was charged, the inventory step failed, and now your refund call is also failing. You have inconsistent state.

The solution: a compensation recovery queue.

// workers/compensationRecovery.js
import Redis from 'ioredis';
import { orderSagaSteps } from '../saga/orderSagaSteps.js';

const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');
const MAX_ATTEMPTS = 5;

async function processStuckCompensations() {
  while (true) {
    // Block-pop with 30-second timeout
    const item = await redis.blpop('saga:stuck_compensations', 30);
    if (!item) continue;

    // The queue stores the step *name*, so look up the actual step definition
    const { sagaId, step: stepName, context } = JSON.parse(item[1]);
    const step = orderSagaSteps.find((s) => s.name === stepName);
    console.log(`[Recovery] Retrying compensation for saga ${sagaId}, step ${stepName}`);

    // Retry with exponential backoff
    let attempt = 0;
    let recovered = false;
    while (step && attempt < MAX_ATTEMPTS) {
      try {
        await step.compensate(context); // Retry
        console.log(`[Recovery] Successfully compensated ${stepName} for saga ${sagaId}`);
        recovered = true;
        break;
      } catch (err) {
        attempt++;
        const delay = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s, 16s, 32s
        await new Promise((r) => setTimeout(r, delay));
      }
    }

    if (!recovered) {
      // Escalate to human: PagerDuty, Slack alert, etc.
      console.error(`[Recovery] MANUAL INTERVENTION REQUIRED for saga ${sagaId}`);
      await redis.lpush('saga:requires_manual_review', item[1]);
    }
  }
}

processStuckCompensations();
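The fixed 2s, 4s, 8s schedule is fine for a single worker, but if many sagas get stuck at once (say, the payment provider goes down), every retry lands at the same moment. A common refinement is to cap the delay and add random jitter. The sketch below uses the "full jitter" variant; `backoffDelayMs` and its option names are hypothetical, and the `random` parameter exists only to make the function testable.

```javascript
// Hypothetical helper: capped exponential backoff with full jitter.
function backoffDelayMs(attempt, options = {}) {
  const { baseMs = 1000, capMs = 30000, random = Math.random } = options;

  // Exponential growth, but never above the cap
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);

  // Full jitter: pick uniformly in [0, ceiling) so retries spread out
  return Math.floor(random() * ceiling);
}

// With a fixed "random" value the results are predictable:
console.log(backoffDelayMs(1, { random: () => 0.5 }));  // 1000
console.log(backoffDelayMs(3, { random: () => 0.5 }));  // 4000
console.log(backoffDelayMs(10, { random: () => 0.5 })); // 15000 (capped at 30000 before jitter)
```

In the worker, this would replace the `Math.pow(2, attempt) * 1000` line.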

Choreography Saga: Event-Driven Alternative

For simpler workflows or when you want maximum decoupling, the choreography approach uses events:

// Using Node.js EventEmitter for in-process events (use a message broker for multi-process)
// chargeUser, reserveStock, and issueRefund are placeholders for your own service calls
import { EventEmitter } from 'events';

const sagaBus = new EventEmitter();

// Payment Service listener
sagaBus.on('ORDER_CREATED', async ({ orderId, userId, amount }) => {
  try {
    await chargeUser(userId, amount);
    sagaBus.emit('PAYMENT_PROCESSED', { orderId, userId });
  } catch {
    sagaBus.emit('PAYMENT_FAILED', { orderId, userId, amount });
  }
});

// Inventory Service listener
sagaBus.on('PAYMENT_PROCESSED', async ({ orderId }) => {
  try {
    await reserveStock(orderId);
    sagaBus.emit('INVENTORY_RESERVED', { orderId });
  } catch {
    sagaBus.emit('INVENTORY_FAILED', { orderId });
    sagaBus.emit('PAYMENT_REVERSE', { orderId }); // Trigger compensation
  }
});

// Compensation listener
sagaBus.on('PAYMENT_REVERSE', async ({ orderId }) => {
  console.log(`Refunding payment for order ${orderId}`);
  await issueRefund(orderId);
});

// Trigger the saga
sagaBus.emit('ORDER_CREATED', {
  orderId: 'ord_123',
  userId: 'user_456',
  amount: 49.99,
});

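For production, replace EventEmitter with a durable transport such as Redis Streams, BullMQ, or Kafka so events survive process restarts. Plain Redis Pub/Sub is fire-and-forget: a subscriber that is down when a message is published never receives it, which makes it a poor fit for saga events.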
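One more caveat with the EventEmitter version: emit() does not wait for async listeners, so if issueRefund rejects inside the PAYMENT_REVERSE listener, nothing catches it and the saga silently stalls with an unhandled rejection. A thin wrapper can route such failures to an explicit failure event instead. This is a sketch; `onAsync` and the `REFUND_FAILED` event name are made up for illustration, and `bus` is any object with `on` and `emit` methods, such as the sagaBus above.

```javascript
// Hypothetical wrapper: register an async listener and turn rejections into
// a failure event instead of an unhandled promise rejection.
function onAsync(bus, eventName, handler, failureEvent) {
  bus.on(eventName, (payload) => {
    Promise.resolve()
      .then(() => handler(payload))
      .catch((error) => {
        // Re-publish the original payload plus the error for a recovery listener
        bus.emit(failureEvent, { ...payload, error: error.message });
      });
  });
}
```

For example, `onAsync(sagaBus, 'PAYMENT_REVERSE', ({ orderId }) => issueRefund(orderId), 'REFUND_FAILED')` would let a separate REFUND_FAILED listener push the order onto a recovery queue, mirroring what the orchestrator does with stuck compensations.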

Choreography vs. Orchestration: When to Use Each

Scenario	Use
Complex multi-step workflow (5+ steps)	Orchestration
Need full audit trail / debugging	Orchestration
Loosely coupled, independent services	Choreography
Simple 2–3 step flow	Choreography
Workflows that may need to pause/resume	Orchestration
Temporal-style durable workflows	Orchestration

Production Checklist for Sagas in 2026

1. Make all steps idempotent. If the orchestrator retries a step after a crash, running it twice should be safe. Use the orderId or sagaId as an idempotency key in downstream services.

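2. Persist saga state on every transition. Don't rely on in-memory state. The Redis-backed orchestrator above saves state when the saga starts and after each step, so a restarted process can pick up where it left off. If the state must also survive a Redis restart, enable Redis persistence (AOF or RDB snapshots).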

3. Monitor saga duration. Long-running sagas should alert if they exceed expected completion time:

// Alert if saga takes > 30 seconds
if (Date.now() - new Date(state.startedAt).getTime() > 30_000) {
  await alertOncall(`Slow saga: ${this.sagaId}`);
}

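4. Build a saga dashboard. Iterate over your saga keys with SCAN 0 MATCH saga:* (following the returned cursor until it comes back as 0) and expose a /admin/sagas endpoint to see in-flight and stuck sagas. Avoid KEYS in production, since it blocks Redis while it walks the whole keyspace.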

5. Test compensation paths explicitly. Most teams only test the happy path. Write tests that force failures at each step and verify compensations run correctly.

// Example with vitest / jest
it('should refund payment if inventory fails', async () => {
  mockPaymentService.charge.mockResolvedValueOnce({ chargeId: 'ch_123' });
  mockInventoryService.reserve.mockRejectedValueOnce(new Error('Out of stock'));
  mockPaymentService.refund.mockResolvedValueOnce({});

  const result = await orchestrator.execute(testContext);

  expect(result.success).toBe(false);
  expect(mockPaymentService.refund).toHaveBeenCalledWith('ch_123');
});
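Item 1 in the checklist (idempotent steps) is the one teams most often skip, so here is what it can look like on the receiving side. This is an in-memory sketch of a downstream service deduplicating by idempotency key; `createIdempotentHandler` is a hypothetical name, and a real service would keep the key-to-result mapping in its own database rather than in a Map.

```javascript
// Hypothetical in-memory idempotency layer for a downstream service.
function createIdempotentHandler(handler) {
  const results = new Map(); // idempotencyKey -> promise of the first result

  return function handle(idempotencyKey, payload) {
    if (!results.has(idempotencyKey)) {
      const pending = Promise.resolve().then(() => handler(payload));
      // Forget failed attempts so a later retry can run the handler again
      pending.catch(() => results.delete(idempotencyKey));
      results.set(idempotencyKey, pending);
    }
    // Repeated calls with the same key get the original result, not a new charge
    return results.get(idempotencyKey);
  };
}

// Demo: the orchestrator retries the same order after a crash.
let chargeCount = 0;
const charge = createIdempotentHandler(async ({ amount }) => {
  chargeCount += 1;
  return { chargeId: `ch_${chargeCount}`, amount };
});

(async () => {
  const first = await charge('order-1', { amount: 49.99 });
  const retry = await charge('order-1', { amount: 49.99 });
  console.log(first.chargeId === retry.chargeId); // true
  console.log(chargeCount);                       // 1
})();
```

Using the orderId as the key means a saga that resumes and re-runs chargePayment cannot double-charge the customer.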

Conclusion

The Saga pattern solves one of the hardest problems in microservices: keeping data consistent across service boundaries without distributed locks. In 2026, with APIs increasingly split across serverless functions, edge workers, and independent services, Sagas are an essential tool.

Key takeaways:

Orchestration is easier to debug and trace; use it for complex workflows
Choreography offers better decoupling; use it for simple, independent steps
Always implement compensation logic — assume failures will happen
Persist saga state to Redis so you can resume after crashes
Handle stuck compensations explicitly with a recovery queue

If you're building APIs that need to be reliable under failure conditions, consider publishing them on RapidAPI so consumers can integrate them with confidence into their own distributed systems.
