The hardest problem in distributed systems isn't handling failure — it's handling partial failure. When your order service successfully charges a card but the inventory service crashes before reserving stock, you have a real problem that a single database ROLLBACK can't solve.
The Saga pattern is a widely used approach to distributed transactions across microservices. In 2026, with many production Node.js systems running across multiple services, databases, and cloud functions, understanding Sagas is a core skill rather than an optional one.
This guide shows you how to implement both flavors (choreography and orchestration) in Node.js with working code, Redis-backed state, and proper compensation logic.
What Problem Do Sagas Solve?
In a monolith, you write:
await db.transaction(async (trx) => {
await chargeCard(userId, amount, trx);
await reserveInventory(productId, qty, trx);
await createOrder(userId, productId, trx);
});
If reserveInventory throws, the whole transaction rolls back. Clean.
In microservices, each service owns its own database. There is no shared transaction boundary. If your architecture looks like this:
Client → Order Service (PostgreSQL)
→ Payment Service (Stripe + MySQL)
→ Inventory Service (MongoDB)
→ Notification Service (Redis + Twilio)
You need a way to coordinate multi-step workflows and undo completed steps if a later step fails. That's exactly what the Saga pattern provides.
Saga vs. Two-Phase Commit (2PC)
Before diving into code, here's why Sagas beat 2PC for microservices:
| | 2PC | Saga |
|---|---|---|
| Blocking | Yes (coordinator waits) | No (async steps) |
| Availability | Low (lock-based) | High |
| Complexity | High | Medium |
| Failure recovery | Difficult | Explicit compensations |
| Node.js support | Poor | Excellent |
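2PC requires all participants to hold locks during the "prepare" phase. In a distributed Node.js system with network latency, this can create cascading timeouts. Sagas avoid distributed locks by committing each step locally and relying on compensating transactions, which are explicit undo operations for each step. The trade-off is isolation: other requests can observe intermediate state (a charged card with no stock reserved yet) until the saga completes or compensates.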
Two Saga Flavors
Choreography: Event-Driven Saga
Each service publishes events and listens for events from others. No central coordinator.
OrderService            PaymentService            InventoryService
     |                        |                          |
ORDER_CREATED ──────────────► |                          |
     |                PAYMENT_PROCESSED ───────────────► |
     |                        |                 INVENTORY_RESERVED
     | ◄─────────────────────────────────────────────────|
ORDER_CONFIRMED
Pros: Loose coupling, no single point of failure
Cons: Hard to trace the full workflow, risk of cyclic dependencies
Orchestration: Central Coordinator
A dedicated "saga orchestrator" knows every step and calls services directly.
[Saga Orchestrator]
├── Step 1: Call PaymentService.charge()
├── Step 2: Call InventoryService.reserve()
├── Step 3: Call NotificationService.send()
└── On failure: Run compensations in reverse
Pros: Easy to trace, single source of truth for workflow state
Cons: Orchestrator becomes a new service to maintain
For most production systems in 2026, orchestration is preferred for complex workflows because observability and debugging are dramatically easier.
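Before adding persistence, the core idea of orchestration fits in a few lines. The following is a minimal in-memory sketch, not the production version: `runSaga` is an illustrative name, and the step shape (`name`, `execute`, `compensate`) is an assumption chosen to match the steps used later in this guide.

```javascript
// Minimal in-memory sketch: no persistence, no retries, no recovery queue.
// `runSaga` is an illustrative name, not a library function.
async function runSaga(steps, initialContext) {
  let context = { ...initialContext };
  const completed = [];

  for (const step of steps) {
    try {
      const result = await step.execute(context);
      // Merge each step's output so later steps and compensations can use it
      context = { ...context, ...result };
      completed.push(step);
    } catch (error) {
      // Undo the steps that finished, most recent first
      for (const done of completed.reverse()) {
        await done.compensate(context);
      }
      return { success: false, failedAt: step.name, error: error.message };
    }
  }
  return { success: true, context };
}
```

Only steps that completed are compensated. The step that threw is assumed to have left nothing behind, which is why each step should be atomic from the orchestrator's point of view.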
Building an Orchestration Saga in Node.js
We'll build an e-commerce order saga: charge payment → reserve inventory → send confirmation. If any step fails, we run compensating transactions in reverse order.
Project Setup
npm init -y
npm install ioredis uuid express
Step 1: Define the Saga Steps
Each step has an execute function and a compensate function:
// saga/orderSagaSteps.js
export const orderSagaSteps = [
{
name: 'chargePayment',
execute: async (context) => {
const { userId, amount, orderId } = context;
console.log(`[Saga] Charging $${amount} for order ${orderId}`);
// Call the payment service (swap in your real endpoint in production)
const response = await fetch(`http://payment-service/charge`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ userId, amount, orderId }),
});
if (!response.ok) {
throw new Error(`Payment failed: ${response.statusText}`);
}
const { chargeId: id } = await response.json();
return { chargeId: id }; // Stored in context for compensation
},
compensate: async (context) => {
const { chargeId } = context;
if (!chargeId) return; // Never executed, nothing to undo
console.log(`[Saga Compensation] Refunding charge ${chargeId}`);
await fetch(`http://payment-service/refund/${chargeId}`, { method: 'POST' });
},
},
{
name: 'reserveInventory',
execute: async (context) => {
const { productId, quantity, orderId } = context;
console.log(`[Saga] Reserving ${quantity}x product ${productId}`);
const response = await fetch(`http://inventory-service/reserve`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ productId, quantity, orderId }),
});
if (!response.ok) {
throw new Error(`Inventory reservation failed: ${response.statusText}`);
}
const { reservationId } = await response.json();
return { reservationId };
},
compensate: async (context) => {
const { reservationId } = context;
if (!reservationId) return;
console.log(`[Saga Compensation] Releasing reservation ${reservationId}`);
await fetch(`http://inventory-service/release/${reservationId}`, { method: 'POST' });
},
},
{
name: 'confirmOrder',
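execute: async (context) => {
const { orderId, userId } = context;
console.log(`[Saga] Confirming order ${orderId}`);
const response = await fetch(`http://order-service/confirm/${orderId}`, { method: 'PATCH' });
if (!response.ok) {
throw new Error(`Order confirmation failed: ${response.statusText}`);
}
// The order is already confirmed here, so a notification failure is logged
// rather than thrown; throwing would roll back payment for a confirmed order
try {
await fetch(`http://notification-service/send`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ userId, event: 'ORDER_CONFIRMED', orderId }),
});
} catch (notifyError) {
console.warn(`[Saga] Notification failed for order ${orderId}: ${notifyError.message}`);
}
return { confirmed: true };
},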
compensate: async (context) => {
const { orderId } = context;
console.log(`[Saga Compensation] Cancelling order ${orderId}`);
await fetch(`http://order-service/cancel/${orderId}`, { method: 'PATCH' });
},
},
];
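The steps above repeat the same fetch, check `response.ok`, parse JSON sequence. One way to keep that consistent is a small helper. This is a sketch: `postJson` is a hypothetical name that is not part of the original steps, and the `fetchImpl` parameter exists only so the helper can be exercised without a network.

```javascript
// Hypothetical helper (not in the original code): POSTs JSON and throws on
// any non-2xx response so the orchestrator sees the failure and compensates.
async function postJson(url, payload, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error(`POST ${url} failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}

// Possible use inside the chargePayment step:
//   const { chargeId } = await postJson('http://payment-service/charge', { userId, amount, orderId });
//   return { chargeId };
```

The default `fetch` is the global available in Node.js 18 and later.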
Step 2: The Saga Orchestrator
This is the core engine. It executes steps sequentially, persists state to Redis, and runs compensations in reverse on failure:
// saga/SagaOrchestrator.js
import Redis from 'ioredis';
import { randomUUID } from 'crypto';
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');
const SAGA_TTL = 60 * 60 * 24; // 24 hours
export class SagaOrchestrator {
constructor(sagaId, steps) {
this.sagaId = sagaId || randomUUID();
this.steps = steps;
}
async loadState() {
const raw = await redis.get(`saga:${this.sagaId}`);
return raw ? JSON.parse(raw) : null;
}
async saveState(state) {
await redis.setex(`saga:${this.sagaId}`, SAGA_TTL, JSON.stringify(state));
}
async execute(initialContext) {
let state = await this.loadState();
// Initialize or resume
if (!state) {
state = {
sagaId: this.sagaId,
status: 'running',
currentStep: 0,
context: { ...initialContext },
completedSteps: [],
startedAt: new Date().toISOString(),
};
await this.saveState(state);
}
console.log(`[Saga ${this.sagaId}] Starting from step ${state.currentStep}`);
// Execute steps starting from where we left off (supports resume on crash)
for (let i = state.currentStep; i < this.steps.length; i++) {
const step = this.steps[i];
console.log(`[Saga ${this.sagaId}] Executing step: ${step.name}`);
try {
const result = await step.execute(state.context);
// Merge results into context for next steps and compensations
state.context = { ...state.context, ...result };
state.completedSteps.push(step.name);
state.currentStep = i + 1;
await this.saveState(state);
console.log(`[Saga ${this.sagaId}] Step ${step.name} succeeded`);
} catch (error) {
console.error(`[Saga ${this.sagaId}] Step ${step.name} failed: ${error.message}`);
state.status = 'compensating';
state.failedStep = step.name;
state.error = error.message;
await this.saveState(state);
// Run compensations in reverse
await this.compensate(state);
return { success: false, sagaId: this.sagaId, failedAt: step.name, error: error.message };
}
}
state.status = 'completed';
state.completedAt = new Date().toISOString();
await this.saveState(state);
console.log(`[Saga ${this.sagaId}] Completed successfully`);
return { success: true, sagaId: this.sagaId, context: state.context };
}
async compensate(state) {
// Run compensations in reverse order of completion
const stepsToCompensate = [...state.completedSteps].reverse();
let failures = 0;
for (const stepName of stepsToCompensate) {
const step = this.steps.find((s) => s.name === stepName);
if (!step) continue;
try {
console.log(`[Saga ${this.sagaId}] Compensating: ${stepName}`);
await step.compensate(state.context);
console.log(`[Saga ${this.sagaId}] Compensated: ${stepName}`);
} catch (compError) {
failures++;
// Compensation failure is critical: alert your on-call team
console.error(`[Saga ${this.sagaId}] COMPENSATION FAILED for ${stepName}: ${compError.message}`);
// Store for manual recovery
await redis.lpush(
'saga:stuck_compensations',
JSON.stringify({ sagaId: this.sagaId, step: stepName, error: compError.message, context: state.context })
);
}
}
// Reuse the in-memory state (a reload could return null if the key expired),
// and only report 'compensated' when every compensation actually succeeded
state.status = failures === 0 ? 'compensated' : 'compensation_failed';
state.compensatedAt = new Date().toISOString();
await this.saveState(state);
}
}
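The execute() method above resumes from state.currentStep but never inspects state.status, so a saga that crashed while compensating would re-run its failed step on resume. A guard along the following lines could decide what to do with a loaded state first. This is a sketch: `decideResumeAction` and its return values are hypothetical names, and only the status strings come from the orchestrator.

```javascript
// Hypothetical guard: decide what to do with a saga state loaded from Redis.
// 'running', 'compensating', 'compensated' and 'completed' are the statuses
// the orchestrator writes; the returned action names are illustrative.
function decideResumeAction(state, totalSteps) {
  if (!state) return 'start';
  switch (state.status) {
    case 'completed':
    case 'compensated':
      return 'noop'; // terminal: nothing left to do
    case 'compensating':
      return 'compensate'; // crashed mid-rollback: finish undoing
    case 'running':
      return state.currentStep >= totalSteps ? 'finalize' : 'execute';
    default:
      return 'manual_review'; // any status this sketch does not recognise
  }
}
```

In execute(), the result could gate the step loop: run it only for 'start' and 'execute', call compensate(state) for 'compensate', and return early for 'noop'.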
Step 3: Wire It Into Your Express API
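// routes/orders.js
import express from 'express';
import Redis from 'ioredis';
import { randomUUID } from 'crypto';
import { SagaOrchestrator } from '../saga/SagaOrchestrator.js';
import { orderSagaSteps } from '../saga/orderSagaSteps.js';
const router = express.Router();
// The status endpoint below reads saga state directly, so this module needs its own Redis client
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');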
router.post('/orders', async (req, res) => {
const { userId, productId, quantity, amount } = req.body;
const orderId = randomUUID();
const sagaId = randomUUID();
const orchestrator = new SagaOrchestrator(sagaId, orderSagaSteps);
const result = await orchestrator.execute({
orderId,
userId,
productId,
quantity,
amount,
});
if (result.success) {
return res.status(201).json({
orderId,
sagaId: result.sagaId,
status: 'confirmed',
});
}
return res.status(422).json({
error: 'Order could not be completed',
reason: result.error,
sagaId: result.sagaId,
compensated: true,
});
});
// Check saga status (useful for debugging and client polling)
router.get('/orders/saga/:sagaId', async (req, res) => {
const raw = await redis.get(`saga:${req.params.sagaId}`);
if (!raw) return res.status(404).json({ error: 'Saga not found' });
res.json(JSON.parse(raw));
});
export default router;
Handling Stuck Compensations
One of the most dangerous scenarios in Saga patterns is a failed compensation — your payment was charged, the inventory step failed, and now your refund call is also failing. You have inconsistent state.
The solution: a compensation recovery queue.
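// workers/compensationRecovery.js
import Redis from 'ioredis';
import { orderSagaSteps } from '../saga/orderSagaSteps.js';
const redis = new Redis(process.env.REDIS_URL);
async function processStuckCompensations() {
while (true) {
// Block-pop with 30-second timeout
const item = await redis.blpop('saga:stuck_compensations', 30);
if (!item) continue;
const { sagaId, step: stepName, context } = JSON.parse(item[1]);
// The queue stores only the step name, so look up the step definition to reach compensate()
const step = orderSagaSteps.find((s) => s.name === stepName);
if (!step) {
console.error(`[Recovery] Unknown step ${stepName} for saga ${sagaId}`);
await redis.lpush('saga:requires_manual_review', item[1]);
continue;
}
console.log(`[Recovery] Retrying compensation for saga ${sagaId}, step ${stepName}`);
// Retry with exponential backoff
let attempt = 0;
while (attempt < 5) {
try {
await step.compensate(context); // Retry
console.log(`[Recovery] Successfully compensated ${stepName} for saga ${sagaId}`);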
break;
} catch (err) {
attempt++;
const delay = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s, 16s, 32s
await new Promise((r) => setTimeout(r, delay));
}
}
if (attempt === 5) {
// Escalate to human: PagerDuty, Slack alert, etc.
console.error(`[Recovery] MANUAL INTERVENTION REQUIRED for saga ${sagaId}`);
await redis.lpush('saga:requires_manual_review', item[1]);
}
}
}
processStuckCompensations();
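The retry loop in the worker above can also be pulled out into a reusable function. The following is a sketch under stated assumptions: `retryWithBackoff` and its option names are illustrative, and `sleep` is injectable purely so the schedule can be checked without waiting.

```javascript
// Sketch of a reusable retry helper; names and defaults are illustrative.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(fn, { retries = 5, baseMs = 1000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // With the defaults: 2s, 4s, 8s, 16s. No wait after the final attempt.
        await sleep(baseMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // caller escalates, e.g. pushes to a manual-review queue
}
```

Unlike the inline loop, this version does not sleep after the last failed attempt, so escalation happens as soon as the retries are exhausted.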
Choreography Saga: Event-Driven Alternative
For simpler workflows or when you want maximum decoupling, the choreography approach uses events:
// Using Node.js EventEmitter for an in-process demo (use a message broker across processes)
import { EventEmitter } from 'events';
const sagaBus = new EventEmitter();
// Payment Service listener
sagaBus.on('ORDER_CREATED', async ({ orderId, userId, amount }) => {
try {
await chargeUser(userId, amount);
sagaBus.emit('PAYMENT_PROCESSED', { orderId, userId });
} catch {
sagaBus.emit('PAYMENT_FAILED', { orderId, userId, amount });
}
});
// Inventory Service listener
sagaBus.on('PAYMENT_PROCESSED', async ({ orderId }) => {
try {
await reserveStock(orderId);
sagaBus.emit('INVENTORY_RESERVED', { orderId });
} catch {
sagaBus.emit('INVENTORY_FAILED', { orderId });
sagaBus.emit('PAYMENT_REVERSE', { orderId }); // Trigger compensation
}
});
// Compensation listener
sagaBus.on('PAYMENT_REVERSE', async ({ orderId }) => {
console.log(`Refunding payment for order ${orderId}`);
await issueRefund(orderId);
});
// Trigger the saga
sagaBus.emit('ORDER_CREATED', {
orderId: 'ord_123',
userId: 'user_456',
amount: 49.99,
});
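For production, replace EventEmitter with a durable broker such as Redis Streams, BullMQ, or Kafka so events survive process restarts. Plain Redis Pub/Sub is fire-and-forget: a message published while a subscriber is down is lost, which makes it a poor fit for saga events.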
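One detail in the listeners above deserves care: EventEmitter calls listeners synchronously and ignores their return value, so an async listener that rejects (the PAYMENT_REVERSE handler has no try/catch) surfaces as an unhandled rejection rather than a saga event. A small wrapper can route such failures back onto the bus. This is a sketch: `onAsync` and the `SAGA_LISTENER_FAILED` event name are hypothetical.

```javascript
// Hypothetical wrapper: registers an async listener and turns a rejected
// promise into a failure event instead of an unhandled rejection.
function onAsync(bus, eventName, handler, failureEvent = 'SAGA_LISTENER_FAILED') {
  bus.on(eventName, (payload) => {
    Promise.resolve()
      .then(() => handler(payload))
      .catch((error) => {
        bus.emit(failureEvent, { eventName, payload, error: error.message });
      });
  });
}

// Possible use with the bus from this section:
//   onAsync(sagaBus, 'PAYMENT_REVERSE', async ({ orderId }) => {
//     await issueRefund(orderId);
//   });
//   sagaBus.on('SAGA_LISTENER_FAILED', (failure) => console.error('[Saga]', failure));
```

A listener on the failure event is the natural place to enqueue the failed compensation for retry.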
Choreography vs. Orchestration: When to Use Each
| Scenario | Use |
|---|---|
| Complex multi-step workflow (5+ steps) | Orchestration |
| Need full audit trail / debugging | Orchestration |
| Loosely coupled, independent services | Choreography |
| Simple 2–3 step flow | Choreography |
| Workflows that may need to pause/resume | Orchestration |
| Temporal-style durable workflows | Orchestration |
Production Checklist for Sagas in 2026
1. Make all steps idempotent. If the orchestrator retries a step after a crash, running it twice should be safe. Use the orderId or sagaId as an idempotency key in downstream services.
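2. Persist saga state at every step boundary. Don't rely on in-memory state. The Redis-backed orchestrator above saves progress after each step, so a restarted process can pick up from currentStep, provided Redis itself is configured for durability (AOF or RDB snapshots) and something re-invokes the saga after the crash.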
3. Monitor saga duration. Long-running sagas should alert if they exceed expected completion time:
// Alert if saga takes > 30 seconds
if (Date.now() - new Date(state.startedAt).getTime() > 30_000) {
await alertOncall(`Slow saga: ${this.sagaId}`);
}
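4. Build a saga dashboard. Iterate your saga keys with SCAN (for example, SCAN 0 MATCH saga:* COUNT 100, repeated until the cursor returns to 0) and expose a /admin/sagas endpoint to see in-flight and stuck sagas. Note that the saga:stuck_compensations and saga:requires_manual_review lists also match this pattern, so skip or handle them separately.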
5. Test compensation paths explicitly. Most teams only test the happy path. Write tests that force failures at each step and verify compensations run correctly.
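// Example with vitest / jest (assumes the steps call injected mockPaymentService and
// mockInventoryService clients instead of calling fetch directly)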
it('should refund payment if inventory fails', async () => {
mockPaymentService.charge.mockResolvedValueOnce({ chargeId: 'ch_123' });
mockInventoryService.reserve.mockRejectedValueOnce(new Error('Out of stock'));
mockPaymentService.refund.mockResolvedValueOnce({});
const result = await orchestrator.execute(testContext);
expect(result.success).toBe(false);
expect(mockPaymentService.refund).toHaveBeenCalledWith('ch_123');
});
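For checklist item 1, idempotency has to be enforced by the downstream service, not by the orchestrator. The following sketch shows the shape of that check. All names are hypothetical, and the Map stands in for a durable store such as a database table with a unique key; an in-memory Map only deduplicates within a single process.

```javascript
// Sketch of an idempotent handler on the downstream side. The Map is a
// stand-in for a durable store; `makeIdempotent` is an illustrative name.
function makeIdempotent(handler, store = new Map()) {
  return function handle(idempotencyKey, payload) {
    if (!store.has(idempotencyKey)) {
      const pending = Promise.resolve().then(() => handler(payload));
      // Forget failed attempts so a later retry runs the handler again
      pending.catch(() => store.delete(idempotencyKey));
      store.set(idempotencyKey, pending);
    }
    // Replays (and concurrent duplicates) share the first call's result
    return store.get(idempotencyKey);
  };
}

// Possible use in a payment service, keyed by the saga's orderId:
//   const chargeOnce = makeIdempotent(({ userId, amount }) => chargeCard(userId, amount));
//   await chargeOnce(orderId, { userId, amount });
```

Storing the promise rather than the resolved value means two requests with the same key that arrive at nearly the same time trigger only one charge.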
Conclusion
The Saga pattern solves one of the hardest problems in microservices: keeping data consistent across service boundaries without distributed locks. In 2026, with APIs increasingly split across serverless functions, edge workers, and independent services, Sagas are an essential tool.
Key takeaways:
- Orchestration is easier to debug and trace; use it for complex workflows
- Choreography offers better decoupling; use it for simple, independent steps
- Always implement compensation logic — assume failures will happen
- Persist saga state to Redis so you can resume after crashes
- Handle stuck compensations explicitly with a recovery queue
If you're building APIs that need to be reliable under failure conditions, consider publishing them on RapidAPI so consumers can integrate them with confidence into their own distributed systems.