The saga pattern looks straightforward in diagrams. It becomes genuinely complex the moment you operate it in production.
The central question — orchestration or choreography — carries consequences that ripple through your codebase, your operational posture, and your team's cognitive load for years.
This is not a "use orchestration for complex sagas, choreography for simple ones" post. The real trade-offs are more specific.
The Baseline: What Both Approaches Must Solve
Before choosing an approach, every saga implementation must handle:
- Atomicity at step boundaries — commit the database write and publish the event in the same transaction (transactional outbox or CDC)
- Idempotent consumers — at-least-once delivery means your steps will be invoked more than once
- Compensation correctness — compensating transactions are not rollbacks; they undo changes in a world that has moved on
- Observability — correlation IDs, structured logging, and queryable saga state
These are table stakes, not optional concerns.
Orchestration: Central Control
A dedicated orchestrator drives the saga — it knows the sequence, issues commands, waits for responses, and drives compensation.
Shines when:
- Workflows have complex conditional branching
- Long-running sagas involve human steps or wait states
- Operational visibility and debugging matter most
- Compensation must be guaranteed and sequenced
Breaks down when:
- The orchestrator becomes a throughput bottleneck
- Tight temporal coupling conflicts with event-driven decoupling goals
- Business logic gravitates into the orchestrator (god-object risk)
Choreography: Decentralized Reactions
No central coordinator. Each service listens for events, performs its local transaction, and publishes events that others react to. The saga is an emergent property.
Shines when:
- Services are genuinely independent
- Throughput is high and latency requirements are strict
- The workflow is stable and simple
- Independent deployability is valued over centralized visibility
Breaks down when:
- Saga state is implicit and debugging requires forensic log analysis
- Business logic is distributed across every participating service
- Compensation failures go undetected — no component knows a step was missed
Failure Modes That Catch Teams Off Guard
Lost compensation events — a compensating transaction fails, lands in a DLQ, and the system stays inconsistent until someone investigates.
Pivot transaction ambiguity — misidentifying the point of no return leads to compensating steps that cannot actually be reversed.
Saga timeouts and orphaned state — sagas that time out without completing compensation leave the system in a partially-applied state.
Event schema evolution — a schema change breaks consumers silently, causing sagas to process with incorrect data.
Making the Decision
Most large systems use both — choreography for high-throughput, loosely-coupled flows; orchestration for complex, stateful, business-critical workflows.
The key insight: neither approach eliminates the need for idempotent consumers, transactional outboxes, schema governance, DLQ monitoring, or explicit compensation design. The approach determines where control and visibility live — not whether your system is correct.
Get the baseline right. Then choose the approach that fits your operational context — not the one that looked better in the last conference talk you attended.
Read the Full Article
This is a summary of my deep dive into saga patterns. The full article covers orchestration and choreography in detail with production failure scenarios, compensation strategies, and decision frameworks:
👉 Saga Orchestration vs. Choreography — Full Article
The full article includes:
- Detailed comparison of both approaches with subsections on how they work, where they shine, and where they break down
- Four critical failure modes that affect both approaches
- Practical decision heuristics for choosing the right approach
- Baseline requirements every saga implementation must handle
Top comments (0)