Saga Orchestration vs. Choreography: Making the Right Trade-off in Event-Driven Systems

#distributedsystems #architecture #eventdriven #microservices

The saga pattern looks straightforward in diagrams. It becomes genuinely complex the moment you operate it in production.

The central question — orchestration or choreography — carries consequences that ripple through your codebase, your operational posture, and your team's cognitive load for years.

This is not a "use orchestration for complex sagas, choreography for simple ones" post. The real trade-offs are more specific.

The Baseline: What Both Approaches Must Solve

Before choosing an approach, every saga implementation must handle:

Atomicity at step boundaries: the database write and the event must succeed or fail together. Record the event in the same transaction as the write (transactional outbox) or derive it from the commit log (CDC), and let a separate relay publish it
Idempotent consumers: at-least-once delivery means any step can be invoked more than once, so handlers must tolerate duplicates
Compensation correctness: compensating transactions are not rollbacks; they undo changes in a world that has moved on
Observability: correlation IDs, structured logging, and queryable saga state

These are table stakes, not optional concerns.

Orchestration: Central Control

A dedicated orchestrator drives the saga — it knows the sequence, issues commands, waits for responses, and drives compensation.
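As a rough illustration of that control flow, the sketch below runs steps in order and, on the first failure, compensates the completed steps in reverse. It is deliberately in-process and synchronous; `Step`, `run_saga`, and the context dictionary are hypothetical names, and a real orchestrator would persist its state and exchange commands and replies over a broker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    action: Callable[[Dict], None]
    compensate: Callable[[Dict], None]


def run_saga(steps: List[Step], ctx: Dict) -> str:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception as exc:
            # Record why the saga stopped so the state stays queryable.
            ctx["failed_step"] = step.name
            ctx["error"] = str(exc)
            for done in reversed(completed):
                done.compensate(ctx)
            return "COMPENSATED"
    return "COMPLETED"
```

The failed step itself is not compensated here, on the assumption that a failed local transaction left nothing behind; if a step can fail after partially applying, its compensation needs to run as well.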

Shines when:

Workflows have complex conditional branching
Long-running sagas involve human steps or wait states
Operational visibility and debugging matter most
Compensation must be guaranteed and sequenced

Breaks down when:

The orchestrator becomes a throughput bottleneck
Tight temporal coupling conflicts with event-driven decoupling goals
Business logic gravitates into the orchestrator (god-object risk)

Choreography: Decentralized Reactions

No central coordinator. Each service listens for events, performs its local transaction, and publishes events that others react to. The saga is an emergent property.
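A small in-memory sketch can show what "emergent" means here. The `EventBus` class, the event names, and the three services below are hypothetical stand-ins for a real broker and real services; note that no single function describes the whole workflow.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.history = []  # event types in delivery order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def drain(self):
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


def wire_services(bus, stock):
    # Inventory service: reacts to OrderPlaced.
    def reserve_stock(order):
        if stock.get(order["sku"], 0) >= order["qty"]:
            stock[order["sku"]] -= order["qty"]
            bus.publish("StockReserved", order)
        else:
            bus.publish("StockRejected", order)

    # Payment service: reacts to StockReserved.
    def capture_payment(order):
        bus.publish("PaymentCaptured", order)

    # Order service: reacts to the terminal events.
    def confirm_order(order):
        order["status"] = "CONFIRMED"

    def cancel_order(order):
        order["status"] = "CANCELLED"

    bus.subscribe("OrderPlaced", reserve_stock)
    bus.subscribe("StockReserved", capture_payment)
    bus.subscribe("PaymentCaptured", confirm_order)
    bus.subscribe("StockRejected", cancel_order)
```

The only place the full sequence is visible is `bus.history` after the fact, which is the debugging cost this approach carries.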

Shines when:

Services are genuinely independent
Throughput is high and latency requirements are strict
The workflow is stable and simple
Independent deployability is valued over centralized visibility

Breaks down when:

Saga state is implicit and debugging requires forensic log analysis
Business logic is distributed across every participating service
Compensation failures go undetected — no component knows a step was missed

Failure Modes That Catch Teams Off Guard

Lost compensation events: a compensating transaction keeps failing, its message lands in a dead-letter queue (DLQ), and the system stays inconsistent until someone investigates.
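One way to keep such failures visible is to give them an explicit saga status instead of leaving them only in the DLQ. The sketch below is an assumption about how that might look: `compensate_with_retry`, the `saga_state` dictionary, and the status strings are hypothetical, and real retries would normally include backoff.

```python
def compensate_with_retry(compensation, saga_id, saga_state, dead_letters,
                          max_attempts=3):
    """Retry a compensation a bounded number of times. On exhaustion, park
    the failure in a dead-letter list and mark the saga so it can be
    queried and alerted on."""
    last_error = None
    for _ in range(max_attempts):
        try:
            compensation(saga_id)
            saga_state[saga_id] = "COMPENSATED"
            return saga_state[saga_id]
        except Exception as exc:
            last_error = exc
    dead_letters.append({
        "saga_id": saga_id,
        "attempts": max_attempts,
        "error": repr(last_error),
    })
    saga_state[saga_id] = "COMPENSATION_FAILED"
    return saga_state[saga_id]
```

An alert on any saga in `COMPENSATION_FAILED`, or on a non-empty dead-letter list, turns a silent inconsistency into a ticket.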

Pivot transaction ambiguity: misidentifying the point of no return leads the saga to attempt compensation for steps that cannot actually be reversed.
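Making the pivot explicit in the saga definition lets a simple check catch this early. The sketch below uses the common classification of steps as compensable, pivot, or retryable; the `Kind` enum and `validate_step_order` are hypothetical names, not part of any particular framework.

```python
from enum import Enum


class Kind(Enum):
    COMPENSABLE = "compensable"  # can be undone by a compensating transaction
    PIVOT = "pivot"              # point of no return
    RETRYABLE = "retryable"      # cannot be undone, must eventually succeed


def validate_step_order(steps):
    """Check that compensable steps precede the pivot and retryable steps
    follow it. `steps` is a list of (name, Kind) pairs; returns a list of
    problem descriptions, empty when the order is sound."""
    problems = []
    pivot_seen = False
    for name, kind in steps:
        if kind is Kind.PIVOT:
            if pivot_seen:
                problems.append(f"{name}: second pivot")
            pivot_seen = True
        elif kind is Kind.COMPENSABLE and pivot_seen:
            problems.append(f"{name}: compensable step after the pivot")
        elif kind is Kind.RETRYABLE and not pivot_seen:
            problems.append(f"{name}: irreversible step before the pivot")
    return problems
```

Running this check at startup or in CI is one option for surfacing a misplaced pivot before a failed saga does.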

Saga timeouts and orphaned state: sagas that time out before compensation completes leave the system in a partially applied state.
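Orphaned sagas are only detectable if saga state is stored with a last-updated timestamp. Assuming such a record exists (the `id`, `status`, and `updated_at` fields below are hypothetical), a periodic sweep might look like this:

```python
from datetime import datetime, timedelta, timezone

# Statuses that should not persist indefinitely (assumed names).
IN_FLIGHT = ("RUNNING", "COMPENSATING")


def find_stale_sagas(sagas, now, timeout):
    """Return IDs of in-flight sagas whose last update is older than
    `timeout`. `sagas` is an iterable of dicts with id, status, updated_at."""
    return [
        saga["id"]
        for saga in sagas
        if saga["status"] in IN_FLIGHT and now - saga["updated_at"] > timeout
    ]
```

Including `COMPENSATING` in the sweep matters: a saga stuck halfway through compensation is exactly the partially applied state described above.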

Event schema evolution: an incompatible schema change can break consumers silently, so sagas keep running on misread or missing data.
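A consumer can turn the silent failure into a loud one by checking an explicit version field and refusing anything it does not recognise. The event shapes, the `schema_version` field, and the version-1 currency default below are all invented for illustration.

```python
class UnsupportedSchema(Exception):
    """Raised when an event carries a version this consumer cannot read."""


def normalize_payment_event(raw):
    """Map known event versions to one internal shape; reject the rest."""
    version = raw.get("schema_version")
    if version == 1:
        # Hypothetical v1: flat amount, currency implied.
        return {
            "order_id": raw["order_id"],
            "amount_cents": raw["amount_cents"],
            "currency": "USD",
        }
    if version == 2:
        # Hypothetical v2: amount became a nested object.
        return {
            "order_id": raw["order_id"],
            "amount_cents": raw["amount"]["cents"],
            "currency": raw["amount"]["currency"],
        }
    raise UnsupportedSchema(f"unknown schema_version: {version!r}")
```

Rejected events then follow the normal failure path (retry, DLQ, alert) instead of flowing through the saga with wrong values.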

Making the Decision

Most large systems use both — choreography for high-throughput, loosely-coupled flows; orchestration for complex, stateful, business-critical workflows.

The key insight: neither approach eliminates the need for idempotent consumers, transactional outboxes, schema governance, DLQ monitoring, or explicit compensation design. The approach determines where control and visibility live — not whether your system is correct.

Get the baseline right. Then choose the approach that fits your operational context — not the one that looked better in the last conference talk you attended.

Read the Full Article

This is a summary of my deep dive into saga patterns. The full article covers orchestration and choreography in detail with production failure scenarios, compensation strategies, and decision frameworks:

👉 Saga Orchestration vs. Choreography — Full Article

The full article includes:

Detailed comparison of both approaches with subsections on how they work, where they shine, and where they break down
Four critical failure modes that affect both approaches
Practical decision heuristics for choosing the right approach
Baseline requirements every saga implementation must handle