DEV Community

CodeWithDhanian
Distributed Transactions (2PC, Saga) in System Design

In modern distributed systems, keeping data consistent across multiple independent services and databases is one of the hardest problems in system design. Distributed transactions aim to ensure that an operation spanning several resources either succeeds completely or fails entirely, preserving as much of the ACID guarantees (atomicity, consistency, isolation, and durability) as the chosen protocol allows. This article explores two primary approaches: the Two-Phase Commit protocol, commonly known as 2PC, and the Saga pattern. 2PC targets short, strongly consistent transactions, while Saga targets long-running workflows; both apply where traditional single-database transactions fall short.

What Are Distributed Transactions

A distributed transaction involves multiple participating resources, such as separate databases, microservices, or external systems, that must coordinate to achieve a unified outcome. Unlike local transactions confined to a single resource, distributed transactions must manage cross-service consistency while dealing with network latency, partial failures, and independent scaling of components.

The core requirement remains the same as in monolithic systems: the entire operation must appear atomic to the end user. If any part fails, all changes must be undone. However, achieving this in a distributed environment introduces significant complexity because each participant operates autonomously, and communication occurs over unreliable networks. System designers must therefore select protocols that balance strong consistency with availability and performance.

The Two-Phase Commit Protocol (2PC)

The Two-Phase Commit protocol, or 2PC, stands as the classic solution for achieving strong consistency in distributed transactions. Introduced in the 1970s, 2PC relies on a central coordinator and multiple participants to ensure all-or-nothing semantics across heterogeneous resources.

Core Components of 2PC

  • Coordinator: The central authority responsible for driving the transaction. It receives the initial transaction request and manages the voting and decision process.
  • Participants: The individual resources (databases or services) that perform local work and respond to the coordinator's instructions.
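  • Transaction Manager: The component that implements the coordinator role in practice. It commonly talks to participants through a standard interface such as XA (eXtended Architecture), the X/Open specification for coordinating resource managers such as databases.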

Phases of the 2PC Protocol

2PC operates in two distinct phases, ensuring safety before any permanent changes occur.

Prepare Phase (Voting Phase):

The coordinator sends a prepare message to all participants. Each participant performs the necessary local operations, acquires locks, writes changes to a durable log, and responds with either ready (vote yes) or abort (vote no). If any participant votes no or fails to respond, the coordinator decides to abort.

Commit Phase (Decision Phase):

If all participants vote ready, the coordinator logs the global commit decision and sends commit messages to every participant. Each participant then applies the changes permanently and releases locks. If the decision is to abort, the coordinator sends rollback messages, and participants undo their local changes using the prepared log entries.
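The voting rule described above, where a "no" vote, an error, or a missing response all lead to an abort, can be sketched as follows. This is a minimal illustration, not a production coordinator: `VotingParticipant`, `VoteCollector`, and the timeout parameter are hypothetical names introduced here, and votes are collected in parallel with the standard `ExecutorService.invokeAll` timeout overload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical participant interface for this sketch; not part of any real library.
interface VotingParticipant {
    boolean prepare(String txId) throws Exception; // true means "ready"
}

class VoteCollector {
    // Returns true only if every participant votes ready within the timeout.
    static boolean allReady(List<VotingParticipant> participants, String txId, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Callable<Boolean>> calls = new ArrayList<>();
            for (VotingParticipant p : participants) {
                calls.add(() -> p.prepare(txId));
            }
            // invokeAll waits at most timeoutMillis and cancels tasks that have not finished.
            List<Future<Boolean>> votes = pool.invokeAll(calls, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<Boolean> vote : votes) {
                if (vote.isCancelled() || !vote.get()) {
                    return false; // timed out or voted no
                }
            }
            return true;
        } catch (ExecutionException e) {
            return false; // a participant threw during prepare: treat as a no vote
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling `VoteCollector.allReady(participants, "tx-1", 500)` would give the coordinator its commit-or-abort decision. A real coordinator would still need to log that decision durably before sending it to anyone.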

Pseudocode Implementation of 2PC Coordinator

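import java.util.List;

// Participant, Response, Transaction, and TransactionLog are assumed to be defined elsewhere.
class TwoPhaseCommitCoordinator {
    List<Participant> participants;
    TransactionLog log;

    void beginTransaction(Transaction tx) {
        log.write("BEGIN_TX", tx.id);
        boolean allReady = true;

        // Prepare Phase: a "no" vote or any failure to respond aborts the transaction
        for (Participant participant : participants) {
            try {
                Response response = participant.prepare(tx);
                if (!response.isReady()) {
                    allReady = false;
                    break;
                }
            } catch (Exception e) {
                // An error or timeout while preparing counts as a "no" vote
                allReady = false;
                break;
            }
        }

        // Decision: persist the outcome durably before notifying any participant
        if (allReady) {
            log.write("GLOBAL_COMMIT", tx.id);
            for (Participant participant : participants) {
                participant.commit(tx);   // in practice, retried until acknowledged
            }
        } else {
            log.write("GLOBAL_ABORT", tx.id);
            for (Participant participant : participants) {
                participant.rollback(tx); // must be a no-op for participants that never prepared
            }
        }
    }
}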

Pseudocode for a 2PC Participant

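// LocalDatabase, UndoLog, Response, and Transaction are assumed to be defined elsewhere.
class DatabaseParticipant implements Participant {
    LocalDatabase db;
    UndoLog undoLog;

    public Response prepare(Transaction tx) {
        try {
            db.acquireLocks(tx.operations);
            undoLog.recordUndoInfo(tx);           // persist undo info before changing data
            db.executeOperations(tx.operations);  // tentative changes
            return new Response(true, "READY");
        } catch (Exception e) {
            // Clean up so a failed prepare leaves no locks or partial changes behind.
            // Assumes applyUndo and releaseLocks tolerate missing state.
            rollback(tx);
            return new Response(false, "ABORT");
        }
    }

    public void commit(Transaction tx) {
        db.makeChangesPermanent(tx);
        db.releaseLocks(tx);
        undoLog.clear(tx);
    }

    public void rollback(Transaction tx) {
        db.applyUndo(undoLog.getUndoInfo(tx));
        db.releaseLocks(tx);
        undoLog.clear(tx);
    }
}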

These code structures illustrate the blocking nature of 2PC: participants hold locks from the prepare phase until the final decision arrives. The coordinator must persist its decision durably before proceeding, ensuring recoverability after crashes.
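To make the recoverability point concrete, here is a rough sketch of what a restarted coordinator might do with its log. It assumes each record is a pair of record type and transaction id, using the `BEGIN_TX`, `GLOBAL_COMMIT`, and `GLOBAL_ABORT` record types from the coordinator pseudocode plus a hypothetical `END_TX` record written once all participants have acknowledged. The class name, the action names, and the abort-if-undecided rule are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CoordinatorRecovery {
    // Each log record is assumed to be {recordType, txId}, in the order it was written.
    // Returns the action to take for every transaction that is still unresolved.
    static Map<String, String> resolve(List<String[]> logRecords) {
        // Keep only the most recent record per transaction.
        Map<String, String> lastRecord = new LinkedHashMap<>();
        for (String[] record : logRecords) {
            lastRecord.put(record[1], record[0]);
        }

        Map<String, String> actions = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : lastRecord.entrySet()) {
            switch (entry.getValue()) {
                case "GLOBAL_COMMIT":
                    // Decision was logged, so it must be re-sent until acknowledged.
                    actions.put(entry.getKey(), "RESEND_COMMIT");
                    break;
                case "GLOBAL_ABORT":
                case "BEGIN_TX":
                    // No commit decision was logged, so no participant can have
                    // committed; aborting is safe.
                    actions.put(entry.getKey(), "RESEND_ROLLBACK");
                    break;
                default:
                    break; // END_TX (hypothetical): already finished, nothing to do
            }
        }
        return actions;
    }
}
```

The design choice worth noting is that a transaction with no logged decision is aborted. This is only safe because the coordinator writes `GLOBAL_COMMIT` before sending any commit message.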

Limitations of 2PC

While 2PC guarantees atomic commitment, it suffers from several critical drawbacks. The coordinator is a single point of failure and a performance bottleneck. The protocol is blocking: if the coordinator fails after participants have voted ready, those participants cannot decide on their own and must hold their locks until the coordinator recovers. Network partitions can cause prolonged unavailability for the same reason. Every transaction also costs at least two message round trips plus durable log writes, so in high-throughput microservices environments, and especially for long-lived operations, 2PC is often impractical.

The Saga Pattern

The Saga pattern offers a fundamentally different approach to distributed transactions by embracing eventual consistency instead of immediate strong consistency. Originally described in the 1980s for handling long-lived transactions, a Saga decomposes a large distributed transaction into a sequence of smaller, local transactions. Each local transaction has an associated compensating transaction that undoes its effects if later steps fail.

Key Principles of Saga

  • Local Transactions: Each service executes its part independently and commits immediately.
  • Compensating Transactions: Each step has a corresponding operation that semantically undoes its effect (for example, a refund for a charge), restoring a consistent state without a global rollback.
  • No Global Lock: Resources remain available throughout the process.
  • Eventual Consistency: The system converges to a consistent state over time rather than instantly.
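
These principles can be sketched in a few lines. The example below is a minimal, in-memory illustration with hypothetical names (`SagaStep`, `SimpleSaga`): each step pairs an action with its compensation, and when a step fails, the steps that already committed are compensated in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One saga step: a local transaction plus the operation that semantically undoes it.
class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class SimpleSaga {
    // Runs the steps in order. If one fails, compensates the completed steps
    // in reverse order and returns false; returns true if every step succeeded.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();    // commits immediately; no global lock is held
                completed.push(step); // most recently completed step is on top
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failing step itself is not compensated, since it never committed. A real implementation would also persist progress after each step so that compensation can resume after a crash.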

Two Implementation Styles of Saga

Choreography-Based Saga:

Services communicate directly through events. Each service listens for events from previous steps and publishes its own events upon completion or failure. No central controller exists. This style promotes loose coupling but can become difficult to trace as the number of services grows.
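
As a rough sketch of the choreography style, the example below wires three services together through events alone. The `EventBus` here is a hypothetical in-process stand-in for a real message broker, delivery is synchronous for simplicity, and the event names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory event bus standing in for a message broker.
class EventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    final List<String> published = new ArrayList<>(); // event history, useful for tracing

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        published.add(eventType);
        for (Consumer<String> handler : handlers.getOrDefault(eventType, Collections.emptyList())) {
            handler.accept(orderId);
        }
    }
}

class ChoreographyDemo {
    // Each subscription represents one service reacting to another service's event.
    static EventBus wire(boolean inStock) {
        EventBus bus = new EventBus();
        // Payment service: charges when an order is created.
        bus.subscribe("OrderCreated", id -> bus.publish("PaymentCompleted", id));
        // Inventory service: reserves stock once payment completes.
        bus.subscribe("PaymentCompleted",
            id -> bus.publish(inStock ? "InventoryReserved" : "InventoryFailed", id));
        // Compensations travel back along the chain as events.
        bus.subscribe("InventoryFailed", id -> bus.publish("PaymentRefunded", id));
        bus.subscribe("PaymentRefunded", id -> bus.publish("OrderCancelled", id));
        return bus;
    }
}
```

Publishing `OrderCreated` on a bus wired with `inStock` set to false produces the chain OrderCreated, PaymentCompleted, InventoryFailed, PaymentRefunded, OrderCancelled. No single component knows the whole flow, which is what makes larger choreographies hard to trace.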

Orchestration-Based Saga:

A central Saga Orchestrator coordinates the flow by sending commands to services and reacting to their responses or events. The orchestrator maintains the overall state and decides the next step or triggers compensation. This approach provides clearer visibility into the transaction flow and simplifies error handling.

Complete Orchestration-Based Saga Example: E-commerce Order Processing

Consider an online store where placing an order involves three services: Order Service, Payment Service, and Inventory Service. The Saga ensures that if payment fails, the order is cancelled and inventory is never deducted, and if inventory is unavailable, the payment is refunded and the order is cancelled.

Saga Orchestrator Pseudocode (Full Structure)

// The service, repository, and domain types are assumed to be defined elsewhere.
class OrderSagaOrchestrator {
    OrderService orderService;
    PaymentService paymentService;
    InventoryService inventoryService;
    SagaStateRepository stateRepo;

    void startOrderSaga(OrderRequest request) {
        SagaInstance saga = new SagaInstance(request.orderId);
        stateRepo.save(saga);

        // Step 1: Create Order (local transaction)
        Order order = orderService.createOrder(request);
        saga.updateStep("ORDER_CREATED", order);
        stateRepo.save(saga);  // persist progress so the saga can resume after a crash

        // Declared outside the try block so the catch blocks can reference it
        Payment payment = null;
        try {
            // Step 2: Process Payment
            payment = paymentService.processPayment(order);
            saga.updateStep("PAYMENT_SUCCESS", payment);
            stateRepo.save(saga);

            // Step 3: Reserve Inventory
            InventoryReservation reservation = inventoryService.reserveInventory(order);
            saga.updateStep("INVENTORY_RESERVED", reservation);

            saga.complete();
        } catch (PaymentFailedException e) {
            // Compensation: only the order exists at this point
            compensateOrder(order);
            saga.fail("PAYMENT_FAILED");
        } catch (InventoryUnavailableException e) {
            // Compensation chain: undo completed steps in reverse order
            compensatePayment(payment);
            compensateOrder(order);
            saga.fail("INVENTORY_FAILED");
        }
        stateRepo.save(saga);  // persist the final state (completed or failed)
    }

    // Compensating transaction examples
    void compensatePayment(Payment payment) {
        paymentService.refundPayment(payment);  // must be idempotent
    }

    void compensateOrder(Order order) {
        orderService.cancelOrder(order);  // releases any reservations
    }
}

Service-Level Local Transaction Example (Inventory Service)

// InventoryRepository, Stock, Order, and InventoryReservation are assumed to be defined elsewhere.
class InventoryService {
    InventoryRepository repo;

    InventoryReservation reserveInventory(Order order) {
        // Local transaction - fully committed immediately
        return repo.withinTransaction(() -> {
            Stock stock = repo.findStock(order.productId);
            if (stock.quantity < order.quantity) {
                throw new InventoryUnavailableException();
            }
            stock.quantity -= order.quantity;
            repo.save(stock);
            // Keep the productId so the compensation knows which stock to restore
            return new InventoryReservation(order.orderId, order.productId, order.quantity);
        });
    }

    // Compensating transaction - public. As written it is NOT idempotent:
    // a retried call would add the quantity twice. A real implementation
    // should record released reservations and skip repeats.
    void releaseInventory(InventoryReservation reservation) {
        repo.withinTransaction(() -> {
            Stock stock = repo.findStock(reservation.productId);
            stock.quantity += reservation.quantity;
            repo.save(stock);
        });
    }
}

This full code structure demonstrates how the orchestrator drives the Saga while each service remains responsible only for its local ACID transaction and its compensating transaction. Idempotency keys should be included in every command and compensation to handle retries safely after network failures.
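
One way to honor idempotency keys is to remember the result of each key and replay it on retries. The sketch below is a simplified, in-memory illustration; `IdempotentExecutor` and the key format are hypothetical, and a real service would store the key in the same database transaction as the business change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class IdempotentExecutor {
    // Maps each idempotency key to the result of the first successful execution.
    private final Map<String, Object> results = new ConcurrentHashMap<>();

    // Runs the operation at most once per key; retries with the same key
    // return the stored result instead of repeating the side effect.
    // Note: operations that return null are not remembered by computeIfAbsent.
    @SuppressWarnings("unchecked")
    <T> T execute(String idempotencyKey, Supplier<T> operation) {
        return (T) results.computeIfAbsent(idempotencyKey, key -> operation.get());
    }
}
```

With this in place, a refund sent twice under a key such as `refund:order-42` would be applied once, and the second call would simply receive the original result.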

Advantages of the Saga Pattern

Saga is well suited to microservices because it avoids long-held locks, improves availability, and lets each service scale independently. Failures trigger targeted compensations rather than global aborts. The pattern fits event-driven architectures and is commonly paired with message brokers such as Kafka or RabbitMQ, although reliable delivery still depends on how producers and consumers are configured.

Choosing Between 2PC and Saga

2PC suits scenarios demanding immediate strong consistency, such as financial systems where partial states are unacceptable. Saga fits better for business processes that tolerate temporary inconsistencies, prioritize high availability, and involve long-running workflows across many services. In practice, many system designs combine both: 2PC for critical synchronous steps within a bounded context and Saga for cross-context orchestration.

The Saga pattern, supported by modern frameworks, has become a widely adopted approach to distributed transactions in cloud-native microservices due to its resilience and performance characteristics. Proper implementation requires careful design of compensating transactions, idempotency, and comprehensive monitoring of Saga instances to detect and resolve stuck workflows.
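
Detecting stuck workflows can start with something as simple as scanning for Saga instances that have not reached a terminal state and have not progressed recently. The following is a minimal sketch under stated assumptions: `SagaRecord`, the `COMPLETED` and `FAILED` status names, and the idle threshold are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of a persisted saga instance.
class SagaRecord {
    final String sagaId;
    final String status;       // e.g. "PAYMENT_SUCCESS", "COMPLETED", "FAILED"
    final Instant lastUpdated;

    SagaRecord(String sagaId, String status, Instant lastUpdated) {
        this.sagaId = sagaId;
        this.status = status;
        this.lastUpdated = lastUpdated;
    }
}

class StuckSagaDetector {
    // Returns the ids of sagas that are not in a terminal state and have
    // been idle for longer than maxIdle as of the given time.
    static List<String> findStuck(List<SagaRecord> sagas, Instant now, Duration maxIdle) {
        List<String> stuck = new ArrayList<>();
        for (SagaRecord saga : sagas) {
            boolean terminal = saga.status.equals("COMPLETED") || saga.status.equals("FAILED");
            boolean idleTooLong = saga.lastUpdated.plus(maxIdle).isBefore(now);
            if (!terminal && idleTooLong) {
                stuck.add(saga.sagaId);
            }
        }
        return stuck;
    }
}
```

A scheduled job could run such a check and either retry the pending step, trigger compensation, or raise an alert for manual resolution.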

Distributed transactions remain a cornerstone topic in system design. Understanding both 2PC and Saga, and the trade-offs between them, helps designers build reliable, scalable, and maintainable large-scale applications.


System Design Handbook

For more in-depth insights and comprehensive coverage of system design topics, consider purchasing the System Design Handbook at https://codewithdhanian.gumroad.com/l/ntmcf. It will equip you with the knowledge to master complex distributed systems.

Buy me coffee to support my content at: https://ko-fi.com/codewithdhanian
