DEV Community

Dylan Dumont

Two-Phase Commit Demystified: When Distributed Transactions Are Unavoidable

When a checkout must reserve inventory in one database and capture funds in another, no single-database transaction can guarantee atomicity across both — that is exactly where two-phase commit becomes necessary.

What We're Building

We are implementing a core logic layer for a high-value e-commerce platform where data integrity is paramount. This system must reserve inventory in one database and capture payment in another as a single atomic unit. If the inventory is reserved but payment fails, refunding becomes a complex operational nightmare involving manual reconciliation and customer dissatisfaction. We need atomicity across distinct databases that cannot share a local transaction, forcing us to rely on a distributed commit protocol. A two-phase commit (2PC) ensures that the order is either processed fully or rolled back completely. This architecture assumes a strong consistency requirement, such as financial ledgers or critical reservation slots, justifying the latency cost of synchronous coordination between distributed nodes.

Step 1 — Define the Transaction Coordinator

The coordinator acts as the conductor of the distributed orchestra, managing the state of the transaction across all participants. It owns the global transaction ID (XID) and sends instructions to resources. In our design, the coordinator is an internal RPC service written in Go, using context.Context for cancellation. This choice matters because goroutines and context cancellation naturally model the synchronous wait for participant readiness without spawning heavy threads. We define the ResourceManager interface to abstract the underlying database drivers, ensuring loose coupling between the coordinator and specific storage engines.

// ResourceManager abstracts one participant (e.g. a database driver)
// in the distributed transaction. Prepare returns the participant's
// vote for the given global transaction ID; Commit and Rollback
// finalize the outcome.
type ResourceManager interface {
        Prepare(ctx context.Context, xid string) (status string, err error)
        Commit(ctx context.Context, xid string) error
        Rollback(ctx context.Context, xid string) error
}

Step 2 — Request Prepared States

The coordinator sends a PREPARE message to each registered resource, asking if they can safely commit the changes requested. This phase involves the resources checking internal locks, network connectivity, and storage health. Resources must write their vote to stable storage before responding to ensure they can survive a crash between the vote and the final commit decision. This step effectively pauses the transaction until every participant confirms its ability to participate, which introduces latency but guarantees that no partial updates will be applied.

// prepareAll fans out PREPARE to every participant concurrently and
// collects their votes. It returns 0 only if every participant voted
// to commit; any error or Failed vote yields -1, dooming the
// transaction to rollback.
func (c *Coordinator) prepareAll(ctx context.Context, resources []ResourceManager) int {
        responses := make(chan *prepareResponse, len(resources))
        for _, res := range resources {
            go func(r ResourceManager) {
                status, err := r.Prepare(ctx, c.xid)
                responses <- &prepareResponse{Status: status, Err: err}
            }(res)
        }
        for range resources {
            resp := <-responses
            // The channel is buffered to len(resources), so late
            // voters never block even when we return early here.
            if resp.Err != nil || resp.Status == Failed {
                return -1
            }
        }
        return 0
}

Step 3 — Aggregate the Vote

Once responses arrive, the coordinator aggregates the votes to determine the final decision. If any resource returns a negative status, the coordinator must abort the transaction immediately to preserve data integrity, preventing partial updates from ever becoming visible. Note that, unlike quorum-based consensus protocols, 2PC requires unanimity: every participant must vote to commit, and a single negative vote (or timeout) aborts the whole transaction. This prevents the scenario where one node commits while another fails, which would leave the global state inconsistent.

// decide inspects the vote tally. 2PC requires unanimity: an empty
// tally or a single ABORT vote forces a global rollback.
func (c *Coordinator) decide(ctx context.Context, votes map[string]int) error {
        if len(votes) == 0 {
                return errors.New("no votes received")
        }
        if votes["ABORT"] > 0 {
                return errors.New("at least one resource voted to abort")
        }
        return nil
}
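For quick experimentation, the decision rule can also be exercised in isolation. This standalone `decide` function mirrors the same unanimity rule; the vote tallies passed to it are invented examples.

```go
package main

import (
	"errors"
	"fmt"
)

// decide applies the 2PC decision rule to a vote tally: an empty
// tally or any ABORT vote aborts; only unanimous approval commits.
func decide(votes map[string]int) error {
	if len(votes) == 0 || votes["ABORT"] > 0 {
		return errors.New("aborting: missing or negative vote")
	}
	return nil
}

func main() {
	fmt.Println(decide(map[string]int{"PREPARED": 2}) == nil)               // true: commit
	fmt.Println(decide(map[string]int{"PREPARED": 1, "ABORT": 1}) == nil)   // false: rollback
}
```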

Step 4 — Finalize the Outcome

After voting, the coordinator broadcasts a COMMIT or ROLLBACK instruction based on the aggregate result. In the commit case, resources apply their changes and release their locks. Before notifying anyone, the coordinator must persist the decision to a durable local log; otherwise a crash could leave a resource committed while the coordinator has no record of it, producing an inconsistent global state on restart. Conversely, if the vote was negative, the rollback is mandatory, ensuring no data is left in an intermediate state. The protocol therefore requires two stable-storage writes (the vote and the decision) and guarantees that no resource moves forward unless the coordinator has recorded the outcome.

Performance and Trade-offs

Two-phase commit introduces performance overhead. The coordinator must wait for all participants before proceeding, which creates a bottleneck. In high-throughput e-commerce, this might delay order confirmation. However, the alternative is data inconsistency, which is unacceptable for financial systems. Modern implementations optimize 2PC by using asynchronous networking or batching commits, but the synchronous nature remains. It is crucial to monitor the latency of the prepare phase, because participants holding locks for long periods can starve other transactions.

Conclusion

Implementing a two-phase commit protocol is a complex but necessary strategy for systems requiring distributed atomicity. While it introduces latency and complexity, it provides the reliability needed for critical financial or inventory management applications. By carefully coordinating resources and handling failure states, we ensure that the ledger remains consistent across distributed nodes.

Part of the Architecture Patterns series.

Top comments (1)

mote

Good explainer on a protocol that's often hand-waved over in distributed systems tutorials.

One thing worth highlighting that catches teams off guard in production: the coordinator failure window between Phase 1 and Phase 2 is where 2PC earns its "blocking protocol" reputation. If the coordinator crashes after receiving all PREPARED votes but before writing the commit decision to its log, participants are stuck holding locks indefinitely — they've promised to commit, but they don't know if the commit happened. This is the fundamental reason why 2PC is "blocking" and why Paxos/Raft-based alternatives like Google Spanner's TrueTime approach exist.

Your suggestion to use async networking and batch commits as optimizations is correct, but I'd add: write-ahead logging on the coordinator is non-negotiable, not optional. The decision log needs to survive coordinator restart, otherwise you get the indefinite lock scenario above.

For your Go implementation, one subtle thing: when the coordinator sends COMMIT to all participants, the participants should respond with ACK, and the coordinator should retry delivery until all ACKs are received (make-before-break pattern). If you fire-and-forget the commit messages, you can get partial commits under network partition.

The CockroachDB transaction model is a good reference for how this gets handled in practice at scale — they use a hybrid logical clock approach to avoid the coordinator bottleneck entirely in the common case.