Karan Pratap Singh

Posted on Sep 10, 2022 • Originally published at github.com

System Design: Distributed Transactions

#distributedsystems #architecture #tutorial

A distributed transaction is a set of operations on data that is performed across two or more databases. It is typically coordinated across separate nodes connected by a network, but may also span multiple databases on a single server.

Why do we need distributed transactions?

Unlike an ACID transaction on a single database, a distributed transaction involves altering data on multiple databases. Consequently, distributed transaction processing is more complicated, because the participating databases must coordinate the commit or rollback of the changes in a transaction as a single, self-contained unit.

In other words, either all the nodes commit, or all of them abort and the entire transaction rolls back. Distributed transaction protocols exist to provide this all-or-nothing guarantee.

Now, let's look at some popular solutions for distributed transactions:

Two-phase commit

The two-phase commit (2PC) protocol is a distributed algorithm that coordinates all the processes that participate in a distributed transaction on whether to commit or abort (roll back) the transaction.

This protocol achieves its goal even in many cases of temporary system failure and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome.

This protocol requires a coordinator node, which oversees the transaction across the participating nodes. The coordinator tries to establish consensus among a set of processes in two phases, hence the name.

Phases

Two-phase commit consists of the following phases:

Prepare phase

In the prepare phase, the coordinator node asks each participant node whether it is ready to commit and collects their votes. The transaction is aborted unless every node responds that it is prepared.

Commit phase

If all participants respond to the coordinator that they are prepared, then the coordinator asks all the nodes to commit the transaction. If any participant votes no or fails to respond, the coordinator instead asks all the nodes to roll back the transaction.
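The two phases can be illustrated with a minimal in-memory sketch. The `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names introduced only for this example; a real participant would be a database node reached over the network, and a real coordinator would also log its decision durably.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant (stands in for a database node)."""
    name: str
    can_commit: bool = True  # assumption: whether local work can be committed
    state: str = "INIT"

    def prepare(self) -> bool:
        # Vote yes only if the local changes can be committed later.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def rollback(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2 (commit): commit only on a unanimous yes, otherwise roll back.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.rollback()
    return "ABORTED"
```

Calling `two_phase_commit` with one participant created as `Participant("b", can_commit=False)` leaves every participant in the `ABORTED` state, which is the all-or-nothing behavior described above.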

Problems

The following problems may arise in the two-phase commit protocol:

What if one of the nodes crashes?
What if the coordinator itself crashes?
It is a blocking protocol.
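To make the blocking problem concrete, here is a small sketch of what a participant can safely decide on its own when it stops hearing from the coordinator. The `State` enum and `on_decision_timeout` function are hypothetical names for illustration, not part of any library.

```python
from enum import Enum


class State(Enum):
    INIT = "init"          # has not voted yet
    PREPARED = "prepared"  # voted yes, waiting for the coordinator's decision


class Outcome(Enum):
    ABORT = "abort"
    KEEP_WAITING = "keep_waiting"


def on_decision_timeout(state: State) -> Outcome:
    """What a participant may do alone when the coordinator goes silent."""
    if state is State.INIT:
        # No yes vote was sent, so the coordinator cannot have decided to
        # commit. Aborting locally is safe.
        return Outcome.ABORT
    # After voting yes, the participant cannot know whether the others were
    # told to commit or abort, so it must hold its locks and wait.
    return Outcome.KEEP_WAITING
```

A participant in the `PREPARED` state stays blocked until the coordinator recovers, which is why a coordinator crash between the two phases is the hard case.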

Three-phase commit

Three-phase commit (3PC) is an extension of the two-phase commit where the commit phase is split into two phases. This helps with the blocking problem that occurs in the two-phase commit protocol.

Phases

Three-phase commit consists of the following phases:

Prepare phase

This phase is the same as in the two-phase commit protocol.

Pre-commit phase

The coordinator issues the pre-commit message, and all the participating nodes must acknowledge it. If a participant fails to receive this message in time, then the transaction is aborted.

Commit phase

This step is also similar to the two-phase commit protocol.

Why is the Pre-commit phase helpful?

The pre-commit phase accomplishes the following:

If a participant node has reached this phase, then every participant must have completed the first phase. In other words, the completion of the prepare phase is guaranteed.
Every phase can now time out and avoid indefinite waits.
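The three phases and their timeouts can be sketched from the coordinator's point of view. This is a simplified model under stated assumptions: each participant is a plain function that returns `True` (yes or acknowledged), `False` (no), or `None` (no reply before the timeout), and the message names `PREPARE`, `PRE_COMMIT`, `DO_COMMIT`, and `ABORT` are chosen for this example only.

```python
from typing import Callable, Optional

# Assumption: a participant is a function taking a message name and
# returning True (yes / ack), False (no), or None (timed out).
Reply = Optional[bool]
Participant = Callable[[str], Reply]


def abort_all(participants: list[Participant]) -> str:
    for p in participants:
        p("ABORT")
    return "ABORTED"


def three_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (prepare): every participant must vote yes in time.
    if not all(p("PREPARE") is True for p in participants):
        return abort_all(participants)

    # Phase 2 (pre-commit): every participant must acknowledge in time.
    if not all(p("PRE_COMMIT") is True for p in participants):
        return abort_all(participants)

    # Phase 3 (commit): all participants are known to be pre-committed.
    for p in participants:
        p("DO_COMMIT")
    return "COMMITTED"
```

Because a missing reply is treated the same as a no, neither the prepare phase nor the pre-commit phase can wait indefinitely in this sketch.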

Sagas

A saga is a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because it violates a business rule, then the saga executes a series of compensating transactions that undo the changes made by the preceding local transactions.
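A minimal sketch of this idea pairs every local transaction with its compensating transaction and, on failure, runs the compensations of the completed steps in reverse order. The names `run_saga`, `Step`, and `BusinessRuleViolation` are hypothetical and exist only for this example.

```python
from typing import Callable

# Assumption: each step is a (local transaction, compensation) pair.
Step = tuple[Callable[[], None], Callable[[], None]]


class BusinessRuleViolation(Exception):
    """Raised by a local transaction that must not proceed."""


def run_saga(steps: list[Step]) -> bool:
    completed: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except BusinessRuleViolation:
            # Undo the steps that already committed, newest first.
            for undo in reversed(completed):
                undo()
            return False
        # Remember how to undo this step if a later one fails.
        completed.append(compensate)
    return True
```

For example, if the steps are "create order", "reserve stock", and "charge payment", and the payment step raises `BusinessRuleViolation`, the sketch releases the stock and then cancels the order. Note that compensations are themselves ordinary local transactions, so other readers may observe the intermediate states.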

Coordination

There are two common implementation approaches:

Choreography: Each local transaction publishes domain events that trigger local transactions in other services.
Orchestration: An orchestrator tells the participants what local transactions to execute.
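As a rough illustration of choreography, the sketch below wires two services together through events only, with no central controller. The `EventBus` class is a hypothetical in-process stand-in for a message broker, and the event names `OrderCreated` and `PaymentCompleted` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-process stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        # Deliver the event synchronously to every subscriber.
        for handler in self._handlers[event]:
            handler(payload)


def wire_choreography(bus: EventBus, log: list[str]) -> None:
    # Each service reacts to an event, does its local transaction,
    # and publishes the event that triggers the next service.
    def payment_service(order: dict) -> None:
        log.append(f"payment charged for {order['id']}")
        bus.publish("PaymentCompleted", order)

    def shipping_service(order: dict) -> None:
        log.append(f"shipment scheduled for {order['id']}")

    bus.subscribe("OrderCreated", payment_service)
    bus.subscribe("PaymentCompleted", shipping_service)
```

Publishing `OrderCreated` on a wired bus runs the payment step and then the shipping step. In the orchestration approach, by contrast, a single orchestrator function would call each participant in turn and decide what to do on failure.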

Problems

The Saga pattern is particularly hard to debug.
There's a risk of cyclic dependency between saga participants.
Lack of participant data isolation imposes durability challenges.
Testing is difficult because all services must be running to simulate a transaction.

This article is part of my open source System Design Course available on GitHub.

