mahmoud hossam

Transaction Isolation Level in PostgreSQL

Transaction isolation levels in PostgreSQL play a crucial role in ensuring data consistency, integrity, and concurrency control within a multi-user database environment. When multiple transactions are executing concurrently, the isolation levels define the degree to which their interactions are isolated from each other. PostgreSQL provides a range of isolation levels, each with its own set of guarantees and trade-offs, allowing developers to tailor their applications' behavior according to specific requirements.

Isolation levels dictate how transactions observe the changes made by other transactions and how they protect data from concurrent modifications. They prevent common issues such as dirty reads (reading uncommitted data), non-repeatable reads (reading data that has changed between reads), and phantom reads (seeing new rows inserted by other transactions).

Before discussing the isolation levels themselves, let's take a closer look at each of these anomalies.

1. Dirty read:
A dirty read is a phenomenon that occurs in database systems when one transaction reads data that has been modified by another transaction, but the modifying transaction has not yet been committed. This can lead to inconsistencies and incorrect results, as the reading transaction might base its actions on data that is not guaranteed to be accurate or permanent.

Here's an example scenario involving two transactions and a simple table to illustrate the concept of a dirty read:
Consider a database table named BankAccounts with the following structure:

| AccountNumber | AccountHolder | Balance |
|---------------|---------------|---------|
| 1001          | Alice         | $100    |
| 1002          | Bob           | $150    |


Now, let's explore the concept of a dirty read using two transactions:

Transaction 1 (T1):
1. Transaction 1 begins.
2. T1 reads the current balance of account number 1001 (Alice's account) as $100.

Transaction 2 (T2):
1. Transaction 2 begins.
2. T2 updates the balance of account number 1001 to $200 (increasing Alice's balance).
3. T2 is not yet committed.

Back to Transaction 1 (T1):
3. T1 reads the updated balance of account number 1001 as $200 (a dirty read).
4. T1 performs a calculation based on the read balance and deducts $50 from it.

Transaction 2 (T2):
4. T2 commits the update, making the new balance of account number 1001 officially $200.

In this example, Transaction 1 (T1) performed a dirty read: it read a balance that Transaction 2 (T2) had changed but not yet committed, and based a calculation on it. The danger is that uncommitted data is not guaranteed to become permanent. Had T2 rolled back instead of committing, T1's deduction would have been computed from a $200 balance that never officially existed, leaving the account in an inconsistent state.
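The interleaving above can be sketched as two concurrent psql sessions (using the BankAccounts table from the example). One caveat worth noting: PostgreSQL itself never permits dirty reads. Even if you request READ UNCOMMITTED, it silently behaves as READ COMMITTED, so in PostgreSQL session 1 would keep seeing $100 until T2 commits:

```sql
-- Session 1 (T1)
BEGIN;
SELECT Balance FROM BankAccounts WHERE AccountNumber = 1001;  -- returns 100

-- Session 2 (T2)
BEGIN;
UPDATE BankAccounts SET Balance = 200 WHERE AccountNumber = 1001;
-- T2 has NOT committed yet

-- Session 1 (T1)
-- A dirty read would return 200 here. PostgreSQL still returns 100,
-- because uncommitted changes are never visible to other transactions.
SELECT Balance FROM BankAccounts WHERE AccountNumber = 1001;

-- Session 2 (T2)
COMMIT;  -- only now does the new balance of 200 become visible
```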

2. Non-repeatable read:
A non-repeatable read is a phenomenon that occurs in database systems when one transaction reads a data value, and while the transaction is still active, another transaction modifies or updates that value and commits the change. When the first transaction attempts to read the same value again, it encounters a different value than what it initially read. This inconsistency can lead to unexpected behavior and data integrity issues.

To illustrate the concept of a non-repeatable read, let's consider an example involving two transactions and a simple table:

Table: Students

| StudentID | Name   | Age |
|-----------|--------|-----|
| 1         | Alice  | 20  |
| 2         | Bob    | 22  |


Transaction 1 (T1):
1. Transaction 1 begins.
2. T1 reads the age of student with ID 1 (Alice) as 20.

Transaction 2 (T2):
1. Transaction 2 begins.
2. T2 updates the age of student with ID 1 (Alice) to 21.
3. T2 commits the update.

Back to Transaction 1 (T1):
3. T1 reads the age of student with ID 1 (Alice) again and gets the updated value of 21.

In this example, the non-repeatable read phenomenon occurs in Transaction 1 (T1). When T1 initially read the age of Alice (student with ID 1), it obtained a value of 20. However, while T1 was still active, Transaction 2 (T2) modified Alice's age to 21 and committed the change. When T1 attempted to read Alice's age again, it encountered the updated value of 21 instead of the original 20.
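This sequence can be reproduced under PostgreSQL's default Read Committed level with two psql sessions (a sketch using the Students table from the example):

```sql
-- Session 1 (T1), default READ COMMITTED
BEGIN;
SELECT Age FROM Students WHERE StudentID = 1;  -- returns 20

-- Session 2 (T2)
BEGIN;
UPDATE Students SET Age = 21 WHERE StudentID = 1;
COMMIT;

-- Session 1 (T1)
-- The same query now returns 21: a non-repeatable read.
SELECT Age FROM Students WHERE StudentID = 1;
COMMIT;
```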

3. Phantom read:
A phantom read is a phenomenon in database systems where a transaction retrieves a set of rows based on a certain condition, but while the transaction is still active, another transaction inserts, updates, or deletes rows that match that condition. As a result, when the first transaction re-executes the same query, it observes a different set of rows, as if new rows had "appeared" or "vanished" between its two executions.

Let's illustrate the concept of a phantom read with an example involving two transactions and a simple table:
Table: Products

| ProductID | ProductName  | Price |
|-----------|--------------|-------|
| 1         | Laptop       | $800  |
| 2         | Smartphone   | $400  |


Transaction 1 (T1):
1. Transaction 1 begins.
2. T1 executes a query to retrieve all products with a price less than $500. It gets back one row (Smartphone).

Transaction 2 (T2):
1. Transaction 2 begins.
2. T2 inserts a new product with ProductID 3, ProductName "Tablet", and Price $350.
3. T2 commits the insertion.

Back to Transaction 1 (T1):
3. T1 re-executes the same query to retrieve products with a price less than $500. This time it gets back two rows (Smartphone and Tablet), even though T1 itself changed nothing between its two executions.

In this example, the phantom read phenomenon occurs when Transaction 1 (T1) encounters a different set of rows in its second execution of the same query. This is because Transaction 2 (T2) inserted a new row that matched T1's query condition ("Tablet" with a price of $350) between T1's two executions. As a result, T1 observes a "phantom" row that seemed to appear out of nowhere.
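The same interleaving, sketched as two psql sessions at the default Read Committed level (table as in the example):

```sql
-- Session 1 (T1), default READ COMMITTED
BEGIN;
SELECT * FROM Products WHERE Price < 500;  -- one row: Smartphone

-- Session 2 (T2)
BEGIN;
INSERT INTO Products (ProductID, ProductName, Price)
VALUES (3, 'Tablet', 350);
COMMIT;

-- Session 1 (T1)
-- The same query now returns two rows (Smartphone, Tablet): a phantom read.
SELECT * FROM Products WHERE Price < 500;
COMMIT;
```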

Now, it's time to explore the solutions to address these challenges.

Read Committed Isolation Level:

Read Committed is the default isolation level in PostgreSQL. Under this level, each query sees only data that was committed before the query began; it never sees uncommitted changes made by concurrent transactions. This ensures that any data a transaction reads reflects a consistent, committed state of the database. Read Committed aims to strike a balance between data consistency and concurrency, allowing many transactions to operate concurrently while preventing certain anomalies.

Solving Dirty Reads:

One of the primary goals of the Read Committed isolation level is to prevent dirty reads. A dirty read occurs when a transaction reads uncommitted changes made by another transaction. By enforcing that transactions can only read committed data, the Read Committed isolation level effectively eliminates the possibility of dirty reads.

The Read Committed isolation level thus eliminates dirty reads: because a query can only see committed data, a transaction never acts on another transaction's unconfirmed and potentially rolled-back changes.

While dirty reads are effectively mitigated by Read Committed, non-repeatable reads and phantom reads may still occur due to concurrent modifications made by other transactions. For scenarios requiring stronger data consistency and stricter control over concurrency, consideration should be given to higher isolation levels like "Repeatable Read" or "Serializable." The choice of isolation level should be based on a thorough understanding of your application's specific requirements and the extent of protection needed against different types of data access anomalies.
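For reference, PostgreSQL lets you choose the isolation level per transaction or per session; the statements below are standard PostgreSQL syntax:

```sql
-- Set the level when starting a transaction:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Or inside an already-open transaction, before its first query:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Inspect the level of the current transaction:
SHOW transaction_isolation;

-- Change the default for the rest of the session:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED;
```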

Repeatable Read Isolation Level:
The Repeatable Read isolation level in PostgreSQL offers a higher degree of data consistency and concurrency control than Read Committed. It prevents not only dirty reads but also non-repeatable reads. Although the SQL standard permits phantom reads at this level, PostgreSQL's snapshot-based implementation prevents them as well; what it cannot prevent are serialization anomalies, such as write skew between concurrent transactions.

Solving Non-Repeatable Reads:

In the Repeatable Read level, non-repeatable reads are prevented through the use of a consistent snapshot. PostgreSQL takes the snapshot when the transaction executes its first query, and every subsequent read in that transaction sees the database as of that snapshot, regardless of committed modifications made concurrently by other transactions. This ensures that repeated reads within the same transaction return consistent results.

Note that Repeatable Read can abort a transaction with a serialization failure ("ERROR: could not serialize access due to concurrent update") when it attempts to update or delete a row that another transaction has changed since the snapshot was taken, so applications using this level should be prepared to retry failed transactions.
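Rerunning the earlier Students example at this level shows the snapshot behavior (a sketch assuming two psql sessions):

```sql
-- Session 1 (T1)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT Age FROM Students WHERE StudentID = 1;  -- returns 20; snapshot taken

-- Session 2 (T2)
UPDATE Students SET Age = 21 WHERE StudentID = 1;  -- autocommits

-- Session 1 (T1)
SELECT Age FROM Students WHERE StudentID = 1;  -- still returns 20
COMMIT;
-- A new transaction in session 1 now sees 21.
-- If T1 had instead tried to UPDATE the row T2 changed, it would fail with:
--   ERROR: could not serialize access due to concurrent update
```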

Serializable Isolation Level:
The Serializable isolation level in PostgreSQL offers the highest level of data consistency and concurrency control. It ensures that transactions execute as if they were running sequentially, preventing dirty reads, non-repeatable reads, and phantom reads.

In PostgreSQL, Serializable is implemented as Serializable Snapshot Isolation (SSI). A Serializable transaction runs against a snapshot, just like Repeatable Read, but PostgreSQL additionally monitors read/write dependencies between concurrent transactions using predicate locks. These predicate locks do not block other transactions; instead, they record which rows (or ranges of rows) a transaction's queries depended on. If PostgreSQL detects a pattern of dependencies that could not occur in any serial execution, it aborts one of the transactions with a serialization failure (SQLSTATE 40001), and the application is expected to retry it.

Because a query's entire predicate is tracked, this mechanism also eliminates phantom reads: no concurrent insert can invisibly change the outcome of a Serializable transaction's queries. The result is the strongest correctness guarantee PostgreSQL offers, at the cost of some monitoring overhead and the need for retry logic in the application.
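A sketch of how a serialization failure can surface, using the Products table from the phantom-read example; whether PostgreSQL actually aborts one of the transactions depends on the read/write dependencies it detects:

```sql
-- Session 1 (T1)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM Products WHERE Price < 500;  -- T1 depends on this predicate
INSERT INTO Products (ProductID, ProductName, Price) VALUES (4, 'Mouse', 30);

-- Session 2 (T2)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM Products WHERE Price < 500;
INSERT INTO Products (ProductID, ProductName, Price) VALUES (3, 'Tablet', 350);
COMMIT;

-- Session 1 (T1)
COMMIT;
-- May fail with SQLSTATE 40001:
--   ERROR: could not serialize access due to read/write dependencies
--          among transactions
-- The application should catch 40001 and retry the transaction.
```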

REF:
PostgreSQL Documentation
