Introduction
In database systems, concurrent transactions can lead to issues that affect data consistency. These issues, known as read phenomena (dirty reads, non-repeatable reads, phantom reads) and serialization anomaly, occur when multiple transactions read and write data simultaneously. This document defines and explains these phenomena with examples to illustrate their impact in a generic database context.
Concurrency Phenomena
1. Dirty Reads
- Definition: A transaction reads data that has been modified by another transaction but not yet committed. If the modifying transaction rolls back, the read data becomes invalid.
-
Example:
- Transaction 1 updates a customer’s balance to 90 but hasn’t committed.
- Transaction 2 reads the balance as 90.
- Transaction 1 rolls back, restoring the balance to 100.
- Issue: Transaction 2 operates on incorrect data (90 instead of 100).
- Impact: Can lead to decisions based on temporary, uncommitted data, causing inconsistencies in applications like financial systems or inventory management.
2. Non-Repeatable Reads
- Definition: A transaction reads the same data twice and gets different results because another transaction modified and committed the data between the reads.
-
Example:
- Transaction 1 reads a product’s stock quantity as 50.
- Transaction 2 updates the stock to 40 and commits.
- Transaction 1 reads the stock again and sees 40.
- Issue: The same query within Transaction 1 yields inconsistent results.
- Impact: Inconsistent reads can cause errors in applications requiring stable data, such as order processing or reporting.
3. Phantom Reads
- Definition: A transaction re-executes a query to retrieve a set of rows and finds different rows (new or missing) because another transaction inserted or deleted rows and committed.
-
Example:
- Transaction 1 queries all orders for a customer, finding 3 orders.
- Transaction 2 inserts a new order for the same customer and commits.
- Transaction 1 re-runs the query and finds 4 orders.
- Issue: The result set changes due to “phantom” rows appearing.
- Impact: Can affect operations that rely on a consistent set of records, such as generating reports or auditing transactions.
4. Serialization Anomaly
-
Definition: A serialization anomaly occurs when the outcome of concurrently executing transactions cannot be explained by any serial (one-at-a-time) execution order of those transactions, even though each transaction is individually consistent. This is specific to the
SERIALIZABLE
isolation level. -
Example:
- Initial State: Three records (A: 100, B: 100, C: 100).
- Transaction 1 updates A to 90 and B to 110.
- Transaction 2 updates B to 90 and C to 110.
- Transaction 3 updates C to 90 and A to 110.
- Expected Outcome (serial execution): Balances remain A: 100, B: 100, C: 100 (net zero change).
- Anomaly: Concurrent execution might result in inconsistent balances (e.g., A: 90, B: 100, C: 110) because transactions interleave in a way that doesn’t match any serial order.
- Issue: The final state is inconsistent with expected business logic, despite each transaction being valid.
- Impact: Can lead to incorrect data states in applications where the combined effect of transactions must align with a specific logical outcome, such as financial ledgers.
SQL Transaction Isolation Levels
Transaction isolation levels, defined by the SQL standard, control how transactions interact with each other and manage concurrency issues like read phenomena and serialization anomalies. The four standard isolation levels, from least to most strict, are described below, including their impact on the phenomena discussed above.
1. READ UNCOMMITTED
- Definition: The least strict isolation level, allowing transactions to read uncommitted changes made by other transactions.
-
Behavior:
- Transactions can see data modified by other transactions before they commit.
- Offers the highest concurrency but the lowest consistency.
-
Impact on Phenomena:
- Dirty Reads: Possible, as uncommitted data is visible.
- Non-Repeatable Reads: Possible, as committed changes from other transactions can alter data between reads.
- Phantom Reads: Possible, as new rows inserted by other transactions can appear.
- Serialization Anomaly: Possible, as there’s no guarantee of serial execution.
-
Example:
- Transaction 1 updates a record but doesn’t commit.
- Transaction 2 reads the uncommitted data, which may be rolled back, leading to a dirty read.
- Use Case: Rarely used in practice due to its high risk of inconsistencies; suitable only for applications where data accuracy is not critical (e.g., approximate analytics).
2. READ COMMITTED
- Definition: Ensures that transactions only read data that has been committed, preventing dirty reads but allowing other concurrency issues.
-
Behavior:
- Each query within a transaction sees only committed data at the time of the query.
- Common default isolation level in many databases, including PostgreSQL.
-
Impact on Phenomena:
- Dirty Reads: Prevented, as only committed data is visible.
- Non-Repeatable Reads: Possible, as data can change between queries within the same transaction.
- Phantom Reads: Possible, as new rows can be inserted by other transactions.
- Serialization Anomaly: Possible, as there’s no guarantee of serial execution.
-
Example:
- Transaction 1 reads a record’s value as 50.
- Transaction 2 updates the value to 40 and commits.
- Transaction 1 reads the same record again and sees 40 (non-repeatable read).
- Use Case: Suitable for applications where dirty reads are unacceptable but some inconsistency (e.g., non-repeatable reads) is tolerable, such as simple reporting systems.
3. REPEATABLE READ
- Definition: Ensures that data read by a transaction remains consistent throughout the transaction, preventing non-repeatable reads but not phantom reads.
-
Behavior:
- Uses a snapshot of committed data at the start of the transaction for all reads, or locks read rows to prevent modifications.
- In PostgreSQL, this is implemented using multiversion concurrency control (MVCC).
-
Impact on Phenomena:
- Dirty Reads: Prevented, as only committed data is visible.
- Non-Repeatable Reads: Prevented, as the transaction sees a consistent snapshot of data.
- Phantom Reads: Possible, as new rows inserted by other transactions may appear in subsequent queries.
- Serialization Anomaly: Possible, as concurrent transactions can still produce non-serializable outcomes.
-
Example:
- Transaction 1 reads all records matching a condition (e.g., 3 orders).
- Transaction 2 inserts a new record matching the condition and commits.
- Transaction 1 re-queries and sees the new record (phantom read).
- Use Case: Suitable for applications requiring consistent reads within a transaction, such as financial calculations or inventory checks, but where phantom reads are acceptable.
4. SERIALIZABLE
- Definition: The strictest isolation level, ensuring that transactions execute as if they were run sequentially, preventing all concurrency anomalies.
-
Behavior:
- Guarantees that the outcome of concurrent transactions matches some serial execution order.
- In PostgreSQL, uses predicate locking and MVCC to detect conflicts, aborting transactions (error code
40001
) if a serialization anomaly is detected.
-
Impact on Phenomena:
- Dirty Reads: Prevented, as only committed data is visible.
- Non-Repeatable Reads: Prevented, as the transaction sees a consistent snapshot.
- Phantom Reads: Prevented, as conflicting inserts/deletes trigger a serialization failure.
- Serialization Anomaly: Prevented, as the database ensures a serializable outcome.
-
Example:
- Transaction 1 and Transaction 2 attempt conflicting updates that would lead to a serialization anomaly (e.g., cyclic updates to records).
- The database detects the conflict and aborts one transaction, requiring a retry.
- Use Case: Ideal for applications requiring strict data consistency, such as financial systems or critical business logic, though it may reduce concurrency due to potential transaction aborts.
Isolation Levels and Concurrency Phenomena Table
This table illustrates the relationship between SQL transaction isolation levels and the concurrency phenomena they allow or prevent.
Phenomenon | READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE |
---|---|---|---|---|
Dirty Read | Yes | No | No | No |
Non-Repeatable Read | Yes | Yes | No | No |
Phantom Read | Yes | Yes | Yes | No |
Serialization Anomaly | Yes | Yes | Yes | No |
Explanation
- Yes: The phenomenon can occur under the specified isolation level.
- No: The phenomenon is prevented under the specified isolation level.
-
Isolation Levels:
- READ UNCOMMITTED: Allows all phenomena due to minimal restrictions.
- READ COMMITTED: Prevents dirty reads but allows others.
- REPEATABLE READ: Prevents dirty and non-repeatable reads but allows phantom reads.
- SERIALIZABLE: Prevents all phenomena, ensuring full consistency.
Summary
- Dirty Reads: Reading uncommitted data (possible in READ UNCOMMITTED).
- Non-Repeatable Reads: Inconsistent results from repeated reads (possible in READ UNCOMMITTED and READ COMMITTED).
- Phantom Reads: Changes in query result sets due to inserts/deletes (possible in READ UNCOMMITTED, READ COMMITTED, and REPEATABLE READ).
- Serialization Anomaly: Non-serializable outcomes from concurrent transactions (possible in all but SERIALIZABLE).
-
Isolation Levels:
- READ UNCOMMITTED: Allows all phenomena; least consistent.
- READ COMMITTED: Prevents dirty reads; moderate consistency.
- REPEATABLE READ: Prevents dirty and non-repeatable reads; higher consistency.
- SERIALIZABLE: Prevents all phenomena; maximum consistency but may impact performance.
Understanding these phenomena and isolation levels is crucial for designing robust database applications, as they guide the choice of isolation level to balance consistency and performance based on application requirements.
Top comments (0)