In the field of database systems, concurrency control is a crucial aspect to ensure that multiple transactions can execute simultaneously without causing inconsistencies or data corruption. To manage this, databases use various isolation levels and techniques like locking mechanisms to protect data integrity. However, these mechanisms may introduce certain issues known as Dirty Read, Non-repeatable Read, and Phantom Read.
๐ญ. ๐๐ถ๐ฟ๐๐ ๐ฅ๐ฒ๐ฎ๐ฑ:
A dirty read occurs when a transaction reads uncommitted data from the database. In other words, it reads data that has been modified by another
transaction but not yet committed. This can lead to incorrect results being displayed to the user. To prevent this issue, databases generally use the Two-Phase Locking (2PL) protocol, which ensures that a transaction holds locks only for the duration of the transaction.
# Example using SQL and Python's psycopg2 library:
import psycopg2
conn = psycopg2.connect(database="your_db", user="your_user", password="your_password")
cursor = conn.cursor()
def transaction1():
    cursor.execute("UPDATE table SET column=column+1 WHERE condition")
    conn.commit()  # commits the changes
def transaction2():
    cursor.execute("SELECT * FROM table")  # reads uncommitted data
    print(cursor.fetchall())
๐ฎ. ๐ก๐ผ๐ป-๐ฟ๐ฒ๐ฝ๐ฒ๐ฎ๐๐ฎ๐ฏ๐น๐ฒ ๐ฅ๐ฒ๐ฎ๐ฑ:
A non-repeatable read occurs when a transaction re-reads data that has been modified by another concurrent transaction. This issue arises because databases do not guarantee that the data will remain unchanged between two reads of the same transaction, even if no other transactions are modifying it. To address this, 2PL is also used to prevent non-repeatable reads.
# Example using SQL and Python's psycopg2 library:
import psycopg2
conn = psycopg2.connect(database="your_db", user="your_user", password="your_password")
cursor = conn.cursor()
def transaction1():
    cursor.execute("SELECT * FROM table WHERE condition")  # reads data
    rows = cursor.fetchall()
    cursor.execute("UPDATE table SET column=column+1 WHERE condition")  # modifies data
    conn.commit()  # commits the changes
    cursor.execute(rows)  # reads data again, but results can be different due to non-repeatable read
๐ฏ. ๐ฃ๐ต๐ฎ๐ป๐๐ผ๐บ ๐ฅ๐ฒ๐ฎ๐ฑ:
A phantom read occurs when a transaction retrieves multiple rows from a query based on certain conditions, but later, when it re-executes the
same query with the same conditions, additional rows appear that were inserted by other concurrent transactions in the meantime. To address this
issue, databases use multiversion concurrency control (MVCC) and version store, which allow multiple versions of data to coexist until a commit
or rollback is performed.
# Example using SQL and Python's psycopg2 library:
import psycopg2
conn = psycopg2.connect(database="your_db", user="your_user", password="your_password")
cursor = conn.cursor()
def transaction1():
    cursor.execute("SELECT * FROM table WHERE condition")  # reads data
    rows = cursor.fetchall()
    cursor.execute("INSERT INTO table (column) VALUES ('some value')")  # inserts new row
    conn.commit()  # commits the changes
    cursor.execute("SELECT * FROM table WHERE condition")  # re-reads data, but results can be different due to phantom read
To avoid these issues, databases use various techniques like 2PL, MVCC, and version store, depending on the isolation level chosen for the transaction. By ensuring data integrity, these mechanisms prevent dirty reads, non-repeatable reads, and phantom reads, allowing transactions to run concurrently without causing inconsistencies or data corruption.
๐ฅ๐ฒ๐ฎ๐น ๐๐ถ๐ณ๐ฒ ๐๐ ๐ฎ๐บ๐ฝ๐น๐ฒ
Imagine you are withdrawing money from your bank account. You have two options for checking the balance: Option A and Option B.
Option A: Dirty Read, Non-repeatable Read
- You check the balance of your account, showing $10,000.
- Another transaction transfers $5,000 from your account to another account without committing yet (dirty read).
- You again check the balance of your account and see $5,000.
- The transfer transaction gets rolled back due to an error.
- If you try to re-check the balance using Option A, it will show $10,000, but it should ideally be $5,000 (non-repeatable read).
Option B: Phantom Read
- You check all transactions in your account from last month, showing a total of $10,000.
- Another transaction makes a deposit of $2,000 into your account without committing yet.
- You try to re-check the transactions using Option B, but it will now show an additional $2,000 deposit (phantom read).
- If you try to re-execute the query, it will show both the original transactions and the new $2,000 deposit (phantom read persists).
In this example, Option A demonstrates dirty reads and non-repeatable reads, while Option B illustrates a phantom read. Databases employ
isolation levels, locking mechanisms, or multiversion concurrency control to prevent these issues and maintain data integrity during concurrent
transactions.
Chapter on Concurrency Controls : https://www.cs.uct.ac.za/mit_notes/database/pdfs/chp13.pdf
 
 
              
 
    
Top comments (0)