DEV Community

Namsi Lydia
Namsi Lydia

Posted on

Concurrency Control Techniques in Postgresql: A Comprehensive Analysis

This article describes the behavior of the PostgreSQL database system when two or more sessions try to access the same data at the same time. The goals in that situation are to allow efficient access for all sessions while maintaining strict data integrity.

What is concurrency control
Concurrency control is the most essential and important aspect of any database system that ensures that transactions can be executed concurrently without leading to data inconsistencies.

Some of the concurrency techniques used by postgresql include:

  1. MVCC(Multi-version concurrency Control)
  2. Explicit Locking
  3. Transaction Isolation

Multi-version Concurrency Control (MVCC)
Unlike traditional database systems which use locks for concurrency control, PostgreSQL maintains data consistency by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session.Mvcc is good with read operations

The main advantage to using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.

Transaction Isolation
Transaction isolation is a concept in most database management systems that defines the extent to which the operations within a transaction are isolated from the effects of other concurrently executing transactions. In other words, it ensures that the intermediate states of a transaction are not visible to other transactions until the transaction is committed.

on the other hand isolation level is a property that can be set for a transaction, and it determines the degree to which a transaction is isolated from the effects of other concurrent transactions.

There are four levels of transaction and they include:

  • Read uncommitted
  • Read committed(used in postgresql)
  • Repeatable read
  • serializable(used in postgresql)

These four levels of transaction isolation can be interfered by three phenomena that must be prevented between concurrent transactions and these phenomena include:

dirty read
A transaction reads data written by a concurrent uncommitted transaction.

non repeatable read
A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).

phantom read
A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.

in postgresql one can request any of the four standard transaction isolation levels. But internally, there are only two distinct isolation levels, which correspond to the levels Read Committed and Serializable this is because when you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really get Serializable, so the actual isolation level may be stricter than what you select,thus the reason that PostgreSQL only provides two isolation levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency control architecture.

Read Committed Isolation Level
In this isolation level when a transaction runs on this isolation level, a SELECT query sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.

The Read Committed isolation level ensures that when you execute a query (e.g., SELECT, UPDATE, DELETE), it only sees a snapshot of the database as of the moment the query starts running. Even within a single transaction, two consecutive SELECT queries might retrieve different data if other transactions commit changes during the execution of the first SELECT.

For commands like UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE, they behave similarly to SELECT in terms of finding target rows. They locate rows committed as of the command's start time, but these rows may have been modified or deleted by other transactions by the time they are found. In such cases, the second transaction must wait for the first one to either commit or roll back. If the first transaction rolls back, the effects are undone, and the second transaction can proceed. If the first transaction commits, the second transaction will either ignore the row (if it was deleted) or apply its operation to the updated version of the row.

Due to this behavior, an updating command may perceive an inconsistent snapshot, showing the effects of concurrent updates on the same rows it's trying to update, but not on other rows in the database. This makes Read Committed mode less suitable for commands with complex search conditions but suitable for simpler cases.

In simpler terms, Read Committed ensures that when you interact with the database, you see a consistent view of the data at the moment your operation begins. However, it may not capture all changes happening concurrently in the database, especially for more complex queries.

Serializable Isolation Level
The Serializable isolation level is the strictest level for database transactions. It simulates the idea of executing transactions one after another, in a serial manner, rather than simultaneously. However, applications using this level must be ready to retry transactions due to potential serialization failures.

In Serializable mode, a SELECT query only sees data committed before the transaction began; it doesn't see uncommitted data or changes committed during the transaction's execution by other transactions. Successive SELECT commands within a single transaction always observe the same data, as they see a snapshot as of the start of the transaction.

Commands like UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE behave similarly to SELECT in terms of finding target rows. They locate rows committed as of the transaction start time, but these rows may have been updated, deleted, or locked by concurrent transactions. In such cases, the serializable transaction waits for the first updating transaction to commit or roll back. If the first transaction rollback, the serializable transaction can proceed; if it commits, the serializable transaction is rolled back with an error message indicating a concurrent update. The application, upon receiving this error, should abort the current transaction and retry it from the beginning.

It's important to note that only updating transactions may need to be retried; read-only transactions won't face serialization conflicts. While Serializable mode ensures a wholly consistent view of the database for each transaction, the application should be prepared to retry transactions when concurrent updates make serial execution impractical.

Serializable mode is recommended when updating transactions are complex enough to potentially yield incorrect results in Read Committed mode. It's typically necessary when a transaction executes several successive commands that must observe identical views of the database.

Below is a sample example implementation of serializable isolation level:


-- Create a sample table
CREATE TABLE account (
    id serial PRIMARY KEY,
    balance integer
);

-- Insert some initial data
INSERT INTO account (balance) VALUES (1000), (2000);

-- Transaction 1
BEGIN;
DECLARE
    new_balance integer;
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    -- Perform the update and store the new balance in a variable
    UPDATE account SET balance = balance - 100 WHERE id = 1 RETURNING balance INTO new_balance;

    -- Check if the balance after the update is negative
    IF new_balance < 0 THEN
        -- Rollback the transaction and raise an exception
        ROLLBACK;
        RAISE EXCEPTION 'Transaction failed: Insufficient balance after deduction';
    END IF;

    -- Perform some other operations

    COMMIT;
EXCEPTION
    WHEN OTHERS THEN
        -- Handle any other exceptions
        ROLLBACK;
        RAISE;
END;

-- Transaction 2
BEGIN;
DECLARE
    new_balance integer;
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    -- Perform the update and store the new balance in a variable
    UPDATE account SET balance = balance + 100 WHERE id = 2 RETURNING balance INTO new_balance;

    -- Check if the balance after the update is negative
    IF new_balance < 0 THEN
        -- Rollback the transaction and raise an exception
        ROLLBACK;
        RAISE EXCEPTION 'Transaction failed: Negative balance not allowed';
    END IF;

    -- Perform some other operations

    COMMIT;
EXCEPTION
    WHEN OTHERS THEN
        -- Handle any other exceptions
        ROLLBACK;
        RAISE;
END;

Enter fullscreen mode Exit fullscreen mode

Explicit Locking
PostgreSQL provides various lock modes to control concurrent access to data in tables. These modes can be used for application-controlled locking in situations where MVCC does not give the desired behavior. Also, most PostgreSQL commands automatically acquire locks of appropriate modes to ensure that referenced tables are not dropped or modified in incompatible ways while the command executes.

There are various techniques used in explicit locking and they include:

  • Table-Level Locks
  • Row-Level Locks
  • Page-Level Locks
  • Deadlocks
  • Advisory Locks

Table-Level Locks
Table-level locks are a type of lock that can be applied to an entire table. These locks are used to control access to a table, preventing multiple transactions from interfering with each other. There are different types of locks in PostgreSQL and they include:

  • Access share lock
  • Row share lock
  • Row exclusive lock
  • Share lock
  • Share update exclusive just to name a few.

Access Share
These locks are acquired on a specific table via the PostgreSQL SELECT command. After acquiring these locks on the table, we are only able to read data from it and not able to edit it.

syntax:

begin;
lock table Username IN ACCESS SHARE MODE;
select * from Username;
Enter fullscreen mode Exit fullscreen mode

Row Share
The SELECT will acquire this PostgreSQL Lock on the table FOR SHARE & SELECT FOR UPDATE statements. This lock conflicts with Exclusive & Access Exclusive modes of Locking.

Syntax:

begin;
lock table Email IN ROW SHARE MODE;
Enter fullscreen mode Exit fullscreen mode

*Row Exclusive
*

The share row exclusive, share, access exclusive, and exclusive modes of PostgreSQL conflict with this lock. The locks on the table will be acquired by UPDATE, DELETE & INSERT statement

syntax:

begin;
lock table Email IN ROW EXCLUSIVE MODE;
Enter fullscreen mode Exit fullscreen mode

Share
This lock is incompatible with the share row exclusive, share, access exclusive, share update exclusive, share, and exclusive modes. This lock mode will obtain locks from PostgreSQL’s create index command.

Syntax:

begin;
lock table Email IN SHARE MODE;

Enter fullscreen mode Exit fullscreen mode

Page Level Locks
Page-level Locks are native to two types. Share & Exclusive locks limit read/write access to table pages in the shared buffer pool and table and row locks. After a row is fetched or updated, these locks are immediately released. Page-level locks are typically not an issue for application developers, although they are listed here for completeness.

Row- Level Locks
Row-level locks are a type of lock that can be applied to individual rows within a table. Row-level locks provide a more granular level of control over concurrent access to data, allowing transactions to lock specific rows rather than the entire table. This can be useful in scenarios where you want to prevent other transactions from modifying specific rows concurrently.

There are two main row-level locks that used in postgresql and the include:

FOR UPDATE
When a transaction acquires a FOR UPDATE lock on a row, it indicates an intention to update the row. Other transactions attempting to acquire a FOR UPDATE lock on the same row will be blocked until the lock is released.
This lock mode is used in situations where a transaction wants to ensure that it has exclusive rights to modify a particular row.

syntax for FOR UPDATE:

BEGIN;
SELECT * FROM your_table WHERE key_column = 'some_value' FOR UPDATE;
-- Perform updates or other operations on the selected row(s)
COMMIT;

FOR SHARE
The FOR SHARE lock mode is less restrictive than FOR UPDATE. It allows multiple transactions to acquire shared locks on the same row simultaneously.
Transactions using FOR SHARE can read the locked row, but they cannot acquire an FOR UPDATE lock on the same row until the lock is released.

syntax for FOR SHARE

BEGIN;
SELECT * FROM your_table WHERE key_column = 'some_value' FOR SHARE;
-- Perform read operations on the selected row(s)
COMMIT;
Enter fullscreen mode Exit fullscreen mode

Data Consistency Checks
Enforcing consistency through Serializable Transactions

When using serializable isolation level, it will avoid creating an unnecessary burden for application programmers if the application software goes through a framework which automatically retries transactions which are rolled back with a serialization failure. It may be a good idea to set default_transaction_isolation to serializable. It would also be wise to take some action to ensure that no other transaction isolation level is used, either inadvertently or to subvert integrity checks, through checks of the transaction isolation level in triggers.

Enforcing consistency through Explicit Blocking Locks

When dealing with non-serializable writes in PostgreSQL, it's crucial to use mechanisms like SELECT FOR UPDATE, SELECT FOR SHARE, or appropriate LOCK TABLE statements to ensure the current validity of a row and protect it from concurrent updates. These techniques help in avoiding conflicts when multiple transactions are attempting to modify the same data simultaneously.

It's important to note that SELECT FOR UPDATE and SELECT FOR SHARE specifically lock the returned rows against concurrent updates, while LOCK TABLE locks the entire table. When migrating applications from other database environments to PostgreSQL, understanding and adapting to these concurrency control mechanisms is essential.

One notable aspect is that using SELECT FOR UPDATE doesn't guarantee that a concurrent transaction won't update or delete a selected row. In PostgreSQL, ensuring such protection involves actually updating the row, even if no changes to the values are needed. SELECT FOR UPDATE temporarily prevents other transactions from acquiring the same lock or executing conflicting operations, but the protection is lifted once the holding transaction commits or rolls back, unless a real UPDATE of the row occurred during the lock.

something to note if you're relying on explicit locking to prevent concurrent changes in PostgreSQL, it's essential to either use Read Committed mode or, in Repeatable Read mode, take precautions to acquire locks before executing queries. In Repeatable Read mode, a transaction's lock ensures that no other transactions modifying the table are active. However, it's crucial to note that if the transaction's snapshot predates obtaining the lock, it might miss some changes that have been committed to the table.

Conclusion
Concurrency control in PostgreSQL plays a pivotal role in upholding data consistency and isolation within a multi-user environment. As a robust relational database management system, PostgreSQL employs a spectrum of techniques, including Multi-version Concurrency Control (MVCC), Transaction Isolation and Explicit Locking to adeptly manage concurrent transactions.

PostgreSQL's concurrency control mechanisms extend beyond mere data manipulation to address critical challenges in complex scenarios. By skillfully managing transactions and utilizing MVCC, PostgreSQL guarantees data consistency and isolation. This is particularly beneficial for applications that require concurrent access to data while upholding the highest standards of integrity. The reliability and versatility of PostgreSQL's concurrency control make it a preferred choice for diverse applications demanding robust concurrent data management.

Top comments (0)